Re: handling null argument in custom udf

2012-12-06 Thread Søren

Right. Thanks for all the help.
It turned out that it did help to check for null in the code. No mystery.
I did try that earlier but the attempt got lost somehow.

Thanks for the advice on using GenericUDF.

cheers
Søren

On 05/12/2012 11:10, Vivek Mishra wrote:

The way a UDF works is that you need to tell your ObjectInspector about your 
primitive or Java types. So in your case, even if the value is null, you should 
be able to assign it as a String or any other object. The invocation of the 
evaluate() function should then know the type of the Java object.

-Vivek

From: Vivek Mishra
Sent: 05 December 2012 15:36
To: user@hive.apache.org
Subject: RE: handling null argument in custom udf

Could you please look into your task/attempt log and share the complete 
error trace, or the actual error behind this?

-Vivek

From: Søren [s...@syntonetic.com]
Sent: 04 December 2012 20:28
To: user@hive.apache.org
Subject: Re: handling null argument in custom udf

Thanks. Did you mean I should handle null in my udf or my serde?

I did try to check for null inside the code in my udf, but it fails even before 
it gets called.

This is from when the udf fails:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute 
method public org.apache.hadoop.io.Text 
com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
on objectcom.company.hive.myfun@1412332 of class com.company.hive.myfun with 
arguments {0:java.lang.Object, null} of size 2

It looks like there is a null, or is this error message misleading?


On 04/12/2012 15:43, Edward Capriolo wrote:
There is no null argument. You should handle the null case in your code.

If (arga == null)

Or optionally you could use a generic udf but a regular one should handle what 
you are doing.

On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:

Hi Hive community

I have a custom udf, say myfun, written in Java which I utilize like this

select myfun(col_a, col_b) from mytable where etc

col_b is a string type and sometimes it is null.

When that happens, my query crashes with
---
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
{col_a:val,col_b:null}
...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute 
method public org.apache.hadoop.io.Text
---

public final class myfun extends UDF {
 public Text evaluate(final Text argA, final Text argB) {

I'm unsure how this should be fixed in a proper way. Is the framework looking 
for an overload of evaluate that would comply with the null argument?

I need to say that the table is declared using my own json serde reading from 
S3. I'm not processing nulls in my serde in any special way because Hive seems 
to handle null in the right way when not passed to my own UDF.

Is there anyone out there with ideas or experiences on this issue?

thanks in advance
Søren

NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.

RE: handling null argument in custom udf

2012-12-05 Thread Vivek Mishra
Could you please look into your task/attempt log and share the complete 
error trace, or the actual error behind this?

-Vivek

From: Søren [s...@syntonetic.com]
Sent: 04 December 2012 20:28
To: user@hive.apache.org
Subject: Re: handling null argument in custom udf

Thanks. Did you mean I should handle null in my udf or my serde?

I did try to check for null inside the code in my udf, but it fails even before 
it gets called.

This is from when the udf fails:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute 
method public org.apache.hadoop.io.Text 
com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
on objectcom.company.hive.myfun@1412332 of class com.company.hive.myfun with 
arguments {0:java.lang.Object, null} of size 2

It looks like there is a null, or is this error message misleading?


On 04/12/2012 15:43, Edward Capriolo wrote:
There is no null argument. You should handle the null case in your code.

If (arga == null)

Or optionally you could use a generic udf but a regular one should handle what 
you are doing.

On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:
 Hi Hive community

 I have a custom udf, say myfun, written in Java which I utilize like this

 select myfun(col_a, col_b) from mytable where etc

 col_b is a string type and sometimes it is null.

 When that happens, my query crashes with
 ---
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row
 {col_a:val,col_b:null}
 ...
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public org.apache.hadoop.io.Text
 ---

 public final class myfun extends UDF {
 public Text evaluate(final Text argA, final Text argB) {

 I'm unsure how this should be fixed in a proper way. Is the framework looking 
 for an overload of evaluate that would comply with the null argument?

 I need to say that the table is declared using my own json serde reading from 
 S3. I'm not processing nulls in my serde in any special way because Hive 
 seems to handle null in the right way when not passed to my own UDF.

 Is there anyone out there with ideas or experiences on this issue?

 thanks in advance
 Søren

RE: handling null argument in custom udf

2012-12-05 Thread Vivek Mishra
The way a UDF works is that you need to tell your ObjectInspector about your 
primitive or Java types. So in your case, even if the value is null, you should 
be able to assign it as a String or any other object. The invocation of the 
evaluate() function should then know the type of the Java object.

-Vivek

From: Vivek Mishra
Sent: 05 December 2012 15:36
To: user@hive.apache.org
Subject: RE: handling null argument in custom udf

Could you please look into your task/attempt log and share the complete 
error trace, or the actual error behind this?

-Vivek

From: Søren [s...@syntonetic.com]
Sent: 04 December 2012 20:28
To: user@hive.apache.org
Subject: Re: handling null argument in custom udf

Thanks. Did you mean I should handle null in my udf or my serde?

I did try to check for null inside the code in my udf, but it fails even before 
it gets called.

This is from when the udf fails:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute 
method public org.apache.hadoop.io.Text 
com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
on objectcom.company.hive.myfun@1412332 of class com.company.hive.myfun with 
arguments {0:java.lang.Object, null} of size 2

It looks like there is a null, or is this error message misleading?


On 04/12/2012 15:43, Edward Capriolo wrote:
There is no null argument. You should handle the null case in your code.

If (arga == null)

Or optionally you could use a generic udf but a regular one should handle what 
you are doing.

On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:
 Hi Hive community

 I have a custom udf, say myfun, written in Java which I utilize like this

 select myfun(col_a, col_b) from mytable where etc

 col_b is a string type and sometimes it is null.

 When that happens, my query crashes with
 ---
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row
 {col_a:val,col_b:null}
 ...
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public org.apache.hadoop.io.Text
 ---

 public final class myfun extends UDF {
 public Text evaluate(final Text argA, final Text argB) {

 I'm unsure how this should be fixed in a proper way. Is the framework looking 
 for an overload of evaluate that would comply with the null argument?

 I need to say that the table is declared using my own json serde reading from 
 S3. I'm not processing nulls in my serde in any special way because Hive 
 seems to handle null in the right way when not passed to my own UDF.

 Is there anyone out there with ideas or experiences on this issue?

 thanks in advance
 Søren

Re: handling null argument in custom udf

2012-12-04 Thread Edward Capriolo
There is no null argument. You should handle the null case in your code.

If (arga == null)

Or optionally you could use a generic udf but a regular one should handle
what you are doing.
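
A minimal sketch of the null check suggested above. To keep it runnable without
Hive on the classpath, this stand-in uses String in place of
org.apache.hadoop.io.Text and does not extend
org.apache.hadoop.hive.ql.exec.UDF. The class name MyFunSketch and the
concatenation body are hypothetical, since the thread never shows what myfun
actually computes.

```java
// Stand-in for the UDF discussed in this thread. A real Hive UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF and take/return org.apache.hadoop.io.Text;
// String is used here only so the sketch compiles with the JDK alone.
final class MyFunSketch {

    // Returns null as soon as either argument is null, so a NULL column value
    // yields a NULL result instead of a NullPointerException.
    // The concatenation is a hypothetical placeholder for the real logic.
    public String evaluate(final String argA, final String argB) {
        if (argA == null || argB == null) {
            return null;
        }
        return argA + ":" + argB;
    }

    public static void main(String[] args) {
        MyFunSketch f = new MyFunSketch();
        System.out.println(f.evaluate("val", "x"));   // prints val:x
        System.out.println(f.evaluate("val", null));  // prints null
    }
}
```

Returning null for a null input mirrors how most built-in SQL functions
behave; a default value would work equally well if the query needs one.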

On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:
 Hi Hive community

 I have a custom udf, say myfun, written in Java which I utilize like this

 select myfun(col_a, col_b) from mytable where etc

 col_b is a string type and sometimes it is null.

 When that happens, my query crashes with
 ---
 java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
 {col_a:val,col_b:null}
 ...
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
execute method public org.apache.hadoop.io.Text
 ---

 public final class myfun extends UDF {
 public Text evaluate(final Text argA, final Text argB) {

 I'm unsure how this should be fixed in a proper way. Is the framework
looking for an overload of evaluate that would comply with the null
argument?

 I need to say that the table is declared using my own json serde reading
from S3. I'm not processing nulls in my serde in any special way because
Hive seems to handle null in the right way when not passed to my own UDF.

 Is there anyone out there with ideas or experiences on this issue?

 thanks in advance
 Søren




Re: handling null argument in custom udf

2012-12-04 Thread Søren

Thanks. Did you mean I should handle null in my udf or my serde?

I did try to check for null inside the code in my udf, but it fails even 
before it gets called.


This is from when the udf fails:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
execute method public org.apache.hadoop.io.Text 
com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
on objectcom.company.hive.myfun@1412332 of class 
com.company.hive.myfun with arguments {0:java.lang.Object, null} of size 2


It looks like there is a null, or is this error message misleading?
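
One plausible reading of the message above, offered as an assumption rather
than a diagnosis: Mark Grover's reply in this thread notes that plain UDFs are
called through reflection. With reflection, an exception thrown inside the
method is reported by the caller as a wrapper exception, which can look as if
the method was never entered. The sketch below shows that effect with the JDK
alone; ReflectiveNullDemo and Unguarded are hypothetical names and no Hive
code is involved.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ReflectiveNullDemo {

    // Hypothetical evaluate() without a null check: argB.length() throws
    // NullPointerException when argB is null.
    public static final class Unguarded {
        public String evaluate(String argA, String argB) {
            return argA + argB.length();
        }
    }

    // Looks up evaluate(String, String) and invokes it reflectively,
    // returning a short description of what happened.
    static String call(Object target, String argA, String argB) {
        try {
            Method m = target.getClass()
                    .getMethod("evaluate", String.class, String.class);
            return "ok: " + m.invoke(target, argA, argB);
        } catch (InvocationTargetException e) {
            // The method WAS entered; its own exception travels as the cause.
            return "wrapped: " + e.getCause().getClass().getName();
        } catch (ReflectiveOperationException e) {
            // Lookup or access failed, so the method was never entered.
            return "not invoked: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Unguarded(), "a", "bc"));  // prints ok: a2
        System.out.println(call(new Unguarded(), "a", null));
        // prints wrapped: java.lang.NullPointerException
    }
}
```

Under that assumption, the innermost "Caused by" entry of the full task log
is the place to look for the real error.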


On 04/12/2012 15:43, Edward Capriolo wrote:

There is no null argument. You should handle the null case in your code.

If (arga == null)

Or optionally you could use a generic udf but a regular one should 
handle what you are doing.


On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:

 Hi Hive community

 I have a custom udf, say myfun, written in Java which I utilize like 
this


 select myfun(col_a, col_b) from mytable where etc

 col_b is a string type and sometimes it is null.

 When that happens, my query crashes with
 ---
 java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row

 {col_a:val,col_b:null}
 ...
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable 
to execute method public org.apache.hadoop.io.Text

 ---

 public final class myfun extends UDF {
 public Text evaluate(final Text argA, final Text argB) {

 I'm unsure how this should be fixed in a proper way. Is the 
framework looking for an overload of evaluate that would comply with 
the null argument?


 I need to say that the table is declared using my own json serde 
reading from S3. I'm not processing nulls in my serde in any special 
way because Hive seems to handle null in the right way when not passed 
to my own UDF.


 Is there anyone out there with ideas or experiences on this issue?

 thanks in advance
 Søren

 




Re: handling null argument in custom udf

2012-12-04 Thread Mark Grover
Soren,
Can you give the complete stack trace? Or share the code, perhaps just the
skeletal code?
Look at the Ceil UDF, for example. It has a null check, and you should be able
to do something similar:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCeil.java#L43

I would encourage you in the long run to use GenericUDF though. They are
better performing because they don't use reflection. I wrote a blog post a
while back to get people started with UDFs. It's at:
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html

Perhaps I should put the content on the Apache wiki, but in the meantime, take
a look at it...

Using the Translate UDF as an example (reference:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java),
if you would like to have a column accept nulls:
1. Allow the argument type to be void type in initialize() like it's done
at
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java#L151
2. Handle null values appropriately in evaluate() like it's done at
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java#L172
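
The two steps above can be sketched roughly as follows. This is not the code
of GenericUDFTranslate, only an illustration of the same pattern for a
two-argument string function. The class name GenericMyFun and the
concatenation body are hypothetical, and compiling it requires the Hive and
Hadoop jars on the classpath.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.apache.hadoop.io.Text;

public class GenericMyFun extends GenericUDF {

    private transient PrimitiveObjectInspector[] inspectors;
    private final Text result = new Text();

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        if (arguments.length != 2) {
            throw new UDFArgumentLengthException("myfun expects exactly 2 arguments");
        }
        inspectors = new PrimitiveObjectInspector[2];
        for (int i = 0; i < 2; i++) {
            if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                throw new UDFArgumentTypeException(i, "myfun expects primitive arguments");
            }
            PrimitiveObjectInspector poi = (PrimitiveObjectInspector) arguments[i];
            PrimitiveCategory pc = poi.getPrimitiveCategory();
            // Step 1: accept the void type (a literal NULL) as well as string.
            if (pc != PrimitiveCategory.STRING && pc != PrimitiveCategory.VOID) {
                throw new UDFArgumentTypeException(i,
                        "myfun expects string or void, got " + poi.getTypeName());
            }
            inspectors[i] = poi;
        }
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object a = arguments[0].get();
        Object b = arguments[1].get();
        // Step 2: handle null values explicitly; here NULL in gives NULL out.
        if (a == null || b == null) {
            return null;
        }
        String sa = PrimitiveObjectInspectorUtils.getString(a, inspectors[0]);
        String sb = PrimitiveObjectInspectorUtils.getString(b, inspectors[1]);
        result.set(sa + ":" + sb);  // hypothetical placeholder logic
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "myfun(" + children[0] + ", " + children[1] + ")";
    }
}
```

Reusing one Text instance for the result follows the common Hive habit of
avoiding a new allocation per row; returning a fresh Text would also work.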

Good luck!
Mark

On Tue, Dec 4, 2012 at 6:58 AM, Søren s...@syntonetic.com wrote:

  Thanks. Did you mean I should handle null in my udf or my serde?

 I did try to check for null inside the code in my udf, but it fails even
 before it gets called.

 This is from when the udf fails:
 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
 execute method public org.apache.hadoop.io.Text
 com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
 on objectcom.company.hive.myfun@1412332 of class com.company.hive.myfun with
 arguments {0:java.lang.Object, null} of size 2

 It looks like there is a null, or is this error message misleading?



 On 04/12/2012 15:43, Edward Capriolo wrote:

 There is no null argument. You should handle the null case in your code.

 If (arga == null)

 Or optionally you could use a generic udf but a regular one should handle
 what you are doing.

 On Tuesday, December 4, 2012, Søren s...@syntonetic.com wrote:
  Hi Hive community
 
  I have a custom udf, say myfun, written in Java which I utilize like this
 
  select myfun(col_a, col_b) from mytable where etc
 
  col_b is a string type and sometimes it is null.
 
  When that happens, my query crashes with
  ---
  java.lang.RuntimeException:
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
 processing row
  {col_a:val,col_b:null}
  ...
  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
 execute method public org.apache.hadoop.io.Text
  ---
 
  public final class myfun extends UDF {
  public Text evaluate(final Text argA, final Text argB) {
 
  I'm unsure how this should be fixed in a proper way. Is the framework
 looking for an overload of evaluate that would comply with the null
 argument?
 
  I need to say that the table is declared using my own json serde reading
 from S3. I'm not processing nulls in my serde in any special way because
 Hive seems to handle null in the right way when not passed to my own UDF.
 
  Is there anyone out there with ideas or experiences on this issue?
 
  thanks in advance
  Søren