Re: Hive Generic UDF invoking Hbase

2015-09-30 Thread Jason Dere
Take a look at hive.fetch.task.conversion in
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties and try
setting it to "none" or "minimal".



From: Ryan Harris <ryan.har...@zionsbancorp.com>
Sent: Wednesday, September 30, 2015 9:19 AM
To: user@hive.apache.org
Subject: RE: Hive Generic UDF invoking Hbase

This may be a bit of a 'hack', but I've found that basic select-only operations
will often cause Hive to stream the data without running the job through an
actual MR phase. That would typically be a logical approach for a "give me
everything" query if it were not for the UDF...

Try adding a basic WHERE clause to the query and see if that changes the
behavior, e.g.:
SELECT membership(c1,c2,c3,c4,c5,c6,c7) FROM MemberTable WHERE c1 IS NOT NULL;


From: Yogesh Keshetty [mailto:yogesh.keshe...@outlook.com]
Sent: Tuesday, September 29, 2015 11:02 PM
To: Hive community
Subject: RE: Hive Generic UDF invoking Hbase

Thanks for the reply, Douglas.

We haven't set up Tez yet; the default execution engine is MR.
I checked the log files and there is no sign of any MapReduce job for this
query. No MR job is generated at all, and I can't see the job with the
"hadoop job -list" command either.

When we were testing the generic UDFs on Hive 0.13, every insertion into
HBase would trigger an MR job. Since we migrated to Hive 0.14/1.0, the same
activity no longer generates any MR job. I don't know what has changed
internally. Has anyone tried to call HBase tables from Hive UDFs? Please
help us.

Thanks in advance!


From: douglas.mo...@thinkbiganalytics.com
To: user@hive.apache.org
Subject: Re: Hive Generic UDF invoking Hbase
Date: Wed, 30 Sep 2015 03:24:53 +
I'm guessing you might be using Tez now where you were using MR before.
You can tell Hive to run in MapReduce mode by setting the Hive execution
engine from within the Hive script.

See this page for details
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
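A one-line sketch of that suggestion (hive.execution.engine is the standard
property; "mr" selects classic MapReduce):

SET hive.execution.engine=mr;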

To answer your question, though: for jobs that have stopped running you can
look at the YARN job logs
(https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs),
or at the scheduler page on the ResourceManager. The scheduler page shows
running jobs and how many containers they are using.
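For example (the application ID below is a placeholder; substitute your own,
e.g. application_1443279785342_0017 for the job quoted later in this thread):

yarn logs -applicationId application_1443279785342_0017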

I'm not familiar with the MapR management UIs, but they should have a page
that shows running jobs and lets you drill down to see tasks/containers.

Hope this helps

Sent from my iPhone

On Sep 29, 2015, at 9:39 PM, Yogesh Keshetty
<yogesh.keshe...@outlook.com> wrote:
Hi,

I have a quick question about Hive generic UDFs. We are trying to do some CRUD
operations on HBase tables from a Hive generic UDF. The issue is that up to
Hive 0.13 this would generate a MapReduce task whose execution status we could
track. Since we migrated to Hive 1.0 it doesn't show any status; it is
probably streaming over the data instead. How can we tell whether it is using
multiple mappers for the job?

I thought this process would be pretty fast, but it is taking far longer than
we estimated: for 11.2 million records it has been running for more than 8
hours and is still in progress.

Use Case:

Let us say my table name is "MemberTable". The generic UDF is "membership",
which accepts n columns as parameters. Inside the UDF we run some internal
algorithm and insert the values into multiple HBase tables.

Sample Query:

CREATE TEMPORARY FUNCTION membership AS 'com.fishbowl.udf.membership';

SELECT membership(c1,c2,c3,c4,c5,c6,c7) FROM MemberTable;


Cluster info:
4-node cluster (32 GB each)
Hive version: 1.0
HBase version: 0.98.12
Distro: MapR


Thanks in advance!

PS: This is really urgent, I hope someone can help us asap.

Thank you,
Yogesh




RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Yogesh Keshetty
I believe it's not a classpath issue; for a single task / in streaming mode
it works fine.

Sent from Outlook




On Wed, Sep 30, 2015 at 1:58 PM -0700, "Ryan Harris" 
<ryan.har...@zionsbancorp.com> wrote:
Are all tasks failing with the same error message?

based on this:
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hbase.client.HTable
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

I'd guess that there may be some classpath issue on your datanodes?
I don't have as much experience troubleshooting custom UDFs, hopefully someone 
else will have better insights for you there.
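
One quick sanity check, as a sketch (LIST JARS is standard HiveQL; it shows
which jars have been added to the current session, though not what is on the
task-side classpath):

LIST JARS;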


From: Yogesh Keshetty [mailto:yogesh.keshe...@outlook.com]
Sent: Wednesday, September 30, 2015 2:48 PM
To: Hive community
Subject: RE: Hive Generic UDF invoking Hbase

Jason and Ryan,

Thanks for the solutions. It's now launching in MapReduce mode. However, now
that the UDF executes in parallel we are facing another issue. Inside the
generic UDF we process the records and store them in HBase record by record,
and the job is getting killed. I am assuming this is because all the tasks
are trying to access the same HBase table in parallel? It was working with
just streaming. Is there any setting that should be enabled?

Please find the stack trace below.

Error during job, obtaining debugging information...
Examining task ID: task_1443279785342_0017_m_00 (and more) from job 
job_1443279785342_0017

Task with the most failures(4):
-
Task ID:
  task_1443279785342_0017_m_01

-
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: com.ga.fishbowl.CustomerMatchingPayment_Test
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:286)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:263)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:478)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:471)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:648)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:172)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: com.ga.fishbowl.CustomerMatchingPayment_Test
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1025)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:933)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:947)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:390)
... 13 more
Caused by: java.lang.IllegalArgumentException: Unable to create serializer
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer"
for class: com.ga.fishbowl.CustomerMatchingPayment_Test
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:26)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.newDefaultSerializer(Kryo.java:343)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:336)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.registerImplicit(DefaultClassResolver.java:56)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:476)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:148)
at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
... 43 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:32)
... 52 more
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hbase/client/HTable;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
at java.lang.Class.getDeclaredFields(Class.java:1811)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
... 56 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.HTable
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 61 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask



RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Ryan Harris
Without seeing the code I really can't help.
Have you written other functioning UDFs? Are you aware of the requirements?
https://cwiki.apache.org/confluence/display/Hive/HivePlugins



Re: Hive Generic UDF invoking Hbase

2015-09-30 Thread Jason Dere
So your custom UDF is using org.apache.hadoop.hbase.client.HTable?

How do you resolve your UDF JAR (and this class) on the Hive client - are you
doing ADD JAR, or are your UDF JARs and HBase JARs in your Hive class path?



RE: Hive Generic UDF invoking Hbase

2015-09-30 Thread Yogesh Keshetty

Ryan - yes, I have written UDFs and generic UDFs before, but this is the first
time I have written a UDF that calls HBase tables.

Jason - yes, in my generic UDF I am using org.apache.hadoop.hbase.client.HTable.
On the Hive side we set the auxiliary jars property to add the HBase-related
jars, and the respective HBase jars are already set in the classpath.

Re: Hive Generic UDF invoking Hbase

2015-09-30 Thread Jason Dere
Not totally familiar with the aux jars property... does it make sure the JAR
is shipped as part of the MR job? If it does not, you could try adding the
necessary jars using ADD JAR to see if that is the issue.
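
A sketch of that suggestion from the Hive CLI (ADD JAR is standard HiveQL; the
jar names and paths below are placeholders, not the poster's actual ones):

ADD JAR /opt/hbase/lib/hbase-client-0.98.12-hadoop2.jar;
ADD JAR /opt/hbase/lib/hbase-common-0.98.12-hadoop2.jar;
ADD JAR /path/to/membership-udf.jar;
CREATE TEMPORARY FUNCTION membership AS 'com.fishbowl.udf.membership';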





