The backport would be done under HBASE-14160. FYI
On Sun, Mar 13, 2016 at 4:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Ted,
>
> Is there anything in the works, or are there tasks already filed to do the
> back-porting?
>
> Just curious.
>
> Thanks,
> Ben
>
> On Mar 13, 2016, at 3:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> The class HFileWriterImpl (in a standalone file) is only present in the
> master branch. It is not in branch-1.
>
> compressionByName() resides in a class annotated @InterfaceAudience.Private,
> which was moved in the master branch.
>
> So it looks like there is some work to be done for backporting to branch-1 :-)
>
> On Sun, Mar 13, 2016 at 1:35 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Ted,
>>
>> I did as you said, but it looks like HBaseContext relies on some
>> differences in HBase itself:
>>
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:30: error: object HFileWriterImpl is not a member of package org.apache.hadoop.hbase.io.hfile
>> [ERROR] import org.apache.hadoop.hbase.io.hfile.{CacheConfig, HFileContextBuilder, HFileWriterImpl}
>> [ERROR] ^
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:627: error: not found: value HFileWriterImpl
>> [ERROR] val hfileCompression = HFileWriterImpl
>> [ERROR] ^
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:750: error: not found: value HFileWriterImpl
>> [ERROR] val defaultCompression = HFileWriterImpl
>> [ERROR] ^
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:898: error: value COMPARATOR is not a member of object org.apache.hadoop.hbase.CellComparator
>> [ERROR] .withComparator(CellComparator.COMPARATOR).withFileContext(hFileContext)
>>
>> So… back to my original question… do you know when these
>> incompatibilities were introduced? If so, I can pull the version from that
>> point in time and try again.
>>
>> Thanks,
>> Ben
>>
>> On Mar 13, 2016, at 12:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Benjamin:
>> Since hbase-spark is in its own module, you can pull the whole
>> hbase-spark subtree into the hbase 1.0 root dir and add the following to
>> the root pom.xml:
>> <module>hbase-spark</module>
>>
>> Then you would be able to build the module yourself.
>>
>> The hbase-spark module uses APIs which are compatible with hbase 1.0.
>>
>> Cheers
>>
>> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Hi Ted,
>>>
>>> I see that you're working on the hbase-spark module for hbase. I
>>> recently packaged the SparkOnHBase project and gave it a test run. It
>>> works like a charm on CDH 5.4 and 5.5. All I had to do was add
>>> /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the
>>> classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with
>>> "--jars /path/to/spark-hbase-0.0.2-clabs.jar" as an argument and used the
>>> easy-to-use HBaseContext for HBase operations. Now, I want to use the
>>> latest in DataFrames. Since the new functionality is only in the
>>> hbase-spark module, I want to know how to get it and package it for CDH
>>> 5.5, which still uses HBase 1.0.0. Can you tell me what version of hbase
>>> master is still backwards compatible?
>>>
>>> By the way, we are using Spark 1.6 if it matters.
>>>
>>> Thanks,
>>> Ben
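For anyone trying the approach described above (copying the hbase-spark subtree into an hbase 1.0 checkout and registering it in the root pom.xml), a rough sketch of the build follows; the module placement and the maven flags are illustrative, not a verified recipe:

    <!-- root pom.xml of the hbase 1.0 checkout: register the copied module -->
    <modules>
      <!-- ... existing modules ... -->
      <module>hbase-spark</module>
    </modules>

    # build just hbase-spark plus the modules it depends on
    mvn clean install -DskipTests -pl hbase-spark -am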
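And for reference, a minimal sketch of an HBaseContext operation against the hbase-spark module API as it exists in master (method signatures may differ slightly in the clabs SparkOnHBase build; the table name "t1" and column family "cf" are placeholders):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.util.Bytes

    // sc is the SparkContext provided by spark-shell
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // write a small RDD of (rowKey, value) pairs into table "t1", column family "cf"
    val rdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))
    hbaseContext.bulkPut[(String, String)](rdd, TableName.valueOf("t1"), {
      case (rowKey, value) =>
        val put = new Put(Bytes.toBytes(rowKey))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c"), Bytes.toBytes(value))
        put
    })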
>>>
>>> On Feb 10, 2016, at 2:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> Have you tried adding the hbase client jars to spark.executor.extraClassPath?
>>>
>>> Cheers
>>>
>>> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>>
>>>> + Spark-Dev
>>>>
>>>> For a Spark job on YARN accessing an hbase table, I added all the hbase
>>>> client jars to spark.yarn.dist.files. When launching the container (i.e.
>>>> the executor), the NodeManager does localization and brings all the
>>>> hbase-client jars into the executor CWD, but the executor tasks still fail
>>>> with a ClassNotFoundException for the hbase client classes. When I checked
>>>> launch_container.sh, the classpath does not have $PWD/*, so all the hbase
>>>> client jars are ignored.
>>>>
>>>> Is spark.yarn.dist.files not meant for adding jars to the executor
>>>> classpath?
>>>>
>>>> Thanks,
>>>> Prabhu Joseph
>>>>
>>>> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> When I do a count on an HBase table from the Spark shell running in
>>>>> yarn-client mode, the job fails at count().
>>>>>
>>>>> MASTER=yarn-client ./spark-shell
>>>>>
>>>>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
>>>>> import org.apache.hadoop.hbase.client.HBaseAdmin
>>>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>>>>
>>>>> val conf = HBaseConfiguration.create()
>>>>> conf.set(TableInputFormat.INPUT_TABLE, "spark")
>>>>>
>>>>> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>>>>>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>>>>   classOf[org.apache.hadoop.hbase.client.Result])
>>>>> hBaseRDD.count()
>>>>>
>>>>> The tasks throw the exception below; the actual exception is swallowed
>>>>> because of a JDK bug (JDK-7172206). After installing the hbase client on
>>>>> all NodeManager machines, the Spark job ran fine, so I confirmed that the
>>>>> issue is with the executor classpath.
>>>>>
>>>>> But I am looking for some other way of including the hbase jars in the
>>>>> Spark executor classpath instead of installing the hbase client on all NM
>>>>> machines. I tried adding all the hbase jars to spark.yarn.dist.files; the
>>>>> NM logs show that it localized all the hbase jars, but the job still
>>>>> fails. I also tried spark.executor.extraClassPath; the job still fails.
>>>>>
>>>>> Is there any way we can access hbase from the executor without
>>>>> installing hbase-client on all machines?
>>>>>
>>>>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, prabhuFS1): *java.lang.IllegalStateException: unread block data*
>>>>>         at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>>>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Thanks,
>>>>> Prabhu Joseph
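For the executor classpath question above, one approach that may be worth trying (a sketch, not verified on this cluster; the jar paths are placeholders) is to hand the hbase client jars to spark-shell via --jars. On YARN this ships the jars and adds them to the driver and executor classpaths, whereas spark.yarn.dist.files by itself only localizes the files into the container working directory:

    # collect the hbase client jars into a comma-separated list for --jars
    HBASE_JARS=$(ls /opt/hbase/lib/hbase-*.jar /opt/hbase/lib/htrace-core-*.jar | tr '\n' ',' | sed 's/,$//')

    MASTER=yarn-client ./spark-shell --jars "$HBASE_JARS"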