The backport would be done under HBASE-14160. FYI
On Sun, Mar 13, 2016 at 4:14 PM, Benjamin Kim <bbuil...@gmail.com> wrote:

> Ted,
>
> Is there anything in the works, or are there tasks already filed to do the
> back-porting?
>
> Just curious.
>
> Thanks,
> Ben
>
> On Mar 13, 2016, at 3:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> The class HFileWriterImpl (in a standalone file) is only present in the
> master branch. It is not in branch-1.
>
> compressionByName() resides in a class annotated @InterfaceAudience.Private,
> which was moved in the master branch.
>
> So it looks like there is some work to be done for backporting to branch-1 :-)
>
> On Sun, Mar 13, 2016 at 1:35 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Ted,
>>
>> I did as you said, but it looks like HBaseContext relies on some
>> differences in HBase itself:
>>
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:30: error: object HFileWriterImpl is not a member of package org.apache.hadoop.hbase.io.hfile
>> [ERROR] import org.apache.hadoop.hbase.io.hfile.{CacheConfig, HFileContextBuilder, HFileWriterImpl}
>> [ERROR] ^
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:627: error: not found: value HFileWriterImpl
>> [ERROR] val hfileCompression = HFileWriterImpl
>> [ERROR] ^
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:750: error: not found: value HFileWriterImpl
>> [ERROR] val defaultCompression = HFileWriterImpl
>> [ERROR] ^
>> [ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:898: error: value COMPARATOR is not a member of object org.apache.hadoop.hbase.CellComparator
>> [ERROR] .withComparator(CellComparator.COMPARATOR).withFileContext(hFileContext)
>>
>> So… back to my original question… do you know when these
>> incompatibilities were introduced? If so, I can pull the version from that
>> point in time and try again.
>>
>> Thanks,
>> Ben
>>
>> On Mar 13, 2016, at 12:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Benjamin:
>> Since hbase-spark is in its own module, you can pull the whole
>> hbase-spark subtree into the hbase 1.0 root dir and add the following to
>> the root pom.xml:
>> <module>hbase-spark</module>
>>
>> Then you would be able to build the module yourself.
>>
>> The hbase-spark module uses APIs which are compatible with hbase 1.0.
>>
>> Cheers
>>
>> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Hi Ted,
>>>
>>> I see that you're working on the hbase-spark module for hbase. I
>>> recently packaged the SparkOnHBase project and gave it a test run. It
>>> works like a charm on CDH 5.4 and 5.5. All I had to do was add
>>> /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the
>>> classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with
>>> "--jars /path/to/spark-hbase-0.0.2-clabs.jar" as an argument and used the
>>> easy-to-use HBaseContext for HBase operations. Now, I want to use the
>>> latest in DataFrames. Since the new functionality is only in the
>>> hbase-spark module, I want to know how to get it and package it for CDH
>>> 5.5, which still uses HBase 1.0.0. Can you tell me what version of hbase
>>> master is still backwards compatible?
>>>
>>> By the way, we are using Spark 1.6 if it matters.
>>>
>>> Thanks,
>>> Ben
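For anyone trying the approach described above (copying the hbase-spark subtree into an hbase 1.0 checkout and registering it in the root pom.xml), a rough sketch of the build follows; the module placement and the maven flags are illustrative, not a verified recipe:

    <!-- root pom.xml of the hbase 1.0 checkout: register the copied module -->
    <modules>
      <!-- ... existing modules ... -->
      <module>hbase-spark</module>
    </modules>

    # build just hbase-spark plus the modules it depends on
    mvn clean install -DskipTests -pl hbase-spark -am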
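And for reference, a minimal sketch of an HBaseContext operation against the hbase-spark module API as it exists in master (method signatures may differ slightly in the clabs SparkOnHBase build; the table name "t1" and column family "cf" are placeholders):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.util.Bytes

    // sc is the SparkContext provided by spark-shell
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // write a small RDD of (rowKey, value) pairs into table "t1", column family "cf"
    val rdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))
    hbaseContext.bulkPut[(String, String)](rdd, TableName.valueOf("t1"), {
      case (rowKey, value) =>
        val put = new Put(Bytes.toBytes(rowKey))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c"), Bytes.toBytes(value))
        put
    })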
>>>
>>> On Feb 10, 2016, at 2:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> Have you tried adding the hbase client jars to spark.executor.extraClassPath?
>>>
>>> Cheers
>>>
>>> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>>
>>>> + Spark-Dev
>>>>
>>>> For a Spark job on YARN accessing an hbase table, I added all the hbase
>>>> client jars to spark.yarn.dist.files. When launching the container (i.e.
>>>> the executor), the NodeManager does localization and brings all the
>>>> hbase-client jars into the executor CWD, but the executor tasks still fail
>>>> with a ClassNotFoundException for the hbase client classes. When I checked
>>>> launch_container.sh, the classpath does not have $PWD/*, so all the hbase
>>>> client jars are ignored.
>>>>
>>>> Is spark.yarn.dist.files not meant for adding jars to the executor
>>>> classpath?
>>>>
>>>> Thanks,
>>>> Prabhu Joseph
>>>>
>>>> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> When I do a count on an HBase table from the Spark shell running in
>>>>> yarn-client mode, the job fails at count().
>>>>>
>>>>> MASTER=yarn-client ./spark-shell
>>>>>
>>>>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
>>>>> import org.apache.hadoop.hbase.client.HBaseAdmin
>>>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>>>>
>>>>> val conf = HBaseConfiguration.create()
>>>>> conf.set(TableInputFormat.INPUT_TABLE, "spark")
>>>>>
>>>>> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>>>>>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>>>>   classOf[org.apache.hadoop.hbase.client.Result])
>>>>> hBaseRDD.count()
>>>>>
>>>>> The tasks throw the exception below; the actual exception is swallowed
>>>>> because of a JDK bug (JDK-7172206). After installing the hbase client on
>>>>> all NodeManager machines, the Spark job ran fine, so I confirmed that the
>>>>> issue is with the executor classpath.
>>>>>
>>>>> But I am looking for some other way of including the hbase jars in the
>>>>> Spark executor classpath instead of installing the hbase client on all NM
>>>>> machines. I tried adding all the hbase jars to spark.yarn.dist.files; the
>>>>> NM logs show that it localized all the hbase jars, but the job still
>>>>> fails. I also tried spark.executor.extraClassPath; the job still fails.
>>>>>
>>>>> Is there any way we can access hbase from the executor without
>>>>> installing hbase-client on all machines?
>>>>>
>>>>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, prabhuFS1): *java.lang.IllegalStateException: unread block data*
>>>>>         at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>>>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> Thanks,
>>>>> Prabhu Joseph
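For the executor classpath question above, one approach that may be worth trying (a sketch, not verified on this cluster; the jar paths are placeholders) is to hand the hbase client jars to spark-shell via --jars. On YARN this ships the jars and adds them to the driver and executor classpaths, whereas spark.yarn.dist.files by itself only localizes the files into the container working directory:

    # collect the hbase client jars into a comma-separated list for --jars
    HBASE_JARS=$(ls /opt/hbase/lib/hbase-*.jar /opt/hbase/lib/htrace-core-*.jar | tr '\n' ',' | sed 's/,$//')

    MASTER=yarn-client ./spark-shell --jars "$HBASE_JARS"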