Re: Pyspark Hbase scan.
Sorry, forgot to attach the traceback.

Regards,
Rene Castberg
$ /hadoop-dist/spark-1.2.1-bin-hadoop2.4/bin/spark-submit \
    --driver-class-path /usr/hdp/current/share/lzo/0.6.0/lib/hadoop-lzo-0.6.0.jar:/home/recast/spark_hbase/target/scala-2.10/spark_hbase-assembly-1.0.jar \
    --jars /hadoop-dist/spark-1.2.1-bin-hadoop2.4/lib/spark-examples-1.2.1-hadoop2.4.0.jar \
    --driver-library-path /usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/ \
    AIS_count_msb_hbase.py
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2.7.9 (default, Feb 25 2015, 14:55:10)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
/hadoop-dist/Python/lib/python2.7/site-packages/setuptools-12.3-py2.7.egg/pkg_resources/__init__.py:1224: UserWarning: /tmp/python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
Reading config file for : smalldata01.hdp
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/recast/spark_hbase/target/scala-2.10/spark_hbase-assembly-1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop-dist/spark-1.2.1-bin-hadoop2.4/lib/spark-assembly-1.2.1-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/03/13 06:10:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/13 06:10:34 WARN YarnClientSchedulerBackend: NOTE: SPARK_WORKER_INSTANCES is deprecated. Use
Re: Pyspark Hbase scan.
Hi,

I have now successfully managed to test this in a local Spark session, but I am having a huge problem getting this to work with the Hortonworks technical preview. I think there is an incompatibility with the way YARN has been compiled. After changing the HBase version and adding:

resolvers += "Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/"

I get the attached traceback. Any help in how to compile this jar so that it works would be greatly appreciated.

Regards,
Rene Castberg
Pyspark Hbase scan.
Hi,

I am trying to do an HBase scan and read it into a Spark RDD using PySpark. I have successfully written data to HBase from PySpark, and have been able to read a full table from HBase using the Python example code. Unfortunately, I am unable to find any example code for doing an HBase scan and reading it into a Spark RDD from PySpark.

I have found a Scala example: http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark

But I can't find anything on how to do this from Python. Can anybody shed some light on how (and if) this can be done?

Regards,
Rene Castberg
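For reference, the full-table read mentioned above presumably follows the pattern of Spark's bundled hbase_inputformat.py example. A minimal sketch of that pattern, assuming the spark-examples jar (which supplies the Python converter classes) is passed via --jars, and with a placeholder ZooKeeper quorum and table name:

from pyspark import SparkContext

sc = SparkContext(appName="HBaseFullTableRead")

# Placeholder connection settings; adjust for the actual cluster.
conf = {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.mapreduce.inputtable": "test_table",
}

# Converter classes shipped in the spark-examples jar (supplied via --jars).
key_conv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
value_conv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

# Reads the whole table as (row key, result) string pairs.
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=key_conv,
    valueConverter=value_conv,
    conf=conf)

print(hbase_rdd.count())

The script would be submitted the same way as the spark-submit command shown earlier in the thread, with the spark-examples jar on the classpath.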
Re: Pyspark Hbase scan.
Hi,

In fact, this pull request, https://github.com/apache/spark/pull/3920, is for doing an HBase scan; however, it is not merged yet. You can also take a look at the example code at http://spark-packages.org/package/20, which uses Scala and Python to read data from HBase. Hope this can be helpful.

Cheers,
Gen
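Until that pull request is merged, one possible workaround (not confirmed by either message above, just a sketch under assumptions) is to restrict the scan through TableInputFormat's standard job-configuration keys, hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop, reusing the converter classes from the spark-examples jar. The quorum, table name, and row keys below are placeholders:

from pyspark import SparkContext

sc = SparkContext(appName="HBaseRangeScan")

# hbase.mapreduce.scan.row.start/.stop are standard TableInputFormat keys;
# the quorum, table name, and row keys are placeholders.
conf = {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.mapreduce.inputtable": "test_table",
    "hbase.mapreduce.scan.row.start": "row0100",
    "hbase.mapreduce.scan.row.stop": "row0200",
}

scan_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf)

# Only rows with keys in [row0100, row0200) should be returned.
print(scan_rdd.take(5))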