Re: How to connect HBase and Spark using Python?

2016-07-25 Thread Def_Os
Solved, see: http://stackoverflow.com/questions/38470114/how-to-connect-hbase-and-spark-using-python/38575095
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-connect-HBase-and-Spark-using-Python-tp27372p27409.html
Sent from the Apache Spark User List

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Def_Os
So it appears it should be possible to use HBase's new hbase-spark module if you follow this pattern: https://hbase.apache.org/book.html#_sparksql_dataframes

Unfortunately, when I run my example from PySpark, I get the following exception:
> py4j.protocol.Py4JJavaError: An error occurred while
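A minimal sketch of that pattern translated to PySpark, assuming the hbase-spark jar (and HBase config) is on the driver and executor classpath, and that the data source accepts the same `catalog` option and `org.apache.hadoop.hbase.spark` format name as the Scala example in the HBase reference guide. The helper names, table name and columns below are hypothetical. Note that this thread reports a `Py4JJavaError` with this approach before it was solved, so treat the read call as untested against any particular HBase/Spark build.

```python
import json


def build_hbase_catalog(namespace, table, rowkey, columns):
    """Build a catalog JSON string shaped like the HBase guide's example.

    `columns` maps a DataFrame column name to (column_family, qualifier, type).
    The row key column uses the special family "rowkey" and the qualifier
    given as `rowkey` (hypothetical helper, not part of hbase-spark).
    """
    cols = {
        name: {"cf": cf, "col": qualifier, "type": col_type}
        for name, (cf, qualifier, col_type) in columns.items()
    }
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": cols,
    })


def read_hbase_table(sql_ctx, catalog):
    """Load an HBase table as a DataFrame (assumes hbase-spark is available).

    `sql_ctx` is a SQLContext or SparkSession; both expose `.read`.
    """
    return (sql_ctx.read
            .options(catalog=catalog)
            .format("org.apache.hadoop.hbase.spark")
            .load())


# Example catalog mirroring the guide's "table1" layout.
catalog = build_hbase_catalog(
    "default", "table1", "key",
    {"col0": ("rowkey", "key", "string"),
     "col1": ("cf1", "col1", "boolean")},
)
```

Once loaded, the DataFrame can be registered as a temporary table and queried with Spark SQL as usual.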

How to connect HBase and Spark using Python?

2016-07-20 Thread Def_Os
I'd like to know whether there's any way to query HBase with Spark SQL via the PySpark interface. See my question on SO: http://stackoverflow.com/questions/38470114/how-to-connect-hbase-and-spark-using-python The new HBase-Spark module in HBase, which introduces the HBaseContext/JavaHBaseContext,

Pandas timezone problems

2015-05-21 Thread Def_Os
After deserialization, something seems to be wrong with my pandas DataFrames. It looks like the timezone information is lost, and subsequent errors ensue. Serializing and deserializing a timezone-aware DataFrame tests just fine, so it must be Spark that somehow changes the data. My program runs
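A minimal sketch of the local check described above (a pickle round trip outside Spark), plus one possible workaround that is an assumption, not something from this thread: convert tz-aware columns to naive UTC before the data passes through Spark, keep the timezone alongside, and re-localize afterwards. `strip_tz`, `restore_tz` and the `ts` column are hypothetical names.

```python
import pickle

import pandas as pd


def roundtrip(df: pd.DataFrame) -> pd.DataFrame:
    """Serialize and deserialize with pickle, independently of Spark."""
    return pickle.loads(pickle.dumps(df))


def strip_tz(df: pd.DataFrame, col: str):
    """Hypothetical workaround: store `col` as naive UTC, return the original tz."""
    tz = df[col].dt.tz
    out = df.copy()
    out[col] = out[col].dt.tz_convert("UTC").dt.tz_localize(None)
    return out, tz


def restore_tz(df: pd.DataFrame, col: str, tz) -> pd.DataFrame:
    """Re-attach the timezone after the data comes back from Spark."""
    out = df.copy()
    out[col] = out[col].dt.tz_localize("UTC").dt.tz_convert(tz)
    return out


df = pd.DataFrame({
    "ts": pd.date_range("2015-05-21", periods=3, freq="D", tz="Europe/Amsterdam"),
    "v": [1, 2, 3],
})
naive, tz = strip_tz(df, "ts")
restored = restore_tz(naive, "ts", tz)
```

Converting to UTC first (rather than dropping the tz directly) keeps the instants unambiguous across DST transitions.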

How does custom partitioning in PySpark work?

2014-10-29 Thread Def_Os
I want several RDDs (which are the result of my program's operations on existing RDDs) to match the partitioning of an existing RDD, since they will be joined together in the end. Do I understand correctly that I would benefit from using a custom partitioner that would be applied to all RDDs?
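In PySpark, a custom partitioner is supplied as the `partitionFunc` argument of `RDD.partitionBy(numPartitions, partitionFunc)`, and each record goes to bucket `partitionFunc(key) % numPartitions`. The pure-Python sketch below only simulates that bucketing (no Spark needed) to show why applying the same function and the same partition count to every RDD puts equal keys at the same partition index. `first_letter_partitioner` and the sample data are hypothetical.

```python
def first_letter_partitioner(key: str) -> int:
    """Hypothetical custom partitionFunc: group string keys by first character."""
    return ord(key[0]) if key else 0


def simulate_partition_by(pairs, num_partitions, partition_func):
    """Mimic PySpark's bucketing rule: bucket = partition_func(k) % num_partitions."""
    buckets = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        buckets[partition_func(k) % num_partitions].append((k, v))
    return buckets


def partition_index(buckets, key):
    """Return the index of the bucket holding `key`."""
    return next(i for i, b in enumerate(buckets) if any(k == key for k, _ in b))


a = simulate_partition_by([("apple", 1), ("banana", 2), ("avocado", 3)], 4,
                          first_letter_partitioner)
b = simulate_partition_by([("apple", "x"), ("banana", "y")], 4,
                          first_letter_partitioner)
```

With Spark available, the equivalent is `rdd.partitionBy(4, first_letter_partitioner)` on each RDD, reusing the same function object and partition count. Whether a later `join` then avoids a shuffle depends on how your PySpark version implements `join`, so it is worth checking the job's stages in the Spark UI.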