Hey all,

Just thought I'd share this with the list in case anyone else would
benefit. I'm currently working on a proper integration of PySpark and
DataStax's new Cassandra-Spark connector, but that's ongoing.

In the meantime, I've updated the cassandra_inputformat.py and
cassandra_outputformat.py examples that ship with Spark:
https://github.com/Parsely/pyspark-cassandra.

The new example shows reading from and writing to Cassandra, including
proper handling of CQL 3.1 collections (lists, sets and maps). I think it
also clarifies the format RDDs are required to be in to write data to Cassandra
<https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/python/pyspark_cassandra_hadoop_example.py#L83-L97>
and
provides a more general serializer
<https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/scala/SparkConverters.scala#L34-L88>
for writing Python structs (serialized via Py4J) to Cassandra.
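For anyone curious about the shape in question, here's a rough sketch (plain
Python, no Spark context needed) of the (key, value) pairs the output example
expects. The table and column names are made up for illustration; see the
linked example for the real ones:

```python
# Each RDD element written out via saveAsNewAPIHadoopDataset (as in
# cassandra_outputformat.py) is a (key, value) pair of dicts:
#   key   -> the CQL primary key column(s)
#   value -> the remaining columns, which may include CQL 3.1 collections
# Column names below are hypothetical.
rows = [
    (
        {"user_id": 1},                         # primary key column(s)
        {
            "fname": "Mike",                    # plain text column
            "emails": {"mike@example.com"},     # CQL set<text>
            "top_posts": ["a", "b"],            # CQL list<text>
            "attrs": {"city": "Toronto"},       # CQL map<text, text>
        },
    ),
]

# With a live SparkContext you would then do something like (untested sketch;
# converter class names elided -- see the repo for the actual ones):
# sc.parallelize(rows).saveAsNewAPIHadoopDataset(
#     conf=cassandra_conf,
#     keyConverter="...",
#     valueConverter="...")
```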

Comments or questions are welcome. I'll update the group again when we have
support for the DataStax connector.

-- 
Mike Sukmanowsky
Aspiring Digital Carpenter

*p*: +1 (416) 953-4248
*e*: mike.sukmanow...@gmail.com

facebook <http://facebook.com/mike.sukmanowsky> | twitter
<http://twitter.com/msukmanowsky> | LinkedIn
<http://www.linkedin.com/profile/view?id=10897143> | github
<https://github.com/msukmanowsky>
