Workarounds for accessing sequence file data via PySpark?

2014-07-23 Thread Gary Malouf
I am aware that today PySpark can not load sequence files directly. Are there work-arounds people are using (short of duplicating all the data to text files) for accessing this data?

Re: Workarounds for accessing sequence file data via PySpark?

2014-07-23 Thread Nick Pentreath
Load from sequenceFile for PySpark is in master and save is in this PR underway (https://github.com/apache/spark/pull/1338) I hope that Kan will have it ready to merge in time for 1.1 release window (it should be, the PR just needs a final review or two). In the meantime you can check out master