Re: pyspark - gzip output compression

2015-02-05 Thread Kane Kim
I'm getting the error "SequenceFile doesn't work with GzipCodec without native-hadoop code!" Where do I get those libs, and where do I put them in Spark? Also, can I save a plain text file (as with saveAsTextFile) as gzip? Thanks. On Wed, Feb 4, 2015 at 11:10 PM, Kane Kim kane.ist...@gmail.com wrote: How to save

Re: pyspark - gzip output compression

2015-02-05 Thread Sean Owen
No, you can compress SequenceFile with gzip. If you are reading the output outside Hadoop, then SequenceFile may not be a great choice. You can use the gzip codec with TextOutputFormat if you need to. On Feb 5, 2015 8:28 AM, Kane Kim kane.ist...@gmail.com wrote: I'm getting SequenceFile doesn't work with
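
For reference, a minimal PySpark sketch of the two options mentioned above. It assumes a PySpark version whose saveAsTextFile and saveAsSequenceFile accept the compressionCodecClass argument. The helper names, the demo function, and the output path are hypothetical and only for illustration. The SequenceFile variant may still hit the native-hadoop error reported earlier in this thread if the native libraries are not available.

```python
# Fully qualified Hadoop codec class name for gzip.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def save_text_gzip(rdd, path):
    """Write the RDD as gzip-compressed text part files under `path`."""
    rdd.saveAsTextFile(path, compressionCodecClass=GZIP_CODEC)


def save_sequence_gzip(pair_rdd, path):
    """Write a key/value RDD as a gzip-compressed SequenceFile under `path`."""
    pair_rdd.saveAsSequenceFile(path, compressionCodecClass=GZIP_CODEC)


def demo():
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "gzip-output-demo")  # hypothetical app name
    try:
        # Hypothetical output directory; it must not already exist.
        save_text_gzip(sc.parallelize(["a", "b", "c"]), "/tmp/gzip-text-out")
    finally:
        sc.stop()
```

Calling demo() on a machine with Spark installed should leave gzip-compressed part files in the output directory, which standard gzip tools can read outside Hadoop.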

pyspark - gzip output compression

2015-02-04 Thread Kane Kim
How do I save an RDD with gzip compression? Thanks.