Using a compression codec in saveAsSequenceFile in Pyspark (Python API)

2014-11-13 Thread sahanbull
Hi, I am trying to save an RDD to an S3 bucket using the RDD.saveAsSequenceFile(self, path, CompressionCodec) function in Python. I need to save the RDD with GZIP compression. Can anyone tell me how to pass the gzip codec class as a parameter to the function? I tried

Re: Using a compression codec in saveAsSequenceFile in Pyspark (Python API)

2014-11-13 Thread Davies Liu
You could use the following as compressionCodecClass:

DEFLATE  org.apache.hadoop.io.compress.DefaultCodec
gzip     org.apache.hadoop.io.compress.GzipCodec
bzip2    org.apache.hadoop.io.compress.BZip2Codec
LZO      com.hadoop.compression.lzo.LzopCodec

for gzip,
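
A minimal sketch of how one of the class names listed in this reply might be passed, assuming the codec is given to saveAsSequenceFile as a plain string through its compressionCodecClass keyword. The helper save_compressed and the CODEC_CLASSES dictionary are hypothetical names made up for illustration, and the bucket path in the usage comment is a placeholder, not something from the thread.

```python
# Codec short names mapped to the Hadoop class names listed in the reply.
CODEC_CLASSES = {
    "deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "lzo": "com.hadoop.compression.lzo.LzopCodec",
}


def save_compressed(rdd, path, codec="gzip"):
    """Hypothetical helper: save a key/value RDD as a compressed SequenceFile.

    The codec is passed as a fully qualified class name string, which is
    what the compressionCodecClass argument is assumed to expect.
    """
    codec_class = CODEC_CLASSES[codec.lower()]
    rdd.saveAsSequenceFile(path, compressionCodecClass=codec_class)
    return codec_class


# Usage with a real SparkContext (not executed here; the path is a placeholder):
#
#   from pyspark import SparkContext
#   sc = SparkContext(appName="gzip-sequencefile")
#   pairs = sc.parallelize([("a", 1), ("b", 2)])
#   save_compressed(pairs, "s3n://some-bucket/some/output", codec="gzip")
```

The LZO class is not part of stock Hadoop, so that entry would only work on a cluster where the hadoop-lzo library is installed.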