Re: PySpark saveAsTextFile gzip

2015-01-15 Thread Akhil Das
You can use saveAsNewAPIHadoopFile (http://spark.apache.org/docs/1.1.0/api/python/pyspark.rdd.RDD-class.html#saveAsNewAPIHadoopFile) to compress your output. Here is some sample code that uses the API: https://github.com/ScrapCodes/spark-1/blob/master/python/pyspark/tests.py#L1225
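
For reference, a rough sketch of how that call might look for gzip-compressed text output. This is not taken from the linked test file. The helper names (gzip_output_conf, save_gzipped_text) are made up for illustration, and the configuration keys are the Hadoop 2 property names, so an older Hadoop build may need the mapred.output.compress / mapred.output.compression.codec equivalents instead. It also assumes that a None key ends up as a NullWritable, so that TextOutputFormat writes only the line.

```python
# Hadoop class names passed to PySpark as plain strings.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"
TEXT_OUTPUT_FORMAT = "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"


def gzip_output_conf():
    """Hadoop job settings that turn on gzip compression of the output files.

    Assumption: Hadoop 2 property names; older builds use the mapred.* names.
    """
    return {
        "mapreduce.output.fileoutputformat.compress": "true",
        "mapreduce.output.fileoutputformat.compress.codec": GZIP_CODEC,
    }


def save_gzipped_text(rdd, path):
    """Write an RDD of strings to `path` as gzip-compressed text (hypothetical helper)."""
    # saveAsNewAPIHadoopFile needs key/value pairs; a None key is assumed
    # to be written as NullWritable, leaving only the line in the output.
    pairs = rdd.map(lambda line: (None, line))
    pairs.saveAsNewAPIHadoopFile(
        path,
        TEXT_OUTPUT_FORMAT,
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.apache.hadoop.io.Text",
        conf=gzip_output_conf(),
    )
```

With an existing SparkContext sc, this would be called as save_gzipped_text(sc.parallelize(["a", "b"]), "s3n://my-bucket/output"), where the bucket name is a placeholder. Each part file should then come out with a .gz extension.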

PySpark saveAsTextFile gzip

2015-01-15 Thread Tom Seddon
Hi, I've searched but can't seem to find a PySpark example. How do I write compressed text file output to S3 using PySpark saveAsTextFile? Thanks, Tom