[ https://issues.apache.org/jira/browse/SPARK-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308128#comment-14308128 ]
Vladimir Vladimirov commented on SPARK-5633:
--------------------------------------------

Here is a workaround until the proposed functionality is accepted:

{code}
def saveAsTextFileCompressed(t, path, codec="org.apache.hadoop.io.compress.GzipCodec"):
    def func(split, iterator):
        # Normalize every record to a UTF-8 encoded byte string
        for x in iterator:
            if not isinstance(x, basestring):
                x = unicode(x)
            if isinstance(x, unicode):
                x = x.encode("utf-8")
            yield x

    keyed = t.mapPartitionsWithIndex(func)
    # Records are already raw bytes, so bypass the Python-side serializer
    keyed._bypass_serializer = True
    # Resolve the Hadoop compression codec class through the JVM gateway
    codecClass = SparkContext._jvm.java.lang.Class.forName(codec)
    keyed._jrdd.map(t.ctx._jvm.BytesToString()).saveAsTextFile(path, codecClass)
{code}

> pyspark saveAsTextFile support for compression codec
> ----------------------------------------------------
>
>                 Key: SPARK-5633
>                 URL: https://issues.apache.org/jira/browse/SPARK-5633
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 1.2.0
>            Reporter: Vladimir Vladimirov
>            Priority: Minor
>
> The Scala and Java APIs allow passing a compression codec via saveAsTextFile(path, codec).
> The PySpark saveAsTextFile API does not support passing a codec class.
> This story is about adding saveAsTextFile(path, codec) support to PySpark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
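The inner func in the workaround above reduces to a plain normalization step: coerce each record to text, then encode it as UTF-8 bytes so the JVM side can write it verbatim. A minimal sketch of just that step, runnable outside Spark (the normalize_records name is hypothetical; in Python 3 the single str type replaces the Python 2 basestring/unicode checks used above):

```python
def normalize_records(iterator):
    """Coerce each record to text, then yield it as UTF-8 encoded bytes."""
    for x in iterator:
        if not isinstance(x, str):
            x = str(x)  # non-string records (ints, floats, ...) become their text form
        yield x.encode("utf-8")  # raw bytes, ready to hand to a text writer

# Mixed record types all come out as UTF-8 byte strings
records = ["hello", 42, 3.14]
print(list(normalize_records(records)))  # [b'hello', b'42', b'3.14']
```

In the real workaround this normalization runs per partition via mapPartitionsWithIndex, so no per-record Python function-call overhead is added beyond the generator itself.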