[ https://issues.apache.org/jira/browse/SPARK-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308128#comment-14308128 ]

Vladimir Vladimirov commented on SPARK-5633:
--------------------------------------------

Here is a workaround until the proposed functionality is accepted:
{code}
def saveAsTextFileCompressed(t, path,
                             codec="org.apache.hadoop.io.compress.GzipCodec"):
    def func(split, iterator):
        # Coerce every record to a UTF-8 encoded byte string before writing.
        for x in iterator:
            if not isinstance(x, basestring):
                x = unicode(x)
            if isinstance(x, unicode):
                x = x.encode("utf-8")
            yield x
    keyed = t.mapPartitionsWithIndex(func)
    keyed._bypass_serializer = True
    # Resolve the Hadoop codec class on the JVM side via the RDD's own
    # context and hand it to the JVM saveAsTextFile overload.
    codecClass = t.ctx._jvm.java.lang.Class.forName(codec)
    keyed._jrdd.map(t.ctx._jvm.BytesToString()).saveAsTextFile(path, codecClass)
{code}
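The per-record conversion inside {{func}} can be exercised without a Spark context; a minimal Python 3 rendering of the same logic (the helper name is illustrative, and {{str}} stands in for the Python 2 {{basestring}}/{{unicode}} pair used above):
{code}
def to_utf8_line(x):
    # Mirror the conversion in func: coerce any record to text,
    # then encode it as UTF-8 bytes for the bypassed serializer.
    if not isinstance(x, str):
        x = str(x)
    return x.encode("utf-8")
{code}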

> pyspark saveAsTextFile support for compression codec
> ----------------------------------------------------
>
>                 Key: SPARK-5633
>                 URL: https://issues.apache.org/jira/browse/SPARK-5633
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 1.2.0
>            Reporter: Vladimir Vladimirov
>            Priority: Minor
>
> The Scala and Java APIs allow passing a compression codec via 
> saveAsTextFile(path, codec).
> The PySpark saveAsTextFile API does not support passing a codec class.
> This story is about adding saveAsTextFile(path, codec) support to PySpark.
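
A hypothetical sketch of the desired PySpark call, mirroring the Scala signature (the keyword name is an assumption; the feature is not yet implemented):
{code}
rdd.saveAsTextFile("/out/path",
                   compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")
{code}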



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
