[ https://issues.apache.org/jira/browse/SPARK-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian ONeill updated SPARK-11406:
---------------------------------
    Comment: was deleted

(was: https://github.com/apache/spark/pull/9360)

> utf-8 decode issue w/ kinesis
> -----------------------------
>
>                 Key: SPARK-11406
>                 URL: https://issues.apache.org/jira/browse/SPARK-11406
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.1
>         Environment: osx, spark:master
>            Reporter: Brian ONeill
>            Priority: Minor
>
> If a malformed (non-UTF-8) message arrives over Kinesis, the Spark job fails with:
> Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>     process()
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2347, in pipeline_func
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2347, in pipeline_func
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 317, in func
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1777, in combineLocally
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 236, in mergeValues
>     for k, v in iterator:
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 88, in <lambda>
>   File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 31, in utf8_decoder
>     return s.decode('utf-8')
>   File "/Users/brianoneill/venv/monetate/lib/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid continuation byte
> This kills the job.  Should there be a more elegant way to handle this case?
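
One possible way to handle this, sketched here as an illustration and not as the fix adopted for this issue: the traceback shows the failure in s.decode('utf-8'), which uses the default "strict" error handler. Python's decode accepts an error-handler argument, so a more tolerant decoder could substitute U+FFFD for undecodable bytes (or drop them) instead of raising. The name lenient_utf8_decoder and its errors parameter are hypothetical, and the sample bytes are made up for the example. Whether such a function can be passed to the Kinesis stream in place of utf8_decoder depends on the PySpark version in use and is an assumption here.

```python
def lenient_utf8_decoder(s, errors="replace"):
    """Decode a byte string as UTF-8 without raising on malformed input.

    Hypothetical helper, not part of PySpark.
    errors="replace" substitutes U+FFFD for each undecodable sequence;
    errors="ignore" silently drops the undecodable bytes.
    """
    if s is None:
        return None
    return s.decode("utf-8", errors)


# Made-up payload: 0xE2 opens a three-byte sequence but is followed by
# 0x28, which is not a continuation byte, so strict decoding would raise.
bad_record = b"\xe2\x28\xa1"

replaced = lenient_utf8_decoder(bad_record)
dropped = lenient_utf8_decoder(bad_record, errors="ignore")
```

The trade-off is that bad records are silently altered instead of surfaced, so a job that needs to know about corrupt input would probably want to log or count them as well.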



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
