Brian ONeill created SPARK-11406:
------------------------------------

             Summary: utf-8 decode issue w/ kinesis
                 Key: SPARK-11406
                 URL: https://issues.apache.org/jira/browse/SPARK-11406
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.5.1
         Environment: osx, spark:master
            Reporter: Brian ONeill

If we get a bad message over Kinesis (one whose payload is not valid UTF-8), the Spark job blows up with:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2347, in pipeline_func
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2347, in pipeline_func
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 317, in func
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1777, in combineLocally
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 236, in mergeValues
    for k, v in iterator:
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 88, in <lambda>
  File "/Users/brianoneill/git/spark/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 31, in utf8_decoder
    return s.decode('utf-8')
  File "/Users/brianoneill/venv/monetate/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid continuation byte

This kills the job. Should there be a more elegant way to handle this case?
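
One possible way to handle this, sketched here only as an illustration and not as the fix adopted by the project: use a decoder that tolerates invalid bytes instead of raising. The two function names below are hypothetical. The sketch assumes the decoder is a plain function applied to each record's raw bytes, as the `utf8_decoder` frame in the traceback suggests, and that `KinesisUtils.createStream` accepts a `decoder` argument in the Spark version in use; that should be checked against the installed PySpark.

```python
def lenient_utf8_decoder(s):
    """Decode bytes as UTF-8, substituting U+FFFD for bytes that cannot be decoded."""
    if s is None:
        return None
    # 'replace' is a standard codec error handler; it never raises on bad input.
    return s.decode('utf-8', 'replace')


def strict_or_none_utf8_decoder(s):
    """Decode bytes as UTF-8, returning None when the payload is not valid UTF-8."""
    if s is None:
        return None
    try:
        return s.decode('utf-8')
    except UnicodeDecodeError:
        # Signal a bad record to the caller instead of failing the whole task.
        return None


# Sample payloads for illustration only.
bad = b'\xe2\x28\xa1'                 # not valid UTF-8 (broken multi-byte sequence)
good = u'caf\xe9'.encode('utf-8')     # valid UTF-8

# Hypothetical wiring (other createStream arguments omitted; assumes a
# `decoder` keyword is supported):
#   stream = KinesisUtils.createStream(..., decoder=strict_or_none_utf8_decoder)
#   stream = stream.filter(lambda record: record is not None)
```

The lenient variant keeps every record but may hide corruption behind replacement characters. The strict-or-None variant makes bad records explicit so they can be filtered out or counted downstream.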