GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17267
[SPARK-19926][PYSPARK] Make pyspark exception more readable

## What changes were proposed in this pull request?

Exceptions raised in PySpark are difficult to read because the analysis message is printed as a single escaped unicode literal. Before this PR:

```
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/root/dev/spark/dist/python/pyspark/sql/streaming.py", line 853, in start
    return self._sq(self._jwrite.start())
  File "/root/dev/spark/dist/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/root/dev/spark/dist/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;\nAggregate [window#17, word#5], [window#17 AS window#11, word#5, count(1) AS count#16L]\n+- Filter ((t#6 >= window#17.start) && (t#6 < window#17.end))\n +- Expand [ArrayBuffer(named_struct(start, ((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 30000000) + 0), end, (((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 30000000) + 0) + 30000000)), word#5, t#6-T30000ms), ArrayBuffer(named_struct(start, ((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 30000000) + 0), end, (((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 30000000) + 0) + 30000000)), word#5, t#6-T30000ms)], [window#17, word#5, t#6-T30000ms]\n +- EventTimeWatermark t#6: timestamp, interval 30 seconds\n +- Project [cast(word#0 as string) AS word#5, cast(t#1 as timestamp) AS t#6]\n +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@c4079ca,csv,List(),Some(StructType(StructField(word,StringType,true), StructField(t,IntegerType,true))),List(),None,Map(sep -> ;, path -> /tmp/data),None), FileSource[/tmp/data], [word#0, t#1]\n'
```

After this PR:

```
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/root/dev/spark/dist/python/pyspark/sql/streaming.py", line 853, in start
    return self._sq(self._jwrite.start())
  File "/root/dev/spark/dist/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/root/dev/spark/dist/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;
Aggregate [window#17, word#5], [window#17 AS window#11, word#5, count(1) AS count#16L]
+- Filter ((t#6 >= window#17.start) && (t#6 < window#17.end))
   +- Expand [ArrayBuffer(named_struct(start, ((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 30000000) + 0), end, (((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 30000000) + 0) + 30000000)), word#5, t#6-T30000ms), ArrayBuffer(named_struct(start, ((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 30000000) + 0), end, (((((CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(30000000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 30000000) + 0) + 30000000)), word#5, t#6-T30000ms)], [window#17, word#5, t#6-T30000ms]
      +- EventTimeWatermark t#6: timestamp, interval 30 seconds
         +- Project [cast(word#0 as string) AS word#5, cast(t#1 as timestamp) AS t#6]
            +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@5265083b,csv,List(),Some(StructType(StructField(word,StringType,true), StructField(t,IntegerType,true))),List(),None,Map(sep -> ;, path -> /tmp/data),None), FileSource[/tmp/data], [word#0, t#1]
```
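For context, here is a minimal sketch of the capture path visible in the traceback above (`deco` in `pyspark/sql/utils.py`): a `Py4JJavaError` from the JVM is re-raised as a Python-side `AnalysisException`. This is not the diff of this PR; the class bodies and the `__str__` comment are simplified, illustrative assumptions about where the readability difference comes from.

```python
# Minimal sketch (simplified assumption, not this PR's diff) of how a py4j
# Java exception is re-raised as a Python-side AnalysisException, based on
# the call site shown in the traceback above.
from py4j.protocol import Py4JJavaError


class CapturedException(Exception):
    def __init__(self, desc, stackTrace):
        self.desc = desc
        self.stackTrace = stackTrace

    def __str__(self):
        # Readability hinges on how the captured description is rendered here:
        # a repr()-style unicode literal keeps "\n" escaped (the "before"
        # output), while plain text lets the embedded newlines print as real
        # line breaks (the "after" output).
        return str(self.desc)


class AnalysisException(CapturedException):
    """Failed to analyze a SQL query plan."""


def capture_sql_exception(f):
    """Wrap a py4j call so Java-side SQL errors surface as Python exceptions."""
    def deco(*a, **kw):
        try:
            return f(*a, **kw)
        except Py4JJavaError as e:
            s = e.java_exception.toString()
            stackTrace = '\n\t at '.join(
                frame.toString() for frame in e.java_exception.getStackTrace())
            if s.startswith('org.apache.spark.sql.AnalysisException: '):
                # Same call as in the traceback: keep only the message after
                # the Java exception class name prefix.
                raise AnalysisException(s.split(': ', 1)[1], stackTrace)
            raise
    return deco
```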
## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19926

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17267.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17267

----

commit 273c1bc8d719158dd074cb806d5db487b9709edb
Author: uncleGen <husty...@gmail.com>
Date: 2017-03-12T12:57:31Z

    Make pyspark exception more readable

----