[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception more readable
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17267
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r123131587

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        desc = self.desc
+        if isinstance(desc, unicode):
+            return str(desc.encode('utf-8'))
```
--- End diff --

cc @zero323 and @davies too. Would you have some time to take a look at this one? This is a typical annoying problem between unicode and byte strings. There are many similar PRs (at least I can identify a few trying to handle this problem). One good example might help resolve the other PRs too.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r121825610

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        desc = self.desc
+        if isinstance(desc, unicode):
+            return str(desc.encode('utf-8'))
```
--- End diff --

Good catch! I previously thought `str` worked the same way as in Python 2.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r121815790

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        desc = self.desc
+        if isinstance(desc, unicode):
+            return str(desc.encode('utf-8'))
```
--- End diff --

@ueshin, you are right and I misread the code. We need to handle:

- unicode in Python 2 => `u.encode("utf-8")`
- others in Python 2 => return `str(s)`
- others in Python 3 => return `str(s)`

The root cause of https://github.com/apache/spark/pull/17267#issuecomment-308231375 looks to be that in Python 3, `encode` on a string produces 8-bit bytes, `b"..."`, whereas in Python 2 (where `str` is the same as bytes and `unicode` is the text type) `b"..."` is the same as a normal string `"..."`; the `b` prefix is ignored. And the `str` function works differently, as below:

Python 2

```python
>>> str(b"aa")
'aa'
>>> b"aa"
'aa'
```

Python 3

```python
>>> str(b"aa")
"b'aa'"
>>> "aa"
'aa'
```
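[Editor's note] Putting the three rules above together, a minimal sketch of such a `__str__` could look like the following. This mirrors the shape of the exception class in `python/pyspark/sql/utils.py`, but it is an illustration under those rules, not necessarily the exact code that was merged:

```python
import sys


class CapturedException(Exception):
    def __init__(self, desc, stackTrace):
        self.desc = desc
        self.stackTrace = stackTrace

    def __str__(self):
        desc = self.desc
        # Python 2 only: a unicode desc must be encoded first, because
        # str(u"\uc544") raises UnicodeEncodeError. The encoded result is
        # already a byte str in Python 2, so it can be returned directly.
        if sys.version_info[0] < 3 and isinstance(desc, unicode):  # noqa: F821
            return desc.encode('utf-8')
        # str in Python 2, and everything in Python 3: plain str() is safe
        # and avoids the "b'aa'" wrapping that str(bytes) produces in 3.
        return str(desc)
```

The `sys.version_info[0] < 3` check is evaluated first, so the `unicode` name is never looked up on Python 3, where it does not exist.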
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105838261

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -16,6 +16,10 @@
 #
 import py4j
+import sys
+
+if sys.version > '3':
```
--- End diff --

I think it should be `>=`.
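[Editor's note] For context, `sys.version` is the full interpreter string (e.g. `'3.6.1 (default, ...)'`), so the comparison is lexicographic and both `>` and `>=` happen to work in practice; comparing `sys.version_info` sidesteps the string-comparison subtlety entirely. A sketch of the guard under that assumption (the alias name follows common compatibility-shim convention, not necessarily the PR's final code):

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3 has no separate unicode type; alias it so an
    # isinstance(desc, unicode) check works on both versions.
    unicode = str
```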
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105827541

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

Based on the latest commit:

```python
>>> df.select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 75, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: cannot resolve '`아`' given input columns: [age, name];;
'Project ['아]
+- Relation[age#0L,name#1] json
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105664922

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

Yea, I support this change and tested some more cases with that encode.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105663697

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

Maybe another benefit of this change: before it, the error log in your example looked like

```
u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]
```

`repr` shows the unicode escape characters `\uc544`. Even if you encode it, `repr` will still show the binary representation. `str` can show the correct "아" if it is encoded with utf-8, if I tested it correctly.
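[Editor's note] A quick Python 2 session illustrating the difference described above, assuming a UTF-8 terminal (`u'\uc544'` is the character 아 from the example):

```python
>>> u = u'\uc544'
>>> repr(u)                    # escape sequence, not the character
"u'\\uc544'"
>>> repr(u.encode('utf-8'))    # encoding alone still reprs as raw bytes
"'\\xec\\x95\\x84'"
>>> print(u.encode('utf-8'))   # printing the encoded bytes shows the character
아
```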
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105659050

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

Ah, thank you for the confirmation. I thought I was mistaken :).
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105657313

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

@HyukjinKwon Good catch!
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105657204

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

We can add a check under Python 2: if it is unicode, just encode it with utf-8.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105654542

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

@uncleGen, could you double-check whether I maybe did something wrong?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105654236

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

I just tested with this change as below to help:

- before

```python
>>> try:
...     spark.range(1).select("아")
... except Exception as e:
...     print e
...
u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]\n+- Range (0, 1, step=1, splits=Some(8))\n"
>>>
>>> spark.range(1).select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]\n+- Range (0, 1, step=1, splits=Some(8))\n"
```

- after

```python
>>> try:
...     spark.range(1).select("아")
... except Exception as e:
...     print e
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File ".../spark/python/pyspark/sql/utils.py", line 27, in __str__
    return str(self.desc)
UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 17: ordinal not in range(128)
>>> spark.range(1).select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException
>>>
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17267#discussion_r105653661

--- Diff: python/pyspark/sql/utils.py ---
```diff
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace

     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
```
--- End diff --

Hm.. does this work for `unicode` in Python 2.7, for example, `spark.range(1).select("아")`? To my knowledge, converting it to ascii directly throws an exception:

```python
>>> str(u"아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 0: ordinal not in range(128)
>>> repr(u"아")
"u'\\uc544'"
```

Maybe we should check if this is `unicode` and do `.encode`.
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17267

[SPARK-19926][PYSPARK] Make pyspark exception more readable

## What changes were proposed in this pull request?

Exceptions in pyspark are a little difficult to read. Before this PR:

```
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/root/dev/spark/dist/python/pyspark/sql/streaming.py", line 853, in start
    return self._sq(self._jwrite.start())
  File "/root/dev/spark/dist/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/root/dev/spark/dist/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;\nAggregate [window#17, word#5], [window#17 AS window#11, word#5, count(1) AS count#16L]\n+- Filter ((t#6 >= window#17.start) && (t#6 < window#17.end))\n +- Expand [ArrayBuffer(named_struct(start, CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, (CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 3000)), word#5, t#6-T3ms), ArrayBuffer(named_struct(start, CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, (CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 3000)), word#5, t#6-T3ms)], [window#17, word#5, t#6-T3ms]\n +- EventTimeWatermark t#6: timestamp, interval 30 seconds\n +- Project [cast(word#0 as string) AS word#5, cast(t#1 as timestamp) AS t#6]\n+- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@c4079ca,csv,List(),Some(StructType(StructField(word,StringType,true), StructField(t,IntegerType,true))),List(),None,Map(sep -> ;, path -> /tmp/data),None), FileSource[/tmp/data], [word#0, t#1]\n'
```

After this PR:

```
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/root/dev/spark/dist/python/pyspark/sql/streaming.py", line 853, in start
    return self._sq(self._jwrite.start())
  File "/root/dev/spark/dist/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/root/dev/spark/dist/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;
Aggregate [window#17, word#5], [window#17 AS window#11, word#5, count(1) AS count#16L]
+- Filter ((t#6 >= window#17.start) && (t#6 < window#17.end))
   +- Expand [ArrayBuffer(named_struct(start, CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, (CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 3000)), word#5, t#6-T3ms), ArrayBuffer(named_struct(start, CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, (CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 3000)), word#5, t#6-T3ms)], [window#17, word#5, t#6-T3ms]
      +- EventTimeWatermark t#6: timestamp, interval 30 seconds
         +- Project [cast(word#0 as string) AS word#5, cast(t#1 as timestamp) AS t#6]
            +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@5265083b,csv,List(),Some(StructType(StructField(word,StringType,true), StructField(t,IntegerType,true))),List(),None,Map(sep -> ;, path -> /tmp/data),None), FileSource[/tmp/data], [word#0, t#1]
```

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19926

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17267.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17267

commit 273c1bc8d719158dd074cb806d5db487b9709edb
Author: uncleGen
Date: 2017-03-12T12:57:31Z

    Make pyspark exception more readable