[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-08-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17267


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r123131587
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+desc = self.desc
+if isinstance(desc, unicode):
+return str(desc.encode('utf-8'))
--- End diff --

cc @zero323 and @davies too. Would you have some time to take a look at 
this one? This is a typically annoying problem between unicode and byte strings. 
There are several similar PRs (at least a few that I can identify) trying to 
handle this problem. One good example might help resolve the other PRs too.





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-06-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r121825610
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+desc = self.desc
+if isinstance(desc, unicode):
+return str(desc.encode('utf-8'))
--- End diff --

Good catch! I previously thought `str` works the same way as in Python 2.





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-06-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r121815790
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +28,11 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+desc = self.desc
+if isinstance(desc, unicode):
+return str(desc.encode('utf-8'))
--- End diff --

@ueshin, you are right and I misread the code. We need to:

- unicode in Python 2 => `u.encode("utf-8")`.
- others in Python 2 => return `str(s)`.
- others in Python 3 => return `str(s)`.

The root cause of 
https://github.com/apache/spark/pull/17267#issuecomment-308231375 seems to be that 
`encode` on a string in Python 3 (the same type as unicode in Python 2) produces 
8-bit bytes, `b"..."` (in Python 2, `"..."` and `b"..."` are the same normal string, 
with the `b` prefix ignored). And the `str` function works differently, as below:

Python 2

```python
>>> str(b"aa")
'aa'
>>> b"aa"
'aa'
```

Python 3

```python
>>> str(b"aa")
"b'aa'"
>>> "aa"
'aa'
```






[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105838261
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -16,6 +16,10 @@
 #
 
 import py4j
+import sys
+
+if sys.version > '3':
--- End diff --

I think it should be `>=`.
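For what it's worth, a quick check (a sketch, not from the PR) shows why the lexicographic comparison behaves the way it does; comparing `sys.version_info` tuples sidesteps the question entirely:

```python
import sys

# sys.version is a free-form string such as "3.6.0 (default, ...)".
# Lexicographically even "3.0.0" > "3" (longer string, same prefix),
# so `sys.version > '3'` is true on every Python 3 release.
# The tuple form avoids string-ordering subtleties altogether.
lexicographic = sys.version > '3'
tuple_based = sys.version_info >= (3,)
```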





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105827541
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

based on the latest commit:

```
>>> df.select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 75, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: cannot resolve '`아`' given input columns: [age, name];;
'Project ['아]
+- Relation[age#0L,name#1] json
```




[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105664922
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

Yea, I support this change and tested some more cases with that encode.





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105663697
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

Maybe another benefit of this change: before it, the error log in your 
example would look like:

u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]

`repr` shows unicode escape sequences such as `\uc544`. Even if you encode it, 
you will still see the binary representation. `str` can show the correct "아" 
when encoded with utf-8, if I tested it correctly.
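A small illustration of the `repr` vs `str` difference (shown here under Python 3, where `repr` no longer escapes printable non-ASCII; under Python 2, `repr` would print the `\uc544` escape instead):

```python
s = u"\uc544"  # the character "아"

# str() renders the character itself; repr() wraps it in quotes.
# On Python 2, repr() would instead show the escape sequence u'\uc544',
# which is what made the original error messages hard to read.
print(str(s))   # the character itself
print(repr(s))  # the character in quotes (Python 3)
```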





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105659050
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

Ah, thank you for confirmation. I thought I was mistaken :).





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105657313
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

@HyukjinKwon Good catch!





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105657204
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

We can add a check under Python 2: if it is unicode, just encode it with 
utf-8.





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105654542
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

@uncleGen, could you double check if I did something wrong maybe?





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105654236
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

I just tested with this change as below to help:

- before

```python
>>> try:
... spark.range(1).select("아")
... except Exception as e:
... print e
...

u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project 
['\uc544]\n+- Range (0, 1, step=1, splits=Some(8))\n"
>>>
>>> spark.range(1).select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve '`\uc544`' given input columns: [id];;\n'Project ['\uc544]\n+- Range (0, 1, step=1, splits=Some(8))\n"
```

- after

```python
>>> try:
... spark.range(1).select("아")
... except Exception as e:
... print e
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File ".../spark/python/pyspark/sql/utils.py", line 27, in __str__
    return str(self.desc)
UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 17: ordinal not in range(128)

>>> spark.range(1).select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException
>>>
```





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105653661
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
 self.stackTrace = stackTrace
 
 def __str__(self):
-return repr(self.desc)
+return str(self.desc)
--- End diff --

Hmm.. does this work for `unicode` in Python 2.7, for example 
`spark.range(1).select("아")`? To my knowledge, converting it to ascii 
directly throws an exception.

```python
>>> str(u"아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 0: ordinal not in range(128)
>>> repr(u"아")
"u'\\uc544'"
```

Maybe, we should check if this is `unicode` and do `.encode`.
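A minimal sketch of that suggestion (a stand-in class for illustration, not the actual pyspark code):

```python
import sys

class AnalysisException(Exception):
    """Minimal stand-in for pyspark.sql.utils.AnalysisException."""

    def __init__(self, desc, stackTrace=None):
        self.desc = desc
        self.stackTrace = stackTrace

    def __str__(self):
        desc = self.desc
        # On Python 2, a unicode desc must be encoded to UTF-8 bytes
        # before str(); on Python 3, every str is already unicode.
        if sys.version_info < (3,) and isinstance(desc, unicode):  # noqa: F821
            return str(desc.encode("utf-8"))
        return str(desc)
```

The version guard short-circuits on Python 3, so the bare `unicode` name is never evaluated there.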





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-12 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17267

[SPARK-19926][PYSPARK] Make pyspark exception more readable

## What changes were proposed in this pull request?

Exceptions in pyspark are a little difficult to read.

Before this PR:

```
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/root/dev/spark/dist/python/pyspark/sql/streaming.py", line 853, in start
    return self._sq(self._jwrite.start())
  File "/root/dev/spark/dist/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/root/dev/spark/dist/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Append output mode not supported 
when there are streaming aggregations on streaming DataFrames/DataSets without 
watermark;;\nAggregate [window#17, word#5], [window#17 AS window#11, word#5, 
count(1) AS count#16L]\n+- Filter ((t#6 >= window#17.start) && (t#6 < 
window#17.end))\n   +- Expand [ArrayBuffer(named_struct(start, 
CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, 
(CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 
3000)), word#5, t#6-T3ms), ArrayBuffer(named_struct(start, 
CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, 
(CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(1 as bigint)) - cast(1 as bigint))
  * 3000) + 0) + 3000)), word#5, t#6-T3ms)], [window#17, word#5, 
t#6-T3ms]\n  +- EventTimeWatermark t#6: timestamp, interval 30 
seconds\n +- Project [cast(word#0 as string) AS word#5, cast(t#1 as 
timestamp) AS t#6]\n+- StreamingRelation 
DataSource(org.apache.spark.sql.SparkSession@c4079ca,csv,List(),Some(StructType(StructField(word,StringType,true),
 StructField(t,IntegerType,true))),List(),None,Map(sep -> ;, path -> 
/tmp/data),None), FileSource[/tmp/data], [word#0, t#1]\n'
```

After this PR:

```
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/root/dev/spark/dist/python/pyspark/sql/streaming.py", line 853, in start
    return self._sq(self._jwrite.start())
  File "/root/dev/spark/dist/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/root/dev/spark/dist/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: Append output mode not supported when 
there are streaming aggregations on streaming DataFrames/DataSets without 
watermark;;
Aggregate [window#17, word#5], [window#17 AS window#11, word#5, count(1) AS 
count#16L]
+- Filter ((t#6 >= window#17.start) && (t#6 < window#17.end))
   +- Expand [ArrayBuffer(named_struct(start, 
CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, 
(CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(0 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 
3000)), word#5, t#6-T3ms), ArrayBuffer(named_struct(start, 
CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0), end, 
(CEIL((cast((precisetimestamp(t#6) - 0) as double) / cast(3000 as 
double))) + cast(1 as bigint)) - cast(1 as bigint)) * 3000) + 0) + 
3000)), word#5, t#6-T3ms)], [window#17, word#5, t#6-T3ms]
  +- EventTimeWatermark t#6: timestamp, interval 30 seconds
 +- Project [cast(word#0 as string) AS word#5, cast(t#1 as 
timestamp) AS t#6]
+- StreamingRelation 
DataSource(org.apache.spark.sql.SparkSession@5265083b,csv,List(),Some(StructType(StructField(word,StringType,true),
 StructField(t,IntegerType,true))),List(),None,Map(sep -> ;, path -> 
/tmp/data),None), FileSource[/tmp/data], [word#0, t#1]
```

## How was this patch tested?

Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19926

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17267


commit 273c1bc8d719158dd074cb806d5db487b9709edb
Author: uncleGen 
Date:   2017-03-12T12:57:31Z

Make pyspark exception more readable



