[ 
https://issues.apache.org/jira/browse/SPARK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033218#comment-15033218
 ] 

Jeff Zhang commented on SPARK-12070:
------------------------------------

The root cause is that an open-ended slice such as str[1:] sets the length to 
Python's maximum int, which is a long in Java because the range of a Python int 
is larger than that of a Java int. The call therefore becomes 
substr(Integer, Long), and no method with that signature exists.
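
To make the range mismatch concrete, here is a minimal sketch in plain Python. It does not use PySpark or Py4J; the helper names (java_boxed_type, slice_args) are hypothetical. It assumes a 64-bit Python 2 build, where the maximum int is 2**63 - 1, and it assumes that an integer outside the 32-bit range reaches the JVM as java.lang.Long, which is what the trace in the issue suggests.

```python
# Illustrative model only; these names are hypothetical, not PySpark or Py4J APIs.
JAVA_INT_MIN = -2**31          # java.lang.Integer.MIN_VALUE
JAVA_INT_MAX = 2**31 - 1       # java.lang.Integer.MAX_VALUE
PY2_MAXINT_64BIT = 2**63 - 1   # assumed max int of a 64-bit Python 2 build


def java_boxed_type(n):
    """Return the Java boxed type assumed for a Python integer argument."""
    if JAVA_INT_MIN <= n <= JAVA_INT_MAX:
        return "java.lang.Integer"
    return "java.lang.Long"


def slice_args(start, stop=None):
    """Mimic the (startPos, length) pair assumed for a slice; open end -> max int."""
    end = PY2_MAXINT_64BIT if stop is None else stop
    return start, end


start, length = slice_args(2)  # corresponds to a._1[2:]
print(java_boxed_type(start), java_boxed_type(length))
```

Under these assumptions the two arguments map to Integer and Long, the same pair that appears in the "does not exist" message for substr.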



> PySpark implementation of Slicing operator incorrect
> ----------------------------------------------------
>
>                 Key: SPARK-12070
>                 URL: https://issues.apache.org/jira/browse/SPARK-12070
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>            Reporter: Jeff Zhang
>
> {code}
> aa=('Ofer', 1), ('Wei', 2)
> a = sqlContext.createDataFrame(aa)
> a.select(a._1[2:]).show()
> {code}
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/jzhang/github/spark/python/pyspark/sql/column.py", line 286, in substr
>     jc = self._jc.substr(startPos, length)
>   File "/Users/jzhang/github/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/Users/jzhang/github/spark/python/pyspark/sql/utils.py", line 45, in deco
>     return f(*a, **kw)
>   File "/Users/jzhang/github/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JError: An error occurred while calling o37.substr. Trace:
> py4j.Py4JException: Method substr([class java.lang.Integer, class java.lang.Long]) does not exist
>       at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:335)
>       at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
>       at py4j.Gateway.invoke(Gateway.java:252)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:209)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)