[jira] [Commented] (SPARK-27052) Using PySpark udf in transform yields NULL values

2019-03-27 Thread Artem Rybin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802563#comment-16802563
 ] 

Artem Rybin commented on SPARK-27052:
-------------------------------------

[~ueshin], as I understand it, you implemented this feature in Scala.

Do you have ideas about this issue?

> Using PySpark udf in transform yields NULL values
> -------------------------------------------------
>
> Key: SPARK-27052
> URL: https://issues.apache.org/jira/browse/SPARK-27052
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: hejsgpuom62c
>Priority: Major
>
> Steps to reproduce
> {code:python}
> from typing import Optional
> from pyspark.sql.functions import expr
>
> def f(x: Optional[int]) -> Optional[int]:
>     return x + 1 if x is not None else None
>
> spark.udf.register('f', f, "integer")
>
> df = (spark
>       .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
>       .withColumn("xsinc", expr("transform(xs, x -> f(x))")))
> df.show()
> # +---+---------+-----+
> # | id|       xs|xsinc|
> # +---+---------+-----+
> # |  1|[1, 2, 3]| [,,]|
> # +---+---------+-----+
> {code}
>  
> Source https://stackoverflow.com/a/53762650
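A workaround in the same spirit as the linked Stack Overflow discussion is to move the whole loop into a single array-level UDF instead of calling a scalar UDF inside {{transform}}, so the Python function is invoked once per row rather than per element inside a Catalyst lambda. A minimal sketch in plain Python (the name {{f_array}} and the commented registration lines are illustrative assumptions, not code from this issue):

```python
from typing import List, Optional

def f_array(xs: Optional[List[Optional[int]]]) -> Optional[List[Optional[int]]]:
    # Apply the increment element-wise, preserving nulls at both
    # the array level and the element level.
    if xs is None:
        return None
    return [x + 1 if x is not None else None for x in xs]

# In a Spark session this could be hooked in roughly as:
# spark.udf.register("f_array", f_array, "array<integer>")
# df.withColumn("xsinc", expr("f_array(xs)"))
```

This sidesteps the bug because the higher-order function machinery is never involved; the trade-off is that the whole array is serialized to the Python worker at once.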



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27052) Using PySpark udf in transform yields NULL values

2019-03-22 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799017#comment-16799017
 ] 

Herman van Hovell commented on SPARK-27052:
-------------------------------------------

This is not supported at the moment. It will probably be non-trivial to 
implement, since we need to figure out a performant way to invoke Python here. 
In this particular case we can probably rewrite the higher-order function into 
a chain of map operations, one of which would be executed by Python. Anyway, 
let's discuss this first before starting to code it up.
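The rewrite described above can be sketched in plain Python to show the intended semantics: explode the array with positions, run the Python-executed map over the flat elements, then regroup per row. This is only a conceptual model of the plan rewrite, not Spark code; the function name and tuple layout are illustrative assumptions:

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, List[Optional[int]]]

def transform_via_map(rows: List[Row], f: Callable) -> list:
    """Model transform(xs, x -> f(x)) as posexplode -> map -> collect_list."""
    out = []
    for row_id, xs in rows:
        # posexplode: one (row id, position, element) record per array element
        exploded = [(row_id, pos, x) for pos, x in enumerate(xs)]
        # the map step that would be shipped to a Python worker
        mapped = [(rid, pos, f(x)) for rid, pos, x in exploded]
        # regroup by row, restoring element order via the position
        xsinc = [v for _, _, v in sorted(mapped, key=lambda t: t[1])]
        out.append((row_id, xs, xsinc))
    return out
```

The position column is what makes the regrouping order-preserving, which a real plan rewrite would also need to guarantee.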







[jira] [Commented] (SPARK-27052) Using PySpark udf in transform yields NULL values

2019-03-19 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795883#comment-16795883
 ] 

Hyukjin Kwon commented on SPARK-27052:
--------------------------------------

We don't usually assign issues to someone up front. When you open a PR, the 
issue is assigned automatically.







[jira] [Commented] (SPARK-27052) Using PySpark udf in transform yields NULL values

2019-03-19 Thread Artem Rybin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795797#comment-16795797
 ] 

Artem Rybin commented on SPARK-27052:
-------------------------------------

Hi [~hejsgpuom62c]!

I reproduced this issue and would like to investigate it.

Please assign it to me.



