[ https://issues.apache.org/jira/browse/SPARK-35211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Darcy Shen updated SPARK-35211:
-------------------------------
    Description:

{code:java}
$ pip freeze
certifi==2020.12.5
coverage==5.5
flake8==3.9.0
mccabe==0.6.1
mypy==0.812
mypy-extensions==0.4.3
numpy==1.20.1
pandas==1.2.3
pyarrow==2.0.0
pycodestyle==2.7.0
pyflakes==2.3.0
python-dateutil==2.8.1
pytz==2021.1
scipy==1.6.1
six==1.15.0
typed-ast==1.4.2
typing-extensions==3.7.4.3
xmlrunner==1.7.7
{code}

{code}
(spark) ➜ spark git:(master) bin/pyspark
Python 3.8.8 (default, Feb 24 2021, 13:46:16)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/04/24 15:51:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
      /_/

Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
Spark context Web UI available at http://172.30.0.12:4040
Spark context available as 'sc' (master = local[*], app id = local-1619250689842).
SparkSession available as 'spark'.
>>> spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
>>> from pyspark.testing.sqlutils import ExamplePoint
>>> import pandas as pd
>>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})
>>> df = spark.createDataFrame(pdf)
>>> df.show()
+----------+
|     point|
+----------+
|(0.0, 0.0)|
|(0.0, 0.0)|
+----------+
>>> df.toPandas()
       point
0  (0.0,0.0)
1  (0.0,0.0)
{code}

The correct result should be:

{code}
       point
0  (1.0,1.0)
1  (2.0,2.0)
{code}
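Condensed, the failing session is equivalent to this standalone script. This is a sketch, not part of the original report; it assumes a local Spark 3.2.0-SNAPSHOT build with the test-only pyspark.testing.sqlutils module importable:

{code:python}
# Minimal standalone reproduction, condensed from the REPL transcript above.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.testing.sqlutils import ExamplePoint  # test-only UDT shipped with Spark

spark = SparkSession.builder.master("local[*]").appName("SPARK-35211-repro").getOrCreate()

# Force the plain (non-Arrow) pandas conversion path, where the bug shows up.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")

pdf = pd.DataFrame({"point": pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})
df = spark.createDataFrame(pdf)

df.show()             # buggy: every row prints (0.0, 0.0)
print(df.toPandas())  # buggy: the round-trip also yields (0.0,0.0)

spark.stop()
{code}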
The following code snippet works fine:

{code}
(spark) ➜ spark git:(sadhen/SPARK-35211) ✗ bin/pyspark
Python 3.8.8 (default, Feb 24 2021, 13:46:16)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/04/24 17:08:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
      /_/

Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
Spark context Web UI available at http://172.30.0.12:4040
Spark context available as 'sc' (master = local[*], app id = local-1619255290637).
SparkSession available as 'spark'.
>>> spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
>>> from pyspark.testing.sqlutils import ExamplePoint
>>> import pandas as pd
>>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1.0, 1.0), ExamplePoint(2.0, 2.0)])})
>>> df = spark.createDataFrame(pdf)
>>> df.show()
+----------+
|     point|
+----------+
|(1.0, 1.0)|
|(2.0, 2.0)|
+----------+
{code}
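For context, ExamplePoint carries a UserDefinedType (UDT) that tells Spark how to convert between the Python object and its SQL representation; the all-zero output above is consistent with those hooks being bypassed on the non-Arrow pandas path. Below is a minimal sketch of the UDT pattern, with hypothetical Point/PointUDT names modeled on ExamplePoint; the real ExamplePointUDT additionally registers a matching Scala-side class via scalaUDT():

{code:python}
# Sketch of PySpark's UDT pattern; Point/PointUDT are illustrative names only.
from pyspark.sql.types import ArrayType, DoubleType, UserDefinedType


class PointUDT(UserDefinedType):
    @classmethod
    def sqlType(cls):
        # Storage representation: a non-nullable pair of doubles.
        return ArrayType(DoubleType(), False)

    @classmethod
    def module(cls):
        return "__main__"  # module where this UDT lives (assumption for this sketch)

    def serialize(self, obj):
        # Python object -> SQL representation.
        return [obj.x, obj.y]

    def deserialize(self, datum):
        # SQL representation -> Python object.
        return Point(datum[0], datum[1])


class Point:
    __UDT__ = PointUDT()  # attaches the UDT so Spark knows how to (de)serialize

    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __repr__(self):
        return "(%s, %s)" % (self.x, self.y)
{code}

When these hooks are applied, each value round-trips through serialize/deserialize and prints (1.0, 1.0) and (2.0, 2.0), as in the fixed branch above.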
> Support UDT for Pandas with Arrow Disabled
> ------------------------------------------
>
>                 Key: SPARK-35211
>                 URL: https://issues.apache.org/jira/browse/SPARK-35211
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Darcy Shen
>            Priority: Major
>


--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org