[jira] [Commented] (SPARK-33073) Improve error handling on Pandas to Arrow conversion failures
[ https://issues.apache.org/jira/browse/SPARK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209309#comment-17209309 ]

Apache Spark commented on SPARK-33073:
--------------------------------------

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/29962

> Improve error handling on Pandas to Arrow conversion failures
> -------------------------------------------------------------
>
>                 Key: SPARK-33073
>                 URL: https://issues.apache.org/jira/browse/SPARK-33073
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.1
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>             Fix For: 3.1.0
>
>
> Currently, when converting from Pandas to Arrow for Pandas UDF return values or from createDataFrame(), PySpark will catch all ArrowExceptions and display info on how to disable the safe conversion config. This is displayed with the original error as a tuple:
> {noformat}
> ('Exception thrown when converting pandas.Series (object) to Arrow Array (int32). It can be caused by overflows or other unsafe conversions warned by Arrow. Arrow safe type check can be disabled by using SQL config `spark.sql.execution.pandas.convertToArrowArraySafely`.', ArrowInvalid('Could not convert a with type str: tried to convert to int'))
> {noformat}
> The problem is that this message is meant mainly for things like float truncation or overflow, but it also appears when the user has an invalid schema with incompatible types. In that case the extra information is confusing and the real error is buried.
> This could be improved by printing the info on how to disable safe checking only when the config is actually set, and by using exception chaining to better surface the original error. Also, any safe-check failure raises a ValueError, of which ArrowInvalid is a subclass, so the catch could be made narrower.
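The improvements the description proposes (conditional config hint, exception chaining, a narrower ValueError catch) could be sketched roughly as follows. This is an illustrative sketch, not Spark's actual code: `convert_to_arrow` and the `converter` callable are hypothetical stand-ins for PySpark's internal conversion path (e.g. `pyarrow.Array.from_pandas`), and raising ValueError mirrors pyarrow's ArrowInvalid, which subclasses ValueError.

```python
def convert_to_arrow(values, converter, safecheck):
    """Convert values to an Arrow array via `converter`, re-raising
    conversion failures with clearer, chained error messages.

    `converter` stands in for something like pyarrow.Array.from_pandas;
    ArrowInvalid is a subclass of ValueError, so catching ValueError is
    narrower than catching all ArrowExceptions.
    """
    try:
        return converter(values)
    except ValueError as e:
        msg = "Exception thrown when converting pandas.Series to Arrow Array."
        if safecheck:
            # Only mention the config when safe checking is actually enabled,
            # so schema-mismatch errors are not drowned in irrelevant advice.
            msg += (" It can be caused by overflows or other unsafe conversions"
                    " warned by Arrow. Arrow safe type check can be disabled by"
                    " using SQL config"
                    " `spark.sql.execution.pandas.convertToArrowArraySafely`.")
        # Exception chaining ("raise ... from e") keeps the original error
        # visible in the traceback instead of stuffing it into a tuple.
        raise ValueError(msg) from e
```

With chaining, the traceback shows the original ArrowInvalid as the `__cause__` of the new ValueError, rather than the two being flattened into one tuple-shaped message.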
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33073) Improve error handling on Pandas to Arrow conversion failures
[ https://issues.apache.org/jira/browse/SPARK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208523#comment-17208523 ]

Apache Spark commented on SPARK-33073:
--------------------------------------

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/29951