[jira] [Commented] (SPARK-33073) Improve error handling on Pandas to Arrow conversion failures
[ https://issues.apache.org/jira/browse/SPARK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209309#comment-17209309 ]

Apache Spark commented on SPARK-33073:
--------------------------------------

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/29962

> Improve error handling on Pandas to Arrow conversion failures
> -------------------------------------------------------------
>
>                 Key: SPARK-33073
>                 URL: https://issues.apache.org/jira/browse/SPARK-33073
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.1
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>             Fix For: 3.1.0
>
>
> Currently, when converting from Pandas to Arrow for Pandas UDF return values or from createDataFrame(), PySpark will catch all ArrowExceptions and display info on how to disable the safe conversion config. This is displayed with the original error as a tuple:
> {noformat}
> ('Exception thrown when converting pandas.Series (object) to Arrow Array (int32). It can be caused by overflows or other unsafe conversions warned by Arrow. Arrow safe type check can be disabled by using SQL config `spark.sql.execution.pandas.convertToArrowArraySafely`.', ArrowInvalid('Could not convert a with type str: tried to convert to int'))
> {noformat}
> The problem is that this message is meant mainly for things like float truncation or overflow, but it also appears when the user has an invalid schema with incompatible types. In that case the extra information is confusing and the real error is buried.
> This could be improved by printing the info on how to disable safe checking only when the config is actually set, and by using exception chaining to better surface the original error. Also, any safe-check failure raises a ValueError, of which ArrowInvalid is a subclass, so the catch could be made narrower.
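The improvements the description proposes (conditional config hint, exception chaining, a narrower ValueError catch) could be sketched roughly as follows. This is an illustrative sketch, not Spark's actual code: `convert_to_arrow` and the `converter` callable are hypothetical stand-ins for PySpark's internal conversion path (e.g. `pyarrow.Array.from_pandas`), and raising ValueError mirrors pyarrow's ArrowInvalid, which subclasses ValueError.

```python
def convert_to_arrow(values, converter, safecheck):
    """Convert values to an Arrow array via `converter`, re-raising
    conversion failures with clearer, chained error messages.

    `converter` stands in for something like pyarrow.Array.from_pandas;
    ArrowInvalid is a subclass of ValueError, so catching ValueError is
    narrower than catching all ArrowExceptions.
    """
    try:
        return converter(values)
    except ValueError as e:
        msg = "Exception thrown when converting pandas.Series to Arrow Array."
        if safecheck:
            # Only mention the config when safe checking is actually enabled,
            # so schema-mismatch errors are not drowned in irrelevant advice.
            msg += (" It can be caused by overflows or other unsafe conversions"
                    " warned by Arrow. Arrow safe type check can be disabled by"
                    " using SQL config"
                    " `spark.sql.execution.pandas.convertToArrowArraySafely`.")
        # Exception chaining ("raise ... from e") keeps the original error
        # visible in the traceback instead of stuffing it into a tuple.
        raise ValueError(msg) from e
```

With chaining, the traceback shows the original ArrowInvalid as the `__cause__` of the new ValueError, rather than the two being flattened into one tuple-shaped message.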
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33073) Improve error handling on Pandas to Arrow conversion failures
[ https://issues.apache.org/jira/browse/SPARK-33073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208523#comment-17208523 ]

Apache Spark commented on SPARK-33073:
--------------------------------------

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/29951