[jira] [Commented] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch

    [ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686372#comment-17686372 ]

Apache Spark commented on SPARK-40770:
--------------------------------------

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39952

> Improved error messages for applyInPandas for schema mismatch
> -------------------------------------------------------------
>
>                 Key: SPARK-40770
>                 URL: https://issues.apache.org/jira/browse/SPARK-40770
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Assignee: Enrico Minack
>            Priority: Minor
>             Fix For: 3.5.0
>
>
> Error messages raised by `applyInPandas` are very generic or useless when
> used with complex schemata:
> {code}
> KeyError: 'val'
> {code}
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3
> {code}
> {code}
> java.lang.IllegalArgumentException: not all nodes and buffers were consumed.
> nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304],
> address:139860828549160, length:0, ArrowBuf[305], address:139860828549160,
> length:24]
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to
> convert to double
> {code}
> These should be improved by adding column names or descriptive messages (in
> the same order as above):
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Missing: val Unexpected: v Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Missing: val Unexpected: foo, v Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Unexpected: v Schema: id, id
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> The above exception was the direct cause of the following exception:
> TypeError: Exception thrown when converting pandas.Series (int64) with name
> 'val' to Arrow Array (string).
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to
> convert to double
> The above exception was the direct cause of the following exception:
> ValueError: Exception thrown when converting pandas.Series (object) with name
> 'val' to Arrow Array (double).
> {code}
> When no column names are given, the following error was returned:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3
> {code}
> Where it should contain the output schema:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3 Schema: id, val
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
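
The proposed "Column names ... do not match specified schema" messages in the issue description can be illustrated with a small sketch. This is not Spark's implementation: `check_column_names` is a hypothetical helper that works on plain lists of names, and it only assumes that the message lists missing names, unexpected names, and the schema fields, as in the examples quoted above.

```python
def check_column_names(returned, schema):
    """Raise a descriptive error if returned column names differ from the schema.

    Hypothetical helper for illustration only; not Spark's actual code.
    `returned` and `schema` are lists of column / field names.
    """
    # Names the schema expects but the returned frame lacks.
    missing = sorted(set(schema) - set(returned))
    # Names the returned frame has but the schema does not know.
    unexpected = sorted(set(returned) - set(schema))

    if missing or unexpected:
        parts = []
        if missing:
            parts.append("Missing: " + ", ".join(missing))
        if unexpected:
            parts.append("Unexpected: " + ", ".join(unexpected))
        # Always show the schema so the user can see what was expected.
        parts.append("Schema: " + ", ".join(schema))
        raise RuntimeError(
            "Column names of the returned pandas.DataFrame do not match "
            "specified schema. " + " ".join(parts)
        )


if __name__ == "__main__":
    try:
        check_column_names(["id", "v"], ["id", "val"])
    except RuntimeError as e:
        print(e)
```

Compared with a bare `KeyError: 'val'`, a message built this way names both the offending columns and the expected schema, which is the point of the improvement requested in the issue.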
[jira] [Commented] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch

    [ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616483#comment-17616483 ]

Apache Spark commented on SPARK-40770:
--------------------------------------

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/38223

> Improved error messages for applyInPandas for schema mismatch
> -------------------------------------------------------------
>
>                 Key: SPARK-40770
>                 URL: https://issues.apache.org/jira/browse/SPARK-40770
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Minor
>
> Error messages raised by `applyInPandas` are very generic or useless when
> used with complex schemata:
> {code}
> KeyError: 'val'
> {code}
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3
> {code}
> {code}
> java.lang.IllegalArgumentException: not all nodes and buffers were consumed.
> nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304],
> address:139860828549160, length:0, ArrowBuf[305], address:139860828549160,
> length:24]
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to
> convert to double
> {code}
> These should be improved by adding column names or descriptive messages (in
> the same order as above):
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Missing: val Unexpected: v Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Missing: val Unexpected: foo, v Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Unexpected: v Schema: id, id
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> The above exception was the direct cause of the following exception:
> TypeError: Exception thrown when converting pandas.Series (int64) with name
> 'val' to Arrow Array (string).
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to
> convert to double
> The above exception was the direct cause of the following exception:
> ValueError: Exception thrown when converting pandas.Series (object) with name
> 'val' to Arrow Array (double).
> {code}
> When no column names are given, the following error was returned:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3
> {code}
> Where it should contain the output schema:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3 Schema: id, val
> {code}
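
The "direct cause" wording in the proposed type-mismatch messages corresponds to Python exception chaining (`raise ... from ...`). The sketch below shows the pattern under stated assumptions: `to_double_array` is a hypothetical stand-in for the Arrow conversion (it simply rejects strings), and `convert_column` is a hypothetical wrapper, neither is taken from Spark or pyarrow.

```python
def to_double_array(values):
    """Hypothetical stand-in for an Arrow conversion to double.

    Accepts numbers and rejects strings, mimicking the low-level error
    quoted in the issue. Not the real pyarrow API.
    """
    out = []
    for v in values:
        if isinstance(v, str):
            raise ValueError(
                f"Could not convert {v!r} with type str: tried to convert to double"
            )
        out.append(float(v))
    return out


def convert_column(name, values, source_type, target_type):
    """Convert one column, adding the column name and types on failure."""
    try:
        return to_double_array(values)
    except ValueError as e:
        # `from e` keeps the low-level error as __cause__, so the traceback
        # prints "The above exception was the direct cause of ...".
        raise ValueError(
            f"Exception thrown when converting pandas.Series ({source_type}) "
            f"with name '{name}' to Arrow Array ({target_type})."
        ) from e


if __name__ == "__main__":
    print(convert_column("id", [1, 2], "int64", "double"))
    try:
        convert_column("val", ["0"], "object", "double")
    except ValueError as e:
        print(repr(e.__cause__))
        print(e)
```

Chaining keeps the original low-level message available for debugging while the outer message tells the user which column and which types were involved.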