[jira] [Assigned] (SPARK-42289) DS V2 pushdown could let JDBC dialect decide to push down offset and limit
[ https://issues.apache.org/jira/browse/SPARK-42289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42289:
------------------------------------

    Assignee: Apache Spark

> DS V2 pushdown could let JDBC dialect decide to push down offset and limit
> --------------------------------------------------------------------------
>
>                 Key: SPARK-42289
>                 URL: https://issues.apache.org/jira/browse/SPARK-42289
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Assignee: Apache Spark
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152
[ https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686374#comment-17686374 ]

Apache Spark commented on SPARK-42313:
--------------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39953

> Assign name to _LEGACY_ERROR_TEMP_1152
> --------------------------------------
>
>                 Key: SPARK-42313
>                 URL: https://issues.apache.org/jira/browse/SPARK-42313
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
[jira] [Assigned] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152
[ https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42313:
------------------------------------

    Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_1152
> --------------------------------------
>
>                 Key: SPARK-42313
>                 URL: https://issues.apache.org/jira/browse/SPARK-42313
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152
[ https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42313:
------------------------------------

    Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_1152
> --------------------------------------
>
>                 Key: SPARK-42313
>                 URL: https://issues.apache.org/jira/browse/SPARK-42313
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
[jira] [Commented] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch
[ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686372#comment-17686372 ]

Apache Spark commented on SPARK-40770:
--------------------------------------

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39952

> Improved error messages for applyInPandas for schema mismatch
> -------------------------------------------------------------
>
>                 Key: SPARK-40770
>                 URL: https://issues.apache.org/jira/browse/SPARK-40770
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Assignee: Enrico Minack
>            Priority: Minor
>             Fix For: 3.5.0
>
> Error messages raised by `applyInPandas` are very generic or useless when
> used with complex schemata:
> {code}
> KeyError: 'val'
> {code}
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3
> {code}
> {code}
> java.lang.IllegalArgumentException: not all nodes and buffers were consumed.
> nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304],
> address:139860828549160, length:0, ArrowBuf[305], address:139860828549160,
> length:24]
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to
> convert to double
> {code}
> These should be improved by adding column names or descriptive messages (in
> the same order as above):
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Missing: val Unexpected: v Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Missing: val Unexpected: foo, v Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match
> specified schema. Unexpected: v Schema: id, id
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> The above exception was the direct cause of the following exception:
> TypeError: Exception thrown when converting pandas.Series (int64) with name
> 'val' to Arrow Array (string).
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to
> convert to double
> The above exception was the direct cause of the following exception:
> ValueError: Exception thrown when converting pandas.Series (object) with name
> 'val' to Arrow Array (double).
> {code}
> When no column names are given, the following error was returned:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3
> {code}
> Where it should contain the output schema:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't
> match specified schema. Expected: 2 Actual: 3 Schema: id, val
> {code}
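The improved messages quoted in the ticket ("Missing: ... Unexpected: ... Schema: ...") can be illustrated with a small sketch. This is a hypothetical stand-alone helper for illustration only, not the actual PySpark implementation: it compares the column names of a returned pandas.DataFrame against the expected output schema and builds a message in the same shape as the ones proposed above.

```python
def column_mismatch_message(expected, actual):
    """Build a descriptive error message for a column-name mismatch.

    `expected` is the list of column names in the declared output schema;
    `actual` is the list of column names on the returned pandas.DataFrame.
    Returns None when the names line up, otherwise a message listing the
    missing and unexpected columns plus the full expected schema.
    """
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in actual if c not in expected]
    if not missing and not unexpected:
        return None  # column names match; nothing to report
    parts = ["Column names of the returned pandas.DataFrame do not match specified schema."]
    if missing:
        parts.append("Missing: " + ", ".join(missing))
    if unexpected:
        parts.append("Unexpected: " + ", ".join(unexpected))
    parts.append("Schema: " + ", ".join(expected))
    return " ".join(parts)
```

For a schema `id, val` and a returned frame with columns `id, v`, this yields the first improved message quoted above; listing the full schema is what turns a bare `KeyError: 'val'` into something actionable.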
[jira] [Commented] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042
[ https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686368#comment-17686368 ]

Apache Spark commented on SPARK-42312:
--------------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39951

> Assign name to _LEGACY_ERROR_TEMP_0042
> --------------------------------------
>
>                 Key: SPARK-42312
>                 URL: https://issues.apache.org/jira/browse/SPARK-42312
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
[jira] [Assigned] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042
[ https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42312:
------------------------------------

    Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_0042
> --------------------------------------
>
>                 Key: SPARK-42312
>                 URL: https://issues.apache.org/jira/browse/SPARK-42312
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
[jira] [Assigned] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042
[ https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42312:
------------------------------------

    Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_0042
> --------------------------------------
>
>                 Key: SPARK-42312
>                 URL: https://issues.apache.org/jira/browse/SPARK-42312
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686280#comment-17686280 ]

Apache Spark commented on SPARK-42388:
--------------------------------------

User 'yabola' has created a pull request for this issue:
https://github.com/apache/spark/pull/39950

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-42388
>                 URL: https://issues.apache.org/jira/browse/SPARK-42388
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Mars
>            Priority: Major
[jira] [Assigned] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42388:
------------------------------------

    Assignee: (was: Apache Spark)

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-42388
>                 URL: https://issues.apache.org/jira/browse/SPARK-42388
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Mars
>            Priority: Major
[jira] [Assigned] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42388:
------------------------------------

    Assignee: Apache Spark

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-42388
>                 URL: https://issues.apache.org/jira/browse/SPARK-42388
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Mars
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
[ https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686222#comment-17686222 ]

Apache Spark commented on SPARK-42386:
--------------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39949

> Rewrite HiveGenericUDF with Invoke
> ----------------------------------
>
>                 Key: SPARK-42386
>                 URL: https://issues.apache.org/jira/browse/SPARK-42386
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: BingKun Pan
>            Priority: Minor
[jira] [Assigned] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
[ https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42386:
------------------------------------

    Assignee: (was: Apache Spark)

> Rewrite HiveGenericUDF with Invoke
> ----------------------------------
>
>                 Key: SPARK-42386
>                 URL: https://issues.apache.org/jira/browse/SPARK-42386
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: BingKun Pan
>            Priority: Minor
[jira] [Assigned] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
[ https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42386:
------------------------------------

    Assignee: Apache Spark

> Rewrite HiveGenericUDF with Invoke
> ----------------------------------
>
>                 Key: SPARK-42386
>                 URL: https://issues.apache.org/jira/browse/SPARK-42386
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: BingKun Pan
>            Assignee: Apache Spark
>            Priority: Minor
[jira] [Commented] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
[ https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686190#comment-17686190 ]

Apache Spark commented on SPARK-42385:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39948

> Upgrade RoaringBitmap to 0.9.39
> -------------------------------
>
>                 Key: SPARK-42385
>                 URL: https://issues.apache.org/jira/browse/SPARK-42385
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39]
>  * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db]
>    in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614]
[jira] [Assigned] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
[ https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42385:
------------------------------------

    Assignee: (was: Apache Spark)

> Upgrade RoaringBitmap to 0.9.39
> -------------------------------
>
>                 Key: SPARK-42385
>                 URL: https://issues.apache.org/jira/browse/SPARK-42385
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Priority: Minor
>
> [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39]
>  * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db]
>    in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614]
[jira] [Assigned] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
[ https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42385:
------------------------------------

    Assignee: Apache Spark

> Upgrade RoaringBitmap to 0.9.39
> -------------------------------
>
>                 Key: SPARK-42385
>                 URL: https://issues.apache.org/jira/browse/SPARK-42385
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Minor
>
> [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39]
>  * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db]
>    in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614]
[jira] [Commented] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686163#comment-17686163 ]

Apache Spark commented on SPARK-41715:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39947

> Catch specific exceptions for both Spark Connect and PySpark
> ------------------------------------------------------------
>
>                 Key: SPARK-41715
>                 URL: https://issues.apache.org/jira/browse/SPARK-41715
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> In python/pyspark/sql/tests/test_catalog.py, we should catch more specific
> exceptions such as AnalysisException. The test is shared between both Spark
> Connect and PySpark, so we should figure out a way to share it.
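The pattern the ticket asks for can be sketched without a running Spark cluster. This is an illustrative stand-alone example, not the actual test code: `AnalysisException` below is a stand-in class (in PySpark it would be imported from `pyspark.errors`, which both classic PySpark and Spark Connect can raise), and `get_table`/`lookup_or_none` are hypothetical helpers. The point is catching the specific exception type rather than a bare `Exception`, so unrelated bugs still propagate.

```python
class AnalysisException(Exception):
    """Stand-in for the specific exception both backends raise."""


def get_table(catalog, name):
    # Hypothetical operation that fails analysis for an unknown table.
    if name not in catalog:
        raise AnalysisException(f"Table or view not found: {name}")
    return catalog[name]


def lookup_or_none(catalog, name):
    # Catch only the specific exception; any other error (a real bug)
    # is not swallowed and still surfaces to the caller.
    try:
        return get_table(catalog, name)
    except AnalysisException:
        return None
```

A shared test would then assert `AnalysisException` directly (e.g. `assertRaises(AnalysisException)`) instead of the overly broad `Exception`.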
[jira] [Assigned] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41715:
------------------------------------

    Assignee: (was: Apache Spark)

> Catch specific exceptions for both Spark Connect and PySpark
> ------------------------------------------------------------
>
>                 Key: SPARK-41715
>                 URL: https://issues.apache.org/jira/browse/SPARK-41715
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> In python/pyspark/sql/tests/test_catalog.py, we should catch more specific
> exceptions such as AnalysisException. The test is shared between both Spark
> Connect and PySpark, so we should figure out a way to share it.
[jira] [Assigned] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41715:
------------------------------------

    Assignee: Apache Spark

> Catch specific exceptions for both Spark Connect and PySpark
> ------------------------------------------------------------
>
>                 Key: SPARK-41715
>                 URL: https://issues.apache.org/jira/browse/SPARK-41715
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Minor
>
> In python/pyspark/sql/tests/test_catalog.py, we should catch more specific
> exceptions such as AnalysisException. The test is shared between both Spark
> Connect and PySpark, so we should figure out a way to share it.
[jira] [Assigned] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40453:
------------------------------------

    Assignee: Apache Spark

> Improve error handling for GRPC server
> --------------------------------------
>
>                 Key: SPARK-40453
>                 URL: https://issues.apache.org/jira/browse/SPARK-40453
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.2.2
>            Reporter: Martin Grund
>            Assignee: Apache Spark
>            Priority: Major
>
> Right now, errors are handled in a very rudimentary way and do not produce
> proper GRPC errors. This issue addresses the work needed to return proper
> errors.
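The idea behind "proper GRPC errors" can be sketched in a few lines. This is an illustrative assumption-laden sketch, not Spark Connect's actual implementation: the exception classes are stand-ins, and the status names simply mirror the standard gRPC status codes (`grpc.StatusCode` in the Python gRPC library). The server maps each failure category to a meaningful status instead of letting everything surface as a generic error.

```python
# Stand-in server-side failure categories (hypothetical names).
class ParseError(Exception):
    pass


class AnalysisError(Exception):
    pass


# One central mapping from exception type to gRPC status-code name.
STATUS_FOR_EXCEPTION = {
    ParseError: "INVALID_ARGUMENT",       # client sent a malformed request
    AnalysisError: "INVALID_ARGUMENT",    # request is well-formed but invalid
    NotImplementedError: "UNIMPLEMENTED", # feature not supported yet
}


def grpc_status_for(exc):
    """Return the gRPC status name for an exception.

    Anything unrecognized is a server-side bug and maps to INTERNAL,
    rather than leaking as an unclassified failure.
    """
    for exc_type, status in STATUS_FOR_EXCEPTION.items():
        if isinstance(exc, exc_type):
            return status
    return "INTERNAL"
```

In a real gRPC handler this mapping would feed something like `context.abort(status, details)`, so clients can distinguish their own bad input from server faults.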
[jira] [Commented] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686161#comment-17686161 ]

Apache Spark commented on SPARK-40453:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39947

> Improve error handling for GRPC server
> --------------------------------------
>
>                 Key: SPARK-40453
>                 URL: https://issues.apache.org/jira/browse/SPARK-40453
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.2.2
>            Reporter: Martin Grund
>            Priority: Major
>
> Right now, errors are handled in a very rudimentary way and do not produce
> proper GRPC errors. This issue addresses the work needed to return proper
> errors.
[jira] [Assigned] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40453:
------------------------------------

    Assignee: (was: Apache Spark)

> Improve error handling for GRPC server
> --------------------------------------
>
>                 Key: SPARK-40453
>                 URL: https://issues.apache.org/jira/browse/SPARK-40453
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.2.2
>            Reporter: Martin Grund
>            Priority: Major
>
> Right now, errors are handled in a very rudimentary way and do not produce
> proper GRPC errors. This issue addresses the work needed to return proper
> errors.
[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42310:
------------------------------------

    Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_1289
> --------------------------------------
>
>                 Key: SPARK-42310
>                 URL: https://issues.apache.org/jira/browse/SPARK-42310
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42310:
------------------------------------

    Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_1289
> --------------------------------------
>
>                 Key: SPARK-42310
>                 URL: https://issues.apache.org/jira/browse/SPARK-42310
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686036#comment-17686036 ]

Apache Spark commented on SPARK-42310:
--------------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39946

> Assign name to _LEGACY_ERROR_TEMP_1289
> --------------------------------------
>
>                 Key: SPARK-42310
>                 URL: https://issues.apache.org/jira/browse/SPARK-42310
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Priority: Major
[jira] [Commented] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685995#comment-17685995 ] Apache Spark commented on SPARK-42384: -- User 'bersprockets' has created a pull request for this issue: https://github.com/apache/spark/pull/39945 > Mask function's generated code does not handle null input > - > > Key: SPARK-42384 > URL: https://issues.apache.org/jira/browse/SPARK-42384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Example: > {noformat} > create or replace temp view v1 as > select * from values > (null), > ('AbCD123-@$#') > as data(col1); > cache table v1; > select mask(col1) from v1; > {noformat} > This query results in a {{NullPointerException}}: > {noformat} > 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > {noformat} > The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of > whether {{Mask.transformInput}} returns null or not. The > {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null > pointer. > {noformat} > /* 031 */ boolean isNull_1 = i.isNullAt(0); > /* 032 */ UTF8String value_1 = isNull_1 ? 
> /* 033 */ null : (i.getUTF8String(0)); > /* 034 */ > /* 035 */ > /* 036 */ > /* 037 */ > /* 038 */ UTF8String value_0 = null; > /* 039 */ value_0 = > org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, > ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* > literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) > references[3] /* literal */));; > /* 040 */ if (false) { > /* 041 */ mutableStateArray_0[0].setNullAt(0); > /* 042 */ } else { > /* 043 */ mutableStateArray_0[0].write(0, value_0); > /* 044 */ } > /* 045 */ return (mutableStateArray_0[0].getRow()); > /* 046 */ } > {noformat} > The bug is not exercised by a literal null input value, since there appears > to be some optimization that simply replaces the entire function call with a > null literal: > {noformat} > spark-sql> explain SELECT mask(NULL); > == Physical Plan == > *(1) Project [null AS mask(NULL, X, x, n, NULL)#47] > +- *(1) Scan OneRowRelation[] > Time taken: 0.026 seconds, Fetched 1 row(s) > spark-sql> SELECT mask(NULL); > NULL > Time taken: 0.042 seconds, Fetched 1 row(s) > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
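The problem in the generated code above is that the null guard is the constant `false`, so `write(0, value_0)` runs even when `Mask.transformInput` returned null. The intended pattern can be illustrated outside of codegen; this is a minimal Python sketch, not Spark's actual generated code, and `mask_transform`/`write_row` are hypothetical stand-ins for `Mask.transformInput` and the `UnsafeWriter` call:

```python
# Sketch of the null-safe write pattern the generated code should follow.
# mask_transform stands in for Mask.transformInput, which returns None
# (null) when its input is null.

def mask_transform(value):
    # Mask letters and digits the way SQL mask() does with default arguments:
    # upper -> 'X', lower -> 'x', digit -> 'n', other chars unchanged.
    if value is None:
        return None
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)

def write_row(value, row):
    masked = mask_transform(value)
    # The buggy codegen effectively tests `if (false)` here and always
    # calls write(); the guard must test the actual result instead.
    if masked is None:
        row["col"] = None      # analogous to setNullAt(0)
    else:
        row["col"] = masked    # analogous to write(0, value_0)
    return row
```

With the guard on the real result, a null input row produces a null output cell instead of handing a null pointer to the writer.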
[jira] [Assigned] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42384: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42384: Assignee: Apache Spark
[jira] [Assigned] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
[ https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42383: Assignee: Apache Spark > Protobuf serializer for RocksDB.TypeAliases > --- > > Key: SPARK-42383 > URL: https://issues.apache.org/jira/browse/SPARK-42383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
[ https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42383: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
[ https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685950#comment-17685950 ] Apache Spark commented on SPARK-42383: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39944
[jira] [Commented] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685928#comment-17685928 ] Apache Spark commented on SPARK-40819: -- User 'awdavidson' has created a pull request for this issue: https://github.com/apache/spark/pull/39943 > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type > instead of automatically converting to LongType > > > Key: SPARK-40819 > URL: https://issues.apache.org/jira/browse/SPARK-40819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.3.2, 3.4.0 >Reporter: Alfred Davidson >Assignee: Alfred Davidson >Priority: Critical > Labels: regression > Fix For: 3.2.4, 3.3.2, 3.4.0 > > > Since 3.2 parquet files containing attributes with type "INT64 > (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read > throws: > > {code:java} > Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: > INT64 (TIMESTAMP(NANOS,true)) > at > org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at 
scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528) > at scala.collection.immutable.Stream.map(Stream.scala:418) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521) > at > org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76) > {code} > Prior to 3.2 successfully reads the parquet automatically converting to a > LongType. 
> I believe the work done as part of https://issues.apache.org/jira/browse/SPARK-34661 > introduced the change in behaviour; more specifically here: > [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154] > which throws QueryCompilationErrors.illegalParquetTypeError. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
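Independent of the Spark-side fix, the rescaling involved is simple: an INT64 `TIMESTAMP(NANOS,true)` column holds raw epoch nanoseconds, while Spark's `TimestampType` keeps microsecond precision. A small Python sketch of that conversion (a workaround-style illustration, not the patch in the PR above):

```python
# Rescale a raw epoch-nanosecond long (as stored in an INT64
# TIMESTAMP(NANOS,true) Parquet column) down to microseconds, the
# precision Spark's TimestampType actually keeps.

def nanos_to_micros(epoch_nanos: int) -> int:
    # Integer division truncates the sub-microsecond remainder,
    # so precision below 1 microsecond is deliberately dropped.
    return epoch_nanos // 1_000

# An example raw epoch value in nanoseconds.
nanos = 1_665_964_800_000_000_000
micros = nanos_to_micros(nanos)
```

Reading the column as a plain `LongType` (the pre-3.2 behaviour the reporter describes) and rescaling like this preserves the data at the cost of explicit handling by the caller.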
[jira] [Commented] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685758#comment-17685758 ] Apache Spark commented on SPARK-42267: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39940 > Support left_outer join > --- > > Key: SPARK-42267 > URL: https://issues.apache.org/jira/browse/SPARK-42267 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > ``` > >>> df = spark.range(1) > >>> df2 = spark.range(2) > >>> df.join(df2, how="left_outer") > Traceback (most recent call last): > File "", line 1, in > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", > line 438, in join > plan.Join(left=self._plan, right=other._plan, on=on, how=how), > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line > 730, in __init__ > raise NotImplementedError( > NotImplementedError: > Unsupported join type: left_outer. Supported join types > include: > "inner", "outer", "full", "fullouter", "full_outer", > "leftouter", "left", "left_outer", "rightouter", > "right", "right_outer", "leftsemi", "left_semi", > "semi", "leftanti", "left_anti", "anti", "cross", > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
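The traceback above shows the inconsistency: "left_outer" appears in the error message's list of supported types yet is rejected by the plan layer. The usual way to keep the two in sync is a single alias table that both the lookup and the message are derived from. A hedged Python sketch of that idea (a hypothetical table for illustration, not a quote of `pyspark/sql/connect/plan.py`):

```python
# Map user-facing join-type aliases to one canonical name, so every
# alias advertised in the error message is actually accepted.

JOIN_ALIASES = {
    "inner": "inner",
    "outer": "full_outer", "full": "full_outer",
    "fullouter": "full_outer", "full_outer": "full_outer",
    "leftouter": "left_outer", "left": "left_outer",
    "left_outer": "left_outer",
    "rightouter": "right_outer", "right": "right_outer",
    "right_outer": "right_outer",
    "leftsemi": "left_semi", "left_semi": "left_semi",
    "semi": "left_semi",
    "leftanti": "left_anti", "left_anti": "left_anti",
    "anti": "left_anti",
    "cross": "cross",
}

def normalize_join_type(how: str) -> str:
    try:
        return JOIN_ALIASES[how.lower()]
    except KeyError:
        # The supported list in the message is generated from the same
        # table, so it cannot drift out of sync with the lookup.
        supported = ", ".join(sorted(JOIN_ALIASES))
        raise NotImplementedError(
            f"Unsupported join type: {how}. Supported join types include: {supported}"
        )
```

With this shape, adding "left_outer" (or any alias) is a one-line table change that updates both acceptance and the error text.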
[jira] [Commented] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685760#comment-17685760 ] Apache Spark commented on SPARK-42267: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39940
[jira] [Commented] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685752#comment-17685752 ] Apache Spark commented on SPARK-42381: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39939 > `CreateDataFrame` should accept objects > --- > > Key: SPARK-42381 > URL: https://issues.apache.org/jira/browse/SPARK-42381 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42381: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685750#comment-17685750 ] Apache Spark commented on SPARK-42381: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39939
[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42381: Assignee: Apache Spark
[jira] [Assigned] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204
[ https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42309: Assignee: Apache Spark > Assign name to _LEGACY_ERROR_TEMP_1204 > -- > > Key: SPARK-42309 > URL: https://issues.apache.org/jira/browse/SPARK-42309 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204
[ https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42309: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204
[ https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685723#comment-17685723 ] Apache Spark commented on SPARK-42309: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39937
[jira] [Assigned] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42267: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685722#comment-17685722 ] Apache Spark commented on SPARK-42267: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39938
[jira] [Assigned] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42267: Assignee: Apache Spark
[jira] [Commented] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
[ https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685681#comment-17685681 ] Apache Spark commented on SPARK-42379: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/39936 > Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists > > > Key: SPARK-42379 > URL: https://issues.apache.org/jira/browse/SPARK-42379 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > Other methods in FileSystemBasedCheckpointFileManager already use > FileSystem.exists wherever they check for the existence of a path. Using > FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well > keeps it consistent with the rest of the class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
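The consistency argument in the description above can be sketched in miniature: route every existence check through the same file-system call instead of mixing mechanisms. This is a hypothetical Python analogue of the idea, not the Scala implementation in Spark:

```python
import os

class FileSystemBasedCheckpointFileManager:
    """Toy analogue of the Scala class: every path-existence check goes
    through one helper, mirroring the proposal that all methods use
    FileSystem.exists."""

    def _fs_exists(self, path: str) -> bool:
        # Single point of truth for existence checks, so exists() and
        # the other methods cannot diverge in behaviour.
        return os.path.exists(path)

    def exists(self, path: str) -> bool:
        return self._fs_exists(path)

    def create_atomic(self, path: str, data: bytes) -> None:
        # Another method reusing the same check before an atomic
        # write-then-rename, the pattern checkpoint managers rely on.
        if self._fs_exists(path):
            raise FileExistsError(path)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
```

Funneling checks through one helper is what makes the class's behaviour uniform across file-system backends, which is the point of the proposed change.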
[jira] [Assigned] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
[ https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42379: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
[ https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42379: Assignee: Apache Spark
[jira] [Commented] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
[ https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685680#comment-17685680 ] Apache Spark commented on SPARK-42379: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/39936
[jira] [Commented] (SPARK-42210) Standardize registered pickled Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685667#comment-17685667 ] Apache Spark commented on SPARK-42210: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/39860 > Standardize registered pickled Python UDFs > -- > > Key: SPARK-42210 > URL: https://issues.apache.org/jira/browse/SPARK-42210 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement spark.udf. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42210) Standardize registered pickled Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42210: Assignee: Apache Spark > Standardize registered pickled Python UDFs > -- > > Key: SPARK-42210 > URL: https://issues.apache.org/jira/browse/SPARK-42210 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement spark.udf.
[jira] [Assigned] (SPARK-42210) Standardize registered pickled Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42210: Assignee: (was: Apache Spark) > Standardize registered pickled Python UDFs > -- > > Key: SPARK-42210 > URL: https://issues.apache.org/jira/browse/SPARK-42210 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement spark.udf.
[jira] [Commented] (SPARK-42210) Standardize registered pickled Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685666#comment-17685666 ] Apache Spark commented on SPARK-42210: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/39860 > Standardize registered pickled Python UDFs > -- > > Key: SPARK-42210 > URL: https://issues.apache.org/jira/browse/SPARK-42210 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement spark.udf.
[jira] [Commented] (SPARK-42244) Refine error message by using Python types.
[ https://issues.apache.org/jira/browse/SPARK-42244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685655#comment-17685655 ] Apache Spark commented on SPARK-42244: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39935 > Refine error message by using Python types. > --- > > Key: SPARK-42244 > URL: https://issues.apache.org/jira/browse/SPARK-42244 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Currently, type names in error messages are inconsistent, mixing forms like `string` and `str`. > We should consolidate them under a single rule.
[jira] [Commented] (SPARK-42378) Make `DataFrame.select` support `a.*`
[ https://issues.apache.org/jira/browse/SPARK-42378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685633#comment-17685633 ] Apache Spark commented on SPARK-42378: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39934 > Make `DataFrame.select` support `a.*` > - > > Key: SPARK-42378 > URL: https://issues.apache.org/jira/browse/SPARK-42378 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Assigned] (SPARK-42378) Make `DataFrame.select` support `a.*`
[ https://issues.apache.org/jira/browse/SPARK-42378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42378: Assignee: Apache Spark > Make `DataFrame.select` support `a.*` > - > > Key: SPARK-42378 > URL: https://issues.apache.org/jira/browse/SPARK-42378 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-42378) Make `DataFrame.select` support `a.*`
[ https://issues.apache.org/jira/browse/SPARK-42378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42378: Assignee: (was: Apache Spark) > Make `DataFrame.select` support `a.*` > - > > Key: SPARK-42378 > URL: https://issues.apache.org/jira/browse/SPARK-42378 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-42377) Test Framework for Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685613#comment-17685613 ] Apache Spark commented on SPARK-42377: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/39933 > Test Framework for Connect Scala Client > --- > > Key: SPARK-42377 > URL: https://issues.apache.org/jira/browse/SPARK-42377 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major >
[jira] [Assigned] (SPARK-42377) Test Framework for Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42377: Assignee: (was: Apache Spark) > Test Framework for Connect Scala Client > --- > > Key: SPARK-42377 > URL: https://issues.apache.org/jira/browse/SPARK-42377 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major >
[jira] [Commented] (SPARK-42377) Test Framework for Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685612#comment-17685612 ] Apache Spark commented on SPARK-42377: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/39933 > Test Framework for Connect Scala Client > --- > > Key: SPARK-42377 > URL: https://issues.apache.org/jira/browse/SPARK-42377 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major >
[jira] [Assigned] (SPARK-42377) Test Framework for Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42377: Assignee: Apache Spark > Test Framework for Connect Scala Client > --- > > Key: SPARK-42377 > URL: https://issues.apache.org/jira/browse/SPARK-42377 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42376: Assignee: Apache Spark > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > With the introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > That JIRA ticket clearly described what was out of scope: "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified a production use case of a stream-stream time-interval join > followed by a stateful operator (e.g. window aggregation), and propose to > address that use case via this ticket. > The design will be described in the PR, but the sketched idea is to introduce a > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have the same input watermark and output > watermark, which introduced this limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark) pair. Operators introducing delayed > records will produce a delayed output watermark, and a downstream operator can > take the delay into account, as its input watermark will be adjusted.
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685350#comment-17685350 ] Apache Spark commented on SPARK-42376: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/39931 > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With the introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > That JIRA ticket clearly described what was out of scope: "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified a production use case of a stream-stream time-interval join > followed by a stateful operator (e.g. window aggregation), and propose to > address that use case via this ticket. > The design will be described in the PR, but the sketched idea is to introduce a > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have the same input watermark and output > watermark, which introduced this limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark) pair. Operators introducing delayed > records will produce a delayed output watermark, and a downstream operator can > take the delay into account, as its input watermark will be adjusted.
[jira] [Assigned] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42376: Assignee: (was: Apache Spark) > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With the introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > That JIRA ticket clearly described what was out of scope: "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified a production use case of a stream-stream time-interval join > followed by a stateful operator (e.g. window aggregation), and propose to > address that use case via this ticket. > The design will be described in the PR, but the sketched idea is to introduce a > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have the same input watermark and output > watermark, which introduced this limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark) pair. Operators introducing delayed > records will produce a delayed output watermark, and a downstream operator can > take the delay into account, as its input watermark will be adjusted.
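The sketched idea above can be modeled in a few lines of plain Python (a toy model for intuition, not Spark's implementation): each operator's input watermark is its upstream operator's output watermark, and an operator that emits delayed records lowers its output watermark by its maximum delay.

```python
def propagate(operators, source_watermark):
    """Toy watermark-propagation model.

    `operators` is a list of (name, max_delay) pairs in pipeline order;
    returns per-operator (name, input_watermark, output_watermark) triples.
    """
    wm = source_watermark
    marks = []
    for name, max_delay in operators:
        input_wm = wm                      # inherited from the upstream operator
        output_wm = input_wm - max_delay   # delayed records lower the output watermark
        marks.append((name, input_wm, output_wm))
        wm = output_wm
    return marks


# A time-interval join with a 10s tolerance followed by a window aggregation:
# the aggregation sees an adjusted (smaller) input watermark of 90, not 100.
pipeline = [("interval-join", 10), ("window-agg", 0)]
print(propagate(pipeline, 100))  # [('interval-join', 100, 90), ('window-agg', 90, 90)]
```

Under the old model, both operators would have been handed the same watermark of 100, and the aggregation could wrongly drop records delayed by the join.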
[jira] [Commented] (SPARK-37099) Introduce a rank-based filter to optimize top-k computation
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685328#comment-17685328 ] Apache Spark commented on SPARK-37099: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/39930 > Introduce a rank-based filter to optimize top-k computation > --- > > Key: SPARK-37099 > URL: https://issues.apache.org/jira/browse/SPARK-37099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > Attachments: q67.png, q67_optimized.png, skewed_window.png > > > At JD, we found that more than 90% of window function usage follows this > pattern: > {code:java} > select (... (row_number|rank|dense_rank) () over( [partition by ...] order > by ... ) as rn) > where rn (==|<|<=) k and other conditions{code} > > However, the existing physical plan is not optimal: > > 1. We should select the local top-k records within each partition, and then > compute the global top-k; this helps reduce the shuffle amount. > > For these three rank functions (row_number|rank|dense_rank), the rank of a > key computed on a partial dataset is always <= its final rank computed on > the whole dataset, so we can safely discard rows with partial rank > k > anywhere. > > 2. Skewed window: some partitions are skewed and take a long time to finish > computation. > > A real-world skewed-window case in our system is attached. >
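The safety argument in point 1 — a row's rank on any subset of the data is never larger than its final global rank — can be checked with a small pure-Python sketch (ascending sort order, so "top-k" here means rank <= k):

```python
def local_topk(rows, k):
    # Within one partition, a row's rank on this subset is <= its final
    # global rank, so any row with local rank > k can be discarded safely.
    return sorted(rows)[:k]


def global_topk(partitions, k):
    # Filter each partition locally first (shrinking the shuffle),
    # then merge the survivors and take the global top-k.
    survivors = [row for part in partitions for row in local_topk(part, k)]
    return sorted(survivors)[:k]


partitions = [[5, 1, 9, 3], [2, 8, 4], [7, 6]]
# Same answer as sorting everything, but only k rows per partition are shuffled.
assert global_topk(partitions, 3) == sorted(sum(partitions, []))[:3]
```

In the digest above, Spark applies the same idea as a rank-based filter pushed below the shuffle that feeds the window operator.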
[jira] [Assigned] (SPARK-42373) Remove unused blank line removal from CSVExprUtils
[ https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42373: Assignee: Apache Spark > Remove unused blank line removal from CSVExprUtils > -- > > Key: SPARK-42373 > URL: https://issues.apache.org/jira/browse/SPARK-42373 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Willi Raschkowski >Assignee: Apache Spark >Priority: Minor > > The non-multiline CSV read codepath contains references to removal of blank > lines throughout. This is unnecessary, as blank lines are removed by the > parser. Furthermore, it causes confusion, suggesting that blank lines are > removed at this point when in fact they are already omitted from the data. > The multiline codepath does not explicitly remove blank lines, leading to > what looks like a disparity in behavior between the two. > The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need > to explicitly skip lines, and this should be respected in {{CSVUtils}}.
[jira] [Assigned] (SPARK-42373) Remove unused blank line removal from CSVExprUtils
[ https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42373: Assignee: (was: Apache Spark) > Remove unused blank line removal from CSVExprUtils > -- > > Key: SPARK-42373 > URL: https://issues.apache.org/jira/browse/SPARK-42373 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Willi Raschkowski >Priority: Minor > > The non-multiline CSV read codepath contains references to removal of blank > lines throughout. This is unnecessary, as blank lines are removed by the > parser. Furthermore, it causes confusion, suggesting that blank lines are > removed at this point when in fact they are already omitted from the data. > The multiline codepath does not explicitly remove blank lines, leading to > what looks like a disparity in behavior between the two. > The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need > to explicitly skip lines, and this should be respected in {{CSVUtils}}.
[jira] [Commented] (SPARK-42373) Remove unused blank line removal from CSVExprUtils
[ https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685268#comment-17685268 ] Apache Spark commented on SPARK-42373: -- User 'ted-jenks' has created a pull request for this issue: https://github.com/apache/spark/pull/39927 > Remove unused blank line removal from CSVExprUtils > -- > > Key: SPARK-42373 > URL: https://issues.apache.org/jira/browse/SPARK-42373 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Willi Raschkowski >Priority: Minor > > The non-multiline CSV read codepath contains references to removal of blank > lines throughout. This is unnecessary, as blank lines are removed by the > parser. Furthermore, it causes confusion, suggesting that blank lines are > removed at this point when in fact they are already omitted from the data. > The multiline codepath does not explicitly remove blank lines, leading to > what looks like a disparity in behavior between the two. > The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need > to explicitly skip lines, and this should be respected in {{CSVUtils}}.
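As a point of contrast (Python's `csv` module here, not the Univocity parser Spark uses for CSV): a parser that does not drop blank lines itself forces the caller to filter them explicitly, which is exactly the situation the `Dataset[String]` codepath is in and why that skip must stay.

```python
import csv
import io

raw = "a,b\n1,2\n\n3,4\n"

# Python's csv module does NOT drop blank lines (it yields [] for each one),
# so the skip has to happen in the calling code -- analogous to the
# Dataset[String] codepath above, which must skip lines itself rather than
# rely on the parser.
rows = [row for row in csv.reader(io.StringIO(raw)) if row]
print(rows)  # [['a', 'b'], ['1', '2'], ['3', '4']]
```

When the parser already removes blank lines (as Univocity does in the non-multiline file path), the same filter is dead code, which is what the ticket removes.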
[jira] [Assigned] (SPARK-42372) Improve performance of HiveGenericUDTF by making inputProjection instantiate once
[ https://issues.apache.org/jira/browse/SPARK-42372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42372: Assignee: (was: Apache Spark) > Improve performance of HiveGenericUDTF by making inputProjection instantiate > once > - > > Key: SPARK-42372 > URL: https://issues.apache.org/jira/browse/SPARK-42372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > +++ b/sql/hive/benchmarks/HiveUDFBenchmark-per-row-results.txt > @@ -0,0 +1,7 @@ > +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1 > +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > +Hive UDTF benchmark: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > + > +Hive UDTF dup 2 1574 1680 > 118 0.7 1501.1 1.0X > +Hive UDTF dup 4 2642 3076 > 588 0.4 2519.9 0.6X > + > diff --git a/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > new file mode 100644 > index 00..8af8b6582c > --- /dev/null > +++ b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > @@ -0,0 +1,7 @@ > +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1 > +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > +Hive UDTF benchmark: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > + > +Hive UDTF dup 2 712 789 > 101 1.5 678.7 1.0X > +Hive UDTF dup 4 1212 1294 > 78 0.9 1156.0 0.6X > + {code} > an over 2x performance gain in benchmarking
[jira] [Commented] (SPARK-42372) Improve performance of HiveGenericUDTF by making inputProjection instantiate once
[ https://issues.apache.org/jira/browse/SPARK-42372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685237#comment-17685237 ] Apache Spark commented on SPARK-42372: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/39929 > Improve performance of HiveGenericUDTF by making inputProjection instantiate > once > - > > Key: SPARK-42372 > URL: https://issues.apache.org/jira/browse/SPARK-42372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > +++ b/sql/hive/benchmarks/HiveUDFBenchmark-per-row-results.txt > @@ -0,0 +1,7 @@ > +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1 > +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > +Hive UDTF benchmark: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > + > +Hive UDTF dup 2 1574 1680 > 118 0.7 1501.1 1.0X > +Hive UDTF dup 4 2642 3076 > 588 0.4 2519.9 0.6X > + > diff --git a/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > new file mode 100644 > index 00..8af8b6582c > --- /dev/null > +++ b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > @@ -0,0 +1,7 @@ > +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1 > +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > +Hive UDTF benchmark: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > + > +Hive UDTF dup 2 712 789 > 101 1.5 678.7 1.0X > +Hive UDTF dup 4 1212 1294 > 78 0.9 1156.0 0.6X > + {code} > an over 2x performance gain in benchmarking
[jira] [Assigned] (SPARK-42372) Improve performance of HiveGenericUDTF by making inputProjection instantiate once
[ https://issues.apache.org/jira/browse/SPARK-42372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42372: Assignee: Apache Spark > Improve performance of HiveGenericUDTF by making inputProjection instantiate > once > - > > Key: SPARK-42372 > URL: https://issues.apache.org/jira/browse/SPARK-42372 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > {code:java} > +++ b/sql/hive/benchmarks/HiveUDFBenchmark-per-row-results.txt > @@ -0,0 +1,7 @@ > +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1 > +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > +Hive UDTF benchmark: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > + > +Hive UDTF dup 2 1574 1680 > 118 0.7 1501.1 1.0X > +Hive UDTF dup 4 2642 3076 > 588 0.4 2519.9 0.6X > + > diff --git a/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > new file mode 100644 > index 00..8af8b6582c > --- /dev/null > +++ b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt > @@ -0,0 +1,7 @@ > +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1 > +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > +Hive UDTF benchmark: Best Time(ms) Avg Time(ms) > Stdev(ms) Rate(M/s) Per Row(ns) Relative > + > +Hive UDTF dup 2 712 789 > 101 1.5 678.7 1.0X > +Hive UDTF dup 4 1212 1294 > 78 0.9 1156.0 0.6X > + {code} > an over 2x performance gain in benchmarking
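The optimization behind the benchmark numbers above is the classic "hoist allocation out of the per-row loop" pattern. A minimal Python sketch (class and function names invented for illustration, not Spark's actual ones) shows the before/after shape:

```python
class InputProjection:
    """Stand-in for the projection that maps an input row to UDTF arguments."""
    def __init__(self, indices):
        self.indices = indices

    def __call__(self, row):
        return tuple(row[i] for i in self.indices)


def eval_per_row(rows, indices):
    # Before: a fresh projection object is constructed for every single row.
    return [InputProjection(indices)(row) for row in rows]


def eval_hoisted(rows, indices):
    # After: the projection is instantiated once and reused, saving one
    # allocation (plus any setup cost) per row.
    projection = InputProjection(indices)
    return [projection(row) for row in rows]


rows = [(i, i * 2, i * 3) for i in range(4)]
# Identical results; only the per-row work changes.
assert eval_per_row(rows, (0, 2)) == eval_hoisted(rows, (0, 2))
```

When the per-row body is as cheap as the projection itself, removing the per-row construction can plausibly account for gains of the magnitude the benchmark reports.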
[jira] [Commented] (SPARK-42371) Add scripts to start and stop Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685208#comment-17685208 ] Apache Spark commented on SPARK-42371: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39928 > Add scripts to start and stop Spark Connect server > -- > > Key: SPARK-42371 > URL: https://issues.apache.org/jira/browse/SPARK-42371 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, there is no proper way to start and stop the Spark Connect server. > Now it requires you to start it with, for example, a Spark shell: > {code} > # For development, > ./bin/spark-shell \ >--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > {code} > # For released Spark versions > ./bin/spark-shell \ > --packages org.apache.spark:spark-connect_2.12:3.4.0 \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > which is awkward. > We need some dedicated scripts for it.
[jira] [Commented] (SPARK-42371) Add scripts to start and stop Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685209#comment-17685209 ] Apache Spark commented on SPARK-42371: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39928 > Add scripts to start and stop Spark Connect server > -- > > Key: SPARK-42371 > URL: https://issues.apache.org/jira/browse/SPARK-42371 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, there is no proper way to start and stop the Spark Connect server. > Now it requires you to start it with, for example, a Spark shell: > {code} > # For development, > ./bin/spark-shell \ >--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > {code} > # For released Spark versions > ./bin/spark-shell \ > --packages org.apache.spark:spark-connect_2.12:3.4.0 \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > which is awkward. > We need some dedicated scripts for it.
[jira] [Assigned] (SPARK-42371) Add scripts to start and stop Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42371: Assignee: (was: Apache Spark) > Add scripts to start and stop Spark Connect server > -- > > Key: SPARK-42371 > URL: https://issues.apache.org/jira/browse/SPARK-42371 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, there is no proper way to start and stop the Spark Connect server. > Now it requires you to start it with, for example, a Spark shell: > {code} > # For development, > ./bin/spark-shell \ >--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > {code} > # For released Spark versions > ./bin/spark-shell \ > --packages org.apache.spark:spark-connect_2.12:3.4.0 \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > which is awkward. > We need some dedicated scripts for it.
[jira] [Assigned] (SPARK-42371) Add scripts to start and stop Spark Connect server
[ https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42371: Assignee: Apache Spark > Add scripts to start and stop Spark Connect server > -- > > Key: SPARK-42371 > URL: https://issues.apache.org/jira/browse/SPARK-42371 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Currently, there is no proper way to start and stop the Spark Connect server. > Now it requires you to start it with, for example, a Spark shell: > {code} > # For development, > ./bin/spark-shell \ >--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > {code} > # For released Spark versions > ./bin/spark-shell \ > --packages org.apache.spark:spark-connect_2.12:3.4.0 \ > --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin > {code} > which is awkward. > We need some dedicated scripts for it.
[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names
[ https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685134#comment-17685134 ]

Apache Spark commented on SPARK-41823:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39925

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
>
> {code:python}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 423, in _show_string
>         ).toPandas()
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
>     Plan:
> {code}
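The failure quoted above is a name-resolution problem: after `df.join(df2, ...)`, both inputs contribute a column called `name`, so the bare string in `drop('name')` matches two columns. A minimal pure-Python sketch of that resolution rule (hypothetical helper names, not Spark's actual analyzer code):

```python
# Sketch of why a bare column name becomes ambiguous after a join: the
# join's output schema contains one `name` column from each side, and
# string-based resolution cannot pick between them. Hypothetical resolver,
# not Spark's implementation.

class AmbiguousReferenceError(Exception):
    pass

def resolve(name, schema):
    """Resolve `name` against a join's flat output schema.

    `schema` is a list of (qualifier, column) pairs: the concatenation
    of both join sides' columns.
    """
    matches = [(q, c) for q, c in schema if c == name]
    if not matches:
        raise KeyError(f"Column `{name}` not found")
    if len(matches) > 1:
        raise AmbiguousReferenceError(
            f"[AMBIGUOUS_REFERENCE] Reference `{name}` is ambiguous, "
            f"could be: {[c for _, c in matches]}."
        )
    return matches[0]

# Output schema of df.join(df2, ...): both sides carry a `name` column.
joined_schema = [("df", "name"), ("df", "age"),
                 ("df2", "name"), ("df2", "height")]

try:
    resolve("name", joined_schema)      # ambiguous: two matches
except AmbiguousReferenceError as e:
    print(e)

print(resolve("age", joined_schema))    # unique: resolves fine
```

Qualifying the reference (e.g. `df.name`) or renaming one side before the join removes the ambiguity, which is the usual workaround.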
[jira] [Assigned] (SPARK-42369) Fix constructor for java.nio.DirectByteBuffer for Java 21+
[ https://issues.apache.org/jira/browse/SPARK-42369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42369:

    Assignee: Apache Spark

> Fix constructor for java.nio.DirectByteBuffer for Java 21+
> --
>
> Key: SPARK-42369
> URL: https://issues.apache.org/jira/browse/SPARK-42369
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 3.5.0
> Reporter: Ludovic Henry
> Assignee: Apache Spark
> Priority: Major
>
> In the latest JDK, the constructor {{DirectByteBuffer(long, int)}} was replaced with {{DirectByteBuffer(long, long)}}. We just want to support both by probing for the legacy one first and falling back to the newer one second.
> This change is completely transparent for the end-user, and makes sure Spark works transparently on the latest JDK as well.
[jira] [Commented] (SPARK-42369) Fix constructor for java.nio.DirectByteBuffer for Java 21+
[ https://issues.apache.org/jira/browse/SPARK-42369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685132#comment-17685132 ]

Apache Spark commented on SPARK-42369:
--

User 'luhenry' has created a pull request for this issue:
https://github.com/apache/spark/pull/39909

> Fix constructor for java.nio.DirectByteBuffer for Java 21+
> --
>
> Key: SPARK-42369
> URL: https://issues.apache.org/jira/browse/SPARK-42369
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 3.5.0
> Reporter: Ludovic Henry
> Priority: Major
>
> In the latest JDK, the constructor {{DirectByteBuffer(long, int)}} was replaced with {{DirectByteBuffer(long, long)}}. We just want to support both by probing for the legacy one first and falling back to the newer one second.
> This change is completely transparent for the end-user, and makes sure Spark works transparently on the latest JDK as well.
[jira] [Assigned] (SPARK-42369) Fix constructor for java.nio.DirectByteBuffer for Java 21+
[ https://issues.apache.org/jira/browse/SPARK-42369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42369:

    Assignee: (was: Apache Spark)

> Fix constructor for java.nio.DirectByteBuffer for Java 21+
> --
>
> Key: SPARK-42369
> URL: https://issues.apache.org/jira/browse/SPARK-42369
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 3.5.0
> Reporter: Ludovic Henry
> Priority: Major
>
> In the latest JDK, the constructor {{DirectByteBuffer(long, int)}} was replaced with {{DirectByteBuffer(long, long)}}. We just want to support both by probing for the legacy one first and falling back to the newer one second.
> This change is completely transparent for the end-user, and makes sure Spark works transparently on the latest JDK as well.
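The SPARK-42369 description above says the fix probes for the legacy DirectByteBuffer(long, int) constructor first and falls back to the newer (long, long) one. The real fix uses Java reflection; this is a pure-Python sketch of the probe-then-fall-back pattern, with two hypothetical stand-in classes for the old and new JDK shapes:

```python
# Probe-then-fall-back lookup, sketched in Python. In Java this would be
# Class.getDeclaredConstructor(...) catching NoSuchMethodException; here a
# missing entry point is simply an attribute that getattr cannot find.
# LegacyJdk and Jdk21 are made-up stand-ins, not real APIs.

class LegacyJdk:
    """Models a JDK <= 20 that only exposes the (long, int) shape."""
    @staticmethod
    def ctor_long_int(addr, cap):
        return ("legacy", addr, cap)

class Jdk21:
    """Models JDK 21+, where only the (long, long) shape exists."""
    @staticmethod
    def ctor_long_long(addr, cap):
        return ("jdk21", addr, cap)

def lookup_ctor(jdk):
    # Probe the legacy entry point first, then fall back to the newer
    # one, so a single code path works on both old and new runtimes.
    for name in ("ctor_long_int", "ctor_long_long"):
        ctor = getattr(jdk, name, None)
        if ctor is not None:
            return ctor
    raise TypeError("no DirectByteBuffer-style constructor found")

print(lookup_ctor(LegacyJdk)(0x1000, 64)[0])  # -> legacy
print(lookup_ctor(Jdk21)(0x1000, 64)[0])      # -> jdk21
```

Probing the legacy signature first keeps behavior identical on existing JDKs and only exercises the new path where the old one is gone, which is why the change is transparent to end users.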
[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column
[ https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685131#comment-17685131 ]

Apache Spark commented on SPARK-41812:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39925

> DataFrame.join: ambiguous column
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
>     df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in
>         df1.join(df2, df1["value"] == df2["value"]).count()
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in count
>         pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, `value`].
> {code}
[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685103#comment-17685103 ]

Apache Spark commented on SPARK-41708:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39922

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.4.0
>
> Make WriteFiles hold v1 write information
[jira] [Commented] (SPARK-39851) Improve join stats estimation if one side can keep uniqueness
[ https://issues.apache.org/jira/browse/SPARK-39851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685104#comment-17685104 ]

Apache Spark commented on SPARK-39851:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/39923

> Improve join stats estimation if one side can keep uniqueness
> -
>
> Key: SPARK-39851
> URL: https://issues.apache.org/jira/browse/SPARK-39851
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Yuming Wang
> Priority: Major
>
> {code:sql}
> SELECT i_item_sk ss_item_sk
> FROM item,
>      (SELECT DISTINCT iss.i_brand_id brand_id,
>                       iss.i_class_id class_id,
>                       iss.i_category_id category_id
>       FROM item iss) x
> WHERE i_brand_id = brand_id
>   AND i_class_id = class_id
>   AND i_category_id = category_id
> {code}
> Current:
> {noformat}
> == Optimized Logical Plan ==
> Project [i_item_sk#4 AS ss_item_sk#54], Statistics(sizeInBytes=370.8 MiB, rowCount=3.24E+7)
> +- Join Inner, (((i_brand_id#11 = brand_id#51) AND (i_class_id#13 = class_id#52)) AND (i_category_id#15 = category_id#53)), Statistics(sizeInBytes=1112.3 MiB, rowCount=3.24E+7)
>    :- Project [i_item_sk#4, i_brand_id#11, i_class_id#13, i_category_id#15], Statistics(sizeInBytes=4.6 MiB, rowCount=2.02E+5)
>    :  +- Filter ((isnotnull(i_brand_id#11) AND isnotnull(i_class_id#13)) AND isnotnull(i_category_id#15)), Statistics(sizeInBytes=84.6 MiB, rowCount=2.02E+5)
>    :     +- Relation spark_catalog.default.item[i_item_sk#4,i_item_id#5,i_rec_start_date#6,i_rec_end_date#7,i_item_desc#8,i_current_price#9,i_wholesale_cost#10,i_brand_id#11,i_brand#12,i_class_id#13,i_class#14,i_category_id#15,i_category#16,i_manufact_id#17,i_manufact#18,i_size#19,i_formulation#20,i_color#21,i_units#22,i_container#23,i_manager_id#24,i_product_name#25] parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5)
>    +- Aggregate [brand_id#51, class_id#52, category_id#53], [brand_id#51, class_id#52, category_id#53], Statistics(sizeInBytes=2.6 MiB, rowCount=1.37E+5)
>       +- Project [i_brand_id#62 AS brand_id#51, i_class_id#64 AS class_id#52, i_category_id#66 AS category_id#53], Statistics(sizeInBytes=3.9 MiB, rowCount=2.02E+5)
>          +- Filter ((isnotnull(i_brand_id#62) AND isnotnull(i_class_id#64)) AND isnotnull(i_category_id#66)), Statistics(sizeInBytes=84.6 MiB, rowCount=2.02E+5)
>             +- Relation spark_catalog.default.item[i_item_sk#55,i_item_id#56,i_rec_start_date#57,i_rec_end_date#58,i_item_desc#59,i_current_price#60,i_wholesale_cost#61,i_brand_id#62,i_brand#63,i_class_id#64,i_class#65,i_category_id#66,i_category#67,i_manufact_id#68,i_manufact#69,i_size#70,i_formulation#71,i_color#72,i_units#73,i_container#74,i_manager_id#75,i_product_name#76] parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5)
> {noformat}
> Expected:
> {noformat}
> == Optimized Logical Plan ==
> Project [i_item_sk#4 AS ss_item_sk#54], Statistics(sizeInBytes=2.3 MiB, rowCount=2.02E+5)
> +- Join Inner, (((i_brand_id#11 = brand_id#51) AND (i_class_id#13 = class_id#52)) AND (i_category_id#15 = category_id#53)), Statistics(sizeInBytes=7.0 MiB, rowCount=2.02E+5)
>    :- Project [i_item_sk#4, i_brand_id#11, i_class_id#13, i_category_id#15], Statistics(sizeInBytes=4.6 MiB, rowCount=2.02E+5)
>    :  +- Filter ((isnotnull(i_brand_id#11) AND isnotnull(i_class_id#13)) AND isnotnull(i_category_id#15)), Statistics(sizeInBytes=84.6 MiB, rowCount=2.02E+5)
>    :     +- Relation spark_catalog.default.item[i_item_sk#4,i_item_id#5,i_rec_start_date#6,i_rec_end_date#7,i_item_desc#8,i_current_price#9,i_wholesale_cost#10,i_brand_id#11,i_brand#12,i_class_id#13,i_class#14,i_category_id#15,i_category#16,i_manufact_id#17,i_manufact#18,i_size#19,i_formulation#20,i_color#21,i_units#22,i_container#23,i_manager_id#24,i_product_name#25] parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5)
>    +- Aggregate [brand_id#51, class_id#52, category_id#53], [brand_id#51, class_id#52, category_id#53], Statistics(sizeInBytes=2.6 MiB, rowCount=1.37E+5)
>       +- Project [i_brand_id#62 AS brand_id#51, i_class_id#64 AS class_id#52, i_category_id#66 AS category_id#53], Statistics(sizeInBytes=3.9 MiB, rowCount=2.02E+5)
>          +- Filter ((isnotnull(i_brand_id#62) AND isnotnull(i_class_id#64)) AND isnotnull(i_category_id#66)), Statistics(sizeInBytes=84.6 MiB, rowCount=2.02E+5)
>             +- Relation
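The improvement SPARK-39851 asks for can be sketched as a cardinality cap. A common equi-join estimate divides the cross product by max(ndv_left, ndv_right) per join key; when one side is DISTINCT on all the join keys (it "keeps uniqueness"), each row of the other side matches at most one row, so the output can additionally be capped at the other side's row count. The helper and the NDV values below are hypothetical illustrations, not Spark's CBO code:

```python
# Hypothetical join-cardinality estimator showing the uniqueness cap.
# Row counts mirror the plans above (~2.02e5 item rows, ~1.37e5 distinct
# key combinations); the per-key NDV pairs are made up for illustration.

def estimate_join_rows(left_rows, right_rows, key_ndvs, right_unique=False):
    """key_ndvs: per join key, a (ndv_left, ndv_right) pair."""
    est = left_rows * right_rows
    for ndv_l, ndv_r in key_ndvs:
        est /= max(ndv_l, ndv_r)
    if right_unique:
        # A side that is DISTINCT on the join keys cannot multiply the
        # other side's rows, so cap the estimate at left_rows.
        est = min(est, left_rows)
    return est

ndvs = [(100, 100), (10, 10), (1, 1)]          # illustrative only
naive = estimate_join_rows(2.02e5, 1.37e5, ndvs)
capped = estimate_join_rows(2.02e5, 1.37e5, ndvs, right_unique=True)
print(naive)    # inflated, well above the left side's 2.02e5 rows
print(capped)   # bounded by the left side's row count
```

The cap is what the "Expected" plan reflects: the join's row count drops from the inflated 3.24E+7 to 2.02E+5, the row count of the non-distinct side.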
[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685102#comment-17685102 ]

Apache Spark commented on SPARK-41708:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39924

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.4.0
>
> Make WriteFiles hold v1 write information
[jira] [Assigned] (SPARK-42368) Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42368:

    Assignee: Apache Spark

> Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
>
> Key: SPARK-42368
> URL: https://issues.apache.org/jira/browse/SPARK-42368
> Project: Spark
> Issue Type: Test
> Components: Project Infra, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Minor
[jira] [Commented] (SPARK-42368) Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685101#comment-17685101 ]

Apache Spark commented on SPARK-42368:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39921

> Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
>
> Key: SPARK-42368
> URL: https://issues.apache.org/jira/browse/SPARK-42368
> Project: Spark
> Issue Type: Test
> Components: Project Infra, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Assigned] (SPARK-42368) Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-42368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42368:

    Assignee: (was: Apache Spark)

> Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
>
> Key: SPARK-42368
> URL: https://issues.apache.org/jira/browse/SPARK-42368
> Project: Spark
> Issue Type: Test
> Components: Project Infra, Tests
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Priority: Minor
[jira] [Assigned] (SPARK-41716) Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
[ https://issues.apache.org/jira/browse/SPARK-41716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41716:

    Assignee: (was: Apache Spark)

> Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
> --
>
> Key: SPARK-41716
> URL: https://issues.apache.org/jira/browse/SPARK-41716
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> _catalog_to_pandas is more about client.py. We should probably factor this out to the client.
[jira] [Assigned] (SPARK-41716) Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
[ https://issues.apache.org/jira/browse/SPARK-41716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41716:

    Assignee: Apache Spark

> Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
> --
>
> Key: SPARK-41716
> URL: https://issues.apache.org/jira/browse/SPARK-41716
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
> _catalog_to_pandas is more about client.py. We should probably factor this out to the client.
[jira] [Commented] (SPARK-41716) Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
[ https://issues.apache.org/jira/browse/SPARK-41716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685048#comment-17685048 ]

Apache Spark commented on SPARK-41716:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39920

> Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
> --
>
> Key: SPARK-41716
> URL: https://issues.apache.org/jira/browse/SPARK-41716
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> _catalog_to_pandas is more about client.py. We should probably factor this out to the client.
[jira] [Commented] (SPARK-41612) Support Catalog.isCached
[ https://issues.apache.org/jira/browse/SPARK-41612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685032#comment-17685032 ]

Apache Spark commented on SPARK-41612:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39919

> Support Catalog.isCached
>
> Key: SPARK-41612
> URL: https://issues.apache.org/jira/browse/SPARK-41612
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
[jira] [Assigned] (SPARK-41623) Support Catalog.uncacheTable
[ https://issues.apache.org/jira/browse/SPARK-41623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41623:

    Assignee: (was: Apache Spark)

> Support Catalog.uncacheTable
>
> Key: SPARK-41623
> URL: https://issues.apache.org/jira/browse/SPARK-41623
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
[jira] [Assigned] (SPARK-41612) Support Catalog.isCached
[ https://issues.apache.org/jira/browse/SPARK-41612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41612:

    Assignee: (was: Apache Spark)

> Support Catalog.isCached
>
> Key: SPARK-41612
> URL: https://issues.apache.org/jira/browse/SPARK-41612
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
[jira] [Commented] (SPARK-41623) Support Catalog.uncacheTable
[ https://issues.apache.org/jira/browse/SPARK-41623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685028#comment-17685028 ]

Apache Spark commented on SPARK-41623:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39919

> Support Catalog.uncacheTable
>
> Key: SPARK-41623
> URL: https://issues.apache.org/jira/browse/SPARK-41623
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major