[jira] [Assigned] (SPARK-42289) DS V2 pushdown could let JDBC dialect decide to push down offset and limit

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42289:


Assignee: Apache Spark

> DS V2 pushdown could let JDBC dialect decide to push down offset and limit
> --
>
> Key: SPARK-42289
> URL: https://issues.apache.org/jira/browse/SPARK-42289
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152

2023-02-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686374#comment-17686374
 ] 

Apache Spark commented on SPARK-42313:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39953

> Assign name to _LEGACY_ERROR_TEMP_1152
> --
>
> Key: SPARK-42313
> URL: https://issues.apache.org/jira/browse/SPARK-42313
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42313:


Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_1152
> --
>
> Key: SPARK-42313
> URL: https://issues.apache.org/jira/browse/SPARK-42313
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42313) Assign name to _LEGACY_ERROR_TEMP_1152

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42313:


Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_1152
> --
>
> Key: SPARK-42313
> URL: https://issues.apache.org/jira/browse/SPARK-42313
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch

2023-02-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686372#comment-17686372
 ] 

Apache Spark commented on SPARK-40770:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39952

> Improved error messages for applyInPandas for schema mismatch
> -
>
> Key: SPARK-40770
> URL: https://issues.apache.org/jira/browse/SPARK-40770
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Enrico Minack
>Assignee: Enrico Minack
>Priority: Minor
> Fix For: 3.5.0
>
>
> Error messages raised by `applyInPandas` are very generic or useless when 
> used with complex schemata:
> {code}
> KeyError: 'val'
> {code}
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't 
> match specified schema. Expected: 2 Actual: 3
> {code}
> {code}
> java.lang.IllegalArgumentException: not all nodes and buffers were consumed. 
> nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304], 
> address:139860828549160, length:0, ArrowBuf[305], address:139860828549160, 
> length:24]
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to 
> convert to double
> {code}
> These should be improved by adding column names or descriptive messages (in 
> the same order as above):
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match 
> specified schema.  Missing: val  Unexpected: v  Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match 
> specified schema.  Missing: val  Unexpected: foo, v  Schema: id, val
> {code}
> {code}
> RuntimeError: Column names of the returned pandas.DataFrame do not match 
> specified schema.  Unexpected: v  Schema: id, id
> {code}
> {code}
> pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
> The above exception was the direct cause of the following exception:
> TypeError: Exception thrown when converting pandas.Series (int64) with name 
> 'val' to Arrow Array (string).
> {code}
> {code}
> pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to 
> convert to double
> The above exception was the direct cause of the following exception:
> ValueError: Exception thrown when converting pandas.Series (object) with name 
> 'val' to Arrow Array (double).
> {code}
> When no column names are given, the following error is returned:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't 
> match specified schema. Expected: 2 Actual: 3
> {code}
> It should instead contain the output schema:
> {code}
> RuntimeError: Number of columns of the returned pandas.DataFrame doesn't 
> match specified schema.  Expected: 2  Actual: 3  Schema: id, val
> {code}
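The proposed messages above amount to diffing the returned column names against the expected output schema. A minimal sketch of that diff in plain Python (an illustration of the proposed wording only, not Spark's actual implementation):

```python
def schema_mismatch_message(returned_cols, schema_cols):
    """Build an error message comparing the columns a pandas UDF
    returned against the expected output schema. Sketch only;
    not Spark's actual implementation."""
    missing = [c for c in schema_cols if c not in returned_cols]
    unexpected = [c for c in returned_cols if c not in schema_cols]
    parts = ["Column names of the returned pandas.DataFrame "
             "do not match specified schema."]
    if missing:
        parts.append("Missing: " + ", ".join(missing))
    if unexpected:
        parts.append("Unexpected: " + ", ".join(unexpected))
    parts.append("Schema: " + ", ".join(schema_cols))
    # Two spaces between segments, matching the wording proposed above.
    return "  ".join(parts)
```

Raising `RuntimeError(schema_mismatch_message(["id", "v"], ["id", "val"]))` reproduces the first proposed message above.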









[jira] [Commented] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042

2023-02-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686368#comment-17686368
 ] 

Apache Spark commented on SPARK-42312:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39951

> Assign name to _LEGACY_ERROR_TEMP_0042
> --
>
> Key: SPARK-42312
> URL: https://issues.apache.org/jira/browse/SPARK-42312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42312:


Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_0042
> --
>
> Key: SPARK-42312
> URL: https://issues.apache.org/jira/browse/SPARK-42312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-42312) Assign name to _LEGACY_ERROR_TEMP_0042

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42312:


Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_0042
> --
>
> Key: SPARK-42312
> URL: https://issues.apache.org/jira/browse/SPARK-42312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>










[jira] [Commented] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader

2023-02-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686280#comment-17686280
 ] 

Apache Spark commented on SPARK-42388:
--

User 'yabola' has created a pull request for this issue:
https://github.com/apache/spark/pull/39950

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---
>
> Key: SPARK-42388
> URL: https://issues.apache.org/jira/browse/SPARK-42388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Mars
>Priority: Major
>







[jira] [Assigned] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42388:


Assignee: (was: Apache Spark)

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---
>
> Key: SPARK-42388
> URL: https://issues.apache.org/jira/browse/SPARK-42388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Mars
>Priority: Major
>







[jira] [Assigned] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader

2023-02-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42388:


Assignee: Apache Spark

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---
>
> Key: SPARK-42388
> URL: https://issues.apache.org/jira/browse/SPARK-42388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Mars
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42386) Rewrite HiveGenericUDF with Invoke

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686222#comment-17686222
 ] 

Apache Spark commented on SPARK-42386:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39949

> Rewrite HiveGenericUDF with Invoke
> --
>
> Key: SPARK-42386
> URL: https://issues.apache.org/jira/browse/SPARK-42386
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-42386) Rewrite HiveGenericUDF with Invoke

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42386:


Assignee: (was: Apache Spark)

> Rewrite HiveGenericUDF with Invoke
> --
>
> Key: SPARK-42386
> URL: https://issues.apache.org/jira/browse/SPARK-42386
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-42386) Rewrite HiveGenericUDF with Invoke

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42386:


Assignee: Apache Spark

> Rewrite HiveGenericUDF with Invoke
> --
>
> Key: SPARK-42386
> URL: https://issues.apache.org/jira/browse/SPARK-42386
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686190#comment-17686190
 ] 

Apache Spark commented on SPARK-42385:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39948

> Upgrade RoaringBitmap to 0.9.39
> ---
>
> Key: SPARK-42385
> URL: https://issues.apache.org/jira/browse/SPARK-42385
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39]
>  * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] 
> in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614]






[jira] [Assigned] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42385:


Assignee: (was: Apache Spark)

> Upgrade RoaringBitmap to 0.9.39
> ---
>
> Key: SPARK-42385
> URL: https://issues.apache.org/jira/browse/SPARK-42385
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39]
>  * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] 
> in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614]






[jira] [Assigned] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42385:


Assignee: Apache Spark

> Upgrade RoaringBitmap to 0.9.39
> ---
>
> Key: SPARK-42385
> URL: https://issues.apache.org/jira/browse/SPARK-42385
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39]
>  * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] 
> in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614]






[jira] [Commented] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686163#comment-17686163
 ] 

Apache Spark commented on SPARK-41715:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39947

> Catch specific exceptions for both Spark Connect and PySpark
> 
>
> Key: SPARK-41715
> URL: https://issues.apache.org/jira/browse/SPARK-41715
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> In python/pyspark/sql/tests/test_catalog.py, we should catch more specific 
> exceptions such as AnalysisException. The test is shared by both Spark 
> Connect and PySpark, so we should figure out a way to share it.
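Catching the narrow exception type instead of a broad `Exception` might look like the sketch below. `AnalysisException` here is a stand-in class so the example runs without a Spark installation; the real class lives in PySpark, and `lookup_table` is a hypothetical helper, not an actual test_catalog.py function:

```python
import unittest

class AnalysisException(Exception):
    """Stand-in for PySpark's AnalysisException (illustration only)."""

def lookup_table(name, catalog=("people",)):
    # Hypothetical helper: raises the specific exception a shared
    # catalog test would want to assert on.
    if name not in catalog:
        raise AnalysisException(f"Table or view not found: {name}")
    return name

class CatalogTest(unittest.TestCase):
    def test_missing_table_raises_specific_exception(self):
        # Assert the narrow exception type rather than a broad Exception,
        # so the same assertion is meaningful under both PySpark and
        # Spark Connect.
        with self.assertRaises(AnalysisException):
            lookup_table("nonexistent")
```

Asserting on the specific type means the shared test still fails loudly if either backend starts raising a different, unexpected error.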









[jira] [Assigned] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41715:


Assignee: (was: Apache Spark)

> Catch specific exceptions for both Spark Connect and PySpark
> 
>
> Key: SPARK-41715
> URL: https://issues.apache.org/jira/browse/SPARK-41715
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> In python/pyspark/sql/tests/test_catalog.py, we should catch more specific 
> exceptions such as AnalysisException. The test is shared by both Spark 
> Connect and PySpark, so we should figure out a way to share it.









[jira] [Assigned] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41715:


Assignee: Apache Spark

> Catch specific exceptions for both Spark Connect and PySpark
> 
>
> Key: SPARK-41715
> URL: https://issues.apache.org/jira/browse/SPARK-41715
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> In python/pyspark/sql/tests/test_catalog.py, we should catch more specific 
> exceptions such as AnalysisException. The test is shared by both Spark 
> Connect and PySpark, so we should figure out a way to share it.






[jira] [Assigned] (SPARK-40453) Improve error handling for GRPC server

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40453:


Assignee: Apache Spark

> Improve error handling for GRPC server
> --
>
> Key: SPARK-40453
> URL: https://issues.apache.org/jira/browse/SPARK-40453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.2.2
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>
> Right now the errors are handled very rudimentarily and do not produce proper 
> GRPC errors. This issue addresses the work needed to return proper errors.
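Returning proper GRPC errors usually means mapping server-side exceptions onto the standard gRPC status codes. A hedged sketch of such a mapping in plain Python (the status code names follow the gRPC specification; the mapping itself is illustrative, not Spark Connect's actual error-handling code):

```python
# Map server-side exception types to gRPC status code names.
# Illustrative choices only; a real server would map its own
# domain exceptions (e.g. analysis errors -> INVALID_ARGUMENT).
STATUS_FOR_EXCEPTION = {
    ValueError: "INVALID_ARGUMENT",
    KeyError: "NOT_FOUND",
    NotImplementedError: "UNIMPLEMENTED",
    TimeoutError: "DEADLINE_EXCEEDED",
}

def grpc_status_for(exc):
    """Pick the most specific status for an exception, falling back
    to INTERNAL for anything unrecognized."""
    for exc_type, status in STATUS_FOR_EXCEPTION.items():
        if isinstance(exc, exc_type):
            return status
    return "INTERNAL"
```

The fallback to INTERNAL matters: an unmapped exception should still surface as a well-formed gRPC error rather than tearing down the stream.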






[jira] [Commented] (SPARK-40453) Improve error handling for GRPC server

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686161#comment-17686161
 ] 

Apache Spark commented on SPARK-40453:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39947

> Improve error handling for GRPC server
> --
>
> Key: SPARK-40453
> URL: https://issues.apache.org/jira/browse/SPARK-40453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.2.2
>Reporter: Martin Grund
>Priority: Major
>
> Right now the errors are handled very rudimentarily and do not produce proper 
> GRPC errors. This issue addresses the work needed to return proper errors.






[jira] [Assigned] (SPARK-40453) Improve error handling for GRPC server

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40453:


Assignee: (was: Apache Spark)

> Improve error handling for GRPC server
> --
>
> Key: SPARK-40453
> URL: https://issues.apache.org/jira/browse/SPARK-40453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.2.2
>Reporter: Martin Grund
>Priority: Major
>
> Right now the errors are handled very rudimentarily and do not produce proper 
> GRPC errors. This issue addresses the work needed to return proper errors.









[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42310:


Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_1289
> --
>
> Key: SPARK-42310
> URL: https://issues.apache.org/jira/browse/SPARK-42310
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42310:


Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_1289
> --
>
> Key: SPARK-42310
> URL: https://issues.apache.org/jira/browse/SPARK-42310
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17686036#comment-17686036
 ] 

Apache Spark commented on SPARK-42310:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39946

> Assign name to _LEGACY_ERROR_TEMP_1289
> --
>
> Key: SPARK-42310
> URL: https://issues.apache.org/jira/browse/SPARK-42310
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42384) Mask function's generated code does not handle null input

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685995#comment-17685995
 ] 

Apache Spark commented on SPARK-42384:
--

User 'bersprockets' has created a pull request for this issue:
https://github.com/apache/spark/pull/39945

> Mask function's generated code does not handle null input
> -
>
> Key: SPARK-42384
> URL: https://issues.apache.org/jira/browse/SPARK-42384
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>
> Example:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (null),
> ('AbCD123-@$#')
> as data(col1);
> cache table v1;
> select mask(col1) from v1;
> {noformat}
> This query results in a {{NullPointerException}}:
> {noformat}
> 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> {noformat}
> The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of 
> whether {{Mask.transformInput}} returns null or not. The 
> {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null 
> pointer.
> {noformat}
> /* 031 */ boolean isNull_1 = i.isNullAt(0);
> /* 032 */ UTF8String value_1 = isNull_1 ?
> /* 033 */ null : (i.getUTF8String(0));
> /* 034 */
> /* 035 */
> /* 036 */
> /* 037 */
> /* 038 */ UTF8String value_0 = null;
> /* 039 */ value_0 = 
> org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, 
> ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* 
> literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) 
> references[3] /* literal */));;
> /* 040 */ if (false) {
> /* 041 */   mutableStateArray_0[0].setNullAt(0);
> /* 042 */ } else {
> /* 043 */   mutableStateArray_0[0].write(0, value_0);
> /* 044 */ }
> /* 045 */ return (mutableStateArray_0[0].getRow());
> /* 046 */   }
> {noformat}
> The bug is not exercised by a literal null input value, since there appears 
> to be some optimization that simply replaces the entire function call with a 
> null literal:
> {noformat}
> spark-sql> explain SELECT mask(NULL);
> == Physical Plan ==
> *(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
> +- *(1) Scan OneRowRelation[]
> Time taken: 0.026 seconds, Fetched 1 row(s)
> spark-sql> SELECT mask(NULL);
> NULL
> Time taken: 0.042 seconds, Fetched 1 row(s)
> spark-sql> 
> {noformat}
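[Editor's note] The flaw described above — branching on a constant `false` instead of the nullness of the transform's result — can be illustrated with a small standalone sketch. The names below (`transform_input`, `UnsafeWriterSketch`, `project`) are illustrative stand-ins, not Spark's actual codegen or `UnsafeWriter` API:

```python
def transform_input(value, upper="X", lower="x", digit="n"):
    """Mimics Mask.transformInput: returns None when the input is null."""
    if value is None:
        return None
    out = []
    for ch in value:
        if ch.isupper():
            out.append(upper)
        elif ch.islower():
            out.append(lower)
        elif ch.isdigit():
            out.append(digit)
        else:
            out.append(ch)
    return "".join(out)

class UnsafeWriterSketch:
    """Mimics UnsafeWriter: write() rejects nulls; setNullAt() must be used."""
    def __init__(self):
        self.row = [None]
    def write(self, ordinal, value):
        if value is None:
            # Corresponds to the NullPointerException in UnsafeWriter.write.
            raise RuntimeError("null value passed to write()")
        self.row[ordinal] = value
    def set_null_at(self, ordinal):
        self.row[ordinal] = None

def project(writer, value):
    # Correct pattern: branch on the nullness of the transform's RESULT,
    # not on the constant `false` seen in the generated code above.
    result = transform_input(value)
    if result is None:
        writer.set_null_at(0)
    else:
        writer.write(0, result)
    return writer.row[0]
```

With this guard, a null column value flows to `set_null_at` instead of crashing the writer, while non-null input is masked normally.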






[jira] [Assigned] (SPARK-42384) Mask function's generated code does not handle null input

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42384:


Assignee: (was: Apache Spark)

> Mask function's generated code does not handle null input
> -
>
> Key: SPARK-42384
> URL: https://issues.apache.org/jira/browse/SPARK-42384
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>
> Example:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (null),
> ('AbCD123-@$#')
> as data(col1);
> cache table v1;
> select mask(col1) from v1;
> {noformat}
> This query results in a {{NullPointerException}}:
> {noformat}
> 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> {noformat}
> The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of 
> whether {{Mask.transformInput}} returns null or not. The 
> {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null 
> pointer.
> {noformat}
> /* 031 */ boolean isNull_1 = i.isNullAt(0);
> /* 032 */ UTF8String value_1 = isNull_1 ?
> /* 033 */ null : (i.getUTF8String(0));
> /* 034 */
> /* 035 */
> /* 036 */
> /* 037 */
> /* 038 */ UTF8String value_0 = null;
> /* 039 */ value_0 = 
> org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, 
> ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* 
> literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) 
> references[3] /* literal */));;
> /* 040 */ if (false) {
> /* 041 */   mutableStateArray_0[0].setNullAt(0);
> /* 042 */ } else {
> /* 043 */   mutableStateArray_0[0].write(0, value_0);
> /* 044 */ }
> /* 045 */ return (mutableStateArray_0[0].getRow());
> /* 046 */   }
> {noformat}
> The bug is not exercised by a literal null input value, since there appears 
> to be some optimization that simply replaces the entire function call with a 
> null literal:
> {noformat}
> spark-sql> explain SELECT mask(NULL);
> == Physical Plan ==
> *(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
> +- *(1) Scan OneRowRelation[]
> Time taken: 0.026 seconds, Fetched 1 row(s)
> spark-sql> SELECT mask(NULL);
> NULL
> Time taken: 0.042 seconds, Fetched 1 row(s)
> spark-sql> 
> {noformat}






[jira] [Assigned] (SPARK-42384) Mask function's generated code does not handle null input

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42384:


Assignee: Apache Spark

> Mask function's generated code does not handle null input
> -
>
> Key: SPARK-42384
> URL: https://issues.apache.org/jira/browse/SPARK-42384
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Apache Spark
>Priority: Major
>
> Example:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (null),
> ('AbCD123-@$#')
> as data(col1);
> cache table v1;
> select mask(col1) from v1;
> {noformat}
> This query results in a {{NullPointerException}}:
> {noformat}
> 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> {noformat}
> The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of 
> whether {{Mask.transformInput}} returns null or not. The 
> {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null 
> pointer.
> {noformat}
> /* 031 */ boolean isNull_1 = i.isNullAt(0);
> /* 032 */ UTF8String value_1 = isNull_1 ?
> /* 033 */ null : (i.getUTF8String(0));
> /* 034 */
> /* 035 */
> /* 036 */
> /* 037 */
> /* 038 */ UTF8String value_0 = null;
> /* 039 */ value_0 = 
> org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, 
> ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* 
> literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) 
> references[3] /* literal */));;
> /* 040 */ if (false) {
> /* 041 */   mutableStateArray_0[0].setNullAt(0);
> /* 042 */ } else {
> /* 043 */   mutableStateArray_0[0].write(0, value_0);
> /* 044 */ }
> /* 045 */ return (mutableStateArray_0[0].getRow());
> /* 046 */   }
> {noformat}
> The bug is not exercised by a literal null input value, since there appears 
> to be some optimization that simply replaces the entire function call with a 
> null literal:
> {noformat}
> spark-sql> explain SELECT mask(NULL);
> == Physical Plan ==
> *(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
> +- *(1) Scan OneRowRelation[]
> Time taken: 0.026 seconds, Fetched 1 row(s)
> spark-sql> SELECT mask(NULL);
> NULL
> Time taken: 0.042 seconds, Fetched 1 row(s)
> spark-sql> 
> {noformat}






[jira] [Assigned] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42383:


Assignee: Apache Spark

> Protobuf serializer for RocksDB.TypeAliases
> ---
>
> Key: SPARK-42383
> URL: https://issues.apache.org/jira/browse/SPARK-42383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42383:


Assignee: (was: Apache Spark)

> Protobuf serializer for RocksDB.TypeAliases
> ---
>
> Key: SPARK-42383
> URL: https://issues.apache.org/jira/browse/SPARK-42383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685950#comment-17685950
 ] 

Apache Spark commented on SPARK-42383:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39944

> Protobuf serializer for RocksDB.TypeAliases
> ---
>
> Key: SPARK-42383
> URL: https://issues.apache.org/jira/browse/SPARK-42383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685928#comment-17685928
 ] 

Apache Spark commented on SPARK-40819:
--

User 'awdavidson' has created a pull request for this issue:
https://github.com/apache/spark/pull/39943

> Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type 
> instead of automatically converting to LongType 
> 
>
> Key: SPARK-40819
> URL: https://issues.apache.org/jira/browse/SPARK-40819
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.3.2, 3.4.0
>Reporter: Alfred Davidson
>Assignee: Alfred Davidson
>Priority: Critical
>  Labels: regression
> Fix For: 3.2.4, 3.3.2, 3.4.0
>
>
> Since 3.2 parquet files containing attributes with type "INT64 
> (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read 
> throws:
>  
> {code:java}
> Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: 
> INT64 (TIMESTAMP(NANOS,true))
>   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528)
>   at scala.collection.immutable.Stream.map(Stream.scala:418)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521)
>   at 
> org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76)
>  {code}
> Prior to 3.2, Spark successfully read such parquet files, automatically 
> converting the values to a LongType.
> I believe work done as part of https://issues.apache.org/jira/browse/SPARK-34661 
> introduced the change in behaviour, more specifically here: 
> [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154]
>  which throws QueryCompilationErrors.illegalParquetTypeError
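[Editor's note] The pre-3.2 behaviour being requested back — surfacing a nanosecond-precision INT64 timestamp as a plain long rather than raising — can be sketched as a simple mapping. The function name and type strings below are illustrative stand-ins for the schema converter's logic, not the actual Spark code:

```python
def convert_int64_logical_type(logical_type):
    """Map a Parquet INT64 logical annotation to a Spark-side type name."""
    if logical_type is None:
        # Plain INT64 with no annotation.
        return "LongType"
    if logical_type.startswith("TIMESTAMP(MILLIS") or \
            logical_type.startswith("TIMESTAMP(MICROS"):
        return "TimestampType"
    if logical_type.startswith("TIMESTAMP(NANOS"):
        # Nanosecond precision has no exact Spark timestamp representation;
        # fall back to the raw 64-bit value instead of raising
        # "Illegal Parquet type".
        return "LongType"
    raise ValueError(f"Illegal Parquet type: INT64 ({logical_type})")
```

Under this fallback, `TIMESTAMP(NANOS,true)` columns read as longs, matching the pre-3.2 behaviour the reporter describes.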






[jira] [Commented] (SPARK-42267) Support left_outer join

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685758#comment-17685758
 ] 

Apache Spark commented on SPARK-42267:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39940

> Support left_outer join
> ---
>
> Key: SPARK-42267
> URL: https://issues.apache.org/jira/browse/SPARK-42267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> ```
> >>> df = spark.range(1)
> >>> df2 = spark.range(2)
> >>> df.join(df2, how="left_outer")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", 
> line 438, in join
> plan.Join(left=self._plan, right=other._plan, on=on, how=how),
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line 
> 730, in __init__
> raise NotImplementedError(
> NotImplementedError: 
> Unsupported join type: left_outer. Supported join types 
> include:
> "inner", "outer", "full", "fullouter", "full_outer",
> "leftouter", "left", "left_outer", "rightouter",
> "right", "right_outer", "leftsemi", "left_semi",
> "semi", "leftanti", "left_anti", "anti", "cross",
> ```
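[Editor's note] The oddity in the report is that "left_outer" appears in the error message's own list of supported types yet is rejected, so the matching logic was inconsistent with the documented set. A consistent check, sketched here with illustrative names rather than the actual `plan.py` code, looks like:

```python
# Join types copied from the error message above.
SUPPORTED_JOIN_TYPES = {
    "inner", "outer", "full", "fullouter", "full_outer",
    "leftouter", "left", "left_outer", "rightouter",
    "right", "right_outer", "leftsemi", "left_semi",
    "semi", "leftanti", "left_anti", "anti", "cross",
}

def validate_join_type(how: str) -> str:
    """Normalize the user-supplied join type and check it against the set."""
    normalized = how.lower()
    if normalized not in SUPPORTED_JOIN_TYPES:
        raise NotImplementedError(f"Unsupported join type: {how}")
    return normalized
```

With the check driven by the same set the error message prints, `df.join(df2, how="left_outer")` cannot be rejected while "left_outer" is advertised as supported.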






[jira] [Commented] (SPARK-42267) Support left_outer join

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685760#comment-17685760
 ] 

Apache Spark commented on SPARK-42267:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39940

> Support left_outer join
> ---
>
> Key: SPARK-42267
> URL: https://issues.apache.org/jira/browse/SPARK-42267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> ```
> >>> df = spark.range(1)
> >>> df2 = spark.range(2)
> >>> df.join(df2, how="left_outer")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", 
> line 438, in join
> plan.Join(left=self._plan, right=other._plan, on=on, how=how),
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line 
> 730, in __init__
> raise NotImplementedError(
> NotImplementedError: 
> Unsupported join type: left_outer. Supported join types 
> include:
> "inner", "outer", "full", "fullouter", "full_outer",
> "leftouter", "left", "left_outer", "rightouter",
> "right", "right_outer", "leftsemi", "left_semi",
> "semi", "leftanti", "left_anti", "anti", "cross",
> ```






[jira] [Commented] (SPARK-42381) `CreateDataFrame` should accept objects

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685752#comment-17685752
 ] 

Apache Spark commented on SPARK-42381:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39939

> `CreateDataFrame` should accept objects
> ---
>
> Key: SPARK-42381
> URL: https://issues.apache.org/jira/browse/SPARK-42381
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42381:


Assignee: (was: Apache Spark)

> `CreateDataFrame` should accept objects
> ---
>
> Key: SPARK-42381
> URL: https://issues.apache.org/jira/browse/SPARK-42381
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42381) `CreateDataFrame` should accept objects

2023-02-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685750#comment-17685750
 ] 

Apache Spark commented on SPARK-42381:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39939

> `CreateDataFrame` should accept objects
> ---
>
> Key: SPARK-42381
> URL: https://issues.apache.org/jira/browse/SPARK-42381
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects

2023-02-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42381:


Assignee: Apache Spark

> `CreateDataFrame` should accept objects
> ---
>
> Key: SPARK-42381
> URL: https://issues.apache.org/jira/browse/SPARK-42381
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42309:


Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_1204
> --
>
> Key: SPARK-42309
> URL: https://issues.apache.org/jira/browse/SPARK-42309
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42309:


Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_1204
> --
>
> Key: SPARK-42309
> URL: https://issues.apache.org/jira/browse/SPARK-42309
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Commented] (SPARK-42309) Assign name to _LEGACY_ERROR_TEMP_1204

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685723#comment-17685723
 ] 

Apache Spark commented on SPARK-42309:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39937

> Assign name to _LEGACY_ERROR_TEMP_1204
> --
>
> Key: SPARK-42309
> URL: https://issues.apache.org/jira/browse/SPARK-42309
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Assigned] (SPARK-42267) Support left_outer join

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42267:


Assignee: (was: Apache Spark)

> Support left_outer join
> ---
>
> Key: SPARK-42267
> URL: https://issues.apache.org/jira/browse/SPARK-42267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> ```
> >>> df = spark.range(1)
> >>> df2 = spark.range(2)
> >>> df.join(df2, how="left_outer")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", 
> line 438, in join
> plan.Join(left=self._plan, right=other._plan, on=on, how=how),
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line 
> 730, in __init__
> raise NotImplementedError(
> NotImplementedError: 
> Unsupported join type: left_outer. Supported join types 
> include:
> "inner", "outer", "full", "fullouter", "full_outer",
> "leftouter", "left", "left_outer", "rightouter",
> "right", "right_outer", "leftsemi", "left_semi",
> "semi", "leftanti", "left_anti", "anti", "cross",
> ```






[jira] [Commented] (SPARK-42267) Support left_outer join

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685722#comment-17685722
 ] 

Apache Spark commented on SPARK-42267:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39938

> Support left_outer join
> ---
>
> Key: SPARK-42267
> URL: https://issues.apache.org/jira/browse/SPARK-42267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> ```
> >>> df = spark.range(1)
> >>> df2 = spark.range(2)
> >>> df.join(df2, how="left_outer")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", 
> line 438, in join
> plan.Join(left=self._plan, right=other._plan, on=on, how=how),
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line 
> 730, in __init__
> raise NotImplementedError(
> NotImplementedError: 
> Unsupported join type: left_outer. Supported join types 
> include:
> "inner", "outer", "full", "fullouter", "full_outer",
> "leftouter", "left", "left_outer", "rightouter",
> "right", "right_outer", "leftsemi", "left_semi",
> "semi", "leftanti", "left_anti", "anti", "cross",
> ```






[jira] [Assigned] (SPARK-42267) Support left_outer join

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42267:


Assignee: Apache Spark

> Support left_outer join
> ---
>
> Key: SPARK-42267
> URL: https://issues.apache.org/jira/browse/SPARK-42267
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> ```
> >>> df = spark.range(1)
> >>> df2 = spark.range(2)
> >>> df.join(df2, how="left_outer")
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", 
> line 438, in join
> plan.Join(left=self._plan, right=other._plan, on=on, how=how),
>   File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line 
> 730, in __init__
> raise NotImplementedError(
> NotImplementedError: 
> Unsupported join type: left_outer. Supported join types 
> include:
> "inner", "outer", "full", "fullouter", "full_outer",
> "leftouter", "left", "left_outer", "rightouter",
> "right", "right_outer", "leftsemi", "left_semi",
> "semi", "leftanti", "left_anti", "anti", "cross",
> ```






[jira] [Commented] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685681#comment-17685681
 ] 

Apache Spark commented on SPARK-42379:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39936

> Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
> 
>
> Key: SPARK-42379
> URL: https://issues.apache.org/jira/browse/SPARK-42379
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Other methods in FileSystemBasedCheckpointFileManager already use 
> FileSystem.exists in all cases that check the existence of a path. Use 
> FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well, 
> keeping it consistent with the other methods in the class.
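[Editor's note] The requested change is a straight delegation. A minimal sketch, with illustrative class names standing in for Hadoop's FileSystem and Spark's checkpoint file manager:

```python
class FileSystemSketch:
    """Stand-in for org.apache.hadoop.fs.FileSystem."""
    def __init__(self, paths):
        self._paths = set(paths)
    def exists(self, path):
        return path in self._paths

class CheckpointFileManagerSketch:
    """Stand-in for FileSystemBasedCheckpointFileManager."""
    def __init__(self, fs):
        self.fs = fs
    def exists(self, path):
        # Delegate to FileSystem.exists, the same call the manager's other
        # methods use, instead of a separate existence-checking code path.
        return self.fs.exists(path)
```

The point is uniformity: every existence check in the manager goes through one FileSystem entry point, so behaviour cannot diverge between methods.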






[jira] [Assigned] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42379:


Assignee: (was: Apache Spark)

> Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
> 
>
> Key: SPARK-42379
> URL: https://issues.apache.org/jira/browse/SPARK-42379
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The other methods in FileSystemBasedCheckpointFileManager already use 
> FileSystem.exists wherever they check for the existence of a path. Use 
> FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well, 
> for consistency with those methods.






[jira] [Assigned] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42379:


Assignee: Apache Spark

> Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
> 
>
> Key: SPARK-42379
> URL: https://issues.apache.org/jira/browse/SPARK-42379
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> The other methods in FileSystemBasedCheckpointFileManager already use 
> FileSystem.exists wherever they check for the existence of a path. Use 
> FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well, 
> for consistency with those methods.






[jira] [Commented] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685680#comment-17685680
 ] 

Apache Spark commented on SPARK-42379:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39936

> Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
> 
>
> Key: SPARK-42379
> URL: https://issues.apache.org/jira/browse/SPARK-42379
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The other methods in FileSystemBasedCheckpointFileManager already use 
> FileSystem.exists wherever they check for the existence of a path. Use 
> FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well, 
> for consistency with those methods.






[jira] [Commented] (SPARK-42210) Standardize registered pickled Python UDFs

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685667#comment-17685667
 ] 

Apache Spark commented on SPARK-42210:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39860

> Standardize registered pickled Python UDFs
> --
>
> Key: SPARK-42210
> URL: https://issues.apache.org/jira/browse/SPARK-42210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement spark.udf.






[jira] [Assigned] (SPARK-42210) Standardize registered pickled Python UDFs

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42210:


Assignee: Apache Spark

> Standardize registered pickled Python UDFs
> --
>
> Key: SPARK-42210
> URL: https://issues.apache.org/jira/browse/SPARK-42210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement spark.udf.






[jira] [Assigned] (SPARK-42210) Standardize registered pickled Python UDFs

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42210:


Assignee: (was: Apache Spark)

> Standardize registered pickled Python UDFs
> --
>
> Key: SPARK-42210
> URL: https://issues.apache.org/jira/browse/SPARK-42210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement spark.udf.






[jira] [Commented] (SPARK-42210) Standardize registered pickled Python UDFs

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685666#comment-17685666
 ] 

Apache Spark commented on SPARK-42210:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39860

> Standardize registered pickled Python UDFs
> --
>
> Key: SPARK-42210
> URL: https://issues.apache.org/jira/browse/SPARK-42210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement spark.udf.






[jira] [Commented] (SPARK-42244) Refine error message by using Python types.

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685655#comment-17685655
 ] 

Apache Spark commented on SPARK-42244:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39935

> Refine error message by using Python types.
> ---
>
> Key: SPARK-42244
> URL: https://issues.apache.org/jira/browse/SPARK-42244
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the type names in error messages are inconsistent, e.g. `string` 
> vs. `str`. We should consolidate them under a single rule.






[jira] [Commented] (SPARK-42378) Make `DataFrame.select` support `a.*`

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685633#comment-17685633
 ] 

Apache Spark commented on SPARK-42378:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39934

> Make `DataFrame.select` support `a.*`
> -
>
> Key: SPARK-42378
> URL: https://issues.apache.org/jira/browse/SPARK-42378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42378) Make `DataFrame.select` support `a.*`

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42378:


Assignee: Apache Spark

> Make `DataFrame.select` support `a.*`
> -
>
> Key: SPARK-42378
> URL: https://issues.apache.org/jira/browse/SPARK-42378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42378) Make `DataFrame.select` support `a.*`

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42378:


Assignee: (was: Apache Spark)

> Make `DataFrame.select` support `a.*`
> -
>
> Key: SPARK-42378
> URL: https://issues.apache.org/jira/browse/SPARK-42378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42377) Test Framework for Connect Scala Client

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685613#comment-17685613
 ] 

Apache Spark commented on SPARK-42377:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/39933

> Test Framework for Connect Scala Client
> ---
>
> Key: SPARK-42377
> URL: https://issues.apache.org/jira/browse/SPARK-42377
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>







[jira] [Assigned] (SPARK-42377) Test Framework for Connect Scala Client

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42377:


Assignee: (was: Apache Spark)

> Test Framework for Connect Scala Client
> ---
>
> Key: SPARK-42377
> URL: https://issues.apache.org/jira/browse/SPARK-42377
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>







[jira] [Commented] (SPARK-42377) Test Framework for Connect Scala Client

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685612#comment-17685612
 ] 

Apache Spark commented on SPARK-42377:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/39933

> Test Framework for Connect Scala Client
> ---
>
> Key: SPARK-42377
> URL: https://issues.apache.org/jira/browse/SPARK-42377
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>







[jira] [Assigned] (SPARK-42377) Test Framework for Connect Scala Client

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42377:


Assignee: Apache Spark

> Test Framework for Connect Scala Client
> ---
>
> Key: SPARK-42377
> URL: https://issues.apache.org/jira/browse/SPARK-42377
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42376) Introduce watermark propagation among operators

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42376:


Assignee: Apache Spark

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> With the introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> That JIRA ticket made the scope limitation clear: "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We have identified a production use case for a stream-stream time-interval join 
> followed by a stateful operator (e.g. window aggregation), and propose to 
> address that use case via this ticket.
> The design will be described in the PR, but the sketched idea is to simulate 
> watermark propagation among operators. As of now, Spark assumes all stateful 
> operators have the same input watermark and output watermark, which introduced 
> the limitation. With this ticket, we construct the logic to simulate watermark 
> propagation so that each operator can have its own (input watermark, output 
> watermark) pair. Operators that introduce delayed records will produce a 
> delayed output watermark, and downstream operators can take the delay into 
> account because their input watermarks will be adjusted.
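The sketched idea in the description above can be illustrated in plain Python. Operator names, delays, and the linear-chain shape are illustrative assumptions, not Spark internals:

```python
# Sketch of per-operator watermark simulation over a linear chain of
# stateful operators. Each operator's input watermark is the upstream
# operator's output watermark; an operator that emits delayed records
# (like a time-interval join) produces an output watermark reduced by
# its delay, so downstream operators see an adjusted input watermark.
def propagate_watermarks(source_watermark_ms, operators):
    """operators: list of (name, delay_ms) pairs, ordered source to sink."""
    plan = []
    input_wm = source_watermark_ms
    for name, delay_ms in operators:
        output_wm = input_wm - delay_ms
        plan.append((name, input_wm, output_wm))
        input_wm = output_wm  # downstream input watermark is adjusted
    return plan

chain = [
    ("time_interval_join", 10_000),  # may emit records up to 10s late
    ("window_aggregation", 0),       # introduces no extra delay
]
plan = propagate_watermarks(60_000, chain)
```

Here the aggregation's input watermark is 50,000 ms rather than 60,000 ms, so records delayed by the join are no longer dropped as late.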






[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685350#comment-17685350
 ] 

Apache Spark commented on SPARK-42376:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39931

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With the introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> That JIRA ticket made the scope limitation clear: "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We have identified a production use case for a stream-stream time-interval join 
> followed by a stateful operator (e.g. window aggregation), and propose to 
> address that use case via this ticket.
> The design will be described in the PR, but the sketched idea is to simulate 
> watermark propagation among operators. As of now, Spark assumes all stateful 
> operators have the same input watermark and output watermark, which introduced 
> the limitation. With this ticket, we construct the logic to simulate watermark 
> propagation so that each operator can have its own (input watermark, output 
> watermark) pair. Operators that introduce delayed records will produce a 
> delayed output watermark, and downstream operators can take the delay into 
> account because their input watermarks will be adjusted.






[jira] [Assigned] (SPARK-42376) Introduce watermark propagation among operators

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42376:


Assignee: (was: Apache Spark)

> Introduce watermark propagation among operators
> ---
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> With the introduction of SPARK-40925, we enabled workloads containing multiple 
> stateful operators in a single streaming query.
> That JIRA ticket made the scope limitation clear: "Here we propose fixing the 
> late record filtering in stateful operators to allow chaining of stateful 
> operators {*}which do not produce delayed records (like time-interval join or 
> potentially flatMapGroupsWithState){*}".
> We have identified a production use case for a stream-stream time-interval join 
> followed by a stateful operator (e.g. window aggregation), and propose to 
> address that use case via this ticket.
> The design will be described in the PR, but the sketched idea is to simulate 
> watermark propagation among operators. As of now, Spark assumes all stateful 
> operators have the same input watermark and output watermark, which introduced 
> the limitation. With this ticket, we construct the logic to simulate watermark 
> propagation so that each operator can have its own (input watermark, output 
> watermark) pair. Operators that introduce delayed records will produce a 
> delayed output watermark, and downstream operators can take the delay into 
> account because their input watermarks will be adjusted.






[jira] [Commented] (SPARK-37099) Introduce a rank-based filter to optimize top-k computation

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685328#comment-17685328
 ] 

Apache Spark commented on SPARK-37099:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39930

> Introduce a rank-based filter to optimize top-k computation
> ---
>
> Key: SPARK-37099
> URL: https://issues.apache.org/jira/browse/SPARK-37099
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
> Attachments: q67.png, q67_optimized.png, skewed_window.png
>
>
> In JD, we found that more than 90% of window-function usage follows this 
> pattern:
> {code:java}
> select (... (row_number|rank|dense_rank) () over( [partition by ...] order 
> by ... ) as rn)
> where rn (==|<|<=) k and other conditions{code}
> However, the existing physical plan is not optimal:
> 1. We should select the local top-k records within each partition and then 
> compute the global top-k; this reduces the shuffle amount.
> For these three rank functions (row_number|rank|dense_rank), the rank of a 
> key computed on a partial dataset is always <= its final rank computed on 
> the whole dataset, so we can safely discard rows with partial rank > k 
> anywhere.
> 2. Skewed window: some partitions are skewed and take a long time to finish 
> computation.
> A real-world skewed-window case in our system is attached.
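The safety argument in the description — a row's rank on a partial dataset never exceeds its final rank, so rows with partial rank > k can be discarded early — can be demonstrated with a plain-Python sketch (illustrative only, not Spark's physical plan):

```python
# Illustration of the local top-k filter: within each chunk (standing in
# for a map-side partition), keep only rows whose local row_number <= k
# under ORDER BY value. The global top-k over the surviving rows equals
# the top-k over the full data, because a row's rank on a subset is
# always <= its rank on the whole dataset.
import heapq

def local_topk(chunk, k):
    # rows with local rank <= k, i.e. the k smallest values in the chunk
    return heapq.nsmallest(k, chunk)

def global_topk(chunks, k):
    survivors = [v for chunk in chunks for v in local_topk(chunk, k)]
    return sorted(survivors)[:k]

chunks = [[9, 1, 7], [4, 8, 2], [5, 3, 6]]
assert global_topk(chunks, 3) == sorted(sum(chunks, []))[:3]  # [1, 2, 3]
```

Only k rows per chunk survive the local filter, which is the shuffle-amount reduction the description is after.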






[jira] [Assigned] (SPARK-42373) Remove unused blank line removal from CSVExprUtils

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42373:


Assignee: Apache Spark

> Remove unused blank line removal from CSVExprUtils
> --
>
> Key: SPARK-42373
> URL: https://issues.apache.org/jira/browse/SPARK-42373
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Willi Raschkowski
>Assignee: Apache Spark
>Priority: Minor
>
> The non-multiline CSV read codepath contains references to blank-line removal 
> throughout. This is unnecessary, as blank lines are removed by the parser. 
> Furthermore, it causes confusion by suggesting that blank lines are removed 
> at this point when they have in fact already been omitted from the data. The 
> multiline codepath does not explicitly remove blank lines, which looks like a 
> disparity in behavior between the two.
> The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need 
> to explicitly skip lines, and this should be respected in {{CSVUtils}}.
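A minimal sketch of the kind of explicit skipping the {{Dataset[String]}} codepath needs (plain Python; the predicate is an assumption modeled on the comment/empty filtering described above, not the actual CSVExprUtils code):

```python
# Sketch: drop lines that are blank after trimming, or that start with
# the configured comment character, leaving real records untouched.
# This is the explicit skipping the csv(dataset) codepath needs; the
# non-multiline parser path already drops blank lines on its own.
def filter_comment_and_empty(lines, comment="#"):
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            yield line

raw = ["a,b,c", "", "   ", "# a comment", "1,2,3"]
kept = list(filter_comment_and_empty(raw))  # ["a,b,c", "1,2,3"]
```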






[jira] [Assigned] (SPARK-42373) Remove unused blank line removal from CSVExprUtils

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42373:


Assignee: (was: Apache Spark)

> Remove unused blank line removal from CSVExprUtils
> --
>
> Key: SPARK-42373
> URL: https://issues.apache.org/jira/browse/SPARK-42373
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Willi Raschkowski
>Priority: Minor
>
> The non-multiline CSV read codepath contains references to blank-line removal 
> throughout. This is unnecessary, as blank lines are removed by the parser. 
> Furthermore, it causes confusion by suggesting that blank lines are removed 
> at this point when they have in fact already been omitted from the data. The 
> multiline codepath does not explicitly remove blank lines, which looks like a 
> disparity in behavior between the two.
> The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need 
> to explicitly skip lines, and this should be respected in {{CSVUtils}}.






[jira] [Commented] (SPARK-42373) Remove unused blank line removal from CSVExprUtils

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685268#comment-17685268
 ] 

Apache Spark commented on SPARK-42373:
--

User 'ted-jenks' has created a pull request for this issue:
https://github.com/apache/spark/pull/39927

> Remove unused blank line removal from CSVExprUtils
> --
>
> Key: SPARK-42373
> URL: https://issues.apache.org/jira/browse/SPARK-42373
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Willi Raschkowski
>Priority: Minor
>
> The non-multiline CSV read codepath contains references to blank-line removal 
> throughout. This is unnecessary, as blank lines are removed by the parser. 
> Furthermore, it causes confusion by suggesting that blank lines are removed 
> at this point when they have in fact already been omitted from the data. The 
> multiline codepath does not explicitly remove blank lines, which looks like a 
> disparity in behavior between the two.
> The codepath for {{DataFrameReader.csv(dataset: Dataset[String])}} does need 
> to explicitly skip lines, and this should be respected in {{CSVUtils}}.






[jira] [Assigned] (SPARK-42372) Improve performance of HiveGenericUDTF by making inputProjection instantiate once

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42372:


Assignee: (was: Apache Spark)

> Improve performance of HiveGenericUDTF by making inputProjection instantiate 
> once
> -
>
> Key: SPARK-42372
> URL: https://issues.apache.org/jira/browse/SPARK-42372
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> +++ b/sql/hive/benchmarks/HiveUDFBenchmark-per-row-results.txt
> @@ -0,0 +1,7 @@
> +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1
> +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> +Hive UDTF benchmark:                      Best Time(ms)   Avg Time(ms)   
> Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> +
> +Hive UDTF dup 2                                    1574           1680       
>   118          0.7        1501.1       1.0X
> +Hive UDTF dup 4                                    2642           3076       
>   588          0.4        2519.9       0.6X
> +
> diff --git a/sql/hive/benchmarks/HiveUDFBenchmark-results.txt 
> b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt
> new file mode 100644
> index 00..8af8b6582c
> --- /dev/null
> +++ b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt
> @@ -0,0 +1,7 @@
> +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1
> +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> +Hive UDTF benchmark:                      Best Time(ms)   Avg Time(ms)   
> Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> +
> +Hive UDTF dup 2                                     712            789       
>   101          1.5         678.7       1.0X
> +Hive UDTF dup 4                                    1212           1294       
>    78          0.9        1156.0       0.6X
> + {code}
> An over-2x performance gain, as shown by the benchmark above.
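The optimization named in the title — instantiate the input projection once instead of per row — can be sketched generically (plain Python; class and function names are illustrative, not Hive or Spark APIs):

```python
# Sketch of hoisting a per-row allocation out of the hot loop. Before:
# a fresh projection object is built for every input row. After: one
# projection is built at operator setup and reused for every row, which
# is the kind of change behind the ~2x gain the benchmark above reports.
class Projection:
    def __init__(self, indices):
        self.indices = indices

    def __call__(self, row):
        return tuple(row[i] for i in self.indices)

def eval_per_row(rows, indices):
    # before: allocate a Projection per row
    return [Projection(indices)(r) for r in rows]

def eval_reuse(rows, indices):
    # after: allocate once, reuse across all rows
    project = Projection(indices)
    return [project(r) for r in rows]

rows = [(1, "a", True), (2, "b", False)]
assert eval_per_row(rows, [0, 1]) == eval_reuse(rows, [0, 1]) == [(1, "a"), (2, "b")]
```

Both variants produce identical output; only the allocation pattern differs, which is why the change is safe to make purely for performance.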






[jira] [Commented] (SPARK-42372) Improve performance of HiveGenericUDTF by making inputProjection instantiate once

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685237#comment-17685237
 ] 

Apache Spark commented on SPARK-42372:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/39929

> Improve performance of HiveGenericUDTF by making inputProjection instantiate 
> once
> -
>
> Key: SPARK-42372
> URL: https://issues.apache.org/jira/browse/SPARK-42372
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> +++ b/sql/hive/benchmarks/HiveUDFBenchmark-per-row-results.txt
> @@ -0,0 +1,7 @@
> +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1
> +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> +Hive UDTF benchmark:                      Best Time(ms)   Avg Time(ms)   
> Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> +
> +Hive UDTF dup 2                                    1574           1680       
>   118          0.7        1501.1       1.0X
> +Hive UDTF dup 4                                    2642           3076       
>   588          0.4        2519.9       0.6X
> +
> diff --git a/sql/hive/benchmarks/HiveUDFBenchmark-results.txt 
> b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt
> new file mode 100644
> index 00..8af8b6582c
> --- /dev/null
> +++ b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt
> @@ -0,0 +1,7 @@
> +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1
> +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> +Hive UDTF benchmark:                      Best Time(ms)   Avg Time(ms)   
> Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> +
> +Hive UDTF dup 2                                     712            789       
>   101          1.5         678.7       1.0X
> +Hive UDTF dup 4                                    1212           1294       
>    78          0.9        1156.0       0.6X
> + {code}
> An over-2x performance gain, as shown by the benchmark above.






[jira] [Assigned] (SPARK-42372) Improve performance of HiveGenericUDTF by making inputProjection instantiate once

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42372:


Assignee: Apache Spark

> Improve performance of HiveGenericUDTF by making inputProjection instantiate 
> once
> -
>
> Key: SPARK-42372
> URL: https://issues.apache.org/jira/browse/SPARK-42372
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> +++ b/sql/hive/benchmarks/HiveUDFBenchmark-per-row-results.txt
> @@ -0,0 +1,7 @@
> +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1
> +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> +Hive UDTF benchmark:                      Best Time(ms)   Avg Time(ms)   
> Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> +
> +Hive UDTF dup 2                                    1574           1680       
>   118          0.7        1501.1       1.0X
> +Hive UDTF dup 4                                    2642           3076       
>   588          0.4        2519.9       0.6X
> +
> diff --git a/sql/hive/benchmarks/HiveUDFBenchmark-results.txt 
> b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt
> new file mode 100644
> index 00..8af8b6582c
> --- /dev/null
> +++ b/sql/hive/benchmarks/HiveUDFBenchmark-results.txt
> @@ -0,0 +1,7 @@
> +OpenJDK 64-Bit Server VM 1.8.0_352-bre_2022_12_13_23_06-b00 on Mac OS X 13.1
> +Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> +Hive UDTF benchmark:                      Best Time(ms)   Avg Time(ms)   
> Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> +
> +Hive UDTF dup 2                                     712            789       
>   101          1.5         678.7       1.0X
> +Hive UDTF dup 4                                    1212           1294       
>    78          0.9        1156.0       0.6X
> + {code}
> over 2x performance gain, per the benchmark results above
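The pattern behind this fix can be sketched apart from Spark itself: hoist a per-row object construction out of the hot loop. A minimal Python sketch, where `Projection` is a hypothetical stand-in for Spark's interpreted projection (the actual change is in Scala):

```python
class Projection:
    """Hypothetical stand-in for Spark's interpreted projection."""
    def __init__(self, exprs):
        self.exprs = list(exprs)  # setup cost paid once, at construction time

    def apply(self, row):
        return tuple(f(row) for f in self.exprs)

def eval_per_row(rows, exprs):
    # Before the fix: a fresh Projection is built for every input row.
    return [Projection(exprs).apply(r) for r in rows]

def eval_once(rows, exprs):
    # After the fix: the Projection is instantiated once and reused.
    proj = Projection(exprs)
    return [proj.apply(r) for r in rows]

rows = [(i, i + 1) for i in range(1000)]
exprs = [lambda r: r[0] * 2, lambda r: r[0] + r[1]]
# Same results either way; only the allocation count per row changes.
assert eval_per_row(rows, exprs) == eval_once(rows, exprs)
```

Because the projection's setup cost is paid once instead of once per row, the saving scales with input size, which is consistent with the roughly 2x gain in the benchmark above.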






[jira] [Commented] (SPARK-42371) Add scripts to start and stop Spark Connect server

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685208#comment-17685208
 ] 

Apache Spark commented on SPARK-42371:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39928

> Add scripts to start and stop Spark Connect server
> --
>
> Key: SPARK-42371
> URL: https://issues.apache.org/jira/browse/SPARK-42371
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently, there is no proper way to start and stop the Spark Connect server;
> it has to be started with, for example, a Spark shell:
> {code}
> # For development,
> ./bin/spark-shell \
>--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> {code}
> # For released Spark versions
> ./bin/spark-shell \
>   --packages org.apache.spark:spark-connect_2.12:3.4.0 \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> which is awkward.
> We need some dedicated scripts for it.
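One hedged sketch of what such dedicated scripts could wrap, built from the `--packages`/`--conf` flags shown above. The `SparkConnectServer` entry-point class and the pid-file location are assumptions for illustration, not the contents of the eventual scripts:

```python
import os
import signal
import subprocess

SPARK_HOME = os.environ.get("SPARK_HOME", ".")
PID_FILE = "/tmp/spark-connect-server.pid"  # hypothetical pid-file location

def start_command(version="3.4.0"):
    # Launch command for a released Spark version: same plugin conf as the
    # spark-shell invocation above, but non-interactive via spark-submit.
    return [
        os.path.join(SPARK_HOME, "bin", "spark-submit"),
        "--class", "org.apache.spark.sql.connect.service.SparkConnectServer",
        "--packages", f"org.apache.spark:spark-connect_2.12:{version}",
        "--conf", "spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin",
    ]

def start():
    proc = subprocess.Popen(start_command())
    with open(PID_FILE, "w") as f:
        f.write(str(proc.pid))  # remember the pid so stop() can find it

def stop():
    with open(PID_FILE) as f:
        os.kill(int(f.read()), signal.SIGTERM)
    os.remove(PID_FILE)
```

A start/stop pair with a pid file mirrors how Spark's other `sbin/` daemon scripts behave, which is presumably the shape the dedicated scripts would take.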






[jira] [Commented] (SPARK-42371) Add scripts to start and stop Spark Connect server

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685209#comment-17685209
 ] 

Apache Spark commented on SPARK-42371:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39928

> Add scripts to start and stop Spark Connect server
> --
>
> Key: SPARK-42371
> URL: https://issues.apache.org/jira/browse/SPARK-42371
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently, there is no proper way to start and stop the Spark Connect server;
> it has to be started with, for example, a Spark shell:
> {code}
> # For development,
> ./bin/spark-shell \
>--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> {code}
> # For released Spark versions
> ./bin/spark-shell \
>   --packages org.apache.spark:spark-connect_2.12:3.4.0 \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> which is awkward.
> We need some dedicated scripts for it.






[jira] [Assigned] (SPARK-42371) Add scripts to start and stop Spark Connect server

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42371:


Assignee: (was: Apache Spark)

> Add scripts to start and stop Spark Connect server
> --
>
> Key: SPARK-42371
> URL: https://issues.apache.org/jira/browse/SPARK-42371
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently, there is no proper way to start and stop the Spark Connect server;
> it has to be started with, for example, a Spark shell:
> {code}
> # For development,
> ./bin/spark-shell \
>--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> {code}
> # For released Spark versions
> ./bin/spark-shell \
>   --packages org.apache.spark:spark-connect_2.12:3.4.0 \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> which is awkward.
> We need some dedicated scripts for it.






[jira] [Assigned] (SPARK-42371) Add scripts to start and stop Spark Connect server

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42371:


Assignee: Apache Spark

> Add scripts to start and stop Spark Connect server
> --
>
> Key: SPARK-42371
> URL: https://issues.apache.org/jira/browse/SPARK-42371
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Currently, there is no proper way to start and stop the Spark Connect server;
> it has to be started with, for example, a Spark shell:
> {code}
> # For development,
> ./bin/spark-shell \
>--jars `ls connector/connect/target/**/spark-connect*SNAPSHOT.jar` \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> {code}
> # For released Spark versions
> ./bin/spark-shell \
>   --packages org.apache.spark:spark-connect_2.12:3.4.0 \
>   --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
> {code}
> which is awkward.
> We need some dedicated scripts for it.






[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685134#comment-17685134
 ] 

Apache Spark commented on SPARK-41823:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39925

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest ...>", line 1, in <module>
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}
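The analyzer's behavior here can be illustrated with a toy resolver that reproduces the `[AMBIGUOUS_REFERENCE]` error above; the column lists and qualifier names are hypothetical, not Spark internals:

```python
def resolve(name, columns):
    """Resolve `name` against `columns`, a map from a qualifier
    (e.g. a DataFrame alias) to its column names."""
    hits = [f"{q}.{c}" for q, cols in columns.items() for c in cols if c == name]
    if len(hits) > 1:
        raise ValueError(f"[AMBIGUOUS_REFERENCE] Reference `{name}` is "
                         f"ambiguous, could be: {hits}.")
    if not hits:
        raise ValueError(f"[UNRESOLVED_COLUMN] `{name}`")
    return hits[0]

# After the join, both inputs still carry a `name` column, so drop('name')
# cannot pick one:
joined = {"df": ["name", "age"], "df2": ["name", "height"]}
try:
    resolve("name", joined)
except ValueError as e:
    assert "AMBIGUOUS_REFERENCE" in str(e)

# Renaming one side before the join (e.g. with withColumnRenamed) removes
# the ambiguity:
joined = {"df": ["name", "age"], "df2": ["name2", "height"]}
assert resolve("name", joined) == "df.name"
```

The PySpark classic API resolves `df.name` and `df2.name` by origin, so the fix for Spark Connect is presumably to carry similar plan-level provenance for column references.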






[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685133#comment-17685133
 ] 

Apache Spark commented on SPARK-41823:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39925

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest ...>", line 1, in <module>
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}






[jira] [Assigned] (SPARK-42369) Fix constructor for java.nio.DirectByteBuffer for Java 21+

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42369:


Assignee: Apache Spark

> Fix constructor for java.nio.DirectByteBuffer for Java 21+
> --
>
> Key: SPARK-42369
> URL: https://issues.apache.org/jira/browse/SPARK-42369
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 3.5.0
>Reporter: Ludovic Henry
>Assignee: Apache Spark
>Priority: Major
>
> In the latest JDK, the constructor {{DirectByteBuffer(long, int)}} was 
> replaced with {{{}DirectByteBuffer(long, long){}}}. We just want to support 
> both by probing for the legacy one first and falling back to the newer one.
> This change is completely transparent for the end-user, and makes sure Spark 
> works transparently on the latest JDK as well.
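The probing pattern the fix describes can be sketched as follows. This is a Python analogue using a fake reflection surface for illustration; the real change uses Java reflection (`Class.getDeclaredConstructor`) on `java.nio.DirectByteBuffer`:

```python
def direct_buffer_ctor(jclass):
    # Probe for the legacy (long, int) signature first, then fall back
    # to the newer (long, long) one.
    for arg_types in (("long", "int"), ("long", "long")):
        try:
            return jclass.get_declared_constructor(*arg_types)
        except KeyError:  # stands in for Java's NoSuchMethodException
            continue
    raise RuntimeError("no usable DirectByteBuffer constructor")

class FakeClass:
    """Tiny stand-in for a reflected Java class."""
    def __init__(self, ctors):
        self._ctors = ctors

    def get_declared_constructor(self, *types):
        return self._ctors[types]  # KeyError if the signature is absent

jdk8 = FakeClass({("long", "int"): "ctor(long,int)"})
jdk21 = FakeClass({("long", "long"): "ctor(long,long)"})
assert direct_buffer_ctor(jdk8) == "ctor(long,int)"
assert direct_buffer_ctor(jdk21) == "ctor(long,long)"
```

Probing at startup and caching whichever constructor resolves keeps the change invisible to callers, which is why it is transparent for the end user.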






[jira] [Commented] (SPARK-42369) Fix constructor for java.nio.DirectByteBuffer for Java 21+

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685132#comment-17685132
 ] 

Apache Spark commented on SPARK-42369:
--

User 'luhenry' has created a pull request for this issue:
https://github.com/apache/spark/pull/39909

> Fix constructor for java.nio.DirectByteBuffer for Java 21+
> --
>
> Key: SPARK-42369
> URL: https://issues.apache.org/jira/browse/SPARK-42369
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 3.5.0
>Reporter: Ludovic Henry
>Priority: Major
>
> In the latest JDK, the constructor {{DirectByteBuffer(long, int)}} was 
> replaced with {{{}DirectByteBuffer(long, long){}}}. We just want to support 
> both by probing for the legacy one first and falling back to the newer one.
> This change is completely transparent for the end-user, and makes sure Spark 
> works transparently on the latest JDK as well.






[jira] [Assigned] (SPARK-42369) Fix constructor for java.nio.DirectByteBuffer for Java 21+

2023-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42369:


Assignee: (was: Apache Spark)

> Fix constructor for java.nio.DirectByteBuffer for Java 21+
> --
>
> Key: SPARK-42369
> URL: https://issues.apache.org/jira/browse/SPARK-42369
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 3.5.0
>Reporter: Ludovic Henry
>Priority: Major
>
> In the latest JDK, the constructor {{DirectByteBuffer(long, int)}} was 
> replaced with {{{}DirectByteBuffer(long, long){}}}. We just want to support 
> both by probing for the legacy one first and falling back to the newer one.
> This change is completely transparent for the end-user, and makes sure Spark 
> works transparently on the latest JDK as well.






[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column

2023-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685131#comment-17685131
 ] 

Apache Spark commented on SPARK-41812:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39925

> DataFrame.join: ambiguous column
> 
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "<doctest ...>", line 1, in <module>
> df1.join(df2, df1["value"] == df2["value"]).count()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
> count
> pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, 
> in toPandas
> return self._session.client.to_pandas(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
> to_pandas
> return self._execute_and_fetch(req)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
> _execute_and_fetch
> self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
> _handle_error
> raise SparkConnectAnalysisException(
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
> `value`].
> {code}






[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685103#comment-17685103
 ] 

Apache Spark commented on SPARK-41708:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39922

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> Make WriteFiles hold v1 write information






[jira] [Commented] (SPARK-39851) Improve join stats estimation if one side can keep uniqueness

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685104#comment-17685104
 ] 

Apache Spark commented on SPARK-39851:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/39923

> Improve join stats estimation if one side can keep uniqueness
> -
>
> Key: SPARK-39851
> URL: https://issues.apache.org/jira/browse/SPARK-39851
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:sql}
> SELECT i_item_sk ss_item_sk
> FROM   item,
>(SELECT DISTINCT iss.i_brand_idbrand_id,
> iss.i_class_idclass_id,
> iss.i_category_id category_id
> FROM   item iss) x
> WHERE  i_brand_id = brand_id
>AND i_class_id = class_id
>AND i_category_id = category_id 
> {code}
> Current:
> {noformat}
> == Optimized Logical Plan ==
> Project [i_item_sk#4 AS ss_item_sk#54], Statistics(sizeInBytes=370.8 MiB, 
> rowCount=3.24E+7)
> +- Join Inner, (((i_brand_id#11 = brand_id#51) AND (i_class_id#13 = 
> class_id#52)) AND (i_category_id#15 = category_id#53)), 
> Statistics(sizeInBytes=1112.3 MiB, rowCount=3.24E+7)
>:- Project [i_item_sk#4, i_brand_id#11, i_class_id#13, i_category_id#15], 
> Statistics(sizeInBytes=4.6 MiB, rowCount=2.02E+5)
>:  +- Filter ((isnotnull(i_brand_id#11) AND isnotnull(i_class_id#13)) AND 
> isnotnull(i_category_id#15)), Statistics(sizeInBytes=84.6 MiB, 
> rowCount=2.02E+5)
>: +- Relation 
> spark_catalog.default.item[i_item_sk#4,i_item_id#5,i_rec_start_date#6,i_rec_end_date#7,i_item_desc#8,i_current_price#9,i_wholesale_cost#10,i_brand_id#11,i_brand#12,i_class_id#13,i_class#14,i_category_id#15,i_category#16,i_manufact_id#17,i_manufact#18,i_size#19,i_formulation#20,i_color#21,i_units#22,i_container#23,i_manager_id#24,i_product_name#25]
>  parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5)
>+- Aggregate [brand_id#51, class_id#52, category_id#53], [brand_id#51, 
> class_id#52, category_id#53], Statistics(sizeInBytes=2.6 MiB, 
> rowCount=1.37E+5)
>   +- Project [i_brand_id#62 AS brand_id#51, i_class_id#64 AS class_id#52, 
> i_category_id#66 AS category_id#53], Statistics(sizeInBytes=3.9 MiB, 
> rowCount=2.02E+5)
>  +- Filter ((isnotnull(i_brand_id#62) AND isnotnull(i_class_id#64)) 
> AND isnotnull(i_category_id#66)), Statistics(sizeInBytes=84.6 MiB, 
> rowCount=2.02E+5)
> +- Relation 
> spark_catalog.default.item[i_item_sk#55,i_item_id#56,i_rec_start_date#57,i_rec_end_date#58,i_item_desc#59,i_current_price#60,i_wholesale_cost#61,i_brand_id#62,i_brand#63,i_class_id#64,i_class#65,i_category_id#66,i_category#67,i_manufact_id#68,i_manufact#69,i_size#70,i_formulation#71,i_color#72,i_units#73,i_container#74,i_manager_id#75,i_product_name#76]
>  parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5)
> {noformat}
> Expected:
> {noformat}
> == Optimized Logical Plan ==
> Project [i_item_sk#4 AS ss_item_sk#54], Statistics(sizeInBytes=2.3 MiB, 
> rowCount=2.02E+5)
> +- Join Inner, (((i_brand_id#11 = brand_id#51) AND (i_class_id#13 = 
> class_id#52)) AND (i_category_id#15 = category_id#53)), 
> Statistics(sizeInBytes=7.0 MiB, rowCount=2.02E+5)
>:- Project [i_item_sk#4, i_brand_id#11, i_class_id#13, i_category_id#15], 
> Statistics(sizeInBytes=4.6 MiB, rowCount=2.02E+5)
>:  +- Filter ((isnotnull(i_brand_id#11) AND isnotnull(i_class_id#13)) AND 
> isnotnull(i_category_id#15)), Statistics(sizeInBytes=84.6 MiB, 
> rowCount=2.02E+5)
>: +- Relation 
> spark_catalog.default.item[i_item_sk#4,i_item_id#5,i_rec_start_date#6,i_rec_end_date#7,i_item_desc#8,i_current_price#9,i_wholesale_cost#10,i_brand_id#11,i_brand#12,i_class_id#13,i_class#14,i_category_id#15,i_category#16,i_manufact_id#17,i_manufact#18,i_size#19,i_formulation#20,i_color#21,i_units#22,i_container#23,i_manager_id#24,i_product_name#25]
>  parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5)
>+- Aggregate [brand_id#51, class_id#52, category_id#53], [brand_id#51, 
> class_id#52, category_id#53], Statistics(sizeInBytes=2.6 MiB, 
> rowCount=1.37E+5)
>   +- Project [i_brand_id#62 AS brand_id#51, i_class_id#64 AS class_id#52, 
> i_category_id#66 AS category_id#53], Statistics(sizeInBytes=3.9 MiB, 
> rowCount=2.02E+5)
>  +- Filter ((isnotnull(i_brand_id#62) AND isnotnull(i_class_id#64)) 
> AND isnotnull(i_category_id#66)), Statistics(sizeInBytes=84.6 MiB, 
> rowCount=2.02E+5)
> +- Relation 
> 
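The row-count estimates behind the two plans above can be sketched numerically. Under the textbook formula, rows(R join S on a) is roughly |R|·|S| / max(ndv(R.a), ndv(S.a)); when the join keys are unique on one side (here, the `DISTINCT` subquery), the join cannot multiply rows, so the estimate can be capped at |R|. The NDV value below is a hypothetical choice picked to reproduce the plan's 3.24E+7:

```python
def join_rows(r_rows, s_rows, ndv_r, ndv_s, s_unique=False):
    # Textbook equi-join cardinality estimate.
    est = r_rows * s_rows / max(ndv_r, ndv_s)
    if s_unique:
        est = min(est, r_rows)  # uniqueness on S caps the output at |R|
    return est

item_rows = 2.02e5      # filtered `item` side of the join
distinct_rows = 1.37e5  # DISTINCT (brand_id, class_id, category_id) side
ndv = 854               # hypothetical composite-key NDV

current = join_rows(item_rows, distinct_rows, ndv, ndv)                 # ~3.24E+7
improved = join_rows(item_rows, distinct_rows, ndv, ndv, s_unique=True)  # 2.02E+5
assert improved == item_rows
```

Exploiting the uniqueness guarantee of the `DISTINCT` aggregate shrinks the estimate from tens of millions of rows to the size of the `item` side, matching the expected plan.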

[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685102#comment-17685102
 ] 

Apache Spark commented on SPARK-41708:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39924

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> Make WriteFiles hold v1 write information






[jira] [Assigned] (SPARK-42368) Ignore SparkRemoteFileTest K8s IT test case in GitHub Action

2023-02-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42368:


Assignee: Apache Spark

> Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
> 
>
> Key: SPARK-42368
> URL: https://issues.apache.org/jira/browse/SPARK-42368
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42368) Ignore SparkRemoteFileTest K8s IT test case in GitHub Action

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685101#comment-17685101
 ] 

Apache Spark commented on SPARK-42368:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39921

> Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
> 
>
> Key: SPARK-42368
> URL: https://issues.apache.org/jira/browse/SPARK-42368
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42368) Ignore SparkRemoteFileTest K8s IT test case in GitHub Action

2023-02-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42368:


Assignee: (was: Apache Spark)

> Ignore SparkRemoteFileTest K8s IT test case in GitHub Action
> 
>
> Key: SPARK-42368
> URL: https://issues.apache.org/jira/browse/SPARK-42368
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-41716) Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py

2023-02-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41716:


Assignee: (was: Apache Spark)

> Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
> --
>
> Key: SPARK-41716
> URL: https://issues.apache.org/jira/browse/SPARK-41716
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> _catalog_to_pandas is really client-side logic that belongs in client.py. We
> should probably factor it out to the client.






[jira] [Assigned] (SPARK-41716) Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py

2023-02-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41716:


Assignee: Apache Spark

> Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
> --
>
> Key: SPARK-41716
> URL: https://issues.apache.org/jira/browse/SPARK-41716
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> _catalog_to_pandas is really client-side logic that belongs in client.py. We
> should probably factor it out to the client.






[jira] [Commented] (SPARK-41716) Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685048#comment-17685048
 ] 

Apache Spark commented on SPARK-41716:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39920

> Factor pyspark.sql.connect.Catalog._catalog_to_pandas to client.py
> --
>
> Key: SPARK-41716
> URL: https://issues.apache.org/jira/browse/SPARK-41716
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> _catalog_to_pandas is really client-side logic that belongs in client.py. We
> should probably factor it out to the client.






[jira] [Commented] (SPARK-41612) Support Catalog.isCached

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685032#comment-17685032
 ] 

Apache Spark commented on SPARK-41612:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39919

> Support Catalog.isCached
> 
>
> Key: SPARK-41612
> URL: https://issues.apache.org/jira/browse/SPARK-41612
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Assigned] (SPARK-41623) Support Catalog.uncacheTable

2023-02-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41623:


Assignee: (was: Apache Spark)

> Support Catalog.uncacheTable
> 
>
> Key: SPARK-41623
> URL: https://issues.apache.org/jira/browse/SPARK-41623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Assigned] (SPARK-41612) Support Catalog.isCached

2023-02-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41612:


Assignee: (was: Apache Spark)

> Support Catalog.isCached
> 
>
> Key: SPARK-41612
> URL: https://issues.apache.org/jira/browse/SPARK-41612
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Commented] (SPARK-41612) Support Catalog.isCached

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685030#comment-17685030
 ] 

Apache Spark commented on SPARK-41612:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39919

> Support Catalog.isCached
> 
>
> Key: SPARK-41612
> URL: https://issues.apache.org/jira/browse/SPARK-41612
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>







[jira] [Commented] (SPARK-41623) Support Catalog.uncacheTable

2023-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17685028#comment-17685028
 ] 

Apache Spark commented on SPARK-41623:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39919

> Support Catalog.uncacheTable
> 
>
> Key: SPARK-41623
> URL: https://issues.apache.org/jira/browse/SPARK-41623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>






