[jira] [Commented] (SPARK-42151) Align UPDATE assignments with table attributes

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697153#comment-17697153
 ] 

Apache Spark commented on SPARK-42151:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40308

> Align UPDATE assignments with table attributes
> --
>
> Key: SPARK-42151
> URL: https://issues.apache.org/jira/browse/SPARK-42151
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Assignments in UPDATE commands should be aligned with the table attributes 
> prior to rewriting those UPDATE commands.
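A minimal, self-contained sketch of the alignment idea (the names below are illustrative, not Spark's internal API): assignments listed in arbitrary order are reordered to match the table's attribute order, and unassigned columns are carried through unchanged.

{code:scala}
// Hypothetical model of an UPDATE assignment: target column and its new value.
case class Assignment(column: String, expr: String)

// Align arbitrary-order assignments with the table's attribute order,
// carrying unassigned columns through as identity assignments.
def alignAssignments(
    tableAttrs: Seq[String],
    assignments: Seq[Assignment]): Seq[Assignment] = {
  val byColumn = assignments.map(a => a.column -> a).toMap
  tableAttrs.map(attr => byColumn.getOrElse(attr, Assignment(attr, attr)))
}

// UPDATE t SET age = age + 1 against schema (id, name, age) yields:
// Assignment(id, id), Assignment(name, name), Assignment(age, age + 1)
println(alignAssignments(Seq("id", "name", "age"),
  Seq(Assignment("age", "age + 1"))))
{code}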



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697121#comment-17697121
 ] 

Apache Spark commented on SPARK-42689:
--

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/40307

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Mridul Muralidharan
>Priority: Major
>
> Currently, when an executor node is lost, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if a shuffle 
> component manages the shuffle data and reliably maintains it (for example, 
> in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this 
> issue; for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
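As a sketch of what such a declaration could look like, assuming an implementation of Spark's ShuffleDriverComponents plugin interface. The method name supportsReliableStorage is an assumption for illustration; see the linked pull request for the API that was actually added.

{code:scala}
import java.util.{Collections, Map => JMap}

import org.apache.spark.shuffle.api.ShuffleDriverComponents

class ReliableShuffleDriverComponents extends ShuffleDriverComponents {
  override def initializeApplication(): JMap[String, String] =
    Collections.emptyMap()

  override def cleanupApplication(): Unit = {}

  // Hypothetical flag: shuffle data lives in a disaggregated store, so the
  // loss of an executor does not imply the loss of its shuffle output, and
  // the scheduler need not recompute the corresponding map stages.
  def supportsReliableStorage(): Boolean = true
}
{code}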



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42689:


Assignee: Apache Spark

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Mridul Muralidharan
>Assignee: Apache Spark
>Priority: Major
>
> Currently, when an executor node is lost, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if a shuffle 
> component manages the shuffle data and reliably maintains it (for example, 
> in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this 
> issue; for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42689:


Assignee: (was: Apache Spark)

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Mridul Muralidharan
>Priority: Major
>
> Currently, when an executor node is lost, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if a shuffle 
> component manages the shuffle data and reliably maintains it (for example, 
> in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this 
> issue; for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697120#comment-17697120
 ] 

Apache Spark commented on SPARK-42689:
--

User 'mridulm' has created a pull request for this issue:
https://github.com/apache/spark/pull/40307

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Mridul Muralidharan
>Priority: Major
>
> Currently, when an executor node is lost, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if a shuffle 
> component manages the shuffle data and reliably maintains it (for example, 
> in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this 
> issue; for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42687) Better error message for unsupported `Pivot` operator in Structured Streaming

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42687:


Assignee: (was: Apache Spark)

> Better error message for unsupported `Pivot` operator in Structured Streaming
> ---
>
> Key: SPARK-42687
> URL: https://issues.apache.org/jira/browse/SPARK-42687
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> {{pivot}} is an unsupported operation in Structured Streaming, but it 
> currently produces a misleading error message.
>  
> The following is the current error message for {{pivot}} in Structured 
> Streaming:
> {{AnalysisException: Queries with streaming sources must be executed with 
> writeStream.start();}}
>  
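A plausible minimal reproduction, assuming a local SparkSession named spark (sketch only; the presumed mechanism is noted in the comments):

{code:scala}
import org.apache.spark.sql.functions._

val stream = spark.readStream.format("rate").load()  // streaming source

// pivot() without explicit values eagerly collects the distinct pivot
// values; on a streaming Dataset that eager query is presumably what
// surfaces the generic "Queries with streaming sources must be executed
// with writeStream.start();" message instead of a clear "pivot is not
// supported on streaming DataFrames/Datasets" error.
val pivoted = stream
  .groupBy(window(col("timestamp"), "10 seconds"))
  .pivot("value")
  .count()
{code}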



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42687) Better error message for unsupported `Pivot` operator in Structured Streaming

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697100#comment-17697100
 ] 

Apache Spark commented on SPARK-42687:
--

User 'huanliwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40306

> Better error message for unsupported `Pivot` operator in Structured Streaming
> ---
>
> Key: SPARK-42687
> URL: https://issues.apache.org/jira/browse/SPARK-42687
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> {{pivot}} is an unsupported operation in Structured Streaming, but it 
> currently produces a misleading error message.
>  
> The following is the current error message for {{pivot}} in Structured 
> Streaming:
> {{AnalysisException: Queries with streaming sources must be executed with 
> writeStream.start();}}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42687) Better error message for unsupported `Pivot` operator in Structured Streaming

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42687:


Assignee: Apache Spark

> Better error message for unsupported `Pivot` operator in Structured Streaming
> ---
>
> Key: SPARK-42687
> URL: https://issues.apache.org/jira/browse/SPARK-42687
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Apache Spark
>Priority: Minor
>
> {{pivot}} is an unsupported operation in Structured Streaming, but it 
> currently produces a misleading error message.
>  
> The following is the current error message for {{pivot}} in Structured 
> Streaming:
> {{AnalysisException: Queries with streaming sources must be executed with 
> writeStream.start();}}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42665) `simple udf` test failed using Maven

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697043#comment-17697043
 ] 

Apache Spark commented on SPARK-42665:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40304

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42665) `simple udf` test failed using Maven

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697042#comment-17697042
 ] 

Apache Spark commented on SPARK-42665:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40304

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42665) `simple udf` test failed using Maven

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42665:


Assignee: Apache Spark

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42665) `simple udf` test failed using Maven

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697041#comment-17697041
 ] 

Apache Spark commented on SPARK-42665:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40304

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42665) `simple udf` test failed using Maven

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42665:


Assignee: (was: Apache Spark)

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42685) optimize byteToString routines

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697036#comment-17697036
 ] 

Apache Spark commented on SPARK-42685:
--

User 'alkis' has created a pull request for this issue:
https://github.com/apache/spark/pull/40301

> optimize byteToString routines
> --
>
> Key: SPARK-42685
> URL: https://issues.apache.org/jira/browse/SPARK-42685
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> {{Utils.byteToString}} routines are slow because they use BigInt and 
> BigDecimal. This causes visible CPU usage (1-2% in scan benchmarks).
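For illustration, a humanized byte-count formatter can stay on plain Long/Double arithmetic for the full Long range, avoiding BigInt/BigDecimal entirely. This is a sketch, not Spark's actual implementation (the real routines live in org.apache.spark.util.Utils):

{code:scala}
// Format a byte count using only primitive arithmetic.
def bytesToHumanString(size: Long): String = {
  val units = Seq("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB")
  var value = size.toDouble
  var i = 0
  while (math.abs(value) >= 1024.0 && i < units.length - 1) {
    value /= 1024.0
    i += 1
  }
  f"$value%.1f ${units(i)}"
}

// bytesToHumanString(123456789L) == "117.7 MiB"
{code}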



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42685) optimize byteToString routines

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697034#comment-17697034
 ] 

Apache Spark commented on SPARK-42685:
--

User 'alkis' has created a pull request for this issue:
https://github.com/apache/spark/pull/40301

> optimize byteToString routines
> --
>
> Key: SPARK-42685
> URL: https://issues.apache.org/jira/browse/SPARK-42685
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> {{Utils.byteToString}} routines are slow because they use BigInt and 
> BigDecimal. This causes visible CPU usage (1-2% in scan benchmarks).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42685) optimize byteToString routines

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42685:


Assignee: (was: Apache Spark)

> optimize byteToString routines
> --
>
> Key: SPARK-42685
> URL: https://issues.apache.org/jira/browse/SPARK-42685
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> {{Utils.byteToString}} routines are slow because they use BigInt and 
> BigDecimal. This causes visible CPU usage (1-2% in scan benchmarks).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42685) optimize byteToString routines

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42685:


Assignee: Apache Spark

> optimize byteToString routines
> --
>
> Key: SPARK-42685
> URL: https://issues.apache.org/jira/browse/SPARK-42685
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Assignee: Apache Spark
>Priority: Major
>
> {{Utils.byteToString}} routines are slow because they use BigInt and 
> BigDecimal. This causes visible CPU usage (1-2% in scan benchmarks).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697033#comment-17697033
 ] 

Apache Spark commented on SPARK-42656:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40303

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add a shell script that runs the Scala client in a Scala REPL, allowing 
> users to connect to Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42686) TaskMemoryManager debug logging is expensive

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42686:


Assignee: Apache Spark

> TaskMemoryManager debug logging is expensive
> 
>
> Key: SPARK-42686
> URL: https://issues.apache.org/jira/browse/SPARK-42686
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Assignee: Apache Spark
>Priority: Major
>
> TaskMemoryManager debug logging is expensive, mostly because formatting 
> operations are done eagerly and some of them are quite costly (e.g. 
> humanized strings). This causes visible CPU usage in scan benchmarks, for 
> example.
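The generic shape of the fix, as a sketch assuming an SLF4J-style logger (not the actual patch), is to pay the formatting cost only when debug logging is enabled:

{code:scala}
import org.slf4j.{Logger, LoggerFactory}

val logger: Logger = LoggerFactory.getLogger("TaskMemoryManager")

def onAcquire(taskId: Long, acquired: Long, humanize: Long => String): Unit = {
  // Eager version: humanize() and the string interpolation run on every
  // call, even when debug logging is disabled:
  //   logger.debug(s"Task $taskId acquired ${humanize(acquired)}")

  // Guarded version: the formatting only happens when it will be used.
  if (logger.isDebugEnabled) {
    logger.debug(s"Task $taskId acquired ${humanize(acquired)}")
  }
}
{code}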



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42686) TaskMemoryManager debug logging is expensive

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42686:


Assignee: (was: Apache Spark)

> TaskMemoryManager debug logging is expensive
> 
>
> Key: SPARK-42686
> URL: https://issues.apache.org/jira/browse/SPARK-42686
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> TaskMemoryManager debug logging is expensive, mostly because formatting 
> operations are done eagerly and some of them are quite costly (e.g. 
> humanized strings). This causes visible CPU usage in scan benchmarks, for 
> example.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42686) TaskMemoryManager debug logging is expensive

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697032#comment-17697032
 ] 

Apache Spark commented on SPARK-42686:
--

User 'alkis' has created a pull request for this issue:
https://github.com/apache/spark/pull/40302

> TaskMemoryManager debug logging is expensive
> 
>
> Key: SPARK-42686
> URL: https://issues.apache.org/jira/browse/SPARK-42686
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> TaskMemoryManager debug logging is expensive, mostly because formatting 
> operations are done eagerly and some of them are quite costly (e.g. 
> humanized strings). This causes visible CPU usage in scan benchmarks, for 
> example.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42683:


Assignee: Apache Spark

> Automatically rename metadata columns that conflict with data schema columns
> 
>
> Key: SPARK-42683
> URL: https://issues.apache.org/jira/browse/SPARK-42683
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Ryan Johnson
>Assignee: Apache Spark
>Priority: Major
>
> Today, if a datasource already has a column called `_metadata`, queries 
> cannot access the file-source metadata column that normally carries that 
> name. We can address this conflict with two changes to metadata column 
> handling:
>  # Automatically rename any metadata column whose name conflicts with a data 
> schema column
>  # Add a facility to reliably find metadata columns by their original/logical 
> name, even if they were renamed.
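To make the conflict concrete, a sketch (the path and the renamed column name are hypothetical):

{code:scala}
// Suppose the Parquet data itself contains a `_metadata` column.
val df = spark.read.parquet("/data/events")

// Today this resolves to the *data* column, so the hidden file-source
// metadata column (file path, size, modification time, ...) is unreachable.
df.select("_metadata")

// Under the proposal, the hidden metadata column would be renamed
// automatically (say `_metadata_0`; the exact scheme is up to the
// implementation) and stay discoverable by its original/logical name.
{code}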



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42683:


Assignee: (was: Apache Spark)

> Automatically rename metadata columns that conflict with data schema columns
> 
>
> Key: SPARK-42683
> URL: https://issues.apache.org/jira/browse/SPARK-42683
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Ryan Johnson
>Priority: Major
>
> Today, if a datasource already has a column called `_metadata`, queries 
> cannot access the file-source metadata column that normally carries that 
> name. We can address this conflict with two changes to metadata column 
> handling:
>  # Automatically rename any metadata column whose name conflicts with a data 
> schema column
>  # Add a facility to reliably find metadata columns by their original/logical 
> name, even if they were renamed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697017#comment-17697017
 ] 

Apache Spark commented on SPARK-42683:
--

User 'ryan-johnson-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40300

> Automatically rename metadata columns that conflict with data schema columns
> 
>
> Key: SPARK-42683
> URL: https://issues.apache.org/jira/browse/SPARK-42683
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Ryan Johnson
>Priority: Major
>
> Today, if a datasource already has a column called `_metadata`, queries 
> cannot access the file-source metadata column that normally carries that 
> name. We can address this conflict with two changes to metadata column 
> handling:
>  # Automatically rename any metadata column whose name conflicts with a data 
> schema column
>  # Add a facility to reliably find metadata columns by their original/logical 
> name, even if they were renamed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42684) v2 catalog should not allow column default value by default

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42684:


Assignee: Apache Spark

> v2 catalog should not allow column default value by default
> ---
>
> Key: SPARK-42684
> URL: https://issues.apache.org/jira/browse/SPARK-42684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42684) v2 catalog should not allow column default value by default

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42684:


Assignee: (was: Apache Spark)

> v2 catalog should not allow column default value by default
> ---
>
> Key: SPARK-42684
> URL: https://issues.apache.org/jira/browse/SPARK-42684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42595) Support query inserted partitions after insert data into table when hive.exec.dynamic.partition=true

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42595:


Assignee: (was: Apache Spark)

> Support query inserted partitions after insert data into table when 
> hive.exec.dynamic.partition=true
> 
>
> Key: SPARK-42595
> URL: https://issues.apache.org/jira/browse/SPARK-42595
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: zhang haoyan
>Priority: Major
>
> When hive.exec.dynamic.partition=true and 
> hive.exec.dynamic.partition.mode=nonstrict, we can insert into a table with 
> SQL like 'insert overwrite table aaa partition(dt) select ...'. Of course, 
> the SQL itself tells us which partitions were inserted, but if we want to 
> do something generic with them, we need a common way to get the inserted 
> partitions, for example:
>     spark.sql("insert overwrite table aaa partition(dt) select ...")  
> // insert into the table
>     val partitions = getInsertedPartitions()   // need some way to get the 
> inserted partitions
>     monitorInsertedPartitions(partitions)    // do something generic
> Since an insert statement should not return any data, this ticket proposes 
> to introduce spark.hive.exec.dynamic.partition.savePartitions=true (default 
> false) and 
> spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix=hive_dynamic_inserted_partitions.
> When spark.hive.exec.dynamic.partition.savePartitions=true, we save the 
> inserted partitions to the temporary view 
> $spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix_$dbName_$tableName.
> This would allow the user to do the following:
> scala> spark.conf.set("hive.exec.dynamic.partition", true)
> scala> spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
> scala> spark.conf.set("spark.hive.exec.dynamic.partition.savePartitions", 
> true)
> scala> spark.sql("insert overwrite table db1.test_partition_table partition 
> (dt) select 1, '2023-02-22'").show(false)
> ++
> ||
> ++
> ++
> scala> spark.sql("select * from 
> hive_dynamic_inserted_partitions_db1_test_partition_table").show(false)
> +----------+
> |dt        |
> +----------+
> |2023-02-22|
> +----------+



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42595) Support query inserted partitions after insert data into table when hive.exec.dynamic.partition=true

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697007#comment-17697007
 ] 

Apache Spark commented on SPARK-42595:
--

User 'haoyanzhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40298

> Support query inserted partitions after insert data into table when 
> hive.exec.dynamic.partition=true
> 
>
> Key: SPARK-42595
> URL: https://issues.apache.org/jira/browse/SPARK-42595
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: zhang haoyan
>Priority: Major
>
> When hive.exec.dynamic.partition=true and 
> hive.exec.dynamic.partition.mode=nonstrict, we can insert into a table with 
> SQL like 'insert overwrite table aaa partition(dt) select ...'. Of course, 
> the SQL itself tells us which partitions were inserted, but if we want to 
> do something generic with them, we need a common way to get the inserted 
> partitions, for example:
>     spark.sql("insert overwrite table aaa partition(dt) select ...")  
> // insert into the table
>     val partitions = getInsertedPartitions()   // need some way to get the 
> inserted partitions
>     monitorInsertedPartitions(partitions)    // do something generic
> Since an insert statement should not return any data, this ticket proposes 
> to introduce spark.hive.exec.dynamic.partition.savePartitions=true (default 
> false) and 
> spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix=hive_dynamic_inserted_partitions.
> When spark.hive.exec.dynamic.partition.savePartitions=true, we save the 
> inserted partitions to the temporary view 
> $spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix_$dbName_$tableName.
> This would allow the user to do the following:
> scala> spark.conf.set("hive.exec.dynamic.partition", true)
> scala> spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
> scala> spark.conf.set("spark.hive.exec.dynamic.partition.savePartitions", 
> true)
> scala> spark.sql("insert overwrite table db1.test_partition_table partition 
> (dt) select 1, '2023-02-22'").show(false)
> ++
> ||
> ++
> ++
> scala> spark.sql("select * from 
> hive_dynamic_inserted_partitions_db1_test_partition_table").show(false)
> +----------+
> |dt        |
> +----------+
> |2023-02-22|
> +----------+



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42595) Support query inserted partitions after insert data into table when hive.exec.dynamic.partition=true

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42595:


Assignee: Apache Spark

> Support query inserted partitions after insert data into table when 
> hive.exec.dynamic.partition=true
> 
>
> Key: SPARK-42595
> URL: https://issues.apache.org/jira/browse/SPARK-42595
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: zhang haoyan
>Assignee: Apache Spark
>Priority: Major
>
> When hive.exec.dynamic.partition=true and 
> hive.exec.dynamic.partition.mode=nonstrict, we can insert into a table with 
> SQL like 'insert overwrite table aaa partition(dt) select ...'. Of course, 
> the SQL itself tells us which partitions were inserted, but if we want to 
> do something generic with them, we need a common way to get the inserted 
> partitions, for example:
>     spark.sql("insert overwrite table aaa partition(dt) select ...")  
> // insert into the table
>     val partitions = getInsertedPartitions()   // need some way to get the 
> inserted partitions
>     monitorInsertedPartitions(partitions)    // do something generic
> Since an insert statement should not return any data, this ticket proposes 
> to introduce spark.hive.exec.dynamic.partition.savePartitions=true (default 
> false) and 
> spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix=hive_dynamic_inserted_partitions.
> When spark.hive.exec.dynamic.partition.savePartitions=true, we save the 
> inserted partitions to the temporary view 
> $spark.hive.exec.dynamic.partition.savePartitions.tableNamePrefix_$dbName_$tableName.
> This would allow the user to do the following:
> scala> spark.conf.set("hive.exec.dynamic.partition", true)
> scala> spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
> scala> spark.conf.set("spark.hive.exec.dynamic.partition.savePartitions", 
> true)
> scala> spark.sql("insert overwrite table db1.test_partition_table partition 
> (dt) select 1, '2023-02-22'").show(false)
> ++
> ||
> ++
> ++
> scala> spark.sql("select * from 
> hive_dynamic_inserted_partitions_db1_test_partition_table").show(false)
> +----------+
> |dt        |
> +----------+
> |2023-02-22|
> +----------+



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42684) v2 catalog should not allow column default value by default

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697006#comment-17697006
 ] 

Apache Spark commented on SPARK-42684:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40299

> v2 catalog should not allow column default value by default
> ---
>
> Key: SPARK-42684
> URL: https://issues.apache.org/jira/browse/SPARK-42684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42412) Initial prototype implementation for PySparkML

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696957#comment-17696957
 ] 

Apache Spark commented on SPARK-42412:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/40297

> Initial prototype implementation for PySparkML
> --
>
> Key: SPARK-42412
> URL: https://issues.apache.org/jira/browse/SPARK-42412
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42680) Create the helper function withSQLConf for connect's test

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42680:


Assignee: (was: Apache Spark)

> Create the helper function withSQLConf for connect's test
> -
>
> Key: SPARK-42680
> URL: https://issues.apache.org/jira/browse/SPARK-42680
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL has the helper function withSQLConf, which makes it easy to 
> change SQL configs within a test and keeps tests simple.
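For reference, a minimal sketch of the helper's shape, modeled on Spark SQL's SQLHelper.withSQLConf (assumes a SparkSession named spark in scope): set the given configs, run the body, and restore the previous values even if the body throws.

{code:scala}
def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
  val conf = spark.conf
  // Remember the previous value (if any) for each key we are about to set.
  val previous = pairs.map { case (k, _) => k -> conf.getOption(k) }
  pairs.foreach { case (k, v) => conf.set(k, v) }
  try f finally {
    previous.foreach {
      case (k, Some(v)) => conf.set(k, v)
      case (k, None)    => conf.unset(k)
    }
  }
}

// Usage:
// withSQLConf("spark.sql.ansi.enabled" -> "true") { /* assertions */ }
{code}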



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42680) Create the helper function withSQLConf for connect's test

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42680:


Assignee: Apache Spark

> Create the helper function withSQLConf for connect's test
> -
>
> Key: SPARK-42680
> URL: https://issues.apache.org/jira/browse/SPARK-42680
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Spark SQL has the helper function withSQLConf, which makes it easy to 
> change SQL configs within a test and keeps tests simple.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42680) Create the helper function withSQLConf for connect's test

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696824#comment-17696824
 ] 

Apache Spark commented on SPARK-42680:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40296

> Create the helper function withSQLConf for connect's test
> -
>
> Key: SPARK-42680
> URL: https://issues.apache.org/jira/browse/SPARK-42680
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL has the helper function withSQLConf, which makes it easy to 
> change SQL configs within a test and keeps tests simple.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42681) Relax ordering constraint for ALTER TABLE ADD|REPLACE column options

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42681:


Assignee: Apache Spark

> Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
> 
>
> Key: SPARK-42681
> URL: https://issues.apache.org/jira/browse/SPARK-42681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vitalii Li
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently the grammar for ALTER TABLE ADD|REPLACE columns is:
> qualifiedColTypeWithPosition
> : name=multipartIdentifier dataType (NOT NULL)? defaultExpression? 
> commentSpec? colPosition?
> ;
> This enforces a fixed order on the options (NOT NULL, DEFAULT value, 
> COMMENT value, FIRST|AFTER value). We can update the grammar to allow these 
> options in any order instead, to improve usability.
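As a sketch of the intended usability improvement (the table and column names are hypothetical), both of the following should parse once the ordering is relaxed, whereas today only the grammar's fixed order is accepted:

{code:scala}
// Options in the grammar's current fixed order.
spark.sql(
  "ALTER TABLE t ADD COLUMN c1 INT NOT NULL DEFAULT 0 COMMENT 'count' AFTER id")

// Same options in a different order, which the relaxed grammar would accept.
spark.sql(
  "ALTER TABLE t ADD COLUMN c2 INT COMMENT 'count' DEFAULT 0 NOT NULL AFTER id")
{code}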



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42681) Relax ordering constraint for ALTER TABLE ADD|REPLACE column options

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42681:


Assignee: (was: Apache Spark)

> Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
> 
>
> Key: SPARK-42681
> URL: https://issues.apache.org/jira/browse/SPARK-42681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vitalii Li
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently the grammar for ALTER TABLE ADD|REPLACE columns is:
> qualifiedColTypeWithPosition
> : name=multipartIdentifier dataType (NOT NULL)? defaultExpression? 
> commentSpec? colPosition?
> ;
> This enforces a fixed order on the options (NOT NULL, DEFAULT value, 
> COMMENT value, FIRST|AFTER value). We can update the grammar to allow these 
> options in any order instead, to improve usability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42681) Relax ordering constraint for ALTER TABLE ADD|REPLACE column options

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696767#comment-17696767
 ] 

Apache Spark commented on SPARK-42681:
--

User 'vitaliili-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40295

> Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
> 
>
> Key: SPARK-42681
> URL: https://issues.apache.org/jira/browse/SPARK-42681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vitalii Li
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently the grammar for ALTER TABLE ADD|REPLACE columns is:
> qualifiedColTypeWithPosition
> : name=multipartIdentifier dataType (NOT NULL)? defaultExpression? 
> commentSpec? colPosition?
> ;
> This enforces a fixed order on the options (NOT NULL, DEFAULT value, 
> COMMENT value, FIRST|AFTER value). We can update the grammar to allow these 
> options in any order instead, to improve usability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40610) Spark falls back to using getPartitions instead of getPartitionsByFilter when date_add functions are used in the where clause

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696716#comment-17696716
 ] 

Apache Spark commented on SPARK-40610:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40294

> Spark falls back to using getPartitions instead of getPartitionsByFilter 
> when date_add functions are used in the where clause
> ---
>
> Key: SPARK-40610
> URL: https://issues.apache.org/jira/browse/SPARK-40610
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
> Environment: edw.tmp_test_metastore_usage_source is a big table with 
> 1000 partitions and hundreds of columns
>Reporter: icyjhl
>Priority: Major
> Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
>
> When I run an insert overwrite statement, I get an error saying:
>  
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> listPartitions {code}
>  
> It's weird, as I only selected about 3 partitions, so I reran the SQL and 
> checked the metastore, and found it was fetching all columns in all 
> partitions:
>  
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where 
> "CD_ID" 
> in 
> (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
>  
> After testing, I found the problem is with the date_add function in the 
> where clause: if I remove it, the SQL works fine; otherwise the metastore 
> fetches all columns in all partitions.
>  
>  
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
>     ,SUBSTR(sendtime,1,10) AS creation_date
>     ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_max
>     ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
>     ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_min
>     ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
>     ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
>     ,json_bh_industryphyname AS bh_industryphyname
>     ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
>     ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS 
> bh_industryphyname_nunique
>     ,cast(current_timestamp() as string) as dw_cre_date
>     ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
>     SELECT userid
>         ,sendtime
>         ,json_bh_esdate_deltadays_max
>         ,json_bh_qiye_industryphyname
>         ,json_bh_esdate_deltadays_min
>         ,json_bh_subconam_min
>         ,json_bh_qiye_regcap_min
>         ,json_bh_industryphyname
>         ,json_bh_subconam_mean
>         ,json_bh_industryphyname_nunique
>         ,row_number() OVER (
>             PARTITION BY userid,dt ORDER BY sendtime DESC
>             ) rn
>     FROM edw.tmp_test_metastore_usage_source
>     WHERE dt >= date_add('2022-09-22',-3 )
>         AND json_bizid IN ('6101')
>         AND json_dingid IN ('611')
>     ) t
> WHERE rn = 1 {code}
>  
>  By the way, 2.4.7 works fine.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40610) Spark falls back to using getPartitions instead of getPartitionsByFilter when date_add functions are used in the where clause

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40610:


Assignee: Apache Spark

> Spark falls back to using getPartitions instead of getPartitionsByFilter 
> when date_add functions are used in the where clause
> ---
>
> Key: SPARK-40610
> URL: https://issues.apache.org/jira/browse/SPARK-40610
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
> Environment: edw.tmp_test_metastore_usage_source is a big table with 
> 1000 partitions and hundreds of columns
>Reporter: icyjhl
>Assignee: Apache Spark
>Priority: Major
> Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
>
> When I run an insert overwrite statement, I get an error saying:
>  
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> listPartitions {code}
>  
> It's weird, as I only selected about 3 partitions, so I reran the SQL and 
> checked the metastore, and found it was fetching all columns in all 
> partitions:
>  
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where 
> "CD_ID" 
> in 
> (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
>  
> After testing, I found the problem is with the date_add function in the 
> where clause: if I remove it, the SQL works fine; otherwise the metastore 
> fetches all columns in all partitions.
>  
>  
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
>     ,SUBSTR(sendtime,1,10) AS creation_date
>     ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_max
>     ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
>     ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_min
>     ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
>     ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
>     ,json_bh_industryphyname AS bh_industryphyname
>     ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
>     ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS 
> bh_industryphyname_nunique
>     ,cast(current_timestamp() as string) as dw_cre_date
>     ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
>     SELECT userid
>         ,sendtime
>         ,json_bh_esdate_deltadays_max
>         ,json_bh_qiye_industryphyname
>         ,json_bh_esdate_deltadays_min
>         ,json_bh_subconam_min
>         ,json_bh_qiye_regcap_min
>         ,json_bh_industryphyname
>         ,json_bh_subconam_mean
>         ,json_bh_industryphyname_nunique
>         ,row_number() OVER (
>             PARTITION BY userid,dt ORDER BY sendtime DESC
>             ) rn
>     FROM edw.tmp_test_metastore_usage_source
>     WHERE dt >= date_add('2022-09-22',-3 )
>         AND json_bizid IN ('6101')
>         AND json_dingid IN ('611')
>     ) t
> WHERE rn = 1 {code}
>  
>  By the way, 2.4.7 works fine.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40610) Spark falls back to using getPartitions instead of getPartitionsByFilter when date_add functions are used in the where clause

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40610:


Assignee: (was: Apache Spark)

> Spark falls back to using getPartitions instead of getPartitionsByFilter 
> when date_add functions are used in the where clause
> ---
>
> Key: SPARK-40610
> URL: https://issues.apache.org/jira/browse/SPARK-40610
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
> Environment: edw.tmp_test_metastore_usage_source is a big table with 
> 1000 partitions and hundreds of columns
>Reporter: icyjhl
>Priority: Major
> Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
>
> When I run an insert overwrite statement, I get an error saying:
>  
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> listPartitions {code}
>  
> It's odd, since I only selected about 3 partitions, so I reran the SQL and 
> checked the metastore, and found it was fetching all columns in all 
> partitions:
>  
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where 
> "CD_ID" 
> in 
> (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
>  
>  
> After testing, I found the problem is the date_add function in the WHERE 
> clause: if I remove it, the SQL works fine; otherwise the metastore fetches 
> all columns in all partitions.
>  
>  
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
>     ,SUBSTR(sendtime,1,10) AS creation_date
>     ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_max
>     ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
>     ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_min
>     ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
>     ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
>     ,json_bh_industryphyname AS bh_industryphyname
>     ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
>     ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS 
> bh_industryphyname_nunique
>     ,cast(current_timestamp() as string) as dw_cre_date
>     ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
>     SELECT userid
>         ,sendtime
>         ,json_bh_esdate_deltadays_max
>         ,json_bh_qiye_industryphyname
>         ,json_bh_esdate_deltadays_min
>         ,json_bh_subconam_min
>         ,json_bh_qiye_regcap_min
>         ,json_bh_industryphyname
>         ,json_bh_subconam_mean
>         ,json_bh_industryphyname_nunique
>         ,row_number() OVER (
>             PARTITION BY userid,dt ORDER BY sendtime DESC
>             ) rn
>     FROM edw.tmp_test_metastore_usage_source
>     WHERE dt >= date_add('2022-09-22',-3 )
>         AND json_bizid IN ('6101')
>         AND json_dingid IN ('611')
>     ) t
> WHERE rn = 1 {code}
>  
>  By the way, 2.4.7 works fine.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40610) Spark fall back to use getPartitions instead of getPartitionsByFilter when date_add functions used in where clause

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696715#comment-17696715
 ] 

Apache Spark commented on SPARK-40610:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40294

> Spark fall back to use getPartitions instead of getPartitionsByFilter when 
> date_add functions used in where clause 
> ---
>
> Key: SPARK-40610
> URL: https://issues.apache.org/jira/browse/SPARK-40610
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
> Environment: edw.tmp_test_metastore_usage_source is a big table with 
> 1000 partitions and hundreds of columns
>Reporter: icyjhl
>Priority: Major
> Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
>
> When I run an insert overwrite statement, I get an error saying:
>  
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. 
> listPartitions {code}
>  
> It's odd, since I only selected about 3 partitions, so I reran the SQL and 
> checked the metastore, and found it was fetching all columns in all 
> partitions:
>  
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where 
> "CD_ID" 
> in 
> (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
>  
>  
> After testing, I found the problem is the date_add function in the WHERE 
> clause: if I remove it, the SQL works fine; otherwise the metastore fetches 
> all columns in all partitions.
>  
>  
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
>     ,SUBSTR(sendtime,1,10) AS creation_date
>     ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_max
>     ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
>     ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS 
> bh_esdate_deltadays_min
>     ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
>     ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
>     ,json_bh_industryphyname AS bh_industryphyname
>     ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
>     ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS 
> bh_industryphyname_nunique
>     ,cast(current_timestamp() as string) as dw_cre_date
>     ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
>     SELECT userid
>         ,sendtime
>         ,json_bh_esdate_deltadays_max
>         ,json_bh_qiye_industryphyname
>         ,json_bh_esdate_deltadays_min
>         ,json_bh_subconam_min
>         ,json_bh_qiye_regcap_min
>         ,json_bh_industryphyname
>         ,json_bh_subconam_mean
>         ,json_bh_industryphyname_nunique
>         ,row_number() OVER (
>             PARTITION BY userid,dt ORDER BY sendtime DESC
>             ) rn
>     FROM edw.tmp_test_metastore_usage_source
>     WHERE dt >= date_add('2022-09-22',-3 )
>         AND json_bizid IN ('6101')
>         AND json_dingid IN ('611')
>     ) t
> WHERE rn = 1 {code}
>  
>  By the way, 2.4.7 works fine.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42677) Fix the invalid tests for broadcast hint

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696687#comment-17696687
 ] 

Apache Spark commented on SPARK-42677:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40293

> Fix the invalid tests for broadcast hint
> 
>
> Key: SPARK-42677
> URL: https://issues.apache.org/jira/browse/SPARK-42677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, a lot of the test cases for the broadcast hint are invalid 
> because the data size is smaller than the broadcast threshold.
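
For context, a minimal sketch of what such a test exercises; `large` and
`small` are assumed DataFrames, and 10 MB is the default value of
spark.sql.autoBroadcastJoinThreshold:

{code:scala}
import org.apache.spark.sql.functions.broadcast

// If `small` already falls under spark.sql.autoBroadcastJoinThreshold
// (10 MB by default), the planner broadcasts it even without the hint,
// so a test asserting on the hint's effect proves nothing.
val joined = large.join(broadcast(small), Seq("id"))
{code}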



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42677) Fix the invalid tests for broadcast hint

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42677:


Assignee: Apache Spark

> Fix the invalid tests for broadcast hint
> 
>
> Key: SPARK-42677
> URL: https://issues.apache.org/jira/browse/SPARK-42677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Currently, a lot of the test cases for the broadcast hint are invalid 
> because the data size is smaller than the broadcast threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42677) Fix the invalid tests for broadcast hint

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696686#comment-17696686
 ] 

Apache Spark commented on SPARK-42677:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40293

> Fix the invalid tests for broadcast hint
> 
>
> Key: SPARK-42677
> URL: https://issues.apache.org/jira/browse/SPARK-42677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, a lot of the test cases for the broadcast hint are invalid 
> because the data size is smaller than the broadcast threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42677) Fix the invalid tests for broadcast hint

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42677:


Assignee: (was: Apache Spark)

> Fix the invalid tests for broadcast hint
> 
>
> Key: SPARK-42677
> URL: https://issues.apache.org/jira/browse/SPARK-42677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, a lot of the test cases for the broadcast hint are invalid 
> because the data size is smaller than the broadcast threshold.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696680#comment-17696680
 ] 

Apache Spark commented on SPARK-42676:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40292

> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
> --
>
> Key: SPARK-42676
> URL: https://issues.apache.org/jira/browse/SPARK-42676
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
>  
> We have seen cases where the default FS can be a remote file system, and 
> since the path for streaming checkpoints is not specified explicitly, this 
> can cause checkpoint pileup in two cases:
>  * the query exits with an exception and the flag to force checkpoint 
> removal is not set
>  * the driver/cluster terminates without the query being stopped gracefully
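
As a stop-gap until a fix lands, a query can pin its checkpoint explicitly
instead of relying on a temporary path on the default filesystem. A minimal
sketch; the rate source and the local path are illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-demo").getOrCreate()
val events = spark.readStream.format("rate").load()

// An explicit checkpointLocation sidesteps the temp-checkpoint path entirely,
// so nothing piles up on a remote default FS if the query dies ungracefully.
val query = events.writeStream
  .format("console")
  .option("checkpointLocation", "file:/tmp/rate-demo-checkpoint")
  .start()
{code}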



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696678#comment-17696678
 ] 

Apache Spark commented on SPARK-42676:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40292

> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
> --
>
> Key: SPARK-42676
> URL: https://issues.apache.org/jira/browse/SPARK-42676
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
>  
> We have seen cases where the default FS can be a remote file system, and 
> since the path for streaming checkpoints is not specified explicitly, this 
> can cause checkpoint pileup in two cases:
>  * the query exits with an exception and the flag to force checkpoint 
> removal is not set
>  * the driver/cluster terminates without the query being stopped gracefully



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42676:


Assignee: (was: Apache Spark)

> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
> --
>
> Key: SPARK-42676
> URL: https://issues.apache.org/jira/browse/SPARK-42676
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
>  
> We have seen cases where the default FS can be a remote file system, and 
> since the path for streaming checkpoints is not specified explicitly, this 
> can cause checkpoint pileup in two cases:
>  * the query exits with an exception and the flag to force checkpoint 
> removal is not set
>  * the driver/cluster terminates without the query being stopped gracefully



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42676) Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42676:


Assignee: Apache Spark

> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
> --
>
> Key: SPARK-42676
> URL: https://issues.apache.org/jira/browse/SPARK-42676
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Assignee: Apache Spark
>Priority: Major
>
> Write temp checkpoints for streaming queries to local filesystem even if 
> default FS is set differently
>  
> We have seen cases where the default FS can be a remote file system, and 
> since the path for streaming checkpoints is not specified explicitly, this 
> can cause checkpoint pileup in two cases:
>  * the query exits with an exception and the flag to force checkpoint 
> removal is not set
>  * the driver/cluster terminates without the query being stopped gracefully



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42578) Add JDBC to DataFrameWriter

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42578:


Assignee: (was: Apache Spark)

> Add JDBC to DataFrameWriter
> ---
>
> Key: SPARK-42578
> URL: https://issues.apache.org/jira/browse/SPARK-42578
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42578) Add JDBC to DataFrameWriter

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42578:


Assignee: Apache Spark

> Add JDBC to DataFrameWriter
> ---
>
> Key: SPARK-42578
> URL: https://issues.apache.org/jira/browse/SPARK-42578
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42578) Add JDBC to DataFrameWriter

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696671#comment-17696671
 ] 

Apache Spark commented on SPARK-42578:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40291

> Add JDBC to DataFrameWriter
> ---
>
> Key: SPARK-42578
> URL: https://issues.apache.org/jira/browse/SPARK-42578
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42478) Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696668#comment-17696668
 ] 

Apache Spark commented on SPARK-42478:
--

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/40289

> Make a serializable jobTrackerId instead of a non-serializable JobID in 
> FileWriterFactory
> -
>
> Key: SPARK-42478
> URL: https://issues.apache.org/jira/browse/SPARK-42478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: yikaifei
>Assignee: yikaifei
>Priority: Major
> Fix For: 3.4.0
>
>
> https://issues.apache.org/jira/browse/SPARK-41448 made the MR job IDs 
> consistent in FileBatchWriter and FileFormatWriter, but it introduced a 
> serialization issue: JobID is non-serializable
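
The general shape of the fix, sketched under the assumption that the factory
only needs the tracker identifier: ship a plain String (which serializes
fine) and rebuild the Hadoop JobID lazily on the executor. `WriterFactory`
below is a hypothetical stand-in for FileWriterFactory:

{code:scala}
import org.apache.hadoop.mapreduce.JobID

// The String field is serializable; the JobID is rebuilt on demand and
// excluded from serialization via @transient.
case class WriterFactory(jobTrackerId: String) {
  @transient private lazy val jobId: JobID = new JobID(jobTrackerId, 0)
}
{code}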



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42478) Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696667#comment-17696667
 ] 

Apache Spark commented on SPARK-42478:
--

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/40290

> Make a serializable jobTrackerId instead of a non-serializable JobID in 
> FileWriterFactory
> -
>
> Key: SPARK-42478
> URL: https://issues.apache.org/jira/browse/SPARK-42478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: yikaifei
>Assignee: yikaifei
>Priority: Major
> Fix For: 3.4.0
>
>
> https://issues.apache.org/jira/browse/SPARK-41448 made the MR job IDs 
> consistent in FileBatchWriter and FileFormatWriter, but it introduced a 
> serialization issue: JobID is non-serializable



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42478) Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1769#comment-1769
 ] 

Apache Spark commented on SPARK-42478:
--

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/40289

> Make a serializable jobTrackerId instead of a non-serializable JobID in 
> FileWriterFactory
> -
>
> Key: SPARK-42478
> URL: https://issues.apache.org/jira/browse/SPARK-42478
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: yikaifei
>Assignee: yikaifei
>Priority: Major
> Fix For: 3.4.0
>
>
> https://issues.apache.org/jira/browse/SPARK-41448 made the MR job IDs 
> consistent in FileBatchWriter and FileFormatWriter, but it introduced a 
> serialization issue: JobID is non-serializable



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42496) Introduction Spark Connect at main page.

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42496:


Assignee: (was: Apache Spark)

> Introduction Spark Connect at main page.
> 
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should document an introduction to Spark Connect on the PySpark main 
> documentation page to give users a summary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42496) Introduction Spark Connect at main page.

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42496:


Assignee: Apache Spark

> Introduction Spark Connect at main page.
> 
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should document an introduction to Spark Connect on the PySpark main 
> documentation page to give users a summary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42496) Introduction Spark Connect at main page.

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696655#comment-17696655
 ] 

Apache Spark commented on SPARK-42496:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40288

> Introduction Spark Connect at main page.
> 
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should document an introduction to Spark Connect on the PySpark main 
> documentation page to give users a summary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42562:


Assignee: (was: Apache Spark)

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in Python. We already did 
> this for the Scala client, and it is good to have parity between the two 
> implementations.
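
For reference, a lambda variable appears whenever a higher-order function is
used; a minimal Scala-client example (`spark` is an assumed active session):

{code:scala}
import org.apache.spark.sql.functions._

val df = spark.range(1).select(array(lit(1), lit(2), lit(3)).as("xs"))

// `x` becomes an UnresolvedLambdaVariable in the unresolved plan; analysis
// scopes it to this transform call, so a globally unique name is unnecessary.
df.select(transform(col("xs"), x => x + 1)).show()
{code}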



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696647#comment-17696647
 ] 

Apache Spark commented on SPARK-42562:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40287

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in Python. We already did 
> this for the Scala client, and it is good to have parity between the two 
> implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42562:


Assignee: Apache Spark

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in Python. We already did 
> this for the Scala client, and it is good to have parity between the two 
> implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696570#comment-17696570
 ] 

Apache Spark commented on SPARK-42577:
--

User 'ivoson' has created a pull request for this issue:
https://github.com/apache/spark/pull/40286

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage can run 
> indefinitely due to task reruns caused by the executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage (the large one) is missing some partitions, so the large stage has to 
> rerun. When it completes again, it finds new missing partitions for the same 
> reason.
> We should add an attempt limit for this kind of scenario.
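
For comparison, Spark already caps consecutive stage attempts triggered by
fetch failures; a sketch of raising that existing knob is below (the
configuration name is the existing one, and the ticket proposes an analogous
overall limit):

{code:scala}
import org.apache.spark.SparkConf

// spark.stage.maxConsecutiveAttempts (default 4) only bounds consecutive
// retries triggered by fetch failures; the scenario above also needs an
// overall cap on stage attempts.
val conf = new SparkConf()
  .set("spark.stage.maxConsecutiveAttempts", "8")
{code}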



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42577:


Assignee: (was: Apache Spark)

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage can run 
> indefinitely due to task reruns caused by the executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage (the large one) is missing some partitions, so the large stage has to 
> rerun. When it completes again, it finds new missing partitions for the same 
> reason.
> We should add an attempt limit for this kind of scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696569#comment-17696569
 ] 

Apache Spark commented on SPARK-42577:
--

User 'ivoson' has created a pull request for this issue:
https://github.com/apache/spark/pull/40286

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage can run 
> indefinitely due to task reruns caused by the executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage (the large one) is missing some partitions, so the large stage has to 
> rerun. When it completes again, it finds new missing partitions for the same 
> reason.
> We should add an attempt limit for this kind of scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42577:


Assignee: Apache Spark

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage can run 
> indefinitely due to task reruns caused by the executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage (the large one) is missing some partitions, so the large stage has to 
> rerun. When it completes again, it finds new missing partitions for the same 
> reason.
> We should add an attempt limit for this kind of scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42675) Should clean up temp view after test

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42675:


Assignee: (was: Apache Spark)

> Should clean up temp view after test
> 
>
> Key: SPARK-42675
> URL: https://issues.apache.org/jira/browse/SPARK-42675
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42675) Should clean up temp view after test

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696564#comment-17696564
 ] 

Apache Spark commented on SPARK-42675:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40285

> Should clean up temp view after test
> 
>
> Key: SPARK-42675
> URL: https://issues.apache.org/jira/browse/SPARK-42675
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42675) Should clean up temp view after test

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42675:


Assignee: Apache Spark

> Should clean up temp view after test
> 
>
> Key: SPARK-42675
> URL: https://issues.apache.org/jira/browse/SPARK-42675
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42675) Should clean up temp view after test

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42675:


Assignee: Apache Spark

> Should clean up temp view after test
> 
>
> Key: SPARK-42675
> URL: https://issues.apache.org/jira/browse/SPARK-42675
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42674) Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42674:


Assignee: (was: Apache Spark)

>  Upgrade scalafmt from 3.7.1 to 3.7.2
> -
>
> Key: SPARK-42674
> URL: https://issues.apache.org/jira/browse/SPARK-42674
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42674) Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42674:


Assignee: Apache Spark

>  Upgrade scalafmt from 3.7.1 to 3.7.2
> -
>
> Key: SPARK-42674
> URL: https://issues.apache.org/jira/browse/SPARK-42674
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42674) Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696539#comment-17696539
 ] 

Apache Spark commented on SPARK-42674:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40284

>  Upgrade scalafmt from 3.7.1 to 3.7.2
> -
>
> Key: SPARK-42674
> URL: https://issues.apache.org/jira/browse/SPARK-42674
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42673) Ban maven 3.9.x for Spark Build

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42673:


Assignee: (was: Apache Spark)

> Ban maven 3.9.x for Spark Build
> ---
>
> Key: SPARK-42673
> URL: https://issues.apache.org/jira/browse/SPARK-42673
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [ERROR] An error occurred attempting to read POM
> org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml 
> decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen <?xml version="1.0" encoding="ISO-8859-1"... @1:42) 
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion 
> (MXParser.java:3423)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl 
> (MXParser.java:3345)
> at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog 
> (MXParser.java:1828)
> at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl 
> (MXParser.java:1757)
> at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:3940)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:612)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:627)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom 
> (BaseCycloneDxMojo.java:759)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom 
> (BaseCycloneDxMojo.java:746)
> at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject 
> (BaseCycloneDxMojo.java:694)
> at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata 
> (BaseCycloneDxMojo.java:524)
> at org.cyclonedx.maven.BaseCycloneDxMojo.convert 
> (BaseCycloneDxMojo.java:481)
> at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
> (DefaultBuildPluginManager.java:126)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 
> (MojoExecutor.java:342)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute 
> (MojoExecutor.java:330)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:213)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:175)
> at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 
> (MojoExecutor.java:76)
> at org.apache.maven.lifecycle.internal.MojoExecutor$1.run 
> (MojoExecutor.java:163)
> at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute 
> (DefaultMojosExecutionStrategy.java:39)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:160)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:105)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:73)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
>  (SingleThreadedBuilder.java:53)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
> (LifecycleStarter.java:118)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
> (Launcher.java:282)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
> (Launcher.java:225)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
> (Launcher.java:406)
> at org.codehaus.plexus.classworlds.launcher.Launcher.main 
> (Launcher.java:347) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42673) Ban maven 3.9.x for Spark Build

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42673:


Assignee: Apache Spark

> Ban maven 3.9.x for Spark Build
> ---
>
> Key: SPARK-42673
> URL: https://issues.apache.org/jira/browse/SPARK-42673
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> [ERROR] An error occurred attempting to read POM
> org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml 
> decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen <?xml version="1.0" encoding="ISO-8859-1"... @1:42) 
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion 
> (MXParser.java:3423)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl 
> (MXParser.java:3345)
> at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog 
> (MXParser.java:1828)
> at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl 
> (MXParser.java:1757)
> at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:3940)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:612)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:627)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom 
> (BaseCycloneDxMojo.java:759)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom 
> (BaseCycloneDxMojo.java:746)
> at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject 
> (BaseCycloneDxMojo.java:694)
> at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata 
> (BaseCycloneDxMojo.java:524)
> at org.cyclonedx.maven.BaseCycloneDxMojo.convert 
> (BaseCycloneDxMojo.java:481)
> at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
> (DefaultBuildPluginManager.java:126)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 
> (MojoExecutor.java:342)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute 
> (MojoExecutor.java:330)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:213)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:175)
> at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 
> (MojoExecutor.java:76)
> at org.apache.maven.lifecycle.internal.MojoExecutor$1.run 
> (MojoExecutor.java:163)
> at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute 
> (DefaultMojosExecutionStrategy.java:39)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:160)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:105)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:73)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
>  (SingleThreadedBuilder.java:53)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
> (LifecycleStarter.java:118)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
> (Launcher.java:282)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
> (Launcher.java:225)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
> (Launcher.java:406)
> at org.codehaus.plexus.classworlds.launcher.Launcher.main 
> (Launcher.java:347) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42673) Ban maven 3.9.x for Spark Build

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696537#comment-17696537
 ] 

Apache Spark commented on SPARK-42673:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40283

> Ban maven 3.9.x for Spark Build
> ---
>
> Key: SPARK-42673
> URL: https://issues.apache.org/jira/browse/SPARK-42673
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> [ERROR] An error occurred attempting to read POM
> org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml 
> decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen <?xml version="1.0" encoding="ISO-8859-1"... @1:42) 
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion 
> (MXParser.java:3423)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl 
> (MXParser.java:3345)
> at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog 
> (MXParser.java:1828)
> at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl 
> (MXParser.java:1757)
> at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:3940)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:612)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read 
> (MavenXpp3Reader.java:627)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom 
> (BaseCycloneDxMojo.java:759)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom 
> (BaseCycloneDxMojo.java:746)
> at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject 
> (BaseCycloneDxMojo.java:694)
> at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata 
> (BaseCycloneDxMojo.java:524)
> at org.cyclonedx.maven.BaseCycloneDxMojo.convert 
> (BaseCycloneDxMojo.java:481)
> at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
> (DefaultBuildPluginManager.java:126)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 
> (MojoExecutor.java:342)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute 
> (MojoExecutor.java:330)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:213)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:175)
> at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 
> (MojoExecutor.java:76)
> at org.apache.maven.lifecycle.internal.MojoExecutor$1.run 
> (MojoExecutor.java:163)
> at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute 
> (DefaultMojosExecutionStrategy.java:39)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:160)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:105)
> at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
> (LifecycleModuleBuilder.java:73)
> at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
>  (SingleThreadedBuilder.java:53)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
> (LifecycleStarter.java:118)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
> (Launcher.java:282)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch 
> (Launcher.java:225)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
> (Launcher.java:406)
> at org.codehaus.plexus.classworlds.launcher.Launcher.main 
> (Launcher.java:347) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-42672) Document error class list

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696536#comment-17696536
 ] 

Apache Spark commented on SPARK-42672:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40282

> Document error class list
> -
>
> Key: SPARK-42672
> URL: https://issues.apache.org/jira/browse/SPARK-42672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42672) Document error class list

2023-03-05 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696535#comment-17696535
 ] 

Apache Spark commented on SPARK-42672:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40282

> Document error class list
> -
>
> Key: SPARK-42672
> URL: https://issues.apache.org/jira/browse/SPARK-42672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42672) Document error class list

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42672:


Assignee: (was: Apache Spark)

> Document error class list
> -
>
> Key: SPARK-42672
> URL: https://issues.apache.org/jira/browse/SPARK-42672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42672) Document error class list

2023-03-05 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42672:


Assignee: Apache Spark

> Document error class list
> -
>
> Key: SPARK-42672
> URL: https://issues.apache.org/jira/browse/SPARK-42672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache

2023-03-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696526#comment-17696526
 ] 

Apache Spark commented on SPARK-41497:
--

User 'ivoson' has created a pull request for this issue:
https://github.com/apache/spark/pull/40281

> Accumulator undercounting in the case of retry task with rdd cache
> --
>
> Key: SPARK-41497
> URL: https://issues.apache.org/jira/browse/SPARK-41497
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Assignee: Tengfei Huang
>Priority: Major
> Fix For: 3.5.0
>
>
> An accumulator can be undercounted when a retried task has an rdd cache. See 
> the example below; a complete and reproducible example is available at 
> [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc]
>   
> {code:scala}
> test("SPARK-XXX") {
>   // Set up a cluster with 2 executors.
>   val conf = new SparkConf()
>     .setMaster("local-cluster[2, 1, 1024]")
>     .setAppName("TaskSchedulerImplSuite")
>   sc = new SparkContext(conf)
>   // Set up a custom task scheduler that fails the first task attempt of the
>   // job submitted below. The failed first attempt succeeds in its
>   // computation (accumulator accounting, result caching) but fails to
>   // report that success because of a concurrent executor loss. The second
>   // task attempt succeeds.
>   taskScheduler = setupSchedulerWithCustomStatusUpdate(sc)
>   val myAcc = sc.longAccumulator("myAcc")
>   // Create an rdd with a single partition so there is only one task, and
>   // persist it with MEMORY_ONLY_2 so the rdd result is cached on both
>   // executors.
>   val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter =>
>     myAcc.add(100)
>     iter.map(x => x + 1)
>   }.persist(StorageLevel.MEMORY_ONLY_2)
>   // This passes because the second task attempt succeeds.
>   assert(rdd.count() === 10)
>   // This fails: `myAcc.add(100)` is not executed during the second task
>   // attempt, which loads the rdd cache directly instead of running the
>   // task function, so the update is skipped.
>   assert(myAcc.value === 100)
> } {code}
>  
> We can also hit this issue with decommissioning, even if the rdd has only 
> one copy. For example, decommissioning can migrate the rdd cache block to 
> another executor (the effect is the same as having 2 copies), and the 
> decommissioned executor can be lost before the task reports its success 
> status to the driver.
>  
> The issue is also more complicated to fix than expected. I have tried 
> several fixes, but none of them is ideal:
> Option 1: Clean up any rdd cache related to the failed task. In practice, 
> this already fixes the issue in most cases. Theoretically, though, an rdd 
> cache block could be reported to the driver right after the driver cleans 
> up the failed task's caches, due to asynchronous communication, so this 
> option cannot resolve the issue thoroughly;
> Option 2: Disallow rdd cache reuse across task attempts for the same task. 
> This option fixes the issue completely, but it also affects cases where the 
> rdd cache could safely be reused across attempts (e.g., when there is no 
> accumulator operation in the task), which can cause a performance 
> regression;
> Option 3: Introduce an accumulator cache. First, this requires a new 
> framework for supporting accumulator caching; second, the driver would need 
> extra logic to decide whether a cached accumulator value should be reported 
> to the user, to avoid overcounting. For example, in the case of a task 
> retry the value should be reported, but in the case of rdd cache reuse it 
> shouldn't be (should it?);
> Option 4: Validate task success when a task tries to load the rdd cache. 
> This defines an rdd cache as valid/accessible only if the producing task 
> succeeded. It could be either overkill or somewhat complex, because Spark 
> currently cleans up task state once a task finishes, so we would need to 
> maintain a structure recording whether a task ever succeeded.
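
To make the mechanism concrete outside of Spark, here is a minimal, self-contained sketch (plain Scala, all names illustrative; it models the behavior described above, not Spark's internals). The first attempt computes and caches the partition, but its status report, and with it the accumulator update, is lost; the retry then hits the cache and never re-runs the task body, so the update is skipped for good:

{code:scala}
import scala.collection.mutable

// Toy model of the failure mode (not Spark internals): a cache hit on retry
// short-circuits the task body, so its side effects never happen again.
object AccumulatorUndercountSketch {
  private val cache = mutable.Map.empty[Int, Seq[Int]]
  private var reportedAcc = 0L // what the driver would end up seeing

  // The "task body": computing the partition also yields an accumulator delta.
  private def compute(partition: Int): (Seq[Int], Long) =
    ((0 until 10).map(_ + 1), 100L)

  // First attempt: computes and caches, but the executor is lost before the
  // success report, so the accumulator delta of 100 is discarded.
  def firstAttemptLost(partition: Int): Unit = {
    val (result, _) = compute(partition) // delta dropped with the lost report
    cache(partition) = result            // ...but the cache block survives
  }

  // Retry: cache hit means compute() never runs, so no delta is ever reported.
  def retry(partition: Int): Unit =
    cache.get(partition) match {
      case Some(_) => () // task body skipped entirely
      case None =>
        val (result, delta) = compute(partition)
        cache(partition) = result
        reportedAcc += delta
    }

  def main(args: Array[String]): Unit = {
    firstAttemptLost(0)
    retry(0)
    println(s"driver-visible accumulator value: $reportedAcc (expected 100)")
    assert(reportedAcc == 100L) // fails by design, mirroring the undercount
  }
}
{code}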



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42671) Fix bug for createDataFrame from complex type schema

2023-03-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696504#comment-17696504
 ] 

Apache Spark commented on SPARK-42671:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40280

> Fix bug for createDataFrame from complex type schema
> 
>
> Key: SPARK-42671
> URL: https://issues.apache.org/jira/browse/SPARK-42671
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42671) Fix bug for createDataFrame from complex type schema

2023-03-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42671:


Assignee: (was: Apache Spark)

> Fix bug for createDataFrame from complex type schema
> 
>
> Key: SPARK-42671
> URL: https://issues.apache.org/jira/browse/SPARK-42671
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42671) Fix bug for createDataFrame from complex type schema

2023-03-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696503#comment-17696503
 ] 

Apache Spark commented on SPARK-42671:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40280

> Fix bug for createDataFrame from complex type schema
> 
>
> Key: SPARK-42671
> URL: https://issues.apache.org/jira/browse/SPARK-42671
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42671) Fix bug for createDataFrame from complex type schema

2023-03-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42671:


Assignee: Apache Spark

> Fix bug for createDataFrame from complex type schema
> 
>
> Key: SPARK-42671
> URL: https://issues.apache.org/jira/browse/SPARK-42671
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42670) Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42670:


Assignee: (was: Apache Spark)

> Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings
> 
>
> Key: SPARK-42670
> URL: https://issues.apache.org/jira/browse/SPARK-42670
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42670) Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42670:


Assignee: Apache Spark

> Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings
> 
>
> Key: SPARK-42670
> URL: https://issues.apache.org/jira/browse/SPARK-42670
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42670) Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings

2023-03-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696450#comment-17696450
 ] 

Apache Spark commented on SPARK-42670:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40278

> Upgrade maven-surefire-plugin to 3.0.0-M9 & eliminate build warnings
> 
>
> Key: SPARK-42670
> URL: https://issues.apache.org/jira/browse/SPARK-42670
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696410#comment-17696410
 ] 

Apache Spark commented on SPARK-42555:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40277

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696392#comment-17696392
 ] 

Apache Spark commented on SPARK-42630:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40276

> Make `parse_data_type` use new proto message `DDLParse`
> ---
>
> Key: SPARK-42630
> URL: https://issues.apache.org/jira/browse/SPARK-42630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696391#comment-17696391
 ] 

Apache Spark commented on SPARK-42630:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40276

> Make `parse_data_type` use new proto message `DDLParse`
> ---
>
> Key: SPARK-42630
> URL: https://issues.apache.org/jira/browse/SPARK-42630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42557:


Assignee: (was: Apache Spark)

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check whether we 
> can get the same semantics as the current implementation by using 
> unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261
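
For illustration, a minimal sketch of one way this could look (an assumption based on the description above, not the actual PR): route the broadcast marker through Dataset's unresolved hint mechanism.

{code:scala}
import org.apache.spark.sql.Dataset

// Hypothetical sketch: mark df as a broadcast candidate via the unresolved
// "broadcast" hint, aiming for the classic functions.broadcast semantics.
object BroadcastHintSketch {
  def broadcast[T](df: Dataset[T]): Dataset[T] = df.hint("broadcast")
}
{code}

Used as `large.join(BroadcastHintSketch.broadcast(small), "id")`, the hint should steer the planner toward a broadcast join, matching the behavior of the existing {{broadcast}} function.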



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42557:


Assignee: Apache Spark

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696387#comment-17696387
 ] 

Apache Spark commented on SPARK-42557:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40275

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42215) Better Scala Client Integration test

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42215:


Assignee: (was: Apache Spark)

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is inconvenient for Maven developers, 
> who cannot simply run `mvn clean install` to execute all tests.
>  
> Look into marking these tests as ITs, and into other ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.
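
One conventional direction (an assumption about the approach, not the PR's actual change) is a naming convention: suffix integration suites with `IT` so the build can include them only after packaging, e.g. via maven-failsafe's `*IT` includes during `mvn verify` and a matching test filter in SBT. A hypothetical suite under that convention:

{code:scala}
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical integration suite: the IT suffix lets the build run it only
// after the packages are built. All names here are illustrative, not taken
// from the PR, and the Maven/SBT plugin wiring is out of scope.
class ScalaClientRoundTripIT extends AnyFunSuite {
  test("client round trip against a packaged server") {
    // Sketch only: start the packaged server, connect the client, run a query.
    val serverStarted = true // stand-in for real server bootstrap
    assert(serverStarted, "packaged server should be running before this suite")
  }
}
{code}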



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42215) Better Scala Client Integration test

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696378#comment-17696378
 ] 

Apache Spark commented on SPARK-42215:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40274

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is inconvenient for Maven developers, 
> who cannot simply run `mvn clean install` to execute all tests.
>  
> Look into marking these tests as ITs, and into other ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42215) Better Scala Client Integration test

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42215:


Assignee: Apache Spark

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is inconvenient for Maven developers, 
> who cannot simply run `mvn clean install` to execute all tests.
>  
> Look into marking these tests as ITs, and into other ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696374#comment-17696374
 ] 

Apache Spark commented on SPARK-42668:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40273

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen cases where a task exits as cancelled/failed, which triggers 
> the abort in the task completion listener for HDFSStateStoreProvider. As 
> part of this, we cancel the backing stream and close the compressed stream. 
> However, stores such as Azure Blob Storage can throw exceptions that are 
> not caught on the current path, leading to job failures. This change 
> proposes to fix that issue.
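
A minimal sketch of the defensive pattern the description proposes (illustrative only; the method and parameter names are assumptions, not HDFSStateStoreProvider's real internals): wrap the close in a try/catch so a store-specific close failure cannot escape the abort path.

{code:scala}
import java.io.OutputStream
import scala.util.control.NonFatal

// Illustrative abort helper: a failure while closing the compressed stream
// (e.g. from an Azure blob store client) is caught and logged so the abort
// in the task completion listener cannot itself fail the job.
object AbortCloseSketch {
  def abortStreams(compressed: OutputStream, cancelBacking: () => Unit): Unit = {
    try {
      if (compressed != null) compressed.close()
    } catch {
      case NonFatal(e) =>
        // The data is being discarded anyway; record the failure and move on.
        System.err.println(s"Failed to close compressed stream during abort: $e")
    } finally {
      cancelBacking() // cancel the backing stream regardless of close outcome
    }
  }
}
{code}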



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42668:


Assignee: (was: Apache Spark)

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen cases where a task exits as cancelled/failed, which triggers 
> the abort in the task completion listener for HDFSStateStoreProvider. As 
> part of this, we cancel the backing stream and close the compressed stream. 
> However, stores such as Azure Blob Storage can throw exceptions that are 
> not caught on the current path, leading to job failures. This change 
> proposes to fix that issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42668:


Assignee: Apache Spark

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Assignee: Apache Spark
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen cases where a task exits as cancelled/failed, which triggers 
> the abort in the task completion listener for HDFSStateStoreProvider. As 
> part of this, we cancel the backing stream and close the compressed stream. 
> However, stores such as Azure Blob Storage can throw exceptions that are 
> not caught on the current path, leading to job failures. This change 
> proposes to fix that issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696350#comment-17696350
 ] 

Apache Spark commented on SPARK-42667:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40272

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


