[jira] [Commented] (SPARK-44173) Make Spark an sbt build only project

2024-03-01 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822747#comment-17822747
 ] 

Yang Jie commented on SPARK-44173:
--

Hi, [~dongjoon],

Sorry, I missed the previous message. This Jira was created based on some 
discussions in https://github.com/apache/spark/pull/40317. With the 
establishment of the Maven daily test pipeline, we now have a way to discover 
problems in Maven tests in a timely manner, so the description of this Jira 
has become less critical.

I agree with your point, thank you for converting this to a normal Jira :)

> Make Spark an sbt build only project
> 
>
> Key: SPARK-44173
> URL: https://issues.apache.org/jira/browse/SPARK-44173
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> Supporting both Maven and SBT always brings various testing problems and 
> increases the complexity of testing code writing
>  






[jira] [Updated] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation

2024-03-01 Thread Eren Avsarogullari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eren Avsarogullari updated SPARK-47148:
---
Description: 
AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on 
cancellation. This causes unnecessary stage materialization by submitting a 
Shuffle Job or Broadcast Job. Under normal circumstances, if the stage is not 
yet materialized (i.e. *ShuffleQueryStage.shuffleFuture* or 
*BroadcastQueryStage.broadcastFuture* is not initialized yet), it should just 
be skipped without materializing it.

A sample use case:
*1- Stage Materialization Steps:*
When stage materialization fails:
{code:java}
1.1- ShuffleQueryStage1 - is materialized successfully,
1.2- ShuffleQueryStage2 - materialization fails,
1.3- ShuffleQueryStage3 - not materialized yet, so 
ShuffleQueryStage3.shuffleFuture is not initialized yet{code}
*2- Stage Cancellation Steps:*
{code:java}
2.1- ShuffleQueryStage1 - is canceled because it is already materialized,
2.2- ShuffleQueryStage2 - is an earlyFailedStage, so AQE currently skips it by 
default because it could not be materialized,
2.3- ShuffleQueryStage3 - the problem is here: this stage is not materialized 
yet, but cancellation is currently attempted anyway, which requires the stage 
to be materialized first.{code}
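A minimal sketch of the intended guard (the method names here are hypothetical, 
for illustration only):
{code:java}
// Hypothetical guard: only cancel stages whose materialization has started.
def cancelSafely(stage: ExchangeQueryStageExec): Unit = {
  if (stage.isMaterializationStarted) {
    // A Shuffle/Broadcast job is running (the future is initialized),
    // so there is something to cancel.
    stage.cancel()
  }
  // Otherwise skip: cancelling would force the stage to materialize first,
  // submitting an unnecessary Shuffle/Broadcast job.
}
{code}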

  was:
AQE can materialize *ShuffleQueryStage* on cancellation. This causes 
unnecessary stage materialization by submitting a Shuffle Job. Under normal 
circumstances, if the stage is not yet materialized (i.e. 
ShuffleQueryStage.shuffleFuture is not initialized yet), it should just be 
skipped without materializing it.

A sample use case:
*1- Stage Materialization Steps:*
When stage materialization fails:
{code:java}
1.1- ShuffleQueryStage1 - is materialized successfully,
1.2- ShuffleQueryStage2 - materialization fails,
1.3- ShuffleQueryStage3 - not materialized yet, so 
ShuffleQueryStage3.shuffleFuture is not initialized yet{code}
*2- Stage Cancellation Steps:*
{code:java}
2.1- ShuffleQueryStage1 - is canceled because it is already materialized,
2.2- ShuffleQueryStage2 - is an earlyFailedStage, so AQE currently skips it by 
default because it could not be materialized,
2.3- ShuffleQueryStage3 - the problem is here: this stage is not materialized 
yet, but cancellation is currently attempted anyway, which requires the stage 
to be materialized first.{code}


> Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
> ---
>
> Key: SPARK-47148
> URL: https://issues.apache.org/jira/browse/SPARK-47148
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, SQL
>Affects Versions: 4.0.0
>Reporter: Eren Avsarogullari
>Priority: Major
>  Labels: pull-request-available
>
> AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on 
> cancellation. This causes unnecessary stage materialization by submitting a 
> Shuffle Job or Broadcast Job. Under normal circumstances, if the stage is not 
> yet materialized (i.e. *ShuffleQueryStage.shuffleFuture* or 
> *BroadcastQueryStage.broadcastFuture* is not initialized yet), it should just 
> be skipped without materializing it.
> A sample use case:
> *1- Stage Materialization Steps:*
> When stage materialization fails:
> {code:java}
> 1.1- ShuffleQueryStage1 - is materialized successfully,
> 1.2- ShuffleQueryStage2 - materialization fails,
> 1.3- ShuffleQueryStage3 - not materialized yet, so 
> ShuffleQueryStage3.shuffleFuture is not initialized yet{code}
> *2- Stage Cancellation Steps:*
> {code:java}
> 2.1- ShuffleQueryStage1 - is canceled because it is already materialized,
> 2.2- ShuffleQueryStage2 - is an earlyFailedStage, so AQE currently skips it by 
> default because it could not be materialized,
> 2.3- ShuffleQueryStage3 - the problem is here: this stage is not materialized 
> yet, but cancellation is currently attempted anyway, which requires the stage 
> to be materialized first.{code}






[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

2024-03-01 Thread Pavan Kotikalapudi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822722#comment-17822722
 ] 

Pavan Kotikalapudi commented on SPARK-24815:


Thanks a lot for mentoring and driving this effort, Mich.

As you suggested, I will update the benefits and challenges in the SPIP doc. 
That can outline the scope of the current work and the possibility of future 
work for other use cases.

 

Re:  

> Pluggable Dynamic Allocation, Separate Algorithm for Structured Streaming

I really like the idea. I started off with that, but limited it to the core 
module, as it serves as the primitive level of evaluation (which the current 
DRA is already doing). But this idea is better, as you said, both design-wise 
and for different kinds of workloads; a rough sketch is below.
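A minimal sketch of what a pluggable interface could look like (the trait and 
method names are illustrative, not an actual Spark API):
{code:java}
// Hypothetical pluggable allocation strategy; today's DRA heuristics are
// hardcoded in ExecutorAllocationManager.
trait ExecutorAllocationStrategy {
  /** Desired executor count given the current load. */
  def targetExecutors(pendingTasks: Int, busyExecutors: Int): Int
}

// A streaming-oriented strategy could key off micro-batch lag relative to
// the trigger interval instead of the raw task backlog.
class StreamingAllocationStrategy(triggerIntervalMs: Long)
    extends ExecutorAllocationStrategy {
  override def targetExecutors(pendingTasks: Int, busyExecutors: Int): Int = {
    // Placeholder heuristic, for illustration only.
    math.max(1, busyExecutors + (if (pendingTasks > 0) 1 else -1))
  }
}
{code}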

 

> Warning for Enabled Core Dynamic Allocation

Right now we need normal DRA because the structured streaming DRA is built on 
top of it. I have added another flag, `spark.dynamicAllocation.streaming.enabled`, 
so that this particular piece of the streaming algorithm kicks in on top of 
traditional DRA. This approach also keeps it backwards compatible, especially 
when users have to upgrade Spark.
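For example, under this proposal the two flags would be set together (a hedged 
sketch; the exact semantics are still subject to the SPIP):
{code:java}
import org.apache.spark.sql.SparkSession

// Core DRA stays on; the proposed streaming flag layers on top of it.
val spark = SparkSession.builder()
  .appName("streaming-dra-example")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.streaming.enabled", "true") // proposed flag
  .getOrCreate()
{code}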

 

> Structured Streaming should support dynamic allocation
> --
>
> Key: SPARK-24815
> URL: https://issues.apache.org/jira/browse/SPARK-24815
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core, Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Karthik Palaniappan
>Priority: Minor
>  Labels: pull-request-available
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled






[jira] [Updated] (SPARK-47251) Block invalid types from the `args` argument for `sql` command

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47251:
---
Labels: pull-request-available  (was: )

> Block invalid types from the `args` argument for `sql` command
> --
>
> Key: SPARK-47251
> URL: https://issues.apache.org/jira/browse/SPARK-47251
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47251) Block invalid types from the `args` argument for `sql` command

2024-03-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47251:
--
Summary: Block invalid types from the `args` argument for `sql` command  
(was: Block invalid types from the `arg` argument for `sql` command)

> Block invalid types from the `args` argument for `sql` command
> --
>
> Key: SPARK-47251
> URL: https://issues.apache.org/jira/browse/SPARK-47251
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Assigned] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-47158:


Assignee: Haejoon Lee

> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
> -
>
> Key: SPARK-47158
> URL: https://issues.apache.org/jira/browse/SPARK-47158
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231






[jira] [Created] (SPARK-47251) Block invalid types from the `arg` argument for `sql` command

2024-03-01 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-47251:
-

 Summary: Block invalid types from the `arg` argument for `sql` 
command
 Key: SPARK-47251
 URL: https://issues.apache.org/jira/browse/SPARK-47251
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.1
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-47158.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45244
[https://github.com/apache/spark/pull/45244]

> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
> -
>
> Key: SPARK-47158
> URL: https://issues.apache.org/jira/browse/SPARK-47158
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231






[jira] [Resolved] (SPARK-47237) Upgrade xmlschema-core to 2.3.1

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-47237.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45347
[https://github.com/apache/spark/pull/45347]

> Upgrade xmlschema-core to 2.3.1
> ---
>
> Key: SPARK-47237
> URL: https://issues.apache.org/jira/browse/SPARK-47237
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47237) Upgrade xmlschema-core to 2.3.1

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-47237:


Assignee: BingKun Pan

> Upgrade xmlschema-core to 2.3.1
> ---
>
> Key: SPARK-47237
> URL: https://issues.apache.org/jira/browse/SPARK-47237
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47216) Refine layout of SQL performance tuning page

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-47216.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45322
[https://github.com/apache/spark/pull/45322]

> Refine layout of SQL performance tuning page
> 
>
> Key: SPARK-47216
> URL: https://issues.apache.org/jira/browse/SPARK-47216
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47216) Refine layout of SQL performance tuning page

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-47216:


Assignee: Nicholas Chammas

> Refine layout of SQL performance tuning page
> 
>
> Key: SPARK-47216
> URL: https://issues.apache.org/jira/browse/SPARK-47216
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47250) Move additional RocksDB errors/exceptions to NERF

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47250:
---
Labels: pull-request-available  (was: )

> Move additional RocksDB errors/exceptions to NERF
> -
>
> Key: SPARK-47250
> URL: https://issues.apache.org/jira/browse/SPARK-47250
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
>
> Move additional RocksDB errors/exceptions to NERF






[jira] [Assigned] (SPARK-47243) Correct the package name of `StateMetadataSource.scala`

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-47243:


Assignee: Yang Jie

> Correct the package name of `StateMetadataSource.scala`
> ---
>
> Key: SPARK-47243
> URL: https://issues.apache.org/jira/browse/SPARK-47243
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47243) Correct the package name of `StateMetadataSource.scala`

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-47243.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45352
[https://github.com/apache/spark/pull/45352]

> Correct the package name of `StateMetadataSource.scala`
> ---
>
> Key: SPARK-47243
> URL: https://issues.apache.org/jira/browse/SPARK-47243
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47250) Move additional RocksDB errors/exceptions to NERF

2024-03-01 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-47250:
--

 Summary: Move additional RocksDB errors/exceptions to NERF
 Key: SPARK-47250
 URL: https://issues.apache.org/jira/browse/SPARK-47250
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Anish Shrigondekar


Move additional RocksDB errors/exceptions to NERF






[jira] [Commented] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-03-01 Thread nirav patel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822653#comment-17822653
 ] 

nirav patel commented on SPARK-46762:
-

Just realized I added the spark-connect 3.4 example startup command instead of 
the 3.5 one. I just updated it in the OP.

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
> Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 
> 2024-02-22 at 2.04.49 PM.png
>
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using spark-connect 3.5 with an external Spark SQL catalog 
> jar, iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child 
> class gets loaded by MutableClassLoader and the parent class gets loaded by 
> ChildFirstClassLoader, and that causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>     at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apach...{code}
>  
> `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of 
> `org.apache.iceberg.Table`, and they are both in only one jar, 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar`.
> We verified that only one copy of 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is loaded when the spark-connect 
> server is started.
> Looking more into the error, it seems the classloader itself is instantiated 
> multiple times somewhere. I can see two instances: 
> org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943.
>  
> *Affected version:*
> Spark 3.5 and spark-connect_2.12:3.5.0
>  
> *Not affected versions and variations:*
> Spark 3.4 and spark-connect_2.12:3.4.0 works fine with the external jar.
> Also works with just the Spark 3.5 spark-submit script directly (i.e. without 
> using spark-connect 3.5).
>  
> An issue is open with Iceberg as well: 
> [https://github.com/apache/iceberg/issues/8978]
> It has also been discussed on the Iceberg dev list: 
> [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]
>  
> Steps to reproduce:
>  
> 1) Just to see that Spark loads the same class twice using different 
> classloaders:
>  
> Start the spark-connect server with the required jars and configuration for 
> the iceberg-hive catalog. 
> {code:java}
> sudo /usr/lib/spark/sbin/start-connect-server.sh \  
>  --packages 

[jira] [Updated] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-03-01 Thread nirav patel (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nirav patel updated SPARK-46762:

Description: 
We are seeing the following `java.lang.ClassCastException` error in Spark 
executors when using spark-connect 3.5 with an external Spark SQL catalog jar, 
iceberg-spark-runtime-3.5_2.12-1.4.3.jar.

We also set "spark.executor.userClassPathFirst=true"; otherwise the child class 
gets loaded by MutableClassLoader and the parent class gets loaded by 
ChildFirstClassLoader, and that causes a ClassCastException as well.

 
{code:java}
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
java.lang.ClassCastException: class 
org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
class org.apache.iceberg.Table 
(org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module 
of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
org.apache.iceberg.Table is in unnamed module of loader 
org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
    at 
org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
    at 
org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
    at 
org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
    at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
    at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
    at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
    at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apach...{code}
 

`org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of 
`org.apache.iceberg.Table`, and they are both in only one jar, 
`iceberg-spark-runtime-3.5_2.12-1.4.3.jar`.

We verified that only one copy of 
`iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is loaded when the spark-connect 
server is started.

Looking more into the error, it seems the classloader itself is instantiated 
multiple times somewhere. I can see two instances: 
org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and 
org.apache.spark.util.ChildFirstURLClassLoader @4b18b943.

*Affected version:*

Spark 3.5 and spark-connect_2.12:3.5.0

*Not affected versions and variations:*

Spark 3.4 and spark-connect_2.12:3.4.0 works fine with the external jar.

Also works with just the Spark 3.5 spark-submit script directly (i.e. without 
using spark-connect 3.5).

An issue is open with Iceberg as well: 
[https://github.com/apache/iceberg/issues/8978]

It has also been discussed on the Iceberg dev list: 
[https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]

 

 

Steps to reproduce:

1) Just to see that Spark loads the same class twice using different 
classloaders:

Start the spark-connect server with the required jars and configuration for 
the iceberg-hive catalog.
{code:java}
sudo /usr/lib/spark/sbin/start-connect-server.sh \
  --packages org.apache.spark:spark-connect_2.12:3.5.0 \
  --jars gs://libs/iceberg-spark-runtime-3.5_2.12-1.4.3.jar \
  --conf "spark.executor.extraJavaOptions=-verbose:class" \
  --conf "spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog" \
  --conf "spark.sql.catalog.iceberg_catalog.type=hive" \
  --conf "spark.sql.catalog.iceberg_catalog.uri=thrift://metastore-host:port"
{code}
reference: [https://iceberg.apache.org/docs/1.4.2/spark-configuration/#catalogs]

Since I have `"spark.executor.extraJavaOptions=-verbose:class"`, you should see 
in the executor logs that `org.apache.iceberg.Table` is loaded twice.

You can take a heap dump, as I did, to verify that as well; see the attached 
heap dump screenshots.

 

2) To actually reproduce the ClassCastException:

Try running a Spark query that scans the Iceberg table.
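A minimal sketch of such a query (the connect URL and table name are 
placeholders; any read that reaches the executors should hit the cast, per the 
stack trace above):
{code:java}
import org.apache.spark.sql.SparkSession

// Connect to the spark-connect server started above.
val spark = SparkSession.builder()
  .remote("sc://localhost")
  .getOrCreate()

// Scanning the table makes executors deserialize SerializableTableWithSize,
// which is where the ClassCastException surfaces.
spark.sql("SELECT * FROM iceberg_catalog.db.sample_table LIMIT 10").show()
{code}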

[jira] [Updated] (SPARK-44173) Make Spark an sbt build only project

2024-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44173:
--
Parent: (was: SPARK-44111)
Issue Type: Improvement  (was: Sub-task)

> Make Spark an sbt build only project
> 
>
> Key: SPARK-44173
> URL: https://issues.apache.org/jira/browse/SPARK-44173
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> Supporting both Maven and SBT always brings various testing problems and 
> increases the complexity of testing code writing
>  






[jira] [Commented] (SPARK-44173) Make Spark an sbt build only project

2024-03-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822651#comment-17822651
 ] 

Dongjoon Hyun commented on SPARK-44173:
---

Hi, [~LuciferYang].

Given the difficulty of SBT's dependency management, let's consider this 
separately, from a long-term perspective. I converted this to a normal Jira.

> Make Spark an sbt build only project
> 
>
> Key: SPARK-44173
> URL: https://issues.apache.org/jira/browse/SPARK-44173
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> Supporting both Maven and SBT always brings various testing problems and 
> increases the complexity of testing code writing
>  






[jira] [Updated] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47043:
--
Parent: (was: SPARK-44111)
Issue Type: Improvement  (was: Sub-task)

> Fix `spark-common-utils` module to have explicit `jackson-core` and 
> `jackson-annotations`dependency
> ---
>
> Key: SPARK-47043
> URL: https://issues.apache.org/jira/browse/SPARK-47043
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following Scala code depends on `jackson-core` and `jackson-annotations` 
> explicitly. However, the spark-common-utils module is missing the related 
> dependencies.
> {code:java}
> ~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.core.`type`.TypeReference
> ./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
> com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
> ~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.annotation.JsonIgnore
> {code}
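A pom.xml sketch of the fix (a hedged illustration; versions are assumed to be 
managed by Spark's parent pom):
{code:xml}
<!-- Declare the dependencies explicitly in common/utils/pom.xml instead of
     relying on transitive resolution. -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
</dependency>
{code}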






[jira] [Updated] (SPARK-47042) Fix `spark-common-utils` module to have explicit `commons-lang3` dependency

2024-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47042:
--
Parent: (was: SPARK-44111)
Issue Type: Improvement  (was: Sub-task)

> Fix `spark-common-utils` module to have explicit `commons-lang3` dependency
> ---
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following Scala code depends on `commons-lang3` explicitly. However, the 
> common-utils module is missing the related dependency.
> {code:java}
> ~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
> src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
> org.apache.commons.lang3.StringUtils
> src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
> org.apache.commons.lang3.ClassUtils
> src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
> org.apache.commons.lang3.SystemUtils; {code}
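The same shape of fix applies here (again a hedged sketch, with the version 
assumed to be managed by the parent pom):
{code:xml}
<!-- Explicit declaration in common/utils/pom.xml. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
</dependency>
{code}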






[jira] [Updated] (SPARK-41392) Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0

2024-03-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-41392:
--
Summary: Add `bouncy-castle` test dependencies to `sql/core` module for 
Hadoop 3.4.0  (was: spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in 
scala-maven plugin)

> Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
> ---
>
> Key: SPARK-41392
> URL: https://issues.apache.org/jira/browse/SPARK-41392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Steve Loughran
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE
> {code}
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> {code}
> full stack
> {code}
> [ERROR] Failed to execute goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
> (scala-test-compile-first) on project spark-sql_2.12: Execution 
> scala-test-compile-first of goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
> class was missing while executing 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> [ERROR] -
> [ERROR] realm =plugin>net.alchim31.maven:scala-maven-plugin:4.7.2
> [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
> [ERROR] urls[0] = 
> file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar
> [ERROR] urls[1] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar
> [ERROR] urls[2] = 
> file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar
> [ERROR] urls[3] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar
> [ERROR] urls[4] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar
> [ERROR] urls[5] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar
> [ERROR] urls[6] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar
> [ERROR] urls[7] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar
> [ERROR] urls[8] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar
> [ERROR] urls[9] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar
> [ERROR] urls[10] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar
> [ERROR] urls[11] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar
> [ERROR] urls[12] = 
> file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar
> [ERROR] urls[13] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar
> [ERROR] urls[14] = 
> file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar
> [ERROR] urls[15] = 
> file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar
> [ERROR] urls[16] = 
> file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar
> [ERROR] urls[17] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar
> [ERROR] urls[18] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar
> [ERROR] urls[19] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar
> [ERROR] urls[20] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar
> [ERROR] urls[21] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar
> [ERROR] urls[22] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar
> [ERROR] urls[23] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar
> [ERROR] urls[24] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar
> [ERROR] urls[25] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar
> [ERROR] urls[26] = 
> 

[jira] [Commented] (SPARK-41392) spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin

2024-03-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822649#comment-17822649
 ] 

Dongjoon Hyun commented on SPARK-41392:
---

That's great news. :) 

> spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin
> ---
>
> Key: SPARK-41392
> URL: https://issues.apache.org/jira/browse/SPARK-41392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Steve Loughran
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE
> {code}
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> {code}
> full stack
> {code}
> [ERROR] Failed to execute goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
> (scala-test-compile-first) on project spark-sql_2.12: Execution 
> scala-test-compile-first of goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
> class was missing while executing 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> [ERROR] -
> [ERROR] realm =plugin>net.alchim31.maven:scala-maven-plugin:4.7.2
> [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
> [ERROR] urls[0] = 
> file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar
> [ERROR] urls[1] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar
> [ERROR] urls[2] = 
> file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar
> [ERROR] urls[3] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar
> [ERROR] urls[4] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar
> [ERROR] urls[5] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar
> [ERROR] urls[6] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar
> [ERROR] urls[7] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar
> [ERROR] urls[8] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar
> [ERROR] urls[9] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar
> [ERROR] urls[10] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar
> [ERROR] urls[11] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar
> [ERROR] urls[12] = 
> file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar
> [ERROR] urls[13] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar
> [ERROR] urls[14] = 
> file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar
> [ERROR] urls[15] = 
> file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar
> [ERROR] urls[16] = 
> file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar
> [ERROR] urls[17] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar
> [ERROR] urls[18] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar
> [ERROR] urls[19] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar
> [ERROR] urls[20] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar
> [ERROR] urls[21] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar
> [ERROR] urls[22] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar
> [ERROR] urls[23] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar
> [ERROR] urls[24] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar
> [ERROR] urls[25] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar
> [ERROR] urls[26] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-persist-core-assembly/1.7.1/zinc-persist-core-assembly-1.7.1.jar

[jira] [Updated] (SPARK-47167) Add concrete relation class for JDBC relation made in V1TableScan

2024-03-01 Thread Uros Stankovic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uros Stankovic updated SPARK-47167:
---
Description: 
JDBCRelation's toV1TableScan method creates an anonymous v1 relation that can 
use the predicates and other pushdowns of the JDBCRelation object (the v2 
relation).

That relation can later be logged to telemetry or shown in the Spark UI by its 
name, which is not descriptive enough, so the idea is to use a concrete class.
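A minimal sketch of the idea (class and field names are illustrative, not the 
actual implementation):
{code:java}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.BaseRelation
import org.apache.spark.sql.types.StructType

// Hypothetical named relation wrapping the pushed-down JDBC scan, so logs and
// the Spark UI show something more descriptive than an anonymous BaseRelation.
case class JDBCV1ScanRelation(
    sqlContext: SQLContext,
    schema: StructType,
    pushedPredicates: Seq[String]) extends BaseRelation {
  override def toString: String =
    s"JDBCV1ScanRelation(pushedPredicates=${pushedPredicates.mkString(", ")})"
}
{code}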

  was: The BaseRelation class does not provide any descriptive information, 
like a name or description, so it would be great to add such a class so that 
debugging and logging would be easier.


> Add concrete relation class for JDBC relation made in V1TableScan
> -
>
> Key: SPARK-47167
> URL: https://issues.apache.org/jira/browse/SPARK-47167
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Uros Stankovic
>Assignee: Uros Stankovic
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> JDBCRelation's toV1TableScan method creates an anonymous v1 relation that can 
> use the predicates and other pushdowns of the JDBCRelation object (the v2 
> relation).
> That relation can later be logged to telemetry or shown in the Spark UI by 
> its name, which is not descriptive enough, so the idea is to use a concrete 
> class.






[jira] [Resolved] (SPARK-47167) Add concrete relation class for JDBC relation made in V1TableScan

2024-03-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47167.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45259
[https://github.com/apache/spark/pull/45259]

> Add concrete relation class for JDBC relation made in V1TableScan
> -
>
> Key: SPARK-47167
> URL: https://issues.apache.org/jira/browse/SPARK-47167
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Uros Stankovic
>Assignee: Uros Stankovic
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The BaseRelation class does not provide any descriptive information, like a 
> name or description, so it would be great to add such a class so that 
> debugging and logging would be easier.






[jira] [Updated] (SPARK-47167) Add concrete relation class for JDBC relation made in V1TableScan

2024-03-01 Thread Uros Stankovic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uros Stankovic updated SPARK-47167:
---
Summary: Add concrete relation class for JDBC relation made in V1TableScan  
(was: Add descriptive relation class)

> Add concrete relation class for JDBC relation made in V1TableScan
> -
>
> Key: SPARK-47167
> URL: https://issues.apache.org/jira/browse/SPARK-47167
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Uros Stankovic
>Priority: Minor
>  Labels: pull-request-available
>
> The BaseRelation class does not provide any descriptive information, like a 
> name or description, so it would be great to add such a class so that 
> debugging and logging would be easier.






[jira] [Commented] (SPARK-41392) spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin

2024-03-01 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822572#comment-17822572
 ] 

Steve Loughran commented on SPARK-41392:


Expect an official release this week; this PR will ensure it works.

> spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin
> ---
>
> Key: SPARK-41392
> URL: https://issues.apache.org/jira/browse/SPARK-41392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Steve Loughran
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE
> {code}
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> {code}
> full stack
> {code}
> [ERROR] Failed to execute goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
> (scala-test-compile-first) on project spark-sql_2.12: Execution 
> scala-test-compile-first of goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
> class was missing while executing 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> [ERROR] -
> [ERROR] realm =plugin>net.alchim31.maven:scala-maven-plugin:4.7.2
> [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
> [ERROR] urls[0] = 
> file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar
> [ERROR] urls[1] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar
> [ERROR] urls[2] = 
> file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar
> [ERROR] urls[3] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar
> [ERROR] urls[4] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar
> [ERROR] urls[5] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar
> [ERROR] urls[6] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar
> [ERROR] urls[7] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar
> [ERROR] urls[8] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar
> [ERROR] urls[9] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar
> [ERROR] urls[10] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar
> [ERROR] urls[11] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar
> [ERROR] urls[12] = 
> file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar
> [ERROR] urls[13] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar
> [ERROR] urls[14] = 
> file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar
> [ERROR] urls[15] = 
> file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar
> [ERROR] urls[16] = 
> file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar
> [ERROR] urls[17] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar
> [ERROR] urls[18] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar
> [ERROR] urls[19] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar
> [ERROR] urls[20] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar
> [ERROR] urls[21] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar
> [ERROR] urls[22] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar
> [ERROR] urls[23] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar
> [ERROR] urls[24] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar
> [ERROR] urls[25] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar
> [ERROR] urls[26] = 
> 

[jira] [Updated] (SPARK-47131) contains, startswith, endswith

2024-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47131:
-
Description: 
Refactored built-in string functions to enable collation support for: 
{_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be 
able to use COLLATE within arguments for built-in string functions: CONTAINS, 
STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS implementation for 
non-binary collations is a separate subtask (SPARK-47248).

  was:Refactored built-in string functions to enable collation support for: 
{_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be 
able to use COLLATE within arguments for built-in string functions: CONTAINS, 
STARTSWITH, ENDSWITH in Spark SQL queries.


> contains, startswith, endswith
> --
>
> Key: SPARK-47131
> URL: https://issues.apache.org/jira/browse/SPARK-47131
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Refactored built-in string functions to enable collation support for: 
> {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now 
> be able to use COLLATE within arguments for built-in string functions: 
> CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS 
> implementation for non-binary collations is a separate subtask (SPARK-47248).
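A hedged sketch of the intended usage, assuming a SparkSession `spark` and a 
collation name like UNICODE_CI (the available names depend on the wider 
collation work in this epic):
{code:java}
// Collation-aware string predicates via COLLATE on a string argument.
val df = spark.sql(
  "SELECT contains('Apache Spark' COLLATE UNICODE_CI, 'SPARK') AS c, " +
  "startswith('Apache Spark' COLLATE UNICODE_CI, 'APACHE') AS s, " +
  "endswith('Apache Spark' COLLATE UNICODE_CI, 'spark') AS e")
df.show()
{code}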






[jira] [Updated] (SPARK-47131) contains, startswith, endswith

2024-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47131:
-
Description: Refactored built-in string functions to enable collation 
support for: {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users 
should now be able to use COLLATE within arguments for built-in string 
functions: CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS 
implementation for non-binary collations is a separate subtask (SPARK-47248).  
(was: Refactored built-in string functions to enable collation support for: 
{_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be 
able to use COLLATE within arguments for built-in string functions: CONTAINS, 
STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS implementation for 
non-binary collations is a separate subtask (SPARK-47248).)

> contains, startswith, endswith
> --
>
> Key: SPARK-47131
> URL: https://issues.apache.org/jira/browse/SPARK-47131
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Refactored built-in string functions to enable collation support for: 
> {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now 
> be able to use COLLATE within arguments for built-in string functions: 
> CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS 
> implementation for non-binary collations is a separate subtask (SPARK-47248).






[jira] [Updated] (SPARK-47248) contains (non-binary collations)

2024-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47248:
-
Description: Implemented efficient collation-aware in-place substring 
comparison to enable collation support for: {_}contains{_}. Spark SQL users 
should now be able to use COLLATE within arguments for built-in string 
function: CONTAINS in Spark SQL queries.  (was: Enable efficient 
collation-aware in-place substring comparison.)
Summary: contains (non-binary collations)  (was: contains)

> contains (non-binary collations)
> 
>
> Key: SPARK-47248
> URL: https://issues.apache.org/jira/browse/SPARK-47248
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Implemented efficient collation-aware in-place substring comparison to enable 
> collation support for: {_}contains{_}. Spark SQL users should now be able to 
> use COLLATE within arguments for built-in string function: CONTAINS in Spark 
> SQL queries.
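As background, a rough sketch of the general technique (collation-aware 
substring search on the JVM via java.text.Collator); this is an illustration 
only, not the actual Spark implementation, which compares in place rather than 
allocating a substring per window:

{code:scala}
import java.text.Collator
import java.util.Locale

// Naive collation-aware contains(): slides a window over the haystack and
// compares each window to the needle under the collator's rules.
def collationAwareContains(haystack: String, needle: String): Boolean = {
  val collator = Collator.getInstance(Locale.ROOT)
  collator.setStrength(Collator.SECONDARY) // ignore tertiary (case) differences
  needle.isEmpty ||
    (0 to haystack.length - needle.length).exists { i =>
      collator.equals(haystack.substring(i, i + needle.length), needle)
    }
}

collationAwareContains("Apache Spark", "spark") // true under case-insensitive rules
{code}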



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47248) contains (non-binary collations)

2024-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47248:
-
Description: Implemented efficient collation-aware in-place substring 
comparison to enable collation support for: {_}contains{_}. Spark SQL users 
should now be able to use COLLATE within arguments for built-in string 
function: CONTAINS in Spark SQL queries (for non-binary collations).  (was: 
Implemented efficient collation-aware in-place substring comparison to enable 
collation support for: {_}contains{_}. Spark SQL users should now be able to 
use COLLATE within arguments for built-in string function: CONTAINS in Spark 
SQL queries.)

> contains (non-binary collations)
> 
>
> Key: SPARK-47248
> URL: https://issues.apache.org/jira/browse/SPARK-47248
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Implemented efficient collation-aware in-place substring comparison to enable 
> collation support for: {_}contains{_}. Spark SQL users should now be able to 
> use COLLATE within arguments for built-in string function: CONTAINS in Spark 
> SQL queries (for non-binary collations).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47248) contains

2024-03-01 Thread Jira
Uroš Bojanić created SPARK-47248:


 Summary: contains
 Key: SPARK-47248
 URL: https://issues.apache.org/jira/browse/SPARK-47248
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Uroš Bojanić


Enable efficient collation-aware in-place substring comparison.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47131) contains, startswith, endswith

2024-03-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47131:
-
Component/s: SQL
 (was: Spark Core)

> contains, startswith, endswith
> --
>
> Key: SPARK-47131
> URL: https://issues.apache.org/jira/browse/SPARK-47131
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Refactored built-in string functions to enable collation support for: 
> {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now 
> be able to use COLLATE within arguments for built-in string functions: 
> CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47217) De-duplication of Relations in Joins, can result in plan resolution failure

2024-03-01 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-47217:
---
Shepherd:   (was: Peter Toth)

> De-duplication of Relations in Joins, can result in plan resolution failure
> ---
>
> Key: SPARK-47217
> URL: https://issues.apache.org/jira/browse/SPARK-47217
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Major
>  Labels: Spark-SQL, pull-request-available
>
> In some flavours of nested joins that involve repetition of a relation, 
> projected columns passed to the DataFrame.select API in the form of 
> df.column can result in plan resolution failure because attribute resolution 
> does not happen.
> A scenario in which this happens is:
> {noformat}
>          Project ( dataframe A.column("col-a") )
>             |
>           Join2
>          /     \
>     Join1       DataFrame A
>     /     \
> DataFrame A    DataFrame B
> {noformat}
> In such cases, if the right leg of Join2 (DataFrame A) gets re-aliased due 
> to de-duplication of relations, and the Project uses a Column definition 
> obtained from DataFrame A, its exprId will not match the re-aliased 
> right-leg DataFrame A, causing resolution failure.
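A minimal sketch of the failing shape (dataframe and column names are 
hypothetical; whether it actually reproduces depends on how de-duplication 
re-aliases the right leg):

{code:scala}
// Hypothetical repro of the plan shape described above, in spark-shell.
val dfA = spark.range(10).toDF("col_a")
val dfB = spark.range(10).toDF("col_b")

val join1 = dfA.join(dfB, dfA("col_a") === dfB("col_b")) // Join1: A x B
val join2 = join1.join(dfA, "col_a")                     // Join2: Join1 x A (A repeated)

// If de-duplication re-aliased the right-leg dfA, the exprId captured in
// dfA("col_a") may no longer resolve against join2's output.
join2.select(dfA("col_a"))
{code}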



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47247) use smaller target size when coalescing partitions with exploding joins

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47247:
---
Labels: pull-request-available  (was: )

> use smaller target size when coalescing partitions with exploding joins
> ---
>
> Key: SPARK-47247
> URL: https://issues.apache.org/jira/browse/SPARK-47247
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47247) use smaller target size when coalescing partitions with exploding joins

2024-03-01 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47247:
---

 Summary: use smaller target size when coalescing partitions with 
exploding joins
 Key: SPARK-47247
 URL: https://issues.apache.org/jira/browse/SPARK-47247
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47131) contains, startswith, endswith

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47131:
--

Assignee: Apache Spark

> contains, startswith, endswith
> --
>
> Key: SPARK-47131
> URL: https://issues.apache.org/jira/browse/SPARK-47131
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Refactored built-in string functions to enable collation support for: 
> {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now 
> be able to use COLLATE within arguments for built-in string functions: 
> CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47131) contains, startswith, endswith

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47131:
--

Assignee: (was: Apache Spark)

> contains, startswith, endswith
> --
>
> Key: SPARK-47131
> URL: https://issues.apache.org/jira/browse/SPARK-47131
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Refactored built-in string functions to enable collation support for: 
> {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now 
> be able to use COLLATE within arguments for built-in string functions: 
> CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47188) Add configuration to determine whether to exclude hive statistics properties

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47188:
--

Assignee: (was: Apache Spark)

> Add configuration to determine whether to exclude hive statistics properties
> 
>
> Key: SPARK-47188
> URL: https://issues.apache.org/jira/browse/SPARK-47188
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>
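No description is attached; for context, Hive records table statistics as 
plain table properties, so a configuration like this would presumably govern 
whether Spark carries those keys over. The property names below are standard 
Hive statistics keys; the Spark configuration name itself is what the ticket 
proposes and is not shown here:

{code:scala}
// Standard Hive statistics table properties that such a configuration
// would plausibly include or exclude from table metadata.
val hiveStatsProperties = Seq("numFiles", "numRows", "rawDataSize", "totalSize")
{code}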




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47188) Add configuration to determine whether to exclude hive statistics properties

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47188:
--

Assignee: Apache Spark

> Add configuration to determine whether to exclude hive statistics properties
> 
>
> Key: SPARK-47188
> URL: https://issues.apache.org/jira/browse/SPARK-47188
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: xiaoping.huang
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-43255:


Assignee: Jin Helin

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Jin Helin
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users a solution for avoiding and fixing such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
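As an illustration, a checkError()-based test typically looks roughly like 
this; the error class name, SQL snippet, and parameters below are 
placeholders, not the actual values for _LEGACY_ERROR_TEMP_2020:

{code:scala}
// Placeholder sketch of a checkError()-style assertion in a Spark test suite.
checkError(
  exception = intercept[SparkRuntimeException] {
    sql("SELECT some_failing_expression()").collect()
  },
  errorClass = "NEWLY_ASSIGNED_NAME",        // hypothetical new name
  parameters = Map("param" -> "placeholder") // depends on the message template
)
{code}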



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2024-03-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-43255.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45302
[https://github.com/apache/spark/pull/45302]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Jin Helin
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users a solution for avoiding and fixing such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47188) Add configuration to determine whether to exclude hive statistics properties

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47188:
--

Assignee: (was: Apache Spark)

> Add configuration to determine whether to exclude hive statistics properties
> 
>
> Key: SPARK-47188
> URL: https://issues.apache.org/jira/browse/SPARK-47188
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-43255:
--

Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users a solution for avoiding and fixing such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47188) Add configuration to determine whether to exclude hive statistics properties

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47188:
--

Assignee: Apache Spark

> Add configuration to determine whether to exclude hive statistics properties
> 
>
> Key: SPARK-47188
> URL: https://issues.apache.org/jira/browse/SPARK-47188
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: xiaoping.huang
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-43255:
--

Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users a solution for avoiding and fixing such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2024-03-01 Thread Jin Helin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822461#comment-17822461
 ] 

Jin Helin commented on SPARK-43255:
---

I will work on this.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users a solution for avoiding and fixing such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47244) SparkConnectPlanner make internal functions private

2024-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47244:
---
Labels: pull-request-available  (was: )

> SparkConnectPlanner make internal functions private
> ---
>
> Key: SPARK-47244
> URL: https://issues.apache.org/jira/browse/SPARK-47244
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47244) SparkConnectPlanner make internal functions private

2024-03-01 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47244:
-

 Summary: SparkConnectPlanner make internal functions private
 Key: SPARK-47244
 URL: https://issues.apache.org/jira/browse/SPARK-47244
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org