[jira] [Commented] (SPARK-44173) Make Spark an sbt build only project
[ https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822747#comment-17822747 ] Yang Jie commented on SPARK-44173: -- Hi, [~dongjoon]. Sorry, I missed the previous message. This Jira was created based on discussions in https://github.com/apache/spark/pull/40317. Now that the Maven daily test pipeline is established, we have a way to discover problems in Maven tests in a timely manner, so the concern in this Jira's description has become less critical. I agree with your point; thank you for converting this to a normal Jira :) > Make Spark an sbt build only project > > > Key: SPARK-44173 > URL: https://issues.apache.org/jira/browse/SPARK-44173 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > Supporting both Maven and SBT always brings various testing problems and > increases the complexity of testing code writing > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47148) Avoid to materialize AQE ExchangeQueryStageExec on the cancellation
[ https://issues.apache.org/jira/browse/SPARK-47148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eren Avsarogullari updated SPARK-47148: --- Description: AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on cancellation. This causes unnecessary stage materialization by submitting a Shuffle Job or a Broadcast Job. If a stage is not yet materialized (i.e. *ShuffleQueryStage.shuffleFuture* or *BroadcastQueryStage.broadcastFuture* is not initialized yet), cancellation should simply skip it without materializing it. Sample use-case: *1- Stage Materialization Steps:* When stage materialization fails: {code:java} 1.1- ShuffleQueryStage1 - materialized successfully, 1.2- ShuffleQueryStage2 - materialization failed, 1.3- ShuffleQueryStage3 - not materialized yet, so ShuffleQueryStage3.shuffleFuture is not initialized{code} *2- Stage Cancellation Steps:* {code:java} 2.1- ShuffleQueryStage1 - canceled because it is already materialized, 2.2- ShuffleQueryStage2 - is an early-failed stage, so AQE already skips it by default because it could not be materialized, 2.3- ShuffleQueryStage3 - the problem is here: this stage is not materialized yet, but cancellation currently touches it as well, which forces it to be materialized first.{code} was: AQE can materialize *ShuffleQueryStage* on the cancellation. This causes unnecessary stage materialization by submitting Shuffle Job. Under normal circumstances, if the stage is already non-materialized (a.k.a ShuffleQueryStage.shuffleFuture is not initialized yet), it should just be skipped without materializing it. 
Please find sample use-case: *1- Stage Materialization Steps:* When stage materialization is failed: {code:java} 1.1- ShuffleQueryStage1 - is materialized successfully, 1.2- ShuffleQueryStage2 - materialization is failed, 1.3- ShuffleQueryStage3 - Not materialized yet so ShuffleQueryStage3.shuffleFuture is not initialized yet{code} *2- Stage Cancellation Steps:* {code:java} 2.1- ShuffleQueryStage1 - is canceled due to already materialized, 2.2- ShuffleQueryStage2 - is earlyFailedStage so currently, it is skipped as default by AQE because it could not be materialized, 2.3- ShuffleQueryStage3 - Problem is here: This stage is not materialized yet but currently, it is also tried to cancel and this stage requires to be materialized first.{code} > Avoid to materialize AQE ExchangeQueryStageExec on the cancellation > --- > > Key: SPARK-47148 > URL: https://issues.apache.org/jira/browse/SPARK-47148 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL >Affects Versions: 4.0.0 >Reporter: Eren Avsarogullari >Priority: Major > Labels: pull-request-available > > AQE can materialize both *ShuffleQueryStage* and *BroadcastQueryStage* on the > cancellation. This causes unnecessary stage materialization by submitting > Shuffle Job and Broadcast Job. Under normal circumstances, if the stage is > already non-materialized (a.k.a *ShuffleQueryStage.shuffleFuture* or > *{{BroadcastQueryStage.broadcastFuture}}* is not initialized yet), it should > just be skipped without materializing it. 
> Please find sample use-case: > *1- Stage Materialization Steps:* > When stage materialization is failed: > {code:java} > 1.1- ShuffleQueryStage1 - is materialized successfully, > 1.2- ShuffleQueryStage2 - materialization is failed, > 1.3- ShuffleQueryStage3 - Not materialized yet so > ShuffleQueryStage3.shuffleFuture is not initialized yet{code} > *2- Stage Cancellation Steps:* > {code:java} > 2.1- ShuffleQueryStage1 - is canceled due to already materialized, > 2.2- ShuffleQueryStage2 - is earlyFailedStage so currently, it is skipped as > default by AQE because it could not be materialized, > 2.3- ShuffleQueryStage3 - Problem is here: This stage is not materialized yet > but currently, it is also tried to cancel and this stage requires to be > materialized first.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
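The fix described above can be illustrated with a minimal sketch (Python, not Spark's Scala internals; the class and method names here are illustrative only): a stage whose materialization future was never initialized has nothing to cancel, so cancellation should skip it rather than trigger the future.

```python
# Minimal sketch of the "skip non-materialized stages on cancel" idea.
# QueryStage is a hypothetical stand-in for ShuffleQueryStage/BroadcastQueryStage;
# _future plays the role of shuffleFuture/broadcastFuture.
from concurrent.futures import ThreadPoolExecutor

class QueryStage:
    def __init__(self, name):
        self.name = name
        self._future = None  # lazily initialized, like shuffleFuture

    def materialize(self, pool):
        # Only the first call submits the job.
        if self._future is None:
            self._future = pool.submit(lambda: f"{self.name} done")
        return self._future

    def cancel(self):
        # The fix's idea: if the future was never initialized, there is
        # nothing to cancel, so do not touch (and thereby trigger) it.
        if self._future is None:
            return "skipped"
        self._future.cancel()
        return "cancelled"

with ThreadPoolExecutor(max_workers=1) as pool:
    stage1, stage3 = QueryStage("stage1"), QueryStage("stage3")
    stage1.materialize(pool)
    print(stage1.cancel())  # prints: cancelled
    print(stage3.cancel())  # prints: skipped (future never initialized)
```

The key design point is that `cancel()` never calls `materialize()`: cancellation observes the future's state but never creates it.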
[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822722#comment-17822722 ] Pavan Kotikalapudi commented on SPARK-24815: Thanks a lot for mentoring and driving this effort, Mich. As you suggested, I will update the benefits and challenges in the SPIP doc; that can outline the scope of the current work and the possibility of future work for other use cases. Re: > Pluggable Dynamic Allocation, Separate Algorithm for Structured Streaming I really like this idea. I started off with that but limited it to the core module, since it serves as a primitive level of evaluation (which the current DRA already does), but this idea is better, as you said, both design-wise and for different kinds of workloads. > Warning for Enabled Core Dynamic Allocation Right now we need the normal DRA because the structured streaming DRA is built on top of it. I have added another flag `spark.dynamicAllocation.streaming.enabled` so that this streaming part of the algorithm kicks in on top of traditional DRA. This approach also keeps it backwards compatible, especially when users upgrade Spark. > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. 
It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. > 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled
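The layering described in the comment above could be configured roughly like this. Note this is a sketch: `spark.dynamicAllocation.streaming.enabled` is the flag proposed in the SPIP discussion, not a released Spark setting; the other `spark.dynamicAllocation.*` properties are standard Spark configuration.

```shell
# Hypothetical: streaming DRA opts in on top of the existing batch DRA,
# so existing jobs that only set spark.dynamicAllocation.enabled keep
# their current behavior after an upgrade.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.streaming.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  my_streaming_app.py
```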
[jira] [Updated] (SPARK-47251) Block invalid types from the `args` argument for `sql` command
[ https://issues.apache.org/jira/browse/SPARK-47251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47251: --- Labels: pull-request-available (was: ) > Block invalid types from the `args` argument for `sql` command > -- > > Key: SPARK-47251 > URL: https://issues.apache.org/jira/browse/SPARK-47251 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47251) Block invalid types from the `args` argument for `sql` command
[ https://issues.apache.org/jira/browse/SPARK-47251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47251: -- Summary: Block invalid types from the `args` argument for `sql` command (was: Block invalid types from the `arg` argument for `sql` command) > Block invalid types from the `args` argument for `sql` command > -- > > Key: SPARK-47251 > URL: https://issues.apache.org/jira/browse/SPARK-47251 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-47158: Assignee: Haejoon Lee > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 > - > > Key: SPARK-47158 > URL: https://issues.apache.org/jira/browse/SPARK-47158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47251) Block invalid types from the `arg` argument for `sql` command
Takuya Ueshin created SPARK-47251: - Summary: Block invalid types from the `arg` argument for `sql` command Key: SPARK-47251 URL: https://issues.apache.org/jira/browse/SPARK-47251 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.1 Reporter: Takuya Ueshin
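A minimal sketch of the kind of validation this issue asks for (the function name `validate_sql_args` is illustrative, not PySpark's actual API): `spark.sql()` in PySpark 3.4+ accepts its parameters as a dictionary or a list, so anything else should be rejected up front with a clear error rather than failing later.

```python
# Hypothetical validator: accept only the container types that the
# `sql` command's `args` argument is documented to take (dict or list),
# and fail fast with a descriptive TypeError otherwise.
def validate_sql_args(args):
    if args is None or isinstance(args, (dict, list)):
        return args
    raise TypeError(
        f"args should be a dict or a list, got {type(args).__name__}"
    )

validate_sql_args({"limit": 10})   # ok: named parameters
validate_sql_args([1, "a"])        # ok: positional parameters
try:
    validate_sql_args("not-a-container")
except TypeError as e:
    print(e)  # prints: args should be a dict or a list, got str
```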
[jira] [Resolved] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-47158. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45244 [https://github.com/apache/spark/pull/45244] > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 > - > > Key: SPARK-47158 > URL: https://issues.apache.org/jira/browse/SPARK-47158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47237) Upgrade xmlschema-core to 2.3.1
[ https://issues.apache.org/jira/browse/SPARK-47237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-47237. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45347 [https://github.com/apache/spark/pull/45347] > Upgrade xmlschema-core to 2.3.1 > --- > > Key: SPARK-47237 > URL: https://issues.apache.org/jira/browse/SPARK-47237 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47237) Upgrade xmlschema-core to 2.3.1
[ https://issues.apache.org/jira/browse/SPARK-47237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-47237: Assignee: BingKun Pan > Upgrade xmlschema-core to 2.3.1 > --- > > Key: SPARK-47237 > URL: https://issues.apache.org/jira/browse/SPARK-47237 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47216) Refine layout of SQL performance tuning page
[ https://issues.apache.org/jira/browse/SPARK-47216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-47216. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45322 [https://github.com/apache/spark/pull/45322] > Refine layout of SQL performance tuning page > > > Key: SPARK-47216 > URL: https://issues.apache.org/jira/browse/SPARK-47216 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47216) Refine layout of SQL performance tuning page
[ https://issues.apache.org/jira/browse/SPARK-47216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-47216: Assignee: Nicholas Chammas > Refine layout of SQL performance tuning page > > > Key: SPARK-47216 > URL: https://issues.apache.org/jira/browse/SPARK-47216 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47250) Move additional RocksDB errors/exceptions to NERF
[ https://issues.apache.org/jira/browse/SPARK-47250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47250: --- Labels: pull-request-available (was: ) > Move additional RocksDB errors/exceptions to NERF > - > > Key: SPARK-47250 > URL: https://issues.apache.org/jira/browse/SPARK-47250 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > > Move additional RocksDB errors/exceptions to NERF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47243) Correct the package name of `StateMetadataSource.scala`
[ https://issues.apache.org/jira/browse/SPARK-47243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-47243: Assignee: Yang Jie > Correct the package name of `StateMetadataSource.scala` > --- > > Key: SPARK-47243 > URL: https://issues.apache.org/jira/browse/SPARK-47243 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47243) Correct the package name of `StateMetadataSource.scala`
[ https://issues.apache.org/jira/browse/SPARK-47243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-47243. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45352 [https://github.com/apache/spark/pull/45352] > Correct the package name of `StateMetadataSource.scala` > --- > > Key: SPARK-47243 > URL: https://issues.apache.org/jira/browse/SPARK-47243 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47250) Move additional RocksDB errors/exceptions to NERF
Anish Shrigondekar created SPARK-47250: -- Summary: Move additional RocksDB errors/exceptions to NERF Key: SPARK-47250 URL: https://issues.apache.org/jira/browse/SPARK-47250 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Anish Shrigondekar Move additional RocksDB errors/exceptions to NERF
[jira] [Commented] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822653#comment-17822653 ] nirav patel commented on SPARK-46762: - Just realized I added the spark-connect 3.4 example start-up command instead of the 3.5 one. I have updated it in the OP. > Spark Connect 3.5 Classloading issue with external jar > -- > > Key: SPARK-46762 > URL: https://issues.apache.org/jira/browse/SPARK-46762 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: nirav patel >Priority: Major > Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot > 2024-02-22 at 2.04.49 PM.png > > > We are having following `java.lang.ClassCastException` error in spark > Executors when using spark-connect 3.5 with external spark sql catalog jar - > iceberg-spark-runtime-3.5_2.12-1.4.3.jar > We also set "spark.executor.userClassPathFirst=true" otherwise child class > gets loaded by MutableClassLoader and parent class gets loaded by > ChildFirstCLassLoader and that causes ClassCastException as well. 
> > {code:java} > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): > java.lang.ClassCastException: class > org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to > class org.apache.iceberg.Table > (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed > module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; > org.apache.iceberg.Table is in unnamed module of loader > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943) > at > org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88) > at > org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50) > at > org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > at org.apache.spark.scheduler.Task.run(Task.scala:141) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > at org.apach...{code} > > `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of > `org.apache.iceberg.Table` and they are both in only one jar > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` > We verified that there's only one jar of > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` loaded when spark-connect server > is started. > Looking more into Error it seems classloader itself is instantiated multiple > times somewhere. I can see two instances: > org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943 > > *Affected version:* > spark 3.5 and spark-connect_2.12:3.5.0 works fine > > *Not affected version and variation:* > Spark 3.4 and spark-connect_2.12:3.4.0 works fine with external jar > Also works with just Spark 3.5 spark-submit script directly (ie without using > spark-connect 3.5 ) > > Issue has been open with Iceberg as well: > [https://github.com/apache/iceberg/issues/8978] > And been discussed in dev@org.apache.iceberg: > [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1] > > > Steps to reproduce: > > 1) Just to see that spark is loading same class twice using different > classloader: > > Start spark-connect server with required jars and configuration for > iceberg-hive catalog. > {code:java} > sudo /usr/lib/spark/sbin/start-connect-server.sh \ > --packages
[jira] [Updated] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nirav patel updated SPARK-46762: Description: We are having following `java.lang.ClassCastException` error in spark Executors when using spark-connect 3.5 with external spark sql catalog jar - iceberg-spark-runtime-3.5_2.12-1.4.3.jar We also set "spark.executor.userClassPathFirst=true" otherwise child class gets loaded by MutableClassLoader and parent class gets loaded by ChildFirstCLassLoader and that causes ClassCastException as well. {code:java} pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @4b18b943) at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88) at org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50) at org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:141) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) at org.apach...{code} `org.apache.iceberg.spark.source.SerializableTableWithSize` is a subclass of `org.apache.iceberg.Table`, and both are in a single jar, `iceberg-spark-runtime-3.5_2.12-1.4.3.jar`. We verified that only one copy of `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is loaded when the spark-connect server is started. Looking further into the error, it seems the classloader itself is instantiated multiple times somewhere; I can see two instances: org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and org.apache.spark.util.ChildFirstURLClassLoader @4b18b943 *Affected version:* Spark 3.5 and spark-connect_2.12:3.5.0 *Not affected version and variation:* Spark 3.4 and spark-connect_2.12:3.4.0 works fine with the external jar. It also works with the plain Spark 3.5 spark-submit script directly (i.e. without using spark-connect 3.5). An issue is open with Iceberg as well: [https://github.com/apache/iceberg/issues/8978] and it has been discussed on dev@org.apache.iceberg: [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1] Steps to reproduce: 1) To see that Spark loads the same class twice using different classloaders: Start the spark-connect server with the required jars and configuration for the iceberg-hive catalog. 
{code:java} sudo /usr/lib/spark/sbin/start-connect-server.sh \ --packages org.apache.spark:spark-connect_2.12:3.5.0 \ --jars gs://libs/iceberg-spark-runtime-3.5_2.12-1.4.3.jar \ --conf "spark.executor.extraJavaOptions=-verbose:class" \ --conf "spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog" \ --conf "spark.sql.catalog.iceberg_catalog.type=hive" \ --conf "spark.sql.catalog.iceberg_catalog.uri=thrift://metastore-host:port" {code} Reference: [https://iceberg.apache.org/docs/1.4.2/spark-configuration/#catalogs] Since I have `"spark.executor.extraJavaOptions=-verbose:class"`, you should see in the executor logs that `org.apache.iceberg.Table` is loaded twice. You can also take a heap dump, like I did, to verify that; I have attached screenshots of the heap dumps. 2) To actually reproduce the ClassCastException: Try running a spark query
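The double-loading phenomenon described above can be reproduced in miniature outside the JVM. The sketch below is a Python analogy (not Spark code): loading the same source file through two independent module loaders produces two distinct class objects, so an instance created under one loader fails `isinstance` checks against the other, just as the executor's `ChildFirstURLClassLoader @5e7ae053` vs `@4b18b943` instances make the cast to `org.apache.iceberg.Table` fail.

```python
# Analogy for the two-classloader problem: each load_fresh() call acts as
# its own "classloader", yielding a distinct Table class object.
import importlib.util
import os
import tempfile

src = "class Table:\n    pass\n"
path = os.path.join(tempfile.mkdtemp(), "iceberg_table.py")
with open(path, "w") as f:
    f.write(src)

def load_fresh(name):
    # Build an isolated module object from the same file each time.
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

m1 = load_fresh("loader1")
m2 = load_fresh("loader2")

t = m1.Table()
print(isinstance(t, m1.Table))  # prints: True
print(isinstance(t, m2.Table))  # prints: False - same code, different "loader"
```

This is why having only one copy of the jar on disk is not enough: identity is (loader, class), not just the class name.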
[jira] [Updated] (SPARK-44173) Make Spark an sbt build only project
[ https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44173: -- Parent: (was: SPARK-44111) Issue Type: Improvement (was: Sub-task) > Make Spark an sbt build only project > > > Key: SPARK-44173 > URL: https://issues.apache.org/jira/browse/SPARK-44173 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > Supporting both Maven and SBT always brings various testing problems and > increases the complexity of testing code writing > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44173) Make Spark an sbt build only project
[ https://issues.apache.org/jira/browse/SPARK-44173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822651#comment-17822651 ] Dongjoon Hyun commented on SPARK-44173: --- Hi, [~LuciferYang]. Given the difficulty of SBT's dependency management, let's consider this separately from a long-term perspective. I converted this to a normal Jira. > Make Spark an sbt build only project > > > Key: SPARK-44173 > URL: https://issues.apache.org/jira/browse/SPARK-44173 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > Supporting both Maven and SBT always brings various testing problems and > increases the complexity of testing code writing >
[jira] [Updated] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency
[ https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47043: -- Parent: (was: SPARK-44111) Issue Type: Improvement (was: Sub-task) > Fix `spark-common-utils` module to have explicit `jackson-core` and > `jackson-annotations`dependency > --- > > Key: SPARK-47043 > URL: https://issues.apache.org/jira/browse/SPARK-47043 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: William Wong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Following scala code depends on `jackson-core` and `jackson-annotations` > explicitly. However, spark-common-utils modules missing related dependency. > {code:java} > ~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import > ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import > com.fasterxml.jackson.core.`type`.TypeReference > ./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import > com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator} > ~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import > ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import > com.fasterxml.jackson.annotation.JsonIgnore > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47042) Fix `spark-common-utils` module to have explicit `commons-lang3` dependency
[ https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47042: -- Parent: (was: SPARK-44111) Issue Type: Improvement (was: Sub-task) > Fix `spark-common-utils` module to have explicit `commons-lang3` dependency > --- > > Key: SPARK-47042 > URL: https://issues.apache.org/jira/browse/SPARK-47042 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: William Wong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Following scala code depends on `commons-lang3` explicitly. However, the > common-utils modules missing related dependency. > {code:java} > ~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import > src/main/scala/org/apache/spark/util/MavenUtils.scala:import > org.apache.commons.lang3.StringUtils > src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import > org.apache.commons.lang3.ClassUtils > src/main/java/org/apache/spark/network/util/JavaUtils.java:import > org.apache.commons.lang3.SystemUtils; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
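The fix for issues like this is typically a small pom change. A sketch of what the explicit declaration might look like in `common/utils/pom.xml` (assumption: the version is inherited from the parent pom's dependencyManagement, as is conventional in the Spark build, so none is pinned here):

```xml
<!-- Declare commons-lang3 directly rather than relying on a transitive path. -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
</dependency>
```

Declaring direct dependencies explicitly keeps the module buildable even if an intermediate dependency stops pulling `commons-lang3` in transitively.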
[jira] [Updated] (SPARK-41392) Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-41392: -- Summary: Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 (was: spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin) > Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 > --- > > Key: SPARK-41392 > URL: https://issues.apache.org/jira/browse/SPARK-41392 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Steve Loughran >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE > {code} > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: > org/bouncycastle/jce/provider/BouncyCastleProvider > {code} > full stack > {code} > [ERROR] Failed to execute goal > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile > (scala-test-compile-first) on project spark-sql_2.12: Execution > scala-test-compile-first of goal > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required > class was missing while executing > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: > org/bouncycastle/jce/provider/BouncyCastleProvider > [ERROR] - > [ERROR] realm =plugin>net.alchim31.maven:scala-maven-plugin:4.7.2 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar > [ERROR] urls[1] = > file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar > [ERROR] urls[2] = > file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar > [ERROR] urls[3] = > 
file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar > [ERROR] urls[4] = > file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar > [ERROR] urls[5] = > file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar > [ERROR] urls[6] = > file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar > [ERROR] urls[7] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar > [ERROR] urls[8] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar > [ERROR] urls[9] = > file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar > [ERROR] urls[10] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar > [ERROR] urls[11] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar > [ERROR] urls[12] = > file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar > [ERROR] urls[13] = > file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar > [ERROR] urls[14] = > file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar > [ERROR] urls[15] = > file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar > [ERROR] urls[16] = > file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar > [ERROR] urls[17] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar > [ERROR] urls[18] = > file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar > [ERROR] urls[19] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar > [ERROR] urls[20] = > 
file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar > [ERROR] urls[21] = > file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar > [ERROR] urls[22] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar > [ERROR] urls[23] = > file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar > [ERROR] urls[24] = > file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar > [ERROR] urls[25] = > file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar > [ERROR] urls[26] = >
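The renamed summary points at the eventual fix: adding test-scoped Bouncy Castle artifacts to the `sql/core` module so the class is on the compile/test classpath when building against Hadoop 3.4.0. A hedged sketch of what that looks like (the artifact names `bcprov-jdk18on`/`bcpkix-jdk18on` are assumptions about which Bouncy Castle jars are needed; the actual pom change may use different artifacts or manage versions elsewhere):

```xml
<!-- Sketch: test-only Bouncy Castle dependencies for sql/core/pom.xml.
     Artifact names and version management are assumptions for illustration. -->
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk18on</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcpkix-jdk18on</artifactId>
  <scope>test</scope>
</dependency>
```

Using `test` scope keeps the provider off the runtime classpath of Spark's published artifacts while still satisfying the scala-maven-plugin's test compilation.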
[jira] [Commented] (SPARK-41392) spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin
[ https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822649#comment-17822649 ] Dongjoon Hyun commented on SPARK-41392: --- It's great news. :) > spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin > --- > > Key: SPARK-41392 > URL: https://issues.apache.org/jira/browse/SPARK-41392 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Steve Loughran >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE > {code} > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: > org/bouncycastle/jce/provider/BouncyCastleProvider > {code}
[jira] [Updated] (SPARK-47167) Add concrete relation class for JDBC relation made in V1TableScan
[ https://issues.apache.org/jira/browse/SPARK-47167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uros Stankovic updated SPARK-47167: --- Description: JDBCRelation's toV1TableScan method creates an anonymous v1 relation that can use the predicates and other pushdowns of the JDBCRelation object (the v2 relation). That relation can later be logged to telemetry or shown in the Spark UI by its name. An anonymous relation is not descriptive enough, so the idea is to use a concrete class. was: The BaseRelation class does not provide any descriptive information like a name or description, so it would be great to add such a class to make debugging and logging easier. > Add concrete relation class for JDBC relation made in V1TableScan > - > > Key: SPARK-47167 > URL: https://issues.apache.org/jira/browse/SPARK-47167 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.1 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > JDBCRelation's toV1TableScan method creates an anonymous v1 relation that can > use the predicates and other pushdowns of the JDBCRelation object (the v2 > relation). That relation can later be logged to telemetry or shown in the > Spark UI by its name. An anonymous relation is not descriptive enough, so the > idea is to use a concrete class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
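To illustrate the idea (all names below are hypothetical, not the class actually added by the PR): instead of returning an anonymous relation, `toV1TableScan` can return a named case class whose `toString` makes the relation identifiable in telemetry and the Spark UI.

```scala
// Hypothetical sketch of a concrete, self-describing relation.
// JDBCV1ScanRelation and its fields are illustrative names only; the
// real class would also extend Spark's BaseRelation and carry the
// pushed-down state of the v2 JDBCRelation.
case class JDBCV1ScanRelation(
    url: String,
    table: String,
    pushedPredicates: Seq[String]) {
  // A descriptive name shows up wherever the relation is logged.
  override def toString: String =
    s"JDBCV1ScanRelation(table=$table, pushed=[${pushedPredicates.mkString(", ")}])"
}
```

A case class with an explicit `toString` is enough to make plan dumps and UI nodes readable, which is the debugging benefit the issue asks for.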
[jira] [Resolved] (SPARK-47167) Add concrete relation class for JDBC relation made in V1TableScan
[ https://issues.apache.org/jira/browse/SPARK-47167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47167. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45259 [https://github.com/apache/spark/pull/45259] > Add concrete relation class for JDBC relation made in V1TableScan > - > > Key: SPARK-47167 > URL: https://issues.apache.org/jira/browse/SPARK-47167 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.1 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > The BaseRelation class does not provide any descriptive information like a > name or description, so it would be great to add such a class to make > debugging and logging easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47167) Add concrete relation class for JDBC relation made in V1TableScan
[ https://issues.apache.org/jira/browse/SPARK-47167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uros Stankovic updated SPARK-47167: --- Summary: Add concrete relation class for JDBC relation made in V1TableScan (was: Add descriptive relation class) > Add concrete relation class for JDBC relation made in V1TableScan > - > > Key: SPARK-47167 > URL: https://issues.apache.org/jira/browse/SPARK-47167 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.5.1 >Reporter: Uros Stankovic >Priority: Minor > Labels: pull-request-available > > The BaseRelation class does not provide any descriptive information like a > name or description, so it would be great to add such a class to make > debugging and logging easier. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41392) spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin
[ https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822572#comment-17822572 ] Steve Loughran commented on SPARK-41392: expect an official release this week; this pr will ensure it works > spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin > --- > > Key: SPARK-41392 > URL: https://issues.apache.org/jira/browse/SPARK-41392 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Steve Loughran >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE > {code} > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: > org/bouncycastle/jce/provider/BouncyCastleProvider > {code}
[jira] [Updated] (SPARK-47131) contains, startswith, endswith
[ https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47131: - Description: Refactored built-in string functions to enable collation support for: {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string functions: CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS implementation for non-binary collations is a separate subtask (SPARK-47248 ). was:Refactored built-in string functions to enable collation support for: {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string functions: CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. > contains, startswith, endswith > -- > > Key: SPARK-47131 > URL: https://issues.apache.org/jira/browse/SPARK-47131 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Refactored built-in string functions to enable collation support for: > {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now > be able to use COLLATE within arguments for built-in string functions: > CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS > implementation for non-binary collations is a separate subtask (SPARK-47248 > ). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47131) contains, startswith, endswith
[ https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47131: - Description: Refactored built-in string functions to enable collation support for: {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string functions: CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS implementation for non-binary collations is a separate subtask (SPARK-47248). (was: Refactored built-in string functions to enable collation support for: {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string functions: CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS implementation for non-binary collations is a separate subtask (SPARK-47248 ).) > contains, startswith, endswith > -- > > Key: SPARK-47131 > URL: https://issues.apache.org/jira/browse/SPARK-47131 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Refactored built-in string functions to enable collation support for: > {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now > be able to use COLLATE within arguments for built-in string functions: > CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. Note: CONTAINS > implementation for non-binary collations is a separate subtask (SPARK-47248). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
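As a usage illustration only: once these functions are collation-aware, a query of roughly the following shape should work. The collation name `UTF8_LCASE` and the presence of an active `SparkSession` named `spark` are assumptions; available collation names depend on the Spark build.

```scala
// Sketch of the intended user-facing behavior, not taken from the
// Spark test suite. Assumes an active SparkSession `spark` and that
// the UTF8_LCASE collation name exists in this build.
val df = spark.sql(
  """SELECT contains('Apache Spark' COLLATE UTF8_LCASE, 'SPARK'),
    |       startswith('Apache Spark' COLLATE UTF8_LCASE, 'apache'),
    |       endswith('Apache Spark' COLLATE UTF8_LCASE, 'spark')
    |""".stripMargin)
```

Under a case-insensitive collation, all three predicates would be expected to evaluate to true, which is exactly the behavior binary collations do not give.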
[jira] [Updated] (SPARK-47248) contains (non-binary collations)
[ https://issues.apache.org/jira/browse/SPARK-47248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47248: - Description: Implemented efficient collation-aware in-place substring comparison to enable collation support for: {_}contains{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string function: CONTAINS in Spark SQL queries. (was: Enable efficient collation-aware in-place substring comparison.) Summary: contains (non-binary collations) (was: contains) > contains (non-binary collations) > > > Key: SPARK-47248 > URL: https://issues.apache.org/jira/browse/SPARK-47248 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Implemented efficient collation-aware in-place substring comparison to enable > collation support for: {_}contains{_}. Spark SQL users should now be able to > use COLLATE within arguments for built-in string function: CONTAINS in Spark > SQL queries. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47248) contains (non-binary collations)
[ https://issues.apache.org/jira/browse/SPARK-47248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47248: - Description: Implemented efficient collation-aware in-place substring comparison to enable collation support for: {_}contains{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string function: CONTAINS in Spark SQL queries (for non-binary collations). (was: Implemented efficient collation-aware in-place substring comparison to enable collation support for: {_}contains{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string function: CONTAINS in Spark SQL queries.) > contains (non-binary collations) > > > Key: SPARK-47248 > URL: https://issues.apache.org/jira/browse/SPARK-47248 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Implemented efficient collation-aware in-place substring comparison to enable > collation support for: {_}contains{_}. Spark SQL users should now be able to > use COLLATE within arguments for built-in string function: CONTAINS in Spark > SQL queries (for non-binary collations). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
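Conceptually, an "in-place" collation-aware substring comparison avoids materializing transformed copies of both strings and instead compares under the collation while scanning. A rough toy model for intuition only; Spark's actual implementation is based on ICU collators, and this sketch models just a simple case-insensitive collation:

```scala
// Toy model of collation-aware `contains` for a case-insensitive
// collation. Not Spark's implementation; for intuition only.
def collationAwareContains(left: String, right: String): Boolean = {
  if (right.isEmpty) true
  else {
    // Slide a window over `left`, comparing in place under the
    // collation (here: simple case folding) instead of lowercasing
    // and copying both whole strings first.
    (0 to left.length - right.length).exists { i =>
      left.regionMatches(true /* ignoreCase */, i, right, 0, right.length)
    }
  }
}
```

The point of the optimization is the memory profile: the naive `left.toLowerCase.contains(right.toLowerCase)` allocates two new strings per call, while the windowed comparison allocates nothing.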
[jira] [Created] (SPARK-47248) contains
Uroš Bojanić created SPARK-47248: Summary: contains Key: SPARK-47248 URL: https://issues.apache.org/jira/browse/SPARK-47248 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić Enable efficient collation-aware in-place substring comparison. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47131) contains, startswith, endswith
[ https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47131: - Component/s: SQL (was: Spark Core) > contains, startswith, endswith > -- > > Key: SPARK-47131 > URL: https://issues.apache.org/jira/browse/SPARK-47131 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Refactored built-in string functions to enable collation support for: > {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now > be able to use COLLATE within arguments for built-in string functions: > CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47217) De-duplication of Relations in Joins, can result in plan resolution failure
[ https://issues.apache.org/jira/browse/SPARK-47217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-47217: --- Shepherd: (was: Peter Toth) > De-duplication of Relations in Joins, can result in plan resolution failure > --- > > Key: SPARK-47217 > URL: https://issues.apache.org/jira/browse/SPARK-47217 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Major > Labels: Spark-SQL, pull-request-available > > In some flavours of nested joins that repeat a relation, projected columns > passed to the DataFrame.select API in the form of df.column can fail plan > resolution because attribute resolution does not happen. > A scenario in which this happens is: > {noformat}
>          Project ( dataframe A.column("col-a") )
>             |
>           Join2
>          /     \
>      Join1      DataFrame A
>      /    \
> DataFrame A    DataFrame B
> {noformat} > In such cases, if the right leg of Join2 (DataFrame A) gets re-aliased due to > de-duplication of relations, and the project uses a Column definition > obtained from DataFrame A, its exprId will not match the re-aliased right leg > of Join2, causing a resolution failure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
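The reported plan shape can be sketched as follows. This is a hypothetical reproduction, not the test case from the issue; it assumes an active `SparkSession` named `spark`, and the column names are illustrative.

```scala
// Hypothetical reproduction sketch of the plan shape in the issue.
// Assumes an active SparkSession `spark`; names are illustrative.
val dfA = spark.range(5).toDF("col_a")   // DataFrame A
val dfB = spark.range(5).toDF("col_b")   // DataFrame B

// Join1: DataFrame A joined with DataFrame B.
val join1 = dfA.join(dfB, dfA("col_a") === dfB("col_b"))

// Join2 repeats DataFrame A; de-duplication of relations may
// re-alias this right leg with fresh exprIds.
val join2 = join1.join(dfA, join1("col_b") === dfA("col_a"))

// The Column below carries the original exprId of dfA, which may no
// longer match the re-aliased right leg, so resolution can fail here.
join2.select(dfA("col_a"))
```

The failure mode is subtle because the same `df.column` reference works fine when the relation is not repeated; it only breaks once de-duplication mints new expression IDs for one of the legs.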
[jira] [Updated] (SPARK-47247) use smaller target size when coalescing partitions with exploding joins
[ https://issues.apache.org/jira/browse/SPARK-47247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47247: --- Labels: pull-request-available (was: ) > use smaller target size when coalescing partitions with exploding joins > --- > > Key: SPARK-47247 > URL: https://issues.apache.org/jira/browse/SPARK-47247 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47247) use smaller target size when coalescing partitions with exploding joins
Wenchen Fan created SPARK-47247: --- Summary: use smaller target size when coalescing partitions with exploding joins Key: SPARK-47247 URL: https://issues.apache.org/jira/browse/SPARK-47247 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47131) contains, startswith, endswith
[ https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47131: -- Assignee: Apache Spark > contains, startswith, endswith > -- > > Key: SPARK-47131 > URL: https://issues.apache.org/jira/browse/SPARK-47131 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Refactored built-in string functions to enable collation support for: > {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now > be able to use COLLATE within arguments for built-in string functions: > CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47131) contains, startswith, endswith
[ https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47131: -- Assignee: (was: Apache Spark) > contains, startswith, endswith > -- > > Key: SPARK-47131 > URL: https://issues.apache.org/jira/browse/SPARK-47131 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Refactored built-in string functions to enable collation support for: > {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now > be able to use COLLATE within arguments for built-in string functions: > CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47188) Add configuration to determine whether to exclude hive statistics properties
[ https://issues.apache.org/jira/browse/SPARK-47188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47188: -- Assignee: (was: Apache Spark) > Add configuration to determine whether to exclude hive statistics properties > > > Key: SPARK-47188 > URL: https://issues.apache.org/jira/browse/SPARK-47188 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: xiaoping.huang >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47188) Add configuration to determine whether to exclude hive statistics properties
[ https://issues.apache.org/jira/browse/SPARK-47188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47188: -- Assignee: Apache Spark > Add configuration to determine whether to exclude hive statistics properties > > > Key: SPARK-47188 > URL: https://issues.apache.org/jira/browse/SPARK-47188 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: xiaoping.huang >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
[ https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-43255:
--------------------------------

    Assignee: Jin Helin

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> ---------------------------------------------------------
>
>                 Key: SPARK-43255
>                 URL: https://issues.apache.org/jira/browse/SPARK-43255
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Assignee: Jin Helin
>            Priority: Minor
>              Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
> *core/src/main/resources/error/error-classes.json*. The name should be short
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't
> exist yet. Check the exception fields using *checkError()*. That function
> checks only the valuable error fields and avoids depending on the error text
> message, so tech editors can modify the error format in error-classes.json
> without worrying about Spark's internal tests. Migrate other tests that might
> trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query),
> replace the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose a solution to users for how to avoid and fix such errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
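[Editor's note] The checkError() guidance quoted above boils down to one testing principle: assert on the structured error fields (error class and message parameters) rather than on the rendered message text, so the message template can change without breaking tests. A minimal Python analogue of that principle follows; the class, the helper, and the `EXAMPLE_PARSE_ERROR` error-class name are all invented for illustration and are not Spark's actual API.

```python
# Sketch of the idea behind checkError(): compare only stable, structured
# fields of an error, never the rendered message. Hypothetical stand-ins,
# not Spark code.

class StructuredError(Exception):
    """An error carrying a machine-readable class and parameters."""

    def __init__(self, error_class: str, parameters: dict, template: str):
        self.error_class = error_class
        self.parameters = parameters
        # The human-readable message is derived from a template, so it may
        # change freely without affecting the structured fields below.
        super().__init__(template.format(**parameters))

def check_error(exc: StructuredError, error_class: str, parameters: dict) -> None:
    # Assert only on the structured fields, not on str(exc).
    assert exc.error_class == error_class
    assert exc.parameters == parameters

err = StructuredError(
    "EXAMPLE_PARSE_ERROR",
    {"input": "5 bananas"},
    "Cannot parse '{input}' as an interval.",
)
check_error(err, "EXAMPLE_PARSE_ERROR", {"input": "5 bananas"})
```

Because `check_error` never inspects the rendered text, editing the template (as the ticket says tech editors may do in error-classes.json) leaves every such test green.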
[jira] [Resolved] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
[ https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk resolved SPARK-43255.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45302
[https://github.com/apache/spark/pull/45302]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> ---------------------------------------------------------
>
>                 Key: SPARK-43255
>                 URL: https://issues.apache.org/jira/browse/SPARK-43255
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Assignee: Jin Helin
>            Priority: Minor
>              Labels: pull-request-available, starter
>             Fix For: 4.0.0
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
> *core/src/main/resources/error/error-classes.json*. The name should be short
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't
> exist yet. Check the exception fields using *checkError()*. That function
> checks only the valuable error fields and avoids depending on the error text
> message, so tech editors can modify the error format in error-classes.json
> without worrying about Spark's internal tests. Migrate other tests that might
> trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query),
> replace the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose a solution to users for how to avoid and fix such errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
[jira] [Assigned] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
[ https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-43255:
--------------------------------------

    Assignee:     (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> ---------------------------------------------------------
>
>                 Key: SPARK-43255
>                 URL: https://issues.apache.org/jira/browse/SPARK-43255
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Priority: Minor
>              Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
> *core/src/main/resources/error/error-classes.json*. The name should be short
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't
> exist yet. Check the exception fields using *checkError()*. That function
> checks only the valuable error fields and avoids depending on the error text
> message, so tech editors can modify the error format in error-classes.json
> without worrying about Spark's internal tests. Migrate other tests that might
> trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query),
> replace the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose a solution to users for how to avoid and fix such errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
[jira] [Assigned] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
[ https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-43255:
--------------------------------------

    Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> ---------------------------------------------------------
>
>                 Key: SPARK-43255
>                 URL: https://issues.apache.org/jira/browse/SPARK-43255
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Assignee: Apache Spark
>            Priority: Minor
>              Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
> *core/src/main/resources/error/error-classes.json*. The name should be short
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't
> exist yet. Check the exception fields using *checkError()*. That function
> checks only the valuable error fields and avoids depending on the error text
> message, so tech editors can modify the error format in error-classes.json
> without worrying about Spark's internal tests. Migrate other tests that might
> trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query),
> replace the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose a solution to users for how to avoid and fix such errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
[jira] [Commented] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020
[ https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822461#comment-17822461 ]

Jin Helin commented on SPARK-43255:
-----------------------------------

I will work on this.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> ---------------------------------------------------------
>
>                 Key: SPARK-43255
>                 URL: https://issues.apache.org/jira/browse/SPARK-43255
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Priority: Minor
>              Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in
> *core/src/main/resources/error/error-classes.json*. The name should be short
> but complete (see the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't
> exist yet. Check the exception fields using *checkError()*. That function
> checks only the valuable error fields and avoids depending on the error text
> message, so tech editors can modify the error format in error-classes.json
> without worrying about Spark's internal tests. Migrate other tests that might
> trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query),
> replace the error with an internal error; see *SparkException.internalError()*.
> Improve the error message format in error-classes.json if the current one is
> unclear, and propose a solution to users for how to avoid and fix such errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-47244) SparkConnectPlanner make internal functions private
[ https://issues.apache.org/jira/browse/SPARK-47244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47244:
-----------------------------------
    Labels: pull-request-available  (was: )

> SparkConnectPlanner make internal functions private
> ---------------------------------------------------
>
>                 Key: SPARK-47244
>                 URL: https://issues.apache.org/jira/browse/SPARK-47244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
[jira] [Created] (SPARK-47244) SparkConnectPlanner make internal functions private
Ruifeng Zheng created SPARK-47244:
-------------------------------------

             Summary: SparkConnectPlanner make internal functions private
                 Key: SPARK-47244
                 URL: https://issues.apache.org/jira/browse/SPARK-47244
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 4.0.0
            Reporter: Ruifeng Zheng