[jira] [Commented] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821918#comment-17821918 ]

Zhen Li commented on SPARK-46762:
---------------------------------

[~tenstriker] Can you provide more info to reproduce your error? The class-loading problem is a bit hard to debug. It would be helpful if you could give us a command or test that reproduces the error.

> Spark Connect 3.5 Classloading issue with external jar
> ------------------------------------------------------
>
>                 Key: SPARK-46762
>                 URL: https://issues.apache.org/jira/browse/SPARK-46762
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: nirav patel
>            Priority: Major
>         Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 2024-02-22 at 2.04.49 PM.png
>
> We are seeing the following `java.lang.ClassCastException` in Spark executors when using spark-connect 3.5 with an external Spark SQL catalog jar, iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child class gets loaded by MutableClassLoader while the parent class gets loaded by ChildFirstURLClassLoader, which causes a ClassCastException as well.
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
>     at org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>     at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apach...{code}
>
> `org.apache.iceberg.spark.source.SerializableTableWithSize` is a subclass of `org.apache.iceberg.Table`, and both are in a single jar, `iceberg-spark-runtime-3.5_2.12-1.4.3.jar`.
> We verified that only one copy of `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is loaded when the spark-connect server is started.
> Looking further into the error, it seems the classloader itself is instantiated multiple times somewhere. I can see two instances: org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and org.apache.spark.util.ChildFirstURLClassLoader @4b18b943.
>
> *Affected version:*
> Spark 3.5 with spark-connect_2.12:3.5.0
>
> *Not affected versions and variations:*
> Spark 3.4 with spark-connect_2.12:3.4.0 works fine with the external jar.
> Spark 3.5 also works fine when using the spark-submit script directly (i.e. without spark-connect 3.5).
>
> The issue has also been opened with Iceberg: https://github.com/apache/iceberg/issues/8978
> and discussed on the Iceberg dev list: https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
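The two `ChildFirstURLClassLoader` instances in the trace are the crux: in the JVM, a class's identity is the pair (class name, defining loader), so the same class file defined by two different loaders yields two unrelated types, and a cast between them throws `ClassCastException`. A minimal, self-contained sketch of that mechanism (the `Table` class compiled on the fly is a hypothetical stand-in for a class in the Iceberg jar; requires a JDK for the in-process compiler):

```java
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoLoadersDemo {
    // Compiles a trivial class into a temp dir, then loads it through two
    // distinct URLClassLoader instances over the same classpath. Returns
    // whether the two loaders produced the same Class object.
    static boolean sameClass() throws Exception {
        Path dir = Files.createTempDirectory("demo");
        Path src = dir.resolve("Table.java");
        Files.write(src, "public class Table {}".getBytes());
        ToolProvider.getSystemJavaCompiler().run(null, null, null, src.toString());

        URL[] urls = { dir.toUri().toURL() };
        // Parent is null (bootstrap), so neither loader delegates to the other.
        try (URLClassLoader a = new URLClassLoader(urls, null);
             URLClassLoader b = new URLClassLoader(urls, null)) {
            Class<?> ca = a.loadClass("Table");
            Class<?> cb = b.loadClass("Table");
            // Same name, same bytes, but unrelated runtime types: an instance
            // of ca cannot be cast to cb -- the failure mode reported above.
            return ca == cb;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameClass()); // false
    }
}
```

This is why a single jar on the classpath is not enough: if the server instantiates the child-first loader twice, the same Iceberg classes become mutually incompatible.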
[jira] [Created] (SPARK-45679) Add clusterBy in DataFrame API
Zhen Li created SPARK-45679:
-------------------------------

             Summary: Add clusterBy in DataFrame API
                 Key: SPARK-45679
                 URL: https://issues.apache.org/jira/browse/SPARK-45679
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.1
            Reporter: Zhen Li

Add clusterBy to the DataFrame API, e.g. in Python.

DataFrameWriterV1:
```
df.write
  .format("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .save(...)  # or saveAsTable(...)
```

DataFrameWriterV2:
```
df.writeTo(...)
  .using("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .create()  # or replace() or createOrReplace()
```
[jira] [Created] (SPARK-44615) Rename spark connect client suites to avoid conflict
Zhen Li created SPARK-44615:
-------------------------------

             Summary: Rename spark connect client suites to avoid conflict
                 Key: SPARK-44615
                 URL: https://issues.apache.org/jira/browse/SPARK-44615
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li
[jira] [Created] (SPARK-44576) Session Artifact update breaks XXWithState methods in KVGDS
Zhen Li created SPARK-44576:
-------------------------------

             Summary: Session Artifact update breaks XXWithState methods in KVGDS
                 Key: SPARK-44576
                 URL: https://issues.apache.org/jira/browse/SPARK-44576
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

When changing the client test jar from the system classloader to the session classloader (https://github.com/apache/spark/compare/master...zhenlineo:spark:streaming-artifacts?expand=1), all XXWithState test suites failed with classloader errors, e.g.:

```
23/07/25 16:13:14 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID 16) (10.8.132.125 executor driver): TaskKilled (Stage cancelled: Job aborted due to stage failure: Task 170 in stage 2.0 failed 1 times, most recent failure: Lost task 170.0 in stage 2.0 (TID 14) (10.8.132.125 executor driver): java.lang.ClassCastException: class org.apache.spark.sql.streaming.ClickState cannot be cast to class org.apache.spark.sql.streaming.ClickState (org.apache.spark.sql.streaming.ClickState is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @2c604965; org.apache.spark.sql.streaming.ClickState is in unnamed module of loader java.net.URLClassLoader @57751f4)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1514)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:592)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:595)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:)
23/07/25 16:13:14 ERROR Utils: Aborting task
java.lang.IllegalStateException: Error committing version 1 into HDFSStateStore[id=(op=0,part=5),dir=file:/private/var/folders/b0/f9jmmrrx5js7xsswxyf58nwrgp/T/temporary-02cca002-e189-4e32-afd8-964d6f8d5056/state/0/5]
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:148)
    at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$4(FlatMapGroupsWithStateExec.scala:183)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:611)
    at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:179)
    at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:179)
    at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExec.timeTakenMs(FlatMapGroupsWithStateExec.scala:374)
    at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$3(FlatMapGroupsWithStateExec.scala:183)
    at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(W
```
[jira] [Commented] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server
[ https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740312#comment-17740312 ]

Zhen Li commented on SPARK-43416:
---------------------------------

[~hvanhovell] Yes. Fixed by https://github.com/apache/spark/pull/41846

> Fix the bug where the ProduceEncoder#tuples fields names are different from server
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-43416
>                 URL: https://issues.apache.org/jira/browse/SPARK-43416
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Zhen Li
>            Priority: Major
>
> The fields are named _1, _2, ... etc. However, on the server side they could be nicely named in agg operations, e.g. key, value. Fix this if possible.
[jira] [Created] (SPARK-44228) Handle Row Encoder in nested struct
Zhen Li created SPARK-44228:
-------------------------------

             Summary: Handle Row Encoder in nested struct
                 Key: SPARK-44228
                 URL: https://issues.apache.org/jira/browse/SPARK-44228
             Project: Spark
          Issue Type: Story
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Follow-up of [SPARK-43321] and [SPARK-44161], where a nested row encoder could be possible. Add some tests to ensure we cover all cases.
[jira] [Updated] (SPARK-44225) Move resolveSelfJoinCondition to Analyzer
[ https://issues.apache.org/jira/browse/SPARK-44225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-44225:
----------------------------
    Description: 
Move the JoinWith object, e.g. `ResolveSelfJoinCondition`, into the analyzer instead.
See more discussion from SPARK-43321: https://github.com/apache/spark/pull/40997/files#r1244509826

  was:
Move the JoinWith `resolveSelfJoinCondition` into the analyzer instead.
See more discussion from SPARK-43321: https://github.com/apache/spark/pull/40997/files#r1244509826

> Move resolveSelfJoinCondition to Analyzer
> -----------------------------------------
>
>                 Key: SPARK-44225
>                 URL: https://issues.apache.org/jira/browse/SPARK-44225
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Zhen Li
>            Priority: Major
>
> Move the JoinWith object, e.g. `ResolveSelfJoinCondition`, into the analyzer instead.
> See more discussion from SPARK-43321: https://github.com/apache/spark/pull/40997/files#r1244509826
[jira] [Created] (SPARK-44225) Move resolveSelfJoinCondition to Analyzer
Zhen Li created SPARK-44225:
-------------------------------

             Summary: Move resolveSelfJoinCondition to Analyzer
                 Key: SPARK-44225
                 URL: https://issues.apache.org/jira/browse/SPARK-44225
             Project: Spark
          Issue Type: Story
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Move the JoinWith `resolveSelfJoinCondition` into the analyzer instead.
See more discussion from SPARK-43321: https://github.com/apache/spark/pull/40997/files#r1244509826
[jira] [Created] (SPARK-44161) Row as UDF inputs causes encoder errors
Zhen Li created SPARK-44161:
-------------------------------

             Summary: Row as UDF inputs causes encoder errors
                 Key: SPARK-44161
                 URL: https://issues.apache.org/jira/browse/SPARK-44161
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Ensure row inputs to UDFs can be handled correctly.
[jira] [Created] (SPARK-43757) Change CheckConnectJvmClientCompatibility to deny list to increase the API check coverage
Zhen Li created SPARK-43757:
-------------------------------

             Summary: Change CheckConnectJvmClientCompatibility to deny list to increase the API check coverage
                 Key: SPARK-43757
                 URL: https://issues.apache.org/jira/browse/SPARK-43757
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

The current compatibility check only checks selected classes, so when a developer adds a new class and forgets to add it to the checklist, that API is not covered by the compatibility tests. We should change the check to include all APIs by default, with an explicit deny list for exclusions.
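The coverage argument above can be sketched in a few lines (hypothetical class and method names, not the actual CheckConnectJvmClientCompatibility code): with an allow list, a class nobody remembered to register is silently skipped, while with a deny list every class is checked unless it is explicitly excluded.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CompatCheckSketch {
    // Allow-list mode: only classes someone remembered to register are checked.
    static List<String> allowListCoverage(List<String> allClasses, Set<String> allowList) {
        return allClasses.stream()
                .filter(allowList::contains)
                .collect(Collectors.toList());
    }

    // Deny-list mode: every class is checked unless explicitly excluded,
    // so a newly added class is covered by default.
    static List<String> denyListCoverage(List<String> allClasses, Set<String> denyList) {
        return allClasses.stream()
                .filter(c -> !denyList.contains(c))
                .collect(Collectors.toList());
    }
}
```

With `allClasses = [Dataset, SparkSession, NewApi]` and an allow list that predates `NewApi`, the allow-list mode never checks `NewApi`; the deny-list mode checks it automatically.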
[jira] [Created] (SPARK-43717) Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
Zhen Li created SPARK-43717:
-------------------------------

             Summary: Scala Client Dataset#reduce failed to handle null partitions for scala primitive types
                 Key: SPARK-43717
                 URL: https://issues.apache.org/jira/browse/SPARK-43717
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Scala client failed with an NPE when running:
assert(spark.range(0, 5, 1, 10).as[Long].reduce(_ + _) == 10)
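The NPE is consistent with empty partitions: `spark.range(0, 5, 1, 10)` spreads 5 rows over 10 partitions, so several per-partition partial results are empty. If an empty partition's partial is represented as a null boxed value and the combine step unboxes it into a primitive, it throws NullPointerException. A hedged sketch of the failure mode and the fix (illustrative only, not the actual client implementation):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceSketch {
    // Combines per-partition partial results, where an empty partition
    // yields null. Unboxing that null (e.g. `long x = partial;`) would NPE;
    // the fix is to skip empty partitions explicitly.
    static long reduce(List<List<Long>> partitions, BinaryOperator<Long> op) {
        Long acc = null;
        for (List<Long> p : partitions) {
            Long partial = p.stream().reduce(op).orElse(null); // empty -> null
            if (partial == null) continue; // the fix: skip empty partitions
            acc = (acc == null) ? partial : op.apply(acc, partial);
        }
        if (acc == null) throw new UnsupportedOperationException("empty collection");
        return acc;
    }

    public static void main(String[] args) {
        // 5 values spread over 10 partitions, half of them empty.
        List<Long> empty = Collections.emptyList();
        List<List<Long>> parts = Arrays.asList(
            Arrays.asList(0L), Arrays.asList(1L), empty,
            Arrays.asList(2L), empty, Arrays.asList(3L),
            empty, Arrays.asList(4L), empty, empty);
        System.out.println(reduce(parts, Long::sum)); // 10
    }
}
```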
[jira] [Created] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples fields names are different from server
Zhen Li created SPARK-43416:
-------------------------------

             Summary: Fix the bug where the ProduceEncoder#tuples fields names are different from server
                 Key: SPARK-43416
                 URL: https://issues.apache.org/jira/browse/SPARK-43416
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

The fields are named _1, _2, ... etc. However, on the server side they could be nicely named in agg operations, e.g. key, value. Fix this if possible.
[jira] [Created] (SPARK-43415) Impl mapValues for KVGDS#mapValues
Zhen Li created SPARK-43415:
-------------------------------

             Summary: Impl mapValues for KVGDS#mapValues
                 Key: SPARK-43415
                 URL: https://issues.apache.org/jira/browse/SPARK-43415
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Use a resolved function to pass the mapValues function together with all aggExprs. Then, on the server side, unfold it to apply mapValues first before running the aggregation.
E.g. https://github.com/apache/spark/commit/a234a9b0851ebce87c0ef831b24866f94f0c0d36
[jira] [Created] (SPARK-43321) Impl Dataset#JoinWith
Zhen Li created SPARK-43321:
-------------------------------

             Summary: Impl Dataset#JoinWith
                 Key: SPARK-43321
                 URL: https://issues.apache.org/jira/browse/SPARK-43321
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Implement the missing method joinWith.
[jira] [Created] (SPARK-43223) KeyValueGroupedDataset#agg
Zhen Li created SPARK-43223:
-------------------------------

             Summary: KeyValueGroupedDataset#agg
                 Key: SPARK-43223
                 URL: https://issues.apache.org/jira/browse/SPARK-43223
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Add the missing agg functions to the KVGDS API.
[jira] [Created] (SPARK-43136) Scala mapGroup, coGroup
Zhen Li created SPARK-43136:
-------------------------------

             Summary: Scala mapGroup, coGroup
                 Key: SPARK-43136
                 URL: https://issues.apache.org/jira/browse/SPARK-43136
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Add basic Dataset#groupByKey -> KeyValueGroupedDataset support.
[jira] [Created] (SPARK-42999) Impl Dataset#foreach, foreachPartitions
Zhen Li created SPARK-42999:
-------------------------------

             Summary: Impl Dataset#foreach, foreachPartitions
                 Key: SPARK-42999
                 URL: https://issues.apache.org/jira/browse/SPARK-42999
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Zhen Li

Implement the missing methods in the Scala client Dataset API.
[jira] [Created] (SPARK-42953) Impl typed map, flatMap, mapPartitions in Dataset
Zhen Li created SPARK-42953:
-------------------------------

             Summary: Impl typed map, flatMap, mapPartitions in Dataset
                 Key: SPARK-42953
                 URL: https://issues.apache.org/jira/browse/SPARK-42953
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Add missing typed API support in the Dataset API.
[jira] [Commented] (SPARK-42519) Add more WriteTo tests after Scala Client session config is supported
[ https://issues.apache.org/jira/browse/SPARK-42519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704655#comment-17704655 ]

Zhen Li commented on SPARK-42519:
---------------------------------

Hi [~fanjia], we need to figure out a way to pass the class files to the server classpath; this is also the main blocker for this ticket. To do so, we can either configure something to pass the class files via the spark-submit call, or wait for the client-side artifact auto-sync work and see if we can sync the test files that way.

> Add more WriteTo tests after Scala Client session config is supported
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42519
>                 URL: https://issues.apache.org/jira/browse/SPARK-42519
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Priority: Major
>
> Add more test cases following the examples in the "SparkConnectProtoSuite("WriteTo")" tests.
[jira] [Created] (SPARK-42786) Impl typed select in Dataset
Zhen Li created SPARK-42786:
-------------------------------

             Summary: Impl typed select in Dataset
                 Key: SPARK-42786
                 URL: https://issues.apache.org/jira/browse/SPARK-42786
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li
[jira] [Resolved] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li resolved SPARK-42175.
-----------------------------
    Resolution: Duplicate

> Implement more methods in the Scala Client Dataset API
> ------------------------------------------------------
>
>                 Key: SPARK-42175
>                 URL: https://issues.apache.org/jira/browse/SPARK-42175
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test.
> https://github.com/apache/spark/pull/39712
[jira] [Created] (SPARK-42656) Spark Connect Scala Client Shell Script
Zhen Li created SPARK-42656:
-------------------------------

             Summary: Spark Connect Scala Client Shell Script
                 Key: SPARK-42656
                 URL: https://issues.apache.org/jira/browse/SPARK-42656
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Add a shell script that runs the Scala client in a Scala REPL, allowing users to connect to a Spark Connect server.
[jira] [Created] (SPARK-42575) Replace `AnyFunSuite` with `ConnectFunSuite` for scala client tests
Zhen Li created SPARK-42575:
-------------------------------

             Summary: Replace `AnyFunSuite` with `ConnectFunSuite` for scala client tests
                 Key: SPARK-42575
                 URL: https://issues.apache.org/jira/browse/SPARK-42575
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Make engineers' lives easier.
[jira] [Created] (SPARK-42573) Enable binary compatibility tests for SparkSession/Dataset/Column/functions
Zhen Li created SPARK-42573:
-------------------------------

             Summary: Enable binary compatibility tests for SparkSession/Dataset/Column/functions
                 Key: SPARK-42573
                 URL: https://issues.apache.org/jira/browse/SPARK-42573
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li
[jira] [Created] (SPARK-42533) SSL support for Scala Client
Zhen Li created SPARK-42533:
-------------------------------

             Summary: SSL support for Scala Client
                 Key: SPARK-42533
                 URL: https://issues.apache.org/jira/browse/SPARK-42533
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Add basic encryption support for the Scala client.
[jira] [Closed] (SPARK-42518) Scala client Write API V2
[ https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li closed SPARK-42518.
---------------------------

> Scala client Write API V2
> -------------------------
>
>                 Key: SPARK-42518
>                 URL: https://issues.apache.org/jira/browse/SPARK-42518
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Assignee: Zhen Li
>            Priority: Major
>             Fix For: 3.4.0
>
> Impl the Dataset#writeTo method.
[jira] [Updated] (SPARK-42519) Add more WriteTo tests after Scala Client session config is supported
[ https://issues.apache.org/jira/browse/SPARK-42519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-42519:
----------------------------
    Description: Add more test cases following the examples in the "SparkConnectProtoSuite("WriteTo")" tests.  (was: Impl Scala Client Session Config to allow users to be able to set configs for spark.)

> Add more WriteTo tests after Scala Client session config is supported
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42519
>                 URL: https://issues.apache.org/jira/browse/SPARK-42519
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Priority: Major
>
> Add more test cases following the examples in the "SparkConnectProtoSuite("WriteTo")" tests.
[jira] [Updated] (SPARK-42519) Add more WriteTo tests after Scala Client session config is supported
[ https://issues.apache.org/jira/browse/SPARK-42519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-42519:
----------------------------
    Summary: Add more WriteTo tests after Scala Client session config is supported  (was: Scala Client session config)

> Add more WriteTo tests after Scala Client session config is supported
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42519
>                 URL: https://issues.apache.org/jira/browse/SPARK-42519
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Priority: Major
>
> Impl Scala Client Session Config to allow users to be able to set configs for spark.
[jira] [Created] (SPARK-42519) Scala Client session config
Zhen Li created SPARK-42519:
-------------------------------

             Summary: Scala Client session config
                 Key: SPARK-42519
                 URL: https://issues.apache.org/jira/browse/SPARK-42519
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Implement Scala client session config so that users can set Spark configs.
[jira] [Created] (SPARK-42518) Scala client Write API V2
Zhen Li created SPARK-42518:
-------------------------------

             Summary: Scala client Write API V2
                 Key: SPARK-42518
                 URL: https://issues.apache.org/jira/browse/SPARK-42518
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Impl the Dataset#writeTo method.
[jira] [Closed] (SPARK-42482) Scala client Write API V1
[ https://issues.apache.org/jira/browse/SPARK-42482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li closed SPARK-42482.
---------------------------

> Scala client Write API V1
> -------------------------
>
>                 Key: SPARK-42482
>                 URL: https://issues.apache.org/jira/browse/SPARK-42482
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Priority: Major
>             Fix For: 3.4.0
>
> Add basic Dataset#write API for Scala client.
[jira] [Closed] (SPARK-42457) Scala Client Session Read API
[ https://issues.apache.org/jira/browse/SPARK-42457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li closed SPARK-42457.
---------------------------

> Scala Client Session Read API
> -----------------------------
>
>                 Key: SPARK-42457
>                 URL: https://issues.apache.org/jira/browse/SPARK-42457
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Assignee: Zhen Li
>            Priority: Major
>             Fix For: 3.4.0
>
> Add SparkSession#read impl to be able to read data.
[jira] [Closed] (SPARK-42202) Scala Client E2E test stop the server gracefully
[ https://issues.apache.org/jira/browse/SPARK-42202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li closed SPARK-42202.
---------------------------

> Scala Client E2E test stop the server gracefully
> ------------------------------------------------
>
>                 Key: SPARK-42202
>                 URL: https://issues.apache.org/jira/browse/SPARK-42202
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Zhen Li
>            Assignee: Zhen Li
>            Priority: Minor
>             Fix For: 3.4.0
>
> The current solution kills the spark connect server process, which may result in some errors on the command line.
> Suggest a minor fix to close the server process gracefully.
[jira] [Closed] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock
[ https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li closed SPARK-42429.
---------------------------

> IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42429
>                 URL: https://issues.apache.org/jira/browse/SPARK-42429
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.2.4
>            Reporter: Zhen Li
>            Assignee: Zhen Li
>            Priority: Trivial
>             Fix For: 3.4.0
>
> When running the tests with IntelliJ, sometimes this error pops up:
>
> spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
> value getArgument is not a member of org.mockito.invocation.InvocationOnMock
> invocation.getArgument[Identifier](0).name match {
>
> It seems to be caused by a conflicting version of Mockito in the IDE.
[jira] [Closed] (SPARK-42172) Compatibility check for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li closed SPARK-42172. --- > Compatibility check for Scala Client > > > Key: SPARK-42172 > URL: https://issues.apache.org/jira/browse/SPARK-42172 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Adding compatibility checks for the Scala client to ensure the Scala Client API > is binary compatible with the existing Spark SQL API (Dataset, SparkSession, etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-42043) Basic Scala Client Result Implementation
[ https://issues.apache.org/jira/browse/SPARK-42043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li closed SPARK-42043. --- > Basic Scala Client Result Implementation > - > > Key: SPARK-42043 > URL: https://issues.apache.org/jira/browse/SPARK-42043 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Adding the basic scala client Result implementation. Add some tests to verify > the result can be received correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42482) Scala client Write API V1
[ https://issues.apache.org/jira/browse/SPARK-42482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42482: Description: Add basic Dataset#write API for Scala client. (was: Add basic SparkSession#write API for Scala client.) > Scala client Write API V1 > - > > Key: SPARK-42482 > URL: https://issues.apache.org/jira/browse/SPARK-42482 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add basic Dataset#write API for Scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42482) Scala client Write API V1
Zhen Li created SPARK-42482: --- Summary: Scala client Write API V1 Key: SPARK-42482 URL: https://issues.apache.org/jira/browse/SPARK-42482 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Add basic SparkSession#write API for Scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42457) Scala Client Session Read API
Zhen Li created SPARK-42457: --- Summary: Scala Client Session Read API Key: SPARK-42457 URL: https://issues.apache.org/jira/browse/SPARK-42457 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Add SparkSession#read impl to be able to read data. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42449) Fix `native-image.propertie` in Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42449: Description: The content of the `native-image.properties` file is not correct. This file is used to create a native image using GraalVM; see: https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/ https://www.graalvm.org/22.1/reference-manual/native-image/BuildConfiguration/ e.g. The content in `META-INF/native-image/io.netty` should also be relocated, just as in `grpc-netty-shaded`. Now, the content of `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is ``` Args = --initialize-at-build-time=io.netty \ --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` but it should look like ``` Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` Other transformers may need to be added. See more info in this discussion thread https://github.com/apache/spark/pull/39866#discussion_r1098833915 was: The content of the `native-image.properties` file is not correct. This file is used by GraalVM; see https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/. e.g. The content in `META-INF/native-image/io.netty` should also be relocated, just as in `grpc-netty-shaded`. 
Now, the content of `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is ``` Args = --initialize-at-build-time=io.netty \ --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` but it should look like ``` Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` Other transformers may need to be added. See more info in this discussion thread https://github.com/apache/spark/pull/39866#discussion_r1098833915 > Fix `native-image.propertie` in Scala Client > > > Key: SPARK-42449 > URL: https://issues.apache.org/jira/browse/SPARK-42449 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Minor > > The content of the `native-image.properties` file is not correct. This file is > used to create a native image using GraalVM; see: > https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/ > https://www.graalvm.org/22.1/reference-manual/native-image/BuildConfiguration/ > e.g. > The content in `META-INF/native-image/io.netty` should also be relocated, just > as in `grpc-netty-shaded`. 
> Now, the content of > `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is > ``` > Args = --initialize-at-build-time=io.netty \ > > --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter > ``` > but it should look like > ``` > Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ > > --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter > ``` > Other transformers may need to be added. > See more info in this discussion thread > https://github.com/apache/spark/pull/39866#discussion_r1098833915 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42449) Fix `native-image.propertie` in Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42449: Description: The content of `native-image.propertie` file is not correct. This file is used by GraalVM see https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/. e.g. The content in `META-INF/native-image/io.netty` should also relocated, just as in `grpc-netty-shaded`. Now, the content of `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is ``` Args = --initialize-at-build-time=io.netty \ --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` but it should like ``` Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` Other Transformer may need to be added See more info in this discussion thread https://github.com/apache/spark/pull/39866#discussion_r1098833915 was: The content of `native-image.propertie` file is not correct. This file may be used by graal project to find the shaded contents. e.g. The content in `META-INF/native-image/io.netty` should also relocated, just as in `grpc-netty-shaded`. 
Now, the content of `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is ``` Args = --initialize-at-build-time=io.netty \ --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` but it should like ``` Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` Other Transformer may need to be added See more info in this discussion thread https://github.com/apache/spark/pull/39866#discussion_r1098833915 > Fix `native-image.propertie` in Scala Client > > > Key: SPARK-42449 > URL: https://issues.apache.org/jira/browse/SPARK-42449 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Minor > > The content of `native-image.propertie` file is not correct. This file is > used by GraalVM see > https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/. > e.g. > The content in `META-INF/native-image/io.netty` should also relocated, just > as in `grpc-netty-shaded`. 
> Now, the content of > `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is > ``` > Args = --initialize-at-build-time=io.netty \ > > --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter > ``` > but it should like > ``` > Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ > > --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter > > ``` > Other Transformer may need to be added > See more info in this discussion thread > https://github.com/apache/spark/pull/39866#discussion_r1098833915 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42449) Fix `native-image.propertie` in Scala Client
Zhen Li created SPARK-42449: --- Summary: Fix `native-image.propertie` in Scala Client Key: SPARK-42449 URL: https://issues.apache.org/jira/browse/SPARK-42449 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li The content of the `native-image.properties` file is not correct. This file may be used by the GraalVM project to find the shaded contents. e.g. The content in `META-INF/native-image/io.netty` should also be relocated, just as in `grpc-netty-shaded`. Now, the content of `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is ``` Args = --initialize-at-build-time=io.netty \ --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` but it should look like ``` Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \ --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter ``` Other transformers may need to be added. See more info in this discussion thread https://github.com/apache/spark/pull/39866#discussion_r1098833915 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
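For illustration, the fix the ticket asks for amounts to prefixing the shaded package root onto every `io.netty` reference inside `native-image.properties`, the same rewrite a shade-plugin resource transformer would have to perform. A minimal sketch (the helper name and regex approach are illustrative assumptions, not the shade plugin's actual API):

```java
import java.util.regex.Pattern;

public class RelocateNativeImageProps {
    // Prefix the relocated package root onto every occurrence of the
    // original package name. Pattern.quote stops the dots in "io.netty"
    // from acting as regex wildcards; \b avoids matching inside a longer
    // identifier.
    static String relocate(String props, String from, String to) {
        return props.replaceAll("\\b" + Pattern.quote(from), to);
    }

    public static void main(String[] args) {
        String line = "Args = --initialize-at-build-time=io.netty";
        System.out.println(relocate(line, "io.netty",
                "org.sparkproject.connect.client.io.netty"));
        // prints: Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty
    }
}
```

Note that `replaceAll` does not rescan its own replacements, so the `io.netty` inside the relocated name is not rewritten a second time.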
[jira] [Updated] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock
[ https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42429: Description: When running the tests with IntelliJ, sometime the error pops out: spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { It seems caused by some conflicts versioning of mockito in the IDE. was: When running the tests with IntelliJ, sometime the error pops out: spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { It seems caused by some conflicts versioning of mockito in the IDE. > IntelliJ Build issue: value getArgument is not a member of > org.mockito.invocation.InvocationOnMock > -- > > Key: SPARK-42429 > URL: https://issues.apache.org/jira/browse/SPARK-42429 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.4 >Reporter: Zhen Li >Priority: Trivial > > When running the tests with IntelliJ, sometime the error pops out: > > spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 > value getArgument is not a member of org.mockito.invocation.InvocationOnMock > invocation.getArgument[Identifier](0).name match { > > It seems caused by some conflicts versioning of mockito in the IDE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock
[ https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42429: Description: When running the tests with IntelliJ, sometimes this error pops up: {{ spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { }} It seems to be caused by conflicting versions of Mockito in the IDE. was: When running the tests with IntelliJ, sometimes this error pops up: ``` spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { ``` It seems to be caused by conflicting versions of Mockito in the IDE. > IntelliJ Build issue: value getArgument is not a member of > org.mockito.invocation.InvocationOnMock > -- > > Key: SPARK-42429 > URL: https://issues.apache.org/jira/browse/SPARK-42429 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.4 >Reporter: Zhen Li >Priority: Trivial > > When running the tests with IntelliJ, sometimes this error pops up: > > {{ > spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 > value getArgument is not a member of org.mockito.invocation.InvocationOnMock > invocation.getArgument[Identifier](0).name match { > }} > > It seems to be caused by conflicting versions of Mockito in the IDE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock
[ https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42429: Description: When running the tests with IntelliJ, sometime the error pops out: spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { It seems caused by some conflicts versioning of mockito in the IDE. was: When running the tests with IntelliJ, sometime the error pops out: {{ spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { }} It seems caused by some conflicts versioning of mockito in the IDE. > IntelliJ Build issue: value getArgument is not a member of > org.mockito.invocation.InvocationOnMock > -- > > Key: SPARK-42429 > URL: https://issues.apache.org/jira/browse/SPARK-42429 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.4 >Reporter: Zhen Li >Priority: Trivial > > When running the tests with IntelliJ, sometime the error pops out: > > > spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 > value getArgument is not a member of org.mockito.invocation.InvocationOnMock > invocation.getArgument[Identifier](0).name match { > > It seems caused by some conflicts versioning of mockito in the IDE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock
Zhen Li created SPARK-42429: --- Summary: IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock Key: SPARK-42429 URL: https://issues.apache.org/jira/browse/SPARK-42429 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.4 Reporter: Zhen Li When running the tests with IntelliJ, sometimes this error pops up: ``` spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18 value getArgument is not a member of org.mockito.invocation.InvocationOnMock invocation.getArgument[Identifier](0).name match { ``` It seems to be caused by conflicting versions of Mockito in the IDE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42215) Better Scala Client Integration test
[ https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681401#comment-17681401 ] Zhen Li commented on SPARK-42215: - Marking the tests as ITs in Maven may cause the tests not to be found by SBT. Make sure the tests can still be found by SBT. > Better Scala Client Integration test > > > Key: SPARK-42215 > URL: https://issues.apache.org/jira/browse/SPARK-42215 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > The current Scala client has a few integration tests that require a build > before the client tests can run. This is inconvenient for Maven developers, > as they will not be able to do a `mvn clean install` to run all tests. > > Look into marking these tests as ITs, and into other, better ways for Maven to run > tests after packages are built. > > Make sure the tests run in SBT as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42215) Better Scala Client Integration test
[ https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42215: Description: The current Scala client has a few integration tests that require a build before the client tests can run. This is inconvenient for Maven developers, as they will not be able to do a `mvn clean install` to run all tests. Look into marking these tests as ITs, and into other, better ways for Maven to run tests after packages are built. Make sure the tests run in SBT as well. was: The current Scala client has a few integration tests that require a build before the client tests can run. This is inconvenient for Maven developers, as they will not be able to do a `mvn clean install` to run all tests. Look into marking these tests as ITs, and into other, better ways for Maven to run tests after packages are built. > Better Scala Client Integration test > > > Key: SPARK-42215 > URL: https://issues.apache.org/jira/browse/SPARK-42215 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > The current Scala client has a few integration tests that require a build > before the client tests can run. This is inconvenient for Maven developers, > as they will not be able to do a `mvn clean install` to run all tests. > > Look into marking these tests as ITs, and into other, better ways for Maven to run > tests after packages are built. > > Make sure the tests run in SBT as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42215) Better Scala Client Integration test
Zhen Li created SPARK-42215: --- Summary: Better Scala Client Integration test Key: SPARK-42215 URL: https://issues.apache.org/jira/browse/SPARK-42215 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li The current Scala client has a few integration tests that require a build before the client tests can run. This is inconvenient for Maven developers, as they will not be able to do a `mvn clean install` to run all tests. Look into marking these tests as ITs, and into other, better ways for Maven to run tests after packages are built. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42202) Scala Client E2E test stop the server gracefully
Zhen Li created SPARK-42202: --- Summary: Scala Client E2E test stop the server gracefully Key: SPARK-42202 URL: https://issues.apache.org/jira/browse/SPARK-42202 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li The current solution kills the Spark Connect server process, which may result in some errors on the command line. Suggest a minor fix to close the server process gracefully. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
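The suggested fix can be sketched with the standard `java.lang.Process` API: request termination first, give the server time to shut down cleanly, and only force-kill as a fallback. This is a minimal illustration of the idea, not Spark's actual test code; the method name and timeout are my own.

```java
import java.util.concurrent.TimeUnit;

public class GracefulStop {
    // Ask the process to exit (SIGTERM on POSIX systems), wait for it to
    // clean up, and only escalate to a forcible kill if it hangs.
    static void stopGracefully(Process server, long timeoutSeconds)
            throws InterruptedException {
        server.destroy();                     // polite shutdown request
        if (!server.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            server.destroyForcibly();         // last resort
            server.waitFor();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the server: a long-running child process.
        Process p = new ProcessBuilder("sleep", "60").start();
        stopGracefully(p, 5);
        System.out.println("alive after stop: " + p.isAlive());
    }
}
```

The polite `destroy()` is what lets the child run its shutdown hooks, which is exactly what a hard kill skips and why the hard kill leaves error output on the command line.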
[jira] [Closed] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li closed SPARK-38378. --- > ANTLR grammar definition in separate Parser and Lexer files > --- > > Key: SPARK-38378 > URL: https://issues.apache.org/jira/browse/SPARK-38378 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.3.0 > > > Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into > separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. > Benefits: > *Gain more flexibility when implementing new SQL features* > The current ANTLR grammar definition is given as a mixed grammar in the > `SqlBase.g4` file. > By separating the lexer and parser, we will be able to use the full power of > ANTLR parser and lexer grammars. e.g. lexer mode. This will give us more > flexibility when implementing new SQL features. > *The code is more clean.* > Having parser and lexer in different files also keeps the code more explicit > about which is the parser and which is the lexer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-38646) Pull a trait out for Python functions
[ https://issues.apache.org/jira/browse/SPARK-38646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li closed SPARK-38646. --- > Pull a trait out for Python functions > - > > Key: SPARK-38646 > URL: https://issues.apache.org/jira/browse/SPARK-38646 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0, 3.2.2 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Currently PySpark uses a case class PythonFunction in PythonRDD and many other > interfaces/classes. Propose changing it to a trait instead, to avoid tying the > implementation to the APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42175) Implement more methods in the Scala Client Dataset API
[ https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42175: Description: Also fix the TODOs in the MiMa compatibility test. https://github.com/apache/spark/pull/39712 > Implement more methods in the Scala Client Dataset API > -- > > Key: SPARK-42175 > URL: https://issues.apache.org/jira/browse/SPARK-42175 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Also fix the TODOs in the MiMa compatibility test. > https://github.com/apache/spark/pull/39712 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42175) Implement more methods in the Scala Client Dataset API
Zhen Li created SPARK-42175: --- Summary: Implement more methods in the Scala Client Dataset API Key: SPARK-42175 URL: https://issues.apache.org/jira/browse/SPARK-42175 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42172) Compatibility check for Scala Client
Zhen Li created SPARK-42172: --- Summary: Compatibility check for Scala Client Key: SPARK-42172 URL: https://issues.apache.org/jira/browse/SPARK-42172 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Adding compatibility checks for the Scala client to ensure the Scala Client API is binary compatible with the existing Spark SQL API (Dataset, SparkSession, etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42135) Scala Client Proper logging for the client
Zhen Li created SPARK-42135: --- Summary: Scala Client Proper logging for the client Key: SPARK-42135 URL: https://issues.apache.org/jira/browse/SPARK-42135 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Introduce proper logging for the client and change [https://github.com/apache/spark/pull/39541/files/2a589543bdec80f4cf806af0a8566d2de8c04140#r1082062813] to use the client logging. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42043) Basic Scala Client Result Implementation
[ https://issues.apache.org/jira/browse/SPARK-42043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-42043: Description: Adding the basic scala client Result implementation. Add some tests to verify the result can be received correctly. (was: Adding the basic scala client implementation, including Dataset, SparkSession and SparkResult.) Summary: Basic Scala Client Result Implementation (was: Basic Scala Client) > Basic Scala Client Result Implementation > - > > Key: SPARK-42043 > URL: https://issues.apache.org/jira/browse/SPARK-42043 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Adding the basic scala client Result implementation. Add some tests to verify > the result can be received correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42043) Basic Scala Client
Zhen Li created SPARK-42043:
-------------------------------

             Summary: Basic Scala Client
                 Key: SPARK-42043
                 URL: https://issues.apache.org/jira/browse/SPARK-42043
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Zhen Li

Adding the basic Scala client implementation, including Dataset, SparkSession and SparkResult.
[jira] [Created] (SPARK-38646) Pull a trait out for Python functions
Zhen Li created SPARK-38646:
-------------------------------

             Summary: Pull a trait out for Python functions
                 Key: SPARK-38646
                 URL: https://issues.apache.org/jira/browse/SPARK-38646
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.3.0, 3.2.2
            Reporter: Zhen Li

Currently PySpark uses the case class PythonFunction in PythonRDD and many other interfaces/classes. Propose to use a trait instead, to avoid tying the implementation to the APIs.
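The refactor proposed here, having PythonRDD and other APIs depend on an abstraction rather than on a concrete case class, is the classic "program to an interface" move. A minimal, self-contained Python sketch of the same design idea (the class and function names below are hypothetical illustrations, not Spark's actual code):

```python
from typing import Protocol


class PythonFunction(Protocol):
    """Trait-like interface: APIs depend on this, not on a concrete class."""

    def command(self) -> bytes: ...


class SimplePythonFunction:
    """One concrete implementation; callers never need to name it."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def command(self) -> bytes:
        return self._payload


def run(fn: PythonFunction) -> bytes:
    # Accepts anything that structurally satisfies the Protocol.
    return fn.command()


result = run(SimplePythonFunction(b"udf-bytecode"))
```

Any object with a matching `command()` method satisfies the Protocol structurally, so new implementations can be added without touching `run` or its callers — the same decoupling a Scala trait would give `PythonFunction`.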
[jira] [Updated] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
[ https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-38378:
----------------------------
    Affects Version/s:     (was: 3.2.2)

> ANTLR grammar definition in separate Parser and Lexer files
> -----------------------------------------------------------
>
>                 Key: SPARK-38378
>                 URL: https://issues.apache.org/jira/browse/SPARK-38378
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Zhen Li
>            Priority: Major
>
> Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`.
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar definition is given as a mixed grammar in the
> `SqlBase.g4` file. By separating the lexer and parser, we will be able to use
> the full power of ANTLR parser and lexer grammars, e.g. lexer modes. This
> will give us more flexibility when implementing new SQL features.
> *Cleaner code*
> Having the parser and lexer in different files also keeps the code explicit
> about which is the parser and which is the lexer.
[jira] [Created] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files
Zhen Li created SPARK-38378:
-------------------------------

             Summary: ANTLR grammar definition in separate Parser and Lexer files
                 Key: SPARK-38378
                 URL: https://issues.apache.org/jira/browse/SPARK-38378
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0, 3.2.2
            Reporter: Zhen Li

Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`.

Benefits:

*Gain more flexibility when implementing new SQL features*

The current ANTLR grammar definition is given as a mixed grammar in the `SqlBase.g4` file. By separating the lexer and parser, we will be able to use the full power of ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more flexibility when implementing new SQL features.

*Cleaner code*

Having the parser and lexer in different files also keeps the code explicit about which is the parser and which is the lexer.
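The proposed split can be sketched with a toy pair of grammar files. These fragments are illustrative only, far smaller than the real SQL grammar, and each grammar would live in its own `.g4` file (shown together here for brevity):

```antlr
// SqlBaseLexer.g4 — lexer-only grammar; lexer modes become available here
lexer grammar SqlBaseLexer;
SELECT     : 'SELECT';
IDENTIFIER : [A-Za-z_]+;
WS         : [ \t\r\n]+ -> skip;

// SqlBaseParser.g4 — parser grammar consuming the lexer's token vocabulary
parser grammar SqlBaseParser;
options { tokenVocab = SqlBaseLexer; }
singleQuery : SELECT IDENTIFIER EOF;
```

The `tokenVocab` option is how a parser grammar binds to a separately generated lexer, which is what makes lexer-only features (such as modes for nested comments or string interpolation) usable without mixing them into the parser rules.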
[jira] [Created] (SPARK-33033) Display time series view for task metrics in history server
Zhen Li created SPARK-33033:
-------------------------------

             Summary: Display time series view for task metrics in history server
                 Key: SPARK-33033
                 URL: https://issues.apache.org/jira/browse/SPARK-33033
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.1.0
            Reporter: Zhen Li

The event log contains all tasks' metrics data, which are useful for performance debugging. Currently the Spark UI only displays final aggregated results, so much information is hidden. If the Spark UI could provide a time-series view of this data, it would be more helpful for debugging performance problems. We would like to build an application statistics page in the history server, based on task metrics, to provide more straightforward insight into a Spark application.
[jira] [Updated] (SPARK-31882) DAG-viz is not rendered correctly with pagination.
[ https://issues.apache.org/jira/browse/SPARK-31882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-31882:
----------------------------
    Affects Version/s: 2.4.4

> DAG-viz is not rendered correctly with pagination.
> --------------------------------------------------
>
>                 Key: SPARK-31882
>                 URL: https://issues.apache.org/jira/browse/SPARK-31882
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.4.4, 3.0.0, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> Because DAG-viz for a job fetches the link URL for each stage from the stage
> table, rendering can fail with pagination.
> You can reproduce this issue with the following operation:
> {code:java}
> sc.parallelize(1 to 10).map(value => (value, value))
>   .repartition(1).repartition(1).repartition(1)
>   .reduceByKey(_ + _).collect
> {code}
> Then visit the corresponding job page. There are 5 stages, so set the paged
> table to show fewer than 5 stages.
[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-32886:
----------------------------
    Affects Version/s: 2.4.4

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> ------------------------------------------------------------
>
>                 Key: SPARK-32886
>                 URL: https://issues.apache.org/jira/browse/SPARK-32886
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.4.4, 3.0.0, 3.1.0
>            Reporter: Zhen Li
>            Assignee: Apache Spark
>            Priority: Minor
>         Attachments: undefinedlink.JPG
>
> In the event timeline view of the jobs page, clicking a job item should
> redirect you to the corresponding job page. When there are too many jobs,
> some job items' links redirect to a wrong URL like '.../jobs/undefined'.
[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-32886:
----------------------------
    Attachment: undefinedlink.JPG

> '.../jobs/undefined' link from EvenTimeline view
> ------------------------------------------------
>
>                 Key: SPARK-32886
>                 URL: https://issues.apache.org/jira/browse/SPARK-32886
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Zhen Li
>            Priority: Minor
>         Attachments: undefinedlink.JPG
>
> In the event timeline view of the jobs page, clicking a job item should
> redirect you to the corresponding job page. When there are too many jobs,
> some job items' links redirect to a wrong URL like '.../jobs/undefined'.
[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page
[ https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-32886:
----------------------------
        Summary: '.../jobs/undefined' link from "Event Timeline" in jobs page  (was: '.../jobs/undefined' link from EvenTimeline view)

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> ------------------------------------------------------------
>
>                 Key: SPARK-32886
>                 URL: https://issues.apache.org/jira/browse/SPARK-32886
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Zhen Li
>            Priority: Minor
>         Attachments: undefinedlink.JPG
>
> In the event timeline view of the jobs page, clicking a job item should
> redirect you to the corresponding job page. When there are too many jobs,
> some job items' links redirect to a wrong URL like '.../jobs/undefined'.
[jira] [Created] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view
Zhen Li created SPARK-32886:
-------------------------------

             Summary: '.../jobs/undefined' link from EvenTimeline view
                 Key: SPARK-32886
                 URL: https://issues.apache.org/jira/browse/SPARK-32886
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 3.0.0, 3.1.0
            Reporter: Zhen Li

In the event timeline view of the jobs page, clicking a job item should redirect you to the corresponding job page. When there are too many jobs, some job items' links redirect to a wrong URL like '.../jobs/undefined'.
[jira] [Updated] (SPARK-32581) update duration property for live ui application list and application apis
[ https://issues.apache.org/jira/browse/SPARK-32581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-32581:
----------------------------
    Attachment: updatedapiJPG.JPG
                oldapi.JPG

> update duration property for live ui application list and application apis
> --------------------------------------------------------------------------
>
>                 Key: SPARK-32581
>                 URL: https://issues.apache.org/jira/browse/SPARK-32581
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Zhen Li
>            Priority: Trivial
>         Attachments: oldapi.JPG, updatedapiJPG.JPG
>
> The "duration" property in the response from the application list and
> application APIs of the live UI is always "0". We want these two APIs to
> return the correct value, consistent with "*Total Uptime*" on the live UI's
> jobs page.
[jira] [Created] (SPARK-32581) update duration property for live ui application list and application apis
Zhen Li created SPARK-32581:
-------------------------------

             Summary: update duration property for live ui application list and application apis
                 Key: SPARK-32581
                 URL: https://issues.apache.org/jira/browse/SPARK-32581
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.1.0
            Reporter: Zhen Li

The "duration" property in the response from the application list and application APIs of the live UI is always "0". We want these two APIs to return the correct value, consistent with "*Total Uptime*" on the live UI's jobs page.
[jira] [Updated] (SPARK-32028) App id link in history summary page point to wrong application attempt
[ https://issues.apache.org/jira/browse/SPARK-32028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-32028:
----------------------------
    Description: The app id link in the history summary page URL is wrong in the multi-attempt case. For details, please see the attached screenshots.  (was: App id link in history summary page url is wrong, for multi attempts case.)

> App id link in history summary page point to wrong application attempt
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32028
>                 URL: https://issues.apache.org/jira/browse/SPARK-32028
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.4.4, 3.0.0, 3.1.0
>            Reporter: Zhen Li
>            Priority: Minor
>         Attachments: multi_same.JPG, wrong_attemptJPG.JPG
>
> The app id link in the history summary page URL is wrong in the multi-attempt
> case. For details, please see the attached screenshots.
[jira] [Updated] (SPARK-32028) App id link in history summary page point to wrong application attempt
[ https://issues.apache.org/jira/browse/SPARK-32028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhen Li updated SPARK-32028:
----------------------------
    Attachment: wrong_attemptJPG.JPG
                multi_same.JPG

> App id link in history summary page point to wrong application attempt
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32028
>                 URL: https://issues.apache.org/jira/browse/SPARK-32028
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.4.4, 3.0.0, 3.1.0
>            Reporter: Zhen Li
>            Priority: Minor
>         Attachments: multi_same.JPG, wrong_attemptJPG.JPG
>
> The app id link in the history summary page URL is wrong in the multi-attempt
> case.
[jira] [Created] (SPARK-32028) App id link in history summary page point to wrong application attempt
Zhen Li created SPARK-32028:
-------------------------------

             Summary: App id link in history summary page point to wrong application attempt
                 Key: SPARK-32028
                 URL: https://issues.apache.org/jira/browse/SPARK-32028
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 3.0.0, 2.4.4, 3.1.0
            Reporter: Zhen Li
         Attachments: multi_same.JPG, wrong_attemptJPG.JPG

The app id link in the history summary page URL is wrong in the multi-attempt case.
[jira] [Created] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager
Zhen Li created SPARK-32024:
-------------------------------

             Summary: Disk usage tracker went negative in HistoryServerDiskManager
                 Key: SPARK-32024
                 URL: https://issues.apache.org/jira/browse/SPARK-32024
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 3.0.0, 2.4.4, 3.1.0
         Environment: System: Windows, Linux.
Config:
spark.history.retainedApplications 200
spark.history.store.maxDiskUsage 10g
spark.history.store.path /cache_hs
            Reporter: Zhen Li

After restarting the history server, the error below appears randomly.
h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went negative (now = -, delta = -)
||URI:|/history//*/stages/|
||STATUS:|500|
||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative (now = -, delta = -)|
||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went negative (now = -, delta = -)|
h3. Caused by:
java.lang.IllegalStateException: Disk usage tracker went negative (now = -633925, delta = -38947) at org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258) at org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316) at org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191) at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161) at
org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1278) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:767) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221) at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.sparkproject.jetty.server.Server.handle(Server.java:500) at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) at org.sparkproject.jetty.server.HttpChannel.dispatch
[jira] [Updated] (SPARK-31929) Too many event files triggered "java.io.IOException" in history server on Windows
[ https://issues.apache.org/jira/browse/SPARK-31929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-31929: Environment: System: Windows Config: spark.history.retainedApplications 200 spark.history.retainedApplications 200spark.history.store.maxDiskUsage 2g spark.history.store.path d:// cache_hs was: System: Windows Config: spark.history.retainedApplications 200 spark.history.retainedApplications 200spark.history.store.maxDiskUsage 2g spark.history.store.path d:\\cache_hs > Too many event files triggered "java.io.IOException" in history server on > Windows > - > > Key: SPARK-31929 > URL: https://issues.apache.org/jira/browse/SPARK-31929 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.4 > Environment: System: Windows > Config: > spark.history.retainedApplications 200 > spark.history.retainedApplications 200spark.history.store.maxDiskUsage 2g > spark.history.store.path d:// > cache_hs >Reporter: Zhen Li >Priority: Minor > > h2. > h2. HTTP ERROR 500 > Problem accessing /history/app-20190711215551-0001/stages/. Reason: > Server Error > > h3. 
Caused by: > java.io.IOException: Unable to delete file: > d:\cache_hs\apps\app-20190711215551-0001.ldb\MANIFEST-07 at > org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) at > org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) at > org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) at > org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$deleteStore(HistoryServerDiskManager.scala:198) > at > org.apache.spark.deploy.history.HistoryServerDiskManager.$anonfun$release$1(HistoryServerDiskManager.scala:161) > at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.deploy.history.HistoryServerDiskManager.release(HistoryServerDiskManager.scala:156) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1(FsHistoryProvider.scala:1163) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1$adapted(FsHistoryProvider.scala:1157) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1157) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52) > at > 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at > org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89) > at > org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkp
[jira] [Updated] (SPARK-31929) Too many event files triggered "java.io.IOException" in history server on Windows
[ https://issues.apache.org/jira/browse/SPARK-31929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li updated SPARK-31929: Environment: System: Windows Config: spark.history.retainedApplications 200 spark.history.retainedApplications 200spark.history.store.maxDiskUsage 2g spark.history.store.path d://cache_hs was: System: Windows Config: spark.history.retainedApplications 200 spark.history.retainedApplications 200spark.history.store.maxDiskUsage 2g spark.history.store.path d:// cache_hs > Too many event files triggered "java.io.IOException" in history server on > Windows > - > > Key: SPARK-31929 > URL: https://issues.apache.org/jira/browse/SPARK-31929 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.4 > Environment: System: Windows > Config: > spark.history.retainedApplications 200 > spark.history.retainedApplications 200spark.history.store.maxDiskUsage 2g > spark.history.store.path d://cache_hs >Reporter: Zhen Li >Priority: Minor > > h2. > h2. HTTP ERROR 500 > Problem accessing /history/app-20190711215551-0001/stages/. Reason: > Server Error > > h3. 
Caused by: > java.io.IOException: Unable to delete file: > d:\cache_hs\apps\app-20190711215551-0001.ldb\MANIFEST-07 at > org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) at > org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) at > org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) at > org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$deleteStore(HistoryServerDiskManager.scala:198) > at > org.apache.spark.deploy.history.HistoryServerDiskManager.$anonfun$release$1(HistoryServerDiskManager.scala:161) > at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.deploy.history.HistoryServerDiskManager.release(HistoryServerDiskManager.scala:156) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1(FsHistoryProvider.scala:1163) > at > org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1$adapted(FsHistoryProvider.scala:1157) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1157) > at > org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363) > at > org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191) > at > org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163) > at > org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135) > at > org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56) > at > org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52) > at > 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at > org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89) > at > org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101) > at > org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248) > at > org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > at > org.sparkproj
[jira] [Created] (SPARK-31929) Too many event files triggered "java.io.IOException" in history server on Windows
Zhen Li created SPARK-31929:
-------------------------------

             Summary: Too many event files triggered "java.io.IOException" in history server on Windows
                 Key: SPARK-31929
                 URL: https://issues.apache.org/jira/browse/SPARK-31929
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 2.4.4
         Environment: System: Windows
Config:
spark.history.retainedApplications 200
spark.history.retainedApplications 200
spark.history.store.maxDiskUsage 2g
spark.history.store.path d:\\cache_hs
            Reporter: Zhen Li

h2. HTTP ERROR 500
Problem accessing /history/app-20190711215551-0001/stages/. Reason: Server Error

Caused by:
java.io.IOException: Unable to delete file: d:\cache_hs\apps\app-20190711215551-0001.ldb\MANIFEST-07 at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) at org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$deleteStore(HistoryServerDiskManager.scala:198) at org.apache.spark.deploy.history.HistoryServerDiskManager.$anonfun$release$1(HistoryServerDiskManager.scala:161) at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.HistoryServerDiskManager.release(HistoryServerDiskManager.scala:156) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1(FsHistoryProvider.scala:1163) at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1$adapted(FsHistoryProvider.scala:1157) at scala.Option.foreach(Option.scala:407) at org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1157) at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363) at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191) at
org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163) at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135) at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56) at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52) at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89) at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101) at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248) at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) at org.sparkproject.jetty.server.handler.ContextHandle