[jira] [Commented] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-02-28 Thread Zhen Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821918#comment-17821918
 ] 

Zhen Li commented on SPARK-46762:
-

[~tenstriker] Can you provide more info to reproduce your error? The class 
loading problem is a bit hard to debug. It would be helpful if you could give us 
a command or test that reproduces the error.

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
> Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 
> 2024-02-22 at 2.04.49 PM.png
>
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using spark-connect 3.5 with an external Spark SQL catalog jar, 
> iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child class 
> gets loaded by MutableClassLoader and the parent class gets loaded by 
> ChildFirstClassLoader, which causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>     at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apach...{code}
>  
> `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of 
> `org.apache.iceberg.Table`, and both classes live in a single jar, 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar`. 
> We verified that only one copy of 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is loaded when the spark-connect 
> server is started. 
> Looking more into the error, it seems the classloader itself is instantiated 
> multiple times somewhere. We can see two instances: 
> org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943
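> 
> For illustration, a minimal standalone sketch (an assumption about the 
> mechanism, not Spark code; the jar path is hypothetical) of how two distinct 
> loader instances defining the same class make any cast between them fail:
> {code:scala}
> import java.net.{URL, URLClassLoader}
> 
> object TwoLoaderDemo {
>   def main(args: Array[String]): Unit = {
>     // hypothetical local path to the single jar containing both classes
>     val jar = new URL("file:/tmp/iceberg-spark-runtime-3.5_2.12-1.4.3.jar")
>     // parent = null forces each loader to define the classes itself,
>     // mimicking two independent ChildFirstURLClassLoader instances
>     val loaderA = new URLClassLoader(Array(jar), null)
>     val loaderB = new URLClassLoader(Array(jar), null)
>     val tableA = loaderA.loadClass("org.apache.iceberg.Table")
>     val tableB = loaderB.loadClass("org.apache.iceberg.Table")
>     // same fully-qualified name, different defining loaders
>     println(tableA == tableB)                // false
>     println(tableA.isAssignableFrom(tableB)) // false -> casts across loaders throw
>   }
> }
> {code}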
>  
> *Affected version:*
> Spark 3.5 with spark-connect_2.12:3.5.0
>  
> *Not affected versions and variations:*
> Spark 3.4 with spark-connect_2.12:3.4.0 works fine with the external jar.
> It also works with just the Spark 3.5 spark-submit script directly (i.e. without 
> using spark-connect 3.5).
>  
> The issue has been opened with Iceberg as well: 
> [https://github.com/apache/iceberg/issues/8978]
> and has been discussed on the Iceberg dev list: 
> [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-45679) Add clusterBy in DataFrame API

2023-10-26 Thread Zhen Li (Jira)
Zhen Li created SPARK-45679:
---

 Summary: Add clusterBy in DataFrame API
 Key: SPARK-45679
 URL: https://issues.apache.org/jira/browse/SPARK-45679
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.1
Reporter: Zhen Li


Add clusterBy to the DataFrame API, e.g. in Python:

DataFrameWriterV1
```
df.write
  .format("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .save(...) or saveAsTable(...)
```

DataFrameWriterV2
```
df.writeTo(...).using("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .create() or replace() or createOrReplace()
```
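
For comparison, a hypothetical Scala form of the same proposed API (sketch 
only; clusterBy does not exist on these writers yet):
```
// DataFrameWriterV1 (proposed)
df.write
  .format("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .saveAsTable("tbl")

// DataFrameWriterV2 (proposed)
df.writeTo("tbl")
  .using("delta")
  .clusterBy("clusteringColumn1", "clusteringColumn2")
  .createOrReplace()
```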



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44615) Rename spark connect client suites to avoid conflict

2023-07-31 Thread Zhen Li (Jira)
Zhen Li created SPARK-44615:
---

 Summary: Rename spark connect client suites to avoid conflict
 Key: SPARK-44615
 URL: https://issues.apache.org/jira/browse/SPARK-44615
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44576) Session Artifact update breaks XXWithState methods in KVGDS

2023-07-27 Thread Zhen Li (Jira)
Zhen Li created SPARK-44576:
---

 Summary: Session Artifact update breaks XXWithState methods in 
KVGDS
 Key: SPARK-44576
 URL: https://issues.apache.org/jira/browse/SPARK-44576
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


When changing the client test jar from the system classloader to the session 
classloader 
(https://github.com/apache/spark/compare/master...zhenlineo:spark:streaming-artifacts?expand=1),
 all XXWithState test suites failed with classloader errors, e.g.:
```
23/07/25 16:13:14 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID 16) 
(10.8.132.125 executor driver): TaskKilled (Stage cancelled: Job aborted due to 
stage failure: Task 170 in stage 2.0 failed 1 times, most recent failure: Lost 
task 170.0 in stage 2.0 (TID 14) (10.8.132.125 executor driver): 
java.lang.ClassCastException: class org.apache.spark.sql.streaming.ClickState 
cannot be cast to class org.apache.spark.sql.streaming.ClickState 
(org.apache.spark.sql.streaming.ClickState is in unnamed module of loader 
org.apache.spark.util.MutableURLClassLoader @2c604965; 
org.apache.spark.sql.streaming.ClickState is in unnamed module of loader 
java.net.URLClassLoader @57751f4)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1514)
at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
at 
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
at 
org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:592)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:595)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:)
23/07/25 16:13:14 ERROR Utils: Aborting task
java.lang.IllegalStateException: Error committing version 1 into 
HDFSStateStore[id=(op=0,part=5),dir=file:/private/var/folders/b0/f9jmmrrx5js7xsswxyf58nwrgp/T/temporary-02cca002-e189-4e32-afd8-964d6f8d5056/state/0/5]
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:148)
at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$4(FlatMapGroupsWithStateExec.scala:183)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:611)
at 
org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:179)
at 
org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:179)
at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExec.timeTakenMs(FlatMapGroupsWithStateExec.scala:374)
at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$3(FlatMapGroupsWithStateExec.scala:183)
at 
org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at ...
```

[jira] [Commented] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples field names are different from the server

2023-07-05 Thread Zhen Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740312#comment-17740312
 ] 

Zhen Li commented on SPARK-43416:
-

[~hvanhovell] Yes. Fixed by https://github.com/apache/spark/pull/41846

> Fix the bug where the ProduceEncoder#tuples field names are different from 
> the server
> --
>
> Key: SPARK-43416
> URL: https://issues.apache.org/jira/browse/SPARK-43416
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> The fields are named _1, _2, ... etc. However, on the server side they could be 
> nicely named in agg operations, such as key and value. Fix this if possible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44228) Handle Row Encoder in nested struct

2023-06-28 Thread Zhen Li (Jira)
Zhen Li created SPARK-44228:
---

 Summary: Handle Row Encoder in nested struct
 Key: SPARK-44228
 URL: https://issues.apache.org/jira/browse/SPARK-44228
 Project: Spark
  Issue Type: Story
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Follow-ups of [SPARK-43321] and [SPARK-44161], where the nested row encoder 
becomes possible.

Add some tests to ensure we cover all cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44225) Move resolveSelfJoinCondition to Analyzer

2023-06-28 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-44225:

Description: 
Move the JoinWith object, e.g. `ResolveSelfJoinCondition`, into the Analyzer instead.
See more discussion from SPARK-43321 
https://github.com/apache/spark/pull/40997/files#r1244509826

  was:
Move the JoinWith `resolveSelfJoinCondition` into the Analyzer instead.
See more discussion from SPARK-43321 
https://github.com/apache/spark/pull/40997/files#r1244509826


> Move resolveSelfJoinCondition to Analyzer
> -
>
> Key: SPARK-44225
> URL: https://issues.apache.org/jira/browse/SPARK-44225
> Project: Spark
>  Issue Type: Story
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> Move the JoinWith object, e.g. `ResolveSelfJoinCondition`, into the Analyzer 
> instead.
> See more discussion from SPARK-43321 
> https://github.com/apache/spark/pull/40997/files#r1244509826



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44225) Move resolveSelfJoinCondition to Analyzer

2023-06-28 Thread Zhen Li (Jira)
Zhen Li created SPARK-44225:
---

 Summary: Move resolveSelfJoinCondition to Analyzer
 Key: SPARK-44225
 URL: https://issues.apache.org/jira/browse/SPARK-44225
 Project: Spark
  Issue Type: Story
  Components: SQL
Affects Versions: 3.5.0
Reporter: Zhen Li


Move the JoinWith `resolveSelfJoinCondition` into the Analyzer instead.
See more discussion from SPARK-43321 
https://github.com/apache/spark/pull/40997/files#r1244509826
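
A hedged sketch of the proposed shape (not the actual Spark code; the rule 
body here is a placeholder): express the self-join fix-up as an analyzer rule 
rather than ad-hoc logic inside Dataset#joinWith.
```
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

object ResolveSelfJoinCondition extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
    // placeholder: the real rule would rewrite ambiguous self-join
    // attributes the way Dataset#joinWith does today
    case j: Join => j
  }
}
```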



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44161) Row as UDF inputs causes encoder errors

2023-06-23 Thread Zhen Li (Jira)
Zhen Li created SPARK-44161:
---

 Summary: Row as UDF inputs causes encoder errors
 Key: SPARK-44161
 URL: https://issues.apache.org/jira/browse/SPARK-44161
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Ensure Row inputs to UDFs can be handled correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43757) Change CheckConnectJvmClientCompatibility to deny list to increase the API check coverage

2023-05-23 Thread Zhen Li (Jira)
Zhen Li created SPARK-43757:
---

 Summary: Change CheckConnectJvmClientCompatibility to deny list to 
increase the API check coverage
 Key: SPARK-43757
 URL: https://issues.apache.org/jira/browse/SPARK-43757
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


The current compatibility check only checks selected classes, so when a 
developer adds a new class and forgets to add it to the checklist, its API is 
not covered by the compatibility tests. We should therefore change the check to 
include all APIs by default.
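
A conceptual sketch in plain Scala (not the actual 
CheckConnectJvmClientCompatibility code) of why a deny list gives better 
coverage than an allow list:
```
object CompatCheckSketch {
  // every public API class found in the client jar
  val allClientClasses = Set("Dataset", "SparkSession", "Column", "NewClass")

  // allow list: a forgotten registration silently skips "NewClass"
  def allowListCheck(allow: Set[String]): Set[String] =
    allClientClasses.intersect(allow)

  // deny list: "NewClass" is checked by default unless explicitly excluded
  def denyListCheck(deny: Set[String]): Set[String] =
    allClientClasses.diff(deny)

  def main(args: Array[String]): Unit = {
    println(allowListCheck(Set("Dataset", "SparkSession", "Column"))) // misses NewClass
    println(denyListCheck(Set.empty))                                 // covers NewClass
  }
}
```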



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43717) Scala Client Dataset#reduce failed to handle null partitions for scala primitive types

2023-05-22 Thread Zhen Li (Jira)
Zhen Li created SPARK-43717:
---

 Summary: Scala Client Dataset#reduce failed to handle null 
partitions for scala primitive types
 Key: SPARK-43717
 URL: https://issues.apache.org/jira/browse/SPARK-43717
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


The Scala client failed with an NPE when running:

assert(spark.range(0, 5, 1, 10).as[Long].reduce(_ + _) == 10)
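
A sketch of the suspected failure mode (an assumption, not a confirmed root 
cause): spark.range(0, 5, 1, 10) creates 10 partitions for only 5 rows, so some 
partitions are empty; if an empty partition's partial result comes back as null 
and is unboxed into a Scala primitive Long, the reduce throws an NPE.
```
object ReduceNpeSketch {
  def main(args: Array[String]): Unit = {
    // null stands in for an empty partition's partial result
    val partials: Seq[java.lang.Long] = Seq(3L, 7L, null)
    try {
      println(partials.map(_.longValue).sum) // unboxing null -> NPE
    } catch {
      case _: NullPointerException => println("NPE from unboxing a null partial")
    }
  }
}
```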



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43416) Fix the bug where the ProduceEncoder#tuples field names are different from the server

2023-05-08 Thread Zhen Li (Jira)
Zhen Li created SPARK-43416:
---

 Summary: Fix the bug where the ProduceEncoder#tuples field names 
are different from the server
 Key: SPARK-43416
 URL: https://issues.apache.org/jira/browse/SPARK-43416
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


The fields are named _1, _2, ... etc. However, on the server side they could be 
nicely named in agg operations, such as key and value. Fix this if possible.
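
For illustration (standard Spark behavior, independent of Connect), a typed 
aggregation whose server-side plan names columns nicely while the client tuple 
encoder expects _1, _2:
```
import org.apache.spark.sql.SparkSession

object TupleNameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    // Dataset[(String, Long)]: the plan names the columns `key` and `count(1)`,
    // while the tuple encoder for the result type uses `_1` and `_2`
    val counts = Seq("a", "b", "a").toDS().groupByKey(identity).count()
    counts.printSchema()
    spark.stop()
  }
}
```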



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43415) Impl mapValues for KVGDS#mapValues

2023-05-08 Thread Zhen Li (Jira)
Zhen Li created SPARK-43415:
---

 Summary: Impl mapValues for KVGDS#mapValues
 Key: SPARK-43415
 URL: https://issues.apache.org/jira/browse/SPARK-43415
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Use a resolved func to pass the mapValues function together with all aggExprs. 
Then, on the server side, unfold it to apply mapValues first before running the 
aggregate.

e.g. 
https://github.com/apache/spark/commit/a234a9b0851ebce87c0ef831b24866f94f0c0d36
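
A conceptual sketch of the unfolding in plain Scala (not the actual Connect 
protocol; the names here are made up for illustration):
```
// the client ships the mapValues function alongside the aggregate;
// the server applies mapValues to each group before aggregating
case class AggRequest[V, U](mapValuesFunc: V => U, aggregate: Iterator[U] => Long)

object UnfoldSketch {
  def serverSide[K, V, U](groups: Map[K, Seq[V]], req: AggRequest[V, U]): Map[K, Long] =
    groups.map { case (k, vs) =>
      k -> req.aggregate(vs.iterator.map(req.mapValuesFunc))
    }

  def main(args: Array[String]): Unit = {
    // per key, sum the lengths of the values
    val groups = Map("a" -> Seq("x", "yy"), "b" -> Seq("zzz"))
    val req = AggRequest[String, Int](_.length, _.map(_.toLong).sum)
    println(serverSide(groups, req)) // Map(a -> 3, b -> 3)
  }
}
```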



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43321) Impl Dataset#JoinWith

2023-04-28 Thread Zhen Li (Jira)
Zhen Li created SPARK-43321:
---

 Summary: Impl Dataset#JoinWith
 Key: SPARK-43321
 URL: https://issues.apache.org/jira/browse/SPARK-43321
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Impl the missing method `joinWith`.
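
A usage sketch of the method to implement (the standard Spark SQL 
Dataset#joinWith signature, shown for context):
```
import org.apache.spark.sql.SparkSession

object JoinWithDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDS()
    val right = Seq((1, "x")).toDS()
    // yields Dataset[((Int, String), (Int, String))], pairing whole rows
    val joined = left.joinWith(right, left("_1") === right("_1"), "inner")
    joined.show()
    spark.stop()
  }
}
```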



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43223) KeyValueGroupedDataset#agg

2023-04-20 Thread Zhen Li (Jira)
Zhen Li created SPARK-43223:
---

 Summary: KeyValueGroupedDataset#agg
 Key: SPARK-43223
 URL: https://issues.apache.org/jira/browse/SPARK-43223
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Add the missing agg functions to the KVGDS API.
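
A usage sketch (the standard KeyValueGroupedDataset#agg with typed columns, 
shown for context):
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

object KvgdsAggDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val kv = Seq(("a", 1), ("a", 2), ("b", 3)).toDS().groupByKey(_._1)
    // agg takes TypedColumns, producing Dataset[(String, Long, Long)]
    val agged = kv.agg(count("*").as[Long], sum($"_2").as[Long])
    agged.show()
    spark.stop()
  }
}
```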



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43136) Scala mapGroup, coGroup

2023-04-13 Thread Zhen Li (Jira)
Zhen Li created SPARK-43136:
---

 Summary: Scala mapGroup, coGroup
 Key: SPARK-43136
 URL: https://issues.apache.org/jira/browse/SPARK-43136
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Add basic Dataset#groupByKey -> KeyValueGroupedDataset support.
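
A usage sketch of the basics in scope (standard Dataset#groupByKey and 
KeyValueGroupedDataset#mapGroups, shown for context):
```
import org.apache.spark.sql.SparkSession

object MapGroupsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val words = Seq("apple", "avocado", "banana").toDS()
    // group by first letter, then count each group's members
    val counts = words.groupByKey(_.substring(0, 1))
      .mapGroups((k, it) => (k, it.size))
    counts.show()
    spark.stop()
  }
}
```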



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42999) Impl Dataset#foreach, foreachPartitions

2023-03-31 Thread Zhen Li (Jira)
Zhen Li created SPARK-42999:
---

 Summary: Impl Dataset#foreach, foreachPartitions
 Key: SPARK-42999
 URL: https://issues.apache.org/jira/browse/SPARK-42999
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


Impl the missing methods in Scala Client Dataset API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42953) Impl typed map, flatMap, mapPartitions in Dataset

2023-03-28 Thread Zhen Li (Jira)
Zhen Li created SPARK-42953:
---

 Summary: Impl typed map, flatMap, mapPartitions in Dataset
 Key: SPARK-42953
 URL: https://issues.apache.org/jira/browse/SPARK-42953
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Add missing typed API support in the Dataset API.
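
A usage sketch of the typed APIs being added (standard Dataset signatures, 
shown for context):
```
import org.apache.spark.sql.SparkSession

object TypedApiDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val ds = spark.range(5).as[Long]
    println(ds.map(_ * 2).collect().toSeq)                  // typed map
    println(ds.flatMap(x => Seq(x, -x)).collect().toSeq)    // typed flatMap
    println(ds.mapPartitions(_.map(_ + 1)).collect().toSeq) // typed mapPartitions
    spark.stop()
  }
}
```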



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42519) Add more WriteTo tests after Scala Client session config is supported

2023-03-24 Thread Zhen Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17704655#comment-17704655
 ] 

Zhen Li commented on SPARK-42519:
-

Hi [~fanjia], we need to figure out a way to pass the class files to the server 
classpath; this is the main blocker for this ticket. To do so, we can either 
configure something to pass the class files via the spark-submit call, or wait 
for the client-side artifact auto-sync work and see whether we can sync the 
test files through it.

> Add more WriteTo tests after Scala Client session config is supported
> -
>
> Key: SPARK-42519
> URL: https://issues.apache.org/jira/browse/SPARK-42519
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add more test cases following the examples in the 
> "SparkConnectProtoSuite("WriteTo")" tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42786) Impl typed select in Dataset

2023-03-14 Thread Zhen Li (Jira)
Zhen Li created SPARK-42786:
---

 Summary: Impl typed select in Dataset
 Key: SPARK-42786
 URL: https://issues.apache.org/jira/browse/SPARK-42786
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-03-03 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li resolved SPARK-42175.
-
Resolution: Duplicate

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-02 Thread Zhen Li (Jira)
Zhen Li created SPARK-42656:
---

 Summary: Spark Connect Scala Client Shell Script
 Key: SPARK-42656
 URL: https://issues.apache.org/jira/browse/SPARK-42656
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Add a shell script that runs the Scala client in a Scala REPL, allowing users to 
connect to Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42575) Replace `AnyFunSuite` with `ConnectFunSuite` for scala client tests

2023-02-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-42575:
---

 Summary: Replace `AnyFunSuite` with `ConnectFunSuite` for scala 
client tests
 Key: SPARK-42575
 URL: https://issues.apache.org/jira/browse/SPARK-42575
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Make engineers' lives easier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42573) Enable binary compatibility tests for SparkSession/Dataset/Column/functions

2023-02-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-42573:
---

 Summary: Enable binary compatibility tests for 
SparkSession/Dataset/Column/functions
 Key: SPARK-42573
 URL: https://issues.apache.org/jira/browse/SPARK-42573
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42533) SSL support for Scala Client

2023-02-22 Thread Zhen Li (Jira)
Zhen Li created SPARK-42533:
---

 Summary: SSL support for Scala Client
 Key: SPARK-42533
 URL: https://issues.apache.org/jira/browse/SPARK-42533
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Add basic encryption support for the Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42518) Scala client Write API V2

2023-02-22 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42518.
---

> Scala client Write API V2
> -
>
> Key: SPARK-42518
> URL: https://issues.apache.org/jira/browse/SPARK-42518
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Impl the Dataset#writeTo method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42519) Add more WriteTo tests after Scala Client session config is supported

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42519:

Description: Add more test cases following the examples in the 
"SparkConnectProtoSuite("WriteTo")" tests.  (was: Impl Scala Client Session 
Config to allow users to set configs for Spark.)

> Add more WriteTo tests after Scala Client session config is supported
> -
>
> Key: SPARK-42519
> URL: https://issues.apache.org/jira/browse/SPARK-42519
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add more test cases following the examples in the 
> "SparkConnectProtoSuite("WriteTo")" tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42519) Add more WriteTo tests after Scala Client session config is supported

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42519:

Summary: Add more WriteTo tests after Scala Client session config is 
supported  (was: Scala Client session config)

> Add more WriteTo tests after Scala Client session config is supported
> -
>
> Key: SPARK-42519
> URL: https://issues.apache.org/jira/browse/SPARK-42519
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Impl Scala Client Session Config to allow users to set configs for Spark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42519) Scala Client session config

2023-02-21 Thread Zhen Li (Jira)
Zhen Li created SPARK-42519:
---

 Summary: Scala Client session config
 Key: SPARK-42519
 URL: https://issues.apache.org/jira/browse/SPARK-42519
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Impl Scala Client Session Config to allow users to set configs for Spark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42518) Scala client Write API V2

2023-02-21 Thread Zhen Li (Jira)
Zhen Li created SPARK-42518:
---

 Summary: Scala client Write API V2
 Key: SPARK-42518
 URL: https://issues.apache.org/jira/browse/SPARK-42518
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Impl the Dataset#writeTo method.
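
A usage sketch of the API being implemented (the standard DataFrameWriterV2 
entry point; the table name is hypothetical and the target catalog is assumed 
to support the v2 CREATE TABLE AS SELECT path):
```
import org.apache.spark.sql.SparkSession

object WriteV2Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    df.writeTo("demo_table").using("parquet").create()
    spark.stop()
  }
}
```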



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42482) Scala client Write API V1

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42482.
---

> Scala client Write API V1
> -
>
> Key: SPARK-42482
> URL: https://issues.apache.org/jira/browse/SPARK-42482
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add basic Dataset#write API for Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42457) Scala Client Session Read API

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42457.
---

> Scala Client Session Read API
> -
>
> Key: SPARK-42457
> URL: https://issues.apache.org/jira/browse/SPARK-42457
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add SparkSession#read impl to be able to read data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42202) Scala Client E2E test stop the server gracefully

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42202.
---

> Scala Client E2E test stop the server gracefully
> 
>
> Key: SPARK-42202
> URL: https://issues.apache.org/jira/browse/SPARK-42202
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 3.4.0
>
>
> The current solution kills the Spark Connect server process, which may result 
> in some errors in the command line.
> Suggest a minor fix to close the server process gracefully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42429.
---

> IntelliJ Build issue: value getArgument is not a member of 
> org.mockito.invocation.InvocationOnMock
> --
>
> Key: SPARK-42429
> URL: https://issues.apache.org/jira/browse/SPARK-42429
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.4
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Trivial
> Fix For: 3.4.0
>
>
> When running the tests with IntelliJ, sometimes this error pops up:
>  
> spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
> value getArgument is not a member of org.mockito.invocation.InvocationOnMock
> invocation.getArgument[Identifier](0).name match {
>  
> It seems to be caused by conflicting versions of Mockito in the IDE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42172) Compatibility check for Scala Client

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42172.
---

> Compatibility check for Scala Client
> 
>
> Key: SPARK-42172
> URL: https://issues.apache.org/jira/browse/SPARK-42172
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add compatibility checks for the Scala client to ensure the Scala client API 
> is binary compatible with the existing Spark SQL API (Dataset, SparkSession, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42043) Basic Scala Client Result Implementation

2023-02-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-42043.
---

> Basic Scala Client Result Implementation 
> -
>
> Key: SPARK-42043
> URL: https://issues.apache.org/jira/browse/SPARK-42043
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add the basic Scala client Result implementation, with some tests to verify 
> the result can be received correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42482) Scala client Write API V1

2023-02-17 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42482:

Description: Add basic Dataset#write API for Scala client.  (was: Add basic 
SparkSession#write API for Scala client.)

> Scala client Write API V1
> -
>
> Key: SPARK-42482
> URL: https://issues.apache.org/jira/browse/SPARK-42482
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add basic Dataset#write API for Scala client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42482) Scala client Write API V1

2023-02-17 Thread Zhen Li (Jira)
Zhen Li created SPARK-42482:
---

 Summary: Scala client Write API V1
 Key: SPARK-42482
 URL: https://issues.apache.org/jira/browse/SPARK-42482
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Add basic SparkSession#write API for Scala client.
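
A usage sketch of the API in question (the standard DataFrameWriter path; the 
output location is a hypothetical local path):
```
import org.apache.spark.sql.SparkSession

object WriteV1Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    df.write.format("parquet").mode("overwrite").save("/tmp/write-v1-demo")
    spark.stop()
  }
}
```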



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42457) Scala Client Session Read API

2023-02-15 Thread Zhen Li (Jira)
Zhen Li created SPARK-42457:
---

 Summary: Scala Client Session Read API
 Key: SPARK-42457
 URL: https://issues.apache.org/jira/browse/SPARK-42457
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Add SparkSession#read impl to be able to read data.
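
A usage sketch of the API being added (the standard DataFrameReader; the input 
path is hypothetical):
```
import org.apache.spark.sql.SparkSession

object ReadDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    val df = spark.read
      .format("csv")
      .option("header", "true")
      .load("/tmp/people.csv") // hypothetical input file
    df.show()
    spark.stop()
  }
}
```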



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42449) Fix `native-image.properties` in Scala Client

2023-02-15 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42449:

Description: 
The content of the `native-image.properties` file is not correct. This file is used 
to create a native image using GraalVM; see more info: 
https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/
https://www.graalvm.org/22.1/reference-manual/native-image/BuildConfiguration/

e.g.

The content in `META-INF/native-image/io.netty` should also be relocated, just as 
in `grpc-netty-shaded`.

Now, the content of 
`META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is

```
Args = --initialize-at-build-time=io.netty \
   
--initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
```

but it should look like
```
Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
   
--initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   
```
Other transformers may need to be added.

See more info in this discussion thread 
https://github.com/apache/spark/pull/39866#discussion_r1098833915



  was:
The content of the `native-image.properties` file is not correct. This file is used 
by GraalVM; see 
https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/.

e.g.

The content in `META-INF/native-image/io.netty` should also be relocated, just as 
in `grpc-netty-shaded`.

Now, the content of 
`META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is

```
Args = --initialize-at-build-time=io.netty \
   
--initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
```

but it should look like
```
Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
   
--initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   
```
Other transformers may need to be added.

See more info in this discussion thread 
https://github.com/apache/spark/pull/39866#discussion_r1098833915




> Fix `native-image.properties` in Scala Client
> 
>
> Key: SPARK-42449
> URL: https://issues.apache.org/jira/browse/SPARK-42449
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Minor
>
> The content of the `native-image.properties` file is not correct. This file is 
> used to create a native image using GraalVM; see more info: 
> https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/
> https://www.graalvm.org/22.1/reference-manual/native-image/BuildConfiguration/
> e.g.
> The content in `META-INF/native-image/io.netty` should also be relocated, just 
> as in `grpc-netty-shaded`.
> Now, the content of 
> `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is
> ```
> Args = --initialize-at-build-time=io.netty \
>
> --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
> ```
> but it should look like
> ```
> Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
>
> --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
>
> ```
> Other transformers may need to be added.
> See more info in this discussion thread 
> https://github.com/apache/spark/pull/39866#discussion_r1098833915



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-42449) Fix `native-image.properties` in Scala Client

2023-02-15 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42449:

Description: 
The content of the `native-image.properties` file is not correct. This file is used 
by GraalVM; see 
https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/.

e.g.

The content in `META-INF/native-image/io.netty` should also be relocated, just as 
in `grpc-netty-shaded`.

Now, the content of 
`META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is

```
Args = --initialize-at-build-time=io.netty \
   
--initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
```

but it should look like
```
Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
   
--initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   
```
Other transformers may need to be added.

See more info in this discussion thread 
https://github.com/apache/spark/pull/39866#discussion_r1098833915



  was:
The content of the `native-image.properties` file is not correct. This file may be 
used by the Graal project to find the shaded contents.

e.g.

The content in `META-INF/native-image/io.netty` should also be relocated, just as 
in `grpc-netty-shaded`.

Now, the content of 
`META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is

```
Args = --initialize-at-build-time=io.netty \
   
--initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
```

but it should look like
```
Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
   
--initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   
```
Other transformers may need to be added.

See more info in this discussion thread 
https://github.com/apache/spark/pull/39866#discussion_r1098833915




> Fix `native-image.properties` in Scala Client
> 
>
> Key: SPARK-42449
> URL: https://issues.apache.org/jira/browse/SPARK-42449
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Minor
>
> The content of the `native-image.properties` file is not correct. This file is 
> used by GraalVM; see 
> https://docs.oracle.com/en/graalvm/enterprise/20/docs/reference-manual/native-image/BuildConfiguration/.
> e.g.
> The content in `META-INF/native-image/io.netty` should also be relocated, just 
> as in `grpc-netty-shaded`.
> Now, the content of 
> `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is
> ```
> Args = --initialize-at-build-time=io.netty \
>
> --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
> ```
> but it should look like
> ```
> Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
>
> --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
>
> ```
> Other transformers may need to be added.
> See more info in this discussion thread 
> https://github.com/apache/spark/pull/39866#discussion_r1098833915



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42449) Fix `native-image.properties` in Scala Client

2023-02-15 Thread Zhen Li (Jira)
Zhen Li created SPARK-42449:
---

 Summary: Fix `native-image.properties` in Scala Client
 Key: SPARK-42449
 URL: https://issues.apache.org/jira/browse/SPARK-42449
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


The content of the `native-image.properties` file is not correct. This file may be 
used by the Graal project to find the shaded contents.

e.g.

The content in `META-INF/native-image/io.netty` should also be relocated, just as 
in `grpc-netty-shaded`.

Now, the content of 
`META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is

```
Args = --initialize-at-build-time=io.netty \
   
--initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
```

but it should look like
```
Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
   
--initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   
```
Other transformers may need to be added.

See more info in this discussion thread 
https://github.com/apache/spark/pull/39866#discussion_r1098833915





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock

2023-02-13 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42429:

Description: 
When running the tests with IntelliJ, sometimes this error pops up:

spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
invocation.getArgument[Identifier](0).name match {

It seems to be caused by conflicting versions of Mockito in the IDE.

  was:
When running the tests with IntelliJ, sometimes this error pops up:

spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
invocation.getArgument[Identifier](0).name match {

It seems to be caused by conflicting versions of Mockito in the IDE.


> IntelliJ Build issue: value getArgument is not a member of 
> org.mockito.invocation.InvocationOnMock
> --
>
> Key: SPARK-42429
> URL: https://issues.apache.org/jira/browse/SPARK-42429
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.4
>Reporter: Zhen Li
>Priority: Trivial
>
> When running the tests with IntelliJ, sometimes this error pops up:
>  
> spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
> value getArgument is not a member of org.mockito.invocation.InvocationOnMock
> invocation.getArgument[Identifier](0).name match {
>  
> It seems to be caused by conflicting versions of Mockito in the IDE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock

2023-02-13 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42429:

Description: 
When running the tests with IntelliJ, sometimes this error pops up:

{{spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
invocation.getArgument[Identifier](0).name match {}}

It seems to be caused by conflicting versions of Mockito in the IDE.

  was:
When running the tests with IntelliJ, sometimes this error pops up:

{{spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
invocation.getArgument[Identifier](0).name match {}}

It seems to be caused by conflicting versions of Mockito in the IDE.


> IntelliJ Build issue: value getArgument is not a member of 
> org.mockito.invocation.InvocationOnMock
> --
>
> Key: SPARK-42429
> URL: https://issues.apache.org/jira/browse/SPARK-42429
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.4
>Reporter: Zhen Li
>Priority: Trivial
>
> When running the tests with IntelliJ, sometimes this error pops up:
>  
> {{spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
> value getArgument is not a member of org.mockito.invocation.InvocationOnMock
> invocation.getArgument[Identifier](0).name match {}}
>  
> It seems to be caused by conflicting versions of Mockito in the IDE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock

2023-02-13 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42429:

Description: 
When running the tests with IntelliJ, sometimes this error pops up:

spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
invocation.getArgument[Identifier](0).name match {

It seems to be caused by conflicting versions of Mockito in the IDE.

  was:
When running the tests with IntelliJ, sometimes this error pops up:

{{spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
invocation.getArgument[Identifier](0).name match {}}

It seems to be caused by conflicting versions of Mockito in the IDE.


> IntelliJ Build issue: value getArgument is not a member of 
> org.mockito.invocation.InvocationOnMock
> --
>
> Key: SPARK-42429
> URL: https://issues.apache.org/jira/browse/SPARK-42429
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.4
>Reporter: Zhen Li
>Priority: Trivial
>
> When running the tests with IntelliJ, sometimes this error pops up:
>  
> spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
> value getArgument is not a member of org.mockito.invocation.InvocationOnMock
> invocation.getArgument[Identifier](0).name match {
>  
> It seems to be caused by conflicting versions of Mockito in the IDE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42429) IntelliJ Build issue: value getArgument is not a member of org.mockito.invocation.InvocationOnMock

2023-02-13 Thread Zhen Li (Jira)
Zhen Li created SPARK-42429:
---

 Summary: IntelliJ Build issue: value getArgument is not a member 
of org.mockito.invocation.InvocationOnMock
 Key: SPARK-42429
 URL: https://issues.apache.org/jira/browse/SPARK-42429
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.4
Reporter: Zhen Li


When running the tests with IntelliJ, sometimes this error pops up:

```
spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala:149:18
value getArgument is not a member of org.mockito.invocation.InvocationOnMock
  invocation.getArgument[Identifier](0).name match {
```

It seems to be caused by conflicting versions of Mockito in the IDE.
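
For context, a sketch of the call that trips the IDE (this assumes Mockito 2.x 
or later on the classpath, where InvocationOnMock#getArgument exists; an older 
Mockito resolved by IntelliJ would lack it):

```
import org.mockito.invocation.InvocationOnMock
import org.mockito.stubbing.Answer

object GetArgumentSketch {
  // an Answer that reads the stubbed call's first argument
  val answer: Answer[String] = (invocation: InvocationOnMock) => {
    val first = invocation.getArgument[String](0) // requires Mockito >= 2.x
    s"got $first"
  }
}
```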



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42215) Better Scala Client Integration test

2023-01-27 Thread Zhen Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17681401#comment-17681401
 ] 

Zhen Li commented on SPARK-42215:
-

Marking the tests as ITs in Maven may cause the tests not to be found by SBT. 
Make sure the tests can still be discovered and run by SBT; see the sketch below.
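
As a point of reference, SBT has a built-in IntegrationTest configuration that keeps ITs discoverable from SBT while separating them from unit tests. A sketch only; the module name, path, and versions are illustrative, not Spark's actual build:

{code:scala}
// build.sbt (sketch): tests under src/it/scala stay visible to SBT and
// run via `sbt IntegrationTest/test`, separately from `sbt Test/test`.
lazy val connectClient = (project in file("connector/connect/client"))
  .configs(IntegrationTest)
  .settings(
    Defaults.itSettings,
    name := "spark-connect-client",
    libraryDependencies +=
      "org.scalatest" %% "scalatest" % "3.2.16" % "it,test"
  )
{code}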

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is not very convenient for Maven 
> developers, as they will not be able to run `mvn clean install` to run all 
> tests.
>  
> Look into marking these tests as ITs and other better ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42215) Better Scala Client Integration test

2023-01-27 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42215:

Description: 
The current Scala client has a few integration tests that require a build 
before the client tests can run. This is not very convenient for Maven 
developers, as they will not be able to run `mvn clean install` to run all tests.

Look into marking these tests as ITs and other better ways for Maven to run 
tests after the packages are built.

Make sure the tests run in SBT as well.

  was:
The current Scala client has a few integration tests that require a build 
before the client tests can run. This is not very convenient for Maven 
developers, as they will not be able to run `mvn clean install` to run all tests.

Look into marking these tests as ITs and other better ways for Maven to run 
tests after the packages are built.


> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is not very convenient for Maven 
> developers, as they will not be able to run `mvn clean install` to run all 
> tests.
>  
> Look into marking these tests as ITs and other better ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42215) Better Scala Client Integration test

2023-01-27 Thread Zhen Li (Jira)
Zhen Li created SPARK-42215:
---

 Summary: Better Scala Client Integration test
 Key: SPARK-42215
 URL: https://issues.apache.org/jira/browse/SPARK-42215
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


The current Scala client has a few integration tests that require a build 
before the client tests can run. This is not very convenient for Maven 
developers, as they will not be able to run `mvn clean install` to run all tests.

Look into marking these tests as ITs and other better ways for Maven to run 
tests after the packages are built.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42202) Scala Client E2E test stop the server gracefully

2023-01-26 Thread Zhen Li (Jira)
Zhen Li created SPARK-42202:
---

 Summary: Scala Client E2E test stop the server gracefully
 Key: SPARK-42202
 URL: https://issues.apache.org/jira/browse/SPARK-42202
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


The current solution kills the Spark Connect server process, which may result in 
some errors on the command line.

Suggest a minor fix to close the server process gracefully, as sketched below.
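
A minimal sketch of what a graceful stop could look like, assuming the E2E suite holds the java.lang.Process handle of the launched server; the function name and timeout are illustrative:

{code:scala}
import java.util.concurrent.TimeUnit

// Ask the server to terminate (SIGTERM on Unix) and only fall back to a
// hard kill if it does not exit within the timeout.
def stopServerGracefully(server: Process, timeoutSeconds: Long = 30L): Unit = {
  server.destroy()
  if (!server.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
    server.destroyForcibly()
    server.waitFor()
  }
}
{code}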



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2023-01-26 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-38378.
---

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.3.0
>
>
> Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a 
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar definition is given as a mixed grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Having the parser and lexer in different files also keeps the code more 
> explicit about which is the parser and which is the lexer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-38646) Pull a trait out for Python functions

2023-01-26 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li closed SPARK-38646.
---

> Pull a trait out for Python functions
> -
>
> Key: SPARK-38646
> URL: https://issues.apache.org/jira/browse/SPARK-38646
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0, 3.2.2
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently PySpark uses a case class PythonFunction in PythonRDD and many other 
> interfaces/classes. Propose changing it to a trait instead to avoid tying the 
> implementation to the APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42175:

Description: Also fix the TODOs in the MiMa compatibility test. 
https://github.com/apache/spark/pull/39712

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-42175:
---

 Summary: Implement more methods in the Scala Client Dataset API
 Key: SPARK-42175
 URL: https://issues.apache.org/jira/browse/SPARK-42175
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42172) Compatibility check for Scala Client

2023-01-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-42172:
---

 Summary: Compatibility check for Scala Client
 Key: SPARK-42172
 URL: https://issues.apache.org/jira/browse/SPARK-42172
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Adding compatibility checks for the Scala client to ensure the Scala Client API 
is binary compatible with the existing Spark SQL API (Dataset, SparkSession, etc.).
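
One common way to wire such a check is the sbt MiMa plugin. A sketch only, under the assumption that the client module is compared against a published spark-sql artifact; the coordinates are illustrative, and Spark may implement the check differently:

{code:scala}
// project/plugins.sbt
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt: report binary incompatibilities against the previous API
// with `sbt mimaReportBinaryIssues`.
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-sql" % "3.4.0")
{code}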



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42135) Scala Client Proper logging for the client

2023-01-20 Thread Zhen Li (Jira)
Zhen Li created SPARK-42135:
---

 Summary: Scala Client Proper logging for the client
 Key: SPARK-42135
 URL: https://issues.apache.org/jira/browse/SPARK-42135
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Introduce proper logging for the client and change 
[https://github.com/apache/spark/pull/39541/files/2a589543bdec80f4cf806af0a8566d2de8c04140#r1082062813]
 to use the client logging.
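
A minimal sketch of what client-side logging could look like, assuming slf4j is on the classpath; the trait and method names are illustrative, not the actual Spark Connect client API:

{code:scala}
import org.slf4j.{Logger, LoggerFactory}

trait ClientLogging {
  @transient private lazy val logger: Logger =
    LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))

  // By-name messages so formatting cost is only paid when the level is on.
  protected def logInfo(msg: => String): Unit =
    if (logger.isInfoEnabled) logger.info(msg)

  protected def logWarning(msg: => String): Unit =
    if (logger.isWarnEnabled) logger.warn(msg)
}
{code}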



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42043) Basic Scala Client Result Implementation

2023-01-12 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42043:

Description: Adding the basic Scala client Result implementation. Add some 
tests to verify the result can be received correctly.  (was: Adding the basic 
scala client implementation, including Dataset, SparkSession and SparkResult.)
Summary: Basic Scala Client Result Implementation   (was: Basic Scala 
Client)

> Basic Scala Client Result Implementation 
> -
>
> Key: SPARK-42043
> URL: https://issues.apache.org/jira/browse/SPARK-42043
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Adding the basic Scala client Result implementation. Add some tests to verify 
> the result can be received correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42043) Basic Scala Client

2023-01-12 Thread Zhen Li (Jira)
Zhen Li created SPARK-42043:
---

 Summary: Basic Scala Client
 Key: SPARK-42043
 URL: https://issues.apache.org/jira/browse/SPARK-42043
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Adding the basic Scala client implementation, including Dataset, SparkSession 
and SparkResult.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38646) Pull a trait out for Python functions

2022-03-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-38646:
---

 Summary: Pull a trait out for Python functions
 Key: SPARK-38646
 URL: https://issues.apache.org/jira/browse/SPARK-38646
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0, 3.2.2
Reporter: Zhen Li


Currently PySpark uses a case class PythonFunction in PythonRDD and many other 
interfaces/classes. Propose changing it to a trait instead to avoid tying the 
implementation to the APIs, as sketched below.
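
A hypothetical sketch of the proposed shape (fields trimmed for brevity; the real PythonFunction carries more state such as env vars and broadcast variables): callers depend on the trait, and the case class becomes one implementation:

{code:scala}
// The abstraction other interfaces/classes would reference.
trait PythonFunction {
  def command: Seq[Byte]
  def pythonExec: String
  def pythonVer: String
}

// The existing case class becomes just one implementation of the trait.
case class SimplePythonFunction(
    command: Seq[Byte],
    pythonExec: String,
    pythonVer: String) extends PythonFunction
{code}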



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-38378:

Affects Version/s: (was: 3.2.2)

> ANTLR grammar definition in separate Parser and Lexer files
> ---
>
> Key: SPARK-38378
> URL: https://issues.apache.org/jira/browse/SPARK-38378
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Zhen Li
>Priority: Major
>
> Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a 
> separate parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 
> Benefits:
> *Gain more flexibility when implementing new SQL features*
> The current ANTLR grammar definition is given as a mixed grammar in the 
> `SqlBase.g4` file.
> By separating the lexer and parser, we will be able to use the full power of 
> ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
> flexibility when implementing new SQL features.
> *The code is cleaner.* 
> Having the parser and lexer in different files also keeps the code more 
> explicit about which is the parser and which is the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38378) ANTLR grammar definition in separate Parser and Lexer files

2022-03-01 Thread Zhen Li (Jira)
Zhen Li created SPARK-38378:
---

 Summary: ANTLR grammar definition in separate Parser and Lexer 
files
 Key: SPARK-38378
 URL: https://issues.apache.org/jira/browse/SPARK-38378
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0, 3.2.2
Reporter: Zhen Li


Suggesting to separate the ANTLR grammar defined in `SqlBase.g4` into a separate 
parser `SqlBaseParser.g4` and lexer `SqlBaseLexer.g4`. 

Benefits:

*Gain more flexibility when implementing new SQL features*

The current ANTLR grammar definition is given as a mixed grammar in the 
`SqlBase.g4` file.

By separating the lexer and parser, we will be able to use the full power of 
ANTLR parser and lexer grammars, e.g. lexer modes. This will give us more 
flexibility when implementing new SQL features.

*The code is cleaner.* 

Having the parser and lexer in different files also keeps the code more explicit 
about which is the parser and which is the lexer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33033) Display time series view for task metrics in history server

2020-09-30 Thread Zhen Li (Jira)
Zhen Li created SPARK-33033:
---

 Summary: Display time series view for task metrics in history 
server
 Key: SPARK-33033
 URL: https://issues.apache.org/jira/browse/SPARK-33033
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.1.0
Reporter: Zhen Li


The event log contains all tasks' metrics data, which is useful for performance 
debugging. Currently the Spark UI only displays the final aggregated results, so 
much information is hidden. If the Spark UI could provide a time-series view of 
the data, it would be more helpful for debugging performance problems. We would 
like to build an application statistics page in the history server, based on 
task metrics, to provide more straightforward insight into a Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31882) DAG-viz is not rendered correctly with pagination.

2020-09-21 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-31882:

Affects Version/s: 2.4.4

> DAG-viz is not rendered correctly with pagination.
> --
>
> Key: SPARK-31882
> URL: https://issues.apache.org/jira/browse/SPARK-31882
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Because the DAG-viz for a job fetches link URLs for each stage from the stage 
> table, rendering can fail with pagination.
> You can reproduce this issue with the following operation.
> {code:java}
>  sc.parallelize(1 to 10).map(value => (value 
> ,value)).repartition(1).repartition(1).repartition(1).reduceByKey(_ + 
> _).collect{code}
> And then, visit the corresponding job page.
> There are 5 stages, so set the paged table to show fewer than 5 stages per page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-17 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32886:

Affects Version/s: 2.4.4

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you 
> to the corresponding job page. When there are too many jobs, some job items' 
> links redirect to a wrong link like '.../jobs/undefined'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view

2020-09-15 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32886:

Attachment: undefinedlink.JPG

> '.../jobs/undefined' link from EvenTimeline view
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you 
> to the corresponding job page. When there are too many jobs, some job items' 
> links redirect to a wrong link like '.../jobs/undefined'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32886) '.../jobs/undefined' link from "Event Timeline" in jobs page

2020-09-15 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32886:

Summary: '.../jobs/undefined' link from "Event Timeline" in jobs page  
(was: '.../jobs/undefined' link from EvenTimeline view)

> '.../jobs/undefined' link from "Event Timeline" in jobs page
> 
>
> Key: SPARK-32886
> URL: https://issues.apache.org/jira/browse/SPARK-32886
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: undefinedlink.JPG
>
>
> In the event timeline view of the jobs page, clicking a job item redirects you 
> to the corresponding job page. When there are too many jobs, some job items' 
> links redirect to a wrong link like '.../jobs/undefined'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32886) '.../jobs/undefined' link from EvenTimeline view

2020-09-15 Thread Zhen Li (Jira)
Zhen Li created SPARK-32886:
---

 Summary: '.../jobs/undefined' link from EvenTimeline view
 Key: SPARK-32886
 URL: https://issues.apache.org/jira/browse/SPARK-32886
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0, 3.1.0
Reporter: Zhen Li


In the event timeline view of the jobs page, clicking a job item redirects you to 
the corresponding job page. When there are too many jobs, some job items' links 
redirect to a wrong link like '.../jobs/undefined'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32581) update duration property for live ui application list and application apis

2020-08-10 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32581:

Attachment: updatedapiJPG.JPG
oldapi.JPG

> update duration property for live ui application list and application apis
> --
>
> Key: SPARK-32581
> URL: https://issues.apache.org/jira/browse/SPARK-32581
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Zhen Li
>Priority: Trivial
> Attachments: oldapi.JPG, updatedapiJPG.JPG
>
>
> "duration" property in response from application list and application APIs of 
> live UI is always "0". we want to let these two APIs return correct value, 
> same with "*Total Uptime*" in live UI's job page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32581) update duration property for live ui application list and application apis

2020-08-10 Thread Zhen Li (Jira)
Zhen Li created SPARK-32581:
---

 Summary: update duration property for live ui application list and 
application apis
 Key: SPARK-32581
 URL: https://issues.apache.org/jira/browse/SPARK-32581
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: Zhen Li


"duration" property in response from application list and application APIs of 
live UI is always "0". we want to let these two APIs return correct value, same 
with "*Total Uptime*" in live UI's job page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32028) App id link in history summary page point to wrong application attempt

2020-06-18 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32028:

Description: The app id link URL in the history summary page is wrong for the 
multi-attempt case. For details, please see the attached screenshots.  (was: App 
id link in history summary page url is wrong, for multi attempts case.)

> App id link in history summary page point to wrong application attempt
> --
>
> Key: SPARK-32028
> URL: https://issues.apache.org/jira/browse/SPARK-32028
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: multi_same.JPG, wrong_attemptJPG.JPG
>
>
> The app id link URL in the history summary page is wrong for the multi-attempt 
> case. For details, please see the attached screenshots.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32028) App id link in history summary page point to wrong application attempt

2020-06-18 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-32028:

Attachment: wrong_attemptJPG.JPG
multi_same.JPG

> App id link in history summary page point to wrong application attempt
> --
>
> Key: SPARK-32028
> URL: https://issues.apache.org/jira/browse/SPARK-32028
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
>Reporter: Zhen Li
>Priority: Minor
> Attachments: multi_same.JPG, wrong_attemptJPG.JPG
>
>
> The app id link URL in the history summary page is wrong for the multi-attempt case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32028) App id link in history summary page point to wrong application attempt

2020-06-18 Thread Zhen Li (Jira)
Zhen Li created SPARK-32028:
---

 Summary: App id link in history summary page point to wrong 
application attempt
 Key: SPARK-32028
 URL: https://issues.apache.org/jira/browse/SPARK-32028
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0, 2.4.4, 3.1.0
Reporter: Zhen Li
 Attachments: multi_same.JPG, wrong_attemptJPG.JPG

The app id link URL in the history summary page is wrong for the multi-attempt case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32024) Disk usage tracker went negative in HistoryServerDiskManager

2020-06-18 Thread Zhen Li (Jira)
Zhen Li created SPARK-32024:
---

 Summary: Disk usage tracker went negative in 
HistoryServerDiskManager
 Key: SPARK-32024
 URL: https://issues.apache.org/jira/browse/SPARK-32024
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0, 2.4.4, 3.1.0
 Environment: System: Windows, Linux.

Config:

spark.history.retainedApplications 200

spark.history.store.maxDiskUsage 10g

spark.history.store.path /cache_hs
Reporter: Zhen Li


After restarting the history server, we see the below error randomly.
h2. HTTP ERROR 500 java.lang.IllegalStateException: Disk usage tracker went 
negative (now = -, delta = -)
||URI:|/history//*/stages/|
||STATUS:|500|
||MESSAGE:|java.lang.IllegalStateException: Disk usage tracker went negative 
(now = -, delta = -)|
||SERVLET:|org.apache.spark.deploy.history.HistoryServer$$anon$1-6ce1f601|
||CAUSED BY:|java.lang.IllegalStateException: Disk usage tracker went negative 
(now = -, delta = -)|
h3. Caused by:

java.lang.IllegalStateException: Disk usage tracker went negative (now = 
-633925, delta = -38947) at 
org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$updateUsage(HistoryServerDiskManager.scala:258)
 at 
org.apache.spark.deploy.history.HistoryServerDiskManager$Lease.rollback(HistoryServerDiskManager.scala:316)
 at 
org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1192)
 at 
org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
 at 
org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191) 
at 
org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
 at 
org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
 at 
org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
 at 
org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
 at 
org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
 at 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
 at 
org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) 
at 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89) 
at 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
 at 
org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
 at 
org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763) at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1631)
 at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549) 
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1363)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489) 
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1278)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:767)
 at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
 at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
 at org.sparkproject.jetty.server.Server.handle(Server.java:500) at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383) 
at 

[jira] [Updated] (SPARK-31929) Too many event files triggered "java.io.IOException" in history server on Windows

2020-06-08 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-31929:

Environment: 
System: Windows

Config: 

spark.history.retainedApplications 200

spark.history.store.maxDiskUsage 2g

spark.history.store.path d://
cache_hs

  was:
System: Windows

Config: 

spark.history.retainedApplications 200

spark.history.store.maxDiskUsage 2g

spark.history.store.path d:\\cache_hs


> Too many event files triggered "java.io.IOException" in history server on 
> Windows
> -
>
> Key: SPARK-31929
> URL: https://issues.apache.org/jira/browse/SPARK-31929
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4
> Environment: System: Windows
> Config: 
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 2g
> spark.history.store.path d://
> cache_hs
>Reporter: Zhen Li
>Priority: Minor
>
> h2. HTTP ERROR 500
> Problem accessing /history/app-20190711215551-0001/stages/. Reason:
> Server Error
>  
> h3. Caused by:
> java.io.IOException: Unable to delete file: 
> d:\cache_hs\apps\app-20190711215551-0001.ldb\MANIFEST-07 at 
> org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) at 
> org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) at 
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$deleteStore(HistoryServerDiskManager.scala:198)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.$anonfun$release$1(HistoryServerDiskManager.scala:161)
>  at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) 
> at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.release(HistoryServerDiskManager.scala:156)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1(FsHistoryProvider.scala:1163)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1$adapted(FsHistoryProvider.scala:1157)
>  at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1157)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> 

[jira] [Updated] (SPARK-31929) Too many event files triggered "java.io.IOException" in history server on Windows

2020-06-08 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-31929:

Environment: 
System: Windows

Config: 

spark.history.retainedApplications 200

spark.history.store.maxDiskUsage 2g

spark.history.store.path d://cache_hs

  was:
System: Windows

Config: 

spark.history.retainedApplications 200

spark.history.store.maxDiskUsage 2g

spark.history.store.path d://
cache_hs


> Too many event files triggered "java.io.IOException" in history server on 
> Windows
> -
>
> Key: SPARK-31929
> URL: https://issues.apache.org/jira/browse/SPARK-31929
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4
> Environment: System: Windows
> Config: 
> spark.history.retainedApplications 200
> spark.history.store.maxDiskUsage 2g
> spark.history.store.path d://cache_hs
>Reporter: Zhen Li
>Priority: Minor
>
> h2. HTTP ERROR 500
> Problem accessing /history/app-20190711215551-0001/stages/. Reason:
> Server Error
>  
> h3. Caused by:
> java.io.IOException: Unable to delete file: 
> d:\cache_hs\apps\app-20190711215551-0001.ldb\MANIFEST-07 at 
> org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) at 
> org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) at 
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$deleteStore(HistoryServerDiskManager.scala:198)
>  at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.$anonfun$release$1(HistoryServerDiskManager.scala:161)
>  at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) 
> at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.deploy.history.HistoryServerDiskManager.release(HistoryServerDiskManager.scala:156)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1(FsHistoryProvider.scala:1163)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1$adapted(FsHistoryProvider.scala:1157)
>  at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1157)
>  at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
>  at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
>  at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
>  at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
>  at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
>  at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
>  at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) 
> at 
> org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
>  at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>  at 
> 

[jira] [Created] (SPARK-31929) Too many event files triggered "java.io.IOException" in history server on Windows

2020-06-08 Thread Zhen Li (Jira)
Zhen Li created SPARK-31929:
---

 Summary: Too many event files triggered "java.io.IOException" in 
history server on Windows
 Key: SPARK-31929
 URL: https://issues.apache.org/jira/browse/SPARK-31929
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.4
 Environment: System: Windows

Config: 

spark.history.retainedApplications 200

spark.history.store.maxDiskUsage 2g

spark.history.store.path d:\\cache_hs
Reporter: Zhen Li


h2. HTTP ERROR 500

Problem accessing /history/app-20190711215551-0001/stages/. Reason:

Server Error

 
h3. Caused by:

java.io.IOException: Unable to delete file: 
d:\cache_hs\apps\app-20190711215551-0001.ldb\MANIFEST-07 at 
org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) at 
org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) at 
org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) at 
org.apache.spark.deploy.history.HistoryServerDiskManager.org$apache$spark$deploy$history$HistoryServerDiskManager$$deleteStore(HistoryServerDiskManager.scala:198)
 at 
org.apache.spark.deploy.history.HistoryServerDiskManager.$anonfun$release$1(HistoryServerDiskManager.scala:161)
 at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23) at 
scala.Option.foreach(Option.scala:407) at 
org.apache.spark.deploy.history.HistoryServerDiskManager.release(HistoryServerDiskManager.scala:156)
 at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1(FsHistoryProvider.scala:1163)
 at 
org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$loadDiskStore$1$adapted(FsHistoryProvider.scala:1157)
 at scala.Option.foreach(Option.scala:407) at 
org.apache.spark.deploy.history.FsHistoryProvider.loadDiskStore(FsHistoryProvider.scala:1157)
 at 
org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:363)
 at 
org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:191) 
at 
org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
 at 
org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
 at 
org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
 at 
org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
 at 
org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
 at 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
 at 
org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) 
at 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89) 
at 
org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
 at 
org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:248)
 at 
org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:101)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
 at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) 
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
 at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) 
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
 at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
 at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
 at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
 at