[jira] [Reopened] (SPARK-48505) Simplify the implementation of Utils#isG1GC
[ https://issues.apache.org/jira/browse/SPARK-48505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reopened SPARK-48505: -- > Simplify the implementation of Utils#isG1GC > --- > > Key: SPARK-48505 > URL: https://issues.apache.org/jira/browse/SPARK-48505 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48533: Assignee: Ruifeng Zheng > Add test for cached schema > -- > > Key: SPARK-48533 > URL: https://issues.apache.org/jira/browse/SPARK-48533 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48533. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46871 [https://github.com/apache/spark/pull/46871] > Add test for cached schema > -- > > Key: SPARK-48533 > URL: https://issues.apache.org/jira/browse/SPARK-48533 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48534) Support interruptOperation in streaming queries
Hyukjin Kwon created SPARK-48534: Summary: Support interruptOperation in streaming queries Key: SPARK-48534 URL: https://issues.apache.org/jira/browse/SPARK-48534 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Similar to https://issues.apache.org/jira/browse/SPARK-48485, but we should also add interruptOperation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48422) Serialize all data at once may cause MemoryError
[ https://issues.apache.org/jira/browse/SPARK-48422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhouYang updated SPARK-48422: - Description: In worker.py there is a function called process() whose iterator loads all the data at once:
{code:python}
def process():
    iterator = deserializer.load_stream(infile)
    serializer.dump_stream(func(split_index, iterator), outfile)
{code}
This can cause a MemoryError when working on large-scale data; I have indeed encountered this situation, as shown below:
{code:java}
MemoryError at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:203) at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:244) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:162) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply(BatchEvalPythonExec.scala:144) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply(BatchEvalPythonExec.scala:87) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:89) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner$$anon$2.run(Executor.scala:355) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1721) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:353) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
2024-05-22 16:50:03,173 INFO org.apache.spark.scheduler.TaskSetManager: Starting task 0.1 in stage 5.0 (TID 21, saturndatanode3, executor 2, partition 0, ANY, 5075 bytes)
2024-05-22 16:50:03,174 INFO org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 5.0 (TID 19) on saturndatanode3, executor 2: org.apache.spark.api.python.PythonException (Traceback (most recent call last): File "xx/spark/python/lib/pyspark.zip/pyspark/worker.py", line 200, in main process() File "x/spark/python/lib/pyspark.zip/pyspark/worker.py", line 195, in process serializer.dump_stream(func(split_index, iterator), outfile) File "x/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in func = lambda _, it: map(mapper, it) File "<string>", line 1, in File "x/spark/python/lib/pyspark.zip/pyspark/worker.py", line 73, in return lambda *a: f(*a)
{code}
I did some tests by adding memory-monitoring code and found that this code takes up a lot of memory during execution:
{code:python}
import resource

start_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"Memory usage at the beginning: {start_memory} KB")

iterator = deserializer.load_stream(infile)
serializer.dump_stream(func(split_index, iterator), outfile)

end_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"Memory usage at the end: {end_memory} KB")
memory_difference = end_memory - start_memory
print(f"Memory usage changes: {memory_difference} KB")
{code}
Can I process the data in the iterator in batches, as below?
{code:python}
def process():
    iterator = deserializer.load_stream(infile)

    def batched_func(iterator, func, serializer, outfile):
        batch = []
        count = 0
        for item in iterator:
            batch.append(item)
            count += 1
            # Process the data in the iterator in batches, with 1 entry each time.
            if count >= 1:
                serializer.dump_stream(func(split_index, batch), outfile)
                batch = []
                count = 0
        # Flush any remaining records.
        if batch:
            serializer.dump_stream(func(split_index, batch), outfile)

    batched_func(iterator, func, serializer, outfile)
{code}
I tested with the code above; it works well, with lower memory usage each time.
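The batching idea proposed above can be sketched as a self-contained snippet. This is a minimal illustration, assuming the PySpark worker contract where `func(split_index, iterator)` maps an iterator to an iterator; `dump_stream_batched`, `batch_size`, and `PrintSerializer` are illustrative names, not Spark internals:
{code:python}
import sys
from itertools import islice

def dump_stream_batched(func, split_index, iterator, serializer, outfile, batch_size=1000):
    # Drain the input in fixed-size chunks so that at most batch_size
    # deserialized records are buffered at any one time.
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        serializer.dump_stream(func(split_index, iter(batch)), outfile)

# Toy serializer so the sketch runs without Spark.
class PrintSerializer:
    def dump_stream(self, it, out):
        for record in it:
            out.write(f"{record}\n")

dump_stream_batched(lambda _, it: (x * 2 for x in it), 0,
                    iter(range(10)), PrintSerializer(), sys.stdout, batch_size=4)
{code}
Whether this lowers peak memory in the real worker depends on where the buffering happens; if `load_stream` is already lazy, the gain comes from bounding how much `func` and the serializer see at once.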
[jira] [Updated] (SPARK-48422) Serialize all data at once may cause MemoryError
[ https://issues.apache.org/jira/browse/SPARK-48422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhouYang updated SPARK-48422: - Summary: Serialize all data at once may cause MemoryError (was: Using lambda may cause MemoryError) > Serialize all data at once may cause MemoryError > > > Key: SPARK-48422 > URL: https://issues.apache.org/jira/browse/SPARK-48422 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.1 >Reporter: ZhouYang >Priority: Critical > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48533: --- Labels: pull-request-available (was: ) > Add test for cached schema > -- > > Key: SPARK-48533 > URL: https://issues.apache.org/jira/browse/SPARK-48533 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48533) Add test for cached schema
Ruifeng Zheng created SPARK-48533: - Summary: Add test for cached schema Key: SPARK-48533 URL: https://issues.apache.org/jira/browse/SPARK-48533 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48532) Upgrade maven plugin to latest version
[ https://issues.apache.org/jira/browse/SPARK-48532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48532: --- Labels: pull-request-available (was: ) > Upgrade maven plugin to latest version > -- > > Key: SPARK-48532 > URL: https://issues.apache.org/jira/browse/SPARK-48532 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48523) Add `grpc_max_message_size ` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48523. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46862 [https://github.com/apache/spark/pull/46862] > Add `grpc_max_message_size ` description to `client-connection-string.md` > - > > Key: SPARK-48523 > URL: https://issues.apache.org/jira/browse/SPARK-48523 > Project: Spark > Issue Type: Improvement > Components: Connect, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48523) Add `grpc_max_message_size ` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48523: Assignee: BingKun Pan > Add `grpc_max_message_size ` description to `client-connection-string.md` > - > > Key: SPARK-48523 > URL: https://issues.apache.org/jira/browse/SPARK-48523 > Project: Spark > Issue Type: Improvement > Components: Connect, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
[ https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48485. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46819 [https://github.com/apache/spark/pull/46819] > Support interruptTag and interruptAll in streaming queries > -- > > Key: SPARK-48485 > URL: https://issues.apache.org/jira/browse/SPARK-48485 > Project: Spark > Issue Type: Improvement > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark Connect's interrupt API does not interrupt streaming queries. We should > support them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
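For context, the interrupt API discussed here is the tag-based one on the Connect SparkSession. A minimal sketch of how it pairs with a streaming query follows; the connection string, tag name, and sink are placeholders, and SPARK-48534 tracks the per-operation variant, interruptOperation:
{code:python}
from pyspark.sql import SparkSession

# Assumes a Spark Connect server; the address is a placeholder.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

spark.addTag("nightly-etl")  # subsequent operations carry this tag
query = (
    spark.readStream.format("rate").load()
    .writeStream.format("console").start()
)

# Before this fix, interruptTag returned without stopping streaming queries;
# with SPARK-48485 it interrupts them too and returns the operation IDs.
interrupted = spark.interruptTag("nightly-etl")
print(interrupted)
{code}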
[jira] [Updated] (SPARK-48495) Document planned approach to shredding
[ https://issues.apache.org/jira/browse/SPARK-48495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48495: --- Labels: pull-request-available (was: ) > Document planned approach to shredding > -- > > Key: SPARK-48495 > URL: https://issues.apache.org/jira/browse/SPARK-48495 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: David Cashman >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48307) InlineCTE should keep not-inlined relations in the original WithCTE node
[ https://issues.apache.org/jira/browse/SPARK-48307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48307. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46617 [https://github.com/apache/spark/pull/46617] > InlineCTE should keep not-inlined relations in the original WithCTE node > > > Key: SPARK-48307 > URL: https://issues.apache.org/jira/browse/SPARK-48307 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48528) Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` versions
[ https://issues.apache.org/jira/browse/SPARK-48528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48528. --- Fix Version/s: kubernetes-operator-0.1.0 Assignee: Dongjoon Hyun Resolution: Fixed This is resolved via https://github.com/apache/spark-kubernetes-operator/pull/14 > Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` > versions > --- > > Key: SPARK-48528 > URL: https://issues.apache.org/jira/browse/SPARK-48528 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48531) Fix `Black` target version to Python 3.9
[ https://issues.apache.org/jira/browse/SPARK-48531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48531. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46867 [https://github.com/apache/spark/pull/46867] > Fix `Black` target version to Python 3.9 > > > Key: SPARK-48531 > URL: https://issues.apache.org/jira/browse/SPARK-48531 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48531) Fix `Black` target version to Python 3.9
[ https://issues.apache.org/jira/browse/SPARK-48531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48531: - Assignee: Dongjoon Hyun > Fix `Black` target version to Python 3.9 > > > Key: SPARK-48531 > URL: https://issues.apache.org/jira/browse/SPARK-48531 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48531) Fix `Black` target version to Python 3.9
[ https://issues.apache.org/jira/browse/SPARK-48531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48531: --- Labels: pull-request-available (was: ) > Fix `Black` target version to Python 3.9 > > > Key: SPARK-48531 > URL: https://issues.apache.org/jira/browse/SPARK-48531 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48531) Fix `Black` target version to Python 3.9
[ https://issues.apache.org/jira/browse/SPARK-48531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48531: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Fix `Black` target version to Python 3.9 > > > Key: SPARK-48531 > URL: https://issues.apache.org/jira/browse/SPARK-48531 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48531) Fix `Black` target version to Python 3.9
Dongjoon Hyun created SPARK-48531: - Summary: Fix `Black` target version to Python 3.9 Key: SPARK-48531 URL: https://issues.apache.org/jira/browse/SPARK-48531 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48528) Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` versions
[ https://issues.apache.org/jira/browse/SPARK-48528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48528: -- Summary: Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` versions (was: Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version only) > Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` > versions > --- > > Key: SPARK-48528 > URL: https://issues.apache.org/jira/browse/SPARK-48528 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48530) [M0] Support for local variables
David Milicevic created SPARK-48530: --- Summary: [M0] Support for local variables Key: SPARK-48530 URL: https://issues.apache.org/jira/browse/SPARK-48530 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: David Milicevic At the moment, variables in SQL scripts are created as session variables. We don't want this; we want variables to be treated as local (within the block/compound). To achieve this, we probably need to wait for labels support. Once we have it, we can prepend variable names with labels to distinguish between variables with the same name, and only then reuse the session-variables mechanism to save values under such composed names (see the sketch below). If the block/compound doesn't have a label, we should generate one automatically (a GUID or something similar). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
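A tiny sketch of the label-qualification idea from SPARK-48530; this is pure illustration of the naming scheme, not Spark code, and `qualified_variable_name` is a hypothetical helper:
{code:python}
import uuid
from typing import Optional

def qualified_variable_name(var_name: str, label: Optional[str]) -> str:
    # Values are saved via the session-variable mechanism under a
    # label-qualified name; unlabeled compounds get a generated label.
    label = label or f"compound_{uuid.uuid4().hex}"
    return f"{label}.{var_name}"

print(qualified_variable_name("x", "outer"))  # outer.x
print(qualified_variable_name("x", None))     # compound_<generated>.x
{code}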
[jira] [Updated] (SPARK-48528) Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version only
[ https://issues.apache.org/jira/browse/SPARK-48528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48528: --- Labels: pull-request-available (was: ) > Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` > version only > --- > > Key: SPARK-48528 > URL: https://issues.apache.org/jira/browse/SPARK-48528 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48382) Add controller / reconciler module to operator
[ https://issues.apache.org/jira/browse/SPARK-48382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48382: - Assignee: Zhou JIANG > Add controller / reconciler module to operator > -- > > Key: SPARK-48382 > URL: https://issues.apache.org/jira/browse/SPARK-48382 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48529) [M0] Support for labels
[ https://issues.apache.org/jira/browse/SPARK-48529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852119#comment-17852119 ] David Milicevic commented on SPARK-48529: - [~milan.dankovic] is working on designing this. > [M0] Support for labels > --- > > Key: SPARK-48529 > URL: https://issues.apache.org/jira/browse/SPARK-48529 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for labels to SQL parser. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec.|https://docs.google.com/document/d/1_UCvU3dYdcniV66akT1K6huWX4g7jpXDKaoPRDSZr2E/edit#heading=h.4cz970y1mk93] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48529) [M0] Support for labels
[ https://issues.apache.org/jira/browse/SPARK-48529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48529: Description: Add support for labels to SQL parser. For more details: * Design doc in parent Jira item. * [SQL ref spec.|https://docs.google.com/document/d/1_UCvU3dYdcniV66akT1K6huWX4g7jpXDKaoPRDSZr2E/edit#heading=h.4cz970y1mk93] was:Add support for labels to SQL parser. > [M0] Support for labels > --- > > Key: SPARK-48529 > URL: https://issues.apache.org/jira/browse/SPARK-48529 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for labels to SQL parser. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec.|https://docs.google.com/document/d/1_UCvU3dYdcniV66akT1K6huWX4g7jpXDKaoPRDSZr2E/edit#heading=h.4cz970y1mk93] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48529) [M0] Support for labels
David Milicevic created SPARK-48529: --- Summary: [M0] Support for labels Key: SPARK-48529 URL: https://issues.apache.org/jira/browse/SPARK-48529 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: David Milicevic Add support for labels to SQL parser. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48528) Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version only
Dongjoon Hyun created SPARK-48528: - Summary: Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version only Key: SPARK-48528 URL: https://issues.apache.org/jira/browse/SPARK-48528 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: kubernetes-operator-0.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48527) [M0] Thrift investigation
David Milicevic created SPARK-48527: --- Summary: [M0] Thrift investigation Key: SPARK-48527 URL: https://issues.apache.org/jira/browse/SPARK-48527 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: David Milicevic Some notebook modes (SQL Warehouse, what else?) execute SQL commands through the SQL Gateway and Thrift stacks. We need to: - Figure out why SQL script execution is failing in these cases. - Understand the SQL Gateway + Thrift stack better, so we can more easily propose a design for the new API(s) we are going to introduce in the future. For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48526) Allow passing custom sink to StreamTest::testStream
[ https://issues.apache.org/jira/browse/SPARK-48526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48526: --- Labels: pull-request-available (was: ) > Allow passing custom sink to StreamTest::testStream > --- > > Key: SPARK-48526 > URL: https://issues.apache.org/jira/browse/SPARK-48526 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Johan Lasperas >Priority: Trivial > Labels: pull-request-available > > The testing helpers for streaming don't allow providing a custom sink; this > is limiting in (at least) two ways: > * A sink can't be reused across multiple calls to `testStream`, e.g. when > canceling and resuming streaming. > * A custom sink implementation other than `MemorySink` can't be provided. A > use case here is, for example, to test the Delta streaming sink by wrapping it > in a MemorySink interface and passing it to the test framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48526) Allow passing custom sink to StreamTest::testStream
Johan Lasperas created SPARK-48526: -- Summary: Allow passing custom sink to StreamTest::testStream Key: SPARK-48526 URL: https://issues.apache.org/jira/browse/SPARK-48526 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Johan Lasperas The testing helpers for streaming don't allow providing a custom sink; this is limiting in (at least) two ways: * A sink can't be reused across multiple calls to `testStream`, e.g. when canceling and resuming streaming. * A custom sink implementation other than `MemorySink` can't be provided. A use case here is, for example, to test the Delta streaming sink by wrapping it in a MemorySink interface and passing it to the test framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48377) [M1] Multiple results API - sqlScript()
[ https://issues.apache.org/jira/browse/SPARK-48377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48377: Summary: [M1] Multiple results API - sqlScript() (was: Multiple results API - sqlScript()) > [M1] Multiple results API - sqlScript() > --- > > Key: SPARK-48377 > URL: https://issues.apache.org/jira/browse/SPARK-48377 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > For now: > * Write an API proposal > ** The API itself should be fine, but we need to figure out what the result > set should look like, i.e. in what format we return multiple DataFrames. > ** The result set should be compatible with CALL and EXECUTE IMMEDIATE as > well. > * Figure out how the API will propagate down the Spark Connect stack > (depends on SPARK-48452 investigation) > > Probably to be separated into multiple subtasks in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48375) [M1] Support for SIGNAL statement
[ https://issues.apache.org/jira/browse/SPARK-48375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48375: Summary: [M1] Support for SIGNAL statement (was: Support for SIGNAL statement) > [M1] Support for SIGNAL statement > - > > Key: SPARK-48375 > URL: https://issues.apache.org/jira/browse/SPARK-48375 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec.|https://docs.google.com/document/d/1_UCvU3dYdcniV66akT1K6huWX4g7jpXDKaoPRDSZr2E/edit#heading=h.4cz970y1mk93] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48456) [M1] Performance benchmark
[ https://issues.apache.org/jira/browse/SPARK-48456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48456: Summary: [M1] Performance benchmark (was: Performance benchmark) > [M1] Performance benchmark > -- > > Key: SPARK-48456 > URL: https://issues.apache.org/jira/browse/SPARK-48456 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Performance parity is officially an M2 requirement, but by the end of M0 I > think we should start doing some perf benchmarks to figure out where we > stand in the beginning and whether we need to change something right from the > start before we get to work on more complex stuff. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48376) [M1] Support for ITERATE statement
[ https://issues.apache.org/jira/browse/SPARK-48376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48376: Summary: [M1] Support for ITERATE statement (was: Support for ITERATE statement) > [M1] Support for ITERATE statement > -- > > Key: SPARK-48376 > URL: https://issues.apache.org/jira/browse/SPARK-48376 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for ITERATE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as CONTINUE in other languages. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48326) Use the official Apache Spark 4.0.0-preview1
[ https://issues.apache.org/jira/browse/SPARK-48326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48326: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Use the official Apache Spark 4.0.0-preview1 > > > Key: SPARK-48326 > URL: https://issues.apache.org/jira/browse/SPARK-48326 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48349) [M1] Support for debugging
[ https://issues.apache.org/jira/browse/SPARK-48349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48349: Summary: [M1] Support for debugging (was: Support for debugging) > [M1] Support for debugging > -- > > Key: SPARK-48349 > URL: https://issues.apache.org/jira/browse/SPARK-48349 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > TBD. > Probably to be separated into multiple subtasks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48326) Use the official Apache Spark 4.0.0-preview1
[ https://issues.apache.org/jira/browse/SPARK-48326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48326: -- Summary: Use the official Apache Spark 4.0.0-preview1 (was: Upgrade submission worker base Spark version to 4.0.0-preview2) > Use the official Apache Spark 4.0.0-preview1 > > > Key: SPARK-48326 > URL: https://issues.apache.org/jira/browse/SPARK-48326 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48455) [M1] Public documentation
[ https://issues.apache.org/jira/browse/SPARK-48455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48455: Summary: [M1] Public documentation (was: Public documentation) > [M1] Public documentation > - > > Key: SPARK-48455 > URL: https://issues.apache.org/jira/browse/SPARK-48455 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > I guess this shouldn't be anything revolutionary, just a basic doc with SQL > Scripting grammar and functions explained properly. > > We might want to sync with Serge about this to figure out if he has any > thoughts before we start working on it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48453) [M1] Support for PRINT/TRACE statement
[ https://issues.apache.org/jira/browse/SPARK-48453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48453: Summary: [M1] Support for PRINT/TRACE statement (was: Support for PRINT/TRACE statement) > [M1] Support for PRINT/TRACE statement > -- > > Key: SPARK-48453 > URL: https://issues.apache.org/jira/browse/SPARK-48453 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > This is not defined in the [Ref > Spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit#heading=h.4cz970y1mk93], > but during POC we figured out that it might be useful. > We still need to figure out the details when we get to it, because the > propagation to the client and the UI on the client side might not be trivial, but > this needs further investigation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48525) [M0] Private documentation
[ https://issues.apache.org/jira/browse/SPARK-48525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48525: Summary: [M0] Private documentation (was: Private documentation) > [M0] Private documentation > -- > > Key: SPARK-48525 > URL: https://issues.apache.org/jira/browse/SPARK-48525 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > We do need some form of documentation for Private Preview - e.g. we used a > PDF doc for Collations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48356) [M1] Support for FOR statement
[ https://issues.apache.org/jira/browse/SPARK-48356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48356: Summary: [M1] Support for FOR statement (was: Support for FOR statement) > [M1] Support for FOR statement > -- > > Key: SPARK-48356 > URL: https://issues.apache.org/jira/browse/SPARK-48356 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48358) [M1] Support for REPEAT statement
[ https://issues.apache.org/jira/browse/SPARK-48358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48358: Summary: [M1] Support for REPEAT statement (was: Support for REPEAT statement) > [M1] Support for REPEAT statement > - > > Key: SPARK-48358 > URL: https://issues.apache.org/jira/browse/SPARK-48358 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48357) [M1] Support for LOOP statement
[ https://issues.apache.org/jira/browse/SPARK-48357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48357: Summary: [M1] Support for LOOP statement (was: Support for LOOP statement) > [M1] Support for LOOP statement > --- > > Key: SPARK-48357 > URL: https://issues.apache.org/jira/browse/SPARK-48357 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for LOOP statement. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48388) [M0] Fix SET behavior for scripts
[ https://issues.apache.org/jira/browse/SPARK-48388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48388: Summary: [M0] Fix SET behavior for scripts (was: Fix SET behavior for scripts) > [M0] Fix SET behavior for scripts > - > > Key: SPARK-48388 > URL: https://issues.apache.org/jira/browse/SPARK-48388 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > By the standard, SET is used to set variable values in SQL scripts. > On our end, SET is configured to work with some Hive configs, so the grammar > is a bit messed up, and for that reason it was decided to use SET VAR instead > of SET to work with SQL variables. > This is not standard, and we should figure out a way to use > SET for SQL variables while forbidding the setting of Hive configs from SQL scripts. > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
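To make the distinction concrete, here is how today's session-variable syntax (Spark 3.5+) separates SET VAR from plain SET, assuming an existing SparkSession `spark`; the ticket is about letting plain SET target such variables inside SQL scripts while blocking config writes there:
{code:python}
spark.sql("DECLARE VARIABLE counter INT DEFAULT 0")
spark.sql("SET VAR counter = counter + 1")         # SQL variable: needs SET VAR today
spark.sql("SET spark.sql.shuffle.partitions=200")  # plain SET targets configs
{code}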
[jira] [Updated] (SPARK-48457) [M0] Testing and operational readiness
[ https://issues.apache.org/jira/browse/SPARK-48457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48457: Summary: [M0] Testing and operational readiness (was: Testing and operational readiness) > [M0] Testing and operational readiness > -- > > Key: SPARK-48457 > URL: https://issues.apache.org/jira/browse/SPARK-48457 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > We are basically doing this as we are developing the feature. This work item > should serve as a checkpoint by the end of M0 to figure out if we have > covered everything. > > Testing is very clearly defined by itself. > For the operational readiness part, we are still to figure out what exactly > we can do in the case of SQL scripting. It's a really straightforward feature > and public documentation should serve well enough for most of the issues we > might encounter. But, we should probably think about: > * Some KPI indicators. > * Telemetry. > * Something else? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48346) [M0] Support for IF ELSE statement
[ https://issues.apache.org/jira/browse/SPARK-48346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48346: Summary: [M0] Support for IF ELSE statement (was: Support for IF ELSE statement) > [M0] Support for IF ELSE statement > -- > > Key: SPARK-48346 > URL: https://issues.apache.org/jira/browse/SPARK-48346 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for IF ELSE statements to SQL scripting parser & interpreter: > * IF > * IF / ELSE > * IF / ELSE IF / ELSE > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48326) Upgrade submission worker base Spark version to 4.0.0-preview2
[ https://issues.apache.org/jira/browse/SPARK-48326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48326: - Assignee: Dongjoon Hyun > Upgrade submission worker base Spark version to 4.0.0-preview2 > -- > > Key: SPARK-48326 > URL: https://issues.apache.org/jira/browse/SPARK-48326 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48347) [M0] Support for WHILE statement
[ https://issues.apache.org/jira/browse/SPARK-48347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48347: Summary: [M0] Support for WHILE statement (was: Support for WHILE statement) > [M0] Support for WHILE statement > > > Key: SPARK-48347 > URL: https://issues.apache.org/jira/browse/SPARK-48347 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for WHILE statements to SQL scripting parser & interpreter. > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48355) [M1] Support for CASE statement
[ https://issues.apache.org/jira/browse/SPARK-48355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48355: Summary: [M1] Support for CASE statement (was: Support for CASE statement) > [M1] Support for CASE statement > --- > > Key: SPARK-48355 > URL: https://issues.apache.org/jira/browse/SPARK-48355 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec.|https://docs.google.com/document/d/1_UCvU3dYdcniV66akT1K6huWX4g7jpXDKaoPRDSZr2E/edit#heading=h.4cz970y1mk93] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48452) [M0] Spark Connect investigation
[ https://issues.apache.org/jira/browse/SPARK-48452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48452: Summary: [M0] Spark Connect investigation (was: Spark Connect investigation) > [M0] Spark Connect investigation > > > Key: SPARK-48452 > URL: https://issues.apache.org/jira/browse/SPARK-48452 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Some notebook modes, the VS Code extension, etc. execute SQL commands through > Spark Connect. > We need to: > - Figure out the exceptions that we are getting in the Spark Connect stack for SQL > scripts. > - Understand the Spark Connect stack better, so we can more easily propose > a design for the new API(s) we are going to introduce in the future. > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48326) Upgrade submission worker base Spark version to 4.0.0-preview2
[ https://issues.apache.org/jira/browse/SPARK-48326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48326. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 13 [https://github.com/apache/spark-kubernetes-operator/pull/13] > Upgrade submission worker base Spark version to 4.0.0-preview2 > -- > > Key: SPARK-48326 > URL: https://issues.apache.org/jira/browse/SPARK-48326 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48348) [M0] Support for LEAVE statement
[ https://issues.apache.org/jira/browse/SPARK-48348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48348: Summary: [M0] Support for LEAVE statement (was: Support for LEAVE statement) > [M0] Support for LEAVE statement > > > Key: SPARK-48348 > URL: https://issues.apache.org/jira/browse/SPARK-48348 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for LEAVE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as BREAK in other languages. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
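A hedged sketch of what LEAVE could look like per the referenced spec; the exact syntax is subject to the ongoing design, and this assumes a session `spark` with SQL scripting enabled:
{code:python}
script = """
BEGIN
  DECLARE i INT DEFAULT 0;
  lbl: WHILE i < 10 DO
    SET VAR i = i + 1;
    IF i = 5 THEN
      LEAVE lbl;  -- exits the labeled loop, like BREAK
    END IF;
  END WHILE lbl;
END
"""
spark.sql(script)
{code}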
[jira] [Updated] (SPARK-48345) [M0] Checks for variable declarations
[ https://issues.apache.org/jira/browse/SPARK-48345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48345: Summary: [M0] Checks for variable declarations (was: Checks for variable declarations) > [M0] Checks for variable declarations > - > > Key: SPARK-48345 > URL: https://issues.apache.org/jira/browse/SPARK-48345 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add checks to the parser (visitBatchBody() in AstBuilder) for variable > declarations, based on a passed-in flag: > * Variables can be declared only at the beginning of the compound. > * Support for an exception when a wrong variable declaration is encountered. > > For more details, design doc can be found in parent Jira item. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
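To make the proposed check concrete, here are a valid and an invalid compound under the rule above; the scripts are illustrative and the exact error is part of this work item:
{code:python}
valid = """
BEGIN
  DECLARE x INT DEFAULT 0;  -- declarations only at the beginning
  SELECT x;
END
"""

invalid = """
BEGIN
  SELECT 1;
  DECLARE x INT DEFAULT 0;  -- should raise the new declaration-position error
END
"""
{code}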
[jira] [Updated] (SPARK-48353) [M0] Support for TRY/CATCH statement
[ https://issues.apache.org/jira/browse/SPARK-48353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48353: Summary: [M0] Support for TRY/CATCH statement (was: Support for TRY/CATCH statement) > [M0] Support for TRY/CATCH statement > > > Key: SPARK-48353 > URL: https://issues.apache.org/jira/browse/SPARK-48353 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec.|https://docs.google.com/document/d/1_UCvU3dYdcniV66akT1K6huWX4g7jpXDKaoPRDSZr2E/edit#heading=h.4cz970y1mk93] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48350) [M0] Support for exceptions thrown from parser/interpreter
[ https://issues.apache.org/jira/browse/SPARK-48350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48350: Summary: [M0] Support for exceptions thrown from parser/interpreter (was: Support for exceptions thrown from parser/interpreter) > [M0] Support for exceptions thrown from parser/interpreter > -- > > Key: SPARK-48350 > URL: https://issues.apache.org/jira/browse/SPARK-48350 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > In general, add support for SQL scripting related exceptions. > By the time someone starts working on this item, some exception support might > already exist - check if it needs refactoring. > > Have in mind that for some (all?) exceptions we might need to know which > line(s) in the script are responsible for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48343) [M0] Interpreter support
[ https://issues.apache.org/jira/browse/SPARK-48343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48343: Summary: [M0] Interpreter support (was: Interpreter support) > [M0] Interpreter support > > > Key: SPARK-48343 > URL: https://issues.apache.org/jira/browse/SPARK-48343 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Implement interpreter for SQL scripting: > * Interpreter > * Interpreter testing > For more details, design doc can be found in parent Jira item. > Update design doc accordingly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48344) [M0] Changes to sql() API
[ https://issues.apache.org/jira/browse/SPARK-48344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48344: Summary: [M0] Changes to sql() API (was: Changes to sql() API) > [M0] Changes to sql() API > - > > Key: SPARK-48344 > URL: https://issues.apache.org/jira/browse/SPARK-48344 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Implement changes to sql() API to support SQL script execution: > * SparkSession changes > * sql() API changes - iterate through the script, but return only last > dataframe > * Spark Config flag to enable/disable SQL scripting in sql() API > * E2E testing > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
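A rough sketch of the intended behavior described above (an assumption based on this item, not a confirmed API): the script executes statement by statement, and sql() returns only the DataFrame of the last statement.

{code:scala}
// Hypothetical sketch -- scripting syntax assumed; behavior per the item above.
val last = spark.sql(
  """BEGIN
    |  DECLARE cnt INT DEFAULT 0;
    |  SET cnt = 40 + 2;
    |  SELECT cnt;
    |END""".stripMargin)
last.show()  // expected to show only the result of the final SELECT
{code}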
[jira] [Updated] (SPARK-48342) [M0] Parser support
[ https://issues.apache.org/jira/browse/SPARK-48342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48342: Summary: [M0] Parser support (was: Parser support) > [M0] Parser support > --- > > Key: SPARK-48342 > URL: https://issues.apache.org/jira/browse/SPARK-48342 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > Labels: pull-request-available > > Implement the parser for SQL scripting with all supporting changes for upcoming > interpreter implementation and future extensions of the parser: > * Parser - support only compound statements > * Parser testing > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48525) Private documentation
David Milicevic created SPARK-48525: --- Summary: Private documentation Key: SPARK-48525 URL: https://issues.apache.org/jira/browse/SPARK-48525 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: David Milicevic We do need some form of documentation for Private Preview - e.g. we used a PDF doc for Collations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48455) Public documentation
[ https://issues.apache.org/jira/browse/SPARK-48455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48455: Description: I guess this shouldn't be anything revolutionary, just a basic doc with SQL Scripting grammar and functions explained properly. We might want to sync with Serge about this to figure out if he has any thoughts before we start working on it. was: Public documentation is officially Milestone 1 requirement, but I think we should start working on this even during Milestone 0. I guess this shouldn't be anything revolutionary, just a basic doc with SQL Scripting grammar and functions explained properly. We might want to sync with Serge about this to figure out if he has any thoughts before we start working on it. > Public documentation > > > Key: SPARK-48455 > URL: https://issues.apache.org/jira/browse/SPARK-48455 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > I guess this shouldn't be anything revolutionary, just a basic doc with SQL > Scripting grammar and functions explained properly. > > We might want to sync with Serge about this to figure out if he has any > thoughts before we start working on it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48357) Support for LOOP statement
[ https://issues.apache.org/jira/browse/SPARK-48357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48357: Description: Add support for LOOP statement. For more details: * Design doc in parent Jira item. * [SQL ref spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. was: Details TBD. Maybe split to multiple items? LEAVE should be the equivalent to BREAK? For more details: * Design doc in parent Jira item. * [SQL ref spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. > Support for LOOP statement > -- > > Key: SPARK-48357 > URL: https://issues.apache.org/jira/browse/SPARK-48357 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for LOOP statement. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48326) Upgrade submission worker base Spark version to 4.0.0-preview2
[ https://issues.apache.org/jira/browse/SPARK-48326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48326: --- Labels: pull-request-available (was: ) > Upgrade submission worker base Spark version to 4.0.0-preview2 > -- > > Key: SPARK-48326 > URL: https://issues.apache.org/jira/browse/SPARK-48326 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48357) Support for LOOP statement
[ https://issues.apache.org/jira/browse/SPARK-48357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48357: Summary: Support for LOOP statement (was: Support for LOOP and LEAVE statements) > Support for LOOP statement > -- > > Key: SPARK-48357 > URL: https://issues.apache.org/jira/browse/SPARK-48357 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Details TBD. > Maybe split to multiple items? > > LEAVE should be the equivalent to BREAK? > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48376) Support for ITERATE statement
[ https://issues.apache.org/jira/browse/SPARK-48376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48376: Description: Add support for ITERATE statement in WHILE (and other) loops to SQL scripting parser & interpreter. This is the same functionality as CONTINUE in other languages. For more details: * Design doc in parent Jira item. * [SQL ref spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. was: Details TBD. Maybe split to multiple items? ITERATE should be the equivalent to CONTINUE? For more details: * Design doc in parent Jira item. * [SQL ref spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. > Support for ITERATE statement > - > > Key: SPARK-48376 > URL: https://issues.apache.org/jira/browse/SPARK-48376 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for ITERATE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as CONTINUE in other languages. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
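A hypothetical sketch of ITERATE, mirroring the LEAVE example earlier (syntax assumed from the SQL ref spec):

{code:scala}
// Hypothetical sketch -- ITERATE jumps back to the top of the labeled loop,
// like CONTINUE in other languages; it skips even values of i here.
spark.sql(
  """BEGIN
    |  DECLARE i INT DEFAULT 0;
    |  DECLARE odd_sum INT DEFAULT 0;
    |  lbl: WHILE i < 10 DO
    |    SET i = i + 1;
    |    IF i % 2 = 0 THEN
    |      ITERATE lbl;
    |    END IF;
    |    SET odd_sum = odd_sum + i;
    |  END WHILE lbl;
    |  SELECT odd_sum;
    |END""".stripMargin)
{code}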
[jira] [Updated] (SPARK-48348) Support for LEAVE statement
[ https://issues.apache.org/jira/browse/SPARK-48348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48348: Description: Add support for LEAVE statement in WHILE (and other) loops to SQL scripting parser & interpreter. This is the same functionality as BREAK in other languages. For more details: * Design doc in parent Jira item. * [SQL ref spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. was: Add support for LEAVE statement in WHILE (and other) loops to SQL scripting parser & interpreter. This is the same functionality as BREAK in other languages. For more details, design doc can be found in parent Jira item. > Support for LEAVE statement > --- > > Key: SPARK-48348 > URL: https://issues.apache.org/jira/browse/SPARK-48348 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for LEAVE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as BREAK in other languages. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48348) Support for LEAVE statement
[ https://issues.apache.org/jira/browse/SPARK-48348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48348: Description: Add support for LEAVE statement in WHILE (and other) loops to SQL scripting parser & interpreter. This is the same functionality as BREAK in other languages. For more details, design doc can be found in parent Jira item. was: Add support for BREAK and CONTINUE statements in WHILE loops to SQL scripting parser & interpreter. For more details, design doc can be found in parent Jira item. > Support for LEAVE statement > --- > > Key: SPARK-48348 > URL: https://issues.apache.org/jira/browse/SPARK-48348 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for LEAVE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as BREAK in other languages. > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48348) Support for LEAVE statement
[ https://issues.apache.org/jira/browse/SPARK-48348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Milicevic updated SPARK-48348: Summary: Support for LEAVE statement (was: Support for BREAK and CONTINUE statements) > Support for LEAVE statement > --- > > Key: SPARK-48348 > URL: https://issues.apache.org/jira/browse/SPARK-48348 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > > Add support for BREAK and CONTINUE statements in WHILE loops to SQL scripting > parser & interpreter. > > For more details, design doc can be found in parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48524) Semantic equality of Not, IsNull and IsNotNull expressions is incorrect
[ https://issues.apache.org/jira/browse/SPARK-48524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48524: --- Labels: pull-request-available (was: ) > Semantic equality of Not, IsNull and IsNotNull expressions is incorrect > > > Key: SPARK-48524 > URL: https://issues.apache.org/jira/browse/SPARK-48524 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: Thomas Powell >Priority: Major > Labels: pull-request-available > > Not(IsNull) should be semantically equal to IsNotNull and vice versa. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48524) Semantic equality of Not, IsNull and IsNotNull expressions is incorrect
Thomas Powell created SPARK-48524: - Summary: Semantic equality of Not, IsNull and IsNotNull expressions is incorrect Key: SPARK-48524 URL: https://issues.apache.org/jira/browse/SPARK-48524 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.3 Reporter: Thomas Powell Not(IsNull) should be semantically equal to IsNotNull and vice versa. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
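The report can be checked directly against the Catalyst expression API. In the following sketch the attribute name and type are arbitrary; semanticEquals is the existing comparison entry point:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, IsNotNull, IsNull, Not}
import org.apache.spark.sql.types.IntegerType

// Arbitrary attribute chosen for illustration.
val a = AttributeReference("a", IntegerType)()

// NOT(a IS NULL) and a IS NOT NULL denote the same predicate, so per the
// report both of these should return true in both directions.
println(Not(IsNull(a)).semanticEquals(IsNotNull(a)))
println(IsNotNull(a).semanticEquals(Not(IsNull(a))))
{code}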
[jira] [Updated] (SPARK-48521) Repartition, sort and partitionBy not working together
[ https://issues.apache.org/jira/browse/SPARK-48521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alvaro Berdonces updated SPARK-48521: - Description: Hi, we are having some problems writing sorted CSV files using Spark 3.5.1. Example data: [Parquet 1M records from flights|https://www.tablab.app/datasets/sample/parquet?datatable-source=demo-flights-1m] Example code: {code:scala} val df = spark.read.parquet("Flights 1m.parquet").withColumn("partition_col", lit("2024")).localCheckpoint df.repartition(1).sort("FL_DATE", "DISTANCE").write.mode("overwrite").partitionBy("partition_col").csv("repartition_order") {code} Running the previous example on Spark 3.3.4 writes a single file, ordered by the FL_DATE and DISTANCE fields, inside the folder partition_col=2024. On the other hand, when using Spark 3.5.1 it returns as many files as there are cores in the executors (4 in my case, using 2 executors with 2 cores each), and the rows inside are not sorted. We can see that after repartition(1) and before the sort, Spark adds another repartition(200) stage because of the default shuffle partitions value, and AQE then coalesces the small partitions. Spark 3.3.4 plan: {code:java} == Physical Plan == Execute InsertIntoHadoopFsRelationCommand (8) +- AdaptiveSparkPlan (7) +- == Final Plan == * Sort (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Sort (6) +- Exchange (5) +- Scan ExistingRDD (1) {code} Spark 3.5.1 plan: {code:java} == Physical Plan == AdaptiveSparkPlan (15) +- == Final Plan == Execute InsertIntoHadoopFsRelationCommand (9) +- WriteFiles (8) +- * Sort (7) +- AQEShuffleRead (6) +- ShuffleQueryStage (5), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Execute InsertIntoHadoopFsRelationCommand (14) +- WriteFiles (13) +- Sort (12) +- Exchange (11) +- Exchange (10) +- Scan ExistingRDD (1) {code} was: Hi, we are having some problems writing sorted CSV files using Spark 3.5.1. Example data: [Parquet 1M records from flights|https://www.tablab.app/datasets/sample/parquet?datatable-source=demo-flights-1m] Example code: {code:scala} val df = spark.read.parquet("Flights 1m.parquet").withColumn("partition_col", lit("2024")).localCheckpoint df.repartition(1).sort("FL_DATE", "DISTANCE").write.mode("overwrite").partitionBy("partition_col").csv("repartition_order") {code} Running the previous example on Spark 3.3.4 writes a single file, ordered by the FL_DATE and DISTANCE fields, inside the folder partition_col=2024. On the other hand, when using Spark 3.5.1 it returns as many files as there are cores in the executors (4 in my case, using 2 executors with 2 cores each), and the rows inside are not sorted. We can see that after repartition(1) and before the sort, Spark adds another repartition(200) stage because of the default shuffle partitions value, and AQE then coalesces the small partitions. 
Spark 3.3.4 plan: {code:java} == Physical Plan == Execute InsertIntoHadoopFsRelationCommand (8) +- AdaptiveSparkPlan (7) +- == Final Plan == * Sort (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Sort (6) +- Exchange (5) +- Scan ExistingRDD (1) {code} Spark 3.5.1 plan: {code:java} == Physical Plan == AdaptiveSparkPlan (15) +- == Final Plan == Execute InsertIntoHadoopFsRelationCommand (9) +- WriteFiles (8) +- * Sort (7) +- AQEShuffleRead (6) +- ShuffleQueryStage (5), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Execute InsertIntoHadoopFsRelationCommand (14) +- WriteFiles (13) +- Sort (12) +- Exchange (11) +- Exchange (10) +- Scan ExistingRDD (1) {code} > Repartition, sort and partitionBy not working together > -- > > Key: SPARK-48521 > URL: https://issues.apache.org/jira/browse/SPARK-48521 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 3.5.0, 3.5.1 >Reporter: Alvaro Berdonces >Priority: Major > > Hi, we are having some
[jira] [Assigned] (SPARK-48522) Update Stream Library to 2.9.8
[ https://issues.apache.org/jira/browse/SPARK-48522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48522: Assignee: Kent Yao > Update Stream Library to 2.9.8 > -- > > Key: SPARK-48522 > URL: https://issues.apache.org/jira/browse/SPARK-48522 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48522) Update Stream Library to 2.9.8
[ https://issues.apache.org/jira/browse/SPARK-48522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48522. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46861 [https://github.com/apache/spark/pull/46861] > Update Stream Library to 2.9.8 > -- > > Key: SPARK-48522 > URL: https://issues.apache.org/jira/browse/SPARK-48522 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48506) Compression codec short names are case insensitive except for event logging
[ https://issues.apache.org/jira/browse/SPARK-48506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48506: Assignee: Kent Yao > Compression codec short names are case insensitive except for event logging > --- > > Key: SPARK-48506 > URL: https://issues.apache.org/jira/browse/SPARK-48506 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.5.1, 3.3.4, 3.4.3 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48506) Compression codec short names are case insensitive except for event logging
[ https://issues.apache.org/jira/browse/SPARK-48506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48506. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46847 [https://github.com/apache/spark/pull/46847] > Compression codec short names are case insensitive except for event logging > --- > > Key: SPARK-48506 > URL: https://issues.apache.org/jira/browse/SPARK-48506 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.5.1, 3.3.4, 3.4.3 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22876) spark.yarn.am.attemptFailuresValidityInterval does not work correctly
[ https://issues.apache.org/jira/browse/SPARK-22876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-22876: --- Labels: bulk-closed pull-request-available (was: bulk-closed) > spark.yarn.am.attemptFailuresValidityInterval does not work correctly > - > > Key: SPARK-22876 > URL: https://issues.apache.org/jira/browse/SPARK-22876 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 2.2.0 > Environment: hadoop version 2.7.3 >Reporter: Jinhan Zhong >Priority: Minor > Labels: bulk-closed, pull-request-available > > I assume we can use spark.yarn.maxAppAttempts together with > spark.yarn.am.attemptFailuresValidityInterval to make a long-running > application avoid stopping after an acceptable number of failures. > But after testing, I found that the application always stops after failing n > times (n is the minimum of spark.yarn.maxAppAttempts and > yarn.resourcemanager.am.max-attempts from the client yarn-site.xml). > For example, the following setup should allow the application master to fail 20 > times. > * spark.yarn.am.attemptFailuresValidityInterval=1s > * spark.yarn.maxAppAttempts=20 > * yarn client: yarn.resourcemanager.am.max-attempts=20 > * yarn resource manager: yarn.resourcemanager.am.max-attempts=3 > And after checking the source code, I found in the source file > ApplicationMaster.scala > https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L293 > there's a ShutdownHook that checks the attempt id against maxAppAttempts; > if attempt id >= maxAppAttempts, it will try to unregister the application > and the application will finish. > Is this an expected design or a bug? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
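For reference, the Spark-side half of the reporter's setup expressed programmatically (a sketch; the yarn.resourcemanager.am.max-attempts values live in the client and server yarn-site.xml files, not in SparkConf):

{code:scala}
import org.apache.spark.SparkConf

// Sketch of the reporter's configuration. The observed effective limit was
// min(spark.yarn.maxAppAttempts, yarn.resourcemanager.am.max-attempts from
// the client yarn-site.xml), i.e. the validity interval had no effect.
val conf = new SparkConf()
  .set("spark.yarn.maxAppAttempts", "20")
  .set("spark.yarn.am.attemptFailuresValidityInterval", "1s")
{code}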
[jira] [Commented] (SPARK-45101) Spark UI: A stage is still active even when all of its tasks have succeeded
[ https://issues.apache.org/jira/browse/SPARK-45101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852025#comment-17852025 ] RickyMa commented on SPARK-45101: - No. It's just a Spark SQL query. > Spark UI: A stage is still active even when all of its tasks have succeeded > --- > > Key: SPARK-45101 > URL: https://issues.apache.org/jira/browse/SPARK-45101 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: RickyMa >Priority: Critical > Attachments: 1.png, 2.png, 3.png > > In the stage UI, we can see all the tasks' statuses are SUCCESS. > But the stage is still marked as active. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48523: --- Labels: pull-request-available (was: ) > Add `grpc_max_message_size` description to `client-connection-string.md` > - > > Key: SPARK-48523 > URL: https://issues.apache.org/jira/browse/SPARK-48523 > Project: Spark > Issue Type: Improvement > Components: Connect, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-48523: Summary: Add `grpc_max_message_size` description to `client-connection-string.md` (was: Add `grpc_max_message_size` to `client-connection-string.md`) > Add `grpc_max_message_size` description to `client-connection-string.md` > - > > Key: SPARK-48523 > URL: https://issues.apache.org/jira/browse/SPARK-48523 > Project: Spark > Issue Type: Improvement > Components: Connect, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48519) Upgrade jetty to 11.0.21
[ https://issues.apache.org/jira/browse/SPARK-48519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48519. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46843 [https://github.com/apache/spark/pull/46843] > Upgrade jetty to 11.0.21 > > > Key: SPARK-48519 > URL: https://issues.apache.org/jira/browse/SPARK-48519 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > * https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48519) Upgrade jetty to 11.0.21
[ https://issues.apache.org/jira/browse/SPARK-48519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48519: Assignee: Yang Jie > Upgrade jetty to 11.0.21 > > > Key: SPARK-48519 > URL: https://issues.apache.org/jira/browse/SPARK-48519 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > * https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48519) Upgrade jetty to 11.0.21
[ https://issues.apache.org/jira/browse/SPARK-48519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48519: --- Labels: pull-request-available (was: ) > Upgrade jetty to 11.0.21 > > > Key: SPARK-48519 > URL: https://issues.apache.org/jira/browse/SPARK-48519 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > * https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48523) Add `grpc_max_message_size` to `client-connection-string.md`
BingKun Pan created SPARK-48523: --- Summary: Add `grpc_max_message_size` to `client-connection-string.md` Key: SPARK-48523 URL: https://issues.apache.org/jira/browse/SPARK-48523 Project: Spark Issue Type: Improvement Components: Connect, Documentation Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48518) Make LZF compression be able to run in parallel
[ https://issues.apache.org/jira/browse/SPARK-48518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48518: Assignee: Kent Yao > Make LZF compression be able to run in parallel > --- > > Key: SPARK-48518 > URL: https://issues.apache.org/jira/browse/SPARK-48518 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48518) Make LZF compression be able to run in parallel
[ https://issues.apache.org/jira/browse/SPARK-48518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48518. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46858 [https://github.com/apache/spark/pull/46858] > Make LZF compression be able to run in parallel > --- > > Key: SPARK-48518 > URL: https://issues.apache.org/jira/browse/SPARK-48518 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48522) Update Stream Library to 2.9.8
[ https://issues.apache.org/jira/browse/SPARK-48522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48522: --- Labels: pull-request-available (was: ) > Update Stream Library to 2.9.8 > -- > > Key: SPARK-48522 > URL: https://issues.apache.org/jira/browse/SPARK-48522 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48522) Update Stream Library to 2.9.8
Kent Yao created SPARK-48522: Summary: Update Stream Library to 2.9.8 Key: SPARK-48522 URL: https://issues.apache.org/jira/browse/SPARK-48522 Project: Spark Issue Type: Dependency upgrade Components: Build Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48173) CheckAnalysis should see the entire query plan
[ https://issues.apache.org/jira/browse/SPARK-48173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-48173: Fix Version/s: 3.5.2 > CheckAnalysis should see the entire query plan > - > > Key: SPARK-48173 > URL: https://issues.apache.org/jira/browse/SPARK-48173 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48512) Refactor Python tests
[ https://issues.apache.org/jira/browse/SPARK-48512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48512. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46852 [https://github.com/apache/spark/pull/46852] > Refactor Python tests > - > > Key: SPARK-48512 > URL: https://issues.apache.org/jira/browse/SPARK-48512 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48521) Repartition, sort and partitionBy not working together
[ https://issues.apache.org/jira/browse/SPARK-48521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alvaro Berdonces updated SPARK-48521: - Description: Hi, we are having some problems writing sorted CSV files using Spark 3.5.1. Example data: [Parquet 1M records from flights|https://www.tablab.app/datasets/sample/parquet?datatable-source=demo-flights-1m] Example code: {code:scala} val df = spark.read.parquet("Flights 1m.parquet").withColumn("partition_col", lit("2024")).localCheckpoint df.repartition(1).sort("FL_DATE", "DISTANCE").write.mode("overwrite").partitionBy("partition_col").csv("repartition_order") {code} Running the previous example on Spark 3.3.4 writes a single file, ordered by the FL_DATE and DISTANCE fields, inside the folder partition_col=2024. On the other hand, when using Spark 3.5.1 it returns as many files as there are cores in the executors (4 in my case, using 2 executors with 2 cores each), and the rows inside are not sorted. We can see that after repartition(1) and before the sort, Spark adds another repartition(200) stage because of the default shuffle partitions value, and AQE then coalesces the small partitions. Spark 3.3.4 plan: {code:java} == Physical Plan == Execute InsertIntoHadoopFsRelationCommand (8) +- AdaptiveSparkPlan (7) +- == Final Plan == * Sort (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Sort (6) +- Exchange (5) +- Scan ExistingRDD (1) {code} Spark 3.5.1 plan: {code:java} == Physical Plan == AdaptiveSparkPlan (15) +- == Final Plan == Execute InsertIntoHadoopFsRelationCommand (9) +- WriteFiles (8) +- * Sort (7) +- AQEShuffleRead (6) +- ShuffleQueryStage (5), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Execute InsertIntoHadoopFsRelationCommand (14) +- WriteFiles (13) +- Sort (12) +- Exchange (11) +- Exchange (10) +- Scan ExistingRDD (1) {code} was: Hi, we are having some problems writing sorted CSV files using Spark 3.5.1. Example data: [Parquet 1M records from flights|https://www.tablab.app/datasets/sample/parquet?datatable-source=demo-flights-1m] Example code: {code:scala} val df = spark.read.parquet("Flights 1m.parquet").withColumn("partition_col", lit("2024")).localCheckpoint df.repartition(1).sort("FL_DATE", "DISTANCE").write.mode("overwrite").partitionBy("partition_col").csv("repartition_order") {code} Running the previous example on Spark 3.3.4 writes a single file, ordered by the FL_DATE and DISTANCE fields, inside the folder partition_col=2024. On the other hand, when using Spark 3.5.1 it returns as many files as there are cores in the executors (4 in my case, using 2 executors with 2 cores each), and the rows inside are not sorted. We can see that after repartition(1) and before the sort, Spark adds another repartition(200) stage because of the default shuffle partitions value, and AQE then coalesces the small partitions. 
Spark 3.3.4 plan: {code:java} == Physical Plan == Execute InsertIntoHadoopFsRelationCommand (8) +- AdaptiveSparkPlan (7) +- == Final Plan == * Sort (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Sort (6) +- Exchange (5) +- Scan ExistingRDD (1) {code} Spark 3.5.1 plan: {code:java} == Physical Plan == AdaptiveSparkPlan (15) +- == Final Plan == Execute InsertIntoHadoopFsRelationCommand (9) +- WriteFiles (8) +- * Sort (7) +- AQEShuffleRead (6) +- ShuffleQueryStage (5), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Execute InsertIntoHadoopFsRelationCommand (14) +- WriteFiles (13) +- Sort (12) +- Exchange (11) +- Exchange (10) +- Scan ExistingRDD (1) {code} > Repartition, sort and partitionBy not working together > -- > > Key: SPARK-48521 > URL: https://issues.apache.org/jira/browse/SPARK-48521 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 3.5.0, 3.5.1 >Reporter: Alvaro Berdonces >Priority: Major > > Hi, we are
[jira] [Created] (SPARK-48521) Repartition, sort and partitionBy not working together
Alvaro Berdonces created SPARK-48521: Summary: Repartition, sort and partitionBy not working together Key: SPARK-48521 URL: https://issues.apache.org/jira/browse/SPARK-48521 Project: Spark Issue Type: Bug Components: Optimizer, Spark Core, SQL Affects Versions: 3.5.1, 3.5.0 Reporter: Alvaro Berdonces Hi, we are having some problems writing sorted CSV files using Spark 3.5.1. Example data: [Parquet 1M records from flights|https://www.tablab.app/datasets/sample/parquet?datatable-source=demo-flights-1m] Example code: {code:scala} val df = spark.read.parquet("Flights 1m.parquet").withColumn("partition_col", lit("2024")).localCheckpoint df.repartition(1).sort("FL_DATE", "DISTANCE").write.mode("overwrite").partitionBy("partition_col").csv("repartition_order") {code} Running the previous example on Spark 3.3.4 writes a single file, ordered by the FL_DATE and DISTANCE fields, inside the folder partition_col=2024. On the other hand, when using Spark 3.5.1 it returns as many files as there are cores in the executors (4 in my case, using 2 executors with 2 cores each), and the rows inside are not sorted. We can see that after repartition(1) and before the sort, Spark adds another repartition(200) stage because of the default shuffle partitions value, and AQE then coalesces the small partitions. Spark 3.3.4 plan: {code:java} == Physical Plan == Execute InsertIntoHadoopFsRelationCommand (8) +- AdaptiveSparkPlan (7) +- == Final Plan == * Sort (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Sort (6) +- Exchange (5) +- Scan ExistingRDD (1) {code} Spark 3.5.1 plan: {code:java} == Physical Plan == AdaptiveSparkPlan (15) +- == Final Plan == Execute InsertIntoHadoopFsRelationCommand (9) +- WriteFiles (8) +- * Sort (7) +- AQEShuffleRead (6) +- ShuffleQueryStage (5), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (4) +- ShuffleQueryStage (3), Statistics(sizeInBytes=76.3 MiB, rowCount=1.00E+6) +- Exchange (2) +- * Scan ExistingRDD (1) +- == Initial Plan == Execute InsertIntoHadoopFsRelationCommand (14) +- WriteFiles (13) +- Sort (12) +- Exchange (11) +- Exchange (10) +- Scan ExistingRDD (1) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
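A possible mitigation to try while this is investigated (an untested sketch, not a confirmed fix): sortWithinPartitions sorts inside the single partition produced by repartition(1) instead of issuing a global sort, so it should avoid the extra exchange shown in the 3.5.1 plan.

{code:scala}
// Untested workaround sketch: sort within the existing partition rather
// than triggering a new range-partitioning exchange via sort().
df.repartition(1)
  .sortWithinPartitions("FL_DATE", "DISTANCE")
  .write.mode("overwrite")
  .partitionBy("partition_col")
  .csv("repartition_order")
{code}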
[jira] [Created] (SPARK-48520) spark-sql does not support using Spark Connect
David Sisson created SPARK-48520: Summary: spark-sql does not support using Spark Connect Key: SPARK-48520 URL: https://issues.apache.org/jira/browse/SPARK-48520 Project: Spark Issue Type: Bug Components: Connect, SQL Affects Versions: 3.5.1 Reporter: David Sisson Similar to spark-shell (for Scala), specifying a Spark Connect option results in a "master URL must be set in your configuration" error. Sample execution: {{SPARK_REMOTE=sc://localhost spark-sql}} Another attempt at setting the same value: {{spark-sql --remote localhost:50051}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38862) Let consumers provide their own method for Authentication for The REST Submission Server
[ https://issues.apache.org/jira/browse/SPARK-38862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack updated SPARK-38862: - Component/s: Documentation > Let consumers provide their own method for Authentication for The REST > Submission Server > > > Key: SPARK-38862 > URL: https://issues.apache.org/jira/browse/SPARK-38862 > Project: Spark > Issue Type: New Feature > Components: Documentation, Spark Core, Spark Submit >Affects Versions: 3.4.0, 4.0.0 >Reporter: Jack >Priority: Major > Labels: authentication, pull-request-available, rest, spark, > spark-submit, submit > > [Spark documentation|https://spark.apache.org/docs/latest/security.html] > states that > ??The REST Submission Server and the MesosClusterDispatcher do not support > authentication. You should ensure that all network access to the REST API & > MesosClusterDispatcher (port 6066 and 7077 respectively by default) are > restricted to hosts that are trusted to submit jobs.?? > Whilst it is true that we can use network policies to restrict access to our > exposed submission endpoint, it would be preferable to at least also allow > some primitive form of authentication at a global level, whether this is by > some token provided to the runtime environment or is a "system user" using > basic authentication of a username/password combination - I am not strictly > opinionated and I think either would suffice. > Alternatively, one could implement a custom proxy to provide this > authentication check, but upon investigation this option is rejected by the > spark master as-is today. > I would imagine that whatever solution is agreed for a first phase, a custom > authenticator may be something we want a user to be able to provide so that > if an admin needed some more advanced authentication check, such as RBAC et > al, it could be facilitated without the need for writing a complete custom > proxy layer; although it could be argued just some basic built in layer being > available; eg. RestSubmissionBasicAuthenticator could be preferable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38862) Let consumers provide their own method for Authentication for The REST Submission Server
[ https://issues.apache.org/jira/browse/SPARK-38862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack updated SPARK-38862: - Affects Version/s: 4.0.0 > Let consumers provide their own method for Authentication for The REST > Submission Server > > > Key: SPARK-38862 > URL: https://issues.apache.org/jira/browse/SPARK-38862 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Spark Submit >Affects Versions: 3.4.0, 4.0.0 >Reporter: Jack >Priority: Major > Labels: authentication, pull-request-available, rest, spark, > spark-submit, submit > > [Spark documentation|https://spark.apache.org/docs/latest/security.html] > states that > ??The REST Submission Server and the MesosClusterDispatcher do not support > authentication. You should ensure that all network access to the REST API & > MesosClusterDispatcher (port 6066 and 7077 respectively by default) are > restricted to hosts that are trusted to submit jobs.?? > Whilst it is true that we can use network policies to restrict access to our > exposed submission endpoint, it would be preferable to at least also allow > some primitive form of authentication at a global level, whether this is by > some token provided to the runtime environment or is a "system user" using > basic authentication of a username/password combination - I am not strictly > opinionated and I think either would suffice. > Alternatively, one could implement a custom proxy to provide this > authentication check, but upon investigation this option is rejected by the > spark master as-is today. > I would imagine that whatever solution is agreed for a first phase, a custom > authenticator may be something we want a user to be able to provide so that > if an admin needed some more advanced authentication check, such as RBAC et > al, it could be facilitated without the need for writing a complete custom > proxy layer; although it could be argued just some basic built in layer being > available; eg. RestSubmissionBasicAuthenticator could be preferable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38862) Let consumers provide their own method for Authentication for The REST Submission Server
[ https://issues.apache.org/jira/browse/SPARK-38862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack updated SPARK-38862: - Description: [Spark documentation|https://spark.apache.org/docs/latest/security.html] states that ??The REST Submission Server and the MesosClusterDispatcher do not support authentication. You should ensure that all network access to the REST API & MesosClusterDispatcher (port 6066 and 7077 respectively by default) are restricted to hosts that are trusted to submit jobs.?? Whilst it is true that we can use network policies to restrict access to our exposed submission endpoint, it would be preferable to at least also allow some primitive form of authentication at a global level, whether this is by some token provided to the runtime environment or is a "system user" using basic authentication of a username/password combination - I am not strictly opinionated and I think either would suffice. Alternatively, one could implement a custom proxy to provide this authentication check, but upon investigation this option is rejected by the spark master as-is today. I would imagine that whatever solution is agreed for a first phase, a custom authenticator may be something we want a user to be able to provide so that if an admin needed some more advanced authentication check, such as RBAC et al, it could be facilitated without the need for writing a complete custom proxy layer; although it could be argued just some basic built in layer being available; eg. RestSubmissionBasicAuthenticator could be preferable. was: [Spark documentation|https://spark.apache.org/docs/latest/security.html] states that ??The REST Submission Server and the MesosClusterDispatcher do not support authentication. You should ensure that all network access to the REST API & MesosClusterDispatcher (port 6066 and 7077 respectively by default) are restricted to hosts that are trusted to submit jobs.?? Whilst it is true that we can use network policies to restrict access to our exposed submission endpoint, it would be preferable to at least also allow some primitive form of authentication at a global level, whether this is by some token provided to the runtime environment or is a "system user" using basic authentication of a username/password combination - I am not strictly opinionated and I think either would suffice. I appreciate that one could implement a custom proxy to provide this authentication check, but it seems like a common use case that others may benefit from to be able to authenticate against the rest submission endpoint, and by implementing this capability as an optionally configurable aspect of Spark itself, we can utilise the existing server to provide this check. I would imagine that whatever solution is agreed for a first phase, a custom authenticator may be something we want a user to be able to provide so that if an admin needed some more advanced authentication check, such as RBAC et al, it could be facilitated without the need for writing a complete custom proxy layer; but I do feel there should be some basic built in available; eg. RestSubmissionBasicAuthenticator. 
> Let consumers provide their own method for Authentication for The REST > Submission Server > > > Key: SPARK-38862 > URL: https://issues.apache.org/jira/browse/SPARK-38862 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Spark Submit >Affects Versions: 3.4.0 >Reporter: Jack >Priority: Major > Labels: authentication, pull-request-available, rest, spark, > spark-submit, submit > > [Spark documentation|https://spark.apache.org/docs/latest/security.html] > states that > ??The REST Submission Server and the MesosClusterDispatcher do not support > authentication. You should ensure that all network access to the REST API & > MesosClusterDispatcher (port 6066 and 7077 respectively by default) are > restricted to hosts that are trusted to submit jobs.?? > Whilst it is true that we can use network policies to restrict access to our > exposed submission endpoint, it would be preferable to at least also allow > some primitive form of authentication at a global level, whether this is by > some token provided to the runtime environment or is a "system user" using > basic authentication of a username/password combination - I am not strictly > opinionated and I think either would suffice. > Alternatively, one could implement a custom proxy to provide this > authentication check, but upon investigation this option is rejected by the > spark master as-is today. > I would imagine that whatever solution is agreed for a first phase, a custom > authenticator may be something we want a user to be able to provide so that > if an admin
[jira] [Commented] (SPARK-38862) Let consumers provide their own method for Authentication for The REST Submission Server
[ https://issues.apache.org/jira/browse/SPARK-38862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851948#comment-17851948 ] Jack commented on SPARK-38862: -- After arguing with myself on this Jira, the approach taken in the linked PR is to let consumers specify that they will set up a secure gateway outside of Spark itself. This approach feels best to start with since it is simple, and there are so many potential requirements around front-door/user/app auth that doing it well inside Spark would bloat it unnecessarily. Practically speaking, this lets somebody opt in to tell Spark they will spin up something like Nginx collocated with the master node, keeping the master ports protected in a private network space, and proxy all requests to the REST server via this gateway. If they are on the same node/IP, I've found Spark avoids assigning ports that are already claimed, although validation of this assumption by any interested party would be welcome. In essence, you can then enable the other spark.authenticate options and take control of this area yourself. [~dongjoon], please let me know if you would like me to provide some examples of such a configuration/architecture in the docs; for now I've followed the general feel of the security documentation, which is to let users interpret and ensure they understand things themselves rather than being overly prescriptive. What a secure gateway actually is probably means different things to different people. I've tried to keep the implementation open for extension without over-anticipating what inbuilt auth might look like in the future. The new code is mostly private to the master, so I see no need for evolving/unstable annotations. I feel this solution is fairly comprehensive for this use case while maintaining backward compatibility. I based this on master/v4.0. > Let consumers provide their own method for Authentication for The REST > Submission Server > > > Key: SPARK-38862 > URL: https://issues.apache.org/jira/browse/SPARK-38862 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Spark Submit >Affects Versions: 3.4.0 >Reporter: Jack >Priority: Major > Labels: authentication, pull-request-available, rest, spark, > spark-submit, submit > > [Spark documentation|https://spark.apache.org/docs/latest/security.html] > states that > ??The REST Submission Server and the MesosClusterDispatcher do not support > authentication. You should ensure that all network access to the REST API & > MesosClusterDispatcher (port 6066 and 7077 respectively by default) are > restricted to hosts that are trusted to submit jobs.?? > Whilst it is true that we can use network policies to restrict access to our > exposed submission endpoint, it would be preferable to at least also allow > some primitive form of authentication at a global level, whether this is by > some token provided to the runtime environment or is a "system user" using > basic authentication of a username/password combination - I am not strictly > opinionated and I think either would suffice.
> I appreciate that one could implement a custom proxy to provide this > authentication check, but it seems like a common use case that others may > benefit from to be able to authenticate against the rest submission endpoint, > and by implementing this capability as an optionally configurable aspect of > Spark itself, we can utilise the existing server to provide this check. > I would imagine that whatever solution is agreed for a first phase, a custom > authenticator may be something we want a user to be able to provide so that > if an admin needed some more advanced authentication check, such as RBAC et > al, it could be facilitated without the need for writing a complete custom > proxy layer; but I do feel there should be some basic built in available; eg. > RestSubmissionBasicAuthenticator. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38862) Let consumers provide their own method for Authentication for The REST Submission Server
[ https://issues.apache.org/jira/browse/SPARK-38862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack updated SPARK-38862: - Summary: Let consumers provide their own method for Authentication for The REST Submission Server (was: Basic Authentication or Token Based Authentication for The REST Submission Server) > Let consumers provide their own method for Authentication for The REST > Submission Server > > > Key: SPARK-38862 > URL: https://issues.apache.org/jira/browse/SPARK-38862 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Spark Submit >Affects Versions: 3.4.0 >Reporter: Jack >Priority: Major > Labels: authentication, pull-request-available, rest, spark, > spark-submit, submit > > [Spark documentation|https://spark.apache.org/docs/latest/security.html] > states that > ??The REST Submission Server and the MesosClusterDispatcher do not support > authentication. You should ensure that all network access to the REST API & > MesosClusterDispatcher (port 6066 and 7077 respectively by default) are > restricted to hosts that are trusted to submit jobs.?? > Whilst it is true that we can use network policies to restrict access to our > exposed submission endpoint, it would be preferable to at least also allow > some primitive form of authentication at a global level, whether this is by > some token provided to the runtime environment or is a "system user" using > basic authentication of a username/password combination - I am not strictly > opinionated and I think either would suffice. > I appreciate that one could implement a custom proxy to provide this > authentication check, but it seems like a common use case that others may > benefit from to be able to authenticate against the REST submission endpoint, > and by implementing this capability as an optionally configurable aspect of > Spark itself, we can utilise the existing server to provide this check. > I would imagine that whatever solution is agreed on for a first phase, a custom > authenticator may be something we want a user to be able to provide, so that > if an admin needed some more advanced authentication check, such as RBAC et > al, it could be facilitated without the need for writing a complete custom > proxy layer; but I do feel there should be some basic built-in available; e.g. > RestSubmissionBasicAuthenticator. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
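As a sketch of the pluggable authenticator the description imagines, something like the following could be the shape of such an extension point; the trait, class, and signatures here are hypothetical (the class name is borrowed from the description's RestSubmissionBasicAuthenticator suggestion), and the jakarta servlet API is assumed given the Jetty 11 line Spark 4.0 is on. None of this reflects what the linked PR actually adds:
{code:scala}
import jakarta.servlet.http.HttpServletRequest
import java.util.Base64

// Hypothetical extension point; these names do not exist in Spark and only
// illustrate the kind of pluggable check the issue description imagines.
trait RestSubmissionAuthenticator {
  /** Return true if the request may reach the submission endpoints. */
  def authenticate(request: HttpServletRequest): Boolean
}

// A basic-auth implementation of the flavour the description suggests.
class RestSubmissionBasicAuthenticator(user: String, password: String)
    extends RestSubmissionAuthenticator {
  // Precompute the expected "Authorization: Basic <base64(user:password)>" value.
  private val expected = "Basic " +
    Base64.getEncoder.encodeToString(s"$user:$password".getBytes("UTF-8"))

  override def authenticate(request: HttpServletRequest): Boolean =
    Option(request.getHeader("Authorization")).contains(expected)
}
{code}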
[jira] [Resolved] (SPARK-48505) Simplify the implementation of Utils#isG1GC
[ https://issues.apache.org/jira/browse/SPARK-48505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48505. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46783 [https://github.com/apache/spark/pull/46783] > Simplify the implementation of Utils#isG1GC > --- > > Key: SPARK-48505 > URL: https://issues.apache.org/jira/browse/SPARK-48505 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
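For context on what a Utils#isG1GC check involves: one common way to detect the G1 collector on a running JVM is to inspect the garbage collector MX bean names, as in the sketch below. This is illustrative only and is not necessarily how the linked PR simplifies the method:
{code:scala}
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcCheck {
  // Illustrative only: when G1 is in use, the JVM registers collector MX beans
  // whose names start with "G1" (e.g. "G1 Young Generation", "G1 Old Generation").
  lazy val isG1GC: Boolean =
    ManagementFactory.getGarbageCollectorMXBeans.asScala
      .exists(_.getName.startsWith("G1"))
}
{code}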
[jira] [Assigned] (SPARK-48505) Simplify the implementation of Utils#isG1GC
[ https://issues.apache.org/jira/browse/SPARK-48505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48505: Assignee: Yang Jie > Simplify the implementation of Utils#isG1GC > --- > > Key: SPARK-48505 > URL: https://issues.apache.org/jira/browse/SPARK-48505 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48519) Upgrade jetty to 11.0.21
Yang Jie created SPARK-48519: Summary: Upgrade jetty to 11.0.21 Key: SPARK-48519 URL: https://issues.apache.org/jira/browse/SPARK-48519 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie * https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org