[jira] [Updated] (SPARK-41490) Assign name to _LEGACY_ERROR_TEMP_2441

2023-02-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-41490:
-
Fix Version/s: 3.4.0

> Assign name to _LEGACY_ERROR_TEMP_2441
> --
>
> Key: SPARK-41490
> URL: https://issues.apache.org/jira/browse/SPARK-41490
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>
> We should assign proper names to all LEGACY temp error classes.
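
For context, "assigning a name" here means replacing the auto-generated key in Spark's error-classes.json with a descriptive one. A hypothetical fragment of what the renamed entry could look like (the real message text and the final chosen name are not shown in this thread):

```json
{
  "SOME_DESCRIPTIVE_NAME" : {
    "message" : [ "<message text carried over from _LEGACY_ERROR_TEMP_2441>" ],
    "sqlState" : "<appropriate SQLSTATE>"
  }
}
```

with the code paths that raised `_LEGACY_ERROR_TEMP_2441` updated to reference the new name.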



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42288) Expose file path if reading failed

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42288:


Assignee: (was: Apache Spark)

> Expose file path if reading failed
> --
>
> Key: SPARK-42288
> URL: https://issues.apache.org/jira/browse/SPARK-42288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yi kaifei
>Priority: Minor
>
> A `MalformedInputException` may be thrown when decompression fails while 
> reading a file. In that case the error message does not contain the file 
> name; including it would make the problem easier to locate.
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in 
> stage 15641.0 failed 10 times, most recent failure: Lost task 41.9 in stage 
> 15641.0 (TID 6287211) (hostname executor 58): 
> io.airlift.compress.MalformedInputException: Malformed input: offset=65075
>   at 
> io.airlift.compress.snappy.SnappyRawDecompressor.uncompressAll(SnappyRawDecompressor.java:108)
>   at 
> io.airlift.compress.snappy.SnappyRawDecompressor.decompress(SnappyRawDecompressor.java:53)
>   at 
> io.airlift.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:45)
>   at 
> org.apache.orc.impl.AircompressorCodec.decompress(AircompressorCodec.java:94)
>   at org.apache.orc.impl.SnappyCodec.decompress(SnappyCodec.java:45)
>   at 
> org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:495)
>   at 
> org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:522)
>   at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:509)
>   at 
> org.apache.orc.impl.SerializationUtils.readRemainingLongs(SerializationUtils.java:1102)
>   at 
> org.apache.orc.impl.SerializationUtils.unrolledUnPackBytes(SerializationUtils.java:1094)
>   at 
> org.apache.orc.impl.SerializationUtils.unrolledUnPack32(SerializationUtils.java:1059)
>   at 
> org.apache.orc.impl.SerializationUtils.readInts(SerializationUtils.java:925)
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:268)
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:69)
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
>   at 
> org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:641)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:522)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.columnartorow_nextBatch_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.agg_doAggregateWithKeys_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:510)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:513)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
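
The improvement proposed in this ticket can be sketched in miniature outside Spark: wrap the record-decoding loop so that a low-level failure is re-raised with the file path attached. The following is an illustrative Python sketch of that pattern only; `read_records` and `decode` are hypothetical names, not Spark's actual reader API.

```python
def read_records(path, decode):
    """Yield decoded records from a file, attaching the file path to
    any decode/decompression error, in the spirit of SPARK-42288."""
    with open(path, "rb") as f:
        for recno, raw in enumerate(f, start=1):
            try:
                yield decode(raw)
            except Exception as exc:
                # Re-raise with the context the original error lacked:
                # which file (and roughly where) the failure occurred.
                raise RuntimeError(
                    f"Error while reading file {path!r} (record {recno})"
                ) from exc
```

The chained `from exc` keeps the original cause (here, the `MalformedInputException`-style error) visible in the traceback while the wrapper names the failing file.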

[jira] [Assigned] (SPARK-42288) Expose file path if reading failed

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42288:


Assignee: Apache Spark

> Expose file path if reading failed
> --
>
> Key: SPARK-42288
> URL: https://issues.apache.org/jira/browse/SPARK-42288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yi kaifei
>Assignee: Apache Spark
>Priority: Minor
>
> A `MalformedInputException` may be thrown when decompression fails while 
> reading a file. In that case the error message does not contain the file 
> name; including it would make the problem easier to locate.
> {code:java}
> (full stack trace omitted; identical to the trace quoted in the first
> SPARK-42288 message in this digest)
> {code}

[jira] [Commented] (SPARK-42288) Expose file path if reading failed

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683245#comment-17683245
 ] 

Apache Spark commented on SPARK-42288:
--

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/39858

> Expose file path if reading failed
> --
>
> Key: SPARK-42288
> URL: https://issues.apache.org/jira/browse/SPARK-42288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yi kaifei
>Priority: Minor
>
> A `MalformedInputException` may be thrown when decompression fails while 
> reading a file. In that case the error message does not contain the file 
> name; including it would make the problem easier to locate.
> {code:java}
> (full stack trace omitted; identical to the trace quoted in the first
> SPARK-42288 message in this digest)
> {code}

[jira] [Updated] (SPARK-41489) Assign name to _LEGACY_ERROR_TEMP_2415

2023-02-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-41489:
-
Fix Version/s: 3.4.0

> Assign name to _LEGACY_ERROR_TEMP_2415
> --
>
> Key: SPARK-41489
> URL: https://issues.apache.org/jira/browse/SPARK-41489
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>
> We should assign proper names to all LEGACY temp error classes.






[jira] [Updated] (SPARK-42288) Expose file path if reading failed

2023-02-01 Thread Yi kaifei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi kaifei updated SPARK-42288:
--
Description: 
A `MalformedInputException` may be thrown when decompression fails while 
reading a file. In that case the error message does not contain the file 
name; including it would make the problem easier to locate.

 

```
(full stack trace omitted; identical to the trace quoted in the other
SPARK-42288 messages in this digest)
```

> Expose file path if reading failed
> --
>
> Key: SPARK-42288
> URL: https://issues.apache.org/jira/browse/SPARK-42288
>

[jira] [Updated] (SPARK-42288) Expose file path if reading failed

2023-02-01 Thread Yi kaifei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi kaifei updated SPARK-42288:
--
Description: 
A `MalformedInputException` may be thrown when decompression fails while 
reading a file. In that case the error message does not contain the file 
name; including it would make the problem easier to locate.
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in 
stage 15641.0 failed 10 times, most recent failure: Lost task 41.9 in stage 
15641.0 (TID 6287211) (hostname executor 58): 
io.airlift.compress.MalformedInputException: Malformed input: offset=65075
at 
io.airlift.compress.snappy.SnappyRawDecompressor.uncompressAll(SnappyRawDecompressor.java:108)
at 
io.airlift.compress.snappy.SnappyRawDecompressor.decompress(SnappyRawDecompressor.java:53)
at 
io.airlift.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:45)
at 
org.apache.orc.impl.AircompressorCodec.decompress(AircompressorCodec.java:94)
at org.apache.orc.impl.SnappyCodec.decompress(SnappyCodec.java:45)
at 
org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:495)
at 
org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:522)
at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:509)
at 
org.apache.orc.impl.SerializationUtils.readRemainingLongs(SerializationUtils.java:1102)
at 
org.apache.orc.impl.SerializationUtils.unrolledUnPackBytes(SerializationUtils.java:1094)
at 
org.apache.orc.impl.SerializationUtils.unrolledUnPack32(SerializationUtils.java:1059)
at 
org.apache.orc.impl.SerializationUtils.readInts(SerializationUtils.java:925)
at 
org.apache.orc.impl.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:268)
at 
org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:69)
at 
org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
at 
org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
at 
org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:641)
at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
at 
org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
at 
org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
at 
org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at 
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:522)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.columnartorow_nextBatch_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.agg_doAggregateWithKeys_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:510)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:513)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
 {code}

  was:
`MalformedInputException` may be thrown because the decompression failed when 
reading the file. In this case, the error message does not contain the file 
name. If the file name is 

[jira] [Created] (SPARK-42288) Expose file path if reading failed

2023-02-01 Thread Yi kaifei (Jira)
Yi kaifei created SPARK-42288:
-

 Summary: Expose file path if reading failed
 Key: SPARK-42288
 URL: https://issues.apache.org/jira/browse/SPARK-42288
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yi kaifei









[jira] [Commented] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683238#comment-17683238
 ] 

Apache Spark commented on SPARK-42285:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39856

> Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
> inference on Parquet
> 
>
> Key: SPARK-42285
> URL: https://issues.apache.org/jira/browse/SPARK-42285
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Introduce the conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
> inference on Parquet, instead of reusing spark.sql.parquet.timestampNTZ.enabled, 
> which makes writing TimestampNTZ impossible when the flag is disabled.
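
With the two flags decoupled as described above, a deployment could, for example, disable TimestampNTZ inference on read while still writing TimestampNTZ columns. A hypothetical spark-defaults.conf fragment (flag name taken from the ticket; the value is illustrative):

```properties
# Keep TimestampNTZ writing available, but do not infer
# TimestampNTZ when reading Parquet (SPARK-42285).
spark.sql.parquet.inferTimestampNTZ.enabled   false
```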






[jira] [Assigned] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42287:


Assignee: Apache Spark

> Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
> ---
>
> Key: SPARK-42287
> URL: https://issues.apache.org/jira/browse/SPARK-42287
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42287:


Assignee: (was: Apache Spark)

> Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
> ---
>
> Key: SPARK-42287
> URL: https://issues.apache.org/jira/browse/SPARK-42287
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683237#comment-17683237
 ] 

Apache Spark commented on SPARK-42287:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39857

> Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
> ---
>
> Key: SPARK-42287
> URL: https://issues.apache.org/jira/browse/SPARK-42287
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Resolved] (SPARK-42217) Support lateral column alias in queries with Window

2023-02-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-42217.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39773
[https://github.com/apache/spark/pull/39773]

> Support lateral column alias in queries with Window
> ---
>
> Key: SPARK-42217
> URL: https://issues.apache.org/jira/browse/SPARK-42217
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42217) Support lateral column alias in queries with Window

2023-02-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-42217:
--

Assignee: Xinyi Yu

> Support lateral column alias in queries with Window
> ---
>
> Key: SPARK-42217
> URL: https://issues.apache.org/jira/browse/SPARK-42217
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
>







[jira] [Resolved] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet

2023-02-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-42285.

Resolution: Fixed

Resolved in https://github.com/apache/spark/pull/39856

> Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
> inference on Parquet
> 
>
> Key: SPARK-42285
> URL: https://issues.apache.org/jira/browse/SPARK-42285
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Introduce the conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
> inference on Parquet, instead of reusing spark.sql.parquet.timestampNTZ.enabled, 
> which makes writing TimestampNTZ impossible when that flag is disabled.






[jira] [Created] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`

2023-02-01 Thread Yang Jie (Jira)
Yang Jie created SPARK-42287:


 Summary: Refactor `assembly / assemblyExcludedJars` rule in 
`SparkConnectClient`
 Key: SPARK-42287
 URL: https://issues.apache.org/jira/browse/SPARK-42287
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Yang Jie









[jira] [Resolved] (SPARK-42273) Skip Spark Connect tests if dependencies are not installed

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42273.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39840
[https://github.com/apache/spark/pull/39840]

> Skip Spark Connect tests if dependencies are not installed
> --
>
> Key: SPARK-42273
> URL: https://issues.apache.org/jira/browse/SPARK-42273
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> arget/7411b1a1-5ebc-47a6-b3cb-c73dedc9a3c9/python3.9__pyspark.sql.tests.connect.test_parity_catalog__7iw4wnpb.log)
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py",
>  line 197, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File 
> "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py",
>  line 87, in _run_code
> exec(code, run_globals)
>   File "/.../spark/python/pyspark/sql/tests/connect/test_connect_basic.py", 
> line 29, in <module>
> from pyspark.sql.connect.client import Retrying
>   File "/.../spark/python/pyspark/sql/connect/__init__.py", line 21, in 
> <module>
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 50, in 
> <module>
> import pyspark.sql.connect.plan as plan
>   File "/.../spark/python/pyspark/sql/connect/plan.py", line 26, in <module>
> import pyspark.sql.connect.proto as proto
>   File "/.../spark/python/pyspark/sql/connect/proto/__init__.py", line 18, in 
> <module>
> from pyspark.sql.connect.proto.base_pb2_grpc import *
>   File "/.../spark/python/pyspark/sql/connect/proto/base_pb2_grpc.py", line 
> 19, in <module>
> import grpc
> ModuleNotFoundError: No module named 'grpc'
> {code}
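The skip behavior named in the issue title can be illustrated with a small, self-contained sketch. This is not PySpark's actual implementation; the function name `deps_available` and the probed module list are assumptions, chosen to match the imports that fail in the traceback above.

```python
# Hypothetical sketch of the "skip when dependencies are missing" pattern;
# not PySpark's actual code. The probed names (grpc, google.protobuf) mirror
# the imports that fail in the traceback above.
import importlib.util
import unittest


def deps_available(modules=("grpc", "google.protobuf")):
    """Return True only if every named module can be located."""
    for name in modules:
        try:
            if importlib.util.find_spec(name) is None:
                return False
        except ImportError:  # a missing parent package also counts as absent
            return False
    return True


@unittest.skipIf(not deps_available(), "Spark Connect dependencies are not installed")
class ConnectParitySketch(unittest.TestCase):
    def test_import(self):
        # A real parity test would exercise pyspark.sql.connect here.
        import grpc  # noqa: F401
```

With a guard like this, running the tests reports the class as skipped instead of crashing with `ModuleNotFoundError`.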






[jira] [Assigned] (SPARK-42273) Skip Spark Connect tests if dependencies are not installed

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42273:


Assignee: Hyukjin Kwon

> Skip Spark Connect tests if dependencies are not installed
> --
>
> Key: SPARK-42273
> URL: https://issues.apache.org/jira/browse/SPARK-42273
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code}
> arget/7411b1a1-5ebc-47a6-b3cb-c73dedc9a3c9/python3.9__pyspark.sql.tests.connect.test_parity_catalog__7iw4wnpb.log)
> Traceback (most recent call last):
>   File 
> "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py",
>  line 197, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File 
> "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py",
>  line 87, in _run_code
> exec(code, run_globals)
>   File "/.../spark/python/pyspark/sql/tests/connect/test_connect_basic.py", 
> line 29, in <module>
> from pyspark.sql.connect.client import Retrying
>   File "/.../spark/python/pyspark/sql/connect/__init__.py", line 21, in 
> <module>
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 50, in 
> <module>
> import pyspark.sql.connect.plan as plan
>   File "/.../spark/python/pyspark/sql/connect/plan.py", line 26, in <module>
> import pyspark.sql.connect.proto as proto
>   File "/.../spark/python/pyspark/sql/connect/proto/__init__.py", line 18, in 
> <module>
> from pyspark.sql.connect.proto.base_pb2_grpc import *
>   File "/.../spark/python/pyspark/sql/connect/proto/base_pb2_grpc.py", line 
> 19, in <module>
> import grpc
> ModuleNotFoundError: No module named 'grpc'
> {code}






[jira] [Assigned] (SPARK-42271) Reuse UDF test cases under `pyspark.sql.tests`

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42271:


Assignee: Xinrong Meng

> Reuse UDF test cases under `pyspark.sql.tests`
> --
>
> Key: SPARK-42271
> URL: https://issues.apache.org/jira/browse/SPARK-42271
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-42271) Reuse UDF test cases under `pyspark.sql.tests`

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42271.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39814
[https://github.com/apache/spark/pull/39814]

> Reuse UDF test cases under `pyspark.sql.tests`
> --
>
> Key: SPARK-42271
> URL: https://issues.apache.org/jira/browse/SPARK-42271
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42282:
-

Assignee: Ruifeng Zheng

> Split 'pyspark.pandas.tests.test_groupby'
> -
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>







[jira] [Resolved] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42282.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39849
[https://github.com/apache/spark/pull/39849]

> Split 'pyspark.pandas.tests.test_groupby'
> -
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42093.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39615
[https://github.com/apache/spark/pull/39615]

> Move JavaTypeInference to AgnosticEncoders
> --
>
> Key: SPARK-42093
> URL: https://issues.apache.org/jira/browse/SPARK-42093
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42268) Add UserDefinedType in protos

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42268.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39835
[https://github.com/apache/spark/pull/39835]

> Add UserDefinedType in protos
> -
>
> Key: SPARK-42268
> URL: https://issues.apache.org/jira/browse/SPARK-42268
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42268) Add UserDefinedType in protos

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42268:


Assignee: Ruifeng Zheng

> Add UserDefinedType in protos
> -
>
> Key: SPARK-42268
> URL: https://issues.apache.org/jira/browse/SPARK-42268
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Resolved] (SPARK-42275) Avoid using built-in list, dict in static typing

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42275.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39844
[https://github.com/apache/spark/pull/39844]

> Avoid using built-in list, dict in static typing
> 
>
> Key: SPARK-42275
> URL: https://issues.apache.org/jira/browse/SPARK-42275
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
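For context, parameterizing the built-in `list`/`dict` in annotations (PEP 585, e.g. `list[int]`) is only evaluable at runtime on Python 3.9+, while `typing.List`/`typing.Dict` also work on older interpreters; that compatibility concern is presumably the motivation here. A minimal illustration (not PySpark code):

```python
# Minimal illustration (not PySpark code): typing.List/typing.Dict keep
# annotations evaluable on Python < 3.9, where list[str] / dict[str, int]
# raise TypeError at runtime (PEP 585 generics landed in 3.9).
from typing import Dict, List


def word_counts(words: List[str]) -> Dict[str, int]:
    """Count occurrences of each word."""
    counts: Dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts
```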







[jira] [Assigned] (SPARK-42275) Avoid using built-in list, dict in static typing

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42275:


Assignee: Ruifeng Zheng

> Avoid using built-in list, dict in static typing
> 
>
> Key: SPARK-42275
> URL: https://issues.apache.org/jira/browse/SPARK-42275
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42279) Simplify `pyspark.pandas.tests.test_resample`

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42279:


Assignee: Ruifeng Zheng

> Simplify `pyspark.pandas.tests.test_resample`
> -
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>







[jira] [Resolved] (SPARK-42279) Simplify `pyspark.pandas.tests.test_resample`

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42279.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39847
[https://github.com/apache/spark/pull/39847]

> Simplify `pyspark.pandas.tests.test_resample`
> -
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42284:


Assignee: Herman van Hövell

> Make sure Connect Server assembly jar is available before we run Scala Client 
> tests
> ---
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Resolved] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42284.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39854
[https://github.com/apache/spark/pull/39854]

> Make sure Connect Server assembly jar is available before we run Scala Client 
> tests
> ---
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42115) Push down limit through Python UDFs

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42115:
---

Assignee: Hyukjin Kwon

> Push down limit through Python UDFs
> ---
>
> Key: SPARK-42115
> URL: https://issues.apache.org/jira/browse/SPARK-42115
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code}
> from pyspark.sql.functions import udf
> spark.range(10).write.mode("overwrite").parquet("/tmp/abc")
> @udf(returnType='string')
> def my_udf(arg):
> return arg
> df = spark.read.parquet("/tmp/abc")
> df.limit(10).withColumn("prediction", my_udf(df["id"])).explain()
> {code}
> As an example: since Python UDFs are executed asynchronously, pushing limits 
> down benefits performance.
> {code}
> == Physical Plan ==
> CollectLimit 10
> +- *(2) Project [id#3L, pythonUDF0#10 AS prediction#6]
>+- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10]
>   +- *(1) ColumnarToRow
>  +- FileScan parquet [id#3L] Batched: true, DataFilters: [], Format: 
> Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], 
> PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {code}
> This is a regression from Spark 3.3.1:
> {code}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [id#3L, pythonUDF0#10 AS prediction#6]
>    +- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10]
>       +- GlobalLimit 10
>          +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=30]
>             +- LocalLimit 10
>                +- FileScan parquet [id#3L] Batched: true, DataFilters: [], 
> Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], 
> PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {code}
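The pushdown being restored here can be sketched with a toy rewrite rule. The plan classes below are hypothetical stand-ins, not Spark's optimizer: the essential point is that UDF evaluation produces exactly one output row per input row, so a limit above it may also be applied below it.

```python
# Toy logical-plan rewrite (hypothetical classes, not Spark's optimizer).
# BatchEvalPython is row-preserving, so
# Limit(n, BatchEvalPython(child)) is equivalent to BatchEvalPython(Limit(n, child)),
# letting the limit reach the data source.
from dataclasses import dataclass


@dataclass
class Plan:
    pass


@dataclass
class Scan(Plan):
    table: str


@dataclass
class BatchEvalPython(Plan):
    child: Plan


@dataclass
class Limit(Plan):
    n: int
    child: Plan


def push_limit_through_udf(plan: Plan) -> Plan:
    """Sink a Limit below UDF evaluation so the scan reads at most n rows."""
    if isinstance(plan, Limit) and isinstance(plan.child, BatchEvalPython):
        return BatchEvalPython(Limit(plan.n, plan.child.child))
    return plan
```

Spark's actual 3.3.1 plan shown above is more careful (it keeps a `GlobalLimit` on top and adds a `LocalLimit` below the exchange); the toy rule collapses that to the core idea.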






[jira] [Resolved] (SPARK-42115) Push down limit through Python UDFs

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42115.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39842
[https://github.com/apache/spark/pull/39842]

> Push down limit through Python UDFs
> ---
>
> Key: SPARK-42115
> URL: https://issues.apache.org/jira/browse/SPARK-42115
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> from pyspark.sql.functions import udf
> spark.range(10).write.mode("overwrite").parquet("/tmp/abc")
> @udf(returnType='string')
> def my_udf(arg):
> return arg
> df = spark.read.parquet("/tmp/abc")
> df.limit(10).withColumn("prediction", my_udf(df["id"])).explain()
> {code}
> As an example: since Python UDFs are executed asynchronously, pushing limits 
> down benefits performance.
> {code}
> == Physical Plan ==
> CollectLimit 10
> +- *(2) Project [id#3L, pythonUDF0#10 AS prediction#6]
>+- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10]
>   +- *(1) ColumnarToRow
>  +- FileScan parquet [id#3L] Batched: true, DataFilters: [], Format: 
> Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], 
> PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {code}
> This is a regression from Spark 3.3.1:
> {code}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [id#3L, pythonUDF0#10 AS prediction#6]
>    +- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10]
>       +- GlobalLimit 10
>          +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=30]
>             +- LocalLimit 10
>                +- FileScan parquet [id#3L] Batched: true, DataFilters: [], 
> Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], 
> PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {code}






[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2023-02-01 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683177#comment-17683177
 ] 

Erik Krogen commented on SPARK-39375:
-

I see some work being done on UDFs (SPARK-42246 for PySpark UDFs, SPARK-42283 
for the start of Scala UDFs). In the [design doc for Spark 
Connect|https://docs.google.com/document/d/17X6-P5H2522SnE-gF1BVwyildp_PDX8oXD-4l9vqQmA/edit#]
 UDFs were left as a later problem. Do we have a design/approach documented 
anywhere for UDFs? The design of these is a crucial part of the future/success 
of Spark Connect and it's a bit concerning to me that we're making 
implementation progress in this direction without an agreed-upon design (AFAICT 
-- please let me know if I missed something).

> SPIP: Spark Connect - A client and server interface for Apache Spark
> 
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Critical
>  Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect 
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
>  Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which, 
> in the age of serverless computing and ubiquitous programming language use, 
> poses a number of limitations. Most of the limitations stem from the tightly 
> coupled Spark driver architecture and the fact that clusters are typically shared 
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark 
> driver runs both the client application and scheduler, which results in a 
> heavyweight architecture that requires proximity to the cluster. There is no 
> built-in capability to remotely connect to a Spark cluster in languages 
> other than SQL and users therefore rely on external solutions such as the 
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich 
> developer experience{*}: The current architecture and APIs do not cater for 
> interactive data exploration (as done with Notebooks), or allow for building 
> out rich developer experience common in modern code editors. (3) 
> {*}Stability{*}: with the current shared driver architecture, users causing 
> critical exceptions (e.g. OOM) bring the whole cluster down for all users. 
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs 
> (e.g. first and third-party dependencies in the classpath) does not allow for 
> seamless upgrades between Spark versions (and with that, hinders new feature 
> adoption).
>  
> We propose to overcome these challenges by building on the DataFrame API and 
> the underlying unresolved logical plans. The DataFrame API is widely used and 
> makes it very easy to iteratively express complex logic. We will introduce 
> {_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
> client from the Spark server. With Spark Connect, Spark will become 
> decoupled, allowing for built-in remote connectivity: The decoupled client 
> SDK can be used to run interactive data exploration and connect to the server 
> for DataFrame operations. 
>  
> Spark Connect will benefit Spark developers in different ways: The decoupled 
> architecture will result in improved stability, as clients are separated from 
> the driver. From the Spark Connect client perspective, Spark will be (almost) 
> versionless, and thus enable seamless upgradability, as server APIs can 
> evolve without affecting the client API. The decoupled client-server 
> architecture can be leveraged to build close integrations with local 
> developer tooling. Finally, separating the client process from the Spark 
> server process will improve Spark’s overall security posture by avoiding the 
> tight coupling of the client inside the Spark runtime environment.
>  
> Spark Connect will strengthen Spark’s position as the modern unified engine 
> for large-scale data analytics and expand applicability to use cases and 
> developers we could not reach with the current setup: Spark will become 
> ubiquitously usable as the DataFrame API can be used with (almost) any 
> programming language.






[jira] [Assigned] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42286:


Assignee: (was: Apache Spark)

> Fix internal error for valid CASE WHEN expression with CAST when inserting 
> into a table
> ---
>
> Key: SPARK-42286
> URL: https://issues.apache.org/jira/browse/SPARK-42286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Runyao.Chen
>Priority: Major
>
> ```
> spark-sql> create or replace table es570639t1 as select x FROM values (1), 
> (2), (3) as tab(x);
> spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
> spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x 
> end) from es570639t1 where x = 1;
> ```
> hits the following internal error
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast
>  
> Stack trace:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:78) at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:82) at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
>  
> This internal error comes from `CheckOverflowInTableInsert`'s `checkChild`, 
> which handles only the `Cast` and `ExpressionProxy`-of-`Cast` cases, but not 
> the `CaseWhen` case.
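The missing case can be sketched with toy expression classes. These are hypothetical stand-ins, not Spark's `checkChild`: the point is that the check must also accept a `CaseWhen` whose result branches all flow through a `Cast`, instead of raising an internal error for anything other than a bare `Cast`.

```python
# Toy sketch (hypothetical classes, not Spark's checkChild): accept a bare
# Cast, or a CaseWhen all of whose result branches wrap Casts; anything else
# is rejected rather than triggering an internal error.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Expr:
    pass


@dataclass
class Literal(Expr):
    value: object


@dataclass
class Cast(Expr):
    child: Expr


@dataclass
class CaseWhen(Expr):
    branch_values: List[Expr]
    else_value: Optional[Expr] = None


def wraps_cast(expr: Expr) -> bool:
    """True if every value the expression can produce flows through a Cast."""
    if isinstance(expr, Cast):
        return True
    if isinstance(expr, CaseWhen):
        outputs = list(expr.branch_values)
        if expr.else_value is not None:
            outputs.append(expr.else_value)
        return all(wraps_cast(v) for v in outputs)
    return False
```

The linked pull request decides the actual fix; this sketch only names the `CaseWhen` case the current check misses.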






[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683168#comment-17683168
 ] 

Apache Spark commented on SPARK-42286:
--

User 'RunyaoChen' has created a pull request for this issue:
https://github.com/apache/spark/pull/39855

> Fix internal error for valid CASE WHEN expression with CAST when inserting 
> into a table
> ---
>
> Key: SPARK-42286
> URL: https://issues.apache.org/jira/browse/SPARK-42286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Runyao.Chen
>Priority: Major
>
> ```
> spark-sql> create or replace table es570639t1 as select x FROM values (1), 
> (2), (3) as tab(x);
> spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
> spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x 
> end) from es570639t1 where x = 1;
> ```
> hits the following internal error
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast
>  
> Stack trace:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:78) at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:82) at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
>  
> This internal error comes from `CheckOverflowInTableInsert`'s `checkChild`, 
> which handles only the `Cast` and `ExpressionProxy`-of-`Cast` cases, but not 
> the `CaseWhen` case.






[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683167#comment-17683167
 ] 

Apache Spark commented on SPARK-42286:
--

User 'RunyaoChen' has created a pull request for this issue:
https://github.com/apache/spark/pull/39855

> Fix internal error for valid CASE WHEN expression with CAST when inserting 
> into a table
> ---
>
> Key: SPARK-42286
> URL: https://issues.apache.org/jira/browse/SPARK-42286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Runyao.Chen
>Priority: Major
>
> ```
> spark-sql> create or replace table es570639t1 as select x FROM values (1), 
> (2), (3) as tab(x);
> spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
> spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x 
> end) from es570639t1 where x = 1;
> ```
> hits the following internal error
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast
>  
> Stack trace:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:78) at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:82) at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
>  
> This internal error comes from `CheckOverflowInTableInsert.checkChild`, which 
> covers only the `Cast` and `ExpressionProxy` expressions, but not the 
> `CaseWhen` expression.






[jira] [Assigned] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42286:


Assignee: Apache Spark

> Fix internal error for valid CASE WHEN expression with CAST when inserting 
> into a table
> ---
>
> Key: SPARK-42286
> URL: https://issues.apache.org/jira/browse/SPARK-42286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Runyao.Chen
>Assignee: Apache Spark
>Priority: Major
>
> ```
> spark-sql> create or replace table es570639t1 as select x FROM values (1), 
> (2), (3) as tab(x);
> spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
> spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x 
> end) from es570639t1 where x = 1;
> ```
> hits the following internal error
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast
>  
> Stack trace:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
> ExpressionProxy of Cast at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:78) at 
> org.apache.spark.SparkException$.internalError(SparkException.scala:82) at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
>  at 
> org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
>  at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
>  
> This internal error comes from `CheckOverflowInTableInsert.checkChild`, which 
> covers only the `Cast` and `ExpressionProxy` expressions, but not the 
> `CaseWhen` expression.






[jira] [Created] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table

2023-02-01 Thread Runyao.Chen (Jira)
Runyao.Chen created SPARK-42286:
---

 Summary: Fix internal error for valid CASE WHEN expression with 
CAST when inserting into a table
 Key: SPARK-42286
 URL: https://issues.apache.org/jira/browse/SPARK-42286
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Runyao.Chen


```

spark-sql> create or replace table es570639t1 as select x FROM values (1), (2), 
(3) as tab(x);
spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x 
end) from es570639t1 where x = 1;

```

hits the following internal error
org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
ExpressionProxy of Cast
 

Stack trace:
org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or 
ExpressionProxy of Cast at 
org.apache.spark.SparkException$.internalError(SparkException.scala:78) at 
org.apache.spark.SparkException$.internalError(SparkException.scala:82) at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
 at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
 at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
 at 
org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
 at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314) 
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309) 
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
 

This internal error comes from `CheckOverflowInTableInsert.checkChild`, which 
covers only the `Cast` and `ExpressionProxy` expressions, but not the `CaseWhen` 
expression.
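The failing pattern match can be modeled outside Spark. The sketch below uses stand-in Python classes (`Cast`, `ExpressionProxy`, and `CaseWhen` are placeholders, not Catalyst's real types) to show why `checkChild` rejected a `CaseWhen` child and how widening the match avoids the internal error; the actual patch lives in Scala in `Cast.scala`.

```python
# Stand-in expression classes; these only model the shapes checkChild inspects.
class Expr: ...
class Cast(Expr): ...
class CaseWhen(Expr): ...
class ExpressionProxy(Expr):
    def __init__(self, child): self.child = child

def check_child_old(child):
    # Pre-fix behavior: only Cast, or ExpressionProxy wrapping a Cast, is
    # accepted; a CaseWhen child falls through to the internal error.
    if isinstance(child, Cast):
        return True
    if isinstance(child, ExpressionProxy) and isinstance(child.child, Cast):
        return True
    raise RuntimeError("[INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast")

def check_child_fixed(child):
    # Hypothetical fixed version: also accept a CaseWhen child, matching the
    # shape produced by `case when ... then ... else ... end` under the
    # overflow check.
    if isinstance(child, (Cast, CaseWhen)):
        return True
    if isinstance(child, ExpressionProxy) and isinstance(child.child, Cast):
        return True
    raise RuntimeError("[INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast")
```

Running `check_child_old(CaseWhen())` reproduces the reported internal error, while `check_child_fixed` accepts the same child.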






[jira] [Resolved] (SPARK-42277) Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default

2023-02-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42277.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39845
[https://github.com/apache/spark/pull/39845]

> Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default
> --
>
> Key: SPARK-42277
> URL: https://issues.apache.org/jira/browse/SPARK-42277
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42277) Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default

2023-02-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42277:
-

Assignee: Dongjoon Hyun

> Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default
> --
>
> Key: SPARK-42277
> URL: https://issues.apache.org/jira/browse/SPARK-42277
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683156#comment-17683156
 ] 

Apache Spark commented on SPARK-38829:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39856

> New configuration for controlling timestamp inference of Parquet
> 
>
> Key: SPARK-38829
> URL: https://issues.apache.org/jira/browse/SPARK-38829
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.3.0
>
>
> A new SQL conf that can fall back to the behavior of reading all Parquet 
> timestamp columns as TimestampType.






[jira] [Created] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet

2023-02-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42285:
--

 Summary: Introduce conf 
spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on 
Parquet
 Key: SPARK-42285
 URL: https://issues.apache.org/jira/browse/SPARK-42285
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Introduce the conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ 
inference on Parquet, instead of reusing spark.sql.parquet.timestampNTZ.enabled, 
which made it impossible to write TimestampNTZ when the flag was disabled.
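A minimal session sketch of the new flag (the conf name is taken from this ticket's title; verify the default value and exact behavior against your Spark version):

```sql
-- Disable TimestampNTZ inference: Parquet timestamp columns are then read
-- as TimestampType, while explicit TimestampNTZ writes remain possible.
SET spark.sql.parquet.inferTimestampNTZ.enabled=false;
```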






[jira] [Assigned] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-42283:
-

Assignee: Venkata Sai Akhil Gudesa

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.4.0
>
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) or JARs. Essentially, a “simple” UDF may only reference 
> built-in libraries and classes defined within the scope of the UDF.






[jira] [Resolved] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42283.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.4.0
>
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) or JARs. Essentially, a “simple” UDF may only reference 
> built-in libraries and classes defined within the scope of the UDF.






[jira] [Assigned] (SPARK-42228) connect-client-jvm module should shaded+relocation grpc

2023-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-42228:
-

Assignee: Yang Jie

> connect-client-jvm module should shaded+relocation grpc
> ---
>
> Key: SPARK-42228
> URL: https://issues.apache.org/jira/browse/SPARK-42228
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Blocker
>







[jira] [Resolved] (SPARK-42228) connect-client-jvm module should shaded+relocation grpc

2023-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42228.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> connect-client-jvm module should shaded+relocation grpc
> ---
>
> Key: SPARK-42228
> URL: https://issues.apache.org/jira/browse/SPARK-42228
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Blocker
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42284:


Assignee: Apache Spark

> Make sure Connect Server assembly jar is available before we run Scala Client 
> tests
> ---
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683098#comment-17683098
 ] 

Apache Spark commented on SPARK-42284:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/39854

> Make sure Connect Server assembly jar is available before we run Scala Client 
> tests
> ---
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>







[jira] [Assigned] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42284:


Assignee: (was: Apache Spark)

> Make sure Connect Server assembly jar is available before we run Scala Client 
> tests
> ---
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>







[jira] [Created] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests

2023-02-01 Thread Jira
Herman van Hövell created SPARK-42284:
-

 Summary: Make sure Connect Server assembly jar is available before 
we run Scala Client tests
 Key: SPARK-42284
 URL: https://issues.apache.org/jira/browse/SPARK-42284
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell









[jira] [Assigned] (SPARK-41985) Centralize more column resolution rules

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41985:
---

Assignee: Wenchen Fan

> Centralize more column resolution rules
> ---
>
> Key: SPARK-41985
> URL: https://issues.apache.org/jira/browse/SPARK-41985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>







[jira] [Resolved] (SPARK-41985) Centralize more column resolution rules

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41985.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39508
[https://github.com/apache/spark/pull/39508]

> Centralize more column resolution rules
> ---
>
> Key: SPARK-41985
> URL: https://issues.apache.org/jira/browse/SPARK-41985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-41488) Assign name to _LEGACY_ERROR_TEMP_1176

2023-02-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41488.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39833
[https://github.com/apache/spark/pull/39833]

> Assign name to _LEGACY_ERROR_TEMP_1176
> --
>
> Key: SPARK-41488
> URL: https://issues.apache.org/jira/browse/SPARK-41488
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> We should assign proper names to all LEGACY temp error classes.






[jira] [Assigned] (SPARK-41488) Assign name to _LEGACY_ERROR_TEMP_1176

2023-02-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41488:


Assignee: Haejoon Lee

> Assign name to _LEGACY_ERROR_TEMP_1176
> --
>
> Key: SPARK-41488
> URL: https://issues.apache.org/jira/browse/SPARK-41488
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We should assign proper names to all LEGACY temp error classes.






[jira] [Updated] (SPARK-42229) Migrate SparkCoreErrors into error class

2023-02-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42229:
-
Fix Version/s: 3.4.0

> Migrate SparkCoreErrors into error class
> 
>
> Key: SPARK-42229
> URL: https://issues.apache.org/jira/browse/SPARK-42229
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>
> Migrate core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala 
> to error classes.






[jira] [Updated] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-02-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42239:
-
Fix Version/s: 3.4.0

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
> ---
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>







[jira] [Commented] (SPARK-42281) Update Debugging PySpark documents to show error message properly

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683041#comment-17683041
 ] 

Apache Spark commented on SPARK-42281:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39852

> Update Debugging PySpark documents to show error message properly
> -
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The example in 
> [https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark]
>  is outdated due to the new PySpark error framework.
> We should show a proper example.






[jira] [Assigned] (SPARK-42281) Update Debugging PySpark documents to show error message properly

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42281:


Assignee: (was: Apache Spark)

> Update Debugging PySpark documents to show error message properly
> -
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The example in 
> [https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark]
>  is outdated due to the new PySpark error framework.
> We should show a proper example.






[jira] [Commented] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683039#comment-17683039
 ] 

Apache Spark commented on SPARK-42283:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/39850

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) or JARs. Essentially, a “simple” UDF may only reference 
> built-in libraries and classes defined within the scope of the UDF.






[jira] [Assigned] (SPARK-42281) Update Debugging PySpark documents to show error message properly

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42281:


Assignee: Apache Spark

> Update Debugging PySpark documents to show error message properly
> -
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> The example in 
> [https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark]
>  is outdated due to the new PySpark error framework.
> We should show a proper example.






[jira] [Assigned] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42283:


Assignee: (was: Apache Spark)

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) or JARs. Essentially, a “simple” UDF may only reference 
> built-in libraries and classes defined within the scope of the UDF.






[jira] [Assigned] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42283:


Assignee: Apache Spark

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Apache Spark
>Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) or JARs. Essentially, a “simple” UDF may only reference 
> built-in libraries and classes defined within the scope of the UDF.






[jira] [Commented] (SPARK-42281) Update Debugging PySpark documents to show error message properly

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683038#comment-17683038
 ] 

Apache Spark commented on SPARK-42281:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39852

> Update Debugging PySpark documents to show error message properly
> -
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The example in 
> [https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark]
>  is outdated due to the new PySpark error framework.
> We should show a proper example.






[jira] [Updated] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Venkata Sai Akhil Gudesa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Sai Akhil Gudesa updated SPARK-42283:
-
Description: “Simple” here refers to UDFs that utilize no client-specific 
class files (e.g REPL-generated) and JARs. Essentially, a “simple” UDF may only 
reference in-built libraries and classes defined within the scope of the UDF.  
(was: “Simple” here refers to UDFs that utilize no client-specific class files 
(e.g REPL-generated) and JARs. Essentially, a “vanilla” UDF may only reference 
in-built libraries and classes defined within the scope of the UDF.)

> Add Simple Scala UDFs to Scala/JVM Client
> -
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
> REPL-generated) and JARs. Essentially, a “simple” UDF may only reference 
> in-built libraries and classes defined within the scope of the UDF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client

2023-02-01 Thread Venkata Sai Akhil Gudesa (Jira)
Venkata Sai Akhil Gudesa created SPARK-42283:


 Summary: Add Simple Scala UDFs to Scala/JVM Client
 Key: SPARK-42283
 URL: https://issues.apache.org/jira/browse/SPARK-42283
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa


“Simple” here refers to UDFs that utilize no client-specific class files (e.g. 
REPL-generated) and JARs. Essentially, a “vanilla” UDF may only reference 
in-built libraries and classes defined within the scope of the UDF.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683025#comment-17683025
 ] 

Apache Spark commented on SPARK-42282:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39849

> Split 'pyspark.pandas.tests.test_groupby'
> -
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42282:


Assignee: Apache Spark

> Split 'pyspark.pandas.tests.test_groupby'
> -
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683023#comment-17683023
 ] 

Apache Spark commented on SPARK-42282:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39849

> Split 'pyspark.pandas.tests.test_groupby'
> -
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42282:


Assignee: (was: Apache Spark)

> Split 'pyspark.pandas.tests.test_groupby'
> -
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'

2023-02-01 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42282:
-

 Summary: Split 'pyspark.pandas.tests.test_groupby'
 Key: SPARK-42282
 URL: https://issues.apache.org/jira/browse/SPARK-42282
 Project: Spark
  Issue Type: Test
  Components: ps, Tests
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42279) Simplify `pyspark.pandas.tests.test_resample`

2023-02-01 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-42279:
--
Summary: Simplify `pyspark.pandas.tests.test_resample`  (was: Simplify 
`test_resample`)

> Simplify `pyspark.pandas.tests.test_resample`
> -
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683018#comment-17683018
 ] 

Apache Spark commented on SPARK-42276:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39848

> Add ServicesResourceTransformer to connect server module  shade configuration
> -
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.
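
For context, relocating service files in a shaded jar is done with the Maven Shade plugin's ServicesResourceTransformer. A minimal sketch of how it would appear in the module's shade configuration (the surrounding plugin setup is elided, and the exact connect-server pom may differ):

```xml
<!-- Inside the maven-shade-plugin <configuration> block -->
<transformers>
  <!-- Rewrites entries under META-INF/services so they name the
       relocated (shaded) classes instead of the originals. -->
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
```

Without this transformer, relocation rules rewrite class files but leave the service-loader files pointing at the old class names, which is the symptom this issue describes.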



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683017#comment-17683017
 ] 

Apache Spark commented on SPARK-42276:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39848

> Add ServicesResourceTransformer to connect server module  shade configuration
> -
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42276:


Assignee: (was: Apache Spark)

> Add ServicesResourceTransformer to connect server module  shade configuration
> -
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42276:


Assignee: Apache Spark

> Add ServicesResourceTransformer to connect server module  shade configuration
> -
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42276:
-
Summary: Add ServicesResourceTransformer to connect server module  shade 
configuration  (was: Add ServicesResourceTransformer to connect server module  
relocation configuration)

> Add ServicesResourceTransformer to connect server module  shade configuration
> -
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42281) Update Debugging PySpark documents to show error message properly

2023-02-01 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42281:
---

 Summary: Update Debugging PySpark documents to show error message 
properly
 Key: SPARK-42281
 URL: https://issues.apache.org/jira/browse/SPARK-42281
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


The example in 
[https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark]
 is outdated due to the new PySpark error framework.

We should show a proper example.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42276:
-
Issue Type: Bug  (was: Improvement)

> Add ServicesResourceTransformer to connect server module  relocation 
> configuration
> --
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42276:
-
Issue Type: Improvement  (was: Bug)

> Add ServicesResourceTransformer to connect server module  relocation 
> configuration
> --
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42276:
-
Priority: Minor  (was: Major)

> Add ServicesResourceTransformer to connect server module  relocation 
> configuration
> --
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42228) connect-client-jvm module should shade and relocate gRPC

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42228:
-
Priority: Blocker  (was: Major)

> connect-client-jvm module should shade and relocate gRPC
> ---
>
> Key: SPARK-42228
> URL: https://issues.apache.org/jira/browse/SPARK-42228
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42228) connect-client-jvm module should shade and relocate gRPC

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42228:
-
Affects Version/s: 3.4.0

> connect-client-jvm module should shade and relocate gRPC
> ---
>
> Key: SPARK-42228
> URL: https://issues.apache.org/jira/browse/SPARK-42228
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42278) DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42278:
---

Assignee: jiaan.geng

> DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves
> 
>
> Key: SPARK-42278
> URL: https://issues.apache.org/jira/browse/SPARK-42278
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> Currently, the DS V2 pushdown framework compiles `SortOrder` into a fixed format.
> This is inflexible and unfriendly for databases that do not support this 
> syntax.
> For example, the fixed format `ORDER BY col ASC NULLS FIRST` is not supported 
> by MS SQL Server, which does not support `NULLS FIRST`.
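
As an illustration of why a per-dialect hook helps, the fixed ordering can be rewritten for engines that lack `NULLS FIRST` (a sketch; `col` is an illustrative column name):

```sql
-- Fixed format currently emitted by the V2 pushdown framework:
ORDER BY col ASC NULLS FIRST

-- SQL Server already sorts NULLs lowest, so plain ASC is equivalent:
ORDER BY col ASC

-- Or, making the NULL placement explicit:
ORDER BY CASE WHEN col IS NULL THEN 0 ELSE 1 END, col ASC
```

A dialect that can compile `SortOrder` itself can choose whichever of these forms its target database accepts.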



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42278) DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42278.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39846
[https://github.com/apache/spark/pull/39846]

> DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves
> 
>
> Key: SPARK-42278
> URL: https://issues.apache.org/jira/browse/SPARK-42278
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the DS V2 pushdown framework compiles `SortOrder` into a fixed format.
> This is inflexible and unfriendly for databases that do not support this 
> syntax.
> For example, the fixed format `ORDER BY col ASC NULLS FIRST` is not supported 
> by MS SQL Server, which does not support `NULLS FIRST`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42280) add spark.yarn.archive/jars similar option for spark on K8S

2023-02-01 Thread Xianjin YE (Jira)
Xianjin YE created SPARK-42280:
--

 Summary: add spark.yarn.archive/jars similar option for spark on 
K8S
 Key: SPARK-42280
 URL: https://issues.apache.org/jira/browse/SPARK-42280
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.1, 3.2.2
Reporter: Xianjin YE


For Spark on YARN, there are the `spark.yarn.archive` and `spark.yarn.jars` options to 
distribute the Spark runtime jars before the driver/executors start up.

 

I'd like to propose similar functionality for Spark on K8S. The benefits are:
 # accelerating the migration of workloads that use the above feature from YARN to K8S
 # exploring new versions of Spark more easily without rebuilding the Spark image
 # currently, there is no other way to add additional/extension jars to 
executors on K8S before startup.
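
By analogy with YARN, usage could look like the following sketch. The property name `spark.kubernetes.archive` is purely illustrative here; it is a hypothetical name mirroring `spark.yarn.archive`, not an existing Spark option:

```shell
# Hypothetical sketch: point executors at a pre-staged archive of Spark
# runtime jars instead of baking them into the container image.
spark-submit \
  --master k8s://https://kube-apiserver:6443 \
  --conf spark.kubernetes.archive=s3a://my-bucket/spark/spark-runtime-jars.zip \
  --class org.example.MyApp \
  my-app.jar
```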



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42276:
-
Description: The contents of META-INF/services directory in the shaded 
connect-server jar have not been relocated.

> Add ServicesResourceTransformer to connect server module  relocation 
> configuration
> --
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> The contents of META-INF/services directory in the shaded connect-server jar 
> have not been relocated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42279) Simplify `test_resample`

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682990#comment-17682990
 ] 

Apache Spark commented on SPARK-42279:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39847

> Simplify `test_resample`
> 
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42279) Simplify `test_resample`

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42279:


Assignee: Apache Spark

> Simplify `test_resample`
> 
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42279) Simplify `test_resample`

2023-02-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42279:


Assignee: (was: Apache Spark)

> Simplify `test_resample`
> 
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
>  Issue Type: Test
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration

2023-02-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42276:
-
Summary: Add ServicesResourceTransformer to connect server module  
relocation configuration  (was: Fix relocation configuration of connect server 
module)

> Add ServicesResourceTransformer to connect server module  relocation 
> configuration
> --
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42279) Simplify `test_resample`

2023-02-01 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42279:
-

 Summary: Simplify `test_resample`
 Key: SPARK-42279
 URL: https://issues.apache.org/jira/browse/SPARK-42279
 Project: Spark
  Issue Type: Test
  Components: ps, Tests
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41241) Use Hive and Spark SQL to modify table field comment, the modified results of Hive cannot be queried using Spark SQL

2023-02-01 Thread weiliang hao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weiliang hao updated SPARK-41241:
-
Description: 
-- Hive

> create table table_test(id int);

> alter table table_test change column id id int comment "hive comment";

> desc formatted table_test;
{code:java}
+-------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+
|           col_name            |                             data_type                              |                           comment                            |
+-------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+
| # col_name                    | data_type                                                          | comment                                                      |
| id                            | int                                                                | hive comment                                                 |
|                               | NULL                                                               | NULL                                                         |
| # Detailed Table Information  | NULL                                                               | NULL                                                         |
| Database:                     | default                                                            | NULL                                                         |
| OwnerType:                    | USER                                                               | NULL                                                         |
| Owner:                        | anonymous                                                          | NULL                                                         |
| CreateTime:                   | Wed Nov 23 23:06:41 CST 2022                                       | NULL                                                         |
| LastAccessTime:               | UNKNOWN                                                            | NULL                                                         |
| Retention:                    | 0                                                                  | NULL                                                         |
| Location:                     | hdfs://localhost:8020/warehouse/tablespace/managed/hive/table_test | NULL                                                         |
| Table Type:                   | MANAGED_TABLE                                                      | NULL                                                         |
| Table Parameters:             | NULL                                                               | NULL                                                         |
|                               | COLUMN_STATS_ACCURATE                                              | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\"}} |
|                               | bucketing_version                                                  | 2                                                            |
|                               | last_modified_by                                                   | anonymous                                                    |
|                               | last_modified_time                                                 | 1669216665                                                   |
|                               | numFiles                                                           | 0                                                            |
|                               | numRows                                                            | 0                                                            |
|                               | rawDataSize                                                        | 0                                                            |
|                               | totalSize                                                          | 0                                                            |
|                               | transactional                                                      | true                                                         |
|                               | transactional_properties                                           | default                                                      |
|                               | transient_lastDdlTime                                              | 1669216665                                                   |
|                               | NULL                                                               | NULL                                                         |
| # Storage Information         | NULL                                                               | NULL                                                         |
| SerDe Library:                | org.apache.hadoop.hive.ql.io.orc.OrcSerde                          | NULL                                                         |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                    | NULL                                                         |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                   | NULL                                                         |
| Compressed:                   | No

[jira] [Assigned] (SPARK-42274) Upgrade `compress-lzf` to 1.1.2

2023-02-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42274:
-

Assignee: Dongjoon Hyun

> Upgrade `compress-lzf` to 1.1.2
> ---
>
> Key: SPARK-42274
> URL: https://issues.apache.org/jira/browse/SPARK-42274
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42274) Upgrade `compress-lzf` to 1.1.2

2023-02-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42274.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39841
[https://github.com/apache/spark/pull/39841]

> Upgrade `compress-lzf` to 1.1.2
> ---
>
> Key: SPARK-42274
> URL: https://issues.apache.org/jira/browse/SPARK-42274
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42259) ResolveGroupingAnalytics should take care of Python UDAF

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42259:
---

Assignee: Wenchen Fan

> ResolveGroupingAnalytics should take care of Python UDAF
> 
>
> Key: SPARK-42259
> URL: https://issues.apache.org/jira/browse/SPARK-42259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42259) ResolveGroupingAnalytics should take care of Python UDAF

2023-02-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42259.
-
Fix Version/s: 3.2.4
   3.3.2
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 39824
[https://github.com/apache/spark/pull/39824]

> ResolveGroupingAnalytics should take care of Python UDAF
> 
>
> Key: SPARK-42259
> URL: https://issues.apache.org/jira/browse/SPARK-42259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.2.4, 3.3.2, 3.4.0
>
>







[jira] [Assigned] (SPARK-42272) Use available ephemeral port for Spark Connect server in testing

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42272:


Assignee: Hyukjin Kwon

> Use available ephemeral port for Spark Connect server in testing
> 
>
> Key: SPARK-42272
> URL: https://issues.apache.org/jira/browse/SPARK-42272
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Currently Spark Connect tests cannot run in parallel and require setting the
> parallelism to 1:
> {code}
> python/run-tests --module pyspark-connect --parallelism 1
> {code}
> The main reason is that the port being used is hardcoded as the
> default 15002. We should instead search for an available port and use it.
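
The usual way to find such a port is to ask the OS by binding to port 0. A minimal Python sketch of that lookup (the helper name `find_free_port` is illustrative, not the actual PySpark test utility):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an available ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        # getsockname() reveals the port the kernel actually assigned.
        return s.getsockname()[1]
```

Note there is a small race window between closing the probe socket and the server binding the port, so a test harness would typically retry on bind failure.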






[jira] [Resolved] (SPARK-42272) Use available ephemeral port for Spark Connect server in testing

2023-02-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42272.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39834
[https://github.com/apache/spark/pull/39834]

> Use available ephemeral port for Spark Connect server in testing
> 
>
> Key: SPARK-42272
> URL: https://issues.apache.org/jira/browse/SPARK-42272
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently Spark Connect tests cannot run in parallel and require setting the
> parallelism to 1:
> {code}
> python/run-tests --module pyspark-connect --parallelism 1
> {code}
> The main reason is that the port being used is hardcoded as the
> default 15002. We should instead search for an available port and use it.






[jira] [Commented] (SPARK-42278) DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves

2023-02-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682919#comment-17682919
 ] 

Apache Spark commented on SPARK-42278:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39846

> DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves
> 
>
> Key: SPARK-42278
> URL: https://issues.apache.org/jira/browse/SPARK-42278
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, the DS V2 pushdown framework compiles SortOrder in a fixed format.
> This is not flexible or friendly for databases that do not support that
> syntax.
> For example, the fixed format `ORDER BY col ASC NULLS FIRST` is not supported
> by MS SQL Server, which does not support NULLS FIRST.
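
The idea is to let each dialect render a SortOrder itself. A hedged Python sketch (the function signature and dialect names are illustrative, not Spark's actual `JdbcDialect` API); the MSSQL branch emulates NULLS FIRST/LAST with an explicit CASE sort key, one common workaround:

```python
def compile_sort_order(column: str, ascending: bool = True,
                       nulls_first: bool = True, dialect: str = "default") -> str:
    """Render one SortOrder as an ORDER BY fragment for the given dialect."""
    direction = "ASC" if ascending else "DESC"
    if dialect == "mssql":
        # SQL Server has no NULLS FIRST/LAST; sort on a CASE key first so
        # NULL rows land where the requested null ordering puts them.
        null_key = 0 if nulls_first else 1
        return (f"CASE WHEN {column} IS NULL THEN {null_key} "
                f"ELSE {1 - null_key} END, {column} {direction}")
    nulls = "NULLS FIRST" if nulls_first else "NULLS LAST"
    return f"{column} {direction} {nulls}"
```

With an override point like this, the default dialect keeps the current fixed format while MSSQL (and similar databases) substitute their own SQL.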





