[jira] [Updated] (SPARK-41490) Assign name to _LEGACY_ERROR_TEMP_2441
[ https://issues.apache.org/jira/browse/SPARK-41490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-41490:
-----------------------------
    Fix Version/s: 3.4.0

> Assign name to _LEGACY_ERROR_TEMP_2441
> --------------------------------------
>
>                 Key: SPARK-41490
>                 URL: https://issues.apache.org/jira/browse/SPARK-41490
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>             Fix For: 3.4.0, 3.5.0
>
> We should assign a proper name to all LEGACY temp error classes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42288) Expose file path if reading failed
[ https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42288:
------------------------------------
    Assignee: (was: Apache Spark)

> Expose file path if reading failed
> ----------------------------------
>
>                 Key: SPARK-42288
>                 URL: https://issues.apache.org/jira/browse/SPARK-42288
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yi kaifei
>            Priority: Minor
>
> `MalformedInputException` may be thrown when decompression fails while reading a file. In that case, the error message does not contain the file name; including it would make the problem much easier to locate.
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 15641.0 failed 10 times, most recent failure: Lost task 41.9 in stage 15641.0 (TID 6287211) (hostname executor 58): io.airlift.compress.MalformedInputException: Malformed input: offset=65075
>     at io.airlift.compress.snappy.SnappyRawDecompressor.uncompressAll(SnappyRawDecompressor.java:108)
>     at io.airlift.compress.snappy.SnappyRawDecompressor.decompress(SnappyRawDecompressor.java:53)
>     at io.airlift.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:45)
>     at org.apache.orc.impl.AircompressorCodec.decompress(AircompressorCodec.java:94)
>     at org.apache.orc.impl.SnappyCodec.decompress(SnappyCodec.java:45)
>     at org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:495)
>     at org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:522)
>     at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:509)
>     at org.apache.orc.impl.SerializationUtils.readRemainingLongs(SerializationUtils.java:1102)
>     at org.apache.orc.impl.SerializationUtils.unrolledUnPackBytes(SerializationUtils.java:1094)
>     at org.apache.orc.impl.SerializationUtils.unrolledUnPack32(SerializationUtils.java:1059)
>     at org.apache.orc.impl.SerializationUtils.readInts(SerializationUtils.java:925)
>     at org.apache.orc.impl.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:268)
>     at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:69)
>     at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
>     at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373)
>     at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:641)
>     at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>     at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:522)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.columnartorow_nextBatch_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.agg_doAggregateWithKeys_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
>     at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:510)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:513)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
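The improvement this issue proposes, surfacing the failing file's path alongside the low-level decompression error, can be sketched in plain Python. This is only an illustrative pattern; `FileReadError` and `read_with_path` are hypothetical names, not actual Spark APIs.

```python
# Illustrative sketch of the SPARK-42288 idea: wrap any error raised while
# reading a file so the resulting message carries the file path.
# `FileReadError` and `read_with_path` are hypothetical names.

class FileReadError(Exception):
    """Raised when reading a file fails; records which file was being read."""

    def __init__(self, path: str, cause: Exception):
        super().__init__(f"Encountered error while reading file: {path}")
        self.path = path
        self.__cause__ = cause  # chain the original exception for tracebacks


def read_with_path(path: str, reader):
    """Invoke `reader(path)`, re-raising any failure with the path attached."""
    try:
        return reader(path)
    except Exception as exc:  # e.g. a MalformedInputException from the codec
        raise FileReadError(path, exc) from exc
```

With a wrapper like this, the `MalformedInputException` above would surface under a message naming the file being read, while the original exception stays available as the chained cause.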
[jira] [Assigned] (SPARK-42288) Expose file path if reading failed
[ https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42288:
------------------------------------
    Assignee: Apache Spark
[jira] [Commented] (SPARK-42288) Expose file path if reading failed
[ https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683245#comment-17683245 ]

Apache Spark commented on SPARK-42288:
--------------------------------------
User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/39858
[jira] [Updated] (SPARK-41489) Assign name to _LEGACY_ERROR_TEMP_2415
[ https://issues.apache.org/jira/browse/SPARK-41489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-41489:
-----------------------------
    Fix Version/s: 3.4.0

> Assign name to _LEGACY_ERROR_TEMP_2415
> --------------------------------------
>
>                 Key: SPARK-41489
>                 URL: https://issues.apache.org/jira/browse/SPARK-41489
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>             Fix For: 3.4.0, 3.5.0
>
> We should assign a proper name to all LEGACY temp error classes.
[jira] [Updated] (SPARK-42288) Expose file path if reading failed
[ https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi kaifei updated SPARK-42288:
------------------------------
    Description: `MalformedInputException` may be thrown when decompression fails while reading a file. In that case, the error message does not contain the file name; including it would make the problem much easier to locate.
[jira] [Updated] (SPARK-42288) Expose file path if reading failed
[ https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi kaifei updated SPARK-42288:
------------------------------
    Description: `MalformedInputException` may be thrown when decompression fails while reading a file. In that case, the error message does not contain the file name; including it would make the problem much easier to locate.
[jira] [Created] (SPARK-42288) Expose file path if reading failed
Yi kaifei created SPARK-42288:
---------------------------------

             Summary: Expose file path if reading failed
                 Key: SPARK-42288
                 URL: https://issues.apache.org/jira/browse/SPARK-42288
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Yi kaifei
[jira] [Commented] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet
[ https://issues.apache.org/jira/browse/SPARK-42285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683238#comment-17683238 ]

Apache Spark commented on SPARK-42285:
--------------------------------------
User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39856

> Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42285
>                 URL: https://issues.apache.org/jira/browse/SPARK-42285
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> Introduce the conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet, instead of reusing spark.sql.parquet.timestampNTZ.enabled, which makes writing TimestampNTZ impossible when that flag is disabled.
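The description above separates *inference* on read from *writing* of TimestampNTZ. A minimal PySpark sketch of toggling the new flag, assuming an already-running `SparkSession` named `spark` on a build that includes this change:

```python
# Sketch only: assumes an existing SparkSession `spark` with SPARK-42285 applied.
# Disable TimestampNTZ *inference* when reading Parquet; with the new conf,
# writing TimestampNTZ columns keeps working even while inference is off.
spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")

# With inference disabled, TIMESTAMP_NTZ data in Parquet files is read back
# using the session-default timestamp type rather than TimestampNTZType.
```

The point of the separate conf, per the issue, is that the old `spark.sql.parquet.timestampNTZ.enabled` flag controlled both directions at once, so disabling it also blocked writes.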
[jira] [Assigned] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
[ https://issues.apache.org/jira/browse/SPARK-42287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42287:
------------------------------------
    Assignee: Apache Spark

> Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
> -----------------------------------------------------------------------
>
>                 Key: SPARK-42287
>                 URL: https://issues.apache.org/jira/browse/SPARK-42287
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Minor
[jira] [Assigned] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
[ https://issues.apache.org/jira/browse/SPARK-42287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42287:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
[ https://issues.apache.org/jira/browse/SPARK-42287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683237#comment-17683237 ]

Apache Spark commented on SPARK-42287:
--------------------------------------
User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39857
[jira] [Resolved] (SPARK-42217) Support lateral column alias in queries with Window
[ https://issues.apache.org/jira/browse/SPARK-42217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-42217.
------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 39773
[https://github.com/apache/spark/pull/39773]

> Support lateral column alias in queries with Window
> ---------------------------------------------------
>
>                 Key: SPARK-42217
>                 URL: https://issues.apache.org/jira/browse/SPARK-42217
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Xinyi Yu
>            Assignee: Xinyi Yu
>            Priority: Major
>             Fix For: 3.4.0
[jira] [Assigned] (SPARK-42217) Support lateral column alias in queries with Window
[ https://issues.apache.org/jira/browse/SPARK-42217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang reassigned SPARK-42217:
--------------------------------------
    Assignee: Xinyi Yu
[jira] [Resolved] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet
[ https://issues.apache.org/jira/browse/SPARK-42285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-42285.
------------------------------------
    Resolution: Fixed

Resolved in https://github.com/apache/spark/pull/39856
[jira] [Created] (SPARK-42287) Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
Yang Jie created SPARK-42287:
---------------------------------

             Summary: Refactor `assembly / assemblyExcludedJars` rule in `SparkConnectClient`
                 Key: SPARK-42287
                 URL: https://issues.apache.org/jira/browse/SPARK-42287
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Yang Jie
[jira] [Resolved] (SPARK-42273) Skip Spark Connect tests if dependencies are not installed
[ https://issues.apache.org/jira/browse/SPARK-42273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42273. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39840 [https://github.com/apache/spark/pull/39840] > Skip Spark Connect tests if dependencies are not installed > -- > > Key: SPARK-42273 > URL: https://issues.apache.org/jira/browse/SPARK-42273 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > {code} > arget/7411b1a1-5ebc-47a6-b3cb-c73dedc9a3c9/python3.9__pyspark.sql.tests.connect.test_parity_catalog__7iw4wnpb.log) > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", > line 197, in _run_module_as_main > return _run_code(code, main_globals, None, > File > "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", > line 87, in _run_code > exec(code, run_globals) > File "/.../spark/python/pyspark/sql/tests/connect/test_connect_basic.py", > line 29, in > from pyspark.sql.connect.client import Retrying > File "/.../spark/python/pyspark/sql/connect/__init__.py", line 21, in > > from pyspark.sql.connect.dataframe import DataFrame # noqa: F401 > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 50, in > > import pyspark.sql.connect.plan as plan > File "/.../spark/python/pyspark/sql/connect/plan.py", line 26, in > import pyspark.sql.connect.proto as proto > File "/.../spark/python/pyspark/sql/connect/proto/__init__.py", line 18, in > > from pyspark.sql.connect.proto.base_pb2_grpc import * > File "/.../spark/python/pyspark/sql/connect/proto/base_pb2_grpc.py", line > 19, in > import grpc > ModuleNotFoundError: No module named 'grpc' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42273) Skip Spark Connect tests if dependencies are not installed
[ https://issues.apache.org/jira/browse/SPARK-42273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42273: Assignee: Hyukjin Kwon > Skip Spark Connect tests if dependencies are not installed > -- > > Key: SPARK-42273 > URL: https://issues.apache.org/jira/browse/SPARK-42273 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > {code} > arget/7411b1a1-5ebc-47a6-b3cb-c73dedc9a3c9/python3.9__pyspark.sql.tests.connect.test_parity_catalog__7iw4wnpb.log) > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", > line 197, in _run_module_as_main > return _run_code(code, main_globals, None, > File > "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", > line 87, in _run_code > exec(code, run_globals) > File "/.../spark/python/pyspark/sql/tests/connect/test_connect_basic.py", > line 29, in > from pyspark.sql.connect.client import Retrying > File "/.../spark/python/pyspark/sql/connect/__init__.py", line 21, in > > from pyspark.sql.connect.dataframe import DataFrame # noqa: F401 > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 50, in > > import pyspark.sql.connect.plan as plan > File "/.../spark/python/pyspark/sql/connect/plan.py", line 26, in > import pyspark.sql.connect.proto as proto > File "/.../spark/python/pyspark/sql/connect/proto/__init__.py", line 18, in > > from pyspark.sql.connect.proto.base_pb2_grpc import * > File "/.../spark/python/pyspark/sql/connect/proto/base_pb2_grpc.py", line > 19, in > import grpc > ModuleNotFoundError: No module named 'grpc' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
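[Editorial sketch] The `ModuleNotFoundError: No module named 'grpc'` failure above amounts to guarding test imports behind a dependency check. A minimal stand-alone sketch of that pattern follows; the guard names and test class here are illustrative, not PySpark's actual ones:

```python
import importlib.util
import unittest

def have_module(name: str) -> bool:
    # find_spec checks availability without importing the module,
    # so a missing optional dependency cannot crash test collection.
    return importlib.util.find_spec(name) is not None

# Illustrative guard: skip Connect tests entirely when grpc is absent,
# instead of letting "import grpc" raise ModuleNotFoundError.
grpc_requirement_message = None if have_module("grpc") else "No module named 'grpc'"

@unittest.skipIf(grpc_requirement_message is not None, grpc_requirement_message or "")
class ConnectBasicTests(unittest.TestCase):
    def test_client_import(self):
        import grpc  # only reached when grpc is installed
        self.assertTrue(hasattr(grpc, "insecure_channel"))
```

With this guard in place, running the suite on an interpreter without grpc reports the tests as skipped rather than erroring out during import.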
[jira] [Assigned] (SPARK-42271) Reuse UDF test cases under `pyspark.sql.tests`
[ https://issues.apache.org/jira/browse/SPARK-42271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42271: Assignee: Xinrong Meng > Reuse UDF test cases under `pyspark.sql.tests` > -- > > Key: SPARK-42271 > URL: https://issues.apache.org/jira/browse/SPARK-42271 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42271) Reuse UDF test cases under `pyspark.sql.tests`
[ https://issues.apache.org/jira/browse/SPARK-42271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42271. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39814 [https://github.com/apache/spark/pull/39814] > Reuse UDF test cases under `pyspark.sql.tests` > -- > > Key: SPARK-42271 > URL: https://issues.apache.org/jira/browse/SPARK-42271 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
[ https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42282: - Assignee: Ruifeng Zheng > Split 'pyspark.pandas.tests.test_groupby' > - > > Key: SPARK-42282 > URL: https://issues.apache.org/jira/browse/SPARK-42282 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
[ https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42282. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39849 [https://github.com/apache/spark/pull/39849] > Split 'pyspark.pandas.tests.test_groupby' > - > > Key: SPARK-42282 > URL: https://issues.apache.org/jira/browse/SPARK-42282 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders
[ https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42093. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39615 [https://github.com/apache/spark/pull/39615] > Move JavaTypeInference to AgnosticEncoders > -- > > Key: SPARK-42093 > URL: https://issues.apache.org/jira/browse/SPARK-42093 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42268) Add UserDefinedType in protos
[ https://issues.apache.org/jira/browse/SPARK-42268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42268. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39835 [https://github.com/apache/spark/pull/39835] > Add UserDefinedType in protos > - > > Key: SPARK-42268 > URL: https://issues.apache.org/jira/browse/SPARK-42268 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42268) Add UserDefinedType in protos
[ https://issues.apache.org/jira/browse/SPARK-42268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42268: Assignee: Ruifeng Zheng > Add UserDefinedType in protos > - > > Key: SPARK-42268 > URL: https://issues.apache.org/jira/browse/SPARK-42268 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42275) Avoid using built-in list, dict in static typing
[ https://issues.apache.org/jira/browse/SPARK-42275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42275. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39844 [https://github.com/apache/spark/pull/39844] > Avoid using built-in list, dict in static typing > > > Key: SPARK-42275 > URL: https://issues.apache.org/jira/browse/SPARK-42275 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42275) Avoid using built-in list, dict in static typing
[ https://issues.apache.org/jira/browse/SPARK-42275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42275: Assignee: Ruifeng Zheng > Avoid using built-in list, dict in static typing > > > Key: SPARK-42275 > URL: https://issues.apache.org/jira/browse/SPARK-42275 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
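[Editorial sketch] The motivation behind SPARK-42275 is compatibility: built-in generics such as `list[str]` (PEP 585) are only valid annotations on Python 3.9+, while `typing.List` and `typing.Dict` work on every version PySpark supports. A small illustrative example (the function itself is hypothetical, not PySpark code):

```python
from typing import Dict, List, Optional

# Using typing.List / typing.Dict instead of the built-in list / dict
# keeps these annotations valid on Python versions older than 3.9.
def collect_options(pairs: List[str]) -> Dict[str, Optional[str]]:
    """Parse "key=value" strings; keys without '=' map to None."""
    out: Dict[str, Optional[str]] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        out[key] = value if sep else None
    return out
```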
[jira] [Assigned] (SPARK-42279) Simplify `pyspark.pandas.tests.test_resample`
[ https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42279: Assignee: Ruifeng Zheng > Simplify `pyspark.pandas.tests.test_resample` > - > > Key: SPARK-42279 > URL: https://issues.apache.org/jira/browse/SPARK-42279 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42279) Simplify `pyspark.pandas.tests.test_resample`
[ https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42279. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39847 [https://github.com/apache/spark/pull/39847] > Simplify `pyspark.pandas.tests.test_resample` > - > > Key: SPARK-42279 > URL: https://issues.apache.org/jira/browse/SPARK-42279 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests
[ https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42284: Assignee: Herman van Hövell > Make sure Connect Server assembly jar is available before we run Scala Client > tests > --- > > Key: SPARK-42284 > URL: https://issues.apache.org/jira/browse/SPARK-42284 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests
[ https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42284. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39854 [https://github.com/apache/spark/pull/39854] > Make sure Connect Server assembly jar is available before we run Scala Client > tests > --- > > Key: SPARK-42284 > URL: https://issues.apache.org/jira/browse/SPARK-42284 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42115) Push down limit through Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42115: --- Assignee: Hyukjin Kwon > Push down limit through Python UDFs > --- > > Key: SPARK-42115 > URL: https://issues.apache.org/jira/browse/SPARK-42115 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > {code} > from pyspark.sql.functions import udf > spark.range(10).write.mode("overwrite").parquet("/tmp/abc") > @udf(returnType='string') > def my_udf(arg): > return arg > df = spark.read.parquet("/tmp/abc") > df.limit(10).withColumn("prediction", my_udf(df["id"])).explain() > {code} > As an example: since Python UDFs are executed asynchronously, pushing down > limits benefits performance. > {code} > == Physical Plan == > CollectLimit 10 > +- *(2) Project [id#3L, pythonUDF0#10 AS prediction#6] >+- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10] > +- *(1) ColumnarToRow > +- FileScan parquet [id#3L] Batched: true, DataFilters: [], Format: > Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], > PartitionFilters: [], PushedFilters: [], ReadSchema: struct > {code} > This is a regression from Spark 3.3.1: > {code} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [id#3L, pythonUDF0#10 AS prediction#6] > +- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10] > +- GlobalLimit 10 > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=30] > +- LocalLimit 10 > +- FileScan parquet [id#3L] Batched: true, DataFilters: [], > Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], > PartitionFilters: [], PushedFilters: [], ReadSchema: struct > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42115) Push down limit through Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42115. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39842 [https://github.com/apache/spark/pull/39842] > Push down limit through Python UDFs > --- > > Key: SPARK-42115 > URL: https://issues.apache.org/jira/browse/SPARK-42115 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > {code} > from pyspark.sql.functions import udf > spark.range(10).write.mode("overwrite").parquet("/tmp/abc") > @udf(returnType='string') > def my_udf(arg): > return arg > df = spark.read.parquet("/tmp/abc") > df.limit(10).withColumn("prediction", my_udf(df["id"])).explain() > {code} > As an example: since Python UDFs are executed asynchronously, pushing down > limits benefits performance. > {code} > == Physical Plan == > CollectLimit 10 > +- *(2) Project [id#3L, pythonUDF0#10 AS prediction#6] >+- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10] > +- *(1) ColumnarToRow > +- FileScan parquet [id#3L] Batched: true, DataFilters: [], Format: > Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], > PartitionFilters: [], PushedFilters: [], ReadSchema: struct > {code} > This is a regression from Spark 3.3.1: > {code} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [id#3L, pythonUDF0#10 AS prediction#6] > +- BatchEvalPython [my_udf(id#3L)#5], [pythonUDF0#10] > +- GlobalLimit 10 > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=30] > +- LocalLimit 10 > +- FileScan parquet [id#3L] Batched: true, DataFilters: [], > Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/abc], > PartitionFilters: [], PushedFilters: [], ReadSchema: struct > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
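[Editorial sketch] The benefit described in SPARK-42115 can be seen with a plain-Python analogy; this simulates the effect of where the limit sits relative to the UDF, not Spark's optimizer itself:

```python
from itertools import islice

calls = {"n": 0}

def my_udf(x):
    # Stand-in for a Python UDF; counts how often it is invoked.
    calls["n"] += 1
    return str(x)

rows = range(1_000_000)

# Limit "pushed down" below the UDF: only the 10 surviving rows
# are ever handed to the UDF, instead of all 1,000,000.
pushed = [my_udf(x) for x in islice(rows, 10)]
assert calls["n"] == 10
```

This mirrors the two plans in the ticket: with `GlobalLimit`/`LocalLimit` below `BatchEvalPython`, only the limited rows reach the Python worker.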
[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683177#comment-17683177 ] Erik Krogen commented on SPARK-39375: - I see some work being done on UDFs (SPARK-42246 for PySpark UDFs, SPARK-42283 for the start of Scala UDFs). In the [design doc for Spark Connect|https://docs.google.com/document/d/17X6-P5H2522SnE-gF1BVwyildp_PDX8oXD-4l9vqQmA/edit#] UDFs were left as a later problem. Do we have a design/approach documented anywhere for UDFs? The design of these is a crucial part of the future/success of Spark Connect and it's a bit concerning to me that we're making implementation progress in this direction without an agreed-upon design (AFAICT -- please let me know if I missed something). > SPIP: Spark Connect - A client and server interface for Apache Spark > > > Key: SPARK-39375 > URL: https://issues.apache.org/jira/browse/SPARK-39375 > Project: Spark > Issue Type: Epic > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Critical > Labels: SPIP > > Please find the full document for discussion here: [Spark Connect > SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj] > Below, we have just referenced the introduction. > h2. What are you trying to do? > While Spark is used extensively, it was designed nearly a decade ago, which, > in the age of serverless computing and ubiquitous programming language use, > poses a number of limitations. Most of the limitations stem from the tightly > coupled Spark driver architecture and the fact that clusters are typically shared > across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark > driver runs both the client application and scheduler, which results in a > heavyweight architecture that requires proximity to the cluster. 
There is no > built-in capability to remotely connect to a Spark cluster in languages > other than SQL and users therefore rely on external solutions such as the > inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich > developer experience{*}: The current architecture and APIs do not cater for > interactive data exploration (as done with Notebooks), or allow for building > out rich developer experience common in modern code editors. (3) > {*}Stability{*}: with the current shared driver architecture, users causing > critical exceptions (e.g. OOM) bring the whole cluster down for all users. > (4) {*}Upgradability{*}: the current entangling of platform and client APIs > (e.g. first and third-party dependencies in the classpath) does not allow for > seamless upgrades between Spark versions (and with that, hinders new feature > adoption). > > We propose to overcome these challenges by building on the DataFrame API and > the underlying unresolved logical plans. The DataFrame API is widely used and > makes it very easy to iteratively express complex logic. We will introduce > {_}Spark Connect{_}, a remote option of the DataFrame API that separates the > client from the Spark server. With Spark Connect, Spark will become > decoupled, allowing for built-in remote connectivity: The decoupled client > SDK can be used to run interactive data exploration and connect to the server > for DataFrame operations. > > Spark Connect will benefit Spark developers in different ways: The decoupled > architecture will result in improved stability, as clients are separated from > the driver. From the Spark Connect client perspective, Spark will be (almost) > versionless, and thus enable seamless upgradability, as server APIs can > evolve without affecting the client API. The decoupled client-server > architecture can be leveraged to build close integrations with local > developer tooling. 
Finally, separating the client process from the Spark > server process will improve Spark’s overall security posture by avoiding the > tight coupling of the client inside the Spark runtime environment. > > Spark Connect will strengthen Spark’s position as the modern unified engine > for large-scale data analytics and expand applicability to use cases and > developers we could not reach with the current setup: Spark will become > ubiquitously usable as the DataFrame API can be used with (almost) any > programming language. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
[ https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42286: Assignee: (was: Apache Spark) > Fix internal error for valid CASE WHEN expression with CAST when inserting > into a table > --- > > Key: SPARK-42286 > URL: https://issues.apache.org/jira/browse/SPARK-42286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Runyao.Chen >Priority: Major > > ``` > spark-sql> create or replace table es570639t1 as select x FROM values (1), > (2), (3) as tab(x); > spark-sql> create or replace table es570639t2 (x Decimal(9, 0)); > spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x > end) from es570639t1 where x = 1; > ``` > hits the following internal error > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast > > Stack trace: > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast at > org.apache.spark.SparkException$.internalError(SparkException.scala:78) at > org.apache.spark.SparkException$.internalError(SparkException.scala:82) at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570) > > This internal error comes from `CheckOverflowInTableInsert#checkChild`, > which covers only the `Cast` and `ExpressionProxy` exprs, but not the > `CaseWhen` expr. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
[ https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683168#comment-17683168 ] Apache Spark commented on SPARK-42286: -- User 'RunyaoChen' has created a pull request for this issue: https://github.com/apache/spark/pull/39855 > Fix internal error for valid CASE WHEN expression with CAST when inserting > into a table > --- > > Key: SPARK-42286 > URL: https://issues.apache.org/jira/browse/SPARK-42286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Runyao.Chen >Priority: Major > > ``` > spark-sql> create or replace table es570639t1 as select x FROM values (1), > (2), (3) as tab(x); > spark-sql> create or replace table es570639t2 (x Decimal(9, 0)); > spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x > end) from es570639t1 where x = 1; > ``` > hits the following internal error > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast > > Stack trace: > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast at > org.apache.spark.SparkException$.internalError(SparkException.scala:78) at > org.apache.spark.SparkException$.internalError(SparkException.scala:82) at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309) > at > 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570) > > This internal error comes from `CheckOverflowInTableInsert#checkChild`, > which covers only the `Cast` and `ExpressionProxy` exprs, but not the > `CaseWhen` expr. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
[ https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683167#comment-17683167 ] Apache Spark commented on SPARK-42286: -- User 'RunyaoChen' has created a pull request for this issue: https://github.com/apache/spark/pull/39855 > Fix internal error for valid CASE WHEN expression with CAST when inserting > into a table > --- > > Key: SPARK-42286 > URL: https://issues.apache.org/jira/browse/SPARK-42286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Runyao.Chen >Priority: Major > > ``` > spark-sql> create or replace table es570639t1 as select x FROM values (1), > (2), (3) as tab(x); > spark-sql> create or replace table es570639t2 (x Decimal(9, 0)); > spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x > end) from es570639t1 where x = 1; > ``` > hits the following internal error > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast > > Stack trace: > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast at > org.apache.spark.SparkException$.internalError(SparkException.scala:78) at > org.apache.spark.SparkException$.internalError(SparkException.scala:82) at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309) > at > 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570) > > This internal error comes from `CheckOverflowInTableInsert#checkChild`, > which covers only the `Cast` and `ExpressionProxy` exprs, but not the > `CaseWhen` expr. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
[ https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42286:

Assignee: Apache Spark

> Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
>
> Key: SPARK-42286
> URL: https://issues.apache.org/jira/browse/SPARK-42286
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Runyao.Chen
> Assignee: Apache Spark
> Priority: Major
>
> ```
> spark-sql> create or replace table es570639t1 as select x FROM values (1), (2), (3) as tab(x);
> spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
> spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x end) from es570639t1 where x = 1;
> ```
> hits the following internal error:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast
>
> Stack trace:
> org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast
> at org.apache.spark.SparkException$.internalError(SparkException.scala:78)
> at org.apache.spark.SparkException$.internalError(SparkException.scala:82)
> at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
> at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
> at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
> at org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
> at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
> at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
> at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)
>
> This internal error comes from `CheckOverflowInTableInsert.checkChild`, where only the `Cast` and `ExpressionProxy` expressions are covered, but not the `CaseWhen` expression.
--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
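The failing `INSERT` pushes the overflow-checking `Cast` into each branch of the `CASE WHEN`, so `checkChild` is handed a `CaseWhen` node rather than a bare `Cast`. A minimal Python sketch of that failure mode and of one possible fix (the node classes and function names here are illustrative stand-ins, not Spark's actual Catalyst API):

```python
# Hypothetical, simplified model of CheckOverflowInTableInsert.checkChild.
# These classes are illustrative stand-ins, not Spark's Catalyst expressions.
from dataclasses import dataclass, field


@dataclass
class Cast:
    child: object


@dataclass
class CaseWhen:
    branches: list = field(default_factory=list)  # (condition, value) pairs
    else_value: object = None


def check_child_old(expr):
    # Pre-fix behavior: only a direct Cast is accepted.
    if isinstance(expr, Cast):
        return expr
    raise RuntimeError(
        "[INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast")


def check_child_fixed(expr):
    # Sketch of a fix: also accept a CaseWhen whose branch values and
    # else-value are themselves Casts -- the shape produced when the Cast
    # has been pushed into each branch of the CASE WHEN.
    if isinstance(expr, Cast):
        return expr
    if isinstance(expr, CaseWhen):
        outputs = [v for _, v in expr.branches] + [expr.else_value]
        if all(isinstance(v, Cast) for v in outputs):
            return expr
    raise RuntimeError(
        "[INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast")


# Roughly the shape the failing INSERT produces after optimization:
case = CaseWhen(branches=[("x = 1", Cast("1"))], else_value=Cast("x"))

try:
    check_child_old(case)       # raises the internal error from the ticket
except RuntimeError as e:
    print(e)

print(check_child_fixed(case) is case)
```

The sketch only illustrates why matching on `Cast` alone is insufficient; the actual Spark fix lives in `Cast.scala` and operates on Catalyst expression trees.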
[jira] [Created] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
Runyao.Chen created SPARK-42286:

Summary: Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
Key: SPARK-42286
URL: https://issues.apache.org/jira/browse/SPARK-42286
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Runyao.Chen

```
spark-sql> create or replace table es570639t1 as select x FROM values (1), (2), (3) as tab(x);
spark-sql> create or replace table es570639t2 (x Decimal(9, 0));
spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x end) from es570639t1 where x = 1;
```
hits the following internal error:
org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast

Stack trace:
org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast
at org.apache.spark.SparkException$.internalError(SparkException.scala:78)
at org.apache.spark.SparkException$.internalError(SparkException.scala:82)
at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693)
at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697)
at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683)
at org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570)

This internal error comes from `CheckOverflowInTableInsert.checkChild`, where only the `Cast` and `ExpressionProxy` expressions are covered, but not the `CaseWhen` expression.
[jira] [Resolved] (SPARK-42277) Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default
[ https://issues.apache.org/jira/browse/SPARK-42277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42277.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 39845: https://github.com/apache/spark/pull/39845

> Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default
>
> Key: SPARK-42277
> URL: https://issues.apache.org/jira/browse/SPARK-42277
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-42277) Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default
[ https://issues.apache.org/jira/browse/SPARK-42277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42277:

Assignee: Dongjoon Hyun

> Use ROCKSDB for spark.history.store.hybridStore.diskBackend by default
>
> Key: SPARK-42277
> URL: https://issues.apache.org/jira/browse/SPARK-42277
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683156#comment-17683156 ] Apache Spark commented on SPARK-38829:

User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39856

> New configuration for controlling timestamp inference of Parquet
>
> Key: SPARK-38829
> URL: https://issues.apache.org/jira/browse/SPARK-38829
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Gengliang Wang
> Assignee: Ivan Sadikov
> Priority: Major
> Fix For: 3.3.0
>
> A new SQL conf that can fall back to the behavior of reading all Parquet timestamp columns as TimestampType.
[jira] [Created] (SPARK-42285) Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet
Gengliang Wang created SPARK-42285:

Summary: Introduce conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet
Key: SPARK-42285
URL: https://issues.apache.org/jira/browse/SPARK-42285
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang

Introduce the conf spark.sql.parquet.inferTimestampNTZ.enabled for TimestampNTZ inference on Parquet, instead of using spark.sql.parquet.timestampNTZ.enabled, which makes writing TimestampNTZ impossible when the flag is disabled.
[jira] [Assigned] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-42283:

Assignee: Venkata Sai Akhil Gudesa

> Add Simple Scala UDFs to Scala/JVM Client
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Venkata Sai Akhil Gudesa
> Assignee: Venkata Sai Akhil Gudesa
> Priority: Major
> Fix For: 3.4.0
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Resolved] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42283.

Fix Version/s: 3.4.0
Resolution: Fixed

> Add Simple Scala UDFs to Scala/JVM Client
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
> Fix For: 3.4.0
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Assigned] (SPARK-42228) connect-client-jvm module should shaded+relocation grpc
[ https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-42228:

Assignee: Yang Jie

> connect-client-jvm module should shaded+relocation grpc
>
> Key: SPARK-42228
> URL: https://issues.apache.org/jira/browse/SPARK-42228
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Blocker
[jira] [Resolved] (SPARK-42228) connect-client-jvm module should shaded+relocation grpc
[ https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42228.

Fix Version/s: 3.4.0
Resolution: Fixed

> connect-client-jvm module should shaded+relocation grpc
>
> Key: SPARK-42228
> URL: https://issues.apache.org/jira/browse/SPARK-42228
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Blocker
> Fix For: 3.4.0
[jira] [Assigned] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests
[ https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42284:

Assignee: Apache Spark

> Make sure Connect Server assembly jar is available before we run Scala Client tests
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
> Issue Type: Task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests
[ https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683098#comment-17683098 ] Apache Spark commented on SPARK-42284:

User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/39854

> Make sure Connect Server assembly jar is available before we run Scala Client tests
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
> Issue Type: Task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
[jira] [Assigned] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests
[ https://issues.apache.org/jira/browse/SPARK-42284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42284:

Assignee: (was: Apache Spark)

> Make sure Connect Server assembly jar is available before we run Scala Client tests
>
> Key: SPARK-42284
> URL: https://issues.apache.org/jira/browse/SPARK-42284
> Project: Spark
> Issue Type: Task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Priority: Major
[jira] [Created] (SPARK-42284) Make sure Connect Server assembly jar is available before we run Scala Client tests
Herman van Hövell created SPARK-42284:

Summary: Make sure Connect Server assembly jar is available before we run Scala Client tests
Key: SPARK-42284
URL: https://issues.apache.org/jira/browse/SPARK-42284
Project: Spark
Issue Type: Task
Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell
[jira] [Assigned] (SPARK-41985) Centralize more column resolution rules
[ https://issues.apache.org/jira/browse/SPARK-41985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41985:

Assignee: Wenchen Fan

> Centralize more column resolution rules
>
> Key: SPARK-41985
> URL: https://issues.apache.org/jira/browse/SPARK-41985
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
[jira] [Resolved] (SPARK-41985) Centralize more column resolution rules
[ https://issues.apache.org/jira/browse/SPARK-41985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41985.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 39508: https://github.com/apache/spark/pull/39508

> Centralize more column resolution rules
>
> Key: SPARK-41985
> URL: https://issues.apache.org/jira/browse/SPARK-41985
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 3.4.0
[jira] [Resolved] (SPARK-41488) Assign name to _LEGACY_ERROR_TEMP_1176
[ https://issues.apache.org/jira/browse/SPARK-41488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-41488.

Fix Version/s: 3.4.0
Resolution: Fixed

Issue resolved by pull request 39833: https://github.com/apache/spark/pull/39833

> Assign name to _LEGACY_ERROR_TEMP_1176
>
> Key: SPARK-41488
> URL: https://issues.apache.org/jira/browse/SPARK-41488
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.4.0
>
> We should assign a proper name to all LEGACY temp error classes.
[jira] [Assigned] (SPARK-41488) Assign name to _LEGACY_ERROR_TEMP_1176
[ https://issues.apache.org/jira/browse/SPARK-41488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-41488:

Assignee: Haejoon Lee

> Assign name to _LEGACY_ERROR_TEMP_1176
>
> Key: SPARK-41488
> URL: https://issues.apache.org/jira/browse/SPARK-41488
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
>
> We should assign a proper name to all LEGACY temp error classes.
[jira] [Updated] (SPARK-42229) Migrate SparkCoreErrors into error class
[ https://issues.apache.org/jira/browse/SPARK-42229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42229:

Fix Version/s: 3.4.0

> Migrate SparkCoreErrors into error class
>
> Key: SPARK-42229
> URL: https://issues.apache.org/jira/browse/SPARK-42229
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.4.0, 3.5.0
>
> Migrate core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala onto error classes.
[jira] [Updated] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
[ https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42239:

Fix Version/s: 3.4.0

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Fix For: 3.4.0, 3.5.0
[jira] [Commented] (SPARK-42281) Update Debugging PySpark documents to show error message properly
[ https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683041#comment-17683041 ] Apache Spark commented on SPARK-42281:

User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39852

> Update Debugging PySpark documents to show error message properly
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The example in https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark is outdated due to the new PySpark error framework. We should show a proper example.
[jira] [Assigned] (SPARK-42281) Update Debugging PySpark documents to show error message properly
[ https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42281:

Assignee: (was: Apache Spark)

> Update Debugging PySpark documents to show error message properly
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The example in https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark is outdated due to the new PySpark error framework. We should show a proper example.
[jira] [Commented] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683039#comment-17683039 ] Apache Spark commented on SPARK-42283:

User 'vicennial' has created a pull request for this issue: https://github.com/apache/spark/pull/39850

> Add Simple Scala UDFs to Scala/JVM Client
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Assigned] (SPARK-42281) Update Debugging PySpark documents to show error message properly
[ https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42281:

Assignee: Apache Spark

> Update Debugging PySpark documents to show error message properly
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Assignee: Apache Spark
> Priority: Major
>
> The example in https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark is outdated due to the new PySpark error framework. We should show a proper example.
[jira] [Assigned] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42283:

Assignee: (was: Apache Spark)

> Add Simple Scala UDFs to Scala/JVM Client
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Assigned] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42283:

Assignee: Apache Spark

> Add Simple Scala UDFs to Scala/JVM Client
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Venkata Sai Akhil Gudesa
> Assignee: Apache Spark
> Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Commented] (SPARK-42281) Update Debugging PySpark documents to show error message properly
[ https://issues.apache.org/jira/browse/SPARK-42281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683038#comment-17683038 ] Apache Spark commented on SPARK-42281:

User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39852

> Update Debugging PySpark documents to show error message properly
>
> Key: SPARK-42281
> URL: https://issues.apache.org/jira/browse/SPARK-42281
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
>
> The example in https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark is outdated due to the new PySpark error framework. We should show a proper example.
[jira] [Updated] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
[ https://issues.apache.org/jira/browse/SPARK-42283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Sai Akhil Gudesa updated SPARK-42283:

Description: “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
(was: “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “vanilla” UDF may only reference in-built libraries and classes defined within the scope of the UDF.)

> Add Simple Scala UDFs to Scala/JVM Client
>
> Key: SPARK-42283
> URL: https://issues.apache.org/jira/browse/SPARK-42283
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
>
> “Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “simple” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Created] (SPARK-42283) Add Simple Scala UDFs to Scala/JVM Client
Venkata Sai Akhil Gudesa created SPARK-42283:

Summary: Add Simple Scala UDFs to Scala/JVM Client
Key: SPARK-42283
URL: https://issues.apache.org/jira/browse/SPARK-42283
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Venkata Sai Akhil Gudesa

“Simple” here refers to UDFs that utilize no client-specific class files (e.g. REPL-generated) and JARs. Essentially, a “vanilla” UDF may only reference in-built libraries and classes defined within the scope of the UDF.
[jira] [Commented] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
[ https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683025#comment-17683025 ] Apache Spark commented on SPARK-42282:

User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39849

> Split 'pyspark.pandas.tests.test_groupby'
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
> Issue Type: Test
> Components: ps, Tests
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Minor
[jira] [Assigned] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
[ https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42282:

Assignee: Apache Spark

> Split 'pyspark.pandas.tests.test_groupby'
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
> Issue Type: Test
> Components: ps, Tests
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Apache Spark
> Priority: Minor
[jira] [Commented] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
[ https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683023#comment-17683023 ] Apache Spark commented on SPARK-42282:

User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39849

> Split 'pyspark.pandas.tests.test_groupby'
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
> Issue Type: Test
> Components: ps, Tests
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Minor
[jira] [Assigned] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
[ https://issues.apache.org/jira/browse/SPARK-42282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42282:

Assignee: (was: Apache Spark)

> Split 'pyspark.pandas.tests.test_groupby'
>
> Key: SPARK-42282
> URL: https://issues.apache.org/jira/browse/SPARK-42282
> Project: Spark
> Issue Type: Test
> Components: ps, Tests
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Minor
[jira] [Created] (SPARK-42282) Split 'pyspark.pandas.tests.test_groupby'
Ruifeng Zheng created SPARK-42282:

Summary: Split 'pyspark.pandas.tests.test_groupby'
Key: SPARK-42282
URL: https://issues.apache.org/jira/browse/SPARK-42282
Project: Spark
Issue Type: Test
Components: ps, Tests
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-42279) Simplify `pyspark.pandas.tests.test_resample`
[ https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-42279:

Summary: Simplify `pyspark.pandas.tests.test_resample` (was: Simplify `test_resample`)

> Simplify `pyspark.pandas.tests.test_resample`
>
> Key: SPARK-42279
> URL: https://issues.apache.org/jira/browse/SPARK-42279
> Project: Spark
> Issue Type: Test
> Components: ps, Tests
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Minor
[jira] [Commented] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683018#comment-17683018 ] Apache Spark commented on SPARK-42276:

User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39848

> Add ServicesResourceTransformer to connect server module shade configuration
>
> Key: SPARK-42276
> URL: https://issues.apache.org/jira/browse/SPARK-42276
> Project: Spark
> Issue Type: Bug
> Components: Build, Connect
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Yang Jie
> Priority: Minor
>
> The contents of the META-INF/services directory in the shaded connect-server jar have not been relocated.
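For context, `ServicesResourceTransformer` is the maven-shade-plugin transformer that rewrites class names inside `META-INF/services/*` files so that relocated service implementations remain discoverable via `ServiceLoader`. A minimal, illustrative shade-plugin fragment follows; the relocation pattern shown is an assumption for demonstration, not Spark's actual build configuration:

```xml
<!-- Illustrative maven-shade-plugin configuration (not Spark's actual pom).
     Without ServicesResourceTransformer, META-INF/services entries keep the
     original class names and ServiceLoader lookups fail after relocation. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Rewrites META-INF/services/* contents to the relocated names. -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
    <relocations>
      <relocation>
        <!-- Example relocation; the shadedPattern here is hypothetical. -->
        <pattern>io.grpc</pattern>
        <shadedPattern>org.sparkproject.connect.grpc</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```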
[jira] [Commented] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683017#comment-17683017 ] Apache Spark commented on SPARK-42276: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39848 > Add ServicesResourceTransformer to connect server module shade configuration > - > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42276: Assignee: (was: Apache Spark) > Add ServicesResourceTransformer to connect server module shade configuration > - > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42276: Assignee: Apache Spark > Add ServicesResourceTransformer to connect server module shade configuration > - > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module shade configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42276: - Summary: Add ServicesResourceTransformer to connect server module shade configuration (was: Add ServicesResourceTransformer to connect server module relocation configuration) > Add ServicesResourceTransformer to connect server module shade configuration > - > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
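For reference, ServicesResourceTransformer is a standard maven-shade-plugin transformer that rewrites the contents of META-INF/services files to match the plugin's relocation rules, which is exactly the gap described in SPARK-42276. A minimal sketch of the relevant plugin configuration follows; the relocation pattern shown is illustrative, not Spark's actual one:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Rewrites class names inside META-INF/services files
           so they follow the relocations declared below -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
    <relocations>
      <!-- Illustrative relocation pattern only -->
      <relocation>
        <pattern>io.grpc</pattern>
        <shadedPattern>org.sparkproject.connect.grpc</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Without the transformer, a service file in the shaded jar still names the original implementation classes, which no longer exist under their pre-relocation package after shading.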
[jira] [Created] (SPARK-42281) Update Debugging PySpark documents to show error message properly
Haejoon Lee created SPARK-42281: --- Summary: Update Debugging PySpark documents to show error message properly Key: SPARK-42281 URL: https://issues.apache.org/jira/browse/SPARK-42281 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Haejoon Lee The example in [https://spark.apache.org/docs/latest/api/python/development/debugging.html#debugging-pyspark] is outdated due to the new PySpark error framework. We should show a proper example. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42276: - Issue Type: Bug (was: Improvement) > Add ServicesResourceTransformer to connect server module relocation > configuration > -- > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42276: - Issue Type: Improvement (was: Bug) > Add ServicesResourceTransformer to connect server module relocation > configuration > -- > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42276: - Priority: Minor (was: Major) > Add ServicesResourceTransformer to connect server module relocation > configuration > -- > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42228) connect-client-jvm module should shade and relocate grpc
[ https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42228: - Priority: Blocker (was: Major) > connect-client-jvm module should shade and relocate grpc > --- > > Key: SPARK-42228 > URL: https://issues.apache.org/jira/browse/SPARK-42228 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42228) connect-client-jvm module should shade and relocate grpc
[ https://issues.apache.org/jira/browse/SPARK-42228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42228: - Affects Version/s: 3.4.0 > connect-client-jvm module should shade and relocate grpc > --- > > Key: SPARK-42228 > URL: https://issues.apache.org/jira/browse/SPARK-42228 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42278) DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves
[ https://issues.apache.org/jira/browse/SPARK-42278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42278: --- Assignee: jiaan.geng > DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves > > > Key: SPARK-42278 > URL: https://issues.apache.org/jira/browse/SPARK-42278 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, the DS V2 pushdown framework compiles SortOrder in a fixed format. > This is neither flexible nor friendly for databases that do not support that > syntax. > For example, the fixed format `order by col asc nulls first` fails on MS SQL > Server, which does not support `nulls first`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42278) DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves
[ https://issues.apache.org/jira/browse/SPARK-42278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42278. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39846 [https://github.com/apache/spark/pull/39846] > DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves > > > Key: SPARK-42278 > URL: https://issues.apache.org/jira/browse/SPARK-42278 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > Currently, the DS V2 pushdown framework compiles SortOrder in a fixed format. > This is neither flexible nor friendly for databases that do not support that > syntax. > For example, the fixed format `order by col asc nulls first` fails on MS SQL > Server, which does not support `nulls first`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
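To illustrate the dialect-specific compilation SPARK-42278 asks for, here is a small, hypothetical sketch (not Spark's actual JdbcDialect API; the function name and parameters are illustrative) of how a dialect without native `NULLS FIRST` support, such as MS SQL Server, could emulate the ordering with a CASE expression:

```python
# Hypothetical sketch of dialect-aware SortOrder compilation.
def compile_sort_order(column: str, ascending: bool = True,
                       nulls_first: bool = True,
                       supports_nulls_ordering: bool = True) -> str:
    direction = "ASC" if ascending else "DESC"
    if supports_nulls_ordering:
        # Dialects with native support can emit the fixed format directly.
        return f"{column} {direction} NULLS {'FIRST' if nulls_first else 'LAST'}"
    # Dialects like MS SQL Server emulate null ordering with a CASE
    # expression that ranks NULLs before (0) or after (1) non-NULL values.
    null_rank = 0 if nulls_first else 1
    return (f"CASE WHEN {column} IS NULL THEN {null_rank} "
            f"ELSE {1 - null_rank} END, {column} {direction}")
```

For example, `compile_sort_order("col")` yields `col ASC NULLS FIRST`, while `compile_sort_order("col", supports_nulls_ordering=False)` yields `CASE WHEN col IS NULL THEN 0 ELSE 1 END, col ASC`.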
[jira] [Created] (SPARK-42280) Add a spark.yarn.archive/jars-like option for Spark on K8S
Xianjin YE created SPARK-42280: -- Summary: Add a spark.yarn.archive/jars-like option for Spark on K8S Key: SPARK-42280 URL: https://issues.apache.org/jira/browse/SPARK-42280 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.3.1, 3.2.2 Reporter: Xianjin YE For Spark on YARN, `spark.yarn.archive` and `spark.yarn.jars` distribute the Spark runtime jars before the driver/executors start up. I'd like to propose similar functionality for Spark on K8S. The benefits are: # accelerating migration of workloads that use this feature from YARN to K8S # exploring a new version of Spark more easily without rebuilding the Spark image # currently, there is no other way to add additional/extension jars to executors on K8S before startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42276: - Description: The contents of META-INF/services directory in the shaded connect-server jar have not been relocated. > Add ServicesResourceTransformer to connect server module relocation > configuration > -- > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > The contents of META-INF/services directory in the shaded connect-server jar > have not been relocated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42279) Simplify `test_resample`
[ https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682990#comment-17682990 ] Apache Spark commented on SPARK-42279: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39847 > Simplify `test_resample` > > > Key: SPARK-42279 > URL: https://issues.apache.org/jira/browse/SPARK-42279 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42279) Simplify `test_resample`
[ https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42279: Assignee: Apache Spark > Simplify `test_resample` > > > Key: SPARK-42279 > URL: https://issues.apache.org/jira/browse/SPARK-42279 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42279) Simplify `test_resample`
[ https://issues.apache.org/jira/browse/SPARK-42279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42279: Assignee: (was: Apache Spark) > Simplify `test_resample` > > > Key: SPARK-42279 > URL: https://issues.apache.org/jira/browse/SPARK-42279 > Project: Spark > Issue Type: Test > Components: ps, Tests >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42276) Add ServicesResourceTransformer to connect server module relocation configuration
[ https://issues.apache.org/jira/browse/SPARK-42276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42276: - Summary: Add ServicesResourceTransformer to connect server module relocation configuration (was: Fix relocation configuration of connect server module) > Add ServicesResourceTransformer to connect server module relocation > configuration > -- > > Key: SPARK-42276 > URL: https://issues.apache.org/jira/browse/SPARK-42276 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42279) Simplify `test_resample`
Ruifeng Zheng created SPARK-42279: - Summary: Simplify `test_resample` Key: SPARK-42279 URL: https://issues.apache.org/jira/browse/SPARK-42279 Project: Spark Issue Type: Test Components: ps, Tests Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41241) When Hive and Spark SQL are both used to modify a table field comment, Hive's modification cannot be queried using Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-41241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weiliang hao updated SPARK-41241: - Description:
-- Hive
> create table table_test(id int);
> alter table table_test change column id id int comment "hive comment";
> desc formatted table_test;
{code:java}
+---+++
| col_name | data_type | comment |
+---+++
| # col_name | data_type | comment |
| id | int | hive comment |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| OwnerType: | USER | NULL |
| Owner: | anonymous | NULL |
| CreateTime: | Wed Nov 23 23:06:41 CST 2022 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://localhost:8020/warehouse/tablespace/managed/hive/table_test | NULL |
| Table Type: | MANAGED_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\"}} |
| | bucketing_version | 2 |
| | last_modified_by | anonymous |
| | last_modified_time | 1669216665 |
| | numFiles | 0 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 0 |
| | transactional | true |
| | transactional_properties | default |
| | transient_lastDdlTime | 1669216665 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
| Compressed: | No
[jira] [Assigned] (SPARK-42274) Upgrade `compress-lzf` to 1.1.2
[ https://issues.apache.org/jira/browse/SPARK-42274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42274: - Assignee: Dongjoon Hyun > Upgrade `compress-lzf` to 1.1.2 > --- > > Key: SPARK-42274 > URL: https://issues.apache.org/jira/browse/SPARK-42274 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42274) Upgrade `compress-lzf` to 1.1.2
[ https://issues.apache.org/jira/browse/SPARK-42274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42274. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39841 [https://github.com/apache/spark/pull/39841] > Upgrade `compress-lzf` to 1.1.2 > --- > > Key: SPARK-42274 > URL: https://issues.apache.org/jira/browse/SPARK-42274 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42259) ResolveGroupingAnalytics should take care of Python UDAF
[ https://issues.apache.org/jira/browse/SPARK-42259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42259: --- Assignee: Wenchen Fan > ResolveGroupingAnalytics should take care of Python UDAF > > > Key: SPARK-42259 > URL: https://issues.apache.org/jira/browse/SPARK-42259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42259) ResolveGroupingAnalytics should take care of Python UDAF
[ https://issues.apache.org/jira/browse/SPARK-42259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42259. - Fix Version/s: 3.2.4 3.3.2 3.4.0 Resolution: Fixed Issue resolved by pull request 39824 [https://github.com/apache/spark/pull/39824] > ResolveGroupingAnalytics should take care of Python UDAF > > > Key: SPARK-42259 > URL: https://issues.apache.org/jira/browse/SPARK-42259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.4, 3.3.2, 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42272) Use available ephemeral port for Spark Connect server in testing
[ https://issues.apache.org/jira/browse/SPARK-42272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42272: Assignee: Hyukjin Kwon > Use available ephemeral port for Spark Connect server in testing > > > Key: SPARK-42272 > URL: https://issues.apache.org/jira/browse/SPARK-42272 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Currently Spark Connect tests cannot run in parallel and require setting the > parallelism to 1 > {code} > python/run-tests --module pyspark-connect --parallelism 1 > {code} > The main reason is that the port being used is hardcoded to the default > 15002. We should instead search for an available port and use it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42272) Use available ephemeral port for Spark Connect server in testing
[ https://issues.apache.org/jira/browse/SPARK-42272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42272. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39834 [https://github.com/apache/spark/pull/39834] > Use available ephemeral port for Spark Connect server in testing > > > Key: SPARK-42272 > URL: https://issues.apache.org/jira/browse/SPARK-42272 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > Currently Spark Connect tests cannot run in parallel and require setting the > parallelism to 1 > {code} > python/run-tests --module pyspark-connect --parallelism 1 > {code} > The main reason is that the port being used is hardcoded to the default > 15002. We should instead search for an available port and use it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
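The ephemeral-port approach described in SPARK-42272 can be sketched in plain Python; `find_free_port` is an illustrative helper name, not the actual PySpark test utility:

```python
import socket

def find_free_port() -> int:
    """Ask the kernel for a free ephemeral port, then release it.

    Binding to port 0 lets the OS pick an unused port; the server under
    test then binds to the returned number. A small race remains between
    release and rebind, which is usually acceptable in test setups.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Each parallel test worker calls the helper at setup time, so no two workers contend for the hardcoded default port.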
[jira] [Commented] (SPARK-42278) DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves
[ https://issues.apache.org/jira/browse/SPARK-42278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682919#comment-17682919 ] Apache Spark commented on SPARK-42278: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/39846 > DS V2 pushdown supports JDBC dialects compiling `SortOrder` by themselves > > > Key: SPARK-42278 > URL: https://issues.apache.org/jira/browse/SPARK-42278 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, the DS V2 pushdown framework compiles SortOrder in a fixed format. > This is neither flexible nor friendly for databases that do not support that > syntax. > For example, the fixed format `order by col asc nulls first` fails on MS SQL > Server, which does not support `nulls first`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org