[jira] [Resolved] (SPARK-42288) Expose file path if reading failed
[ https://issues.apache.org/jira/browse/SPARK-42288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yikaifei resolved SPARK-42288. -- Resolution: Duplicate > Expose file path if reading failed > -- > > Key: SPARK-42288 > URL: https://issues.apache.org/jira/browse/SPARK-42288 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: yikaifei >Priority: Minor > > `MalformedInputException` may be thrown because the decompression failed when > reading the file. In this case, the error message does not contain the file > name. If the file name is included, it is easier to locate the problem. > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in > stage 15641.0 failed 10 times, most recent failure: Lost task 41.9 in stage > 15641.0 (TID 6287211) (hostname executor 58): > io.airlift.compress.MalformedInputException: Malformed input: offset=65075 > at > io.airlift.compress.snappy.SnappyRawDecompressor.uncompressAll(SnappyRawDecompressor.java:108) > at > io.airlift.compress.snappy.SnappyRawDecompressor.decompress(SnappyRawDecompressor.java:53) > at > io.airlift.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:45) > at > org.apache.orc.impl.AircompressorCodec.decompress(AircompressorCodec.java:94) > at org.apache.orc.impl.SnappyCodec.decompress(SnappyCodec.java:45) > at > org.apache.orc.impl.InStream$CompressedStream.readHeader(InStream.java:495) > at > org.apache.orc.impl.InStream$CompressedStream.ensureUncompressed(InStream.java:522) > at org.apache.orc.impl.InStream$CompressedStream.read(InStream.java:509) > at > org.apache.orc.impl.SerializationUtils.readRemainingLongs(SerializationUtils.java:1102) > at > org.apache.orc.impl.SerializationUtils.unrolledUnPackBytes(SerializationUtils.java:1094) > at > org.apache.orc.impl.SerializationUtils.unrolledUnPack32(SerializationUtils.java:1059) > at > 
org.apache.orc.impl.SerializationUtils.readInts(SerializationUtils.java:925) > at > org.apache.orc.impl.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:268) > at > org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:69) > at > org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323) > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:373) > at > org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:641) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047) > at > org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99) > at > org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:522) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.columnartorow_nextBatch_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.agg_doAggregateWithKeys_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at > 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:131) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:510) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:513)
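The improvement requested above — surfacing the file path when decompression fails mid-read — can be sketched as a small wrapper that rethrows with the path attached. This is a minimal illustration only: `FileErrorContext`, `withFilePath`, and the message wording are assumptions, not Spark's actual implementation.

```java
// Hedged sketch: attach the file path to any failure raised while reading a file.
// Class, method, and message wording are illustrative, not Spark's real API.
public class FileErrorContext {
    @FunctionalInterface
    public interface ThrowingSupplier<T> {
        T get() throws Exception;
    }

    public static <T> T withFilePath(String path, ThrowingSupplier<T> body) {
        try {
            return body.get();
        } catch (Exception e) {
            // Re-throw with the offending file name so the stage failure is actionable.
            throw new RuntimeException("Encountered error while reading file " + path, e);
        }
    }
}
```

Wrapping the per-file read loop this way would turn a bare `MalformedInputException: Malformed input: offset=65075` into an error that names the ORC file, which is what the ticket asks for.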
[jira] [Created] (SPARK-43228) Join keys also match PartitioningCollection
Yuming Wang created SPARK-43228: --- Summary: Join keys also match PartitioningCollection Key: SPARK-43228 URL: https://issues.apache.org/jira/browse/SPARK-43228 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43229) Support Barrier Python UDF
Ruifeng Zheng created SPARK-43229: - Summary: Support Barrier Python UDF Key: SPARK-43229 URL: https://issues.apache.org/jira/browse/SPARK-43229 Project: Spark Issue Type: New Feature Components: Connect, ML, PySpark Affects Versions: 3.5.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-43156) Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null`
[ https://issues.apache.org/jira/browse/SPARK-43156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714903#comment-17714903 ] ASF GitHub Bot commented on SPARK-43156: User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/40865 > Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null` > > > Key: SPARK-43156 > URL: https://issues.apache.org/jira/browse/SPARK-43156 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jack Chen >Priority: Major > > Example query: > {code:java} > spark.sql("select *, (select (count(1)) is null from t1 where t0.a = t1.c) > from t0").collect() > res6: Array[org.apache.spark.sql.Row] = Array([1,1.0,null], [2,2.0,false]) > {code} > In this subquery, count(1) always evaluates to a non-null integer value, so > count(1) is null is always false. The correct evaluation of the subquery is > always false. > We incorrectly evaluate it to null for empty groups. The reason is that > NullPropagation rewrites Aggregate [c] [isnull(count(1))] to Aggregate [c] > [false] - this rewrite would be correct normally, but in the context of a > scalar subquery it breaks our count bug handling in > RewriteCorrelatedScalarSubquery.constructLeftJoins . By the time we get > there, the query appears to not have the count bug - it looks the same as if > the original query had a subquery with select any_value(false) from r..., and > that case is _not_ subject to the count bug. 
> > Postgres comparison shows the correct always-false result: > [http://sqlfiddle.com/#!17/67822/5] > DDL for the example: > {code:java} > create or replace temp view t0 (a, b) > as values > (1, 1.0), > (2, 2.0); > create or replace temp view t1 (c, d) > as values > (2, 3.0); {code}
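The report's key premise — `count(1)` over an empty group is 0, a non-null value, so `count(1) IS NULL` can never be true — can be checked with a plain-Java mock of the correlated aggregate. The class and method names here are illustrative, not Spark code:

```java
// Hedged illustration of the COUNT semantics behind the bug report: COUNT over
// an empty group is 0, a non-null primitive, so `count(1) IS NULL` is always false.
public class CountBugDemo {
    // Per-t0-row COUNT(1) of matching t1.c values -- the correlated scalar
    // subquery's aggregate, computed over plain arrays instead of Spark.
    public static long countMatches(int a, int[] t1c) {
        long cnt = 0;
        for (int c : t1c) {
            if (c == a) cnt++; // COUNT(1) where t0.a = t1.c
        }
        return cnt; // 0 when the group is empty, never null
    }
}
```

With the ticket's data (t0.a in {1, 2}, t1.c = {2}), the counts are 0 and 1; neither is null, so the subquery should return false for both rows, matching the Postgres result rather than `Array([1,1.0,null], [2,2.0,false])`.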
[jira] [Commented] (SPARK-43229) Support Barrier Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714902#comment-17714902 ] ASF GitHub Bot commented on SPARK-43229: User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40896 > Support Barrier Python UDF > -- > > Key: SPARK-43229 > URL: https://issues.apache.org/jira/browse/SPARK-43229 > Project: Spark > Issue Type: New Feature > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-43128) Streaming progress struct (especially in Scala)
[ https://issues.apache.org/jira/browse/SPARK-43128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714904#comment-17714904 ] ASF GitHub Bot commented on SPARK-43128: User 'bogao007' has created a pull request for this issue: https://github.com/apache/spark/pull/40895 > Streaming progress struct (especially in Scala) > --- > > Key: SPARK-43128 > URL: https://issues.apache.org/jira/browse/SPARK-43128 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > Streaming spark connect transfers streaming progress as full “json”. > This works ok for Python since it does not have any schema defined. > But in Scala, it is a full fledged class. We need to decide if we want to > match legacy Progress struct in spark-connect.
[jira] [Created] (SPARK-43230) Simplify `DataFrameNaFunctions.fillna`
Ruifeng Zheng created SPARK-43230: - Summary: Simplify `DataFrameNaFunctions.fillna` Key: SPARK-43230 URL: https://issues.apache.org/jira/browse/SPARK-43230 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-43230) Simplify `DataFrameNaFunctions.fillna`
[ https://issues.apache.org/jira/browse/SPARK-43230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714906#comment-17714906 ] ASF GitHub Bot commented on SPARK-43230: User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40898 > Simplify `DataFrameNaFunctions.fillna` > -- > > Key: SPARK-43230 > URL: https://issues.apache.org/jira/browse/SPARK-43230 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Minor >
[jira] [Commented] (SPARK-43228) Join keys also match PartitioningCollection
[ https://issues.apache.org/jira/browse/SPARK-43228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714907#comment-17714907 ] Yuming Wang commented on SPARK-43228: - https://github.com/apache/spark/pull/40897 > Join keys also match PartitioningCollection > --- > > Key: SPARK-43228 > URL: https://issues.apache.org/jira/browse/SPARK-43228 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Commented] (SPARK-43228) Join keys also match PartitioningCollection
[ https://issues.apache.org/jira/browse/SPARK-43228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714909#comment-17714909 ] ASF GitHub Bot commented on SPARK-43228: User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/40897 > Join keys also match PartitioningCollection > --- > > Key: SPARK-43228 > URL: https://issues.apache.org/jira/browse/SPARK-43228 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Commented] (SPARK-43199) Make InlineCTE idempotent
[ https://issues.apache.org/jira/browse/SPARK-43199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714910#comment-17714910 ] ASF GitHub Bot commented on SPARK-43199: User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/40856 > Make InlineCTE idempotent > - > > Key: SPARK-43199 > URL: https://issues.apache.org/jira/browse/SPARK-43199 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major >
[jira] [Created] (SPARK-43231) Reduce the memory requirement in torch-related tests
Ruifeng Zheng created SPARK-43231: - Summary: Reduce the memory requirement in torch-related tests Key: SPARK-43231 URL: https://issues.apache.org/jira/browse/SPARK-43231 Project: Spark Issue Type: Test Components: Connect, ML, PySpark, Tests Affects Versions: 3.5.0 Reporter: Ruifeng Zheng
[jira] [Created] (SPARK-43232) Improve ObjectHashAggregateExec performance
XiDuo You created SPARK-43232: - Summary: Improve ObjectHashAggregateExec performance Key: SPARK-43232 URL: https://issues.apache.org/jira/browse/SPARK-43232 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: XiDuo You The `ObjectHashAggregateExec` has two performance issues: - heavy overhead of Scala sugar in `createNewAggregationBuffer` - unnecessary grouping key comparison if it falls back to the sort-based aggregator
[jira] [Updated] (SPARK-43232) Improve ObjectHashAggregateExec performance
[ https://issues.apache.org/jira/browse/SPARK-43232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-43232: -- Description: The `ObjectHashAggregateExec` has two performance issues: - heavy overhead of Scala sugar in `createNewAggregationBuffer` - unnecessary grouping key comparison after falling back to the sort-based aggregator was: The `ObjectHashAggregateExec` has two performance issues: - heavy overhead of Scala sugar in `createNewAggregationBuffer` - unnecessary grouping key comparison if it falls back to the sort-based aggregator > Improve ObjectHashAggregateExec performance > --- > > Key: SPARK-43232 > URL: https://issues.apache.org/jira/browse/SPARK-43232 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Major > > The `ObjectHashAggregateExec` has two performance issues: > - heavy overhead of Scala sugar in `createNewAggregationBuffer` > - unnecessary grouping key comparison after falling back to the sort-based > aggregator >
[jira] [Updated] (SPARK-42780) Upgrade google Tink to 1.9.0
[ https://issues.apache.org/jira/browse/SPARK-42780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen updated SPARK-42780: Summary: Upgrade google Tink to 1.9.0 (was: Upgrade google Tink from 1.7.0 to 1.8.0) > Upgrade google Tink to 1.9.0 > > > Key: SPARK-42780 > URL: https://issues.apache.org/jira/browse/SPARK-42780 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [SNYK-JAVA-COMGOOGLEPROTOBUF-3040284|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-3040284] > [SNYK-JAVA-COMGOOGLEPROTOBUF-3167772|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-3167772]
[jira] [Commented] (SPARK-42780) Upgrade google Tink to 1.9.0
[ https://issues.apache.org/jira/browse/SPARK-42780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714971#comment-17714971 ] Bjørn Jørgensen commented on SPARK-42780: - https://github.com/apache/spark/pull/40878 > Upgrade google Tink to 1.9.0 > > > Key: SPARK-42780 > URL: https://issues.apache.org/jira/browse/SPARK-42780 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [SNYK-JAVA-COMGOOGLEPROTOBUF-3040284|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-3040284] > [SNYK-JAVA-COMGOOGLEPROTOBUF-3167772|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-3167772]
[jira] [Resolved] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43142. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40794 [https://github.com/apache/spark/pull/40794] > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Priority: Major > Fix For: 3.5.0 > > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code}
[jira] [Assigned] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43142: --- Assignee: Willi Raschkowski > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Assignee: Willi Raschkowski >Priority: Major > Fix For: 3.5.0 > > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code}
[jira] [Commented] (SPARK-42330) Assign name to _LEGACY_ERROR_TEMP_2175
[ https://issues.apache.org/jira/browse/SPARK-42330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714992#comment-17714992 ] Koray Beyaz commented on SPARK-42330: - Working on this issue > Assign name to _LEGACY_ERROR_TEMP_2175 > -- > > Key: SPARK-42330 > URL: https://issues.apache.org/jira/browse/SPARK-42330 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major >
[jira] [Commented] (SPARK-43196) Replace reflection w/ direct calling for `ContainerLaunchContext#setTokensConf`
[ https://issues.apache.org/jira/browse/SPARK-43196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715030#comment-17715030 ] Ignite TC Bot commented on SPARK-43196: --- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40900 > Replace reflection w/ direct calling for > `ContainerLaunchContext#setTokensConf` > --- > > Key: SPARK-43196 > URL: https://issues.apache.org/jira/browse/SPARK-43196 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.5.0 > >
[jira] [Reopened] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reopened SPARK-43142: - > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Assignee: Willi Raschkowski >Priority: Major > Fix For: 3.5.0 > > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code}
[jira] [Updated] (SPARK-43142) DSL expressions fail on attribute with special characters
[ https://issues.apache.org/jira/browse/SPARK-43142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-43142: Fix Version/s: (was: 3.5.0) > DSL expressions fail on attribute with special characters > - > > Key: SPARK-43142 > URL: https://issues.apache.org/jira/browse/SPARK-43142 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Willi Raschkowski >Assignee: Willi Raschkowski >Priority: Major > > Expressions on implicitly converted attributes fail if the attributes have > names containing special characters. They fail even if the attributes are > backtick-quoted: > {code:java} > scala> import org.apache.spark.sql.catalyst.dsl.expressions._ > import org.apache.spark.sql.catalyst.dsl.expressions._ > scala> "`slashed/col`".attr > res0: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = > 'slashed/col > scala> "`slashed/col`".attr.asc > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '/' expecting {, '.', '-'}(line 1, pos 7) > == SQL == > slashed/col > ---^^^ > {code}
[jira] [Resolved] (SPARK-43179) Add option for applications to control saving of metadata in the External Shuffle Service LevelDB
[ https://issues.apache.org/jira/browse/SPARK-43179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-43179. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40843 [https://github.com/apache/spark/pull/40843] > Add option for applications to control saving of metadata in the External > Shuffle Service LevelDB > - > > Key: SPARK-43179 > URL: https://issues.apache.org/jira/browse/SPARK-43179 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.4.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.5.0 > > > Currently, the External Shuffle Service stores application metadata in > LevelDB. This is necessary to enable the shuffle server to resume serving > shuffle data for an application whose executors registered before the > NodeManager restarts. However, the metadata includes the application secret, > which is stored in LevelDB without encryption. This is a potential security > risk, particularly for applications with high security requirements. While > filesystem access control lists (ACLs) can help protect keys and > certificates, they may not be sufficient for some use cases. In response, we > have decided not to store metadata for these high-security applications in > LevelDB. As a result, these applications may experience more failures in the > event of a node restart, but we believe this trade-off is acceptable given > the increased security risk.
[jira] [Assigned] (SPARK-43179) Add option for applications to control saving of metadata in the External Shuffle Service LevelDB
[ https://issues.apache.org/jira/browse/SPARK-43179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-43179: --- Assignee: Chandni Singh > Add option for applications to control saving of metadata in the External > Shuffle Service LevelDB > - > > Key: SPARK-43179 > URL: https://issues.apache.org/jira/browse/SPARK-43179 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.4.0 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > Currently, the External Shuffle Service stores application metadata in > LevelDB. This is necessary to enable the shuffle server to resume serving > shuffle data for an application whose executors registered before the > NodeManager restarts. However, the metadata includes the application secret, > which is stored in LevelDB without encryption. This is a potential security > risk, particularly for applications with high security requirements. While > filesystem access control lists (ACLs) can help protect keys and > certificates, they may not be sufficient for some use cases. In response, we > have decided not to store metadata for these high-security applications in > LevelDB. As a result, these applications may experience more failures in the > event of a node restart, but we believe this trade-off is acceptable given > the increased security risk.
[jira] [Commented] (SPARK-43134) Add streaming query exception API in Scala
[ https://issues.apache.org/jira/browse/SPARK-43134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715148#comment-17715148 ] Wei Liu commented on SPARK-43134: - I'm working on this > Add streaming query exception API in Scala > -- > > Key: SPARK-43134 > URL: https://issues.apache.org/jira/browse/SPARK-43134 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major >
[jira] [Commented] (SPARK-43032) Add StreamingQueryManager API
[ https://issues.apache.org/jira/browse/SPARK-43032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715147#comment-17715147 ] Wei Liu commented on SPARK-43032: - [https://github.com/apache/spark/pull/40861] still draft > Add StreamingQueryManager API > - > > Key: SPARK-43032 > URL: https://issues.apache.org/jira/browse/SPARK-43032 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > Add StreamingQueryManager API. It would include API that can be directly > supported. API like registering streaming listeners will be handled separately.
[jira] [Commented] (SPARK-43143) Scala: Add StreamingQuery awaitTermination() API
[ https://issues.apache.org/jira/browse/SPARK-43143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715149#comment-17715149 ] Wei Liu commented on SPARK-43143: - I'm working on this > Scala: Add StreamingQuery awaitTermination() API > > > Key: SPARK-43143 > URL: https://issues.apache.org/jira/browse/SPARK-43143 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major >
[jira] [Commented] (SPARK-43206) Streaming query exception() also include stack trace
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715171#comment-17715171 ] Wei Liu commented on SPARK-43206: - I'll work on this. To myself: don't forget jvm exceptions > Streaming query exception() also include stack trace > > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > >
[jira] [Created] (SPARK-43233) Before batch reading from Kafka, log topic partition, offset range, etc, for debugging
Siying Dong created SPARK-43233: --- Summary: Before batch reading from Kafka, log topic partition, offset range, etc, for debugging Key: SPARK-43233 URL: https://issues.apache.org/jira/browse/SPARK-43233 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Siying Dong When debugging a slowness issue in Structured Streaming, it is hard to map a Kafka topic and partition to a Spark task. Adding some logging in the executor might help make it easier.
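A log line with enough context to map a slow task back to its Kafka source might look like the sketch below. The exact format is hypothetical; the ticket only asks that topic, partition, and offset range be logged before the batch read:

```python
def kafka_read_log_line(topic: str, partition: int,
                        start_offset: int, end_offset: int) -> str:
    """Build a per-task debug line carrying the Kafka coordinates the
    ticket proposes to log. Format is illustrative, not Spark's actual
    log message."""
    return (
        f"Reading Kafka batch: topic={topic} partition={partition} "
        f"offsets=[{start_offset}, {end_offset}) "
        f"records={end_offset - start_offset}"
    )

line = kafka_read_log_line("events", 41, 65000, 65075)
```

With a line like this in the executor log, a slow task id can be joined directly to a topic-partition and offset range.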
[jira] [Updated] (SPARK-43233) Before batch reading from Kafka, log topic partition, offset range, etc, for debugging
[ https://issues.apache.org/jira/browse/SPARK-43233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated SPARK-43233: Issue Type: Task (was: Improvement) > Before batch reading from Kafka, log topic partition, offset range, etc, for > debugging > - > > Key: SPARK-43233 > URL: https://issues.apache.org/jira/browse/SPARK-43233 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Siying Dong >Priority: Trivial > > When debugging a slowness issue in Structured Streaming, it is hard to map > a Kafka topic and partition to a Spark task. Adding some logging in the executor > might help make it easier.
[jira] [Commented] (SPARK-43206) Connect Better StreamingQueryException
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715178#comment-17715178 ] Wei Liu commented on SPARK-43206: - Also cause, offsets, stack trace... > Connect Better StreamingQueryException > -- > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > >
[jira] [Updated] (SPARK-43206) Connect Better StreamingQueryException
[ https://issues.apache.org/jira/browse/SPARK-43206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-43206: Summary: Connect Better StreamingQueryException (was: Streaming query exception() also include stack trace) > Connect Better StreamingQueryException > -- > > Key: SPARK-43206 > URL: https://issues.apache.org/jira/browse/SPARK-43206 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > > [https://github.com/apache/spark/pull/40785#issuecomment-1515522281] > >
[jira] [Commented] (SPARK-38114) Spark build fails in Windows
[ https://issues.apache.org/jira/browse/SPARK-38114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715180#comment-17715180 ] Felipe commented on SPARK-38114: Hi, this seems like a big issue. Has anybody found a workaround? > Spark build fails in Windows > > > Key: SPARK-38114 > URL: https://issues.apache.org/jira/browse/SPARK-38114 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: SOUVIK PAUL >Priority: Major > > java.lang.NoSuchMethodError: > org.fusesource.jansi.AnsiConsole.wrapOutputStream(Ljava/io/OutputStream;)Ljava/io/OutputStream; > jline.AnsiWindowsTerminal.detectAnsiSupport(AnsiWindowsTerminal.java:57) > jline.AnsiWindowsTerminal.<init>(AnsiWindowsTerminal.java:27) > > A similar issue is being faced by the Quarkus project with the latest Maven. > [https://github.com/quarkusio/quarkus/issues/19491] > > Upgrading the scala-maven-plugin seems to resolve the issue, but this ticket > can be a blocker > https://issues.apache.org/jira/browse/SPARK-36547
[jira] [Resolved] (SPARK-43174) Fix SparkSQLCLIDriver completer
[ https://issues.apache.org/jira/browse/SPARK-43174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-43174. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40838 [https://github.com/apache/spark/pull/40838] > Fix SparkSQLCLIDriver completer > --- > > Key: SPARK-43174 > URL: https://issues.apache.org/jira/browse/SPARK-43174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > >
[jira] [Assigned] (SPARK-43174) Fix SparkSQLCLIDriver completer
[ https://issues.apache.org/jira/browse/SPARK-43174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-43174: --- Assignee: Yuming Wang > Fix SparkSQLCLIDriver completer > --- > > Key: SPARK-43174 > URL: https://issues.apache.org/jira/browse/SPARK-43174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major >
[jira] [Resolved] (SPARK-43046) Implement dropDuplicatesWithinWatermark in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43046. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40834 [https://github.com/apache/spark/pull/40834] > Implement dropDuplicatesWithinWatermark in Spark Connect > > > Key: SPARK-43046 > URL: https://issues.apache.org/jira/browse/SPARK-43046 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > Fix For: 3.5.0 > > > Once SPARK-42931 has been merged, we will need to add the > dropDuplicatesWithinWatermark API to Spark Connect, for both Python and > Scala/Java.
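The semantics being exposed here can be sketched with a simplified, single-partition model: an event is dropped if its key was already seen, and a key's dedup state is evicted once the watermark (max event time minus the delay) passes it. This is an illustration of the contract only, not Spark's stateful implementation:

```python
def drop_duplicates_within_watermark(events, delay):
    """Simplified model of dropDuplicatesWithinWatermark.
    `events` is an iterable of (key, event_time) in arrival order;
    returns the events that survive deduplication."""
    seen = {}                     # key -> event time when first seen
    max_time = float("-inf")
    kept = []
    for key, t in events:
        max_time = max(max_time, t)
        watermark = max_time - delay
        # Evict dedup state that the watermark has passed.
        seen = {k: v for k, v in seen.items() if v >= watermark}
        if key not in seen:
            seen[key] = t
            kept.append((key, t))
    return kept

out = drop_duplicates_within_watermark(
    [("a", 1), ("a", 2), ("b", 3), ("a", 20)], delay=10)
```

Note the third "a" survives: by event time 20 the watermark (10) has evicted the state from time 1, which is exactly the bounded-state trade-off this API makes versus plain `dropDuplicates`.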
[jira] [Resolved] (SPARK-43082) Arrow-optimized Python UDFs in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43082. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40725 [https://github.com/apache/spark/pull/40725] > Arrow-optimized Python UDFs in Spark Connect > > > Key: SPARK-43082 > URL: https://issues.apache.org/jira/browse/SPARK-43082 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > > Implement Arrow-optimized Python UDFs in Spark Connect.
[jira] [Assigned] (SPARK-43082) Arrow-optimized Python UDFs in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43082: Assignee: Xinrong Meng > Arrow-optimized Python UDFs in Spark Connect > > > Key: SPARK-43082 > URL: https://issues.apache.org/jira/browse/SPARK-43082 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Implement Arrow-optimized Python UDFs in Spark Connect.
[jira] [Created] (SPARK-43234) Migrate ValueError from Connect DataFrame into error class
Haejoon Lee created SPARK-43234: --- Summary: Migrate ValueError from Connect DataFrame into error class Key: SPARK-43234 URL: https://issues.apache.org/jira/browse/SPARK-43234 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Haejoon Lee Migrate ValueError from Connect DataFrame into error class
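The migration pattern behind this ticket is to replace bare `ValueError("...")` raises with structured error classes: a central table of message templates, plus an exception type raised by class name and parameters. The sketch below mimics that framework with hypothetical templates; it is not PySpark's actual error-classes file or implementation:

```python
# Hypothetical message templates, standing in for PySpark's central
# error-classes table.
ERROR_CLASSES = {
    "CANNOT_BE_EMPTY": "At least one {item} must be given.",
    "NOT_A_COLUMN": "Argument `{arg_name}` should be a Column, got {arg_type}.",
}

class PySparkValueError(ValueError):
    """Sketch of an error-class-aware ValueError: carries a stable error
    class and parameters alongside the rendered message."""

    def __init__(self, error_class: str, message_parameters: dict):
        self.error_class = error_class
        self.message_parameters = message_parameters
        template = ERROR_CLASSES[error_class]
        super().__init__(template.format(**message_parameters))

try:
    raise PySparkValueError("CANNOT_BE_EMPTY", {"item": "column"})
except PySparkValueError as e:
    caught = e
```

The win over a bare `ValueError` is that callers and tests can match on the stable `error_class` instead of parsing message strings.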