[jira] [Updated] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down
[ https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37802: --- Fix Version/s: 3.2.1 (was: 3.2.0) > composite field name like `field name` doesn't work with Aggregate push down > > > Key: SPARK-37802 > URL: https://issues.apache.org/jira/browse/SPARK-37802 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.2.1, 3.3.0 > > > {code:java} > sql("SELECT SUM(`field name`) FROM h2.test.table") > org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'name' expecting (line 1, pos 9) > at > org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212) > at > org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41) > at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544) > at > org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377) > at > org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548) > at > org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467) > at org.antlr.v4.runtime.Parser.match(Parser.java:206) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
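The parse failure above comes from embedding an unquoted multipart name in generated SQL. A minimal sketch (an illustration, not the actual SPARK-37802 patch; the helper name is hypothetical) of the quoting step such a push-down path needs before re-parsing composite field names:

{code:java}
// Hypothetical helper: backtick-quote a field name when it is not a plain
// identifier, escaping embedded backticks by doubling them.
def quoteIfNeeded(part: String): String =
  if (part.matches("[a-zA-Z0-9_]+")) part
  else s"`${part.replace("`", "``")}`"

quoteIfNeeded("id")         // id
quoteIfNeeded("field name") // `field name` -- now parses as one identifier
{code}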
[jira] [Created] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
Yang Jie created SPARK-37853: Summary: Clean up deprecation compilation warning related to log4j2 Key: SPARK-37853 URL: https://issues.apache.org/jira/browse/SPARK-37853 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Yang Jie [WARNING] [Warn] /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | version=] method createAppender in class FileAppender is deprecated [WARNING] [Warn] /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | origin=org.apache.logging.log4j.core.appender.AbstractAppender. | version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
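For the second warning, log4j2 provides a non-deprecated five-argument {{AbstractAppender}} constructor that additionally takes a {{Property}} array; the first warning is addressed by moving from {{FileAppender.createAppender}} to the {{FileAppender.newBuilder()}} API, which the deprecation javadoc points to. A minimal sketch of the constructor migration (an illustration, not the actual Spark patch; the class name is hypothetical):

{code:java}
import org.apache.logging.log4j.core.LogEvent
import org.apache.logging.log4j.core.appender.AbstractAppender
import org.apache.logging.log4j.core.config.Property

import scala.collection.mutable.ArrayBuffer

// Passing Property.EMPTY_ARRAY selects the non-deprecated five-argument
// constructor instead of the deprecated four-argument one.
class CollectingAppender(name: String)
    extends AbstractAppender(name, null, null, true, Property.EMPTY_ARRAY) {
  val events = new ArrayBuffer[LogEvent]()
  // LogEvent instances may be reused by log4j2, so keep an immutable copy.
  override def append(event: LogEvent): Unit = events += event.toImmutable
}
{code}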
[jira] [Updated] (SPARK-37851) Mark org.apache.spark.sql.hive.execution as slow tests
[ https://issues.apache.org/jira/browse/SPARK-37851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37851: - Summary: Mark org.apache.spark.sql.hive.execution as slow tests (was: Mark TPCDS*Suite, SQLQuerySuite and org.apache.spark.sql.hive.execution as slow tests) > Mark org.apache.spark.sql.hive.execution as slow tests > -- > > Key: SPARK-37851 > URL: https://issues.apache.org/jira/browse/SPARK-37851 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > Related to SPARK-33171 and SPARK-32884. We should rebalance both: > "sql - slow tests": > https://github.com/apache/spark/runs/4755996273?check_suite_focus=true > "sql - other tests": > https://github.com/apache/spark/runs/4755996343?check_suite_focus=true > and > "hive - slow tests": > https://github.com/apache/spark/runs/4755996153?check_suite_focus=true > "hive - other tests": > https://github.com/apache/spark/runs/4755996212?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37851) Mark org.apache.spark.sql.hive.execution as slow tests
[ https://issues.apache.org/jira/browse/SPARK-37851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37851: - Description: Related to SPARK-33171 and SPARK-32884. We should rebalance both: "hive - slow tests": https://github.com/apache/spark/runs/4755996153?check_suite_focus=true "hive - other tests": https://github.com/apache/spark/runs/4755996212?check_suite_focus=true was: Related to SPARK-33171 and SPARK-32884. We should rebalance both: "sql - slow tests": https://github.com/apache/spark/runs/4755996273?check_suite_focus=true "sql - other tests": https://github.com/apache/spark/runs/4755996343?check_suite_focus=true and "hive - slow tests": https://github.com/apache/spark/runs/4755996153?check_suite_focus=true "hive - other tests": https://github.com/apache/spark/runs/4755996212?check_suite_focus=true > Mark org.apache.spark.sql.hive.execution as slow tests > -- > > Key: SPARK-37851 > URL: https://issues.apache.org/jira/browse/SPARK-37851 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > Related to SPARK-33171 and SPARK-32884. We should rebalance both: > "hive - slow tests": > https://github.com/apache/spark/runs/4755996153?check_suite_focus=true > "hive - other tests": > https://github.com/apache/spark/runs/4755996212?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471785#comment-17471785 ] Apache Spark commented on SPARK-37853: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35153 > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37853: Assignee: (was: Apache Spark) > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37853: Assignee: Apache Spark > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471786#comment-17471786 ] Apache Spark commented on SPARK-37853: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35153 > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37854) Use type match to simplify TestUtils#withHttpConnection
Yang Jie created SPARK-37854: Summary: Use type match to simplify TestUtils#withHttpConnection Key: SPARK-37854 URL: https://issues.apache.org/jira/browse/SPARK-37854 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37854) Use type match to simplify TestUtils#withHttpConnection
[ https://issues.apache.org/jira/browse/SPARK-37854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37854: Assignee: (was: Apache Spark) > Use type match to simplify TestUtils#withHttpConnection > --- > > Key: SPARK-37854 > URL: https://issues.apache.org/jira/browse/SPARK-37854 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37854) Use type match to simplify TestUtils#withHttpConnection
[ https://issues.apache.org/jira/browse/SPARK-37854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471799#comment-17471799 ] Apache Spark commented on SPARK-37854: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35154 > Use type match to simplify TestUtils#withHttpConnection > --- > > Key: SPARK-37854 > URL: https://issues.apache.org/jira/browse/SPARK-37854 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37854) Use type match to simplify TestUtils#withHttpConnection
[ https://issues.apache.org/jira/browse/SPARK-37854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37854: Assignee: Apache Spark > Use type match to simplify TestUtils#withHttpConnection > --- > > Key: SPARK-37854 > URL: https://issues.apache.org/jira/browse/SPARK-37854 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37854) Use type match to simplify TestUtils#withHttpConnection
[ https://issues.apache.org/jira/browse/SPARK-37854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37854: - Description: {code:java} if (connection.isInstanceOf[HttpsURLConnection]) { connection.asInstanceOf[HttpsURLConnection].setSSLSocketFactory(sslCtx.getSocketFactory()) connection.asInstanceOf[HttpsURLConnection].setHostnameVerifier(verifier) {code} > Use type match to simplify TestUtils#withHttpConnection > --- > > Key: SPARK-37854 > URL: https://issues.apache.org/jira/browse/SPARK-37854 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > if (connection.isInstanceOf[HttpsURLConnection]) { > > > connection.asInstanceOf[HttpsURLConnection].setSSLSocketFactory(sslCtx.getSocketFactory()) > > connection.asInstanceOf[HttpsURLConnection].setHostnameVerifier(verifier) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
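A sketch of the simplification the ticket proposes, reusing {{connection}}, {{sslCtx}}, and {{verifier}} from the snippet above: a single type pattern binds the downcast value, removing the paired {{isInstanceOf}}/{{asInstanceOf}} calls.

{code:java}
connection match {
  case https: javax.net.ssl.HttpsURLConnection =>
    https.setSSLSocketFactory(sslCtx.getSocketFactory())
    https.setHostnameVerifier(verifier)
  case _ => // a plain HttpURLConnection needs no TLS configuration
}
{code}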
[jira] [Assigned] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-37818: -- Assignee: PengLei > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-37818. Fix Version/s: 3.2.1 Resolution: Fixed Issue resolved by pull request 35107 [https://github.com/apache/spark/pull/35107] > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.3.0, 3.2.1 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471925#comment-17471925 ] Gengliang Wang commented on SPARK-37818: [~huaxingao] FYI I set the fixed version as 3.2.1. I saw there is a tag 3.2.1-rc1 already, so I will update the fixed version as 3.2.2 if this doc change can't make it on 3.2.1 > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.2.1, 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35442. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35149 [https://github.com/apache/spark/pull/35149] > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > Fix For: 3.3.0 > > > The Aggregate in AQE is different from the others; the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate's grouping expressions are not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
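To see why non-empty grouping expressions are the precondition, compare a grouped and a global aggregate over an empty input (a small illustration assuming an active {{SparkSession}} named {{spark}}, not the optimizer code itself):

{code:java}
import org.apache.spark.sql.functions.count

val empty = spark.range(10).filter("id < 0")  // an empty relation
empty.groupBy("id").count().count()           // 0 rows: emptiness propagates
empty.agg(count("*")).count()                 // 1 row (count = 0): it must not
{code}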
[jira] [Commented] (SPARK-35746) Task id in the Stage page timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-35746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471982#comment-17471982 ] Apache Spark commented on SPARK-35746: -- User 'stczwd' has created a pull request for this issue: https://github.com/apache/spark/pull/35155 > Task id in the Stage page timeline is incorrect > --- > > Key: SPARK-35746 > URL: https://issues.apache.org/jira/browse/SPARK-35746 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.1.2 >Reporter: shahid >Assignee: shahid >Priority: Minor > Attachments: image-2021-06-12-07-03-09-808.png > > > !image-2021-06-12-07-03-09-808.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35746) Task id in the Stage page timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-35746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471981#comment-17471981 ] Apache Spark commented on SPARK-35746: -- User 'stczwd' has created a pull request for this issue: https://github.com/apache/spark/pull/35155 > Task id in the Stage page timeline is incorrect > --- > > Key: SPARK-35746 > URL: https://issues.apache.org/jira/browse/SPARK-35746 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.1.2 >Reporter: shahid >Assignee: shahid >Priority: Minor > Attachments: image-2021-06-12-07-03-09-808.png > > > !image-2021-06-12-07-03-09-808.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37491) Fix Series.asof when values of the series is not sorted
[ https://issues.apache.org/jira/browse/SPARK-37491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472056#comment-17472056 ] pralabhkumar commented on SPARK-37491: -- Let's take the example of

{code:java}
pser = pd.Series([2, 1, np.nan, 4], index=[10, 20, 30, 40], name="Koalas")
{code}

pser.asof([5, 20]) gives the output [NaN, 1], while ps.from_pandas(pser).asof([5, 20]) gives the output [NaN, 2].

*Explanation*

This is the DataFrame created after applying the condition F.when(index_scol <= SF.lit(index).cast(index_type)), without applying the max aggregation:

{code}
+-----+------+-----------------+
|col_5|col_20|__index_level_0__|
+-----+------+-----------------+
| null|   2.0|               10|
| null|   1.0|               20|
| null|  null|               30|
| null|  null|               40|
+-----+------+-----------------+
{code}

Since we take max, the output comes out as 2. Ideally, what we need is the last non-null value of each column in increasing order of __index_level_0__.

To implement that logic, I plan to build the DataFrame below from the one above, using explode, partitioning, and row_number:

{code}
+-----------------+----------+-----+----------+
|__index_level_0__|identifier|value|row_number|
+-----------------+----------+-----+----------+
|               40|     col_5| null|         1|
|               30|     col_5| null|         2|
|               20|     col_5| null|         3|
|               10|     col_5| null|         4|
|               40|    col_20|    2|         1|
|               30|    col_20|    1|         2|
|               20|    col_20| null|         3|
|               10|    col_20| null|         4|
+-----------------+----------+-----+----------+
{code}

Then filter on row_number = 1. There are other details to take care of, but this is the bulk of the logic. Please let me know whether this is the right direction (it actually passes all the asof test cases, including the one described in this Jira). [~itholic] > Fix Series.asof when values of the series is not sorted > --- > > Key: SPARK-37491 > URL: https://issues.apache.org/jira/browse/SPARK-37491 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Priority: Major > > https://github.com/apache/spark/pull/34737#discussion_r758223279 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
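One way to express "last non-null value in index order" directly, as an alternative to the explode/row_number plan (a hedged sketch in the Scala DataFrame API over a hypothetical {{df}} shaped like the first table above, not the pandas-on-Spark patch):

{code:java}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, last}

// last(..., ignoreNulls = true) over an ordered, growing window carries the
// most recent non-null value forward; the final row then holds the asof answer.
val w = Window.orderBy("__index_level_0__")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)
val carried = df.withColumn("col_20_asof",
  last(col("col_20"), ignoreNulls = true).over(w))
{code}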
[jira] [Updated] (SPARK-37854) Use type match to simplify TestUtils#withHttpConnection
[ https://issues.apache.org/jira/browse/SPARK-37854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-37854: - Priority: Trivial (was: Major) > Use type match to simplify TestUtils#withHttpConnection > --- > > Key: SPARK-37854 > URL: https://issues.apache.org/jira/browse/SPARK-37854 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > > {code:java} > if (connection.isInstanceOf[HttpsURLConnection]) { > > > connection.asInstanceOf[HttpsURLConnection].setSSLSocketFactory(sslCtx.getSocketFactory()) > > connection.asInstanceOf[HttpsURLConnection].setHostnameVerifier(verifier) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37855) IllegalStateException when transforming an array inside a nested struct
G Muciaccia created SPARK-37855: --- Summary: IllegalStateException when transforming an array inside a nested struct Key: SPARK-37855 URL: https://issues.apache.org/jira/browse/SPARK-37855 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Environment: OS: Ubuntu 20.04.3 LTS Scala version: 2.12.12 Reporter: G Muciaccia

*NOTE*: this bug is only present in version {{3.2.0}}. Downgrading to {{3.1.2}} solves the problem.

h3. Prerequisites to reproduce the bug

# use Spark version 3.2.0
# create a DataFrame with an array field, which contains a struct field with a nested array field
# *apply a limit* to the DataFrame
# transform the outer array, renaming one of its fields
# transform the inner array too, which requires two {{getField}} in sequence

h3. Example that reproduces the bug

This is a minimal example (as minimal as I could make it) to reproduce the bug:

{code}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row}

def makeInput(): DataFrame = {
  val innerElement1 = Row(3, 3.12)
  val innerElement2 = Row(4, 2.1)
  val innerElement3 = Row(1, 985.2)
  val innerElement4 = Row(10, 757548.0)
  val innerElement5 = Row(1223, 0.665)

  val outerElement1 = Row(1, Row(List(innerElement1, innerElement2)))
  val outerElement2 = Row(2, Row(List(innerElement3)))
  val outerElement3 = Row(3, Row(List(innerElement4, innerElement5)))

  val data = Seq(
    Row("row1", List(outerElement1)),
    Row("row2", List(outerElement2, outerElement3)),
  )

  val schema = new StructType()
    .add("name", StringType)
    .add("outer_array", ArrayType(new StructType()
      .add("id", IntegerType)
      .add("inner_array_struct", new StructType()
        .add("inner_array", ArrayType(new StructType()
          .add("id", IntegerType)
          .add("value", DoubleType)
        ))
      )
    ))

  spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
}

// val df = makeInput()
val df = makeInput().limit(2)
// val df = makeInput().limit(2).cache()

val res = df.withColumn("extracted", transform(
  col("outer_array"),
  c1 => {
    struct(
      c1.getField("id").alias("outer_id"),
      transform(
        c1.getField("inner_array_struct").getField("inner_array"),
        c2 => {
          struct(
            c2.getField("value").alias("inner_value")
          )
        }
      )
    )
  }
))

res.printSchema()
res.show(false)
{code}

h4. Executing the example code

When executing it as-is, the execution will fail on the {{show}} statement, with

{code}
java.lang.IllegalStateException
Couldn't find _extract_inner_array#23 in [name#2,outer_array#3]
{code}

However, *if the limit is not applied, or if the DataFrame is cached after the limit, everything works* (you can uncomment the corresponding lines in the example to try it). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472181#comment-17472181 ] Huaxin Gao commented on SPARK-37818: [~Gengliang.Wang] I am drafting the 3.2.1 voting email now. I will need to change the fixed version to 3.2.2, otherwise, the list of bug fixes will contain this one. I will change this back to 3.2.1 if RC1 doesn't pass. > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.2.1, 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37818: --- Fix Version/s: (was: 3.2.1) > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472183#comment-17472183 ] Huaxin Gao commented on SPARK-37818: [~Gengliang.Wang] version 3.2.2 doesn't exist yet. I will just set the version to 3.3.0 for now. Will update the version to 3.2.2 later. > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37856) Executor pods keep existing if driver container was restarted
Denis Krivenko created SPARK-37856: -- Summary: Executor pods keep existing if driver container was restarted Key: SPARK-37856 URL: https://issues.apache.org/jira/browse/SPARK-37856 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.2.0, 3.1.2 Environment: * Kubernetes 1.20 * Spark 3.1.2 * Hadoop 3.2.0 * Java 11 * Scala 2.12 and * Kubernetes 1.20 * Spark 3.2.0 * Hadoop 3.3.1 * Java 11 * Scala 2.12 Reporter: Denis Krivenko I run Spark Thrift Server on Kubernetes cluster, so the driver pod runs continuously and it creates and manages executor pods. From time to time OOM issue occurs on a driver pod or executor pods. When it happens on * executor - the executor pod is getting deleted and the driver creates a new executor pod instead. It works as expected. * driver - Kubernetes restarts the driver container and the driver creates new executor pods. All previous executors stop, but still exist with *Error* state for Spark 3.1.2 or with *Completed* state for Spark 3.2.0 The behavior can be reproduced by restarting a pod container with the command {code:java} kubectl exec POD_NAME -c CONTAINER_NAME -- /sbin/killall5{code} Property _spark.kubernetes.executor.deleteOnTermination_ is set to *true* by default. If I delete driver pod all executor pods (in any state) are also deleted completely. +Pod list+ {code:java} NAME READY STATUS RESTARTS AGE spark-thrift-server-85cf5d689b-vvrwd 1/1 Running 1 3d15h spark-thrift-server-198cc57e3f9a7400-exec-10 1/1 Running 0 86m spark-thrift-server-198cc57e3f9a7400-exec-6 1/1 Running 0 12h spark-thrift-server-198cc57e3f9a7400-exec-8 1/1 Running 0 9h spark-thrift-server-198cc57e3f9a7400-exec-9 1/1 Running 0 3h12m spark-thrift-server-1a9aee7e31f36eea-exec-17 0/1 Completed 0 38h spark-thrift-server-1a9aee7e31f36eea-exec-18 0/1 Completed 0 38h spark-thrift-server-1a9aee7e31f36eea-exec-19 0/1 Completed 0 36h spark-thrift-server-1a9aee7e31f36eea-exec-21 0/1 Completed 0 24h {code} +Driver pod+ {code:java} apiVersion: v1 kind: Pod metadata: name: spark-thrift-server-85cf5d689b-vvrwd uid: b69a7c68-a767-4e3b-939c-061347b1c25e spec: ... status: containerStatuses: - containerID: containerd://7206acf424aa30b6f8533c0e32c99ebfdc5ee80648e76289f6bd2f87460ddcd3 image: xxx/spark:3.2.0 lastState: terminated: containerID: containerd://fe3cacb8e6470ac37dcd50d525ae3d54c8b6bfef3558325bc22e7b40daab1703 exitCode: 143 finishedAt: "2022-01-09T16:09:50Z" reason: OOMKilled startedAt: "2022-01-07T00:32:21Z" name: spark-thrift-server ready: true restartCount: 1 started: true state: running: startedAt: "2022-01-09T16:09:51Z" {code} Executor pod {code:java} apiVersion: v1 kind: Pod metadata: name: spark-thrift-server-1a9aee7e31f36eea-exec-17 ownerReferences: - apiVersion: v1 controller: true kind: Pod name: spark-thrift-server-85cf5d689b-vvrwd uid: b69a7c68-a767-4e3b-939c-061347b1c25e spec: ... status: containerStatuses: - containerID: containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19 image: xxx/spark:3.2.0 lastState: {} name: spark-kubernetes-executor ready: false restartCount: 0 started: false state: terminated: containerID: containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19 exitCode: 0 finishedAt: "2022-01-09T16:08:57Z" reason: Completed startedAt: "2022-01-09T01:39:15Z" {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37856) Executor pods keep existing if driver container was restarted
[ https://issues.apache.org/jira/browse/SPARK-37856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Krivenko updated SPARK-37856: --- Environment: Kubernetes 1.20 | Spark 3.1.2 | Hadoop 3.2.0 | Java 11 | Scala 2.12 Kubernetes 1.20 | Spark 3.2.0 | Hadoop 3.3.1 | Java 11 | Scala 2.12 was: * Kubernetes 1.20 * Spark 3.1.2 * Hadoop 3.2.0 * Java 11 * Scala 2.12 and * Kubernetes 1.20 * Spark 3.2.0 * Hadoop 3.3.1 * Java 11 * Scala 2.12 > Executor pods keep existing if driver container was restarted > - > > Key: SPARK-37856 > URL: https://issues.apache.org/jira/browse/SPARK-37856 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.2, 3.2.0 > Environment: Kubernetes 1.20 | Spark 3.1.2 | Hadoop 3.2.0 | Java 11 | > Scala 2.12 > Kubernetes 1.20 | Spark 3.2.0 | Hadoop 3.3.1 | Java 11 | Scala 2.12 >Reporter: Denis Krivenko >Priority: Minor > > I run Spark Thrift Server on Kubernetes cluster, so the driver pod runs > continuously and it creates and manages executor pods. From time to time OOM > issue occurs on a driver pod or executor pods. > When it happens on > * executor - the executor pod is getting deleted and the driver creates a > new executor pod instead. It works as expected. > * driver - Kubernetes restarts the driver container and the driver > creates new executor pods. All previous executors stop, but still exist with > *Error* state for Spark 3.1.2 or with *Completed* state for Spark 3.2.0 > The behavior can be reproduced by restarting a pod container with the command > {code:java} > kubectl exec POD_NAME -c CONTAINER_NAME -- /sbin/killall5{code} > Property _spark.kubernetes.executor.deleteOnTermination_ is set to *true* by > default. > If I delete driver pod all executor pods (in any state) are also deleted > completely. > +Pod list+ > {code:java} > NAME READY STATUS RESTARTS > AGE > spark-thrift-server-85cf5d689b-vvrwd 1/1 Running 1 > 3d15h > spark-thrift-server-198cc57e3f9a7400-exec-10 1/1 Running 0 > 86m > spark-thrift-server-198cc57e3f9a7400-exec-6 1/1 Running 0 > 12h > spark-thrift-server-198cc57e3f9a7400-exec-8 1/1 Running 0 > 9h > spark-thrift-server-198cc57e3f9a7400-exec-9 1/1 Running 0 > 3h12m > spark-thrift-server-1a9aee7e31f36eea-exec-17 0/1 Completed 0 > 38h > spark-thrift-server-1a9aee7e31f36eea-exec-18 0/1 Completed 0 > 38h > spark-thrift-server-1a9aee7e31f36eea-exec-19 0/1 Completed 0 > 36h > spark-thrift-server-1a9aee7e31f36eea-exec-21 0/1 Completed 0 > 24h > {code} > +Driver pod+ > {code:java} > apiVersion: v1 > kind: Pod > metadata: > name: spark-thrift-server-85cf5d689b-vvrwd > uid: b69a7c68-a767-4e3b-939c-061347b1c25e > spec: > ... > status: > containerStatuses: > - containerID: > containerd://7206acf424aa30b6f8533c0e32c99ebfdc5ee80648e76289f6bd2f87460ddcd3 > image: xxx/spark:3.2.0 > lastState: > terminated: > containerID: > containerd://fe3cacb8e6470ac37dcd50d525ae3d54c8b6bfef3558325bc22e7b40daab1703 > exitCode: 143 > finishedAt: "2022-01-09T16:09:50Z" > reason: OOMKilled > startedAt: "2022-01-07T00:32:21Z" > name: spark-thrift-server > ready: true > restartCount: 1 > started: true > state: > running: > startedAt: "2022-01-09T16:09:51Z" {code} > Executor pod > {code:java} > apiVersion: v1 > kind: Pod > metadata: > name: spark-thrift-server-1a9aee7e31f36eea-exec-17 > ownerReferences: > - apiVersion: v1 > controller: true > kind: Pod > name: spark-thrift-server-85cf5d689b-vvrwd > uid: b69a7c68-a767-4e3b-939c-061347b1c25e > spec: > ... 
> status: > containerStatuses: > - containerID: > containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19 > image: xxx/spark:3.2.0 > lastState: {} > name: spark-kubernetes-executor > ready: false > restartCount: 0 > started: false > state: > terminated: > containerID: > containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19 > exitCode: 0 > finishedAt: "2022-01-09T16:08:57Z" > reason: Completed > startedAt: "2022-01-09T01:39:15Z" {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37857) Any depth search not working in get_json_object ($..foo)
Venkata Subramaniam created SPARK-37857: --- Summary: Any depth search not working in get_json_object ($..foo) Key: SPARK-37857 URL: https://issues.apache.org/jira/browse/SPARK-37857 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Venkata Subramaniam The following example should return a value _abc_ but instead returns null {code:java} spark.sql("""select get_json_object('{"k":{"value":"abc"}}', '$..value') as j""").show() {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
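For contrast, spelling out the path returns the value, which isolates the failure to the any-depth ({{$..}}) operator:

{code:java}
// Explicit path works; only the recursive-descent form returns null.
spark.sql("""select get_json_object('{"k":{"value":"abc"}}', '$.k.value') as j""").show()
// +---+
// |  j|
// +---+
// |abc|
// +---+
{code}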
[jira] [Commented] (SPARK-35262) Memory leak when dataset is being persisted
[ https://issues.apache.org/jira/browse/SPARK-35262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472248#comment-17472248 ] Denis Krivenko commented on SPARK-35262: [~iamelin] Could you please check/confirm the issue still exists in 3.2.0? > Memory leak when dataset is being persisted > --- > > Key: SPARK-35262 > URL: https://issues.apache.org/jira/browse/SPARK-35262 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Igor Amelin >Priority: Major > > If a Java or Scala application with a SparkSession runs for a long time and > persists a lot of datasets, it can crash because of a memory leak. > I've noticed the following. When we have a dataset and persist it, the > SparkSession used to load that dataset is cloned in CacheManager, and this > clone is added as a listener to `listenersPlusTimers` in `ListenerBus`. But > this clone isn't removed from the list of listeners after that, e.g. on > unpersisting the dataset. If we persist a lot of datasets, the SparkSession > is cloned and added to `ListenerBus` many times. This leads to a memory leak > since the `listenersPlusTimers` list becomes very large. > I've found out that the SparkSession is cloned in CacheManager when the > parameters `spark.sql.sources.bucketing.autoBucketedScan.enabled` and > `spark.sql.adaptive.enabled` are true. The first one is true by default, and > this default behavior leads to the problem. When auto bucketed scan is > disabled, the SparkSession isn't cloned, and there are no duplicates in > ListenerBus, so the memory leak doesn't occur. > Here is a small Java application to reproduce the memory leak: > [https://github.com/iamelin/spark-memory-leak] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
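A one-line mitigation implied by the description above (this disables the optimization that triggers the clone; it does not fix the underlying leak):

{code:java}
spark.conf.set("spark.sql.sources.bucketing.autoBucketedScan.enabled", "false")
{code}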
[jira] [Created] (SPARK-37858) Wrap Java exceptions from aes functions
Max Gekk created SPARK-37858: Summary: Wrap Java exceptions from aes functions Key: SPARK-37858 URL: https://issues.apache.org/jira/browse/SPARK-37858 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk Currently, Spark SQL can throw Java exceptions from the aes_encrypt()/aes_decrypt() functions, for instance: {code:java} java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch! at org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93) at org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: javax.crypto.AEADBadTagException: Tag mismatch! at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620) at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116) at com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053) at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853) at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446) at javax.crypto.Cipher.doFinal(Cipher.java:2226) at org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87) ... 19 more {code} That might confuse non-Scala/Java users. Need to wrap such kind of exception by Spark's exception. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
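A minimal sketch of the proposed wrapping ({{SparkAesException}} is a hypothetical name for illustration, not an actual Spark class):

{code:java}
import javax.crypto.{AEADBadTagException, Cipher}

class SparkAesException(msg: String, cause: Throwable)
  extends RuntimeException(msg, cause)

// Translate the JCE exception into a Spark-facing one with an actionable message.
def doFinalChecked(cipher: Cipher, bytes: Array[Byte]): Array[Byte] =
  try cipher.doFinal(bytes)
  catch {
    case e: AEADBadTagException =>
      throw new SparkAesException(
        "AES-GCM tag mismatch: wrong key or corrupted ciphertext", e)
  }
{code}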
[jira] [Updated] (SPARK-37858) Wrap Java exceptions from aes functions
[ https://issues.apache.org/jira/browse/SPARK-37858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-37858: - Issue Type: Improvement (was: Bug) > Wrap Java exceptions from aes functions > --- > > Key: SPARK-37858 > URL: https://issues.apache.org/jira/browse/SPARK-37858 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Currently, Spark SQL can throw Java exceptions from the > aes_encrypt()/aes_decrypt() functions, for instance: > {code:java} > java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch! > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: javax.crypto.AEADBadTagException: Tag mismatch! > at > com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620) > at > com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116) > at > com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053) > at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853) > at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446) > at javax.crypto.Cipher.doFinal(Cipher.java:2226) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87) > ... 19 more > {code} > That might confuse non-Scala/Java users. Need to wrap such kind of exception > by Spark's exception. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36644) Push down boolean column filter
[ https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472269#comment-17472269 ] Apache Spark commented on SPARK-36644: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/35156 > Push down boolean column filter > --- > > Key: SPARK-36644 > URL: https://issues.apache.org/jira/browse/SPARK-36644 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.3.0 > > > The following query does not push down the filter > ``` > SELECT * FROM t WHERE boolean_field > ``` > although the following query pushes down the filter as expected. > ``` > SELECT * FROM t WHERE boolean_field = true > ``` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
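One plausible shape of the fix (an illustration under assumptions, not the actual PR): normalize a bare boolean attribute in a filter into an explicit comparison so the existing push-down translation can match it.

{code:java}
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, EqualTo, Expression, Literal}
import org.apache.spark.sql.types.BooleanType

// Rewrite `WHERE boolean_field` as `WHERE boolean_field = true`.
def normalize(pred: Expression): Expression = pred match {
  case a: AttributeReference if a.dataType == BooleanType => EqualTo(a, Literal(true))
  case other => other
}
{code}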
[jira] [Updated] (SPARK-36782) Deadlock between map-output-dispatcher and dispatcher-BlockManagerMaster upon migrating shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-36782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-36782: - Fix Version/s: 3.1.3 > Deadlock between map-output-dispatcher and dispatcher-BlockManagerMaster upon > migrating shuffle blocks > -- > > Key: SPARK-36782 > URL: https://issues.apache.org/jira/browse/SPARK-36782 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0 >Reporter: Fabian Thiele >Assignee: Fabian Thiele >Priority: Major > Fix For: 3.2.0, 3.1.3 > > Attachments: > 0001-Add-test-showing-that-decommission-might-deadlock.patch, > spark_stacktrace_deadlock.txt > > > I can observe a deadlock on the driver that can be triggered rather reliably > in a job with a larger amount of tasks - upon using > {code:java} > spark.decommission.enabled: true > spark.storage.decommission.rddBlocks.enabled: true > spark.storage.decommission.shuffleBlocks.enabled: true > spark.storage.decommission.enabled: true{code} > > It origins in the {{dispatcher-BlockManagerMaster}} making a call to > {{updateBlockInfo}} when shuffles are migrated. This is not performed by a > thread from the pool but instead by the {{dispatcher-BlockManagerMaster}} > itself. I suppose this was done under the assumption that this would be very > fast. However if the block that is updated is a shuffle index block it calls > {code:java} > mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId){code} > for which it waits to acquire a write lock as part of the > {{MapOutputTracker}}. > If the timing is bad then one of the {{map-output-dispatchers}} are holding > this lock as part of e.g. {{serializedMapStatus}}. In this function > {{MapOutputTracker.serializeOutputStatuses}} is called and as part of that we > do > {code:java} > if (arrSize >= minBroadcastSize) { > // Use broadcast instead. > // Important arr(0) is the tag == DIRECT, ignore that while deserializing ! > // arr is a nested Array so that it can handle over 2GB serialized data > val arr = chunkedByteBuf.getChunks().map(_.array()) > val bcast = broadcastManager.newBroadcast(arr, isLocal){code} > which makes an RPC call to {{dispatcher-BlockManagerMaster}}. That one > however is unable to answer as it is blocked while waiting on the > aforementioned lock. Hence the deadlock. The ingredients of this deadlock are > therefore: sufficient size of the array to go the broadcast-path, as well as > timing of incoming {{updateBlockInfo}} call as happens regularly during > decommissioning. Potentially earlier versions than 3.1.0 are affected but I > could not sufficiently conclude that. > I have a stacktrace of all driver threads showing the deadlock: > [^spark_stacktrace_deadlock.txt] > A coworker of mine wrote a patch that replicates the issue as a test case as > well: [^0001-Add-test-showing-that-decommission-might-deadlock.patch] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
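The deadlock described above reduces to a classic lock-plus-synchronous-reply cycle. An illustrative reduction (plain JDK concurrency, not Spark code): thread A plays the map-output dispatcher holding the lock while waiting on an "RPC" reply, and thread B plays {{dispatcher-BlockManagerMaster}}, which must take the write lock before it can answer.

{code:java}
import java.util.concurrent.SynchronousQueue
import java.util.concurrent.locks.ReentrantReadWriteLock

val lock = new ReentrantReadWriteLock()
val rpc  = new SynchronousQueue[String]()

val a = new Thread(() => {
  lock.readLock().lock()           // serializedMapStatus: holds the lock...
  try rpc.take()                   // ...then waits for the broadcast RPC reply
  finally lock.readLock().unlock()
})
val b = new Thread(() => {
  lock.writeLock().lock()          // updateMapOutput: wants the lock first
  try rpc.put("reply")             // the reply that would unblock thread A
  finally lock.writeLock().unlock()
})
a.start(); b.start()               // whichever locks first, both block forever
{code}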
[jira] [Updated] (SPARK-37858) Throw Spark exceptions from AES functions
[ https://issues.apache.org/jira/browse/SPARK-37858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-37858: - Summary: Throw Spark exceptions from AES functions (was: Wrap Java exceptions from aes functions) > Throw Spark exceptions from AES functions > - > > Key: SPARK-37858 > URL: https://issues.apache.org/jira/browse/SPARK-37858 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Currently, Spark SQL can throw Java exceptions from the > aes_encrypt()/aes_decrypt() functions, for instance: > {code:java} > java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch! > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: javax.crypto.AEADBadTagException: Tag mismatch! > at > com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620) > at > com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116) > at > com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053) > at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853) > at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446) > at javax.crypto.Cipher.doFinal(Cipher.java:2226) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87) > ... 19 more > {code} > That might confuse non-Scala/Java users. Need to wrap such kind of exception > by Spark's exception. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37858) Throw Spark exceptions from AES functions
[ https://issues.apache.org/jira/browse/SPARK-37858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37858: Assignee: Apache Spark (was: Max Gekk) > Throw Spark exceptions from AES functions > - > > Key: SPARK-37858 > URL: https://issues.apache.org/jira/browse/SPARK-37858 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Currently, Spark SQL can throw Java exceptions from the > aes_encrypt()/aes_decrypt() functions, for instance: > {code:java} > java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch! > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: javax.crypto.AEADBadTagException: Tag mismatch! > at > com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620) > at > com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116) > at > com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053) > at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853) > at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446) > at javax.crypto.Cipher.doFinal(Cipher.java:2226) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87) > ... 19 more > {code} > That might confuse non-Scala/Java users. Need to wrap such kind of exception > by Spark's exception. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37858) Throw Spark exceptions from AES functions
[ https://issues.apache.org/jira/browse/SPARK-37858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472294#comment-17472294 ] Apache Spark commented on SPARK-37858: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/35157 > Throw Spark exceptions from AES functions > - > > Key: SPARK-37858 > URL: https://issues.apache.org/jira/browse/SPARK-37858 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Currently, Spark SQL can throw Java exceptions from the > aes_encrypt()/aes_decrypt() functions, for instance: > {code:java} > java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch! > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: javax.crypto.AEADBadTagException: Tag mismatch! > at > com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620) > at > com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116) > at > com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053) > at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853) > at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446) > at javax.crypto.Cipher.doFinal(Cipher.java:2226) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87) > ... 19 more > {code} > That might confuse non-Scala/Java users. Need to wrap such kind of exception > by Spark's exception. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37858) Throw Spark exceptions from AES functions
[ https://issues.apache.org/jira/browse/SPARK-37858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37858: Assignee: Max Gekk (was: Apache Spark) > Throw Spark exceptions from AES functions > - > > Key: SPARK-37858 > URL: https://issues.apache.org/jira/browse/SPARK-37858 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Currently, Spark SQL can throw Java exceptions from the > aes_encrypt()/aes_decrypt() functions, for instance: > {code:java} > java.lang.RuntimeException: javax.crypto.AEADBadTagException: Tag mismatch! > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:93) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesDecrypt(ExpressionImplUtils.java:43) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:354) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:136) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: javax.crypto.AEADBadTagException: Tag mismatch! > at > com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:620) > at > com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116) > at > com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053) > at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853) > at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446) > at javax.crypto.Cipher.doFinal(Cipher.java:2226) > at > org.apache.spark.sql.catalyst.expressions.ExpressionImplUtils.aesInternal(ExpressionImplUtils.java:87) > ... 19 more > {code} > That might confuse non-Scala/Java users. Need to wrap such kind of exception > by Spark's exception. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37852) Enable flake8's E741 rule in PySpark
[ https://issues.apache.org/jira/browse/SPARK-37852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37852: -- Assignee: Hyukjin Kwon > Enable flake8's E741 rule in PySpark > --- > > Key: SPARK-37852 > URL: https://issues.apache.org/jira/browse/SPARK-37852 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > To comply with PEP 8, we should enable this rule (see also > https://www.python.org/dev/peps/pep-0008/#names-to-avoid) -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37852) Enable flake8's E741 rule in PySpark
[ https://issues.apache.org/jira/browse/SPARK-37852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37852. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35152 [https://github.com/apache/spark/pull/35152] > Enable flake8's E741 rule in PySpark > --- > > Key: SPARK-37852 > URL: https://issues.apache.org/jira/browse/SPARK-37852 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > To comply with PEP 8, we should enable this rule (see also > https://www.python.org/dev/peps/pep-0008/#names-to-avoid) -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
Karen Feng created SPARK-37859: -- Summary: SQL tables created with JDBC with Spark 3.1 are not readable with 3.2 Key: SPARK-37859 URL: https://issues.apache.org/jira/browse/SPARK-37859 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Karen Feng In https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312, a new metadata field is added during reading. As we do a full comparison of the user-provided schema and the actual schema in https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356, resolution fails if a table created with Spark 3.1 is read with Spark 3.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
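The failure mode above is easy to see with plain schema objects: StructField equality includes the metadata map, so a schema that gains a metadata entry on read no longer equals the user-provided schema even though the types match. A small sketch (the metadata key below is made up for illustration; the real key is whatever JdbcUtils adds):

{code:java}
import org.apache.spark.sql.types._

val userProvided = StructType(Seq(StructField("id", LongType)))
val fromRead = StructType(Seq(StructField("id", LongType,
  metadata = new MetadataBuilder().putBoolean("someJdbcFlag", true).build())))

// The full comparison fails solely because of the metadata entry:
println(userProvided == fromRead)  // false

// Stripping metadata before comparing shows the types themselves agree:
def stripMetadata(s: StructType): StructType =
  StructType(s.map(_.copy(metadata = Metadata.empty)))
println(stripMetadata(userProvided) == stripMetadata(fromRead))  // true
{code}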
[jira] [Assigned] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
[ https://issues.apache.org/jira/browse/SPARK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37859: Assignee: (was: Apache Spark) > SQL tables created with JDBC with Spark 3.1 are not readable with 3.2 > - > > Key: SPARK-37859 > URL: https://issues.apache.org/jira/browse/SPARK-37859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > In > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312, > a new metadata field is added during reading. As we do a full comparison of > the user-provided schema and the actual schema in > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356, > resolution fails if a table created with Spark 3.1 is read with Spark 3.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
[ https://issues.apache.org/jira/browse/SPARK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37859: Assignee: Apache Spark > SQL tables created with JDBC with Spark 3.1 are not readable with 3.2 > - > > Key: SPARK-37859 > URL: https://issues.apache.org/jira/browse/SPARK-37859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Assignee: Apache Spark >Priority: Major > > In > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312, > a new metadata field is added during reading. As we do a full comparison of > the user-provided schema and the actual schema in > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356, > resolution fails if a table created with Spark 3.1 is read with Spark 3.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
[ https://issues.apache.org/jira/browse/SPARK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472351#comment-17472351 ] Apache Spark commented on SPARK-37859: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/35158 > SQL tables created with JDBC with Spark 3.1 are not readable with 3.2 > - > > Key: SPARK-37859 > URL: https://issues.apache.org/jira/browse/SPARK-37859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > In > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312, > a new metadata field is added during reading. As we do a full comparison of > the user-provided schema and the actual schema in > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356, > resolution fails if a table created with Spark 3.1 is read with Spark 3.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with 3.2
[ https://issues.apache.org/jira/browse/SPARK-37859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472350#comment-17472350 ] Apache Spark commented on SPARK-37859: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/35158 > SQL tables created with JDBC with Spark 3.1 are not readable with 3.2 > - > > Key: SPARK-37859 > URL: https://issues.apache.org/jira/browse/SPARK-37859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > In > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L312, > a new metadata field is added during reading. As we do a full comparison of > the user-provided schema and the actual schema in > https://github.com/apache/spark/blob/bd24b4884b804fc85a083f82b864823851d5980c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L356, > resolution fails if a table created with Spark 3.1 is read with Spark 3.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37851) Mark org.apache.spark.sql.hive.execution as slow tests
[ https://issues.apache.org/jira/browse/SPARK-37851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37851: - Assignee: Hyukjin Kwon > Mark org.apache.spark.sql.hive.execution as slow tests > -- > > Key: SPARK-37851 > URL: https://issues.apache.org/jira/browse/SPARK-37851 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Related to SPARK-33171 and SPARK-32884. We should rebalance both: > "hive - slow tests": > https://github.com/apache/spark/runs/4755996153?check_suite_focus=true > "hive - other tests": > https://github.com/apache/spark/runs/4755996212?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37851) Mark org.apache.spark.sql.hive.execution as slow tests
[ https://issues.apache.org/jira/browse/SPARK-37851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37851. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35151 [https://github.com/apache/spark/pull/35151] > Mark org.apache.spark.sql.hive.execution as slow tests > -- > > Key: SPARK-37851 > URL: https://issues.apache.org/jira/browse/SPARK-37851 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > Related to SPARK-33171 and SPARK-32884. We should rebalance both: > "hive - slow tests": > https://github.com/apache/spark/runs/4755996153?check_suite_focus=true > "hive - other tests": > https://github.com/apache/spark/runs/4755996212?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
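For reference, routing a suite into a slow group is done with a test tag annotation that the CI workflow includes or excludes per job. A sketch, assuming the SlowHiveTest tag from Spark's common/tags module (the annotation and suite names here are assumptions, not quoted from the PR):

{code:java}
import org.apache.spark.SparkFunSuite
import org.apache.spark.tags.SlowHiveTest  // assumed tag from common/tags

@SlowHiveTest  // routes this suite to the "hive - slow tests" CI group
class SomeHiveExecutionSuite extends SparkFunSuite {
  test("placeholder") { assert(true) }
}
{code}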
[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-30789: --- Fix Version/s: 3.2.1 > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0, 3.2.1 > > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
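Once supported, the syntax can be exercised directly from Spark SQL. A minimal spark-shell sketch (data and column names are illustrative):

{code:java}
val df = Seq((1, Some("a")), (2, None), (3, Some("c"))).toDF("id", "v")
df.createOrReplaceTempView("t")

spark.sql("""
  SELECT id, LAST_VALUE(v) IGNORE NULLS OVER (ORDER BY id) AS last_non_null
  FROM t
""").show()
// Row id=2 yields "a": the NULL at id=2 is skipped rather than returned.
{code}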
[jira] [Updated] (SPARK-35714) Bug fix for deadlock during the executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-35714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-35714: --- Fix Version/s: 3.2.1 > Bug fix for deadlock during the executor shutdown > - > > Key: SPARK-35714 > URL: https://issues.apache.org/jira/browse/SPARK-35714 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Wan Kun >Assignee: Wan Kun >Priority: Minor > Fix For: 3.0.3, 3.2.0, 3.1.3, 3.2.1 > > Attachments: three_thread_lock.log > > > When an executor receives a TERM signal (the second TERM signal), it will lock the > java.lang.Shutdown class and then call the Shutdown.exit() method to exit the JVM. > Shutdown will call SparkShutdownHook to shut down the executor. > During the executor shutdown phase, a RemoteProcessDisconnected event will be > sent to the RPC inbox, and then WorkerWatcher will try to call > System.exit(-1) again. > Because java.lang.Shutdown is already locked, a deadlock occurs. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
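The deadlock is reproducible outside Spark: a second exit initiated from inside a shutdown hook blocks forever, because the first exit already holds the java.lang.Shutdown class lock while it runs the hooks. A standalone sketch:

{code:java}
object ShutdownDeadlockDemo {
  def main(args: Array[String]): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread(() => {
      println("hook running; calling System.exit again...")
      System.exit(-1)  // never returns: java.lang.Shutdown is already locked
      println("never reached")
    }))
    System.exit(0)  // the first exit takes the Shutdown lock and runs hooks
  }
}
{code}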
[jira] [Resolved] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37853. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35153 [https://github.com/apache/spark/pull/35153] > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
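For the second warning, the non-deprecated AbstractAppender constructor takes an extra Property array, and passing Property.EMPTY_ARRAY preserves the old behavior. A sketch of that migration (the subclass here is illustrative, not the actual SparkFunSuite code):

{code:java}
import org.apache.logging.log4j.core.LogEvent
import org.apache.logging.log4j.core.appender.AbstractAppender
import org.apache.logging.log4j.core.config.Property

// Illustrative appender mirroring the shape of a test log collector.
class CapturingAppender(name: String)
  extends AbstractAppender(name, null, null, true, Property.EMPTY_ARRAY) {
  override def append(event: LogEvent): Unit = {
    // collect or inspect the event here
  }
}
{code}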
[jira] [Assigned] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37853: - Assignee: Yang Jie > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
[ https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-34399: --- Fix Version/s: 3.2.1 (was: 3.2.0) > Add file commit time to metrics and shown in SQL Tab UI > --- > > Key: SPARK-34399 > URL: https://issues.apache.org/jira/browse/SPARK-34399 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.1 > > > Add file commit time to metrics and shown in SQL Tab UI -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-30789: --- Fix Version/s: (was: 3.2.0) > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.1 > > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36464: --- Fix Version/s: 3.2.1 (was: 3.2.0) > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.2.1 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as `Int`. > That causes an overflow, and the method returns a negative size when more than 2GB of data is > written into `ChunkedByteBufferOutputStream`. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
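The overflow is ordinary Int wraparound: counting bytes past 2GB in an Int goes negative even when the getter is typed as Long. A standalone illustration:

{code:java}
var sizeInt: Int = 0     // mirrors the buggy underlying counter
var sizeLong: Long = 0L  // mirrors the fixed one
val oneMiB = 1 << 20

// Simulate writing 2 GiB in 1 MiB chunks.
for (_ <- 0 until 2048) {
  sizeInt += oneMiB
  sizeLong += oneMiB
}
println(sizeInt.toLong)  // -2147483648: the Int counter wrapped
println(sizeLong)        // 2147483648: the correct byte count
{code}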
[jira] [Updated] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36464: --- Fix Version/s: 3.2.0 > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.2.1 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as `Int`. > That causes an overflow, and the method returns a negative size when more than 2GB of data is > written into `ChunkedByteBufferOutputStream`. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
[ https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-33277: --- Fix Version/s: 3.2.1 > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > > > Key: SPARK-33277 > URL: https://issues.apache.org/jira/browse/SPARK-33277 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0, 3.2.1 > > > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > E.g.,: > {code:python} > from pyspark.sql.functions import udf > from pyspark.sql.types import LongType > path = "/tmp/spark-33277"  # any writable path (placeholder) > spark.range(0, 10, 1, 1).write.parquet(path) > spark.conf.set("spark.sql.columnVector.offheap.enabled", True) > def f(x): >     return 0 > fUdf = udf(f, LongType()) > spark.read.parquet(path).select(fUdf('id')).head() > {code} > This is because the Python evaluation consumes the parent iterator in a > separate thread and it consumes more data from the parent even after the task > ends and the parent is closed. If an off-heap column vector exists in the > parent iterator, it could cause a segmentation fault that crashes the executor. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37853) Clean up deprecation compilation warning related to log4j2
[ https://issues.apache.org/jira/browse/SPARK-37853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37853: -- Parent: SPARK-37814 Issue Type: Sub-task (was: Improvement) > Clean up deprecation compilation warning related to log4j2 > -- > > Key: SPARK-37853 > URL: https://issues.apache.org/jira/browse/SPARK-37853 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > [WARNING] [Warn] > /spark-source/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:64: > [deprecation @ org.apache.spark.util.logging.DriverLogger.addLogAppender.fa > | origin=org.apache.logging.log4j.core.appender.FileAppender.createAppender | > version=] method createAppender in class FileAppender is deprecated > [WARNING] [Warn] > /spark-source/core/src/test/scala/org/apache/spark/SparkFunSuite.scala:268: > [deprecation @ org.apache.spark.SparkFunSuite.LogAppender. | > origin=org.apache.logging.log4j.core.appender.AbstractAppender. | > version=] constructor AbstractAppender in class AbstractAppender is deprecated -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37843: -- Parent: SPARK-37814 Issue Type: Sub-task (was: Bug) > Suppress NoSuchFieldError at setMDCForTask > -- > > Key: SPARK-37843 > URL: https://issues.apache.org/jira/browse/SPARK-37843 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > {code} > 00:57:11 2022-01-07 15:57:11.693 - stderr> Exception in thread "Executor task > launch worker-0" java.lang.NoSuchFieldError: mdc > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.apache.log4j.MDCFriend.fixForJava9(MDCFriend.java:11) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.slf4j.impl.Log4jMDCAdapter.(Log4jMDCAdapter.java:38) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.slf4j.impl.StaticMDCBinder.getMDCA(StaticMDCBinder.java:59) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.slf4j.MDC.bwCompatibleGetMDCAdapterFromBinder(MDC.java:99) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.slf4j.MDC.(MDC.java:108) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$setMDCForTask(Executor.scala:750) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:441) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > 00:57:11 2022-01-07 15:57:11.693 - stderr>at > java.base/java.lang.Thread.run(Thread.java:833) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37844) Remove slf4j-log4j12 dependency from hadoop-minikdc
[ https://issues.apache.org/jira/browse/SPARK-37844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37844: -- Parent: SPARK-37814 Issue Type: Sub-task (was: Bug) > Remove slf4j-log4j12 dependency from hadoop-minikdc > --- > > Key: SPARK-37844 > URL: https://issues.apache.org/jira/browse/SPARK-37844 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36979) Add RewriteLateralSubquery rule into nonExcludableRules
[ https://issues.apache.org/jira/browse/SPARK-36979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36979: --- Fix Version/s: 3.2.1 (was: 3.2.0) > Add RewriteLateralSubquery rule into nonExcludableRules > --- > > Key: SPARK-36979 > URL: https://issues.apache.org/jira/browse/SPARK-36979 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > Fix For: 3.2.1 > > > A lateral join cannot be planned without the `RewriteLateralSubquery` rule, so if > we set > `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery`, > the lateral join query will fail with: > {code:java} > java.lang.AssertionError: assertion failed: No plan for LateralJoin > lateral-subquery#218 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
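The failure reproduces in a spark-shell session on an affected build (the query shape below is illustrative):

{code:java}
spark.conf.set("spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery")
// Any lateral join now fails at planning with
// "java.lang.AssertionError: assertion failed: No plan for LateralJoin ...":
spark.sql("SELECT * FROM VALUES (1) AS t(a), LATERAL (SELECT a + 1)").show()
{code}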
[jira] [Updated] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior
[ https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36717: --- Fix Version/s: 3.2.1 > Wrong order of variable initialization may lead to incorrect behavior > - > > Key: SPARK-36717 > URL: https://issues.apache.org/jira/browse/SPARK-36717 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Jianmeng Li >Assignee: Jianmeng Li >Priority: Minor > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > Incorrect order of variable initialization may lead to incorrect behavior. > Related code: > [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94]. > TorrentBroadcast will get the wrong checksumEnabled value after > initialization, which is not what we want; we can move L94 in front of > setConf(SparkEnv.get.conf) to avoid this. In Snippet 1 below, the constructor statement setConf() runs before the initializer of checksumEnabled, so the later `= false` silently overwrites the value that setConf() assigned; declaring the var before the call, as in Snippet 2, fixes the order. > Supplement: > Snippet 1: > {code:java} > class Broadcast { > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > var checksumEnabled = false > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > false{code} > Snippet 2: > {code:java} > class Broadcast { > var checksumEnabled = false > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > true{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37790) Upgrade SLF4J to 1.7.32
[ https://issues.apache.org/jira/browse/SPARK-37790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37790: -- Parent: SPARK-37814 Issue Type: Sub-task (was: Improvement) > Upgrade SLF4J to 1.7.32 > --- > > Key: SPARK-37790 > URL: https://issues.apache.org/jira/browse/SPARK-37790 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior
[ https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36717: --- Fix Version/s: (was: 3.2.0) > Wrong order of variable initialization may lead to incorrect behavior > - > > Key: SPARK-36717 > URL: https://issues.apache.org/jira/browse/SPARK-36717 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Jianmeng Li >Assignee: Jianmeng Li >Priority: Minor > Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > Incorrect order of variable initialization may lead to incorrect behavior. > Related code: > [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94]. > TorrentBroadcast will get the wrong checksumEnabled value after > initialization, which is not what we want; we can move L94 in front of > setConf(SparkEnv.get.conf) to avoid this. In Snippet 1 below, the constructor statement setConf() runs before the initializer of checksumEnabled, so the later `= false` silently overwrites the value that setConf() assigned; declaring the var before the call, as in Snippet 2, fixes the order. > Supplement: > Snippet 1: > {code:java} > class Broadcast { > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > var checksumEnabled = false > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > false{code} > Snippet 2: > {code:java} > class Broadcast { > var checksumEnabled = false > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > true{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
Jackey Lee created SPARK-37860: -- Summary: [BUG] Revert: Fix taskid in the stage page task event timeline Key: SPARK-37860 URL: https://issues.apache.org/jira/browse/SPARK-37860 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.2.1 Reporter: Jackey Lee In [#32888|https://github.com/apache/spark/pull/32888], [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to distinguish tasks within a stage, not {{taskId.attempt}}. Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35746) Task id in the Stage page timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-35746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472389#comment-17472389 ] Apache Spark commented on SPARK-35746: -- User 'stczwd' has created a pull request for this issue: https://github.com/apache/spark/pull/35159 > Task id in the Stage page timeline is incorrect > --- > > Key: SPARK-35746 > URL: https://issues.apache.org/jira/browse/SPARK-35746 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0, 3.1.2 >Reporter: shahid >Assignee: shahid >Priority: Minor > Attachments: image-2021-06-12-07-03-09-808.png > > > !image-2021-06-12-07-03-09-808.png! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472409#comment-17472409 ] Dongjoon Hyun commented on SPARK-37159: --- I'll collect this to a subtask of SPARK-33772, [~sarutak]. > Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 > --- > > Key: SPARK-37159 > URL: https://issues.apache.org/jira/browse/SPARK-37159 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but > `HiveExternalCatalogVersionsSuite`. > {code} > [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED > *** (42 seconds, 526 milliseconds) > [info] spark-submit returned with exit code 1. > [info] Command line: > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' > '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' > 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' > 'spark.sql.hive.metastore.version=2.3' '--conf' > 'spark.sql.hive.metastore.jars=maven' '--conf' > 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' > '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' > [info] > [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default log4j > profile: org/apache/spark/log4j-defaults.properties > [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Running Spark version 3.2.0 > [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN > NativeCodeLoader: Unable to load native-hadoop library for your platform... > using builtin-java classes where applicable > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: No custom resources configured for spark.driver. 
> [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Submitted application: prepare testing tables > [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Default ResourceProfile created, executor resources: > Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: > memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: > 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Limiting resource is cpu > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfileManager: Added ResourceProfile id: 0 > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls to: kou > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls to: kou > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: SecurityManager: authentication disabled; ui acls disabled; > users with view permissions: Set(kou); groups with view permissions: Set(); > users with modify permissions: Set(kou); groups with modify permissions: > Set() > [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: > Successfully started service 'sparkDriver' on port 35867. > [info] 2021-10-28 06:07:18.906 - stderr> 21/10/28 22:07:18 INFO SparkEnv: > Registering MapOutputTracker > [info] 2021-10-28 06:07:18.93 - stderr> 21/10/28 22:07:18 INFO SparkEnv: > Registering BlockManagerMaster > [info] 2021-10-28 06:07:18.943 - stderr> 21/10/28 22
[jira] [Updated] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37159: -- Parent: SPARK-33772 Issue Type: Sub-task (was: Bug) > Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 > --- > > Key: SPARK-37159 > URL: https://issues.apache.org/jira/browse/SPARK-37159 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but > `HiveExternalCatalogVersionsSuite`. > {code} > [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED > *** (42 seconds, 526 milliseconds) > [info] spark-submit returned with exit code 1. > [info] Command line: > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' > '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' > 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' > 'spark.sql.hive.metastore.version=2.3' '--conf' > 'spark.sql.hive.metastore.jars=maven' '--conf' > 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' > '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' > [info] > [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default log4j > profile: org/apache/spark/log4j-defaults.properties > [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Running Spark version 3.2.0 > [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN > NativeCodeLoader: Unable to load native-hadoop library for your platform... > using builtin-java classes where applicable > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: No custom resources configured for spark.driver. 
> [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Submitted application: prepare testing tables > [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Default ResourceProfile created, executor resources: > Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: > memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: > 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Limiting resource is cpu > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfileManager: Added ResourceProfile id: 0 > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls to: kou > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls to: kou > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: SecurityManager: authentication disabled; ui acls disabled; > users with view permissions: Set(kou); groups with view permissions: Set(); > users with modify permissions: Set(kou); groups with modify permissions: > Set() > [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: > Successfully started service 'sparkDriver' on port 35867. > [info] 2021-10-28 06:07:18.906 - stderr> 21/10/28 22:07:18 INFO SparkEnv: > Registering MapOutputTracker > [info] 2021-10-28 06:07:18.93 - stderr> 21/10/28 22:07:18 INFO SparkEnv: > Registering BlockManagerMaster > [info] 2021-10-28 06:07:18.943 - stderr> 21/10/28 22:07:18 INFO > BlockManagerMasterEndpoint: Usin
[jira] [Commented] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
[ https://issues.apache.org/jira/browse/SPARK-37860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472420#comment-17472420 ] Apache Spark commented on SPARK-37860: -- User 'stczwd' has created a pull request for this issue: https://github.com/apache/spark/pull/35160 > [BUG] Revert: Fix taskid in the stage page task event timeline > -- > > Key: SPARK-37860 > URL: https://issues.apache.org/jira/browse/SPARK-37860 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.1 >Reporter: Jackey Lee >Priority: Major > > In [#32888|https://github.com/apache/spark/pull/32888], > [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to > taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to > distinguish tasks within a stage, not {{taskId.attempt}}. > Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, > and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
[ https://issues.apache.org/jira/browse/SPARK-37860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37860: Assignee: (was: Apache Spark) > [BUG] Revert: Fix taskid in the stage page task event timeline > -- > > Key: SPARK-37860 > URL: https://issues.apache.org/jira/browse/SPARK-37860 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.1 >Reporter: Jackey Lee >Priority: Major > > In [#32888|https://github.com/apache/spark/pull/32888], > [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to > taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to > distinguish tasks within a stage, not {{taskId.attempt}}. > Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, > and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
[ https://issues.apache.org/jira/browse/SPARK-37860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37860: Assignee: Apache Spark > [BUG] Revert: Fix taskid in the stage page task event timeline > -- > > Key: SPARK-37860 > URL: https://issues.apache.org/jira/browse/SPARK-37860 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.1 >Reporter: Jackey Lee >Assignee: Apache Spark >Priority: Major > > In [#32888|https://github.com/apache/spark/pull/32888], > [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to > taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to > distinguish tasks within a stage, not {{taskId.attempt}}. > Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, > and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
[ https://issues.apache.org/jira/browse/SPARK-37860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472421#comment-17472421 ] Apache Spark commented on SPARK-37860: -- User 'stczwd' has created a pull request for this issue: https://github.com/apache/spark/pull/35160 > [BUG] Revert: Fix taskid in the stage page task event timeline > -- > > Key: SPARK-37860 > URL: https://issues.apache.org/jira/browse/SPARK-37860 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.1 >Reporter: Jackey Lee >Priority: Major > > In [#32888|https://github.com/apache/spark/pull/32888], > [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to > taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to > distinguish tasks within a stage, not {{taskId.attempt}}. > Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, > and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37159) Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472434#comment-17472434 ] Kousuke Saruta commented on SPARK-37159: All right. Thank you [~dongjoon]! > Change HiveExternalCatalogVersionsSuite to be able to test with Java 17 > --- > > Key: SPARK-37159 > URL: https://issues.apache.org/jira/browse/SPARK-37159 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > SPARK-37105 seems to have fixed most of tests in `sql/hive` for Java 17 but > `HiveExternalCatalogVersionsSuite`. > {code} > [info] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED > *** (42 seconds, 526 milliseconds) > [info] spark-submit returned with exit code 1. > [info] Command line: > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test-spark-d86af275-0c40-4b47-9cab-defa92a5ffa7/spark-3.2.0/bin/spark-submit' > '--name' 'prepare testing tables' '--master' 'local[2]' '--conf' > 'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' '--conf' > 'spark.sql.hive.metastore.version=2.3' '--conf' > 'spark.sql.hive.metastore.jars=maven' '--conf' > 'spark.sql.warehouse.dir=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' > '-Dderby.system.home=/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/warehouse-69d9bdbc-54ce-443b-8677-a413663ddb62' > > '/home/kou/work/oss/spark-java17/sql/hive/target/tmp/org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite/test15166225869206697603.py' > [info] > [info] 2021-10-28 06:07:18.486 - stderr> Using Spark's default log4j > profile: org/apache/spark/log4j-defaults.properties > [info] 2021-10-28 06:07:18.49 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Running Spark version 3.2.0 > [info] 2021-10-28 06:07:18.537 - stderr> 21/10/28 22:07:18 WARN > NativeCodeLoader: Unable to load native-hadoop library for your platform... > using builtin-java classes where applicable > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: No custom resources configured for spark.driver. 
> [info] 2021-10-28 06:07:18.616 - stderr> 21/10/28 22:07:18 INFO > ResourceUtils: == > [info] 2021-10-28 06:07:18.617 - stderr> 21/10/28 22:07:18 INFO > SparkContext: Submitted application: prepare testing tables > [info] 2021-10-28 06:07:18.632 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Default ResourceProfile created, executor resources: > Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: > memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: > 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfile: Limiting resource is cpu > [info] 2021-10-28 06:07:18.641 - stderr> 21/10/28 22:07:18 INFO > ResourceProfileManager: Added ResourceProfile id: 0 > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls to: kou > [info] 2021-10-28 06:07:18.679 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls to: kou > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing view acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: Changing modify acls groups to: > [info] 2021-10-28 06:07:18.68 - stderr> 21/10/28 22:07:18 INFO > SecurityManager: SecurityManager: authentication disabled; ui acls disabled; > users with view permissions: Set(kou); groups with view permissions: Set(); > users with modify permissions: Set(kou); groups with modify permissions: > Set() > [info] 2021-10-28 06:07:18.886 - stderr> 21/10/28 22:07:18 INFO Utils: > Successfully started service 'sparkDriver' on port 35867. > [info] 2021-10-28 06:07:18.906 - stderr> 21/10/28 22:07:18 INFO SparkEnv: > Registering MapOutputTracker > [info] 2021-10-28 06:07:18.93 - stderr> 21/10/28 22:07:18 INFO SparkEnv: > Registering BlockManagerMaster > [info] 2021-10-28 06:07:18.943 - stderr> 21/10/28 22:07:18 INFO > Blo
[jira] [Resolved] (SPARK-37847) PushBlockStreamCallback should check isTooLate first to avoid NPE
[ https://issues.apache.org/jira/browse/SPARK-37847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-37847. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35146 [https://github.com/apache/spark/pull/35146] > PushBlockStreamCallback should check isTooLate first to avoid NPE > - > > Key: SPARK-37847 > URL: https://issues.apache.org/jira/browse/SPARK-37847 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.1, 3.3.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.3.0 > > > {code:java} > 2022-01-07 21:06:14,464 INFO shuffle.RemoteBlockPushResolver: shuffle > partition application_1640143179334_0149_-1 102 6922, chunk_size=1, > meta_length=138, data_length=112632 > 2022-01-07 21:06:14,615 ERROR shuffle.RemoteBlockPushResolver: Encountered > issue when merging shufflePush_102_0_279_6922 > java.lang.NullPointerException > at > org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShuffleMergePartitionsInfo.access$200(RemoteBlockPushResolver.java:1017) > at > org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.isStale(RemoteBlockPushResolver.java:806) > at > org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.onData(RemoteBlockPushResolver.java:840) > at > org.apache.spark.network.server.TransportRequestHandler$3.onData(TransportRequestHandler.java:209) > at > org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) > at > org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > org.sparkproject.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) > at > org.sparkproject.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) > at > org.sparkproject.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > org.sparkproject.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {code} -- 
This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
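A self-contained Scala sketch of the ordering fix described above. The names are simplified stand-ins for the Java code in RemoteBlockPushResolver, not the actual patch:

{code:scala}
// A finalized shuffle merge leaves the partition info null, so the
// "too late" (null) check must run before anything that dereferences it.
final class MergePartitionsInfo(val shuffleMergeId: Int)

object PushBlockSketch {
  // Dereferences info: throws NullPointerException if info is null.
  def isStale(info: MergePartitionsInfo, currentMergeId: Int): Boolean =
    info.shuffleMergeId < currentMergeId

  // Checking isTooLate first short-circuits before isStale can NPE,
  // which is the essence of the fix in pull request 35146.
  def shouldAccept(info: MergePartitionsInfo, currentMergeId: Int): Boolean = {
    val isTooLate = info == null
    !isTooLate && !isStale(info, currentMergeId)
  }
}
{code}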
[jira] [Assigned] (SPARK-37847) PushBlockStreamCallback should check isTooLate first to avoid NPE
[ https://issues.apache.org/jira/browse/SPARK-37847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-37847: --- Assignee: Cheng Pan > PushBlockStreamCallback should check isTooLate first to avoid NPE > - > > Key: SPARK-37847 > URL: https://issues.apache.org/jira/browse/SPARK-37847 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.1, 3.3.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > > {code:java} > 2022-01-07 21:06:14,464 INFO shuffle.RemoteBlockPushResolver: shuffle > partition application_1640143179334_0149_-1 102 6922, chunk_size=1, > meta_length=138, data_length=112632 > 2022-01-07 21:06:14,615 ERROR shuffle.RemoteBlockPushResolver: Encountered > issue when merging shufflePush_102_0_279_6922 > java.lang.NullPointerException > at > org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShuffleMergePartitionsInfo.access$200(RemoteBlockPushResolver.java:1017) > at > org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.isStale(RemoteBlockPushResolver.java:806) > at > org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.onData(RemoteBlockPushResolver.java:840) > at > org.apache.spark.network.server.TransportRequestHandler$3.onData(TransportRequestHandler.java:209) > at > org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) > at > org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > org.sparkproject.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > org.sparkproject.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) > at > org.sparkproject.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) > at > org.sparkproject.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) > at > org.sparkproject.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > org.sparkproject.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org 
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37861) How to reload UDF jar classes without restarting the driver and executor
Jinpeng Chi created SPARK-37861: --- Summary: How to reload UDF jar classes without restarting the driver and executor Key: SPARK-37861 URL: https://issues.apache.org/jira/browse/SPARK-37861 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Jinpeng Chi How to reload UDF jar classes without restarting the driver and executor -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472471#comment-17472471 ] Jinpeng Chi commented on SPARK-37840: - [~hyukjin.kwon] No > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > jar files are updated after UDF files are loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
[ https://issues.apache.org/jira/browse/SPARK-37860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-37860. Fix Version/s: 3.1.3 3.0.4 3.2.1 3.3.0 Assignee: Jackey Lee Resolution: Fixed Issue resolved in https://github.com/apache/spark/pull/35160 > [BUG] Revert: Fix taskid in the stage page task event timeline > -- > > Key: SPARK-37860 > URL: https://issues.apache.org/jira/browse/SPARK-37860 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.1 >Reporter: Jackey Lee >Assignee: Jackey Lee >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > In [#32888|https://github.com/apache/spark/pull/32888], > [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to > taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to > distinguish tasks within a stage, not {{taskId.attempt}}. > Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, > and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37860) [BUG] Revert: Fix taskid in the stage page task event timeline
[ https://issues.apache.org/jira/browse/SPARK-37860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472487#comment-17472487 ] Kousuke Saruta commented on SPARK-37860: Note: If the Spark 3.2.1 RC1 vote passes, the 3.2.1 fix version should be replaced with 3.2.2. > [BUG] Revert: Fix taskid in the stage page task event timeline > -- > > Key: SPARK-37860 > URL: https://issues.apache.org/jira/browse/SPARK-37860 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.1 >Reporter: Jackey Lee >Assignee: Jackey Lee >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > In [#32888|https://github.com/apache/spark/pull/32888], > [@shahidki31|https://github.com/shahidki31] changed taskInfo.index to > taskInfo.taskId. However, we generally use {{index.attempt}} or {{taskId}} to > distinguish tasks within a stage, not {{taskId.attempt}}. > Thus [#32888|https://github.com/apache/spark/pull/32888] was a wrong fix, > and we should revert it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472488#comment-17472488 ] Hyukjin Kwon commented on SPARK-37840: -- ADD FILE|JAR|ARCHIVE tracks timestamps IIRC, and they can pick up newer files. What's the current behaviour? > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > jar files are updated after UDF files are loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
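For readers following the thread, the commands under discussion look roughly like this. A minimal sketch: the jar path and UDF class name are placeholders, and the non-reload behaviour noted in the comments is the limitation the ticket reports, not guaranteed semantics:

{code:scala}
import org.apache.spark.sql.SparkSession

object AddJarSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("add-jar-sketch").getOrCreate()

    // ADD JAR puts the jar on the session's classpath; the path is a placeholder.
    spark.sql("ADD JAR /tmp/udfs/my-udfs.jar")
    spark.sql("CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpperUDF'")
    spark.sql("SELECT my_upper('hello')").show()

    // Re-issuing ADD JAR after replacing the file on disk does not swap out
    // classes the JVM classloader has already resolved, so the old UDF body
    // keeps running until the server restarts -- the problem reported here.
    spark.sql("ADD JAR /tmp/udfs/my-udfs.jar")

    spark.stop()
  }
}
{code}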
[jira] [Created] (SPARK-37862) RecordBinaryComparator should fast skip the check of aligning with unaligned platform
XiDuo You created SPARK-37862: - Summary: RecordBinaryComparator should fast skip the check of aligning with unaligned platform Key: SPARK-37862 URL: https://issues.apache.org/jira/browse/SPARK-37862 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: XiDuo You Same as SPARK-37796. It would be better to fast-skip the alignment check if the platform supports unaligned access. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
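A minimal Scala sketch of the idea: the real comparator is Java code in Spark's SQL execution package, and this only illustrates gating the per-record check on a once-computed flag. `Platform.unaligned()` is assumed here to be the existing helper in org.apache.spark.unsafe:

{code:scala}
import org.apache.spark.unsafe.Platform

object AlignmentSketch {
  // Computed once: whether this JVM/architecture allows unaligned 8-byte reads.
  private val unaligned: Boolean = Platform.unaligned()

  // On unaligned-capable platforms the word-at-a-time comparison is always
  // usable, so the per-record offset check can be skipped entirely.
  def canCompareByWords(leftOffset: Long, rightOffset: Long): Boolean =
    unaligned || (leftOffset % 8 == 0 && rightOffset % 8 == 0)
}
{code}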
[jira] [Created] (SPARK-37863) Add submitTime for Spark Application
Zhou Yifan created SPARK-37863: -- Summary: Add submitTime for Spark Application Key: SPARK-37863 URL: https://issues.apache.org/jira/browse/SPARK-37863 Project: Spark Issue Type: New Feature Components: Spark Submit Affects Versions: 3.3.0 Reporter: Zhou Yifan Spark Driver may take a long time to start up when running in cluster mode, if YARN temporarily does not have enough resources, or HDFS performance is poor due to high load. If `submitTime` is added, we can detect such a situation by comparing it with `spark.app.startTime` (which already exists). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
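A sketch of how the proposed property could be used. `spark.app.submitTime` is only this ticket's proposal (hypothetical), while `spark.app.startTime` already exists; error handling for a missing property is omitted:

{code:scala}
import org.apache.spark.sql.SparkSession

object SubmitDelaySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("submit-delay-sketch").getOrCreate()
    val conf = spark.sparkContext.getConf

    val startTime  = conf.get("spark.app.startTime").toLong   // existing property
    val submitTime = conf.get("spark.app.submitTime").toLong  // proposed in this ticket

    // A large gap suggests the driver waited on YARN resources or slow HDFS.
    println(s"Driver waited ${startTime - submitTime} ms between submit and start")
    spark.stop()
  }
}
{code}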
[jira] [Updated] (SPARK-37863) Add submitTime for Spark Application
[ https://issues.apache.org/jira/browse/SPARK-37863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhou Yifan updated SPARK-37863: --- Description: Spark Driver may take a long time to start up when running in cluster mode, if YARN temporarily does not have enough resources, or HDFS performance is poor due to high load. After `submitTime` is added, we can detect such a situation by comparing it with `spark.app.startTime` (which already exists). was: Spark Driver may take a long time to start up when running in cluster mode, if YARN temporarily does not have enough resources, or HDFS performance is poor due to high load. If `submitTime` is added, we can detect such a situation by comparing it with `spark.app.startTime` (which already exists). > Add submitTime for Spark Application > > > Key: SPARK-37863 > URL: https://issues.apache.org/jira/browse/SPARK-37863 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 3.3.0 >Reporter: Zhou Yifan >Priority: Major > > Spark Driver may take a long time to start up when running in cluster mode, > if YARN temporarily does not have enough resources, or HDFS performance is > poor due to high load. > After `submitTime` is added, we can detect such a situation by comparing it with > `spark.app.startTime` (which already exists). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37862) RecordBinaryComparator should fast skip the check of aligning with unaligned platform
[ https://issues.apache.org/jira/browse/SPARK-37862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472493#comment-17472493 ] Apache Spark commented on SPARK-37862: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/35161 > RecordBinaryComparator should fast skip the check of aligning with unaligned > platform > - > > Key: SPARK-37862 > URL: https://issues.apache.org/jira/browse/SPARK-37862 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > Same as SPARK-37796. > It would be better to fast-skip the alignment check if the platform supports > unaligned access. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37862) RecordBinaryComparator should fast skip the check of aligning with unaligned platform
[ https://issues.apache.org/jira/browse/SPARK-37862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37862: Assignee: (was: Apache Spark) > RecordBinaryComparator should fast skip the check of aligning with unaligned > platform > - > > Key: SPARK-37862 > URL: https://issues.apache.org/jira/browse/SPARK-37862 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > Same as SPARK-37796. > It would be better to fast-skip the alignment check if the platform supports > unaligned access. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37862) RecordBinaryComparator should fast skip the check of aligning with unaligned platform
[ https://issues.apache.org/jira/browse/SPARK-37862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37862: Assignee: Apache Spark > RecordBinaryComparator should fast skip the check of aligning with unaligned > platform > - > > Key: SPARK-37862 > URL: https://issues.apache.org/jira/browse/SPARK-37862 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > Same as SPARK-37796. > It would be better to fast-skip the alignment check if the platform supports > unaligned access. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37863) Add submitTime for Spark Application
[ https://issues.apache.org/jira/browse/SPARK-37863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37863: Assignee: Apache Spark > Add submitTime for Spark Application > > > Key: SPARK-37863 > URL: https://issues.apache.org/jira/browse/SPARK-37863 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 3.3.0 >Reporter: Zhou Yifan >Assignee: Apache Spark >Priority: Major > > Spark Driver may take a long time to start up when running in cluster mode, > if YARN temporarily does not have enough resources, or HDFS performance is > poor due to high load. > After `submitTime` is added, we can detect such a situation by comparing it with > `spark.app.startTime` (which already exists). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37863) Add submitTime for Spark Application
[ https://issues.apache.org/jira/browse/SPARK-37863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472507#comment-17472507 ] Apache Spark commented on SPARK-37863: -- User 'zhouyifan279' has created a pull request for this issue: https://github.com/apache/spark/pull/35162 > Add submitTime for Spark Application > > > Key: SPARK-37863 > URL: https://issues.apache.org/jira/browse/SPARK-37863 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 3.3.0 >Reporter: Zhou Yifan >Priority: Major > > Spark Driver may take a long time to start up when running in cluster mode, > if YARN temporarily does not have enough resources, or HDFS performance is > poor due to high load. > After `submitTime` is added, we can detect such a situation by comparing it with > `spark.app.startTime` (which already exists). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37863) Add submitTime for Spark Application
[ https://issues.apache.org/jira/browse/SPARK-37863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37863: Assignee: (was: Apache Spark) > Add submitTime for Spark Application > > > Key: SPARK-37863 > URL: https://issues.apache.org/jira/browse/SPARK-37863 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 3.3.0 >Reporter: Zhou Yifan >Priority: Major > > Spark Driver may take a long time to start up when running in cluster mode, > if YARN temporarily does not have enough resources, or HDFS performance is > poor due to high load. > After `submitTime` is added, we can detect such a situation by comparing it with > `spark.app.startTime` (which already exists). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472519#comment-17472519 ] Jinpeng Chi commented on SPARK-37840: - Spark does not currently support deleting a jar > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > jar files are updated after UDF files are loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472521#comment-17472521 ] Jinpeng Chi commented on SPARK-37840: - [~hyukjin.kwon] But classes are still loaded from the previous jar > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > jar files are updated after UDF files are loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org