[jira] [Resolved] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33179. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30090 [https://github.com/apache/spark/pull/30090] > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33179: Assignee: William Hyun > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33128) mismatched input since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-33128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216485#comment-17216485 ] Yang Jie edited comment on SPARK-33128 at 10/19/20, 6:43 AM: - [~yumwang] I found that without SPARK-21136, this case can pass, and there are some related reminders in the sql-migration-guide as follows: {code:java} ### Query Engine - In Spark version 2.4 and below, SQL queries such as `FROM <table>` or `FROM <table> UNION ALL FROM <table>` are supported by accident. In hive-style `FROM <table> SELECT <expr>`, the `SELECT` clause is not negligible. Neither Hive nor Presto support this syntax. These queries are treated as invalid in Spark 3.0. {code} but it looks like a mistake for "SELECT 1 UNION ALL SELECT 1" was (Author: luciferyang): [~yumwang] I found that without SPARK-21136, this case can pass, and there are some related reminders in the sql-migration-guide as follows: {code:java} ### Query Engine - In Spark version 2.4 and below, SQL queries such as `FROM <table>` or `FROM <table> UNION ALL FROM <table>` are supported by accident. In hive-style `FROM <table> SELECT <expr>`, the `SELECT` clause is not negligible. Neither Hive nor Presto support this syntax. These queries are treated as invalid in Spark 3.0. {code} but it looks like a mistake > mismatched input since Spark 3.0 > > > Key: SPARK-33128 > URL: https://issues.apache.org/jira/browse/SPARK-33128 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0 >Reporter: Yuming Wang >Priority: Major > > Spark 2.4: > {noformat} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_221) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.sql("SELECT 1 UNION SELECT 1 UNION ALL SELECT 1").show > +---+ > | 1| > +---+ > | 1| > | 1| > +---+ > {noformat} > Spark 3.x: > {noformat} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT > /_/ > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 14.0.1) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.sql("SELECT 1 UNION SELECT 1 UNION ALL SELECT 1").show > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15) > == SQL == > SELECT 1 UNION SELECT 1 UNION ALL SELECT 1 > ---^^^ > at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81) > at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607) > ... 47 elided > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33128) mismatched input since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-33128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216485#comment-17216485 ] Yang Jie commented on SPARK-33128: -- [~yumwang] I found that without SPARK-21136, this case can pass, and there are some related reminders in the sql-migration-guide as follows: {code:java} ### Query Engine - In Spark version 2.4 and below, SQL queries such as `FROM <table>` or `FROM <table> UNION ALL FROM <table>` are supported by accident. In hive-style `FROM <table> SELECT <expr>`, the `SELECT` clause is not negligible. Neither Hive nor Presto support this syntax. These queries are treated as invalid in Spark 3.0. {code} but it looks like a mistake > mismatched input since Spark 3.0 > > > Key: SPARK-33128 > URL: https://issues.apache.org/jira/browse/SPARK-33128 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0 >Reporter: Yuming Wang >Priority: Major > > Spark 2.4: > {noformat} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.4.4 > /_/ > Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_221) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.sql("SELECT 1 UNION SELECT 1 UNION ALL SELECT 1").show > +---+ > | 1| > +---+ > | 1| > | 1| > +---+ > {noformat} > Spark 3.x: > {noformat} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT > /_/ > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 14.0.1) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.sql("SELECT 1 UNION SELECT 1 UNION ALL SELECT 1").show > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15) > == SQL == > SELECT 1 UNION SELECT 1 UNION ALL SELECT 1 > ---^^^ > at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81) > at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607) > ... 47 elided > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
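To make the quoted migration-guide entry concrete, here is a minimal sketch; it assumes a running SparkSession named spark and a hypothetical temp view t, neither of which comes from the ticket. A bare FROM query was accepted by accident in Spark 2.4 and below and is rejected since 3.0, while hive-style FROM with an explicit SELECT clause, and the standard form shown below, still parse.

{code:scala}
// Minimal sketch (hypothetical temp view `t`, assumes a running SparkSession `spark`).
spark.range(3).createOrReplaceTempView("t")

// Accepted by accident in Spark 2.4 and below, treated as invalid since 3.0:
//   spark.sql("FROM t")
// Hive-style FROM keeps its SELECT clause and remains supported; the standard
// form certainly does:
spark.sql("SELECT id FROM t").show()
{code}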
[jira] [Updated] (SPARK-32557) Logging and Swallowing the Exception Per Entry in History Server
[ https://issues.apache.org/jira/browse/SPARK-32557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-32557: - Fix Version/s: 3.0.2 > Logging and Swallowing the Exception Per Entry in History Server > > > Key: SPARK-32557 > URL: https://issues.apache.org/jira/browse/SPARK-32557 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yan Xiaole >Assignee: Yan Xiaole >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > As discussed in [https://github.com/apache/spark/pull/29350] > To keep any single entry from affecting others while the History Server scans the log dir, we > would like to add a try/catch to log and swallow the exception per entry. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
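The change described above boils down to a per-entry try/catch while scanning the log directory. A rough sketch of that pattern follows; entries and processEntry are made-up stand-ins for the real history-provider code, not the merged change.

{code:scala}
import scala.util.control.NonFatal

// Illustrative only -- not the actual history-server change. `entries` and
// `processEntry` are hypothetical stand-ins for the log-dir listing and the
// per-entry work.
def scanLogDir(entries: Seq[String], processEntry: String => Unit): Unit = {
  entries.foreach { entry =>
    try {
      processEntry(entry)
    } catch {
      // Log and swallow, so one bad entry cannot abort the whole scan.
      case NonFatal(e) =>
        println(s"Failed to process entry $entry: ${e.getMessage}")
    }
  }
}
{code}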
[jira] [Updated] (SPARK-33146) Encountering an invalid rolling event log folder prevents loading other applications in SHS
[ https://issues.apache.org/jira/browse/SPARK-33146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-33146: - Fix Version/s: 3.0.2 > Encountering an invalid rolling event log folder prevents loading other > applications in SHS > --- > > Key: SPARK-33146 > URL: https://issues.apache.org/jira/browse/SPARK-33146 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Adam Binford >Assignee: Adam Binford >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > A follow-on issue from https://issues.apache.org/jira/browse/SPARK-33133 > If an invalid rolling event log folder is encountered by the Spark History > Server upon startup, it crashes the whole loading process and prevents any > valid applications from loading. We should simply catch the error, log it, > and continue loading other applications. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33139) protect setActiveSession and clearActiveSession
[ https://issues.apache.org/jira/browse/SPARK-33139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216468#comment-17216468 ] Apache Spark commented on SPARK-33139: -- User 'leanken' has created a pull request for this issue: https://github.com/apache/spark/pull/30092 > protect setActiveSession and clearActiveSession > --- > > Key: SPARK-33139 > URL: https://issues.apache.org/jira/browse/SPARK-33139 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > This PR is a sub-task of > [SPARK-33138](https://issues.apache.org/jira/browse/SPARK-33138). In order to > make SQLConf.get reliable and stable, we need to make sure users can't pollute > the SQLConf and SparkSession context by calling setActiveSession and > clearActiveSession. > Changes in the PR: > * add a legacy config, spark.sql.legacy.allowModifyActiveSession, to fall back to > the old behavior if users do need to call these two APIs. > * by default, calling these two APIs throws an exception > * add two extra internal and private APIs, setActiveSessionInternal and > clearActiveSessionInternal, for current internal usage > * change all internal references to the new internal APIs, except for > SQLContext.setActive and SQLContext.clearActive -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33139) protect setActiveSession and clearActiveSession
[ https://issues.apache.org/jira/browse/SPARK-33139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216467#comment-17216467 ] Apache Spark commented on SPARK-33139: -- User 'leanken' has created a pull request for this issue: https://github.com/apache/spark/pull/30092 > protect setActiveSession and clearActiveSession > --- > > Key: SPARK-33139 > URL: https://issues.apache.org/jira/browse/SPARK-33139 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > This PR is a sub-task of > [SPARK-33138](https://issues.apache.org/jira/browse/SPARK-33138). In order to > make SQLConf.get reliable and stable, we need to make sure users can't pollute > the SQLConf and SparkSession context by calling setActiveSession and > clearActiveSession. > Changes in the PR: > * add a legacy config, spark.sql.legacy.allowModifyActiveSession, to fall back to > the old behavior if users do need to call these two APIs. > * by default, calling these two APIs throws an exception > * add two extra internal and private APIs, setActiveSessionInternal and > clearActiveSessionInternal, for current internal usage > * change all internal references to the new internal APIs, except for > SQLContext.setActive and SQLContext.clearActive -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
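A condensed sketch of the guard the ticket describes, assuming made-up names and a plain String standing in for the session object; the real change lives in SparkSession/SQLContext and reads the legacy flag from SQLConf.

{code:scala}
// Hypothetical, simplified illustration of the guard; not the actual Spark code.
object ActiveSessionGuard {
  // stand-in for the legacy flag spark.sql.legacy.allowModifyActiveSession
  @volatile var allowModifyActiveSession: Boolean = false

  private val active = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  def getActiveSession: Option[String] = active.get()

  // public API: throws by default, honoured only when the legacy flag is set
  def setActiveSession(session: String): Unit = {
    if (!allowModifyActiveSession) {
      throw new UnsupportedOperationException(
        "Modifying the active session is disallowed; set " +
          "spark.sql.legacy.allowModifyActiveSession to restore the old behavior.")
    }
    setActiveSessionInternal(session)
  }

  def clearActiveSession(): Unit = {
    if (!allowModifyActiveSession) {
      throw new UnsupportedOperationException(
        "Clearing the active session is disallowed; set " +
          "spark.sql.legacy.allowModifyActiveSession to restore the old behavior.")
    }
    clearActiveSessionInternal()
  }

  // internal variants (package-private in the real code) that Spark itself keeps using
  def setActiveSessionInternal(session: String): Unit = active.set(Some(session))
  def clearActiveSessionInternal(): Unit = active.set(None)
}
{code}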
[jira] [Commented] (SPARK-33180) Enables 'fail_if_no_tests' when reporting test results
[ https://issues.apache.org/jira/browse/SPARK-33180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216461#comment-17216461 ] Apache Spark commented on SPARK-33180: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/30091 > Enables 'fail_if_no_tests' when reporting test results > -- > > Key: SPARK-33180 > URL: https://issues.apache.org/jira/browse/SPARK-33180 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Minor > > SPARK-33069 skipped because it raises a false alarm when there are no test > cases. This is now fixed in > https://github.com/ScaCap/action-surefire-report/issues/29 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33180) Enables 'fail_if_no_tests' when reporting test results
[ https://issues.apache.org/jira/browse/SPARK-33180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216460#comment-17216460 ] Apache Spark commented on SPARK-33180: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/30091 > Enables 'fail_if_no_tests' when reporting test results > -- > > Key: SPARK-33180 > URL: https://issues.apache.org/jira/browse/SPARK-33180 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Minor > > SPARK-33069 skipped because it raises a false alarm when there are no test > cases. This is now fixed in > https://github.com/ScaCap/action-surefire-report/issues/29 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33180) Enables 'fail_if_no_tests' when reporting test results
[ https://issues.apache.org/jira/browse/SPARK-33180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33180: Assignee: (was: Apache Spark) > Enables 'fail_if_no_tests' when reporting test results > -- > > Key: SPARK-33180 > URL: https://issues.apache.org/jira/browse/SPARK-33180 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Minor > > SPARK-33069 skipped because it raises a false alarm when there are no test > cases. This is now fixed in > https://github.com/ScaCap/action-surefire-report/issues/29 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33180) Enables 'fail_if_no_tests' when reporting test results
[ https://issues.apache.org/jira/browse/SPARK-33180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33180: Assignee: Apache Spark > Enables 'fail_if_no_tests' when reporting test results > -- > > Key: SPARK-33180 > URL: https://issues.apache.org/jira/browse/SPARK-33180 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > SPARK-33069 skipped because it raises a false alarm when there are no test > cases. This is now fixed in > https://github.com/ScaCap/action-surefire-report/issues/29 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33180) Enables 'fail_if_no_tests' when reporting test results
[ https://issues.apache.org/jira/browse/SPARK-33180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33180: - Summary: Enables 'fail_if_no_tests' when reporting test results (was: Enables 'fail_if_no_tests' in GitHub Actions instead of manually skipping) > Enables 'fail_if_no_tests' when reporting test results > -- > > Key: SPARK-33180 > URL: https://issues.apache.org/jira/browse/SPARK-33180 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Minor > > SPARK-33069 skipped because it raises a false alarm when there are no test > cases. This is now fixed in > https://github.com/ScaCap/action-surefire-report/issues/29 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33180) Enables 'fail_if_no_tests' in GitHub Actions instead of manually skipping
Hyukjin Kwon created SPARK-33180: Summary: Enables 'fail_if_no_tests' in GitHub Actions instead of manually skipping Key: SPARK-33180 URL: https://issues.apache.org/jira/browse/SPARK-33180 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 2.4.8, 3.0.2, 3.1.0 Reporter: Hyukjin Kwon SPARK-33069 skipped because it raises a false alarm when there are no test cases. This is now fixed in https://github.com/ScaCap/action-surefire-report/issues/29 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33123) Ignore `GitHub Action file` change in Amplab Jenkins
[ https://issues.apache.org/jira/browse/SPARK-33123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33123. -- Fix Version/s: 2.4.8 3.0.2 3.1.0 Resolution: Fixed Issue resolved by pull request 30020 [https://github.com/apache/spark/pull/30020] > Ignore `GitHub Action file` change in Amplab Jenkins > > > Key: SPARK-33123 > URL: https://issues.apache.org/jira/browse/SPARK-33123 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.1.0, 3.0.2, 2.4.8 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33123) Ignore `GitHub Action file` change in Amplab Jenkins
[ https://issues.apache.org/jira/browse/SPARK-33123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33123: Assignee: William Hyun > Ignore `GitHub Action file` change in Amplab Jenkins > > > Key: SPARK-33123 > URL: https://issues.apache.org/jira/browse/SPARK-33123 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216444#comment-17216444 ] Apache Spark commented on SPARK-33179: -- User 'williamhyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30090 > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33179: Assignee: Apache Spark > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216445#comment-17216445 ] Apache Spark commented on SPARK-33179: -- User 'williamhyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30090 > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33179: Assignee: (was: Apache Spark) > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33179) Switch default Hadoop profile in run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-33179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Hyun updated SPARK-33179: - Summary: Switch default Hadoop profile in run-tests.py (was: Switch default Hadoop version in run-tests.py) > Switch default Hadoop profile in run-tests.py > - > > Key: SPARK-33179 > URL: https://issues.apache.org/jira/browse/SPARK-33179 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33179) Switch default Hadoop version in run-tests.py
William Hyun created SPARK-33179: Summary: Switch default Hadoop version in run-tests.py Key: SPARK-33179 URL: https://issues.apache.org/jira/browse/SPARK-33179 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.1.0 Reporter: William Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32069) Improve error message on reading unexpected directory which is not a table partition
[ https://issues.apache.org/jira/browse/SPARK-32069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32069: - Assignee: angerszhu > Improve error message on reading unexpected directory which is not a table > partition > > > Key: SPARK-32069 > URL: https://issues.apache.org/jira/browse/SPARK-32069 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: angerszhu >Priority: Minor > Labels: starter > Fix For: 3.1.0 > > > To reproduce: > {code:java} > spark-sql> create table test(i long); > spark-sql> insert into test values(1); > {code} > {code:java} > bash $ mkdir ./spark-warehouse/test/data > {code} > There will be such error messge > {code:java} > java.io.IOException: Not a file: > file:/Users/gengliang.wang/projects/spark/spark-warehouse/test/data > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2173) > at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) > at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412) > at > org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > 
org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282) > at > org.apache.spark.sql.hive.thriftserver.Spar
[jira] [Resolved] (SPARK-32069) Improve error message on reading unexpected directory which is not a table partition
[ https://issues.apache.org/jira/browse/SPARK-32069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32069. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30027 [https://github.com/apache/spark/pull/30027] > Improve error message on reading unexpected directory which is not a table > partition > > > Key: SPARK-32069 > URL: https://issues.apache.org/jira/browse/SPARK-32069 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Minor > Labels: starter > Fix For: 3.1.0 > > > To reproduce: > {code:java} > spark-sql> create table test(i long); > spark-sql> insert into test values(1); > {code} > {code:java} > bash $ mkdir ./spark-warehouse/test/data > {code} > There will be such error messge > {code:java} > java.io.IOException: Not a file: > file:/Users/gengliang.wang/projects/spark/spark-warehouse/test/data > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) > at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2173) > at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) > at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) > at > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:412) > at > org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$1(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) 
> at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:65) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriv
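One way to turn the raw IOException above into a clearer message is a pre-check on the table location. The sketch below is purely illustrative: it is not the fix merged in the pull request referenced above, and the method name and message are invented.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only: reject nested directories under a non-partitioned table
// location with a message that explains the situation, instead of Hadoop's
// bare "Not a file" IOException. `tableLocation` is a hypothetical input.
def assertNoUnexpectedDirectories(tableLocation: String): Unit = {
  val path = new Path(tableLocation)
  val fs = FileSystem.get(new Configuration())
  val dirs = fs.listStatus(path).filter(_.isDirectory)
  if (dirs.nonEmpty) {
    throw new IllegalStateException(
      s"Path $tableLocation contains sub-directories " +
        s"(${dirs.map(_.getPath.getName).mkString(", ")}) but the table is not " +
        "partitioned; remove them or recreate the table with partitions.")
  }
}
{code}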
[jira] [Resolved] (SPARK-33177) CollectList and CollectSet should not be nullable
[ https://issues.apache.org/jira/browse/SPARK-33177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33177. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30087 [https://github.com/apache/spark/pull/30087] > CollectList and CollectSet should not be nullable > - > > Key: SPARK-33177 > URL: https://issues.apache.org/jira/browse/SPARK-33177 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tanel Kiis >Assignee: Tanel Kiis >Priority: Minor > Fix For: 3.1.0 > > > CollectList and CollectSet SQL expressions never return null value. Marking > them as non-nullable can have some performance benefits, because some > optimizer rules apply only to non-nullable expressions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
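The non-nullability being codified here can be seen from the user-facing behaviour: on empty input the two aggregates produce empty collections rather than NULL. A small sketch, assuming a running SparkSession named spark (the output is not copied from the ticket):

{code:scala}
import org.apache.spark.sql.functions.{collect_list, collect_set}

// Sketch only: on an empty input, collect_list/collect_set are expected to
// return empty arrays rather than NULL, which is what makes marking the
// expressions non-nullable safe.
val empty = spark.range(0)
empty.agg(collect_list("id"), collect_set("id")).show()
{code}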
[jira] [Assigned] (SPARK-33177) CollectList and CollectSet should not be nullable
[ https://issues.apache.org/jira/browse/SPARK-33177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33177: Assignee: Tanel Kiis > CollectList and CollectSet should not be nullable > - > > Key: SPARK-33177 > URL: https://issues.apache.org/jira/browse/SPARK-33177 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tanel Kiis >Assignee: Tanel Kiis >Priority: Minor > > CollectList and CollectSet SQL expressions never return null value. Marking > them as non-nullable can have some performance benefits, because some > optimizer rules apply only to non-nullable expressions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17333) Make pyspark interface friendly with mypy static analysis
[ https://issues.apache.org/jira/browse/SPARK-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17333: - Parent: SPARK-32681 Affects Version/s: 3.1.0 Issue Type: Sub-task (was: Improvement) > Make pyspark interface friendly with mypy static analysis > - > > Key: SPARK-17333 > URL: https://issues.apache.org/jira/browse/SPARK-17333 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Assaf Mendelson >Priority: Major > > Static analysis tools such as those common to IDEs for auto-completion and > error marking tend to have poor results with pyspark. > This is caused by two separate issues: > The first is that many elements are created programmatically, such as the max > function in pyspark.sql.functions. > The second is that we tend to use pyspark in a functional manner, meaning > that we chain many actions (e.g. df.filter().groupby().agg()) and since > python has no type information this can become difficult to understand. > I would suggest changing the interface to improve it. > The way I see it, we can either change the interface or provide interface > enhancements. > Changing the interface means defining (when possible) all functions directly, > i.e. instead of having a __functions__ dictionary in pyspark.sql.functions.py > and then generating the functions programmatically by using _create_function, > create the function directly. > def max(col): >""" >docstring >""" >_create_function(max,"docstring") > Second, we can add type indications to all functions as defined in PEP 484 or > pycharm's legacy type hinting > (https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy). > So for example max might look like this: > def max(col): >""" >does a max. > :type col: Column > :rtype Column >""" > This would provide a wide range of support, as these types of hints, while old, > are pretty common. > A second option is to use PEP 3107 to define interfaces (pyi files); in this > case we might have a functions.pyi file which would contain something > like: > def max(col: Column) -> Column: > """ > Aggregate function: returns the maximum value of the expression in a > group. > """ > ... > This has the advantage of easier-to-understand types and not touching the > code (only supported code) but has the disadvantage of being separately > managed (i.e. a greater chance of making a mistake) and the fact that some > configuration would be needed in the IDE/static analysis tool instead of > working out of the box. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17333) Make pyspark interface friendly with mypy static analysis
[ https://issues.apache.org/jira/browse/SPARK-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-17333: - Priority: Major (was: Trivial) > Make pyspark interface friendly with mypy static analysis > - > > Key: SPARK-17333 > URL: https://issues.apache.org/jira/browse/SPARK-17333 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Assaf Mendelson >Priority: Major > > Static analysis tools such as those common to IDEs for auto-completion and > error marking tend to have poor results with pyspark. > This is caused by two separate issues: > The first is that many elements are created programmatically, such as the max > function in pyspark.sql.functions. > The second is that we tend to use pyspark in a functional manner, meaning > that we chain many actions (e.g. df.filter().groupby().agg()) and since > python has no type information this can become difficult to understand. > I would suggest changing the interface to improve it. > The way I see it, we can either change the interface or provide interface > enhancements. > Changing the interface means defining (when possible) all functions directly, > i.e. instead of having a __functions__ dictionary in pyspark.sql.functions.py > and then generating the functions programmatically by using _create_function, > create the function directly. > def max(col): >""" >docstring >""" >_create_function(max,"docstring") > Second, we can add type indications to all functions as defined in PEP 484 or > pycharm's legacy type hinting > (https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy). > So for example max might look like this: > def max(col): >""" >does a max. > :type col: Column > :rtype Column >""" > This would provide a wide range of support, as these types of hints, while old, > are pretty common. > A second option is to use PEP 3107 to define interfaces (pyi files); in this > case we might have a functions.pyi file which would contain something > like: > def max(col: Column) -> Column: > """ > Aggregate function: returns the maximum value of the expression in a > group. > """ > ... > This has the advantage of easier-to-understand types and not touching the > code (only supported code) but has the disadvantage of being separately > managed (i.e. a greater chance of making a mistake) and the fact that some > configuration would be needed in the IDE/static analysis tool instead of > working out of the box. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33137) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (PostgreSQL dialect)
[ https://issues.apache.org/jira/browse/SPARK-33137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216386#comment-17216386 ] Apache Spark commented on SPARK-33137: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/30089 > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (PostgreSQL dialect) > - > > Key: SPARK-33137 > URL: https://issues.apache.org/jira/browse/SPARK-33137 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > ALTER TABLE UPDATE COLUMN TYPE > ALTER TABLE UPDATE COLUMN NULLABILITY > in the following PostgreSQL JDBC dialect according to official documentation. > Write PostgreSQL integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33137) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (PostgreSQL dialect)
[ https://issues.apache.org/jira/browse/SPARK-33137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33137: Assignee: (was: Apache Spark) > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (PostgreSQL dialect) > - > > Key: SPARK-33137 > URL: https://issues.apache.org/jira/browse/SPARK-33137 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > ALTER TABLE UPDATE COLUMN TYPE > ALTER TABLE UPDATE COLUMN NULLABILITY > in the following PostgreSQL JDBC dialect according to official documentation. > Write PostgreSQL integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33137) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (PostgreSQL dialect)
[ https://issues.apache.org/jira/browse/SPARK-33137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33137: Assignee: Apache Spark > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (PostgreSQL dialect) > - > > Key: SPARK-33137 > URL: https://issues.apache.org/jira/browse/SPARK-33137 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Major > > Override the default SQL strings for: > ALTER TABLE UPDATE COLUMN TYPE > ALTER TABLE UPDATE COLUMN NULLABILITY > in the following PostgreSQL JDBC dialect according to official documentation. > Write PostgreSQL integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33137) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (PostgreSQL dialect)
[ https://issues.apache.org/jira/browse/SPARK-33137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216385#comment-17216385 ] Apache Spark commented on SPARK-33137: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/30089 > Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of > columns (PostgreSQL dialect) > - > > Key: SPARK-33137 > URL: https://issues.apache.org/jira/browse/SPARK-33137 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Major > > Override the default SQL strings for: > ALTER TABLE UPDATE COLUMN TYPE > ALTER TABLE UPDATE COLUMN NULLABILITY > in the following PostgreSQL JDBC dialect according to official documentation. > Write PostgreSQL integration tests for JDBC. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
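For reference, the two statements named in the SPARK-33137 description map onto PostgreSQL's documented ALTER TABLE ... ALTER COLUMN syntax. The helper below is only a sketch of the SQL strings the dialect needs to emit; the object and method names are made up and are not the Spark JdbcDialect API.

{code:scala}
// Hypothetical helpers illustrating the PostgreSQL-style SQL strings this
// ticket asks the dialect to generate; not the actual Spark JdbcDialect code.
object PostgresAlterTableSql {
  // ALTER TABLE UPDATE COLUMN TYPE
  def updateColumnType(table: String, column: String, newType: String): String =
    s"ALTER TABLE $table ALTER COLUMN $column TYPE $newType"

  // ALTER TABLE UPDATE COLUMN NULLABILITY
  def updateColumnNullability(table: String, column: String, nullable: Boolean): String = {
    val action = if (nullable) "DROP NOT NULL" else "SET NOT NULL"
    s"ALTER TABLE $table ALTER COLUMN $column $action"
  }
}
{code}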
[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216375#comment-17216375 ] Shixiong Zhu commented on SPARK-21065: -- If you are seeing many active batches, it's likely your streaming application is too slow. You can try to look at UI and see if there are anything obvious that you can optimize. > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Spark Core, Web UI >Affects Versions: 2.1.0 >Reporter: Dan Dutrow >Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) > 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found 149697756 ms > at scala.collection.MalLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
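The "existence check" suggested at the end of the quoted description amounts to replacing HashMap.apply with get. A tiny self-contained sketch of the difference follows; the map and key below are made up for illustration and are not Spark code.

{code:scala}
import scala.collection.mutable

// Sketch only: mutable.HashMap.apply throws NoSuchElementException for a
// missing key, while get(...).foreach(...) simply skips the update.
val runningBatches = mutable.HashMap[Long, String]()

// runningBatches(42L)  // would throw java.util.NoSuchElementException
runningBatches.get(42L).foreach { info =>
  println(s"updating $info")  // not executed when the key is absent
}
{code}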
[jira] [Updated] (SPARK-32797) Install mypy on the Jenkins CI workers
[ https://issues.apache.org/jira/browse/SPARK-32797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-32797: - Affects Version/s: (was: 3.0.0) 3.1.0 > Install mypy on the Jenkins CI workers > -- > > Key: SPARK-32797 > URL: https://issues.apache.org/jira/browse/SPARK-32797 > Project: Spark > Issue Type: Improvement > Components: jenkins, PySpark >Affects Versions: 3.1.0 >Reporter: Fokko Driesprong >Priority: Major > > We want to check the types of the PySpark code. This requires mypy to be > installed on the CI. Can you do this [~shaneknapp]? > Related PR: [https://github.com/apache/spark/pull/29180] > You can install this using pip: [https://pypi.org/project/mypy/] Should be > similar to flake8 and sphinx. The latest version is ok! Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17333) Make pyspark interface friendly with mypy static analysis
[ https://issues.apache.org/jira/browse/SPARK-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216345#comment-17216345 ] Apache Spark commented on SPARK-17333: -- User 'Fokko' has created a pull request for this issue: https://github.com/apache/spark/pull/30088 > Make pyspark interface friendly with mypy static analysis > - > > Key: SPARK-17333 > URL: https://issues.apache.org/jira/browse/SPARK-17333 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Assaf Mendelson >Priority: Trivial > > Static analysis tools such as those common to IDEs for auto-completion and > error marking tend to have poor results with pyspark. > This is caused by two separate issues: > The first is that many elements are created programmatically, such as the max > function in pyspark.sql.functions. > The second is that we tend to use pyspark in a functional manner, meaning > that we chain many actions (e.g. df.filter().groupby().agg()) and since > python has no type information this can become difficult to understand. > I would suggest changing the interface to improve it. > The way I see it, we can either change the interface or provide interface > enhancements. > Changing the interface means defining (when possible) all functions directly, > i.e. instead of having a __functions__ dictionary in pyspark.sql.functions.py > and then generating the functions programmatically by using _create_function, > create the function directly. > def max(col): >""" >docstring >""" >_create_function(max,"docstring") > Second, we can add type indications to all functions as defined in PEP 484 or > pycharm's legacy type hinting > (https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy). > So for example max might look like this: > def max(col): >""" >does a max. > :type col: Column > :rtype Column >""" > This would provide a wide range of support, as these types of hints, while old, > are pretty common. > A second option is to use PEP 3107 to define interfaces (pyi files); in this > case we might have a functions.pyi file which would contain something > like: > def max(col: Column) -> Column: > """ > Aggregate function: returns the maximum value of the expression in a > group. > """ > ... > This has the advantage of easier-to-understand types and not touching the > code (only supported code) but has the disadvantage of being separately > managed (i.e. a greater chance of making a mistake) and the fact that some > configuration would be needed in the IDE/static analysis tool instead of > working out of the box. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216342#comment-17216342 ] Sachit Murarka commented on SPARK-21065: [~zsxwing] Thanks for quick response , any suggestion on optimizing many active batches. (Probably I should reduce the processing time or increase the batch interval). Correct? Any other thing? > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Spark Core, Web UI >Affects Versions: 2.1.0 >Reporter: Dan Dutrow >Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) > 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found 149697756 ms > at scala.collection.MalLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
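For context, a small PySpark sketch of the workaround the reporter mentions (ssc.remember), assuming a DStreams application; whether a longer remember duration actually avoids the listener race is not confirmed here, and the multiplier is an arbitrary placeholder.
{code:python}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

batch_interval = 10  # seconds; placeholder value
sc = SparkContext(appName="remember-example")
ssc = StreamingContext(sc, batchDuration=batch_interval)

# Keep RDDs (and related batch metadata) around much longer than one batch,
# along the lines of ssc.remember(batchDuration * rememberMultiple) above.
ssc.remember(batch_interval * 10)
{code}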
[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216341#comment-17216341 ] Shixiong Zhu commented on SPARK-21065: -- `spark.streaming.concurrentJobs` is not safe. Fixing it requires fundamental system changes. We don't have any plan for this. > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Spark Core, Web UI >Affects Versions: 2.1.0 >Reporter: Dan Dutrow >Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) > 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found 149697756 ms > at scala.collection.MalLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216330#comment-17216330 ] Sachit Murarka commented on SPARK-21065: [~zsxwing] , Any idea if concurrentJobs still causes issue in 2.4 release of Spark as well > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Spark Core, Web UI >Affects Versions: 2.1.0 >Reporter: Dan Dutrow >Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) > 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found 149697756 ms > at scala.collection.MalLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-33175) Detect duplicated mountPath and fail at Spark side
[ https://issues.apache.org/jira/browse/SPARK-33175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33175: -- Comment: was deleted (was: User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30080) > Detect duplicated mountPath and fail at Spark side > -- > > Key: SPARK-33175 > URL: https://issues.apache.org/jira/browse/SPARK-33175 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 2.4.7, 3.0.2, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > If there is a mountPath conflict, the pod is created and repeats the > following error messages and keep running. This should not keep running and > we had better fail at Spark side. > {code} > $ k get pod -l 'spark-role in (driver,executor)' > NAMEREADY STATUSRESTARTS AGE > tpcds 1/1 Running 0 14m > {code} > {code} > 20/10/18 05:09:26 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: ... > Message: Pod "tpcds-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/data1": must > be unique. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33175) Detect duplicated mountPath and fail at Spark side
[ https://issues.apache.org/jira/browse/SPARK-33175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216301#comment-17216301 ] Dongjoon Hyun commented on SPARK-33175: --- This is resolved via https://github.com/apache/spark/pull/30084 > Detect duplicated mountPath and fail at Spark side > -- > > Key: SPARK-33175 > URL: https://issues.apache.org/jira/browse/SPARK-33175 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 2.4.7, 3.0.2, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > If there is a mountPath conflict, the pod is created and repeats the > following error messages and keep running. This should not keep running and > we had better fail at Spark side. > {code} > $ k get pod -l 'spark-role in (driver,executor)' > NAMEREADY STATUSRESTARTS AGE > tpcds 1/1 Running 0 14m > {code} > {code} > 20/10/18 05:09:26 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: ... > Message: Pod "tpcds-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/data1": must > be unique. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33175) Detect duplicated mountPath and fail at Spark side
[ https://issues.apache.org/jira/browse/SPARK-33175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33175. --- Fix Version/s: 3.1.0 Resolution: Fixed > Detect duplicated mountPath and fail at Spark side > -- > > Key: SPARK-33175 > URL: https://issues.apache.org/jira/browse/SPARK-33175 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 2.4.7, 3.0.2, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > If there is a mountPath conflict, the pod is created and repeats the > following error messages and keep running. This should not keep running and > we had better fail at Spark side. > {code} > $ k get pod -l 'spark-role in (driver,executor)' > NAMEREADY STATUSRESTARTS AGE > tpcds 1/1 Running 0 14m > {code} > {code} > 20/10/18 05:09:26 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: ... > Message: Pod "tpcds-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/data1": must > be unique. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
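As an illustration of the validation idea only (Spark's actual check lives in its Kubernetes-specific Scala code, not in Python), a short sketch that fails fast when a mount path repeats instead of letting Kubernetes reject the pod repeatedly:
{code:python}
from collections import Counter


def check_unique_mount_paths(mount_paths):
    """Raise if any container mountPath appears more than once."""
    duplicates = [p for p, n in Counter(mount_paths).items() if n > 1]
    if duplicates:
        raise ValueError("Found duplicated mountPath(s): " + ", ".join(duplicates))


# '/data1' appears twice, mirroring the Kubernetes error message above.
try:
    check_unique_mount_paths(["/opt/spark/work-dir", "/data1", "/data1"])
except ValueError as e:
    print(e)
{code}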
[jira] [Updated] (SPARK-33176) Use 11-jre-slim as default in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33176: -- Affects Version/s: 3.0.1 > Use 11-jre-slim as default in K8s Dockerfile > > > Key: SPARK-33176 > URL: https://issues.apache.org/jira/browse/SPARK-33176 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.1, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > Apache Spark supports both Java8/Java11. However, there is a difference. > 1. Java8-built distribution can run both Java8/Java11 > 2. Java11-built distribution can run on Java11, but not Java8. > In short, we had better use Java11 in Dockerfile to embrace both cases > without any issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33176) Use 11-jre-slim as default in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33176: -- Fix Version/s: 3.0.2 > Use 11-jre-slim as default in K8s Dockerfile > > > Key: SPARK-33176 > URL: https://issues.apache.org/jira/browse/SPARK-33176 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > Apache Spark supports both Java8/Java11. However, there is a difference. > 1. Java8-built distribution can run both Java8/Java11 > 2. Java11-built distribution can run on Java11, but not Java8. > In short, we had better use Java11 in Dockerfile to embrace both cases > without any issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33176) Use 11-jre-slim as default in K8s Dockerfile
[ https://issues.apache.org/jira/browse/SPARK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33176. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30083 [https://github.com/apache/spark/pull/30083] > Use 11-jre-slim as default in K8s Dockerfile > > > Key: SPARK-33176 > URL: https://issues.apache.org/jira/browse/SPARK-33176 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > Apache Spark supports both Java8/Java11. However, there is a difference. > 1. Java8-built distribution can run both Java8/Java11 > 2. Java11-built distribution can run on Java11, but not Java8. > In short, we had better use Java11 in Dockerfile to embrace both cases > without any issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33178) Dynamic Scaling In/Scaling out of executors - Kubernetes
Vishnu G Singhal created SPARK-33178: Summary: Dynamic Scaling In/Scaling out of executors - Kubernetes Key: SPARK-33178 URL: https://issues.apache.org/jira/browse/SPARK-33178 Project: Spark Issue Type: New Feature Components: Kubernetes, Spark Core Affects Versions: 3.0.1, 3.0.0 Environment: Spark deployment on Kubernetes. Reporter: Vishnu G Singhal Can we have dynamic scaling in/out of executors in a Kubernetes Spark deployment, based on the load? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
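For reference, executor scaling already exists in Spark 3.0 on Kubernetes through dynamic allocation with shuffle tracking; whether that covers this request is a separate question. A PySpark sketch with placeholder executor bounds:
{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dynamic-allocation-on-k8s")
    .config("spark.dynamicAllocation.enabled", "true")
    # There is no external shuffle service on Kubernetes, so shuffle tracking
    # is what lets executors be released safely (Spark 3.0+).
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")    # placeholder
    .config("spark.dynamicAllocation.maxExecutors", "10")   # placeholder
    .getOrCreate()
)
{code}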
[jira] [Commented] (SPARK-33177) CollectList and CollectSet should not be nullable
[ https://issues.apache.org/jira/browse/SPARK-33177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216276#comment-17216276 ] Apache Spark commented on SPARK-33177: -- User 'tanelk' has created a pull request for this issue: https://github.com/apache/spark/pull/30087 > CollectList and CollectSet should not be nullable > - > > Key: SPARK-33177 > URL: https://issues.apache.org/jira/browse/SPARK-33177 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tanel Kiis >Priority: Minor > > CollectList and CollectSet SQL expressions never return null value. Marking > them as non-nullable can have some performance benefits, because some > optimizer rules apply only to non-nullable expressions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33177) CollectList and CollectSet should not be nullable
[ https://issues.apache.org/jira/browse/SPARK-33177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33177: Assignee: Apache Spark > CollectList and CollectSet should not be nullable > - > > Key: SPARK-33177 > URL: https://issues.apache.org/jira/browse/SPARK-33177 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tanel Kiis >Assignee: Apache Spark >Priority: Minor > > CollectList and CollectSet SQL expressions never return null value. Marking > them as non-nullable can have some performance benefits, because some > optimizer rules apply only to non-nullable expressions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33177) CollectList and CollectSet should not be nullable
[ https://issues.apache.org/jira/browse/SPARK-33177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33177: Assignee: (was: Apache Spark) > CollectList and CollectSet should not be nullable > - > > Key: SPARK-33177 > URL: https://issues.apache.org/jira/browse/SPARK-33177 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tanel Kiis >Priority: Minor > > CollectList and CollectSet SQL expressions never return null value. Marking > them as non-nullable can have some performance benefits, because some > optimizer rules apply only to non-nullable expressions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33177) CollectList and CollectSet should not be nullable
[ https://issues.apache.org/jira/browse/SPARK-33177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216275#comment-17216275 ] Apache Spark commented on SPARK-33177: -- User 'tanelk' has created a pull request for this issue: https://github.com/apache/spark/pull/30087 > CollectList and CollectSet should not be nullable > - > > Key: SPARK-33177 > URL: https://issues.apache.org/jira/browse/SPARK-33177 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tanel Kiis >Priority: Minor > > CollectList and CollectSet SQL expressions never return null value. Marking > them as non-nullable can have some performance benefits, because some > optimizer rules apply only to non-nullable expressions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33177) CollectList and CollectSet should not be nullable
Tanel Kiis created SPARK-33177: -- Summary: CollectList and CollectSet should not be nullable Key: SPARK-33177 URL: https://issues.apache.org/jira/browse/SPARK-33177 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Tanel Kiis The CollectList and CollectSet SQL expressions never return a null value. Marking them as non-nullable can have some performance benefits, because some optimizer rules apply only to non-nullable expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
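A small PySpark illustration of the behavior behind this proposal: collect_list and collect_set skip nulls and produce an empty array for a group rather than returning null, which is why marking the expressions non-nullable is safe. The data and session setup are placeholders.
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("collect-nullability").getOrCreate()
df = spark.createDataFrame([(1, None), (1, None)], "k INT, v STRING")

df.groupBy("k").agg(
    F.collect_list("v").alias("vals"),           # [] rather than null
    F.collect_set("v").alias("distinct_vals"),   # [] rather than null
).show()
{code}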
[jira] [Commented] (SPARK-33164) SPIP: add SQL support to "SELECT * (EXCEPT someColumn) FROM .." equivalent to DataSet.dropColumn(someColumn)
[ https://issues.apache.org/jira/browse/SPARK-33164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216174#comment-17216174 ] Arnaud Nauwynck commented on SPARK-33164: - Notice that there is also the feature "REPLACE" that might be implemented as in BigQuery {noformat} select * (REPLACE expr as name) {noformat} see : https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace Finally, another syntaxic sugar might be {noformat} select * (RENAME oldname as newname) {noformat} ... which may be (apart from column order) equivalent to {noformat} select * (EXCEPT oldname) FROM (select *, oldname as new name) .. {noformat} > SPIP: add SQL support to "SELECT * (EXCEPT someColumn) FROM .." equivalent to > DataSet.dropColumn(someColumn) > > > Key: SPARK-33164 > URL: https://issues.apache.org/jira/browse/SPARK-33164 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1 >Reporter: Arnaud Nauwynck >Priority: Minor > Original Estimate: 120h > Remaining Estimate: 120h > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > I would like to have the extended SQL syntax "SELECT * EXCEPT someColumn FROM > .." > to be able to select all columns except some in a SELECT clause. > It would be similar to SQL syntax from some databases, like Google BigQuery > or PostgresQL. > https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax > Google question "select * EXCEPT one column", and you will see many > developpers have the same problems. > example posts: > https://blog.jooq.org/2018/05/14/selecting-all-columns-except-one-in-postgresql/ > https://www.thetopsites.net/article/53001825.shtml > There are several typicall examples where is is very helpfull : > use-case1: > you add "count ( * ) countCol" column, and then filter on it using for > example "having countCol = 1" > ... and then you want to select all columns EXCEPT this dummy column which > always is "1" > {noformat} > select * (EXCEPT countCol) > from ( > select count(*) countCol, * >from MyTable >where ... >group by ... having countCol = 1 > ) > {noformat} > > use-case 2: > same with analytical function "partition over(...) rankCol ... where > rankCol=1" > For example to get the latest row before a given time, in a time series > table. > This is "Time-Travel" queries addressed by framework like "DeltaLake" > {noformat} > CREATE table t_updates (update_time timestamp, id string, col1 type1, col2 > type2, ... col42) > pastTime=.. > SELECT * (except rankCol) > FROM ( >SELECT *, > RANK() OVER (PARTITION BY id ORDER BY update_time) rankCol >FROM t_updates >where update_time < pastTime > ) WHERE rankCol = 1 > > {noformat} > > use-case 3: > copy some data from table "t" to corresponding table "t_snapshot", and back > to "t" > {noformat} >CREATE TABLE t (col1 type1, col2 type2, col3 type3, ... col42 type42) ... > >/* create corresponding table: (snap_id string, col1 type1, col2 type2, > col3 type3, ... col42 type42) */ >CREATE TABLE t_snapshot >AS SELECT '' as snap_id, * FROM t WHERE 1=2 >/* insert data from t to some snapshot */ >INSERT INTO t_snapshot >SELECT 'snap1' as snap_id, * from t > >/* select some data from snapshot table (without snap_id column) .. */ >SELECT * (EXCEPT snap_id) FROM t_snapshot where snap_id='snap1' > > {noformat} > > > *Q2.* What problem is this proposal NOT designed to solve? > It is only a SQL syntaxic sugar. 
> It does not change SQL execution plan or anything complex. > *Q3.* How is it done today, and what are the limits of current practice? > > Today, you can either use the DataSet API, with .dropColumn(someColumn) > or you need to HARD-CODE manually all columns in your SQL. Therefore your > code is NOT generic (or you are using a SQL meta-code generator?) > *Q4.* What is new in your approach and why do you think it will be successful? > It is NOT new... it is already a proven solution from DataSet.dropColumn(), > PostgreSQL, BigQuery > > *Q5.* Who cares? If you are successful, what difference will it make? > It simplifies the life of developers, DBAs, data analysts, and end users. > It simplifies development of SQL code, in a more generic way for many tasks. > *Q6.* What are the risks? > There is VERY limited risk on Spark SQL, because it already exists in the DataSet > API. > It is an extension of SQL syntax, so the risk is annoying some IDE SQL > editors with a new SQL syntax. > *Q7.* How long will it take? > No idea. I guess someone experience
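For comparison, the DataFrame-API route mentioned in Q3 today, shown as a PySpark sketch of use-case 1 (the method is spelled drop in the DataFrame API; the table and column names here are made up):
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("drop-column-example").getOrCreate()
my_table = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], "id INT, grp STRING")

deduped = (
    my_table.groupBy("id", "grp")
    .agg(F.count("*").alias("countCol"))
    .where(F.col("countCol") == 1)
    .drop("countCol")  # what "SELECT * (EXCEPT countCol)" would express in SQL
)
deduped.show()
{code}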
[jira] [Updated] (SPARK-33143) Make SocketAuthServer socket timeout configurable
[ https://issues.apache.org/jira/browse/SPARK-33143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33143: - Component/s: PySpark > Make SocketAuthServer socket timeout configurable > - > > Key: SPARK-33143 > URL: https://issues.apache.org/jira/browse/SPARK-33143 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core >Affects Versions: 2.4.7, 3.0.1 >Reporter: Miklos Szurap >Priority: Major > > In SPARK-21551 the socket timeout for the Pyspark applications has been > increased from 3 to 15 seconds. However it is still hardcoded. > In certain situations even the 15 seconds is not enough, so it should be made > configurable. > This is requested after seeing it in real-life workload failures. > Also it has been suggested and requested in an earlier comment in > [SPARK-18649|https://issues.apache.org/jira/browse/SPARK-18649?focusedCommentId=16493498&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16493498] > In > Spark 2.4 it is under > [PythonRDD.scala|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L899] > in Spark 3.x the code has been moved to > [SocketAuthServer.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/security/SocketAuthServer.scala#L51] > {code} > serverSocket.setSoTimeout(15000) > {code} > Please include this in both 2.4 and 3.x branches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33168) spark REST API Unable to get JobDescription
[ https://issues.apache.org/jira/browse/SPARK-33168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33168. -- Resolution: Cannot Reproduce > spark REST API Unable to get JobDescription > --- > > Key: SPARK-33168 > URL: https://issues.apache.org/jira/browse/SPARK-33168 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: zhaoyachao >Priority: Major > > After setting a job description, the Spark REST API > (localhost:4040/api/v1/applications/xxx/jobs) is unable to return the job > description, but it is displayed at localhost:4040/jobs: > spark.sparkContext.setJobDescription("test_count") > spark.range(100).count() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33168) spark REST API Unable to get JobDescription
[ https://issues.apache.org/jira/browse/SPARK-33168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216154#comment-17216154 ] Hyukjin Kwon commented on SPARK-33168: -- I can retrieve the job description as below: {code:java} localhost:4040/api/v1/applications/local-1603014688155/jobs [ { "jobId" : 0, "name" : "count at :24", "description" : "test_count", "submissionTime" : "2020-10-18T09:51:32.690GMT", "completionTime" : "2020-10-18T09:51:33.473GMT", "stageIds" : [ 0, 1 ], "status" : "SUCCEEDED", "numTasks" : 17, "numActiveTasks" : 0, "numCompletedTasks" : 17, "numSkippedTasks" : 0, "numFailedTasks" : 0, "numKilledTasks" : 0, "numCompletedIndices" : 17, "numActiveStages" : 0, "numCompletedStages" : 2, "numSkippedStages" : 0, "numFailedStages" : 0, "killedTasksSummary" : { } } ]% {code} > spark REST API Unable to get JobDescription > --- > > Key: SPARK-33168 > URL: https://issues.apache.org/jira/browse/SPARK-33168 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: zhaoyachao >Priority: Major > > spark set job description ,use spark REST API > (localhost:4040/api/v1/applications/xxx/jobs)unable to get job > description,but it can be displayed at localhost:4040/jobs > spark.sparkContext.setJobDescription({color:#6a8759}"test_count"{color}) > spark.range({color:#6897bb}100{color}).count() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
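For completeness, a minimal sketch of querying that endpoint from Python; the application id is a placeholder and the UI is assumed to be reachable on localhost:4040.
{code:python}
import json
import urllib.request

app_id = "local-1603014688155"  # placeholder; list ids via /api/v1/applications
url = "http://localhost:4040/api/v1/applications/{}/jobs".format(app_id)

with urllib.request.urlopen(url) as resp:
    jobs = json.load(resp)

for job in jobs:
    # 'description' is only present when setJobDescription was called before
    # the action that triggered the job.
    print(job["jobId"], job.get("description", "<no description>"))
{code}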
[jira] [Commented] (SPARK-32323) Javascript/HTML bug in spark application UI
[ https://issues.apache.org/jira/browse/SPARK-32323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216145#comment-17216145 ] Ihor Bobak commented on SPARK-32323: Firefox 81.0.2 64bit, the latest one. But the bug appeared long ago, and until then Firefox updated multiple times on my VM. I believe this is not about > Javascript/HTML bug in spark application UI > --- > > Key: SPARK-32323 > URL: https://issues.apache.org/jira/browse/SPARK-32323 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 3.0.0 > Environment: Ubuntu 18, Spark 3.0.0 standalone cluster >Reporter: Ihor Bobak >Priority: Major > Attachments: 2020-07-15 16_36_31-pyspark-shell - Spark Jobs.png > > > I attached screeenshot - everything is written on it. > This appeared in Spark 3.0.0 in the Firefox browser (latest version) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-32323) Javascript/HTML bug in spark application UI
[ https://issues.apache.org/jira/browse/SPARK-32323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216145#comment-17216145 ] Ihor Bobak edited comment on SPARK-32323 at 10/18/20, 9:17 AM: --- Firefox 81.0.2 64bit, the latest one. But the bug appeared long ago, and until then Firefox updated multiple times on my VM. I believe this is not related to the version of the browser was (Author: ibobak): Firefox 81.0.2 64bit, the latest one. But the bug appeared long ago, and until then Firefox updated multiple times on my VM. I believe this is not about > Javascript/HTML bug in spark application UI > --- > > Key: SPARK-32323 > URL: https://issues.apache.org/jira/browse/SPARK-32323 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 3.0.0 > Environment: Ubuntu 18, Spark 3.0.0 standalone cluster >Reporter: Ihor Bobak >Priority: Major > Attachments: 2020-07-15 16_36_31-pyspark-shell - Spark Jobs.png > > > I attached screeenshot - everything is written on it. > This appeared in Spark 3.0.0 in the Firefox browser (latest version) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-32323) Javascript/HTML bug in spark application UI
[ https://issues.apache.org/jira/browse/SPARK-32323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] akiyamaneko updated SPARK-32323: Comment: was deleted (was: [~ibobak] hi, can you provide your browser info>) > Javascript/HTML bug in spark application UI > --- > > Key: SPARK-32323 > URL: https://issues.apache.org/jira/browse/SPARK-32323 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 3.0.0 > Environment: Ubuntu 18, Spark 3.0.0 standalone cluster >Reporter: Ihor Bobak >Priority: Major > Attachments: 2020-07-15 16_36_31-pyspark-shell - Spark Jobs.png > > > I attached screeenshot - everything is written on it. > This appeared in Spark 3.0.0 in the Firefox browser (latest version) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type
[ https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33172: - Target Version/s: (was: 2.4.8, 3.0.2) > Spark SQL CodeGenerator does not check for UserDefined type > --- > > Key: SPARK-33172 > URL: https://issues.apache.org/jira/browse/SPARK-33172 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: David Rabinowitz >Priority: Minor > > The CodeGenerator takes the DataType given to {{getValueFromVector()}} as > is, and generates code based on its type. The generated code is not aware of > the actual type, and therefore cannot be compiled. For example, using a > DataFrame with a Spark ML Vector (VectorUDT) the generated code is: > {{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : > (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, > 4));}} > {{ Which leads to a runtime error of}} > {{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 153, Column 126: No applicable constructor/method found for actual parameters > "int, int"; candidates are: "public > org.apache.spark.sql.vectorized.ColumnarRow > org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}} > {{ org.codehaus.commons.compiler.CompileException: File 'generated.java', > Line 153, Column 126: No applicable constructor/method found for actual > parameters "int, int"; candidates are: "public > org.apache.spark.sql.vectorized.ColumnarRow > org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}} > {{ at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}} > {{...}} > {{ which then throws Spark to an infinite loop of this error.}} > The solution is quite simple, {{getValueFromVector()}} should match nad > handle UserDefinedType the same as {{CodeGenerator.javaType()}} is doing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back
[ https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216127#comment-17216127 ] Apache Spark commented on SPARK-33109: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/30085 > Upgrade to SBT 1.4 and support `dependencyTree` back > > > Key: SPARK-33109 > URL: https://issues.apache.org/jira/browse/SPARK-33109 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Denis Pyshev >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name
[ https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33131: -- Fix Version/s: 2.4.8 > Fix grouping sets with having clause can not resolve qualified col name > --- > > Key: SPARK-33131 > URL: https://issues.apache.org/jira/browse/SPARK-33131 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, > 3.1.0 >Reporter: ulysses you >Assignee: ulysses you >Priority: Minor > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > Grouping sets construct new aggregate lost the qualified name of grouping > expression. Here is a example: > {code:java} > -- Works resolved by ResolveReferences > select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = > 1 > -- Works because of the extra expression c1 > select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) > having t1.c1 = 1 > -- Failed > select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having > t1.c1 = 1{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name
[ https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33131: -- Priority: Major (was: Minor) > Fix grouping sets with having clause can not resolve qualified col name > --- > > Key: SPARK-33131 > URL: https://issues.apache.org/jira/browse/SPARK-33131 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, > 3.1.0 >Reporter: ulysses you >Assignee: ulysses you >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > Grouping sets construct new aggregate lost the qualified name of grouping > expression. Here is a example: > {code:java} > -- Works resolved by ResolveReferences > select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = > 1 > -- Works because of the extra expression c1 > select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) > having t1.c1 = 1 > -- Failed > select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having > t1.c1 = 1{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
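A PySpark reproduction sketch of the case from the description, run through spark.sql; on affected versions the second statement fails to resolve the qualified name t1.c1.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("grouping-sets-having").getOrCreate()

# Works: HAVING references the unqualified column.
spark.sql(
    "select c1 from values (1) as t1(c1) "
    "group by grouping sets(t1.c1) having c1 = 1"
).show()

# Fails before the fix: HAVING references the qualified column t1.c1.
spark.sql(
    "select c1 from values (1) as t1(c1) "
    "group by grouping sets(t1.c1) having t1.c1 = 1"
).show()
{code}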
[jira] [Commented] (SPARK-33175) Detect duplicated mountPath and fail at Spark side
[ https://issues.apache.org/jira/browse/SPARK-33175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216114#comment-17216114 ] Apache Spark commented on SPARK-33175: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30084 > Detect duplicated mountPath and fail at Spark side > -- > > Key: SPARK-33175 > URL: https://issues.apache.org/jira/browse/SPARK-33175 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 2.4.7, 3.0.2, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > If there is a mountPath conflict, the pod is created and repeats the > following error messages and keep running. This should not keep running and > we had better fail at Spark side. > {code} > $ k get pod -l 'spark-role in (driver,executor)' > NAMEREADY STATUSRESTARTS AGE > tpcds 1/1 Running 0 14m > {code} > {code} > 20/10/18 05:09:26 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: ... > Message: Pod "tpcds-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/data1": must > be unique. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org