[jira] [Commented] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258109#comment-17258109 ] Zhang Jianguo commented on SPARK-33958: --- [~yumwang] Gauss and Oracle return 0, and that matches the traditional SQL standard better. My solution is as follows: add 0.0 to every FloatType and DoubleType return value. 0.0 + 0.0 = 0.0 -0.0 + 0.0 = 0.0 I can provide a pull request later. > spark sql DoubleType(0 * (-1)) return "-0.0" > - > > Key: SPARK-33958 > URL: https://issues.apache.org/jira/browse/SPARK-33958 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.5, 3.0.0 >Reporter: Zhang Jianguo >Priority: Minor > > spark version: 2.3.2 > {code:java} > create table test_zjg(a double); > insert into test_zjg values(-1.0); > select a*0 from test_zjg > {code} > After the select operation, *{color:#de350b}we get -0.0 where 0.0 is > expected:{color}* > ++ > |(a * CAST(0 AS DOUBLE))| > ++ > |-0.0 | > ++ > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
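A minimal, standalone Scala sketch of the proposed normalization (not the actual Spark patch): under IEEE 754 arithmetic, adding a positive zero turns -0.0 into 0.0, which is why adding 0.0 at every FloatType/DoubleType return would work. The object and helper names below are illustrative only.

{code:scala}
// Sketch only: adding 0.0 normalizes the sign of zero under IEEE 754.
object NegativeZeroDemo {
  // Hypothetical helper mirroring the proposed fix: d + 0.0 maps -0.0 to 0.0
  // and leaves every other double (NaN and infinities included) unchanged.
  def normalize(d: Double): Double = d + 0.0d

  def main(args: Array[String]): Unit = {
    val product = -1.0 * 0        // evaluates to -0.0
    println(product)              // -0.0
    println(normalize(product))   // 0.0
    println(normalize(0.0))       // 0.0 (unchanged)
  }
}
{code}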
[jira] [Updated] (SPARK-33985) Transform with clusterby/orderby/sortby
[ https://issues.apache.org/jira/browse/SPARK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-33985: -- Summary: Transform with clusterby/orderby/sortby (was: Support transform with clusterby/orderby/sortby) > Transform with clusterby/orderby/sortby > --- > > Key: SPARK-33985 > URL: https://issues.apache.org/jira/browse/SPARK-33985 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
Takeshi Yamamuro created SPARK-33988: Summary: Add an option to enable CBO in TPCDSQueryBenchmark Key: SPARK-33988 URL: https://issues.apache.org/jira/browse/SPARK-33988 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 3.2.0 Reporter: Takeshi Yamamuro This ticket aims at adding a new option {{--cbo}} to enable CBO in TPCDSQueryBenchmark. I think this option is useful for monitoring performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
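A hedged sketch of what such a flag would presumably toggle. The `spark.sql.cbo.*` keys below are Spark's real cost-based-optimizer switches; the `--cbo` argument parsing and its wiring into TPCDSQueryBenchmark are assumptions, not the actual patch.

{code:scala}
import org.apache.spark.sql.SparkSession

object CboToggleSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a bare `--cbo` flag on the command line enables CBO.
    val cboEnabled = args.contains("--cbo")

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tpcds-cbo-sketch")
      // Real Spark configuration keys governing the cost-based optimizer.
      .config("spark.sql.cbo.enabled", cboEnabled)
      .config("spark.sql.cbo.joinReorder.enabled", cboEnabled)
      .getOrCreate()

    println(s"CBO enabled: ${spark.conf.get("spark.sql.cbo.enabled")}")
    spark.stop()
  }
}
{code}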
[jira] [Commented] (SPARK-33977) Add doc for "'like any' and 'like all' operators"
[ https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258121#comment-17258121 ] Apache Spark commented on SPARK-33977: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31008 > Add doc for "'like any' and 'like all' operators" > - > > Key: SPARK-33977 > URL: https://issues.apache.org/jira/browse/SPARK-33977 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > Need to update the doc for the new LIKE predicates in the following file: > [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33977) Add doc for "'like any' and 'like all' operators"
[ https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33977: Assignee: (was: Apache Spark) > Add doc for "'like any' and 'like all' operators" > - > > Key: SPARK-33977 > URL: https://issues.apache.org/jira/browse/SPARK-33977 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > Need to update the doc for the new LIKE predicates in the following file: > [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33977) Add doc for "'like any' and 'like all' operators"
[ https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33977: Assignee: Apache Spark > Add doc for "'like any' and 'like all' operators" > - > > Key: SPARK-33977 > URL: https://issues.apache.org/jira/browse/SPARK-33977 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.1.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Major > > Need to update the doc for the new LIKE predicates in the following file: > [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,
[ https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258143#comment-17258143 ] Apache Spark commented on SPARK-33976: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31010 > Add a dedicated SQL document page for the TRANSFORM-related functionality, > -- > > Key: SPARK-33976 > URL: https://issues.apache.org/jira/browse/SPARK-33976 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add doc about transform > https://github.com/apache/spark/pull/30973#issuecomment-753715318 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
Maxim Gekk created SPARK-33987: -- Summary: v2 ALTER TABLE .. DROP PARTITION does not refresh cached table Key: SPARK-33987 URL: https://issues.apache.org/jira/browse/SPARK-33987 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The test below demonstrates the issue: {code:scala} test("SPARK-33950: refresh cache after partition dropping") { withNamespaceAndTable("ns", "tbl") { t => sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY (part)") sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") assert(!spark.catalog.isCached(t)) sql(s"CACHE TABLE $t") assert(spark.catalog.isCached(t)) QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 1))) sql(s"ALTER TABLE $t DROP PARTITION (part=0)") assert(spark.catalog.isCached(t)) QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) } } {code} The last check fails: {code} == Results == !== Correct Answer - 1 == == Spark Answer - 2 == !struct<> struct ![1,1] [0,0] ! [1,1] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
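Until a fix lands, a possible workaround (a sketch assuming a spark-shell session and the table name `ns.tbl` from the test) is to invalidate the cache explicitly after the DDL; `Catalog.refreshTable` is a real API that invalidates the cached entry so subsequent scans reload fresh data.

{code:scala}
// Workaround sketch: refresh the cached table manually after the DDL.
spark.sql("ALTER TABLE ns.tbl DROP PARTITION (part=0)")
spark.catalog.refreshTable("ns.tbl")      // invalidate the stale cache entry
spark.sql("SELECT * FROM ns.tbl").show()  // now returns only the row (1, 1)
{code}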
[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33988: Assignee: Apache Spark > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Apache Spark >Priority: Major > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258155#comment-17258155 ] Apache Spark commented on SPARK-33988: -- User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/31011 > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Priority: Major > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33989) Strip auto-generated cast when resolving UnresolvedAlias
ulysses you created SPARK-33989: --- Summary: Strip auto-generated cast when resolving UnresolvedAlias Key: SPARK-33989 URL: https://issues.apache.org/jira/browse/SPARK-33989 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: ulysses you During analysis we may implicitly introduce a Cast if a type cast is needed. That makes the assigned name unclear. Let's say we have the SQL `select id == null` where id is int type; then the output field name will be `(id = CAST(null as int))`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
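A hedged illustration (assuming a spark-shell session; the exact generated name varies with Spark version and column type) of how the Cast leaks into the column name, and the usual user-side workaround, an explicit alias:

{code:scala}
// The auto-generated Cast leaks into the column name:
spark.sql("SELECT id = NULL FROM range(1)").columns
// e.g. Array("(id = CAST(NULL AS BIGINT))")  -- approximate output

// Workaround: an explicit alias keeps the name clean.
spark.sql("SELECT (id = NULL) AS id_is_null FROM range(1)").columns
// Array("id_is_null")
{code}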
[jira] [Commented] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
[ https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258179#comment-17258179 ] Maxim Gekk commented on SPARK-33987: I am working on this > v2 ALTER TABLE .. DROP PARTITION does not refresh cached table > -- > > Key: SPARK-33987 > URL: https://issues.apache.org/jira/browse/SPARK-33987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test below demonstrates the issue: > {code:scala} > test("SPARK-33950: refresh cache after partition dropping") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > The last check fails: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
Maxim Gekk created SPARK-33990: -- Summary: v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition Key: SPARK-33990 URL: https://issues.apache.org/jira/browse/SPARK-33990 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The test fails: {code:scala} test("SPARK-X: do not return data from dropped partition") { withNamespaceAndTable("ns", "tbl") { t => sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY (part)") sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 1))) sql(s"ALTER TABLE $t DROP PARTITION (part=0)") QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) } } {code} on the last check with: {code} == Results == !== Correct Answer - 1 == == Spark Answer - 2 == !struct<> struct ![1,1] [0,0] ! [1,1] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache
[ https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258075#comment-17258075 ] Apache Spark commented on SPARK-33950: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31006 > ALTER TABLE .. DROP PARTITION doesn't refresh cache > --- > > Key: SPARK-33950 > URL: https://issues.apache.org/jira/browse/SPARK-33950 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Here is the example to reproduce the issue: > {code:sql} > spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED > BY (part0); > spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0; > spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1; > spark-sql> CACHE TABLE tbl1; > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0); > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33983) Update cloudpickle to v1.6.0
[ https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258104#comment-17258104 ] Apache Spark commented on SPARK-33983: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31007 > Update cloudpickle to v1.6.0 > > > Key: SPARK-33983 > URL: https://issues.apache.org/jira/browse/SPARK-33983 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > Cloudpickle 1.6.0 has been released. We should match the latest > version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33982) Sparksql does not support when the inserted table is a read table
[ https://issues.apache.org/jira/browse/SPARK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258052#comment-17258052 ] hao edited comment on SPARK-33982 at 1/4/21, 11:19 AM: --- I think Spark SQL should support INSERT OVERWRITE into a table that is also being read. was (Author: hao.duan): I think Spark SQL should support this. > Sparksql does not support when the inserted table is a read table > - > > Key: SPARK-33982 > URL: https://issues.apache.org/jira/browse/SPARK-33982 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: hao >Priority: Major > > When the table being inserted into is also being read, Spark SQL will throw an error: > > org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is > also being read from.; -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not
[ https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258074#comment-17258074 ] Apache Spark commented on SPARK-33949: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/31005 > Make approx_count_distinct result consistent whether Optimize rule exists or > not > > > Key: SPARK-33949 > URL: https://issues.apache.org/jira/browse/SPARK-33949 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > This code will fail because the foldable value is not folded; we should keep > the result consistent whether the Optimize rule exists or not. > {code:java} > val excludedRules = Seq(ConstantFolding, > ReorderAssociativeOperator).map(_.ruleName) > withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> > excludedRules.mkString(",")) { > sql("select approx_count_distinct(1, 0.01 + 0.02)") > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
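A hedged, self-contained version of the repro above. The `spark.sql.optimizer.excludedRules` key and the fully qualified rule names are real; running it as a plain application instead of inside Spark's test harness (`withSQLConf`) is the only adaptation.

{code:scala}
import org.apache.spark.sql.SparkSession

object ApproxCountDistinctRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("approx-count-distinct-repro")
      .getOrCreate()

    // With these rules excluded, `0.01 + 0.02` reaches
    // approx_count_distinct unfolded and the query fails,
    // although the same query succeeds with the default rule set.
    spark.conf.set("spark.sql.optimizer.excludedRules",
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding," +
      "org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator")

    spark.sql("select approx_count_distinct(1, 0.01 + 0.02)").show()
    spark.stop()
  }
}
{code}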
[jira] [Assigned] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not
[ https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33949: Assignee: Apache Spark > Make approx_count_distinct result consistent whether Optimize rule exists or > not > > > Key: SPARK-33949 > URL: https://issues.apache.org/jira/browse/SPARK-33949 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: Apache Spark >Priority: Minor > > This code will fail because the foldable value is not folded; we should keep > the result consistent whether the Optimize rule exists or not. > {code:java} > val excludedRules = Seq(ConstantFolding, > ReorderAssociativeOperator).map(_.ruleName) > withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> > excludedRules.mkString(",")) { > sql("select approx_count_distinct(1, 0.01 + 0.02)") > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not
[ https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33949: Assignee: (was: Apache Spark) > Make approx_count_distinct result consistent whether Optimize rule exists or > not > > > Key: SPARK-33949 > URL: https://issues.apache.org/jira/browse/SPARK-33949 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > This code will fail because the foldable value is not folded; we should keep > the result consistent whether the Optimize rule exists or not. > {code:java} > val excludedRules = Seq(ConstantFolding, > ReorderAssociativeOperator).map(_.ruleName) > withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> > excludedRules.mkString(",")) { > sql("select approx_count_distinct(1, 0.01 + 0.02)") > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0
[ https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33983: Assignee: (was: Apache Spark) > Update cloudpickle to v1.6.0 > > > Key: SPARK-33983 > URL: https://issues.apache.org/jira/browse/SPARK-33983 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > Cloudpickle 1.6.0 has been released. We should match the latest > version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0
[ https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33983: Assignee: Apache Spark > Update cloudpickle to v1.6.0 > > > Key: SPARK-33983 > URL: https://issues.apache.org/jira/browse/SPARK-33983 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Cloudpickle 1.6.0 has been released. We should match the latest > version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258109#comment-17258109 ] Zhang Jianguo edited comment on SPARK-33958 at 1/4/21, 9:50 AM: [~yumwang] Gauss and Oracle return 0, and that matches the SQL standard better. My solution is as follows: add 0.0 to every FloatType and DoubleType return value. 0.0 + 0.0 = 0.0 -0.0 + 0.0 = 0.0 I can provide a pull request later. was (Author: alberyzjg): [~yumwang] Gauss and Oracle return 0. And it looks mathe troditional SQL standard better. My solution as following, plus 0.0 at every return of FloatType and DoubleType. 0.0 + 0.0 = 0.0 -0.0 + 0.0 = 0.0 I can provide pull request later. > spark sql DoubleType(0 * (-1)) return "-0.0" > - > > Key: SPARK-33958 > URL: https://issues.apache.org/jira/browse/SPARK-33958 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.5, 3.0.0 >Reporter: Zhang Jianguo >Priority: Minor > > spark version: 2.3.2 > {code:java} > create table test_zjg(a double); > insert into test_zjg values(-1.0); > select a*0 from test_zjg > {code} > After the select operation, *{color:#de350b}we get -0.0 where 0.0 is > expected:{color}* > ++ > |(a * CAST(0 AS DOUBLE))| > ++ > |-0.0 | > ++ > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33984) Upgrade to Py4J 0.10.9.1
Hyukjin Kwon created SPARK-33984: Summary: Upgrade to Py4J 0.10.9.1 Key: SPARK-33984 URL: https://issues.apache.org/jira/browse/SPARK-33984 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Hyukjin Kwon Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33985) Support transform with clusterby/orderby/sortby
angerszhu created SPARK-33985: - Summary: Support transform with clusterby/orderby/sortby Key: SPARK-33985 URL: https://issues.apache.org/jira/browse/SPARK-33985 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
[ https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258183#comment-17258183 ] Maxim Gekk commented on SPARK-33990: I am working on the issue. > v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition > > > Key: SPARK-33990 > URL: https://issues.apache.org/jira/browse/SPARK-33990 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test fails: > {code:scala} > test("SPARK-X: do not return data from dropped partition") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > on the last check with: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33005) Kubernetes GA Preparation
[ https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258047#comment-17258047 ] Dongjoon Hyun commented on SPARK-33005: --- Sure, [~hyukjin.kwon]. > Kubernetes GA Preparation > - > > Key: SPARK-33005 > URL: https://issues.apache.org/jira/browse/SPARK-33005 > Project: Spark > Issue Type: Umbrella > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33984) Upgrade to Py4J 0.10.9.1
[ https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258134#comment-17258134 ] Apache Spark commented on SPARK-33984: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31009 > Upgrade to Py4J 0.10.9.1 > > > Key: SPARK-33984 > URL: https://issues.apache.org/jira/browse/SPARK-33984 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1
[ https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33984: Assignee: (was: Apache Spark) > Upgrade to Py4J 0.10.9.1 > > > Key: SPARK-33984 > URL: https://issues.apache.org/jira/browse/SPARK-33984 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1
[ https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33984: Assignee: Apache Spark > Upgrade to Py4J 0.10.9.1 > > > Key: SPARK-33984 > URL: https://issues.apache.org/jira/browse/SPARK-33984 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33984) Upgrade to Py4J 0.10.9.1
[ https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258135#comment-17258135 ] Apache Spark commented on SPARK-33984: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31009 > Upgrade to Py4J 0.10.9.1 > > > Key: SPARK-33984 > URL: https://issues.apache.org/jira/browse/SPARK-33984 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,
[ https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33976: Assignee: Apache Spark > Add a dedicated SQL document page for the TRANSFORM-related functionality, > -- > > Key: SPARK-33976 > URL: https://issues.apache.org/jira/browse/SPARK-33976 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > Add doc about transform > https://github.com/apache/spark/pull/30973#issuecomment-753715318 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33986) Spark handle always return LOST status in standalone cluster mode with Spark launcher
ZhongyuWang created SPARK-33986: --- Summary: Spark handle always return LOST status in standalone cluster mode with Spark launcher Key: SPARK-33986 URL: https://issues.apache.org/jira/browse/SPARK-33986 Project: Spark Issue Type: Question Components: Spark Submit Affects Versions: 2.4.4 Environment: apache hadoop 2.6.5 apache spark 2.4.4 Reporter: ZhongyuWang I can use it to submit a Spark app successfully in standalone client / yarn client / yarn cluster mode and get the correct app status, but when I submit a Spark app in standalone cluster mode, the Spark handle always returns LOST status (once) while the app runs stably until FINISHED (the handle doesn't get any state change information). I noticed that when I submitted the app from code, after a while the SparkSubmit process suddenly stopped. The SparkSubmit log (launcher redirect log) doesn't have any useful information. This is my pseudo code: {code:java} SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() { @Override public void stateChanged(SparkAppHandle handle) { stateChangedHandle(handle.getAppId(), jobId, code, execId, handle.getState(), driverInfo, request, infoLog, errorLog); } @Override public void infoChanged(SparkAppHandle handle) { stateChangedHandle(handle.getAppId(), jobId, code, execId, handle.getState(), driverInfo, request, infoLog, errorLog); } });{code} Any idea? Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
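A hedged sketch of a more defensive pattern (app resource, main class, and master URL are placeholders): poll the handle until it reaches a final state instead of relying only on listener callbacks. The handle reports LOST when the launcher's connection to the child process drops before a final state, which is consistent with the behavior described above.

{code:scala}
import org.apache.spark.launcher.SparkLauncher

object LauncherPollSketch {
  def main(args: Array[String]): Unit = {
    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")   // placeholder
      .setMainClass("com.example.Main")     // placeholder
      .setMaster("spark://master:7077")     // placeholder
      .setDeployMode("cluster")
      .startApplication()                   // no listeners: poll instead

    // Poll until the handle reaches a final state (FINISHED, FAILED,
    // KILLED, or LOST), logging each transition we observe.
    while (!handle.getState.isFinal) {
      Thread.sleep(1000)
      println(s"appId=${handle.getAppId} state=${handle.getState}")
    }
    println(s"final state: ${handle.getState}")
  }
}
{code}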
[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,
[ https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33976: Assignee: (was: Apache Spark) > Add a dedicated SQL document page for the TRANSFORM-related functionality, > -- > > Key: SPARK-33976 > URL: https://issues.apache.org/jira/browse/SPARK-33976 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add doc about transform > https://github.com/apache/spark/pull/30973#issuecomment-753715318 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33986) Spark handle always return LOST status in standalone cluster mode with Spark launcher
[ https://issues.apache.org/jira/browse/SPARK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhongyuWang updated SPARK-33986: Description: I can use it to submit a Spark app successfully in standalone client / yarn client / yarn cluster mode and get the correct app status, but when I submit a Spark app in standalone cluster mode, the Spark handle always returns LOST status (once) while the app runs stably until FINISHED (the handle doesn't get any state change information). I noticed that when I submitted the app from code, after a while the SparkSubmit process suddenly stopped. The SparkSubmit log (launcher redirect log) doesn't have any useful information. This is my pseudo code: {code:java} SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() { @Override public void stateChanged(SparkAppHandle handle) { stateChangedHandle(handle.getAppId(), jobId, code, execId, handle.getState(), driverInfo, request, infoLog, errorLog); } @Override public void infoChanged(SparkAppHandle handle) { stateChangedHandle(handle.getAppId(), jobId, code, execId, handle.getState(), driverInfo, request, infoLog, errorLog); } });{code} Any idea? Thanks was: I can use it to submit spark app successfully in standalone client/yarn client/yarn cluster mode,and get correct app status, but when i submit spark app in standalone cluster mode, Spark handle always return LOST status(once) and app running stablely until FINISHED( handle wasn't get any state change infomation). I noticed when I submited app from code, after a while, the SparkSubmit process was suddenly stopped. I checked sparkSubmit log(launcher redirect log) doesn't have any useful information. this is my pseudo code, {code:java} SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() { @Override public void stateChanged(SparkAppHandle handle) { stateChangedHandle(handle.getAppId(), jobId, code, execId, handle.getState(), driverInfo, request, infoLog, errorLog); } @Override public void infoChanged(SparkAppHandle handle) { stateChangedHandle(handle.getAppId(), jobId, code, execId, handle.getState(), driverInfo, request, infoLog, errorLog); } });{code} any idea ? thx > Spark handle always return LOST status in standalone cluster mode with Spark > launcher > - > > Key: SPARK-33986 > URL: https://issues.apache.org/jira/browse/SPARK-33986 > Project: Spark > Issue Type: Question > Components: Spark Submit >Affects Versions: 2.4.4 > Environment: apache hadoop 2.6.5 > apache spark 2.4.4 >Reporter: ZhongyuWang >Priority: Major > > I can use it to submit a Spark app successfully in standalone client / yarn > client / yarn cluster mode and get the correct app status, but when I submit a Spark > app in standalone cluster mode, the Spark handle always returns LOST status (once) > while the app runs stably until FINISHED (the handle doesn't get any state change > information). I noticed that when I submitted the app from code, after a while the > SparkSubmit process suddenly stopped. The SparkSubmit log (launcher > redirect log) doesn't have any useful information.
> This is my pseudo code: > {code:java} > SparkAppHandle handle = launcher.startApplication(new > SparkAppHandle.Listener() { > @Override > public void stateChanged(SparkAppHandle handle) { > stateChangedHandle(handle.getAppId(), jobId, code, execId, > handle.getState(), driverInfo, request, infoLog, errorLog); > } > @Override > public void infoChanged(SparkAppHandle handle) { > stateChangedHandle(handle.getAppId(), jobId, code, execId, > handle.getState(), driverInfo, request, infoLog, errorLog); > } > });{code} > Any idea? Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,
[ https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258141#comment-17258141 ] Apache Spark commented on SPARK-33976: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31010 > Add a dedicated SQL document page for the TRANSFORM-related functionality, > -- > > Key: SPARK-33976 > URL: https://issues.apache.org/jira/browse/SPARK-33976 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add doc about transform > https://github.com/apache/spark/pull/30973#issuecomment-753715318 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33985) Transform with clusterby/orderby/sortby
[ https://issues.apache.org/jira/browse/SPARK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-33985: -- Description: Need to add a UT to make sure the data is the same as Hive's > Transform with clusterby/orderby/sortby > --- > > Key: SPARK-33985 > URL: https://issues.apache.org/jira/browse/SPARK-33985 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Need to add a UT to make sure the data is the same as Hive's -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33983) Update cloudpickle to v1.6.0
Hyukjin Kwon created SPARK-33983: Summary: Update cloudpickle to v1.6.0 Key: SPARK-33983 URL: https://issues.apache.org/jira/browse/SPARK-33983 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Hyukjin Kwon Cloudpickle 1.6.0 has been released. We should match the latest version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33988: Assignee: (was: Apache Spark) > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Priority: Major > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33005) Kubernetes GA Preparation
[ https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33005. --- Resolution: Done > Kubernetes GA Preparation > - > > Key: SPARK-33005 > URL: https://issues.apache.org/jira/browse/SPARK-33005 > Project: Spark > Issue Type: Umbrella > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33711) Race condition in Spark k8s Pod lifecycle manager that leads to shutdowns
[ https://issues.apache.org/jira/browse/SPARK-33711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33711: -- Parent: (was: SPARK-33005) Issue Type: Bug (was: Sub-task) > Race condition in Spark k8s Pod lifecycle manager that leads to shutdowns > -- > > Key: SPARK-33711 > URL: https://issues.apache.org/jira/browse/SPARK-33711 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.4, 2.4.7, 3.0.0, 3.1.0, 3.2.0 >Reporter: Attila Zsolt Piros >Priority: Major > > Watching a POD (ExecutorPodsWatchSnapshotSource) reports single POD > changes, which can wrongly lead the executor POD lifecycle manager to detect > missing PODs (PODs known to the scheduler backend but missing from POD snapshots). > A key indicator of this is seeing this log msg: > "The executor with ID [some_id] was not found in the cluster but we didn't > get a reason why. Marking the executor as failed. The executor may have been > deleted but the driver missed the deletion event." > So one of the problems is running the missing-POD detection even when a single > pod changes, without having a full consistent snapshot of all the PODs > (see ExecutorPodsPollingSnapshotSource). The other could be a race between > the executor POD lifecycle manager and the scheduler backend. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33982) Sparksql does not support when the inserted table is a read table
[ https://issues.apache.org/jira/browse/SPARK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258052#comment-17258052 ] hao commented on SPARK-33982: - I think Spark SQL should support this. > Sparksql does not support when the inserted table is a read table > - > > Key: SPARK-33982 > URL: https://issues.apache.org/jira/browse/SPARK-33982 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: hao >Priority: Major > > When the table being inserted into is also being read, Spark SQL will throw an error: > > org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is > also being read from.; -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33982) Sparksql does not support when the inserted table is a read table
hao created SPARK-33982: --- Summary: Sparksql does not support when the inserted table is a read table Key: SPARK-33982 URL: https://issues.apache.org/jira/browse/SPARK-33982 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: hao When the table being inserted into is also being read, Spark SQL will throw an error: > org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.; -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
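A hedged workaround sketch (assuming a spark-shell session; the table and path names are hypothetical): materialize the result to a staging location first so the read and the overwrite never target the same path within one plan, then overwrite the original table from the staged copy.

{code:scala}
// Stage the derived data somewhere else first...
val df = spark.table("src_tbl").filter("col > 0")
df.write.mode("overwrite").parquet("/tmp/src_tbl_staging")

// ...then overwrite the original table from the staged copy.
spark.read.parquet("/tmp/src_tbl_staging")
  .write.mode("overwrite").insertInto("src_tbl")
{code}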
[jira] [Assigned] (SPARK-33965) CACHE TABLE does not support `spark_catalog` in Hive table names
[ https://issues.apache.org/jira/browse/SPARK-33965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33965: --- Assignee: Maxim Gekk > CACHE TABLE does not support `spark_catalog` in Hive table names > > > Key: SPARK-33965 > URL: https://issues.apache.org/jira/browse/SPARK-33965 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > The test fails: > {code:scala} > test("SPARK-X: cache table in spark_catalog") { > withNamespace("spark_catalog.ns") { > sql("CREATE NAMESPACE spark_catalog.ns") > val t = "spark_catalog.ns.tbl" > withTable(t) { > sql(s"CREATE TABLE $t (col int)") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > } > } > } > {code} > with the exception: > {code:java} > [info] - SPARK-X: cache table in spark_catalog *** FAILED *** (278 > milliseconds) > [info] org.apache.spark.sql.AnalysisException: spark_catalog.ns.tbl is not > a valid TableIdentifier as it has more than 2 name parts. > [info] at > org.apache.spark.sql.connector.catalog.CatalogV2Implicits$MultipartIdentifierHelper.asTableIdentifier(CatalogV2Implicits.scala:130) > [info] at > org.apache.spark.sql.hive.test.TestHiveQueryExecution.$anonfun$analyzed$1(TestHive.scala:600) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33965) CACHE TABLE does not support `spark_catalog` in Hive table names
[ https://issues.apache.org/jira/browse/SPARK-33965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33965. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30997 [https://github.com/apache/spark/pull/30997] > CACHE TABLE does not support `spark_catalog` in Hive table names > > > Key: SPARK-33965 > URL: https://issues.apache.org/jira/browse/SPARK-33965 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > The test fails: > {code:scala} > test("SPARK-X: cache table in spark_catalog") { > withNamespace("spark_catalog.ns") { > sql("CREATE NAMESPACE spark_catalog.ns") > val t = "spark_catalog.ns.tbl" > withTable(t) { > sql(s"CREATE TABLE $t (col int)") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > } > } > } > {code} > with the exception: > {code:java} > [info] - SPARK-X: cache table in spark_catalog *** FAILED *** (278 > milliseconds) > [info] org.apache.spark.sql.AnalysisException: spark_catalog.ns.tbl is not > a valid TableIdentifier as it has more than 2 name parts. > [info] at > org.apache.spark.sql.connector.catalog.CatalogV2Implicits$MultipartIdentifierHelper.asTableIdentifier(CatalogV2Implicits.scala:130) > [info] at > org.apache.spark.sql.hive.test.TestHiveQueryExecution.$anonfun$analyzed$1(TestHive.scala:600) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33005) Kubernetes GA Preparation
[ https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258051#comment-17258051 ] Hyukjin Kwon commented on SPARK-33005: -- Awesome! > Kubernetes GA Preparation > - > > Key: SPARK-33005 > URL: https://issues.apache.org/jira/browse/SPARK-33005 > Project: Spark > Issue Type: Umbrella > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33005) Kubernetes GA Preparation
[ https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33005: - Fix Version/s: 3.1.0 > Kubernetes GA Preparation > - > > Key: SPARK-33005 > URL: https://issues.apache.org/jira/browse/SPARK-33005 > Project: Spark > Issue Type: Umbrella > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33978) Support ZSTD compression in ORC data source
[ https://issues.apache.org/jira/browse/SPARK-33978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33978: - Assignee: Dongjoon Hyun > Support ZSTD compression in ORC data source > --- > > Key: SPARK-33978 > URL: https://issues.apache.org/jira/browse/SPARK-33978 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > h3. What changes were proposed in this pull request? > This PR aims to support ZSTD compression in ORC data source. > h3. Why are the changes needed? > Apache ORC 1.6 supports ZSTD compression to generate more compact files and > save the storage cost. > *BEFORE* > {code:java} > scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") > java.lang.IllegalArgumentException: Codec [zstd] is not available. Available > codecs are uncompressed, lzo, snappy, zlib, none. {code} > *AFTER* > {code:java} > scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") > {code} > {code:java} > $ orc-tools meta /tmp/zstd > Processing data file > file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc > [length: 230] > Structure for > file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc > File Version: 0.12 with ORC_14 > Rows: 1 > Compression: ZSTD > Compression size: 262144 > Calendar: Julian/Gregorian > Type: struct > Stripe Statistics: > Stripe 1: > Column 0: count: 1 hasNull: false > Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9 > File Statistics: > Column 0: count: 1 hasNull: false > Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9 > Stripes: > Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35 > Stream: column 0 section ROW_INDEX start: 3 length 11 > Stream: column 1 section ROW_INDEX start: 14 length 24 > Stream: column 1 section DATA start: 38 length 6 > Encoding column 0: DIRECT > Encoding column 1: DIRECT_V2 > File length: 230 bytes > Padding length: 0 bytes > Padding ratio: 0% > User Metadata: > org.apache.spark.version=3.2.0{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33978) Support ZSTD compression in ORC data source
[ https://issues.apache.org/jira/browse/SPARK-33978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33978. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31002 [https://github.com/apache/spark/pull/31002] > Support ZSTD compression in ORC data source > --- > > Key: SPARK-33978 > URL: https://issues.apache.org/jira/browse/SPARK-33978 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > > h3. What changes were proposed in this pull request? > This PR aims to support ZSTD compression in ORC data source. > h3. Why are the changes needed? > Apache ORC 1.6 supports ZSTD compression to generate more compact files and > save the storage cost. > *BEFORE* > {code:java} > scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") > java.lang.IllegalArgumentException: Codec [zstd] is not available. Available > codecs are uncompressed, lzo, snappy, zlib, none. {code} > *AFTER* > {code:java} > scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") > {code} > {code:java} > $ orc-tools meta /tmp/zstd > Processing data file > file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc > [length: 230] > Structure for > file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc > File Version: 0.12 with ORC_14 > Rows: 1 > Compression: ZSTD > Compression size: 262144 > Calendar: Julian/Gregorian > Type: struct > Stripe Statistics: > Stripe 1: > Column 0: count: 1 hasNull: false > Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9 > File Statistics: > Column 0: count: 1 hasNull: false > Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9 > Stripes: > Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35 > Stream: column 0 section ROW_INDEX start: 3 length 11 > Stream: column 1 section ROW_INDEX start: 14 length 24 > Stream: column 1 section DATA start: 38 length 6 > Encoding column 0: DIRECT > Encoding column 1: DIRECT_V2 > File length: 230 bytes > Padding length: 0 bytes > Padding ratio: 0% > User Metadata: > org.apache.spark.version=3.2.0{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33276) Fix K8s IT Flakiness
[ https://issues.apache.org/jira/browse/SPARK-33276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33276: -- Parent: (was: SPARK-33005) Issue Type: Bug (was: Sub-task) > Fix K8s IT Flakiness > > > Key: SPARK-33276 > URL: https://issues.apache.org/jira/browse/SPARK-33276 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.0.1, 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > The following two consecutive runs are using the same git hash, > a744fea3be12f1a53ab553040b95da730210bc88 . > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/646/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/647/ > However, the second one fails while the first one succeeds. > {code} > KubernetesSuite: > - Run SparkPi with no resources *** FAILED *** > The code passed to eventually never returned normally. Attempted 190 times > over 3.00269949337 minutes. Last failure message: false was not true. > (KubernetesSuite.scala:383) > - Run SparkPi with a very long application name. > - Use SparkLauncher.NO_RESOURCE > - Run SparkPi with a master URL without a scheme. > - Run SparkPi with an argument. > - Run SparkPi with custom labels, annotations, and environment variables. > - All pods have the same service account by default > - Run extraJVMOptions check on driver > - Run SparkRemoteFileTest using a remote data file > - Run SparkPi with env and mount secrets. > - Run PySpark on simple pi.py example > - Run PySpark to test a pyfiles example > - Run PySpark with memory customization > - Run in client mode. > - Start pod creation from template > - PVs with local storage > - Launcher client dependencies > - Test basic decommissioning > - Test basic decommissioning with shuffle cleanup *** FAILED *** > The code passed to eventually never returned normally. Attempted 184 times > over 3.017213349366 minutes. Last failure message: "++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' 3 == 3 ']' > ++ python3 -V > + pyv3='Python 3.7.3' > + export PYTHON_VERSION=3.7.3 > + PYTHON_VERSION=3.7.3 > + export PYSPARK_PYTHON=python3 > + PYSPARK_PYTHON=python3 > + export PYSPARK_DRIVER_PYTHON=python3 > + PYSPARK_DRIVER_PYTHON=python3 > + '[' -n '' ']' > + '[' -z ']' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner > local:///opt/spark/tests/decommissioning_cleanup.py > 20/10/28 19:47:28 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > Starting decom test > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 20/10/28 19:47:29 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT > 20/10/28 19:47:29 INFO ResourceUtils: > == > 20/10/28 19:47:29 INFO ResourceUtils: No custom resources configured for > spark.driver. > 20/10/28 19:47:29 INFO ResourceUtils: > == > 20/10/28 19:47:29 INFO SparkContext: Submitted application: DecomTest > 20/10/28 19:47:29 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: ), task resources: > Map(cpus -> name: cpus, amount: 1.0) > 20/10/28 19:47:29 INFO ResourceProfile: Limiting resource is cpus at 1 > tasks per executor > 20/10/28 19:47:29 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 20/10/28 19:47:29 INFO SecurityManager: Changing view acls to: 185,jenkins > 20/10/28 19:47:29 INFO SecurityManager: Changing modify acls to: 185,jenkins > 20/10/28 19:47:29 INFO SecurityManager: Changing view acls groups to: > 20/10/28 19:47:29 INFO SecurityManager: Changing modify acls groups to: > 20/10/28 19:47:29 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view
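For context, the "eventually never returned normally" failures above come from ScalaTest's polling helper; a self-contained sketch of the same pattern (the flag and the 5-second delay are stand-ins for the suite's real pod-status probe):
{code:scala}
import org.scalatest.Assertions._
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Stand-in condition that flips to true after ~5 s.
var podRunning = false
new Thread(() => { Thread.sleep(5000); podRunning = true }).start()

// Retry the block every second for up to 3 minutes; the flaky runs above are
// the case where the budget is exhausted before the condition holds.
eventually(timeout(3.minutes), interval(1.second)) {
  assert(podRunning, "driver pod never reached the expected state")
}
{code}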
[jira] [Updated] (SPARK-28895) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/SPARK-28895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28895: -- Parent: (was: SPARK-33005) Issue Type: Bug (was: Sub-task) > Spark client process is unable to upload jars to hdfs while using ConfigMap > not HADOOP_CONF_DIR > --- > > Key: SPARK-28895 > URL: https://issues.apache.org/jira/browse/SPARK-28895 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.0.0 >Reporter: Kent Yao >Priority: Major > > The *BasicDriverFeatureStep* for Spark on Kubernetes will upload the > files/jars specified by --files/--jars to a Hadoop-compatible file system > configured by spark.kubernetes.file.upload.path. With HADOOP_CONF_DIR set, > the spark-submit process can resolve that file system, but > spark.kubernetes.hadoop.configMapName is only mounted on the pods and is > never applied back to the client process, so the upload fails. > > ||Configuration||Result|| > |HADOOP_CONF_DIR=/path/to/etc/hadoop|OK| > |spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir |FAILED| > > {code:java} > Kent@KentsMacBookPro > ~/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3 bin/spark-submit > --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf > --jars > /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar > --conf spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab --conf > spark.kerberos.principal=kyuubi/d...@hadoop.hz.netease.com --conf > spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf --name hehe --deploy-mode > cluster --class org.apache.spark.examples.HdfsTest > local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar > hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl > Listening for transport dt_socket at address: 50014 > # spark.master=k8s://https://10.120.238.100:7443 > 19/08/27 17:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 19/08/27 17:21:07 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > Listening for transport dt_socket at address: 50014 > Exception in thread "main" org.apache.spark.SparkException: Uploading file > /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar > failed... 
> at > org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:287) > at > org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245) > at > org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatur# > spark.master=k8s://https://10.120.238.100:7443 > eStep.scala:165) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163) > at > org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:89) > at > org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:101) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10(KubernetesClientApplication.scala:236) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10$adapted(KubernetesClientApplication.scala:229) >
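A quick way to see the client-side gap: the submitting JVM must resolve the {{hz-cluster10}} nameservice itself before it can upload anything, and a ConfigMap mounted only in the pods never reaches that JVM. A minimal sketch (hypothetical local paths; assumes the Hadoop client jars are on the classpath):
{code:scala}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Load the nameservice mapping from local files; this is effectively what
// exporting HADOOP_CONF_DIR does for spark-submit.
val conf = new Configuration()
conf.addResource(new Path("/path/to/etc/hadoop/core-site.xml"))
conf.addResource(new Path("/path/to/etc/hadoop/hdfs-site.xml"))

// Fails with an unknown-host style error if the mapping is missing.
val fs = FileSystem.get(new URI("hdfs://hz-cluster10"), conf)
println(fs.exists(new Path("/user/kyuubi/udf")))
{code}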
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33349: -- Parent: (was: SPARK-33005) Issue Type: Bug (was: Sub-task) > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2, 3.1.0 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
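One way to keep a watch alive across the "too old resource version" close, sketched against the fabric8 4.x {{Watcher}} API that Spark 3.0 ships; the restart logic here is hypothetical and is not the actual SPARK-33349 patch:
{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException, Watcher}

val client = new DefaultKubernetesClient()

// Re-establish the pod watch whenever the API server closes it abnormally,
// instead of letting the snapshot source go silent as described above.
def startWatch(): Unit = {
  client.pods().inNamespace("kafka2hdfs").watch(new Watcher[Pod] {
    override def eventReceived(action: Watcher.Action, pod: Pod): Unit =
      println(s"$action ${pod.getMetadata.getName}")
    override def onClose(cause: KubernetesClientException): Unit =
      if (cause != null) {  // null means an orderly client.close()
        println(s"watch closed: ${cause.getMessage}; restarting")
        startWatch()
      }
  })
}
startWatch()
{code}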
[jira] [Updated] (SPARK-28992) Support update dependencies from hdfs when task run on executor pods
[ https://issues.apache.org/jira/browse/SPARK-28992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28992: -- Parent: (was: SPARK-33005) Issue Type: Improvement (was: Sub-task) > Support update dependencies from hdfs when task run on executor pods > > > Key: SPARK-28992 > URL: https://issues.apache.org/jira/browse/SPARK-28992 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Major > > Here is a case: > {code:java} > bin/spark-submit --class com.github.ehiggs.spark.terasort.TeraSort > hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar > hdfs://hz-cluster10/user/kyuubi/terasort/1000g > hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1 > {code} > Spark supports add jar logic and application-jar from hdfs - - > [http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit] > Take spark on yarn for example, it creates a __spark_hadoop_conf__.xml file > and upload the hadoop distribute cache, the executor processes can use this > to identify where their dependencies located. > But on k8s, i tried and failed to update dependencies. > {code:java} > 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 > (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted > due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent > failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): > java.lang.IllegalArgumentException: java.net.UnknownHostException: > hz-cluster10 > 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 > (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted > due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent > failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): > java.lang.IllegalArgumentException: java.net.UnknownHostException: > hz-cluster10 at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378) > at > org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:678) at > org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:619) at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) at > org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at > org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at > org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at > org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881) at > org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737) at > org.apache.spark.util.Utils$.fetchFile(Utils.scala:522) at > org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869) > at > org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860) > at > scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792) > at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) 
at > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at > scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
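For comparison, one workaround is to ship the HA nameservice mapping through {{spark.hadoop.*}} properties so the executors can resolve {{hz-cluster10}} without a distributed Hadoop conf; a sketch with hypothetical NameNode hosts:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hdfs-ha-resolution")
  // Everything below lands in the executors' Hadoop Configuration.
  .config("spark.hadoop.dfs.nameservices", "hz-cluster10")
  .config("spark.hadoop.dfs.ha.namenodes.hz-cluster10", "nn1,nn2")
  .config("spark.hadoop.dfs.namenode.rpc-address.hz-cluster10.nn1", "namenode1.example.com:8020")
  .config("spark.hadoop.dfs.namenode.rpc-address.hz-cluster10.nn2", "namenode2.example.com:8020")
  .config("spark.hadoop.dfs.client.failover.proxy.provider.hz-cluster10",
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
  .getOrCreate()
{code}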
[jira] [Commented] (SPARK-33952) Python-friendly dtypes for pyspark dataframes
[ https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258046#comment-17258046 ] Marc de Lignie commented on SPARK-33952: @[~hyukjin.kwon] Thanks for asking. When you write a pyspark UDF or get a pyspark DataFrame returned after a collect(), it is much easier to recognize that a column datatype is "[Row(x:[Row(x1:string, x2:string)], y:string, z:string)]" rather than "array<struct<x:array<struct<x1:string,x2:string>>,y:string,z:string>>". Of course, this remains a matter of taste. Also, the original dtypes in terms of array, struct, map remain useful when applying push-down functions, for which the documentation and naming use these terms. > Python-friendly dtypes for pyspark dataframes > - > > Key: SPARK-33952 > URL: https://issues.apache.org/jira/browse/SPARK-33952 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Marc de Lignie >Priority: Minor > > The pyspark.sql.DataFrame.dtypes attribute contains string representations of > the column datatypes in terms of JVM datatypes. However, for a python user it > is a significant mental step to translate these to the corresponding python > types encountered in UDFs and collected dataframes. This holds in particular > for nested composite datatypes (array, map and struct). It is proposed to > provide python-friendly dtypes in pyspark (as an addition, not a replacement) > in which array<>, map<> and struct<> are translated to [], {} and Row(). > Sample code, including tests, is available as [gist on > github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More > explanation is provided at: > [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html] > If this proposal finds sufficient support, I can provide a PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
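For reference, the JVM-style dtype string quoted in the comment above is easy to reproduce from the Scala side; a minimal sketch building the same nested shape:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("dtypes").getOrCreate()

// A column of array<struct<x:array<struct<x1,x2>>, y, z>>, all strings.
val df = spark.sql(
  "SELECT array(named_struct('x', array(named_struct('x1', 'a', 'x2', 'b')), " +
  "'y', 'c', 'z', 'd')) AS col")

df.dtypes.foreach(println)
// (col,array<struct<x:array<struct<x1:string,x2:string>>,y:string,z:string>>)
{code}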
[jira] [Created] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
Kent Yao created SPARK-33992: Summary: resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer Key: SPARK-33992 URL: https://issues.apache.org/jira/browse/SPARK-33992 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.1.0 Reporter: Kent Yao PaddingAndLengthCheckForCharVarchar could fail a query when it invokes resolveOperatorsUpWithNewOutput, with: {code:java} [info] - char/varchar resolution in sub query *** FAILED *** (367 milliseconds) [info] java.lang.RuntimeException: This method should not be called in the analyzer [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) [info] at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
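The wrap named in the summary, in rough form; a sketch only, assuming the 3.1 catalyst signatures, with a no-op rule just to exercise the code path:
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{AnalysisHelper, LogicalPlan}

val spark = SparkSession.builder().master("local[1]").appName("wrap").getOrCreate()
val plan: LogicalPlan = spark.range(1).queryExecution.analyzed

// Running the rewrite inside allowInvokingTransformsInAnalyzer lifts the
// "should not be called in the analyzer" guard seen in the stack trace above.
val rewritten = AnalysisHelper.allowInvokingTransformsInAnalyzer {
  plan.resolveOperatorsUpWithNewOutput {
    case p if false => (p, Nil)  // no-op rewrite rule
  }
}
{code}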
[jira] [Commented] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258192#comment-17258192 ] Apache Spark commented on SPARK-33992: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/31013 > resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer > - > > Key: SPARK-33992 > URL: https://issues.apache.org/jira/browse/SPARK-33992 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Minor > > PaddingAndLengthCheckForCharVarchar could fail query when > resolveOperatorsUpWithNewOutput > with > {code:java} > [info] - char/varchar resolution in sub query *** FAILED *** (367 > milliseconds) > [info] java.lang.RuntimeException: This method should not be called in the > analyzer > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33992: Assignee: Apache Spark > resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer > - > > Key: SPARK-33992 > URL: https://issues.apache.org/jira/browse/SPARK-33992 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > PaddingAndLengthCheckForCharVarchar could fail query when > resolveOperatorsUpWithNewOutput > with > {code:java} > [info] - char/varchar resolution in sub query *** FAILED *** (367 > milliseconds) > [info] java.lang.RuntimeException: This method should not be called in the > analyzer > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
[ https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33990: Assignee: Apache Spark > v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition > > > Key: SPARK-33990 > URL: https://issues.apache.org/jira/browse/SPARK-33990 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > The test fails: > {code:scala} > test("SPARK-X: don not return data from dropped partition") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > on the last check with: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
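The expected behavior, as a runnable sketch against the default (v1) session catalog, where DROP PARTITION already removes the data; SPARK-33990 is about getting the same result from v2 implementations:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("drop-part").getOrCreate()

spark.sql("CREATE TABLE tbl (id INT, part INT) USING parquet PARTITIONED BY (part)")
spark.sql("INSERT INTO tbl PARTITION (part = 0) SELECT 0")
spark.sql("INSERT INTO tbl PARTITION (part = 1) SELECT 1")
spark.sql("ALTER TABLE tbl DROP PARTITION (part = 0)")

// Only the (1, 1) row should remain; the failing v2 test above still sees (0, 0).
spark.sql("SELECT * FROM tbl").show()
{code}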
[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
[ https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33990: Assignee: (was: Apache Spark) > v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition > > > Key: SPARK-33990 > URL: https://issues.apache.org/jira/browse/SPARK-33990 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test fails: > {code:scala} > test("SPARK-X: don not return data from dropped partition") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > on the last check with: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258228#comment-17258228 ] Apache Spark commented on SPARK-33991: -- User 'FelixYik' has created a pull request for this issue: https://github.com/apache/spark/pull/31015 > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Felix Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. > The reason for this problem is that the value of the SchedulingMode > enumeration class is uppercase, which occurs when I configure spark. > scheduler.mode to be lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33991: Assignee: Apache Spark > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Felix Yi >Assignee: Apache Spark >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. > The reason for this problem is that the value of the SchedulingMode > enumeration class is uppercase, which occurs when I configure spark. > scheduler.mode to be lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33991: Assignee: (was: Apache Spark) > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Felix Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. > The reason for this problem is that the value of the SchedulingMode > enumeration class is uppercase, which occurs when I configure spark. > scheduler.mode to be lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33991) Repair enumeration conversion error for page showing list
kaif Yi created SPARK-33991: --- Summary: Repair enumeration conversion error for page showing list Key: SPARK-33991 URL: https://issues.apache.org/jira/browse/SPARK-33991 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.0.0 Environment: For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type by loading the spark.scheduler.mode configuration from Sparkconf, but an enumeration conversion error occurs when I set the value of this configuration to lowercase. Reporter: kaif Yi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for page showing list
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaif Yi updated SPARK-33991: Description: For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type by loading the spark.scheduler.mode configuration from Sparkconf, but an enumeration conversion error occurs when I set the value of this configuration to lowercase. Environment: (was: For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type by loading the spark.scheduler.mode configuration from Sparkconf, but an enumeration conversion error occurs when I set the value of this configuration to lowercase.) > Repair enumeration conversion error for page showing list > - > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: kaif Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Yi updated SPARK-33991: - Description: For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type by loading the spark.scheduler.mode configuration from Sparkconf, but an enumeration conversion error occurs when I set the value of this configuration to lowercase. The reason for this problem is that the value of the SchedulingMode enumeration class is uppercase, which occurs when I configure spark. scheduler.mode to be lowercase. was:For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type by loading the spark.scheduler.mode configuration from Sparkconf, but an enumeration conversion error occurs when I set the value of this configuration to lowercase. > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Felix Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. > The reason for this problem is that the value of the SchedulingMode > enumeration class is uppercase, which occurs when I configure spark. > scheduler.mode to be lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258217#comment-17258217 ] Yang Jie edited comment on SPARK-33948 at 1/4/21, 2:00 PM: --- *Sync:* {code:java} commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1) Author: xuewei.linxuewei Date: Wed Dec 2 16:10:45 2020 + [SPARK-33619][SQL] Fix GetMapValueUtil code generation error Run completed in 11 minutes, 51 seconds. Total number of tests run: 4623 Suites: completed 256, aborted 0 Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0 *** 16 TESTS FAILED *** {code} {code:java} commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1) Author: HyukjinKwon Date: Wed Dec 2 16:03:08 2020 + [SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the documentation more Run completed in 10 minutes, 39 seconds. Total number of tests run: 4622 Suites: completed 256, aborted 0 Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0 All tests passed. {code} After SPARK-33619 , there are 16 TESTS FAILED in branch-3.1, no further investigation yet, and I'm not sure why the master branch was successful, need more time to analyze. was (Author: luciferyang): *Sync:* {code:java} commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1) Author: xuewei.linxuewei Date: Wed Dec 2 16:10:45 2020 + [SPARK-33619][SQL] Fix GetMapValueUtil code generation error Run completed in 11 minutes, 51 seconds. Total number of tests run: 4623 Suites: completed 256, aborted 0 Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0 *** 16 TESTS FAILED *** {code} {code:java} commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1) Author: HyukjinKwon Date: Wed Dec 2 16:03:08 2020 + [SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the documentation more Run completed in 10 minutes, 39 seconds. Total number of tests run: 4622 Suites: completed 256, aborted 0 Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0 All tests passed. 
{code} After SPARK-33619 , there are 16 TESTS FAILED in branch-3.1, no further investigation yet > branch-3.1 jenkins test failed in Scala 2.13 > - > > Key: SPARK-33948 > URL: https://issues.apache.org/jira/browse/SPARK-33948 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.1.0 > Environment: * > >Reporter: Yang Jie >Priority: Major > > [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink] > * > [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/] > * >
[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for page showing list of all ongoing and recently finished jobs
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaif Yi updated SPARK-33991: Summary: Repair enumeration conversion error for page showing list of all ongoing and recently finished jobs (was: Repair enumeration conversion error for page showing list) > Repair enumeration conversion error for page showing list of all ongoing and > recently finished jobs > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: kaif Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaif Yi updated SPARK-33991: Summary: Repair enumeration conversion error for AllJobsPage (was: Repair enumeration conversion error for page showing list of all ongoing and recently finished jobs) > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: kaif Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
[ https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258195#comment-17258195 ] Apache Spark commented on SPARK-33990: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31014 > v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition > > > Key: SPARK-33990 > URL: https://issues.apache.org/jira/browse/SPARK-33990 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test fails: > {code:scala} > test("SPARK-X: don not return data from dropped partition") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > on the last check with: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33736) Handle MERGE in ReplaceNullWithFalseInPredicate
[ https://issues.apache.org/jira/browse/SPARK-33736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258222#comment-17258222 ] Anton Okolnychyi commented on SPARK-33736: -- Sorry, I was on holidays. Will get back to the PR this week. > Handle MERGE in ReplaceNullWithFalseInPredicate > --- > > Key: SPARK-33736 > URL: https://issues.apache.org/jira/browse/SPARK-33736 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Priority: Major > > We need to handle merge statements in {{ReplaceNullWithFalseInPredicate}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
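The rewrite is sound because NULL and FALSE are interchangeable as the result of a predicate; a quick self-contained check:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("null-vs-false").getOrCreate()

// A row is kept only when the condition is TRUE, so a NULL result filters
// exactly like FALSE; extending the rule to MERGE conditions relies on the
// same equivalence.
spark.sql("SELECT * FROM VALUES (1), (2) AS t(id) WHERE IF(id > 1, NULL, FALSE)").show()
// Empty result: id=1 yields FALSE, id=2 yields NULL, and both rows are dropped.
{code}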
[jira] [Created] (SPARK-33994) ORC encryption interop
Gidon Gershinsky created SPARK-33994: Summary: ORC encryption interop Key: SPARK-33994 URL: https://issues.apache.org/jira/browse/SPARK-33994 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Gidon Gershinsky Test interoperability between stand-alone ORC encryption and Spark-managed ORC encryption -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Yi updated SPARK-33991: - Component/s: (was: Web UI) Spark Core > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Felix Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33991) Repair enumeration conversion error for AllJobsPage
[ https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258201#comment-17258201 ] Felix Yi commented on SPARK-33991: -- I saw that the org.apache.spark.scheduler.TaskSchedulerImpl class converts the spark.scheduler.mode value to uppercase, so I think it should be converted in AllJobsPage as well. {code:java} val schedulingMode: SchedulingMode = try { SchedulingMode.withName(schedulingModeConf.toUpperCase(Locale.ROOT)) } catch { case e: java.util.NoSuchElementException => throw new SparkException(s"Unrecognized $SCHEDULER_MODE_PROPERTY: $schedulingModeConf") } {code} > Repair enumeration conversion error for AllJobsPage > --- > > Key: SPARK-33991 > URL: https://issues.apache.org/jira/browse/SPARK-33991 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Felix Yi >Priority: Critical > > For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated type > by loading the spark.scheduler.mode configuration from Sparkconf, but an > enumeration conversion error occurs when I set the value of this > configuration to lowercase. > The reason for this problem is that the value of the SchedulingMode > enumeration class is uppercase, which occurs when I configure spark. > scheduler.mode to be lowercase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
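The failure mode in isolation: {{SchedulingMode}} is a plain Scala {{Enumeration}} whose values are uppercase, so {{withName}} is case-sensitive; normalizing first, as TaskSchedulerImpl does above, is the proposed fix for AllJobsPage:
{code:scala}
import java.util.Locale
import org.apache.spark.scheduler.SchedulingMode

// SchedulingMode.withName("fair") throws java.util.NoSuchElementException;
// upper-casing first makes a lowercase spark.scheduler.mode value work.
val mode = SchedulingMode.withName("fair".toUpperCase(Locale.ROOT))
println(mode)  // FAIR
{code}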
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258218#comment-17258218 ] Guillaume Martres commented on SPARK-25075: --- Now that 2.13 support is basically complete, would it be possible to publish a preview release of Spark 3.1 built against Scala 2.13 on Maven for testing purposes? Thanks! > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258217#comment-17258217 ] Yang Jie commented on SPARK-33948: -- *Sync:* {code:java} commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1) Author: xuewei.linxuewei Date: Wed Dec 2 16:10:45 2020 + [SPARK-33619][SQL] Fix GetMapValueUtil code generation error Run completed in 11 minutes, 51 seconds. Total number of tests run: 4623 Suites: completed 256, aborted 0 Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0 *** 16 TESTS FAILED *** {code} {code:java} commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1) Author: HyukjinKwon Date: Wed Dec 2 16:03:08 2020 + [SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the documentation more Run completed in 10 minutes, 39 seconds. Total number of tests run: 4622 Suites: completed 256, aborted 0 Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0 All tests passed. {code} After SPARK-33619 , there are 16 TESTS FAILED in branch-3.1, no further investigation yet > branch-3.1 jenkins test failed in Scala 2.13 > - > > Key: SPARK-33948 > URL: https://issues.apache.org/jira/browse/SPARK-33948 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.1.0 > Environment: * > >Reporter: Yang Jie >Priority: Major > > [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink] > * > [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/] > * > 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/] > * > [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/] > * >
[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33992: Assignee: (was: Apache Spark) > resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer > - > > Key: SPARK-33992 > URL: https://issues.apache.org/jira/browse/SPARK-33992 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Minor > > PaddingAndLengthCheckForCharVarchar could fail query when > resolveOperatorsUpWithNewOutput > with > {code:java} > [info] - char/varchar resolution in sub query *** FAILED *** (367 > milliseconds) > [info] java.lang.RuntimeException: This method should not be called in the > analyzer > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33993) Parquet encryption interop
Gidon Gershinsky created SPARK-33993: Summary: Parquet encryption interop Key: SPARK-33993 URL: https://issues.apache.org/jira/browse/SPARK-33993 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Gidon Gershinsky Test interoperability between stand-alone Parquet encryption and Spark-managed Parquet encryption -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33875) Implement DESCRIBE COLUMN for v2 catalog
[ https://issues.apache.org/jira/browse/SPARK-33875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33875: --- Assignee: Terry Kim > Implement DESCRIBE COLUMN for v2 catalog > > > Key: SPARK-33875 > URL: https://issues.apache.org/jira/browse/SPARK-33875 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > Implement DESCRIBE COLUMN for v2 catalog -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
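For readers unfamiliar with the statement, this is its shape against the session catalog today; SPARK-33875 makes the same statement work for tables in a v2 catalog (table and column names here are arbitrary):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("desc-col").getOrCreate()

spark.sql("CREATE TABLE people (name STRING COMMENT 'full name', age INT) USING parquet")

// Returns info_name/info_value rows: col_name, data_type, comment.
spark.sql("DESCRIBE people name").show()
{code}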
[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1
[ https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33984: - Assignee: Hyukjin Kwon > Upgrade to Py4J 0.10.9.1 > > > Key: SPARK-33984 > URL: https://issues.apache.org/jira/browse/SPARK-33984 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Py4J 0.10.9.1 is out with bug fixes. we should better upgrade in PySpark as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33984) Upgrade to Py4J 0.10.9.1
[ https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33984. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31009 [https://github.com/apache/spark/pull/31009] > Upgrade to Py4J 0.10.9.1 > > > Key: SPARK-33984 > URL: https://issues.apache.org/jira/browse/SPARK-33984 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > Py4J 0.10.9.1 is out with bug fixes. we should better upgrade in PySpark as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33875) Implement DESCRIBE COLUMN for v2 catalog
[ https://issues.apache.org/jira/browse/SPARK-33875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33875. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30881 [https://github.com/apache/spark/pull/30881] > Implement DESCRIBE COLUMN for v2 catalog > > > Key: SPARK-33875 > URL: https://issues.apache.org/jira/browse/SPARK-33875 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.2.0 > > > Implement DESCRIBE COLUMN for v2 catalog -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258383#comment-17258383 ] Dongjoon Hyun commented on SPARK-31786: --- Did you do `export HTTP2_DISABLE=true` before `spark-submit`? HTTP2_DISABLE is required in all places where you use the `K8s client`, and technically there are two places. # Your Mac (Outside K8s cluster): `spark-submit` # Spark Driver Pod (Inside K8s cluster): spark.kubernetes.driverEnv.HTTP2_DISABLE=true > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.0.0 > > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at
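A minimal sketch of applying the workaround above in both places, reusing the master URL from the report (the remaining spark-submit arguments are elided):

{code}
# 1. Outside the K8s cluster: environment of the JVM that runs spark-submit
export HTTP2_DISABLE=true

# 2. Inside the K8s cluster: environment of the driver pod, set via a Spark conf
./bin/spark-submit \
  --master k8s://https://172.31.23.60:8443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driverEnv.HTTP2_DISABLE=true \
  ...
{code}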
[jira] [Resolved] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
[ https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33990. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31014 [https://github.com/apache/spark/pull/31014] > v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition > > > Key: SPARK-33990 > URL: https://issues.apache.org/jira/browse/SPARK-33990 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > The test fails: > {code:scala} > test("SPARK-X: do not return data from dropped partition") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > on the last check with: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
[ https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33990: - Assignee: Maxim Gekk > v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition > > > Key: SPARK-33990 > URL: https://issues.apache.org/jira/browse/SPARK-33990 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > The test fails: > {code:scala} > test("SPARK-X: do not return data from dropped partition") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > on the last check with: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33988. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31011 [https://github.com/apache/spark/pull/31011] > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > Fix For: 3.2.0 > > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
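As a usage sketch, the benchmark is normally launched via its runMain entry point; appending the new flag as shown below is an assumption about how it combines with the existing arguments:

{code}
# <path> points at pre-generated TPC-DS data; --cbo turns on cost-based optimization
build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location <path> --cbo"
{code}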
[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33988: - Assignee: Takeshi Yamamuro > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258345#comment-17258345 ] Sachit Murarka commented on SPARK-31786: [~maver1ck] / [~dongjoon]: I am facing this issue. I am using Spark 2.4.7. I have tried the setting mentioned in the comments above, spark.kubernetes.driverEnv.HTTP2_DISABLE=true. The following is the exception: Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [spark-test] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129) at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:246) at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:166) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135) at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at
[jira] [Updated] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33988: -- Parent: (was: SPARK-33828) Issue Type: Improvement (was: Sub-task) > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Priority: Major > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark
[ https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33988: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Test) > Add an option to enable CBO in TPCDSQueryBenchmark > -- > > Key: SPARK-33988 > URL: https://issues.apache.org/jira/browse/SPARK-33988 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Priority: Major > > This ticket aims at adding a new option {{--cbo}} to enable CBO in > TPCDSQueryBenchmark. I think this option is useful for monitoring > performance changes with CBO enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0
[ https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33983: - Assignee: Hyukjin Kwon > Update cloudpickle to v1.6.0 > > > Key: SPARK-33983 > URL: https://issues.apache.org/jira/browse/SPARK-33983 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Cloudpickle 1.6.0 has been released. We should update to the latest > version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33983) Update cloudpickle to v1.6.0
[ https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33983. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31007 [https://github.com/apache/spark/pull/31007] > Update cloudpickle to v1.6.0 > > > Key: SPARK-33983 > URL: https://issues.apache.org/jira/browse/SPARK-33983 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > Cloudpickle 1.6.0 has been released. We should update to the latest > version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258411#comment-17258411 ] Sachit Murarka edited comment on SPARK-31786 at 1/4/21, 6:41 PM: - [~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my machine. Should it be set on all nodes of the Kubernetes cluster? Also, regarding your second point, spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit in the form of --conf. Please let me know if my understanding is correct. Also, since this is a workaround, what would be the long-term solution? Should I consider Spark 3 instead of Spark 2.4.7? was (Author: smurarka): [~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my machine. Should it be set on all nodes of the Kubernetes cluster? Also, regarding your second point, spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit in the form of --conf. Please let me know if my understanding is correct. > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.0.0 > > > Hi, > I'm getting an exception when submitting the Spark-Pi app to a Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed.
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at
[jira] [Commented] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter
[ https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258454#comment-17258454 ] Apache Spark commented on SPARK-33908: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/31016 > Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter > > > Key: SPARK-33908 > URL: https://issues.apache.org/jira/browse/SPARK-33908 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
[ https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258479#comment-17258479 ] Apache Spark commented on SPARK-33987: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31017 > v2 ALTER TABLE .. DROP PARTITION does not refresh cached table > -- > > Key: SPARK-33987 > URL: https://issues.apache.org/jira/browse/SPARK-33987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test below demonstrates the issue: > {code:scala} > test("SPARK-33950: refresh cache after partition dropping") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > The last check fails: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
[ https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33987: Assignee: Apache Spark > v2 ALTER TABLE .. DROP PARTITION does not refresh cached table > -- > > Key: SPARK-33987 > URL: https://issues.apache.org/jira/browse/SPARK-33987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > The test below demonstrates the issue: > {code:scala} > test("SPARK-33950: refresh cache after partition dropping") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > The last check fails: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
[ https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33987: Assignee: (was: Apache Spark) > v2 ALTER TABLE .. DROP PARTITION does not refresh cached table > -- > > Key: SPARK-33987 > URL: https://issues.apache.org/jira/browse/SPARK-33987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test below demonstrates the issue: > {code:scala} > test("SPARK-33950: refresh cache after partition dropping") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY > (part)") > sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0") > sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, > 1))) > sql(s"ALTER TABLE $t DROP PARTITION (part=0)") > assert(spark.catalog.isCached(t)) > QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1))) > } > } > {code} > The last check fails: > {code} > == Results == > !== Correct Answer - 1 == == Spark Answer - 2 == > !struct<> struct > ![1,1] [0,0] > ! [1,1] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33894) Word2VecSuite failed for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33894: Assignee: (was: Apache Spark) > Word2VecSuite failed for Scala 2.13 > --- > > Key: SPARK-33894 > URL: https://issues.apache.org/jira/browse/SPARK-33894 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.2.0 >Reporter: Darcy Shen >Priority: Major > > This may be the first failed build: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/ > h2. Possible Work Around Fix > Move > case class Data(word: String, vector: Array[Float]) > out of the class Word2VecModel > h2. Attempts to git bisect > master branch git "bisect" > cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail > 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643 fail > 9d9d4a8e122cf1137edeca857e925f7e76c1ace2 fail > f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01 > h2. Attached Stack Trace > To reproduce it in master: > ./dev/change-scala-version.sh 2.13 > sbt -Pscala-2.13 > > project mllib > > testOnly org.apache.spark.ml.feature.Word2VecSuite > [info] Word2VecSuite: > [info] - params (45 milliseconds) > [info] - Word2Vec (5 seconds, 768 milliseconds) > [info] - getVectors (549 milliseconds) > [info] - findSynonyms (222 milliseconds) > [info] - window size (382 milliseconds) > [info] - Word2Vec read/write numPartitions calculation (1 millisecond) > [info] - Word2Vec read/write (669 milliseconds) > [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds) > [info] org.apache.spark.SparkException: Job aborted. > [info] at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > [info] at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > [info] at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > [info] at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > [info] at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > [info] at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > [info] at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > [info] at > 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > [info] at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > [info] at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > [info] at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874) > [info] at > org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368) > [info] at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168) > [info] at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287) > [info] at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287) > [info] at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42) > [info] at > org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2VecSuite.scala:28) > [info] at >
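To make the suggested workaround concrete, here is a minimal Scala sketch of the pattern (the names below are illustrative, not the actual Word2Vec code). A case class nested inside a class gets a derived encoder that captures the enclosing instance, which is what breaks the save path under Scala 2.13; hoisting the case class to the top level (or a companion object) removes the outer reference:

{code:scala}
import org.apache.spark.sql.SparkSession

// Top level: the derived encoder has no outer pointer, so the write
// behaves the same on Scala 2.12 and 2.13.
case class Data(word: String, vector: Array[Float])

object NestedCaseClassWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    import spark.implicits._
    // Had Data been declared inside another class, this write could fail
    // on 2.13 in the same way as the Word2VecModel read/write test above.
    Seq(Data("hello", Array(0.1f, 0.2f))).toDF()
      .write.mode("overwrite").parquet("/tmp/nested-case-class-sketch")
    spark.stop()
  }
}
{code}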
[jira] [Commented] (SPARK-33894) Word2VecSuite failed for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258485#comment-17258485 ] Apache Spark commented on SPARK-33894: -- User 'koertkuipers' has created a pull request for this issue: https://github.com/apache/spark/pull/31018 > Word2VecSuite failed for Scala 2.13 > --- > > Key: SPARK-33894 > URL: https://issues.apache.org/jira/browse/SPARK-33894 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.2.0 >Reporter: Darcy Shen >Priority: Major > > This may be the first failed build: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/ > h2. Possible Work Around Fix > Move > case class Data(word: String, vector: Array[Float]) > out of the class Word2VecModel > h2. Attempts to git bisect > master branch git "bisect" > cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail > 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643 fail > 9d9d4a8e122cf1137edeca857e925f7e76c1ace2 fail > f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01 > h2. Attached Stack Trace > To reproduce it in master: > ./dev/change-scala-version.sh 2.13 > sbt -Pscala-2.13 > > project mllib > > testOnly org.apache.spark.ml.feature.Word2VecSuite > [info] Word2VecSuite: > [info] - params (45 milliseconds) > [info] - Word2Vec (5 seconds, 768 milliseconds) > [info] - getVectors (549 milliseconds) > [info] - findSynonyms (222 milliseconds) > [info] - window size (382 milliseconds) > [info] - Word2Vec read/write numPartitions calculation (1 millisecond) > [info] - Word2Vec read/write (669 milliseconds) > [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds) > [info] org.apache.spark.SparkException: Job aborted. > [info] at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > [info] at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > [info] at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > [info] at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > [info] at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > [info] at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > [info] at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > [info] at > 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > [info] at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > [info] at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > [info] at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > [info] at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874) > [info] at > org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368) > [info] at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168) > [info] at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287) > [info] at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287) > [info] at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42) > [info] at >
[jira] [Assigned] (SPARK-33894) Word2VecSuite failed for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33894: Assignee: Apache Spark > Word2VecSuite failed for Scala 2.13 > --- > > Key: SPARK-33894 > URL: https://issues.apache.org/jira/browse/SPARK-33894 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.2.0 >Reporter: Darcy Shen >Assignee: Apache Spark >Priority: Major > > This may be the first failed build: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/ > h2. Possible Work Around Fix > Move > case class Data(word: String, vector: Array[Float]) > out of the class Word2VecModel > h2. Attempts to git bisect > master branch git "bisect" > cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail > 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643 fail > 9d9d4a8e122cf1137edeca857e925f7e76c1ace2 fail > f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01 > h2. Attached Stack Trace > To reproduce it in master: > ./dev/change-scala-version.sh 2.13 > sbt -Pscala-2.13 > > project mllib > > testOnly org.apache.spark.ml.feature.Word2VecSuite > [info] Word2VecSuite: > [info] - params (45 milliseconds) > [info] - Word2Vec (5 seconds, 768 milliseconds) > [info] - getVectors (549 milliseconds) > [info] - findSynonyms (222 milliseconds) > [info] - window size (382 milliseconds) > [info] - Word2Vec read/write numPartitions calculation (1 millisecond) > [info] - Word2Vec read/write (669 milliseconds) > [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds) > [info] org.apache.spark.SparkException: Job aborted. > [info] at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > [info] at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > [info] at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > [info] at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > [info] at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > [info] at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > [info] at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > [info] at > 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > [info] at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > [info] at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > [info] at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874) > [info] at > org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368) > [info] at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168) > [info] at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287) > [info] at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287) > [info] at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42) > [info] at > org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2VecSuite.scala:28) > [info] at >
[jira] [Commented] (SPARK-33894) Word2VecSuite failed for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258486#comment-17258486 ] Apache Spark commented on SPARK-33894: -- User 'koertkuipers' has created a pull request for this issue: https://github.com/apache/spark/pull/31018 > Word2VecSuite failed for Scala 2.13 > --- > > Key: SPARK-33894 > URL: https://issues.apache.org/jira/browse/SPARK-33894 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.2.0 >Reporter: Darcy Shen >Priority: Major > > This may be the first failed build: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/ > h2. Possible Work Around Fix > Move > case class Data(word: String, vector: Array[Float]) > out of the class Word2VecModel > h2. Attempts to git bisect > master branch git "bisect" > cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail > 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643 fail > 9d9d4a8e122cf1137edeca857e925f7e76c1ace2 fail > f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01 > h2. Attached Stack Trace > To reproduce it in master: > ./dev/change-scala-version.sh 2.13 > sbt -Pscala-2.13 > > project mllib > > testOnly org.apache.spark.ml.feature.Word2VecSuite > [info] Word2VecSuite: > [info] - params (45 milliseconds) > [info] - Word2Vec (5 seconds, 768 milliseconds) > [info] - getVectors (549 milliseconds) > [info] - findSynonyms (222 milliseconds) > [info] - window size (382 milliseconds) > [info] - Word2Vec read/write numPartitions calculation (1 millisecond) > [info] - Word2Vec read/write (669 milliseconds) > [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds) > [info] org.apache.spark.SparkException: Job aborted. > [info] at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > [info] at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > [info] at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > [info] at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > [info] at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > [info] at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > [info] at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > [info] at > 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > [info] at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > [info] at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > [info] at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > [info] at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874) > [info] at > org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368) > [info] at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168) > [info] at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287) > [info] at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287) > [info] at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42) > [info] at >
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258411#comment-17258411 ] Sachit Murarka commented on SPARK-31786: [~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my machine. Should it be set on all nodes of the Kubernetes cluster? Also, regarding your second point, spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit in the form of --conf. Please let me know if my understanding is correct. > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.0.0 > > > Hi, > I'm getting an exception when submitting the Spark-Pi app to a Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed.
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at