[jira] [Commented] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"

2021-01-04 Thread Zhang Jianguo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258109#comment-17258109
 ] 

Zhang Jianguo commented on SPARK-33958:
---

[~yumwang]

Gauss and Oracle return 0, and that looks like it matches the traditional SQL standard better.

My solution is as follows: add 0.0 to every returned value of FloatType and DoubleType.

0.0 + 0.0 = 0.0

-0.0 + 0.0 = 0.0

 

I can provide a pull request later.
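To illustrate the idea outside of Spark (a minimal sketch, not the actual patch): under IEEE 754 arithmetic, adding positive zero turns a negative zero into positive zero and leaves every other value unchanged.

{code:scala}
// Minimal sketch of the proposed normalization: -0.0 + 0.0 == 0.0 under IEEE 754,
// while x + 0.0 == x for every other finite x, so appending "+ 0.0" to float/double
// results only rewrites the sign of zero.
object NegativeZeroSketch {
  def normalize(d: Double): Double = d + 0.0
  def normalize(f: Float): Float = f + 0.0f

  def main(args: Array[String]): Unit = {
    println(normalize(-1.0 * 0)) // 0.0 instead of -0.0
    println(normalize(-0.0f))    // 0.0
    println(normalize(-2.5))     // -2.5, non-zero values are untouched
  }
}
{code}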

> spark sql DoubleType(0 * (-1))  return "-0.0"
> -
>
> Key: SPARK-33958
> URL: https://issues.apache.org/jira/browse/SPARK-33958
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.5, 3.0.0
>Reporter: Zhang Jianguo
>Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
>  After the select operation, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
> +------------------------+
> |(a * CAST(0 AS DOUBLE)) |
> +------------------------+
> |-0.0                    |
> +------------------------+
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33985) Transform with clusterby/orderby/sortby

2021-01-04 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-33985:
--
Summary: Transform with clusterby/orderby/sortby  (was: Support transform 
with clusterby/orderby/sortby)

> Transform with clusterby/orderby/sortby
> ---
>
> Key: SPARK-33985
> URL: https://issues.apache.org/jira/browse/SPARK-33985
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-33988:


 Summary: Add an option to enable CBO in TPCDSQueryBenchmark
 Key: SPARK-33988
 URL: https://issues.apache.org/jira/browse/SPARK-33988
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 3.2.0
Reporter: Takeshi Yamamuro


This ticket aims to add a new option {{--cbo}} to enable CBO in TPCDSQueryBenchmark. 
I think this option is useful for monitoring performance changes with CBO enabled.
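For reference, enabling CBO ahead of running the TPC-DS queries essentially means flipping the CBO-related SQL configs; a minimal sketch is below (the exact set of configs wired to the {{--cbo}} flag is an assumption, and it presumes table statistics were collected beforehand with ANALYZE TABLE):

{code:scala}
// Hedged sketch: configs a --cbo style flag would likely enable before planning the
// TPC-DS queries. Assumes `spark` is the benchmark's SparkSession and that
// ANALYZE TABLE ... COMPUTE STATISTICS has already populated the statistics.
val cboConfs = Seq(
  "spark.sql.cbo.enabled" -> "true",
  "spark.sql.cbo.planStats.enabled" -> "true",
  "spark.sql.cbo.joinReorder.enabled" -> "true"
)
cboConfs.foreach { case (key, value) => spark.conf.set(key, value) }
{code}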



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33977) Add doc for "'like any' and 'like all' operators"

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258121#comment-17258121
 ] 

Apache Spark commented on SPARK-33977:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/31008

> Add doc for "'like any' and 'like all' operators"
> -
>
> Key: SPARK-33977
> URL: https://issues.apache.org/jira/browse/SPARK-33977
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> Need to update the doc for the new LIKE predicates in the following file:
> [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33977) Add doc for "'like any' and 'like all' operators"

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33977:


Assignee: (was: Apache Spark)

> Add doc for "'like any' and 'like all' operators"
> -
>
> Key: SPARK-33977
> URL: https://issues.apache.org/jira/browse/SPARK-33977
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> Need to update the doc for the new LIKE predicates in the following file:
> [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33977) Add doc for "'like any' and 'like all' operators"

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33977:


Assignee: Apache Spark

> Add doc for "'like any' and 'like all' operators"
> -
>
> Key: SPARK-33977
> URL: https://issues.apache.org/jira/browse/SPARK-33977
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> Need to update the doc for the new LIKE predicates in the following file:
> [https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-qry-select-like.md]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258143#comment-17258143
 ] 

Apache Spark commented on SPARK-33976:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31010

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33987:
--

 Summary: v2 ALTER TABLE .. DROP PARTITION does not refresh cached 
table
 Key: SPARK-33987
 URL: https://issues.apache.org/jira/browse/SPARK-33987
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The test below demonstrates the issue:
{code:scala}
  test("SPARK-33950: refresh cache after partition dropping") {
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
(part)")
  sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
  sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
  assert(!spark.catalog.isCached(t))
  sql(s"CACHE TABLE $t")
  assert(spark.catalog.isCached(t))
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 1)))
  sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
  assert(spark.catalog.isCached(t))
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
}
  }
{code}
The last check fails:
{code}
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 2 ==
!struct<>   struct
![1,1]  [0,0]
!   [1,1]
{code}
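Until DROP PARTITION itself refreshes the cache, a user-side workaround (a sketch only, not the intended fix) is to refresh the table manually after the DDL, which invalidates the stale cached data and lazily re-caches it:

{code:scala}
// Hedged workaround sketch for the test above: refresh the cached table after the DDL.
sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
spark.catalog.refreshTable(t)  // drops the stale cache entry and lazily re-caches
QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
{code}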
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33988:


Assignee: Apache Spark

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Major
>
> This ticket aims to add a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258155#comment-17258155
 ] 

Apache Spark commented on SPARK-33988:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/31011

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims to add a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33989) Strip auto-generated cast when resolving UnresolvedAlias

2021-01-04 Thread ulysses you (Jira)
ulysses you created SPARK-33989:
---

 Summary: Strip auto-generated cast when resolving UnresolvedAlias
 Key: SPARK-33989
 URL: https://issues.apache.org/jira/browse/SPARK-33989
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: ulysses you


During analysis we may implicitly introduce a Cast when a type cast is needed. That 
makes the assigned name unclear.

For example, for the SQL `select id == null` where id is of int type, the 
output field name will be `(id = CAST(null as int))`.
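For illustration, the generated name (and the usual way to avoid it today, an explicit alias) can be seen with a tiny example; the exact auto-generated string below is an assumption and may differ by type:

{code:scala}
// Hypothetical illustration: the implicitly inserted cast leaks into the column name.
spark.range(1).selectExpr("id = null").columns
// something along the lines of: Array((id = CAST(NULL AS BIGINT)))

// An explicit alias sidesteps the auto-generated name entirely.
spark.range(1).selectExpr("(id = null) AS id_is_null").columns
// Array(id_is_null)
{code}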



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258179#comment-17258179
 ] 

Maxim Gekk commented on SPARK-33987:


I am working on this

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below demonstrates the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33990:
--

 Summary: v2 ALTER TABLE .. DROP PARTITION does not remove data 
from dropped partition
 Key: SPARK-33990
 URL: https://issues.apache.org/jira/browse/SPARK-33990
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The test fails:
{code:scala}
  test("SPARK-X: don not return data from dropped partition") {
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
(part)")
  sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
  sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 1)))
  sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
  QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
}
  }
{code}
on the last check with:
{code}
== Results ==
!== Correct Answer - 1 ==   == Spark Answer - 2 ==
!struct<>   struct
![1,1]  [0,0]
!   [1,1]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258075#comment-17258075
 ] 

Apache Spark commented on SPARK-33950:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31006

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258104#comment-17258104
 ] 

Apache Spark commented on SPARK-33983:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31007

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should match the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33982) Sparksql does not support when the inserted table is a read table

2021-01-04 Thread hao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258052#comment-17258052
 ] 

hao edited comment on SPARK-33982 at 1/4/21, 11:19 AM:
---

I think Spark SQL should support INSERT OVERWRITE into a table that is being read.


was (Author: hao.duan):
I think Spark SQL should support this.

> Sparksql does not support when the inserted table is a read table
> -
>
> Key: SPARK-33982
> URL: https://issues.apache.org/jira/browse/SPARK-33982
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> When the inserted table is also a table being read from, Spark SQL will throw an error:
> org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is 
> also being read from.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258074#comment-17258074
 ] 

Apache Spark commented on SPARK-33949:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/31005

> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep the 
> result consistent whether the Optimize rule is applied or not.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}
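A user-side way to dodge the failure while those optimizer rules are excluded (a sketch building on the snippet above, not the proposed fix) is to pass the relative error as a single literal so nothing needs folding:

{code:scala}
// Sketch: with ConstantFolding/ReorderAssociativeOperator excluded, an already-constant
// rsd avoids the unfolded "0.01 + 0.02" expression that approx_count_distinct rejects.
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> excludedRules.mkString(",")) {
  sql("select approx_count_distinct(1, 0.03)")
}
{code}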



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33949:


Assignee: Apache Spark

> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep the 
> result consistent whether the Optimize rule is applied or not.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33949:


Assignee: (was: Apache Spark)

> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep the 
> result consistent whether the Optimize rule is applied or not.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33983:


Assignee: (was: Apache Spark)

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should match the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33983:


Assignee: Apache Spark

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should match the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"

2021-01-04 Thread Zhang Jianguo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258109#comment-17258109
 ] 

Zhang Jianguo edited comment on SPARK-33958 at 1/4/21, 9:50 AM:


[~yumwang]

Gauss and Oracle return 0, and that looks like it matches the SQL standard better.

My solution is as follows: add 0.0 to every returned value of FloatType and DoubleType.

0.0 + 0.0 = 0.0

-0.0 + 0.0 = 0.0

 

I can provide a pull request later.


was (Author: alberyzjg):
[~yumwang]

Gauss and Oracle return 0. And it looks mathe troditional SQL standard better.

My solution as following, plus 0.0 at every return of FloatType and DoubleType.

0.0 + 0.0 = 0.0

-0.0 + 0.0 = 0.0

 

I can provide pull request later.

> spark sql DoubleType(0 * (-1))  return "-0.0"
> -
>
> Key: SPARK-33958
> URL: https://issues.apache.org/jira/browse/SPARK-33958
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.5, 3.0.0
>Reporter: Zhang Jianguo
>Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
>  After the select operation, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
> +------------------------+
> |(a * CAST(0 AS DOUBLE)) |
> +------------------------+
> |-0.0                    |
> +------------------------+
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33984:


 Summary: Upgrade to Py4J 0.10.9.1
 Key: SPARK-33984
 URL: https://issues.apache.org/jira/browse/SPARK-33984
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


Py4J 0.10.9.1 is out with bug fixes. We should upgrade PySpark to it as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33985) Support transform with clusterby/orderby/sortby

2021-01-04 Thread angerszhu (Jira)
angerszhu created SPARK-33985:
-

 Summary: Support transform with clusterby/orderby/sortby
 Key: SPARK-33985
 URL: https://issues.apache.org/jira/browse/SPARK-33985
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258183#comment-17258183
 ] 

Maxim Gekk commented on SPARK-33990:


I am working on the issue.

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258047#comment-17258047
 ] 

Dongjoon Hyun commented on SPARK-33005:
---

Sure, [~hyukjin.kwon].

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258134#comment-17258134
 ] 

Apache Spark commented on SPARK-33984:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31009

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade PySpark to it as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33984:


Assignee: (was: Apache Spark)

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade PySpark to it as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33984:


Assignee: Apache Spark

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade PySpark to it as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258135#comment-17258135
 ] 

Apache Spark commented on SPARK-33984:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31009

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade PySpark to it as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33976:


Assignee: Apache Spark

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33986) Spark handle always return LOST status in standalone cluster mode with Spark launcher

2021-01-04 Thread ZhongyuWang (Jira)
ZhongyuWang created SPARK-33986:
---

 Summary: Spark handle always return LOST status in standalone 
cluster mode with Spark launcher
 Key: SPARK-33986
 URL: https://issues.apache.org/jira/browse/SPARK-33986
 Project: Spark
  Issue Type: Question
  Components: Spark Submit
Affects Versions: 2.4.4
 Environment: apache hadoop 2.6.5

apache spark 2.4.4
Reporter: ZhongyuWang


I can submit a Spark app successfully in standalone client / YARN client / YARN cluster 
mode and get the correct app status, but when I submit a Spark app in standalone cluster 
mode, the Spark handle always returns LOST status (once) while the app keeps running 
stably until FINISHED (the handle doesn't receive any state change information). I 
noticed that when I submitted the app from code, after a while the SparkSubmit process 
suddenly stopped. The SparkSubmit log (launcher redirect log) doesn't have any useful 
information.

This is my pseudo code:
{code:java}
SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        stateChangedHandle(handle.getAppId(), jobId, code, execId,
            handle.getState(), driverInfo, request, infoLog, errorLog);
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        stateChangedHandle(handle.getAppId(), jobId, code, execId,
            handle.getState(), driverInfo, request, infoLog, errorLog);
    }
});{code}
Any idea? Thanks.
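For context, a minimal Scala sketch of how such a launch is usually wired up is shown below (the jar path, master URL and main class are hypothetical). Note that in standalone cluster mode the local spark-submit process exits once the driver has been handed to the master, so the handle can report LOST simply because its connection to that child process went away; polling {{handle.getState()}} alongside the listener can make that easier to see:

{code:scala}
// Hedged sketch (hypothetical app resource, class and master URL): submit in standalone
// cluster mode and poll the handle state in addition to the listener callbacks.
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

val handle: SparkAppHandle = new SparkLauncher()
  .setMaster("spark://master-host:7077") // hypothetical master URL
  .setDeployMode("cluster")
  .setAppResource("/path/to/app.jar")    // hypothetical jar
  .setMainClass("com.example.Main")      // hypothetical main class
  .startApplication()

while (!handle.getState.isFinal) {
  println(s"state=${handle.getState}, appId=${handle.getAppId}")
  Thread.sleep(1000)
}
println(s"final state: ${handle.getState}")
{code}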

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33976:


Assignee: (was: Apache Spark)

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33986) Spark handle always return LOST status in standalone cluster mode with Spark launcher

2021-01-04 Thread ZhongyuWang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhongyuWang updated SPARK-33986:

Description: 
I can submit a Spark app successfully in standalone client / YARN client / YARN cluster 
mode and get the correct app status, but when I submit a Spark app in standalone cluster 
mode, the Spark handle always returns LOST status (once) while the app keeps running 
stably until FINISHED (the handle doesn't receive any state change information). I 
noticed that when I submitted the app from code, after a while the SparkSubmit process 
suddenly stopped. The SparkSubmit log (launcher redirect log) doesn't have any useful 
information.

This is my pseudo code:
{code:java}
SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() 
{
@Override
public void stateChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
@Override
public void infoChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
});{code}
Any idea? Thanks.

  was:
I can use it to submit spark app successfully in standalone client/yarn 
client/yarn cluster mode,and get correct app status, but when i submit spark 
app in standalone cluster mode, Spark handle always return LOST status(once) 
and app running stablely until FINISHED( handle wasn't get any state change 
infomation).  I noticed when I submited app from code, after a while, the 
SparkSubmit process was suddenly stopped. I checked sparkSubmit log(launcher 
redirect log) doesn't have any useful information.

this is my pseudo code,
{code:java}
SparkAppHandle handle = launcher.startApplication(new SparkAppHandle.Listener() 
{
@Override
public void stateChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
@Override
public void infoChanged(SparkAppHandle handle) {
stateChangedHandle(handle.getAppId(), jobId, code, execId, 
handle.getState(), driverInfo, request, infoLog, errorLog);
}
});{code}
any idea ? thx

 

 

 

 


> Spark handle always return LOST status in standalone cluster mode with Spark 
> launcher
> -
>
> Key: SPARK-33986
> URL: https://issues.apache.org/jira/browse/SPARK-33986
> Project: Spark
>  Issue Type: Question
>  Components: Spark Submit
>Affects Versions: 2.4.4
> Environment: apache hadoop 2.6.5
> apache spark 2.4.4
>Reporter: ZhongyuWang
>Priority: Major
>
> I can submit a Spark app successfully in standalone client / YARN client / YARN 
> cluster mode and get the correct app status, but when I submit a Spark app in 
> standalone cluster mode, the Spark handle always returns LOST status (once) while 
> the app keeps running stably until FINISHED (the handle doesn't receive any state 
> change information). I noticed that when I submitted the app from code, after a 
> while the SparkSubmit process suddenly stopped. The SparkSubmit log (launcher 
> redirect log) doesn't have any useful information.
> This is my pseudo code:
> {code:java}
> SparkAppHandle handle = launcher.startApplication(new 
> SparkAppHandle.Listener() {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> stateChangedHandle(handle.getAppId(), jobId, code, execId, 
> handle.getState(), driverInfo, request, infoLog, errorLog);
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> stateChangedHandle(handle.getAppId(), jobId, code, execId, 
> handle.getState(), driverInfo, request, infoLog, errorLog);
> }
> });{code}
> Any idea? Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258141#comment-17258141
 ] 

Apache Spark commented on SPARK-33976:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31010

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33985) Transform with clusterby/orderby/sortby

2021-01-04 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-33985:
--
Description: Need to add UTs to make sure the data is the same as with Hive

> Transform with clusterby/orderby/sortby
> ---
>
> Key: SPARK-33985
> URL: https://issues.apache.org/jira/browse/SPARK-33985
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Need to add UTs to make sure the data is the same as with Hive



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33983:


 Summary: Update cloudpickle to v1.6.0
 Key: SPARK-33983
 URL: https://issues.apache.org/jira/browse/SPARK-33983
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


Cloudpickle 1.6.0 has been released. We should match the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33988:


Assignee: (was: Apache Spark)

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims to add a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring 
> performance changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33005.
---
Resolution: Done

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33711) Race condition in Spark k8s Pod lifecycle manager that leads to shutdowns

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33711:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

>  Race condition in Spark k8s Pod lifecycle manager that leads to shutdowns
> --
>
> Key: SPARK-33711
> URL: https://issues.apache.org/jira/browse/SPARK-33711
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.4, 2.4.7, 3.0.0, 3.1.0, 3.2.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> Watching a POD (ExecutorPodsWatchSnapshotSource) informs about single POD 
> changes, which can wrongly lead to the detection of missing PODs (PODs known 
> by the scheduler backend but missing from POD snapshots) by the executor POD 
> lifecycle manager.
> A key indicator of this is seeing this log msg:
> "The executor with ID [some_id] was not found in the cluster but we didn't 
> get a reason why. Marking the executor as failed. The executor may have been 
> deleted but the driver missed the deletion event."
> So one of the problems is running the missing-POD detection even when only a single 
> pod has changed, without having a full consistent snapshot of all the PODs 
> (see ExecutorPodsPollingSnapshotSource). The other could be a race between 
> the executor POD lifecycle manager and the scheduler backend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33982) Sparksql does not support when the inserted table is a read table

2021-01-04 Thread hao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258052#comment-17258052
 ] 

hao commented on SPARK-33982:
-

我认为sparksql应该得到支持

> Sparksql does not support when the inserted table is a read table
> -
>
> Key: SPARK-33982
> URL: https://issues.apache.org/jira/browse/SPARK-33982
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> When the inserted table is also a table being read from, Spark SQL will throw an error:
> org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is 
> also being read from.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33982) Sparksql does not support when the inserted table is a read table

2021-01-04 Thread hao (Jira)
hao created SPARK-33982:
---

 Summary: Sparksql does not support when the inserted table is a 
read table
 Key: SPARK-33982
 URL: https://issues.apache.org/jira/browse/SPARK-33982
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: hao


When the inserted table is also a table being read from, Spark SQL will throw an error:
org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also 
being read from.
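A common user-side workaround (a hedged sketch with hypothetical paths, not a fix in Spark) is to materialize the result to a staging location first and only then overwrite the original path, so the write never targets a path that the running query still reads:

{code:scala}
// Hedged sketch: stage the output first, then overwrite the source path from the staged
// copy. Paths are hypothetical; `spark` is an existing SparkSession.
val src     = "/data/tbl"
val staging = "/data/tbl_staging"

spark.read.parquet(src).write.mode("overwrite").parquet(staging)
spark.read.parquet(staging).write.mode("overwrite").parquet(src)
{code}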



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33965) CACHE TABLE does not support `spark_catalog` in Hive table names

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33965:
---

Assignee: Maxim Gekk

> CACHE TABLE does not support `spark_catalog` in Hive table names
> 
>
> Key: SPARK-33965
> URL: https://issues.apache.org/jira/browse/SPARK-33965
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: cache table in spark_catalog") {
> withNamespace("spark_catalog.ns") {
>   sql("CREATE NAMESPACE spark_catalog.ns")
>   val t = "spark_catalog.ns.tbl"
>   withTable(t) {
> sql(s"CREATE TABLE $t (col int)")
> assert(!spark.catalog.isCached(t))
> sql(s"CACHE TABLE $t")
> assert(spark.catalog.isCached(t))
>   }
> }
>   }
> {code}
> with the exception:
> {code:java}
> [info] - SPARK-X: cache table in spark_catalog *** FAILED *** (278 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: spark_catalog.ns.tbl is not 
> a valid TableIdentifier as it has more than 2 name parts.
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$MultipartIdentifierHelper.asTableIdentifier(CatalogV2Implicits.scala:130)
> [info]   at 
> org.apache.spark.sql.hive.test.TestHiveQueryExecution.$anonfun$analyzed$1(TestHive.scala:600)
> {code}
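Until the fix lands, a session-catalog (Hive) table can still be cached by referring to it with the two-part name, without the {{spark_catalog}} prefix (a user-side sketch only, not the fix):

{code:scala}
// Sketch of the workaround: CACHE TABLE accepts the 2-part name for session catalog tables.
sql("CREATE NAMESPACE IF NOT EXISTS ns")
sql("CREATE TABLE ns.tbl (col int) USING parquet")
sql("CACHE TABLE ns.tbl")
assert(spark.catalog.isCached("ns.tbl"))
{code}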



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33965) CACHE TABLE does not support `spark_catalog` in Hive table names

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33965.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30997
[https://github.com/apache/spark/pull/30997]

> CACHE TABLE does not support `spark_catalog` in Hive table names
> 
>
> Key: SPARK-33965
> URL: https://issues.apache.org/jira/browse/SPARK-33965
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The test fails:
> {code:scala}
>   test("SPARK-X: cache table in spark_catalog") {
> withNamespace("spark_catalog.ns") {
>   sql("CREATE NAMESPACE spark_catalog.ns")
>   val t = "spark_catalog.ns.tbl"
>   withTable(t) {
> sql(s"CREATE TABLE $t (col int)")
> assert(!spark.catalog.isCached(t))
> sql(s"CACHE TABLE $t")
> assert(spark.catalog.isCached(t))
>   }
> }
>   }
> {code}
> with the exception:
> {code:java}
> [info] - SPARK-X: cache table in spark_catalog *** FAILED *** (278 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: spark_catalog.ns.tbl is not 
> a valid TableIdentifier as it has more than 2 name parts.
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$MultipartIdentifierHelper.asTableIdentifier(CatalogV2Implicits.scala:130)
> [info]   at 
> org.apache.spark.sql.hive.test.TestHiveQueryExecution.$anonfun$analyzed$1(TestHive.scala:600)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258051#comment-17258051
 ] 

Hyukjin Kwon commented on SPARK-33005:
--

Awesome!

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33005) Kubernetes GA Preparation

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33005:
-
Fix Version/s: 3.1.0

> Kubernetes GA Preparation
> -
>
> Key: SPARK-33005
> URL: https://issues.apache.org/jira/browse/SPARK-33005
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33978) Support ZSTD compression in ORC data source

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33978:
-

Assignee: Dongjoon Hyun

> Support ZSTD compression in ORC data source
> ---
>
> Key: SPARK-33978
> URL: https://issues.apache.org/jira/browse/SPARK-33978
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> h3. What changes were proposed in this pull request?
> This PR aims to support ZSTD compression in ORC data source.
> h3. Why are the changes needed?
> Apache ORC 1.6 supports ZSTD compression to generate more compact files and 
> save the storage cost.
> *BEFORE*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
>  java.lang.IllegalArgumentException: Codec [zstd] is not available. Available 
> codecs are uncompressed, lzo, snappy, zlib, none. {code}
> *AFTER*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") 
> {code}
> {code:java}
>  $ orc-tools meta /tmp/zstd 
>  Processing data file 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc 
> [length: 230]
>  Structure for 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
>  File Version: 0.12 with ORC_14
>  Rows: 1
>  Compression: ZSTD
>  Compression size: 262144
>  Calendar: Julian/Gregorian
>  Type: struct
> Stripe Statistics:
>  Stripe 1:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> File Statistics:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> Stripes:
>  Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
>  Stream: column 0 section ROW_INDEX start: 3 length 11
>  Stream: column 1 section ROW_INDEX start: 14 length 24
>  Stream: column 1 section DATA start: 38 length 6
>  Encoding column 0: DIRECT
>  Encoding column 1: DIRECT_V2
> File length: 230 bytes
>  Padding length: 0 bytes
>  Padding ratio: 0%
> User Metadata:
>  org.apache.spark.version=3.2.0{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33978) Support ZSTD compression in ORC data source

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33978.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31002
[https://github.com/apache/spark/pull/31002]

> Support ZSTD compression in ORC data source
> ---
>
> Key: SPARK-33978
> URL: https://issues.apache.org/jira/browse/SPARK-33978
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>
> h3. What changes were proposed in this pull request?
> This PR aims to support ZSTD compression in ORC data source.
> h3. Why are the changes needed?
> Apache ORC 1.6 supports ZSTD compression to generate more compact files and 
> save the storage cost.
> *BEFORE*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
>  java.lang.IllegalArgumentException: Codec [zstd] is not available. Available 
> codecs are uncompressed, lzo, snappy, zlib, none. {code}
> *AFTER*
> {code:java}
> scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd") 
> {code}
> {code:java}
>  $ orc-tools meta /tmp/zstd 
>  Processing data file 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc 
> [length: 230]
>  Structure for 
> file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
>  File Version: 0.12 with ORC_14
>  Rows: 1
>  Compression: ZSTD
>  Compression size: 262144
>  Calendar: Julian/Gregorian
>  Type: struct
> Stripe Statistics:
>  Stripe 1:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> File Statistics:
>  Column 0: count: 1 hasNull: false
>  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
> Stripes:
>  Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
>  Stream: column 0 section ROW_INDEX start: 3 length 11
>  Stream: column 1 section ROW_INDEX start: 14 length 24
>  Stream: column 1 section DATA start: 38 length 6
>  Encoding column 0: DIRECT
>  Encoding column 1: DIRECT_V2
> File length: 230 bytes
>  Padding length: 0 bytes
>  Padding ratio: 0%
> User Metadata:
>  org.apache.spark.version=3.2.0{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33276) Fix K8s IT Flakiness

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33276:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

> Fix K8s IT Flakiness
> 
>
> Key: SPARK-33276
> URL: https://issues.apache.org/jira/browse/SPARK-33276
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> The following two consecutive runs use the same git hash, 
> a744fea3be12f1a53ab553040b95da730210bc88.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/646/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20K8s%20Builds/job/spark-master-test-k8s/647/
> However, the second one fails while the first one succeeds.
> {code}
> KubernetesSuite:
> - Run SparkPi with no resources *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 190 times 
> over 3.00269949337 minutes. Last failure message: false was not true. 
> (KubernetesSuite.scala:383)
> - Run SparkPi with a very long application name.
> - Use SparkLauncher.NO_RESOURCE
> - Run SparkPi with a master URL without a scheme.
> - Run SparkPi with an argument.
> - Run SparkPi with custom labels, annotations, and environment variables.
> - All pods have the same service account by default
> - Run extraJVMOptions check on driver
> - Run SparkRemoteFileTest using a remote data file
> - Run SparkPi with env and mount secrets.
> - Run PySpark on simple pi.py example
> - Run PySpark to test a pyfiles example
> - Run PySpark with memory customization
> - Run in client mode.
> - Start pod creation from template
> - PVs with local storage
> - Launcher client dependencies
> - Test basic decommissioning
> - Test basic decommissioning with shuffle cleanup *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 184 times 
> over 3.017213349366 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning_cleanup.py
>   20/10/28 19:47:28 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/10/28 19:47:29 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/10/28 19:47:29 INFO ResourceUtils: 
> ==
>   20/10/28 19:47:29 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/10/28 19:47:29 INFO ResourceUtils: 
> ==
>   20/10/28 19:47:29 INFO SparkContext: Submitted application: DecomTest
>   20/10/28 19:47:29 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/10/28 19:47:29 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/10/28 19:47:29 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/10/28 19:47:29 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/10/28 19:47:29 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/10/28 19:47:29 INFO SecurityManager: Changing view acls groups to: 
>   20/10/28 19:47:29 INFO SecurityManager: Changing modify acls groups to: 
>   20/10/28 19:47:29 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view 

[jira] [Updated] (SPARK-28895) Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28895:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

> Spark client process is unable to upload jars to hdfs while using ConfigMap 
> not HADOOP_CONF_DIR
> ---
>
> Key: SPARK-28895
> URL: https://issues.apache.org/jira/browse/SPARK-28895
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> The *BasicDriverFeatureStep* for Spark on Kubernetes uploads the files/jars 
> specified by --files/--jars to a Hadoop-compatible file system configured by 
> spark.kubernetes.file.upload.path. When HADOOP_CONF_DIR is used, the 
> spark-submit process can recognize that file system, but 
> spark.kubernetes.hadoop.configMapName is only mounted on the pods and is not 
> applied back to the client process, so the upload fails. 
>  
> ||Hadoop configuration||Result||
> |HADOOP_CONF_DIR=/path/to/etc/hadoop|OK|
> |spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir |FAILED|
>  
> {code:java}
>  Kent@KentsMacBookPro  
> ~/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3  bin/spark-submit 
> --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf 
> --jars 
> /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar
>  --conf spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab --conf 
> spark.kerberos.principal=kyuubi/d...@hadoop.hz.netease.com --conf  
> spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  --name hehe --deploy-mode 
> cluster --class org.apache.spark.examples.HdfsTest   
> local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar 
> hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl
> Listening for transport dt_socket at address: 50014
> # spark.master=k8s://https://10.120.238.100:7443
> 19/08/27 17:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 19/08/27 17:21:07 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
> client using current context from users K8S config file
> Listening for transport dt_socket at address: 50014
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar
>  failed...
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:287)
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245)
>   at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:165)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>   at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:101)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10(KubernetesClientApplication.scala:236)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10$adapted(KubernetesClientApplication.scala:229)
>   

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33349:
--
Parent: (was: SPARK-33005)
Issue Type: Bug  (was: Sub-task)

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2, 3.1.0
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28992) Support update dependencies from hdfs when task run on executor pods

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28992:
--
Parent: (was: SPARK-33005)
Issue Type: Improvement  (was: Sub-task)

> Support update dependencies from hdfs when task run on executor pods
> 
>
> Key: SPARK-28992
> URL: https://issues.apache.org/jira/browse/SPARK-28992
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> Here is a case: 
> {code:java}
> bin/spark-submit  --class com.github.ehiggs.spark.terasort.TeraSort 
> hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>  hdfs://hz-cluster10/user/kyuubi/terasort/1000g 
> hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1
> {code}
> Spark supports adding jars and the application jar from HDFS; see 
> [http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit]
> Take Spark on YARN for example: it creates a __spark_hadoop_conf__.xml file, 
> uploads it to the Hadoop distributed cache, and the executor processes can 
> use it to identify where their dependencies are located.
> But on K8s, I tried and failed to update dependencies this way.
> {code:java}
> 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 
> (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted 
> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent 
> failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): 
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> hz-cluster10
> 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 
> (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted 
> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent 
> failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): 
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> hz-cluster10 at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
>  at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176) 
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:678) at 
> org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:619) at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) at 
> org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at 
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at 
> org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at 
> org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881) at 
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737) at 
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:522) at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869)
>  at 
> org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860)
>  at 
> scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at 
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at 
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) at 
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at 
> scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
>  at 
> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33952) Python-friendly dtypes for pyspark dataframes

2021-01-04 Thread Marc de Lignie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258046#comment-17258046
 ] 

Marc de Lignie commented on SPARK-33952:


@[~hyukjin.kwon] Thanks for asking. When you write a pyspark UDF or collect() a 
pyspark DataFrame, it is much more recognizable to know that a column datatype 
is "[Row(x:[Row(x1:string, x2:string)], y:string, z:string)]" rather than 
"array<struct<x:array<struct<x1:string, x2:string>>, y:string, z:string>>". Of 
course, this remains a matter of taste. Also, the original dtypes in terms of 
array, struct, map remain useful when applying push-down functions for which 
the documentation and naming use these terms.

> Python-friendly dtypes for pyspark dataframes
> -
>
> Key: SPARK-33952
> URL: https://issues.apache.org/jira/browse/SPARK-33952
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Marc de Lignie
>Priority: Minor
>
> The pyspark.sql.DataFrame.dtypes attribute contains string representations of 
> the column datatypes in terms of JVM datatypes. However, for a python user it 
> is a significant mental step to translate these to the corresponding python 
> types encountered in UDF's and collected dataframes. This holds in particular 
> for nested composite datatypes (array, map and struct). It is proposed to 
> provide python-friendly dtypes in pyspark (as an addition, not a replacement) 
> in which array<>, map<> and struct<> are translated to [], {} and Row().
> Sample code, including tests, is available as [gist on 
> github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More 
> explanation is provided at: 
> [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
> If this proposal finds sufficient support, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Kent Yao (Jira)
Kent Yao created SPARK-33992:


 Summary: resolveOperatorsUpWithNewOutput should wrap 
allowInvokingTransformsInAnalyzer
 Key: SPARK-33992
 URL: https://issues.apache.org/jira/browse/SPARK-33992
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.1.0
Reporter: Kent Yao


PaddingAndLengthCheckForCharVarchar can fail a query when it runs through 
resolveOperatorsUpWithNewOutput, with:


{code:java}
[info] - char/varchar resolution in sub query  *** FAILED *** (367 milliseconds)
[info]   java.lang.RuntimeException: This method should not be called in the 
analyzer
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
[info]   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
{code}
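
A minimal sketch of the direction the summary points to (a hedged illustration 
only: the method names follow the stack trace above, but the exact signatures, 
access modifiers and body in AnalysisHelper may differ):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{AnalysisHelper, LogicalPlan}

// Hypothetical wrapper: run the new-output rewrite under
// allowInvokingTransformsInAnalyzer so that transformDown, reached via
// updateOuterReferencesInSubquery, does not trip assertNotAnalysisRule while
// the analyzer is still running.
def resolveOperatorsUpWithNewOutputGuarded(plan: LogicalPlan)(
    rule: PartialFunction[LogicalPlan, (LogicalPlan, Seq[(Attribute, Attribute)])])
  : LogicalPlan = {
  AnalysisHelper.allowInvokingTransformsInAnalyzer {
    plan.transformUpWithNewOutput(rule)
  }
}
{code}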




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258192#comment-17258192
 ] 

Apache Spark commented on SPARK-33992:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31013

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar can fail a query when it runs through 
> resolveOperatorsUpWithNewOutput, with:
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33992:


Assignee: Apache Spark

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar can fail a query when it runs through 
> resolveOperatorsUpWithNewOutput, with:
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33990:


Assignee: Apache Spark

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct<id:int,part:int>
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33990:


Assignee: (was: Apache Spark)

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct<id:int,part:int>
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258228#comment-17258228
 ] 

Apache Spark commented on SPARK-33991:
--

User 'FelixYik' has created a pull request for this issue:
https://github.com/apache/spark/pull/31015

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when I configure 
> spark.scheduler.mode in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33991:


Assignee: Apache Spark

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Assignee: Apache Spark
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when I configure 
> spark.scheduler.mode in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33991:


Assignee: (was: Apache Spark)

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when I configure 
> spark.scheduler.mode in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33991) Repair enumeration conversion error for page showing list

2021-01-04 Thread kaif Yi (Jira)
kaif Yi created SPARK-33991:
---

 Summary: Repair enumeration conversion error for page showing list
 Key: SPARK-33991
 URL: https://issues.apache.org/jira/browse/SPARK-33991
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0
 Environment: In the AllJobsPage class, the schedulingMode 
enumeration value is obtained by loading the spark.scheduler.mode 
configuration from SparkConf, but an enumeration conversion error occurs when I 
set the value of this configuration to lowercase.
Reporter: kaif Yi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for page showing list

2021-01-04 Thread kaif Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaif Yi updated SPARK-33991:

Description: In the AllJobsPage class, the schedulingMode enumeration value is 
obtained by loading the spark.scheduler.mode configuration from SparkConf, but 
an enumeration conversion error occurs when I set the value of this 
configuration to lowercase.
Environment: (was: For AllJobsPage class, AllJobsPage gets the 
schedulingMode of enumerated type by loading the spark.scheduler.mode 
configuration from Sparkconf, but an enumeration conversion error occurs when I 
set the value of this configuration to lowercase.)

> Repair enumeration conversion error for page showing list
> -
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: kaif Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Felix Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Yi updated SPARK-33991:
-
Description: 
In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
loading the spark.scheduler.mode configuration from SparkConf, but an 
enumeration conversion error occurs when I set the value of this configuration 
to lowercase.

The reason for this problem is that the values of the SchedulingMode 
enumeration class are uppercase, so the conversion fails when I configure 
spark.scheduler.mode in lowercase.

  was:For AllJobsPage class, AllJobsPage gets the schedulingMode of enumerated 
type by loading the spark.scheduler.mode configuration from Sparkconf, but an 
enumeration conversion error occurs when I set the value of this configuration 
to lowercase.


> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when I configure 
> spark.scheduler.mode in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2021-01-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258217#comment-17258217
 ] 

Yang Jie edited comment on SPARK-33948 at 1/4/21, 2:00 PM:
---

*Sync:*
{code:java}
commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1)
Author: xuewei.linxuewei 
Date:   Wed Dec 2 16:10:45 2020 +
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error

Run completed in 11 minutes, 51 seconds.
Total number of tests run: 4623
Suites: completed 256, aborted 0
Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0
*** 16 TESTS FAILED ***
{code}
{code:java}
commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1)
Author: HyukjinKwon 
Date:   Wed Dec 2 16:03:08 2020 +
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the 
documentation more

Run completed in 10 minutes, 39 seconds.
Total number of tests run: 4622
Suites: completed 256, aborted 0
Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
{code}
After SPARK-33619, there are 16 TESTS FAILED in branch-3.1. No further 
investigation yet, and I'm not sure why the master branch was successful; I 
need more time to analyze.

 


was (Author: luciferyang):
*Sync:*
{code:java}
commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1)
Author: xuewei.linxuewei 
Date:   Wed Dec 2 16:10:45 2020 +
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error

Run completed in 11 minutes, 51 seconds.
Total number of tests run: 4623
Suites: completed 256, aborted 0
Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0
*** 16 TESTS FAILED ***
{code}
{code:java}
commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1)
Author: HyukjinKwon 
Date:   Wed Dec 2 16:03:08 2020 +
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the 
documentation more

Run completed in 10 minutes, 39 seconds.
Total number of tests run: 4622
Suites: completed 256, aborted 0
Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
{code}
After SPARK-33619 , there are 16 TESTS FAILED in branch-3.1, no further 
investigation yet

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> 

[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for page showing list of all ongoing and recently finished jobs

2021-01-04 Thread kaif Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaif Yi updated SPARK-33991:

Summary: Repair enumeration conversion error for page showing list of all 
ongoing and recently finished jobs  (was: Repair enumeration conversion error 
for page showing list)

> Repair enumeration conversion error for page showing list of all ongoing and 
> recently finished jobs
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: kaif Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread kaif Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaif Yi updated SPARK-33991:

Summary: Repair enumeration conversion error for AllJobsPage  (was: Repair 
enumeration conversion error for page showing list of all ongoing and recently 
finished jobs)

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: kaif Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258195#comment-17258195
 ] 

Apache Spark commented on SPARK-33990:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31014

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: do not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct<id:int,part:int>
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33736) Handle MERGE in ReplaceNullWithFalseInPredicate

2021-01-04 Thread Anton Okolnychyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258222#comment-17258222
 ] 

Anton Okolnychyi commented on SPARK-33736:
--

Sorry, I was on holidays. Will get back to the PR this week.

> Handle MERGE in ReplaceNullWithFalseInPredicate
> ---
>
> Key: SPARK-33736
> URL: https://issues.apache.org/jira/browse/SPARK-33736
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> We need to handle merge statements in {{ReplaceNullWithFalseInPredicate}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33994) ORC encryption interop

2021-01-04 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created SPARK-33994:


 Summary: ORC encryption interop
 Key: SPARK-33994
 URL: https://issues.apache.org/jira/browse/SPARK-33994
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gidon Gershinsky


Test interoperability between stand-alone ORC encryption and Spark-managed ORC 
encryption



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Felix Yi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Yi updated SPARK-33991:
-
Component/s: (was: Web UI)
 Spark Core

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33991) Repair enumeration conversion error for AllJobsPage

2021-01-04 Thread Felix Yi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258201#comment-17258201
 ] 

Felix Yi commented on SPARK-33991:
--

I saw that the org.apache.spark.scheduler.TaskSchedulerImpl class converts the 
spark.scheduler.mode value to uppercase, so I think it should be converted in 
AllJobsPage as well.
{code:java}
val schedulingMode: SchedulingMode =
  try {
SchedulingMode.withName(schedulingModeConf.toUpperCase(Locale.ROOT))
  } catch {
case e: java.util.NoSuchElementException =>
  throw new SparkException(s"Unrecognized $SCHEDULER_MODE_PROPERTY: 
$schedulingModeConf")
  }
{code}
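
A minimal sketch of how the same normalization could be applied when 
AllJobsPage parses the value (an illustrative helper only, not the actual 
AllJobsPage code; the method name is an assumption):

{code:scala}
import java.util.Locale

import org.apache.spark.scheduler.SchedulingMode
import org.apache.spark.scheduler.SchedulingMode.SchedulingMode

// Hypothetical helper inside Spark's UI code: normalize the configured value
// before resolving the enum, mirroring TaskSchedulerImpl, so that "fair",
// "FIFO" or "Fair" all resolve instead of throwing NoSuchElementException.
def parseSchedulingMode(schedulingModeConf: String): SchedulingMode =
  SchedulingMode.withName(schedulingModeConf.trim.toUpperCase(Locale.ROOT))
{code}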

> Repair enumeration conversion error for AllJobsPage
> ---
>
> Key: SPARK-33991
> URL: https://issues.apache.org/jira/browse/SPARK-33991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Felix Yi
>Priority: Critical
>
> In the AllJobsPage class, the schedulingMode enumeration value is obtained by 
> loading the spark.scheduler.mode configuration from SparkConf, but an 
> enumeration conversion error occurs when I set the value of this 
> configuration to lowercase.
> The reason for this problem is that the values of the SchedulingMode 
> enumeration class are uppercase, so the conversion fails when I configure 
> spark.scheduler.mode in lowercase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13

2021-01-04 Thread Guillaume Martres (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258218#comment-17258218
 ] 

Guillaume Martres commented on SPARK-25075:
---

Now that 2.13 support is basically complete, would it be possible to publish a 
preview release of Spark 3.1 built against Scala 2.13 on Maven for testing 
purposes? Thanks!

> Build and test Spark against Scala 2.13
> ---
>
> Key: SPARK-25075
> URL: https://issues.apache.org/jira/browse/SPARK-25075
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, MLlib, Project Infra, Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Guillaume Massé
>Priority: Major
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.13 milestone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2021-01-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258217#comment-17258217
 ] 

Yang Jie commented on SPARK-33948:
--

*Sync:*
{code:java}
commit 58583f7c3fdcac1232607a7ab4b0d052320ac3ea (HEAD -> branch-3.1)
Author: xuewei.linxuewei 
Date:   Wed Dec 2 16:10:45 2020 +
[SPARK-33619][SQL] Fix GetMapValueUtil code generation error

Run completed in 11 minutes, 51 seconds.
Total number of tests run: 4623
Suites: completed 256, aborted 0
Tests: succeeded 4607, failed 16, canceled 0, ignored 5, pending 0
*** 16 TESTS FAILED ***
{code}
{code:java}
commit df8d3f1bf779ce1a9f3520939ab85814f09b48b7 (HEAD -> branch-3.1)
Author: HyukjinKwon 
Date:   Wed Dec 2 16:03:08 2020 +
[SPARK-33544][SQL][FOLLOW-UP] Rename NoSideEffect to NoThrow and clarify the 
documentation more

Run completed in 10 minutes, 39 seconds.
Total number of tests run: 4622
Suites: completed 256, aborted 0
Tests: succeeded 4622, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
{code}
After SPARK-33619, there are 16 TESTS FAILED in branch-3.1; no further 
investigation yet

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> 

[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33992:


Assignee: (was: Apache Spark)

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar could fail query when 
> resolveOperatorsUpWithNewOutput
> with 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}
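A minimal sketch of the direction such a change can take, assuming the internal helper `AnalysisHelper.allowInvokingTransformsInAnalyzer` (visible in the stack trace above) is accessible from the calling code; this is an illustration, not the actual patch:

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.{AnalysisHelper, LogicalPlan}

// Sketch only: a rewrite that must call transformDown while the analyzer is
// running can be wrapped so that the assertNotAnalysisRule check does not fire.
def rewriteDuringAnalysis(plan: LogicalPlan)(
    rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
  AnalysisHelper.allowInvokingTransformsInAnalyzer {
    plan.transformDown(rule)
  }
}
{code}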



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33993) Parquet encryption interop

2021-01-04 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created SPARK-33993:


 Summary: Parquet encryption interop
 Key: SPARK-33993
 URL: https://issues.apache.org/jira/browse/SPARK-33993
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gidon Gershinsky


Test interoperability between stand-alone Parquet encryption and Spark-managed 
Parquet encryption.
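A hedged sketch of the Spark-managed side of such an interop test, using the Parquet columnar-encryption properties; the key material, output path, and the in-memory KMS mock below are assumptions for illustration only:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Route Parquet I/O through the properties-driven crypto factory and a mock
// in-memory KMS. A stand-alone Parquet reader configured with the same keys
// should then be able to decrypt files written by Spark, and vice versa.
hadoopConf.set("parquet.crypto.factory.class",
  "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
hadoopConf.set("parquet.encryption.kms.client.class",
  "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
hadoopConf.set("parquet.encryption.key.list",
  "keyA:AAECAwQFBgcICQoLDA0ODw==, keyB:AAECAAECAAECAAECAAECAA==")

spark.range(10).selectExpr("id", "id * id AS square").write
  .option("parquet.encryption.column.keys", "keyA:square")
  .option("parquet.encryption.footer.key", "keyB")
  .parquet("/tmp/interop/table.parquet.encrypted")
{code}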



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33875) Implement DESCRIBE COLUMN for v2 catalog

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33875:
---

Assignee: Terry Kim

> Implement DESCRIBE COLUMN for v2 catalog
> 
>
> Key: SPARK-33875
> URL: https://issues.apache.org/jira/browse/SPARK-33875
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> Implement DESCRIBE COLUMN for v2 catalog
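A hedged illustration of the target behavior; "testcat" and the in-memory catalog class below are borrowed from Spark's DSv2 test setup and are assumptions, not a public API guaranteed on a user classpath:

{code:scala}
import org.apache.spark.sql.SparkSession

// Register a v2 catalog backed by the in-memory test catalog (assumed to be on the classpath).
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.catalog.testcat",
    "org.apache.spark.sql.connector.catalog.InMemoryTableCatalog")
  .getOrCreate()

spark.sql("CREATE TABLE testcat.ns.tbl (id BIGINT, data STRING COMMENT 'payload') USING foo")
// Once implemented, describing a single column of a v2 table should behave like the v1 command:
spark.sql("DESCRIBE TABLE testcat.ns.tbl data").show(truncate = false)
{code}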



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33984:
-

Assignee: Hyukjin Kwon

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33984) Upgrade to Py4J 0.10.9.1

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33984.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31009
[https://github.com/apache/spark/pull/31009]

> Upgrade to Py4J 0.10.9.1
> 
>
> Key: SPARK-33984
> URL: https://issues.apache.org/jira/browse/SPARK-33984
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.0
>
>
> Py4J 0.10.9.1 is out with bug fixes. We should upgrade in PySpark as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33875) Implement DESCRIBE COLUMN for v2 catalog

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33875.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30881
[https://github.com/apache/spark/pull/30881]

> Implement DESCRIBE COLUMN for v2 catalog
> 
>
> Key: SPARK-33875
> URL: https://issues.apache.org/jira/browse/SPARK-33875
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> Implement DESCRIBE COLUMN for v2 catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258383#comment-17258383
 ] 

Dongjoon Hyun commented on SPARK-31786:
---

Did you do `export HTTP2_DISABLE=true` before `spark-submit`? HTTP2_DISABLE is 
required in all places where the `K8s client` is used, and technically there are 
two such places:
 # Your Mac (outside the K8s cluster): `spark-submit`
 # Spark Driver Pod (inside the K8s cluster): 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting exception when submitting Spark-Pi app to Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at 
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> at okio.Okio$1.write(Okio.java:79)
> at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
> at 

[jira] [Resolved] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33990.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31014
[https://github.com/apache/spark/pull/31014]

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The test fails:
> {code:scala}
>   test("SPARK-X: don not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33990) v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33990:
-

Assignee: Maxim Gekk

> v2 ALTER TABLE .. DROP PARTITION does not remove data from dropped partition
> 
>
> Key: SPARK-33990
> URL: https://issues.apache.org/jira/browse/SPARK-33990
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The test fails:
> {code:scala}
>   test("SPARK-X: don not return data from dropped partition") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> on the last check with:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33988.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31011
[https://github.com/apache/spark/pull/31011]

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.2.0
>
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring performance 
> changes with CBO enabled.
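As a rough sketch, a flag like this typically just flips the existing CBO-related SQL configs before the queries run; the exact wiring inside TPCDSQueryBenchmark is an assumption here, only the config keys below are existing Spark options:

{code:scala}
import org.apache.spark.sql.SparkSession

// Toggle the cost-based optimizer and related features for the benchmark session.
def setCboEnabled(spark: SparkSession, enabled: Boolean): Unit = {
  spark.conf.set("spark.sql.cbo.enabled", enabled)
  spark.conf.set("spark.sql.cbo.joinReorder.enabled", enabled)
  spark.conf.set("spark.sql.statistics.histogram.enabled", enabled)
}
{code}

Note that CBO only has an effect when table and column statistics have been collected beforehand, e.g. via ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS.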



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33988:
-

Assignee: Takeshi Yamamuro

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring performance 
> changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Sachit Murarka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258345#comment-17258345
 ] 

Sachit Murarka commented on SPARK-31786:


[~maver1ck] / [~dongjoon]:
I am facing this issue. I am using Spark 2.4.7.

I have tried the settings mentioned in the above comments:
spark.kubernetes.driverEnv.HTTP2_DISABLE=true

Following is the exception:


Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for 
kind: [Pod] with name: [null] in namespace: [spark-test] failed.
 at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
 at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
 at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
 at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
 at 
org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
 at 
org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
 at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
 at 
org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
 at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
 at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
 at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
 at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
 at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
 at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: connect timed out
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
 at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
 at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
 at 
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:589)
 at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
 at 
okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:246)
 at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:166)
 at 
okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
 at 
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
 at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
 at 
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at 
io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
 at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
 at 

[jira] [Updated] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33988:
--
Parent: (was: SPARK-33828)
Issue Type: Improvement  (was: Sub-task)

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring performance 
> changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33988) Add an option to enable CBO in TPCDSQueryBenchmark

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33988:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Test)

> Add an option to enable CBO in TPCDSQueryBenchmark
> --
>
> Key: SPARK-33988
> URL: https://issues.apache.org/jira/browse/SPARK-33988
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at adding a new option {{--cbo}} to enable CBO in 
> TPCDSQueryBenchmark. I think this option is useful for monitoring performance 
> changes with CBO enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33983:
-

Assignee: Hyukjin Kwon

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Cloudpickle 1.6.0 has been released. We should update to the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33983) Update cloudpickle to v1.6.0

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33983.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31007
[https://github.com/apache/spark/pull/31007]

> Update cloudpickle to v1.6.0
> 
>
> Key: SPARK-33983
> URL: https://issues.apache.org/jira/browse/SPARK-33983
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.0
>
>
> Cloudpickle 1.6.0 has been released. We should update to the latest version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Sachit Murarka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258411#comment-17258411
 ] 

Sachit Murarka edited comment on SPARK-31786 at 1/4/21, 6:41 PM:
-

[~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my 
machine. Should it be set on all nodes of the Kubernetes cluster?

Also, regarding your second point, 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit 
in the form of --conf.

Please let me know if my understanding is correct.

Also, since this is a workaround, what would the long-term solution be? Should I 
consider Spark 3 instead of Spark 2.4.7?


was (Author: smurarka):
[~dongjoon] -> Yes I have used `export HTTP2_DISABLE=true` , but only on my 
machine .
Should it on all nodes of Kubernetes?

Also , regarding you mentioned in second point , 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be used with spark-submit 
in form of --conf . 

Please let me know if my understanding is correct.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting exception when submitting Spark-Pi app to Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at 

[jira] [Commented] (SPARK-33908) Refact SparkSubmitUtils.resolveMavenCoordinates return parameter

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258454#comment-17258454
 ] 

Apache Spark commented on SPARK-33908:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31016

> Refact SparkSubmitUtils.resolveMavenCoordinates return parameter
> 
>
> Key: SPARK-33908
> URL: https://issues.apache.org/jira/browse/SPARK-33908
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Per talk in https://github.com/apache/spark/pull/29966#discussion_r531917374



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258479#comment-17258479
 ] 

Apache Spark commented on SPARK-33987:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31017

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below portrays the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  
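Until the command itself refreshes the cache, a possible user-side workaround is an explicit refresh after the drop; this sketch assumes REFRESH TABLE invalidates and lazily recaches the v2 table (not verified against the in-memory catalog used by the test):

{code:scala}
// Hypothetical workaround within the same test/session as above.
sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
sql(s"REFRESH TABLE $t")
QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
{code}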



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33987:


Assignee: Apache Spark

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The test below portrays the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33987) v2 ALTER TABLE .. DROP PARTITION does not refresh cached table

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33987:


Assignee: (was: Apache Spark)

> v2 ALTER TABLE .. DROP PARTITION does not refresh cached table
> --
>
> Key: SPARK-33987
> URL: https://issues.apache.org/jira/browse/SPARK-33987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below portrays the issue:
> {code:scala}
>   test("SPARK-33950: refresh cache after partition dropping") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (id int, part int) $defaultUsing PARTITIONED BY 
> (part)")
>   sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
>   sql(s"INSERT INTO $t PARTITION (part=1) SELECT 1")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0, 0), Row(1, 
> 1)))
>   sql(s"ALTER TABLE $t DROP PARTITION (part=0)")
>   assert(spark.catalog.isCached(t))
>   QueryTest.checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(1, 1)))
> }
>   }
> {code}
> The last check fails:
> {code}
> == Results ==
> !== Correct Answer - 1 ==   == Spark Answer - 2 ==
> !struct<>   struct
> ![1,1]  [0,0]
> !   [1,1]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33894:


Assignee: (was: Apache Spark)

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Priority: Major
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2VecSuite.scala:28)
> [info]   at 
> 

[jira] [Commented] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258485#comment-17258485
 ] 

Apache Spark commented on SPARK-33894:
--

User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/31018

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Priority: Major
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> 

[jira] [Assigned] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33894:


Assignee: Apache Spark

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2VecSuite.scala:28)
> [info]   at 
> 

[jira] [Commented] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258486#comment-17258486
 ] 

Apache Spark commented on SPARK-33894:
--

User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/31018

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Priority: Major
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> 

[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Sachit Murarka (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258411#comment-17258411
 ] 

Sachit Murarka commented on SPARK-31786:


[~dongjoon] -> Yes, I have used `export HTTP2_DISABLE=true`, but only on my 
machine. Should it be set on all nodes of the Kubernetes cluster?

Also, regarding your second point, 
spark.kubernetes.driverEnv.HTTP2_DISABLE=true has to be passed to spark-submit 
in the form of --conf.

Please let me know if my understanding is correct.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting exception when submitting Spark-Pi app to Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at 
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> at okio.Okio$1.write(Okio.java:79)
> at okio.AsyncTimeout$1.write(AsyncTimeout.java:180)
> at 
