[jira] [Commented] (SPARK-29650) Discard a NULL constant in LIMIT

2019-10-30 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963677#comment-16963677
 ] 

Aman Omer commented on SPARK-29650:
---

Analyzing this one.

> Discard a NULL constant in LIMIT
> 
>
> Key: SPARK-29650
> URL: https://issues.apache.org/jira/browse/SPARK-29650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, a NULL constant is accepted in LIMIT and it's just ignored.
> But, in Spark, it throws the exception below;
> {code:java}
> select * from int8_tbl limit (case when random() < 0.5 then bigint(null) end);
> org.apache.spark.sql.AnalysisException
> The limit expression must evaluate to a constant value, but got CASE WHEN 
> (`_nondeterministic` < CAST(0.5BD AS DOUBLE)) THEN CAST(NULL AS BIGINT) END; 
> {code}
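For reference, a minimal sketch (assuming a running SparkSession {{spark}} and the {{int8_tbl}} test table) of the behavior this sub-task targets once the NULL constant is discarded:

{code:python}
# Illustrative only: intended post-fix behavior, not current Spark behavior.
# A constant-NULL limit should simply be ignored (PostgreSQL semantics),
# i.e. the query should behave as if it had no LIMIT clause at all.
spark.sql("SELECT * FROM int8_tbl LIMIT CAST(NULL AS BIGINT)").show()
{code}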



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29645) ML add param RelativeError

2019-10-30 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-29645.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26305
[https://github.com/apache/spark/pull/26305]

> ML add param RelativeError
> --
>
> Key: SPARK-29645
> URL: https://issues.apache.org/jira/browse/SPARK-29645
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.0.0
>
>
> It makes sense to expose {{RelativeError}} to end users, since it controls
> both the precision and the memory overhead.
> [QuantileDiscretizer|https://github.com/apache/spark/compare/master...zhengruifeng:add_relative_err?expand=1#diff-bf4cb764860f82d632ac0730e3d8c605]
> has already added this param, while other algorithms have not yet.
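For illustration, a minimal sketch (assuming a running SparkSession {{spark}}) of the param as QuantileDiscretizer already exposes it; the proposal is to expose the same {{relativeError}} knob on the other algorithms that compute approximate quantiles.

{code:python}
from pyspark.ml.feature import QuantileDiscretizer

df = spark.createDataFrame([(0.1,), (0.4,), (1.2,), (1.5,), (2.3,)], ["values"])

# A larger relativeError trades quantile precision for a smaller memory footprint.
discretizer = QuantileDiscretizer(numBuckets=3, inputCol="values", outputCol="buckets",
                                  relativeError=0.01)
discretizer.fit(df).transform(df).show()
{code}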



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29645) ML add param RelativeError

2019-10-30 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-29645:


Assignee: zhengruifeng

> ML add param RelativeError
> --
>
> Key: SPARK-29645
> URL: https://issues.apache.org/jira/browse/SPARK-29645
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>
> It makes sense to expose {{RelativeError}} to end users, since it controls
> both the precision and the memory overhead.
> [QuantileDiscretizer|https://github.com/apache/spark/compare/master...zhengruifeng:add_relative_err?expand=1#diff-bf4cb764860f82d632ac0730e3d8c605]
> has already added this param, while other algorithms have not yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29616) Bankers' rounding for double types

2019-10-30 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963670#comment-16963670
 ] 

Aman Omer commented on SPARK-29616:
---

Thanks [~maropu]. I am working on this.

> Bankers' rounding for double types
> --
>
> Key: SPARK-29616
> URL: https://issues.apache.org/jira/browse/SPARK-29616
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> PostgreSQL uses banker's rounding mode for double types;
> {code}
> postgres=# select * from t;
>   a  |  b
> -----+-----
>  0.5 | 0.5
>  1.5 | 1.5
>  2.5 | 2.5
>  3.5 | 3.5
>  4.5 | 4.5
> (5 rows)
> postgres=# \d t
>                  Table "public.t"
>  Column |       Type       | Collation | Nullable | Default
> --------+------------------+-----------+----------+---------
>  a      | double precision |           |          |
>  b      | numeric(2,1)     |           |          |
> postgres=# select round(a), round(b) from t;
>  round | round
> -------+-------
>      0 |     1
>      2 |     2
>      2 |     3
>      4 |     4
>      4 |     5
> (5 rows)
> {code}
>  
> In the master;
> {code}
> scala> sql("select * from t").show
> +---+---+
> |  a|  b|
> +---+---+
> |0.5|0.5|
> |1.5|1.5|
> |2.5|2.5|
> |3.5|3.5|
> |4.5|4.5|
> +---+---+
> scala> sql("select * from t").printSchema
> root
>  |-- a: double (nullable = true)
>  |-- b: decimal(2,1) (nullable = true)
> scala> sql("select round(a), round(b) from t").show()
> +-----------+-----------+
> |round(a, 0)|round(b, 0)|
> +-----------+-----------+
> |        1.0|          1|
> |        2.0|          2|
> |        3.0|          3|
> |        4.0|          4|
> |        5.0|          5|
> +-----------+-----------+
> {code}
>  
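For reference, the half-even ("banker's") rounding shown in the PostgreSQL output above can be sketched with Python's standard {{decimal}} module; this only illustrates the rounding mode and is not a proposed Spark implementation.

{code:python}
from decimal import Decimal, ROUND_HALF_EVEN

# Half-even ("banker's") rounding: ties go to the nearest even integer,
# so 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 3.5 -> 4, 4.5 -> 4 (matching the postgres output).
for v in ["0.5", "1.5", "2.5", "3.5", "4.5"]:
    print(v, Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
{code}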



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29636) Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp

2019-10-30 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963666#comment-16963666
 ] 

Aman Omer commented on SPARK-29636:
---

[~hyukjin.kwon] [~maxgekk] 

Can you help us here? What should be the output of the above queries?

> Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp
> ---
>
> Key: SPARK-29636
> URL: https://issues.apache.org/jira/browse/SPARK-29636
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark can't parse a string such as '11:00 BST' or '2000-10-19 
> 10:23:54+01' to timestamp:
> {code:sql}
> spark-sql> select cast ('11:00 BST' as timestamp);
> NULL
> Time taken: 2.248 seconds, Fetched 1 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29679) Make interval type comparable and orderable

2019-10-30 Thread Kent Yao (Jira)
Kent Yao created SPARK-29679:


 Summary: Make interval type comparable and orderable
 Key: SPARK-29679
 URL: https://issues.apache.org/jira/browse/SPARK-29679
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Kent Yao


{code:sql}
postgres=# select INTERVAL '9 years 1 months -1 weeks -4 days -10 hours -46 
minutes' > interval '1 s';
 ?column?
--
 t
(1 row)
{code}
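For reference, a minimal sketch (assuming a running SparkSession {{spark}}) of the equivalent query this sub-task should enable; today it fails to resolve because interval values cannot be compared or ordered.

{code:python}
# Illustrative only: the PostgreSQL query above, expressed through Spark SQL.
spark.sql(
    "SELECT INTERVAL '9 years 1 months -1 weeks -4 days -10 hours -46 minutes'"
    " > INTERVAL '1 seconds'").show()
{code}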



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29664) Column.getItem behavior is not consistent with Scala version

2019-10-30 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963651#comment-16963651
 ] 

Terry Kim commented on SPARK-29664:
---

OK, will do. Thanks!

> Column.getItem behavior is not consistent with Scala version
> 
>
> Key: SPARK-29664
> URL: https://issues.apache.org/jira/browse/SPARK-29664
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>
> In PySpark, Column.getItem's behavior is different from the Scala version.
> For example,
> In PySpark:
> {code:python}
> df = spark.range(2)
> map_col = create_map(lit(0), lit(100), lit(1), lit(200))
> df.withColumn("mapped", map_col.getItem(col('id'))).show()
> # +---+--+
> # | id|mapped|
> # +---+--+
> # |  0|   100|
> # |  1|   200|
> # +---+--+
> {code}
> In Scala:
> {code:scala}
> val df = spark.range(2)
> val map_col = map(lit(0), lit(100), lit(1), lit(200))
> // The following getItem results in the following exception, which is the 
> right behavior:
> // java.lang.RuntimeException: Unsupported literal type class 
> org.apache.spark.sql.Column id
> //  at 
> org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
> //  at org.apache.spark.sql.Column.getItem(Column.scala:856)
> //  ... 49 elided
> df.withColumn("mapped", map_col.getItem(col("id"))).show
> // You have to use apply() to match with PySpark's behavior.
> df.withColumn("mapped", map_col(col("id"))).show
> // +---+--+
> // | id|mapped|
> // +---+--+
> // |  0|   100|
> // |  1|   200|
> // +---+--+
> {code}
> Looking at the code of the Scala implementation, PySpark's behavior is incorrect 
> since the argument to getItem becomes a `Literal`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29664) Column.getItem behavior is not consistent with Scala version

2019-10-30 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963649#comment-16963649
 ] 

Hyukjin Kwon commented on SPARK-29664:
--

We will have to update the migration guide at 
https://github.com/apache/spark/blob/master/docs/pyspark-migration-guide.md and 
show the workaround ({{df[...]}}) in the docstring of {{getItem}} in PySpark.
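For illustration, a minimal sketch of that workaround (assuming a running SparkSession {{spark}}): the bracket form goes through Column's {{apply}}, so a column key keeps working even if {{getItem}} is restricted to literal keys to match Scala.

{code:python}
from pyspark.sql.functions import col, create_map, lit

df = spark.range(2)
map_col = create_map(lit(0), lit(100), lit(1), lit(200))

# Bracket / __getitem__ syntax maps to Column.apply on the JVM side,
# so it accepts a Column key even when getItem() only accepts literals.
df.withColumn("mapped", map_col[col("id")]).show()
{code}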

> Column.getItem behavior is not consistent with Scala version
> 
>
> Key: SPARK-29664
> URL: https://issues.apache.org/jira/browse/SPARK-29664
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>
> In PySpark, Column.getItem's behavior is different from the Scala version.
> For example,
> In PySpark:
> {code:python}
> df = spark.range(2)
> map_col = create_map(lit(0), lit(100), lit(1), lit(200))
> df.withColumn("mapped", map_col.getItem(col('id'))).show()
> # +---+--+
> # | id|mapped|
> # +---+--+
> # |  0|   100|
> # |  1|   200|
> # +---+--+
> {code}
> In Scala:
> {code:scala}
> val df = spark.range(2)
> val map_col = map(lit(0), lit(100), lit(1), lit(200))
> // The following getItem results in the following exception, which is the 
> right behavior:
> // java.lang.RuntimeException: Unsupported literal type class 
> org.apache.spark.sql.Column id
> //  at 
> org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
> //  at org.apache.spark.sql.Column.getItem(Column.scala:856)
> //  ... 49 elided
> df.withColumn("mapped", map_col.getItem(col("id"))).show
> // You have to use apply() to match with PySpark's behavior.
> df.withColumn("mapped", map_col(col("id"))).show
> // +---+--+
> // | id|mapped|
> // +---+--+
> // |  0|   100|
> // |  1|   200|
> // +---+--+
> {code}
> Looking at the code of the Scala implementation, PySpark's behavior is incorrect 
> since the argument to getItem becomes a `Literal`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29664) Column.getItem behavior is not consistent with Scala version

2019-10-30 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963648#comment-16963648
 ] 

Hyukjin Kwon commented on SPARK-29664:
--

Can we fix Python side to match with Scala side?

> Column.getItem behavior is not consistent with Scala version
> 
>
> Key: SPARK-29664
> URL: https://issues.apache.org/jira/browse/SPARK-29664
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>
> In PySpark, Column.getItem's behavior is different from the Scala version.
> For example,
> In PySpark:
> {code:python}
> df = spark.range(2)
> map_col = create_map(lit(0), lit(100), lit(1), lit(200))
> df.withColumn("mapped", map_col.getItem(col('id'))).show()
> # +---+--+
> # | id|mapped|
> # +---+--+
> # |  0|   100|
> # |  1|   200|
> # +---+--+
> {code}
> In Scala:
> {code:scala}
> val df = spark.range(2)
> val map_col = map(lit(0), lit(100), lit(1), lit(200))
> // The following getItem results in the following exception, which is the 
> right behavior:
> // java.lang.RuntimeException: Unsupported literal type class 
> org.apache.spark.sql.Column id
> //  at 
> org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
> //  at org.apache.spark.sql.Column.getItem(Column.scala:856)
> //  ... 49 elided
> df.withColumn("mapped", map_col.getItem(col("id"))).show
> // You have to use apply() to match with PySpark's behavior.
> df.withColumn("mapped", map_col(col("id"))).show
> // +---+--+
> // | id|mapped|
> // +---+--+
> // |  0|   100|
> // |  1|   200|
> // +---+--+
> {code}
> Looking at the code of the Scala implementation, PySpark's behavior is incorrect 
> since the argument to getItem becomes a `Literal`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29582) Unify the behavior of pyspark.TaskContext with spark core

2019-10-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29582.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26239
[https://github.com/apache/spark/pull/26239]

> Unify the behavior of pyspark.TaskContext with spark core
> -
>
> Key: SPARK-29582
> URL: https://issues.apache.org/jira/browse/SPARK-29582
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: Xianyang Liu
>Assignee: Xianyang Liu
>Priority: Major
> Fix For: 3.0.0
>
>
> In Spark Core, there is a `TaskContext` object which is a singleton. We set a 
> task context instance, which can be either TaskContext or BarrierTaskContext, before 
> the task function starts, and unset it to none after the function ends. So we 
> can get both TaskContext and BarrierTaskContext through that object. However, in 
> PySpark we can only get the BarrierTaskContext with `BarrierTaskContext`; we will get `None` 
> if we get it by `TaskContext.get` in a barrier stage.
>  
> In this patch, we unify the behavior of TaskContext for PySpark with Spark 
> Core. This is useful when people switch from normal code to barrier code and 
> only need a little update.
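For illustration, a minimal sketch (assuming a running SparkContext {{sc}}) of what the unified behavior means: inside a barrier stage, {{TaskContext.get()}} should return the active {{BarrierTaskContext}} instead of {{None}}.

{code:python}
from pyspark import BarrierTaskContext, TaskContext

def check(it):
    # With the unified behavior this is the active BarrierTaskContext;
    # previously TaskContext.get() returned None inside a barrier stage.
    ctx = TaskContext.get()
    yield (ctx.partitionId(), isinstance(ctx, BarrierTaskContext))

rdd = sc.parallelize(range(4), 2)
print(rdd.barrier().mapPartitions(check).collect())
{code}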



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29582) Unify the behavior of pyspark.TaskContext with spark core

2019-10-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29582:


Assignee: Xianyang Liu

> Unify the behavior of pyspark.TaskContext with spark core
> -
>
> Key: SPARK-29582
> URL: https://issues.apache.org/jira/browse/SPARK-29582
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: Xianyang Liu
>Assignee: Xianyang Liu
>Priority: Major
>
> In Spark Core, there is a `TaskContext` object which is a singleton. We set a 
> task context instance, which can be either TaskContext or BarrierTaskContext, before 
> the task function starts, and unset it to none after the function ends. So we 
> can get both TaskContext and BarrierTaskContext through that object. However, in 
> PySpark we can only get the BarrierTaskContext with `BarrierTaskContext`; we will get `None` 
> if we get it by `TaskContext.get` in a barrier stage.
>  
> In this patch, we unify the behavior of TaskContext for PySpark with Spark 
> Core. This is useful when people switch from normal code to barrier code and 
> only need a little update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29678) ALTER TABLE (add table partition) should look up catalog/table like v2 commands

2019-10-30 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963618#comment-16963618
 ] 

Terry Kim commented on SPARK-29678:
---

working on this.

> ALTER TABLE (add table partition) should look up catalog/table like v2 
> commands
> ---
>
> Key: SPARK-29678
> URL: https://issues.apache.org/jira/browse/SPARK-29678
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>
> 'ALTER TABLE table ADD IF NOT EXISTS PARTITION (a=b)' should look up 
> catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29678) ALTER TABLE (add table partition) should look up catalog/table like v2 commands

2019-10-30 Thread Terry Kim (Jira)
Terry Kim created SPARK-29678:
-

 Summary: ALTER TABLE (add table partition) should look up 
catalog/table like v2 commands
 Key: SPARK-29678
 URL: https://issues.apache.org/jira/browse/SPARK-29678
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim


'ALTER TABLE table ADD IF NOT EXISTS PARTITION (a=b)' should look up 
catalog/table like v2 commands
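For illustration, a minimal sketch (with a hypothetical v2 catalog named {{testcat}} and a running SparkSession {{spark}}) of the kind of statement that should be resolved through the unified catalog/table lookup rather than the old v1-only code path.

{code:python}
# `testcat` and `testcat.ns.tbl` are hypothetical; the catalog would be registered
# via spark.sql.catalog.testcat=<a TableCatalog implementation>.
spark.sql("ALTER TABLE testcat.ns.tbl ADD IF NOT EXISTS PARTITION (a=1)")
{code}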



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29668) Deprecate Python 3 prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29668:
-

Assignee: Hyukjin Kwon  (was: Dongjoon Hyun)

> Deprecate Python 3 prior to version 3.6
> ---
>
> Key: SPARK-29668
> URL: https://issues.apache.org/jira/browse/SPARK-29668
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29668) Deprecate Python 3 prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963609#comment-16963609
 ] 

Dongjoon Hyun commented on SPARK-29668:
---

https://github.com/apache/spark/pull/26335 adds deprecation warnings 
explicitly.
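A minimal sketch of what such a warning could look like; this is illustrative only and not the exact check or wording used in the PR.

{code:python}
import sys
import warnings

# Illustrative only -- not the actual code added by the PR.
if sys.version_info[0] == 3 and sys.version_info < (3, 6):
    warnings.warn(
        "Support for Python 3 versions prior to 3.6 is deprecated as of Spark 3.0. "
        "Please upgrade to Python 3.6 or newer.",
        DeprecationWarning)
{code}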

> Deprecate Python 3 prior to version 3.6
> ---
>
> Key: SPARK-29668
> URL: https://issues.apache.org/jira/browse/SPARK-29668
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29592) ALTER TABLE (set partition location) should look up catalog/table like v2 commands

2019-10-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-29592.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26304
[https://github.com/apache/spark/pull/26304]

> ALTER TABLE (set partition location) should look up catalog/table like v2 
> commands
> --
>
> Key: SPARK-29592
> URL: https://issues.apache.org/jira/browse/SPARK-29592
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> ALTER TABLE (set partition location) should look up catalog/table like v2 
> commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29592) ALTER TABLE (set partition location) should look up catalog/table like v2 commands

2019-10-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-29592:
---

Assignee: Terry Kim

> ALTER TABLE (set partition location) should look up catalog/table like v2 
> commands
> --
>
> Key: SPARK-29592
> URL: https://issues.apache.org/jira/browse/SPARK-29592
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> ALTER TABLE (set partition location) should look up catalog/table like v2 
> commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29523) SHOW COLUMNS should look up catalog/table like v2 commands

2019-10-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-29523:
---

Assignee: Pablo Langa Blanco

> SHOW COLUMNS should look up catalog/table like v2 commands
> --
>
> Key: SPARK-29523
> URL: https://issues.apache.org/jira/browse/SPARK-29523
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Major
>
> SHOW COLUMNS should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29523) SHOW COLUMNS should look up catalog/table like v2 commands

2019-10-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-29523.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26182
[https://github.com/apache/spark/pull/26182]

> SHOW COLUMNS should look up catalog/table like v2 commands
> --
>
> Key: SPARK-29523
> URL: https://issues.apache.org/jira/browse/SPARK-29523
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Major
> Fix For: 3.0.0
>
>
> SHOW COLUMNS should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29126) Add usage guide for cogroup Pandas UDF

2019-10-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29126:


Assignee: Chris Martin  (was: Hyukjin Kwon)

> Add usage guide for cogroup Pandas UDF
> --
>
> Key: SPARK-29126
> URL: https://issues.apache.org/jira/browse/SPARK-29126
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Assignee: Chris Martin
>Priority: Major
> Fix For: 3.0.0
>
>
> Add usage guide for the cogroup Pandas UDF from SPARK-27463



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29126) Add usage guide for cogroup Pandas UDF

2019-10-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29126:


Assignee: Hyukjin Kwon

> Add usage guide for cogroup Pandas UDF
> --
>
> Key: SPARK-29126
> URL: https://issues.apache.org/jira/browse/SPARK-29126
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> Add usage guide for the cogroup Pandas UDF from SPARK-27463



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29126) Add usage guide for cogroup Pandas UDF

2019-10-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29126.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26110
[https://github.com/apache/spark/pull/26110]

> Add usage guide for cogroup Pandas UDF
> --
>
> Key: SPARK-29126
> URL: https://issues.apache.org/jira/browse/SPARK-29126
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Priority: Major
> Fix For: 3.0.0
>
>
> Add usage guide for the cogroup Pandas UDF from SPARK-27463



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29671) Change format of interval string

2019-10-30 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963553#comment-16963553
 ] 

Wenchen Fan commented on SPARK-29671:
-

I think it's good enough to display the second field with fraction values.

> Change format of interval string
> 
>
> Key: SPARK-29671
> URL: https://issues.apache.org/jira/browse/SPARK-29671
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> This ticket aims to improve the format of the string representation of intervals. See 
> https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29651) Incorrect parsing of interval seconds fraction

2019-10-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-29651.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26313
[https://github.com/apache/spark/pull/26313]

> Incorrect parsing of interval seconds fraction
> --
>
> Key: SPARK-29651
> URL: https://issues.apache.org/jira/browse/SPARK-29651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> * The fractional part of the interval seconds unit is incorrectly parsed if the 
> number of digits is less than 9, for example:
> {code}
> spark-sql> select interval '10.123456 seconds';
> interval 10 seconds 123 microseconds
> {code}
> The result must be *interval 10 seconds 123 milliseconds 456 microseconds*
> * If the seconds unit of an interval is negative, it is incorrectly converted 
> to `CalendarInterval`, for example:
> {code}
> spark-sql> select interval '-10.123456789 seconds';
> interval -9 seconds -876 milliseconds -544 microseconds
> {code}
> Taking into account truncation to microseconds, the result must be *interval 
> -10 seconds -123 milliseconds -456 microseconds*
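As a sanity check of the expected values above, a small standalone sketch (not Spark code) of the intended conversion: right-pad the fraction to nine digits, truncate to microseconds, and keep the sign of the whole literal.

{code:python}
def to_micros(literal):
    # Standalone sketch of the intended parsing for the two examples above.
    sign = -1 if literal.startswith("-") else 1
    secs, _, frac = literal.lstrip("+-").partition(".")
    nanos = int((frac + "000000000")[:9]) if frac else 0
    return sign * (int(secs) * 1_000_000 + nanos // 1000)

print(to_micros("10.123456"))      # 10123456 us  -> 10 s 123 ms 456 us
print(to_micros("-10.123456789"))  # -10123456 us -> -10 s -123 ms -456 us
{code}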



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29651) Incorrect parsing of interval seconds fraction

2019-10-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-29651:
---

Assignee: Maxim Gekk

> Incorrect parsing of interval seconds fraction
> --
>
> Key: SPARK-29651
> URL: https://issues.apache.org/jira/browse/SPARK-29651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> * The fractional part of the interval seconds unit is incorrectly parsed if the 
> number of digits is less than 9, for example:
> {code}
> spark-sql> select interval '10.123456 seconds';
> interval 10 seconds 123 microseconds
> {code}
> The result must be *interval 10 seconds 123 milliseconds 456 microseconds*
> * If the seconds unit of an interval is negative, it is incorrectly converted 
> to `CalendarInterval`, for example:
> {code}
> spark-sql> select interval '-10.123456789 seconds';
> interval -9 seconds -876 milliseconds -544 microseconds
> {code}
> Taking into account truncation to microseconds, the result must be *interval 
> -10 seconds -123 milliseconds -456 microseconds*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29277) DataSourceV2: Add early filter and projection pushdown

2019-10-30 Thread Ryan Blue (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved SPARK-29277.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Fixed by #25955.

> DataSourceV2: Add early filter and projection pushdown
> --
>
> Key: SPARK-29277
> URL: https://issues.apache.org/jira/browse/SPARK-29277
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Priority: Major
> Fix For: 3.0.0
>
>
> Spark uses optimizer rules that need stats before conversion to physical 
> plan. DataSourceV2 should support early pushdown for those rules.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29677) Upgrade Kinesis Client

2019-10-30 Thread Eric S Meisel (Jira)
Eric S Meisel created SPARK-29677:
-

 Summary: Upgrade Kinesis Client
 Key: SPARK-29677
 URL: https://issues.apache.org/jira/browse/SPARK-29677
 Project: Spark
  Issue Type: Improvement
  Components: DStreams
Affects Versions: 2.4.4
Reporter: Eric S Meisel


The current amazon-kinesis-client version is 1.8.10. This version depends on 
the use of `describeStream`, which has a hard limit on an AWS account (10 reqs 
/ second). Versions 1.9.0 and up leverage `listShards`, which has no such 
limit. For large customers, this can be a major problem. 

 

Upgrading the amazon-kinesis-client version should resolve this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29676) ALTER TABLE (RENAME PARTITION) should look up catalog/table like v2 commands

2019-10-30 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963537#comment-16963537
 ] 

Huaxin Gao commented on SPARK-29676:


I am working on this

> ALTER TABLE (RENAME PARTITION) should look up catalog/table like v2 commands
> 
>
> Key: SPARK-29676
> URL: https://issues.apache.org/jira/browse/SPARK-29676
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Major
>
> ALTER TABLE (RENAME PARTITION) should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29676) ALTER TABLE (RENAME PARTITION) should look up catalog/table like v2 commands

2019-10-30 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-29676:
--

 Summary: ALTER TABLE (RENAME PARTITION) should look up 
catalog/table like v2 commands
 Key: SPARK-29676
 URL: https://issues.apache.org/jira/browse/SPARK-29676
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Huaxin Gao


ALTER TABLE (RENAME PARTITION) should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29643) ALTER TABLE (DROP PARTITION) should look up catalog/table like v2 commands

2019-10-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-29643:
---
Environment: (was: ALTER TABLE (DROP PARTITION) should look up 
catalog/table like v2 commands)

> ALTER TABLE (DROP PARTITION) should look up catalog/table like v2 commands
> --
>
> Key: SPARK-29643
> URL: https://issues.apache.org/jira/browse/SPARK-29643
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Major
>
> ALTER TABLE (DROP PARTITION) should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29643) ALTER TABLE (DROP PARTITION) should look up catalog/table like v2 commands

2019-10-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-29643:
---
Description: ALTER TABLE (DROP PARTITION) should look up catalog/table like 
v2 commands

> ALTER TABLE (DROP PARTITION) should look up catalog/table like v2 commands
> --
>
> Key: SPARK-29643
> URL: https://issues.apache.org/jira/browse/SPARK-29643
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: ALTER TABLE (DROP PARTITION) should look up 
> catalog/table like v2 commands
>Reporter: Huaxin Gao
>Priority: Major
>
> ALTER TABLE (DROP PARTITION) should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29675) Add exception when isolationLevel is Illegal

2019-10-30 Thread ulysses you (Jira)
ulysses you created SPARK-29675:
---

 Summary: Add exception when isolationLevel is Illegal
 Key: SPARK-29675
 URL: https://issues.apache.org/jira/browse/SPARK-29675
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: ulysses you


Currently, if we use the JDBC API and set an illegal isolationLevel option, Spark will 
throw a `scala.MatchError`, which is not friendly to users. So we should throw an 
IllegalArgumentException instead.
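For illustration, a minimal reproduction sketch; the DataFrame {{df}}, JDBC URL, and table name are placeholders. An unsupported value for the JDBC writer's {{isolationLevel}} option currently surfaces as a `scala.MatchError` rather than a clear error.

{code:python}
# Placeholders: df is an existing DataFrame, the URL and table are examples.
(df.write
   .format("jdbc")
   .option("url", "jdbc:postgresql://localhost/test")
   .option("dbtable", "target_table")
   .option("isolationLevel", "SNAPSHOT")  # not one of the supported levels
   .save())
{code}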

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29660) Dropping columns and changing column names/types are prohibited in VIEW definition

2019-10-30 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963533#comment-16963533
 ] 

Takeshi Yamamuro commented on SPARK-29660:
--

I'm not sure that we need to do something about this. See: 
https://github.com/apache/spark/pull/26290#discussion_r340487877

> Dropping columns and changing column names/types are prohibited in VIEW 
> definition
> --
>
> Key: SPARK-29660
> URL: https://issues.apache.org/jira/browse/SPARK-29660
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, the following three VIEW DDL statements are not accepted;
> {code:java}
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a FROM viewtest_tbl WHERE a <> 20;
> ERROR:  cannot drop columns from view
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT 1, * FROM viewtest_tbl;
> ERROR:  cannot change name of view column "a" to "?column?"
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a, b::numeric FROM viewtest_tbl;
> ERROR:  cannot change data type of view column "b" from integer to numeric
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29674) Update dropwizard metrics to 4.1.1 for JDK 9+ support

2019-10-30 Thread Sean R. Owen (Jira)
Sean R. Owen created SPARK-29674:


 Summary: Update dropwizard metrics to 4.1.1 for JDK 9+ support
 Key: SPARK-29674
 URL: https://issues.apache.org/jira/browse/SPARK-29674
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Sean R. Owen
Assignee: Sean R. Owen


It looks like dropwizard metrics 4.x has some fixes for JDK 9+:
https://github.com/dropwizard/metrics/pull/1236

It looks like it's relatively easy to update to 4.1.x from 3.2.x, so we should 
probably do it for Spark 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-30 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963505#comment-16963505
 ] 

Shane Knapp commented on SPARK-29106:
-

first pass @ the python tests:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-python-arm/3

i'll fix the scheduling later, as well as whack-a-mole any python modules that 
i might have missed.

i wasn't able to get pyarrow to install, but it looks like ARM support for 
arrow is limited at best.

note to self:  holy crap this was a serious PITA getting this stuff installed.

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> So far we have made two periodic ARM test jobs for Spark in OpenLab: one is 
> based on master with Hadoop 2.7 (similar to the QA test on amplab Jenkins), the 
> other is based on a new branch which we made on 09-09, see  
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>   and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrating the ARM test with amplab 
> Jenkins.
> About the k8s test on ARM, we have already tested it, see 
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it 
> later. 
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an ARM instance and sent the info to Shane Knapp; thanks 
> Shane for adding the first ARM job to amplab Jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  Spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  which has no arm64 support. So we built an arm64-supporting 
> release of leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modify the Spark pom.xml directly with something like a 
> 'property'/'profile' to choose the correct jar package on an ARM or x86 platform, 
> because Spark depends on some Hadoop packages like hadoop-hdfs, and those packages 
> depend on leveldbjni-all-1.8 too, unless Hadoop releases with a new ARM-supporting 
> leveldbjni jar. For now we download the leveldbjni-all-1.8 from 
> openlabtesting and 'mvn install' it for use when testing Spark on ARM.
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set

2019-10-30 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963476#comment-16963476
 ] 

Jungtaek Lim commented on SPARK-29604:
--

[~dongjoon]
Do we have any annotation/trait to "isolate" a running test suite? I suspect the 
session, or the listeners in the session, are being modified by other tests running 
concurrently.

> SessionState is initialized with isolated classloader for Hive if 
> spark.sql.hive.metastore.jars is being set
> 
>
> Key: SPARK-29604
> URL: https://issues.apache.org/jira/browse/SPARK-29604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> I've observed an issue where external listeners cannot be loaded properly 
> when we run spark-sql with the "spark.sql.hive.metastore.jars" configuration 
> set.
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: Error while 
> instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103)
>   at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at 

[jira] [Resolved] (SPARK-29666) Release script fail to publish release under dry run mode

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29666.
---
Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26329
[https://github.com/apache/spark/pull/26329]

> Release script fail to publish release under dry run mode
> -
>
> Key: SPARK-29666
> URL: https://issues.apache.org/jira/browse/SPARK-29666
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> `release-build.sh` fails to publish a release under dry run mode with the 
> following error message:
> {code}
> /opt/spark-rm/release-build.sh: line 429: pushd: 
> spark-repo-g4MBm/org/apache/spark: No such file or directory
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29666) Release script fail to publish release under dry run mode

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29666:
-

Assignee: Xingbo Jiang

> Release script fail to publish release under dry run mode
> -
>
> Key: SPARK-29666
> URL: https://issues.apache.org/jira/browse/SPARK-29666
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
>
> `release-build.sh` fails to publish a release under dry run mode with the 
> following error message:
> {code}
> /opt/spark-rm/release-build.sh: line 429: pushd: 
> spark-repo-g4MBm/org/apache/spark: No such file or directory
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29646) Allow pyspark version name format `${versionNumber}-preview` in release script

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29646:
-

Assignee: Xingbo Jiang

> Allow pyspark version name format `${versionNumber}-preview` in release script
> --
>
> Key: SPARK-29646
> URL: https://issues.apache.org/jira/browse/SPARK-29646
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
>
> We shall allow pyspark version name format `${versionNumber}-preview` in 
> release script, to support preview releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29646) Allow pyspark version name format `${versionNumber}-preview` in release script

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29646.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26306
[https://github.com/apache/spark/pull/26306]

> Allow pyspark version name format `${versionNumber}-preview` in release script
> --
>
> Key: SPARK-29646
> URL: https://issues.apache.org/jira/browse/SPARK-29646
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
> Fix For: 3.0.0
>
>
> We shall allow pyspark version name format `${versionNumber}-preview` in 
> release script, to support preview releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29673) upgrade jenkins pypy to PyPy3.6 v7.2.0

2019-10-30 Thread Shane Knapp (Jira)
Shane Knapp created SPARK-29673:
---

 Summary: upgrade jenkins pypy to PyPy3.6 v7.2.0
 Key: SPARK-29673
 URL: https://issues.apache.org/jira/browse/SPARK-29673
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.0.0
Reporter: Shane Knapp
Assignee: Shane Knapp






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29670) Make executor's bindAddress configurable

2019-10-30 Thread Nishchal Venkataramana (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishchal Venkataramana updated SPARK-29670:
---
Affects Version/s: (was: 2.1.1)
   2.0.2
   2.1.3
   2.2.3
   2.3.4
   2.4.4

> Make executor's bindAddress configurable
> 
>
> Key: SPARK-29670
> URL: https://issues.apache.org/jira/browse/SPARK-29670
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
>Reporter: Nishchal Venkataramana
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29670) Make executor's bindAddress configurable

2019-10-30 Thread Nishchal Venkataramana (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishchal Venkataramana updated SPARK-29670:
---
Fix Version/s: 3.0.0

> Make executor's bindAddress configurable
> 
>
> Key: SPARK-29670
> URL: https://issues.apache.org/jira/browse/SPARK-29670
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
>Reporter: Nishchal Venkataramana
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29672) remove python2 test from python/run-tests.py

2019-10-30 Thread Shane Knapp (Jira)
Shane Knapp created SPARK-29672:
---

 Summary: remove python2 test from python/run-tests.py
 Key: SPARK-29672
 URL: https://issues.apache.org/jira/browse/SPARK-29672
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.0.0
Reporter: Shane Knapp
Assignee: Shane Knapp






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29670) Make executor's bindAddress configurable

2019-10-30 Thread Nishchal Venkataramana (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishchal Venkataramana updated SPARK-29670:
---
Labels:   (was: bulk-closed)

> Make executor's bindAddress configurable
> 
>
> Key: SPARK-29670
> URL: https://issues.apache.org/jira/browse/SPARK-29670
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Nishchal Venkataramana
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29668) Deprecate Python 3 prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29668:
--
Parent: SPARK-27884
Issue Type: Sub-task  (was: Task)

> Deprecate Python 3 prior to version 3.6
> ---
>
> Key: SPARK-29668
> URL: https://issues.apache.org/jira/browse/SPARK-29668
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27884) Deprecate Python 2 and Python 3 prior to 3.6 support in Spark 3.0

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27884:
--
Summary: Deprecate Python 2 and Python 3 prior to 3.6 support in Spark 3.0  
(was: Deprecate Python 2 support in Spark 3.0)

> Deprecate Python 2 and Python 3 prior to 3.6 support in Spark 3.0
> -
>
> Key: SPARK-27884
> URL: https://issues.apache.org/jira/browse/SPARK-27884
> Project: Spark
>  Issue Type: Story
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Major
>  Labels: release-notes
>
> Officially deprecate Python 2 support in Spark 3.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-27705) kubernetes integration test break on osx when test PVTestsSuite

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-27705:
---

> kubernetes integration test break on osx when test PVTestsSuite
> ---
>
> Key: SPARK-27705
> URL: https://issues.apache.org/jira/browse/SPARK-27705
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.3
>Reporter: Henry Yu
>Priority: Minor
>
>  
> PVTestsSuite creates a file on the host path /tmp.
> But when we run on OS X, minikube is started on VirtualBox, so the test breaks.
> The easiest fix is to start minikube with a host mount string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27705) kubernetes integration test break on osx when test PVTestsSuite

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27705.
---
Resolution: Invalid

> kubernetes integration test break on osx when test PVTestsSuite
> ---
>
> Key: SPARK-27705
> URL: https://issues.apache.org/jira/browse/SPARK-27705
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.3
>Reporter: Henry Yu
>Priority: Minor
>
>  
> PVTestsSuite creates a file on the host path /tmp.
> But when we run on OS X, minikube is started on VirtualBox, so the test breaks.
> The easiest fix is to start minikube with a host mount string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27705) kubernetes integration test break on osx when test PVTestsSuite

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27705:
--
Component/s: Tests

> kubernetes integration test break on osx when test PVTestsSuite
> ---
>
> Key: SPARK-27705
> URL: https://issues.apache.org/jira/browse/SPARK-27705
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.3
>Reporter: Henry Yu
>Priority: Minor
>
>  
> PVTestsSuite creates a file on the host path /tmp.
> But when we run on OS X, minikube is started on VirtualBox, so the test breaks.
> The easiest fix is to start minikube with a host mount string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27705) kubernetes integration test break on osx when test PVTestsSuite

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27705.
---
Resolution: Fixed

As [~Andrew HUALI] originally reported, minikube should be started with a host 
mount string. It's not an Apache Spark issue.

> kubernetes integration test break on osx when test PVTestsSuite
> ---
>
> Key: SPARK-27705
> URL: https://issues.apache.org/jira/browse/SPARK-27705
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.3
>Reporter: Henry Yu
>Priority: Minor
>
>  
> PVTestsSuite creates a file on the host path /tmp.
> But when we run on OS X, minikube is started on VirtualBox, so the test breaks.
> The easiest fix is to start minikube with a host mount string.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29671) Change format of interval string

2019-10-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963379#comment-16963379
 ] 

Dongjoon Hyun commented on SPARK-29671:
---

Thanks for creating this, [~maxgekk]!

> Change format of interval string
> 
>
> Key: SPARK-29671
> URL: https://issues.apache.org/jira/browse/SPARK-29671
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The ticket aims to improve format of interval representation as a string. See 
> https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29671) Change format of interval string

2019-10-30 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963375#comment-16963375
 ] 

Maxim Gekk commented on SPARK-29671:


For example, PostgreSQL displays intervals like:
{code}
maxim=# select interval '1010 year 9 month 8 day 7 hour 6 minute -5 second 4 
millisecond -3 microseconds';
 interval
--
 1010 years 9 mons 8 days 07:05:55.003997
(1 row)
{code}
but this requires "normalization" because time fields cannot be negative.
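For comparison, the current Spark output for a similar literal can be dumped like this (the printed format is exactly what this ticket proposes to change):
{code:scala}
// Assumes a recent Spark 3.0 master build; show(false) avoids truncating the interval string.
spark.sql("SELECT interval 1010 year 9 month 8 day 7 hour 6 minute 5 second").show(false)
{code}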

> Change format of interval string
> 
>
> Key: SPARK-29671
> URL: https://issues.apache.org/jira/browse/SPARK-29671
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The ticket aims to improve format of interval representation as a string. See 
> https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29671) Change format of interval string

2019-10-30 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963373#comment-16963373
 ] 

Maxim Gekk commented on SPARK-29671:


[~cloud_fan][~dongjoon] Let's discuss here how to improve the string 
representation of intervals.

> Change format of interval string
> 
>
> Key: SPARK-29671
> URL: https://issues.apache.org/jira/browse/SPARK-29671
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The ticket aims to improve format of interval representation as a string. See 
> https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29671) Change format of interval string

2019-10-30 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29671:
--

 Summary: Change format of interval string
 Key: SPARK-29671
 URL: https://issues.apache.org/jira/browse/SPARK-29671
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


The ticket aims to improve format of interval representation as a string. See 
https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29670) Make executor's bindAddress configurable

2019-10-30 Thread Nishchal Venkataramana (Jira)
Nishchal Venkataramana created SPARK-29670:
--

 Summary: Make executor's bindAddress configurable
 Key: SPARK-29670
 URL: https://issues.apache.org/jira/browse/SPARK-29670
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.1.1
Reporter: Nishchal Venkataramana






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29668) Deprecate Python 3 prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29668.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26326
[https://github.com/apache/spark/pull/26326]

> Deprecate Python 3 prior to version 3.6
> ---
>
> Key: SPARK-29668
> URL: https://issues.apache.org/jira/browse/SPARK-29668
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29668) Deprecate Python 3 prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29668:
-

Assignee: Dongjoon Hyun

> Deprecate Python 3 prior to version 3.6
> ---
>
> Key: SPARK-29668
> URL: https://issues.apache.org/jira/browse/SPARK-29668
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29669) Refactor IntervalUtils.fromDayTimeString()

2019-10-30 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29669:
--

 Summary: Refactor IntervalUtils.fromDayTimeString()
 Key: SPARK-29669
 URL: https://issues.apache.org/jira/browse/SPARK-29669
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


* Add a UnitName enumeration and use it in AstBuilder and in IntervalUtils
* Make fromDayTimeString more generic and avoid ad hoc code
* Introduce unit value properties such as min/max values and a function to convert 
a parsed value to micros (a rough sketch follows below)
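
A rough sketch of what such per-unit properties could look like (all names, bounds, and structure below are hypothetical, not the actual Spark code):
{code:scala}
// Hypothetical sketch only: each unit carries a valid range and a conversion to microseconds.
sealed abstract class UnitName(val name: String, val min: Long, val max: Long) {
  def toMicros(value: Long): Long
}
case object HourUnit extends UnitName("hour", -23, 23) {
  override def toMicros(v: Long): Long = v * 60L * 60L * 1000000L
}
case object MinuteUnit extends UnitName("minute", -59, 59) {
  override def toMicros(v: Long): Long = v * 60L * 1000000L
}
case object SecondUnit extends UnitName("second", -59, 59) {
  override def toMicros(v: Long): Long = v * 1000000L
}

// A parser could then validate and convert each parsed value uniformly:
def toValidatedMicros(unit: UnitName, parsed: Long): Long = {
  require(parsed >= unit.min && parsed <= unit.max, s"${unit.name} value $parsed is out of range")
  unit.toMicros(parsed)
}
{code}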



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29668) Deprecate Python 3 prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29668:
--
Summary: Deprecate Python 3 prior to version 3.6  (was: Deprecate Python 
prior to version 3.6)

> Deprecate Python 3 prior to version 3.6
> ---
>
> Key: SPARK-29668
> URL: https://issues.apache.org/jira/browse/SPARK-29668
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29668) Deprecate Python prior to version 3.6

2019-10-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-29668:
-

 Summary: Deprecate Python prior to version 3.6
 Key: SPARK-29668
 URL: https://issues.apache.org/jira/browse/SPARK-29668
 Project: Spark
  Issue Type: Task
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator

2019-10-30 Thread Cheng Lian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-29667:
---
Environment: (was: spark-2.4.3-bin-dbr-5.5-snapshot-9833d0f)

> implicitly convert mismatched datatypes on right side of "IN" operator
> --
>
> Key: SPARK-29667
> URL: https://issues.apache.org/jira/browse/SPARK-29667
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Jessie Lin
>Priority: Minor
>
> Ran into an error on this SQL:
> Mismatched columns:
> [(a.`id`:decimal(28,0), db1.table1.`id`:decimal(18,0))] 
> The AND clause in question:
>   AND   a.id in (select id from db1.table1 where col1 = 1 group by id)
> Once I cast decimal(18,0) to decimal(28,0) explicitly above, the SQL ran just 
> fine. Can the SQL engine cast implicitly in this case?
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator

2019-10-30 Thread Cheng Lian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963305#comment-16963305
 ] 

Cheng Lian commented on SPARK-29667:


Reproduced this with the following snippet:
{code}
import org.apache.spark.sql.types.DecimalType

spark.range(10).select($"id" cast DecimalType(18, 0)).createOrReplaceTempView("t1")
spark.range(10).select($"id" cast DecimalType(28, 0)).createOrReplaceTempView("t2")
sql("SELECT * FROM t1 WHERE t1.id IN (SELECT id FROM t2)").explain(true)
{code}
Exception:
{noformat}
The data type of one or more elements in the left hand side of an IN subquery
is not compatible with the data type of the output of the subquery
Mismatched columns:
[(t1.`id`:decimal(18,0), t2.`id`:decimal(28,0))]
Left side:
[decimal(18,0)].
Right side:
[decimal(28,0)].; line 1 pos 29;
'Project [*]
+- 'Filter id#16 IN (list#22 [])
   :  +- Project [id#20]
   : +- SubqueryAlias `t2`
   :+- Project [cast(id#18L as decimal(28,0)) AS id#20]
   :   +- Range (0, 10, step=1, splits=Some(8))
   +- SubqueryAlias `t1`
  +- Project [cast(id#14L as decimal(18,0)) AS id#16]
 +- Range (0, 10, step=1, splits=Some(8))
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:123)
...
{noformat}
It seems that Postgres does support this kind of implicit casting:
{noformat}
postgres=# SELECT CAST(1 AS BIGINT) IN (CAST(1 AS INT));

 ?column?
--
 t
(1 row)
{noformat}
I believe the problem in Spark is that 
{{o.a.s.s.c.expressions.In#checkInputDataTypes()}} is too strict.
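Until that check is relaxed, the explicit cast mentioned in the description works against the same repro, e.g.:
{code:scala}
// Workaround sketch: widen the left-hand side to the subquery's decimal precision explicitly.
sql("SELECT * FROM t1 WHERE CAST(t1.id AS DECIMAL(28, 0)) IN (SELECT id FROM t2)").explain(true)
{code}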

> implicitly convert mismatched datatypes on right side of "IN" operator
> --
>
> Key: SPARK-29667
> URL: https://issues.apache.org/jira/browse/SPARK-29667
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: spark-2.4.3-bin-dbr-5.5-snapshot-9833d0f
>Reporter: Jessie Lin
>Priority: Minor
>
> Ran into an error on this SQL:
> Mismatched columns:
> [(a.`id`:decimal(28,0), db1.table1.`id`:decimal(18,0))] 
> The AND clause in question:
>   AND   a.id in (select id from db1.table1 where col1 = 1 group by id)
> Once I cast decimal(18,0) to decimal(28,0) explicitly above, the SQL ran just 
> fine. Can the SQL engine cast implicitly in this case?
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator

2019-10-30 Thread Jessie Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jessie Lin updated SPARK-29667:
---
Priority: Minor  (was: Major)

> implicitly convert mismatched datatypes on right side of "IN" operator
> --
>
> Key: SPARK-29667
> URL: https://issues.apache.org/jira/browse/SPARK-29667
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: spark-2.4.3-bin-dbr-5.5-snapshot-9833d0f
>Reporter: Jessie Lin
>Priority: Minor
>
> Ran into an error on this SQL:
> Mismatched columns:
> [(a.`id`:decimal(28,0), db1.table1.`id`:decimal(18,0))] 
> The AND clause in question:
>   AND   a.id in (select id from db1.table1 where col1 = 1 group by id)
> Once I cast decimal(18,0) to decimal(28,0) explicitly above, the SQL ran just 
> fine. Can the SQL engine cast implicitly in this case?
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator

2019-10-30 Thread Jessie Lin (Jira)
Jessie Lin created SPARK-29667:
--

 Summary: implicitly convert mismatched datatypes on right side of 
"IN" operator
 Key: SPARK-29667
 URL: https://issues.apache.org/jira/browse/SPARK-29667
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.3
 Environment: spark-2.4.3-bin-dbr-5.5-snapshot-9833d0f
Reporter: Jessie Lin


Ran into an error on this SQL:
Mismatched columns:
[(a.`id`:decimal(28,0), db1.table1.`id`:decimal(18,0))] 
The AND clause in question:
  AND   a.id in (select id from db1.table1 where col1 = 1 group by id)
Once I cast decimal(18,0) to decimal(28,0) explicitly above, the SQL ran just 
fine. Can the SQL engine cast implicitly in this case?
 
 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29666) Release script fail to publish release under dry run mode

2019-10-30 Thread Xingbo Jiang (Jira)
Xingbo Jiang created SPARK-29666:


 Summary: Release script fail to publish release under dry run mode
 Key: SPARK-29666
 URL: https://issues.apache.org/jira/browse/SPARK-29666
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.0
Reporter: Xingbo Jiang


`release-build.sh` fail to publish release under dry run mode with the 
following error message:
```
/opt/spark-rm/release-build.sh: line 429: pushd: 
spark-repo-g4MBm/org/apache/spark: No such file or directory
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29666) Release script fail to publish release under dry run mode

2019-10-30 Thread Xingbo Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingbo Jiang updated SPARK-29666:
-
Description: 
`release-build.sh` fail to publish release under dry run mode with the 
following error message:
{code}
/opt/spark-rm/release-build.sh: line 429: pushd: 
spark-repo-g4MBm/org/apache/spark: No such file or directory
{code}

  was:
`release-build.sh` fail to publish release under dry run mode with the 
following error message:
```
/opt/spark-rm/release-build.sh: line 429: pushd: 
spark-repo-g4MBm/org/apache/spark: No such file or directory
```


> Release script fail to publish release under dry run mode
> -
>
> Key: SPARK-29666
> URL: https://issues.apache.org/jira/browse/SPARK-29666
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> `release-build.sh` fail to publish release under dry run mode with the 
> following error message:
> {code}
> /opt/spark-rm/release-build.sh: line 429: pushd: 
> spark-repo-g4MBm/org/apache/spark: No such file or directory
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set

2019-10-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963264#comment-16963264
 ] 

Dongjoon Hyun commented on SPARK-29604:
---

Thank you so much for confirming, [~kabhwan]!

The newly added test case seems to be flaky in `SBT Hadoop 3.2` build. Could 
you check that?

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/676/testReport/org.apache.spark.sql.hive.thriftserver/SparkSQLEnvSuite/SPARK_29604_external_listeners_should_be_initialized_with_Spark_classloader/history/

> SessionState is initialized with isolated classloader for Hive if 
> spark.sql.hive.metastore.jars is being set
> 
>
> Key: SPARK-29604
> URL: https://issues.apache.org/jira/browse/SPARK-29604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> I've observed the issue that external listeners cannot be loaded properly 
> when we run spark-sql with "spark.sql.hive.metastore.jars" configuration 
> being used.
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: Error while 
> instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103)
>   at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> 

[jira] [Created] (SPARK-29665) refine the TableProvider interface

2019-10-30 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-29665:
---

 Summary: refine the TableProvider interface
 Key: SPARK-29665
 URL: https://issues.apache.org/jira/browse/SPARK-29665
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29664) Column.getItem behavior is not consistent with Scala version

2019-10-30 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963222#comment-16963222
 ] 

Terry Kim commented on SPARK-29664:
---

[~hyukjin.kwon], is there a reason why getItem's behavior is not consistent? If 
this behavior is a bug, I will prepare a PR shortly.

> Column.getItem behavior is not consistent with Scala version
> 
>
> Key: SPARK-29664
> URL: https://issues.apache.org/jira/browse/SPARK-29664
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>
> In PySpark, Column.getItem's behavior is different from the Scala version.
> For example,
> In PySpark:
> {code:python}
> df = spark.range(2)
> map_col = create_map(lit(0), lit(100), lit(1), lit(200))
> df.withColumn("mapped", map_col.getItem(col('id'))).show()
> # +---+--+
> # | id|mapped|
> # +---+--+
> # |  0|   100|
> # |  1|   200|
> # +---+--+
> {code}
> In Scala:
> {code:scala}
> val df = spark.range(2)
> val map_col = map(lit(0), lit(100), lit(1), lit(200))
> // The following getItem results in the following exception, which is the right behavior:
> // java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Column id
> //   at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
> //   at org.apache.spark.sql.Column.getItem(Column.scala:856)
> //   ... 49 elided
> df.withColumn("mapped", map_col.getItem(col("id"))).show
> // You have to use apply() to match with PySpark's behavior.
> df.withColumn("mapped", map_col(col("id"))).show
> // +---+--+
> // | id|mapped|
> // +---+--+
> // |  0|   100|
> // |  1|   200|
> // +---+--+
> {code}
> Looking at the code for the Scala implementation, PySpark's behavior is incorrect 
> since the argument to getItem becomes a `Literal`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29664) Column.getItem behavior is not consistent with Scala version

2019-10-30 Thread Terry Kim (Jira)
Terry Kim created SPARK-29664:
-

 Summary: Column.getItem behavior is not consistent with Scala 
version
 Key: SPARK-29664
 URL: https://issues.apache.org/jira/browse/SPARK-29664
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.0.0
Reporter: Terry Kim


In PySpark, Column.getItem's behavior is different from the Scala version.

For example,
In PySpark:
{code:python}
df = spark.range(2)
map_col = create_map(lit(0), lit(100), lit(1), lit(200))
df.withColumn("mapped", map_col.getItem(col('id'))).show()
# +---+--+
# | id|mapped|
# +---+--+
# |  0|   100|
# |  1|   200|
# +---+--+
{code}

In Scala:
{code:scala}
val df = spark.range(2)
val map_col = map(lit(0), lit(100), lit(1), lit(200))
// The following getItem results in the following exception, which is the right behavior:
// java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Column id
//   at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
//   at org.apache.spark.sql.Column.getItem(Column.scala:856)
//   ... 49 elided
df.withColumn("mapped", map_col.getItem(col("id"))).show


// You have to use apply() to match with PySpark's behavior.
df.withColumn("mapped", map_col(col("id"))).show
// +---+--+
// | id|mapped|
// +---+--+
// |  0|   100|
// |  1|   200|
// +---+--+
{code}

Looking at the code for the Scala implementation, PySpark's behavior is incorrect 
since the argument to getItem becomes a `Literal`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29662) Cannot have circular references in bean class, but got the circular reference of class class io.cdap.cdap.api.data.schema.Schema

2019-10-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963198#comment-16963198
 ] 

Dongjoon Hyun commented on SPARK-29662:
---

Hi, [~coudray]. What is `CDAP`? Since Apache Spark 2.3.x is EOL, could you try 
Apache Spark 2.4.4?

> Cannot have circular references in bean class, but got the circular reference 
> of class class io.cdap.cdap.api.data.schema.Schema
> 
>
> Key: SPARK-29662
> URL: https://issues.apache.org/jira/browse/SPARK-29662
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4
>Reporter: romain
>Priority: Major
>
> I'm unable to convert a JavaRDD to a Dataset.
> I'm using CDAP 6.0.0 or 5.1.2 with Spark 2.3.4.
> Encoder encoderStruct = 
> Encoders.bean(StructuredRecord.class);
> This line produces the error: 
> "Cannot have circular references in bean class, but got the circular 
> reference of class class io.cdap.cdap.api.data.schema.Schema"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29662) Cannot have circular references in bean class, but got the circular reference of class class io.cdap.cdap.api.data.schema.Schema

2019-10-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963198#comment-16963198
 ] 

Dongjoon Hyun edited comment on SPARK-29662 at 10/30/19 4:35 PM:
-

Hi, [~coudray]. What is `CDAP`? Since Apache Spark 2.3.x is EOL, could you try 
Apache Spark 2.4.4?
We regularly close outdated JIRAs that report only EOL releases (<= 2.3.x).


was (Author: dongjoon):
Hi, [~coudray]. What is `CDAP`? Since Apache Spark 2.3.x is EOL, could you try 
Apache Spark 2.4.4?

> Cannot have circular references in bean class, but got the circular reference 
> of class class io.cdap.cdap.api.data.schema.Schema
> 
>
> Key: SPARK-29662
> URL: https://issues.apache.org/jira/browse/SPARK-29662
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4
>Reporter: romain
>Priority: Major
>
> I'm unable to convert a JavaRDD to a Dataset.
> I'm using CDAP 6.0.0 or 5.1.2 with Spark 2.3.4.
> Encoder encoderStruct = 
> Encoders.bean(StructuredRecord.class);
> This line produces the error: 
> "Cannot have circular references in bean class, but got the circular 
> reference of class class io.cdap.cdap.api.data.schema.Schema"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29462) The data type of "array()" should be array

2019-10-30 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29462.
--
Resolution: Not A Problem

> The data type of "array()" should be array
> 
>
> Key: SPARK-29462
> URL: https://issues.apache.org/jira/browse/SPARK-29462
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> In the current implementation:
> > spark.sql("select array()")
> res0: org.apache.spark.sql.DataFrame = [array(): array]
> The output type should be array



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29660) Dropping columns and changing column names/types are prohibited in VIEW definition

2019-10-30 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963186#comment-16963186
 ] 

Aman Omer commented on SPARK-29660:
---

Checking this one 

> Dropping columns and changing column names/types are prohibited in VIEW 
> definition
> --
>
> Key: SPARK-29660
> URL: https://issues.apache.org/jira/browse/SPARK-29660
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> In PostgreSQL, the three DDL syntaxes for VIEW cannot be accepted;
> {code:java}
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a FROM viewtest_tbl WHERE a <> 20;
> ERROR:  cannot drop columns from view
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT 1, * FROM viewtest_tbl;
> ERROR:  cannot change name of view column "a" to "?column?"
> -- should fail
> CREATE OR REPLACE VIEW viewtest AS
>   SELECT a, b::numeric FROM viewtest_tbl;
> ERROR:  cannot change data type of view column "b" from integer to numeric
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29663) Support sum with interval type values

2019-10-30 Thread Kent Yao (Jira)
Kent Yao created SPARK-29663:


 Summary: Support sum with interval type values
 Key: SPARK-29663
 URL: https://issues.apache.org/jira/browse/SPARK-29663
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Kent Yao


{code:sql}
postgres=# SELECT i, Sum(cast(v as interval)) OVER (ORDER BY i ROWS BETWEEN 
CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,'1 sec'),(2,'2 
sec'),(3,NULL),(4,NULL)) t(i,v); 
 i |   sum
---+----------
 1 | 00:00:03
 2 | 00:00:02
 3 | 
 4 | 
(4 rows)
{code}
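
A hypothetical Spark equivalent to target once this sub-task lands (before the change, summing interval values is expected to fail analysis):
{code:scala}
// Hypothetical once SPARK-29663 is implemented; not expected to run on current releases.
spark.sql("""
  SELECT i,
         sum(CAST(v AS interval)) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  FROM VALUES (1, '1 seconds'), (2, '2 seconds'), (3, NULL), (4, NULL) AS t(i, v)
""").show(false)
{code}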



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29120) Add create_view.sql

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29120:
-

Assignee: Takeshi Yamamuro

> Add create_view.sql
> ---
>
> Key: SPARK-29120
> URL: https://issues.apache.org/jira/browse/SPARK-29120
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Takeshi Yamamuro
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29120) Add create_view.sql

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29120.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26290
[https://github.com/apache/spark/pull/26290]

> Add create_view.sql
> ---
>
> Key: SPARK-29120
> URL: https://issues.apache.org/jira/browse/SPARK-29120
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-10-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963175#comment-16963175
 ] 

Andy Grove commented on SPARK-29640:


A hacky workaround is to wait for DNS to resolve before creating the Spark 
context:
{code:java}
import java.net.{InetAddress, UnknownHostException}

def waitForDns(): Unit = {

  val host = "kubernetes.default.svc"

  println(s"Resolving $host ...")
  val t1 = System.currentTimeMillis()
  var attempts = 0
  // Retry for up to 15 seconds before giving up.
  while (System.currentTimeMillis() - t1 < 15000) {
    try {
      attempts += 1
      val address = InetAddress.getByName(host)
      println(s"Resolved $host as ${address.getHostAddress()} after $attempts attempt(s)")
      return
    } catch {
      case _: UnknownHostException =>
        println(s"Failed to resolve $host due to UnknownHostException (attempt $attempts)")
        Thread.sleep(100)
    }
  }
}
{code}

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
> Fix For: 2.4.5
>
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>   ... 20 more
> Caused by: java.net.UnknownHostException: 

[jira] [Assigned] (SPARK-29653) Fix MICROS_PER_MONTH in IntervalUtils

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29653:
-

Assignee: Kent Yao

> Fix MICROS_PER_MONTH in IntervalUtils
> -
>
> Key: SPARK-29653
> URL: https://issues.apache.org/jira/browse/SPARK-29653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> - final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * 
> DateTimeUtils.SECONDS_PER_DAY
> + final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * 
> DateTimeUtils.MICROS_PER_DAY



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29653) Fix MICROS_PER_MONTH in IntervalUtils

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29653.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26321
[https://github.com/apache/spark/pull/26321]
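
For reference, the magnitude of the bug is easy to see with a quick calculation (assuming the usual 30-day month and 86,400-second day constants):
{code:scala}
// Assumed constants for illustration (30 days/month convention for intervals).
val DAYS_PER_MONTH  = 30L
val SECONDS_PER_DAY = 24L * 60L * 60L                 // 86,400
val MICROS_PER_DAY  = SECONDS_PER_DAY * 1000L * 1000L // 86,400,000,000

val before = DAYS_PER_MONTH * SECONDS_PER_DAY // 2,592,000 -- seconds per month, mislabeled as micros
val after  = DAYS_PER_MONTH * MICROS_PER_DAY  // 2,592,000,000,000 -- actual microseconds per month
{code}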

> Fix MICROS_PER_MONTH in IntervalUtils
> -
>
> Key: SPARK-29653
> URL: https://issues.apache.org/jira/browse/SPARK-29653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> - final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * 
> DateTimeUtils.SECONDS_PER_DAY
> + final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * 
> DateTimeUtils.MICROS_PER_DAY



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29653) Fix MICROS_PER_MONTH in IntervalUtils

2019-10-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29653:
--
Priority: Minor  (was: Major)

> Fix MICROS_PER_MONTH in IntervalUtils
> -
>
> Key: SPARK-29653
> URL: https://issues.apache.org/jira/browse/SPARK-29653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Minor
>
> - final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * 
> DateTimeUtils.SECONDS_PER_DAY
> + final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * 
> DateTimeUtils.MICROS_PER_DAY



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-10-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-29640:
---
Description: 
We are running into intermittent DNS issues where the Spark driver fails to 
resolve "kubernetes.default.svc" when trying to create executors. We are 
running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.

This happens approximately 10% of the time.

Here is the stack trace:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: External scheduler 
cannot be instantiated
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
[get]  for kind: [Pod]  with name: 
[wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
[tenant-8-workflows]  failed.
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.(ExecutorPodsAllocator.scala:55)
at 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 20 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at okhttp3.Dns$1.lookup(Dns.java:39)
at 
okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
at 
okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
at 
okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
at 
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at 
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at 

[jira] [Created] (SPARK-29662) Cannot have circular references in bean class, but got the circular reference of class class io.cdap.cdap.api.data.schema.Schema

2019-10-30 Thread romain (Jira)
romain created SPARK-29662:
--

 Summary: Cannot have circular references in bean class, but got 
the circular reference of class class io.cdap.cdap.api.data.schema.Schema
 Key: SPARK-29662
 URL: https://issues.apache.org/jira/browse/SPARK-29662
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.4
Reporter: romain


I'm unable to convert a JavaRDD to a Dataset.

I'm using CDAP 6.0.0 or 5.1.2 with Spark 2.3.4.

Encoder encoderStruct = Encoders.bean(StructuredRecord.class);

This line produces the error: 

"Cannot have circular references in bean class, but got the circular reference 
of class class io.cdap.cdap.api.data.schema.Schema"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-10-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-29640:
---
Description: 
We are running into intermittent DNS issues where the Spark driver fails to 
resolve "kubernetes.default.svc" when trying to create executors. We are 
running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode.

This happens approximately 10% of the time.

Here is the stack trace:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: External scheduler 
cannot be instantiated
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
[get]  for kind: [Pod]  with name: 
[wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
[tenant-8-workflows]  failed.
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.(ExecutorPodsAllocator.scala:55)
at 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 20 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at okhttp3.Dns$1.lookup(Dns.java:39)
at 
okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
at 
okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
at 
okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
at 
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at 
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at 

[jira] [Commented] (SPARK-29640) [K8S] Make it possible to set DNS option to use TCP instead of UDP

2019-10-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963059#comment-16963059
 ] 

Andy Grove commented on SPARK-29640:


Installing node local dns cache daemon might be another workaround for this 
issue: 
https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-511750647

> [K8S] Make it possible to set DNS option to use TCP instead of UDP
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
> Fix For: 2.4.5
>
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" and this seems to be caused by 
> [https://github.com/kubernetes/kubernetes/issues/76790]
> One suggested workaround is to specify TCP mode for DNS lookups in the pod 
> spec 
> ([https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-424498508]).
> I would like the ability to provide a flag to spark-submit to specify that 
> TCP mode should be used for DNS lookups.
> I am working on a PR for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29661) Support cascaded syntax in CREATE SCHEMA

2019-10-30 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-29661:


 Summary: Support cascaded syntax in CREATE SCHEMA
 Key: SPARK-29661
 URL: https://issues.apache.org/jira/browse/SPARK-29661
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


In PostgreSQL, the cascaded syntax below can be accepted in CREATE SCHEMA:
{code}
CREATE SCHEMA temp_view_test
  CREATE TABLE base_table (a int, id int) using parquet
  CREATE TABLE base_table2 (a int, id int) using parquet;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29660) Dropping columns and changing column names/types are prohibited in VIEW definition

2019-10-30 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-29660:


 Summary: Dropping columns and changing column names/types are 
prohibited in VIEW definition
 Key: SPARK-29660
 URL: https://issues.apache.org/jira/browse/SPARK-29660
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


In PostgreSQL, the following three DDL statements for VIEW are rejected:
{code:java}
-- should fail
CREATE OR REPLACE VIEW viewtest AS
SELECT a FROM viewtest_tbl WHERE a <> 20;
ERROR:  cannot drop columns from view
-- should fail
CREATE OR REPLACE VIEW viewtest AS
SELECT 1, * FROM viewtest_tbl;
ERROR:  cannot change name of view column "a" to "?column?"
-- should fail
CREATE OR REPLACE VIEW viewtest AS
SELECT a, b::numeric FROM viewtest_tbl;
ERROR:  cannot change data type of view column "b" from integer to numeric
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set

2019-10-30 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963000#comment-16963000
 ] 

Jungtaek Lim commented on SPARK-29604:
--

I think it doesn't apply to branch-2.3, as the root issue is more likely the 
lazy initialization of streaming query listeners, and there is no configuration 
for registering streaming query listeners in Spark 2.3 (only the API).
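
For reference, a minimal sketch of the two registration paths, assuming a live 
SparkSession named spark (com.example.MyListener below is a placeholder class 
name); the conf-based path added in 2.4 is the one that interacts with the 
isolated classloader:
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

val spark = SparkSession.builder().appName("listener-demo").getOrCreate()

// Spark 2.3: listeners can only be attached programmatically through the API.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = println(s"started: ${event.id}")
  override def onQueryProgress(event: QueryProgressEvent): Unit = println(event.progress.prettyJson)
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = println(s"terminated: ${event.id}")
})

// Spark 2.4+: listeners can also be registered declaratively, e.g.
//   --conf spark.sql.streaming.streamingQueryListeners=com.example.MyListener
// which is the path that can hit the classloader problem described in this issue.
{code}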

> SessionState is initialized with isolated classloader for Hive if 
> spark.sql.hive.metastore.jars is being set
> 
>
> Key: SPARK-29604
> URL: https://issues.apache.org/jira/browse/SPARK-29604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> I've observed that external listeners cannot be loaded properly when we run 
> spark-sql with the "spark.sql.hive.metastore.jars" configuration set.
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: Error while 
> instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103)
>   at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at 

[jira] [Created] (SPARK-29659) Support COMMENT ON syntax

2019-10-30 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-29659:


 Summary: Support COMMENT ON syntax
 Key: SPARK-29659
 URL: https://issues.apache.org/jira/browse/SPARK-29659
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


[https://www.postgresql.org/docs/current/sql-comment.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29658) Support geometric types

2019-10-30 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-29658:


 Summary: Support geometric types
 Key: SPARK-29658
 URL: https://issues.apache.org/jira/browse/SPARK-29658
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


[https://www.postgresql.org/docs/current/datatype-geometric.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29657) Iterator spill supporting radix sort with null prefix

2019-10-30 Thread dzcxzl (Jira)
dzcxzl created SPARK-29657:
--

 Summary: Iterator spill supporting radix sort with null prefix
 Key: SPARK-29657
 URL: https://issues.apache.org/jira/browse/SPARK-29657
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: dzcxzl


In the case of radix sort, when insertRecord is called with a null keyPrefix, 
the iterator returned by getSortedIterator is a ChainedIterator.
Currently ChainedIterator does not support spilling, so UnsafeExternalSorter 
can take up a lot of execution memory, allocatePage fails, and the task throws 
SparkOutOfMemoryError: Unable to acquire xxx bytes of memory, got 0.
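
A rough sketch (not from the reporter) of the kind of job that can take this 
code path, assuming an existing SparkSession named spark: a large sort whose 
key column contains nulls, so some records are inserted with a null key prefix. 
Row count and output path are illustrative only.
{code:scala}
import org.apache.spark.sql.functions._

// Nulls in the sort key mean some records carry a null key prefix.
val df = spark.range(0, 200000000L)
  .withColumn("key", when(col("id") % 10 === 0, lit(null).cast("long")).otherwise(col("id")))

// Per the description above, the resulting ChainedIterator cannot spill,
// so the sort may hold a large amount of execution memory.
df.sortWithinPartitions("key")
  .write.mode("overwrite").parquet("/tmp/sorted_out")
{code}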

The following is a log of an error we encountered in the production environment.

{noformat}
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Memory used in task 66055
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@39dd866e: 64.0 KB
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@74d17927: 4.6 GB
[Executor task launch worker for task 66055] INFO TaskMemoryManager: Acquired by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@31478f9c: 61.0 MB
[Executor task launch worker for task 66055] INFO TaskMemoryManager: 0 bytes of memory were used by task 66055 but are not associated with specific consumers
[Executor task launch worker for task 66055] INFO TaskMemoryManager: 4962998749 bytes of memory are used for execution and 2218326 bytes of memory are used for storage
[Executor task launch worker for task 66055] ERROR Executor: Exception in task 42.3 in stage 29.0 (TID 66055)
SparkOutOfMemoryError: Unable to acquire 3436 bytes of memory, got 0
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29656) ML algs expose aggregationDepth

2019-10-30 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-29656:


 Summary: ML algs expose aggregationDepth
 Key: SPARK-29656
 URL: https://issues.apache.org/jira/browse/SPARK-29656
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 3.0.0
Reporter: zhengruifeng


SVC/LoR/LiR/AFT already expose the expert param aggregationDepth to end users.

It would be nice to expose it in the other algorithms as well.
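
For context, a minimal sketch of how this expert param is already used on one of 
those algorithms, assuming a DataFrame named training with the usual 
label/features columns:
{code:scala}
import org.apache.spark.ml.classification.LogisticRegression

// aggregationDepth controls the depth of the treeAggregate used to combine
// per-partition gradient and summary statistics; a deeper tree can help when
// there are many partitions.
val lor = new LogisticRegression()
  .setMaxIter(10)
  .setAggregationDepth(4)

// 'training' is an assumed DataFrame with "label" and "features" columns.
val model = lor.fit(training)
{code}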



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-29179) Aggregated Metrics by Executor table has refresh issue

2019-10-30 Thread Aman Omer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Omer updated SPARK-29179:
--
Comment: was deleted

(was: Working on this.)

> Aggregated Metrics by Executor table has refresh issue
> --
>
> Key: SPARK-29179
> URL: https://issues.apache.org/jira/browse/SPARK-29179
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
> Attachments: AggregatedMetricsRefresh Issue1.png, 
> AggregatedMetricsRefresh Issue2.png
>
>
> The Aggregated Metrics by Executor table is not always refreshed with 
> details.
>  
> Steps:
> Create a job:
> create table emp (id int);
> insert into emp values(100);
> select * from emp;
> Go to the Job page in the Web UI.
> Click on the job description on the Job page, then go to the Stage page.
> The Aggregated Metrics by Executor table is not refreshed properly. If the 
> user repeats the above steps 2-3 times, an empty table is shown, as in the 
> screenshots attached to this JIRA.
> Browser Used: Chrome Version 76.0.3809.132 (Official Build) (64-bit)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29655) Prefer bucket join if adaptive execution is enabled and maxNumPostShufflePartitions != bucket number

2019-10-30 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962893#comment-16962893
 ] 

Yuming Wang commented on SPARK-29655:
-

cc [~Jk_Self]

> Prefer bucket join if adaptive execution is enabled and 
> maxNumPostShufflePartitions != bucket number
> 
>
> Key: SPARK-29655
> URL: https://issues.apache.org/jira/browse/SPARK-29655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Prefer bucketing join if adaptive execution is enabled and 
> maxNumPostShufflePartitions != bucket number.  How to reproduce:
> {code:scala}
> import org.apache.spark.sql.SaveMode
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
> spark.conf.set("spark.sql.shuffle.partitions", 4)
> val bucketedTableName = "bucketed_table"
> spark.range(10).write.bucketBy(4, 
> "id").sortBy("id").mode(SaveMode.Overwrite).saveAsTable(bucketedTableName)
> val bucketedTable = spark.table(bucketedTableName)
> val df = spark.range(4)
> df.join(bucketedTable, "id").explain()
> spark.conf.set("spark.sql.adaptive.enabled", true)
> spark.conf.set("spark.sql.adaptive.shuffle.maxNumPostShufflePartitions", 5)
> df.join(bucketedTable, "id").explain()
> {code}
> Output:
> {noformat}
> == Physical Plan ==
> AdaptiveSparkPlan(isFinalPlan=false)
> +- Project [id#5L]
>+- SortMergeJoin [id#5L], [id#3L], Inner
>   :- Sort [id#5L ASC NULLS FIRST], false, 0
>   :  +- Exchange hashpartitioning(id#5L, 5), true, [id=#92]
>   : +- Range (0, 4, step=1, splits=16)
>   +- Sort [id#3L ASC NULLS FIRST], false, 0
>  +- Exchange hashpartitioning(id#3L, 5), true, [id=#93]
> +- Project [id#3L]
>+- Filter isnotnull(id#3L)
>   +- FileScan parquet default.bucketed_table[id#3L] Batched: 
> true, DataFilters: [isnotnull(id#3L)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/root/spark-3.0.0-preview-bin-hadoop3.2/spark-warehouse/bucketed_table],
>  PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: 
> struct<id:bigint>, SelectedBucketsCount: 4 out of 4
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29655) Prefer bucket join if adaptive execution is enabled and maxNumPostShufflePartitions != bucket number

2019-10-30 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-29655:
---

 Summary: Prefer bucket join if adaptive execution is enabled and 
maxNumPostShufflePartitions != bucket number
 Key: SPARK-29655
 URL: https://issues.apache.org/jira/browse/SPARK-29655
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


Prefer bucketing join if adaptive execution is enabled and 
maxNumPostShufflePartitions != bucket number.  How to reproduce:
{code:scala}
import org.apache.spark.sql.SaveMode

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
spark.conf.set("spark.sql.shuffle.partitions", 4)

val bucketedTableName = "bucketed_table"
spark.range(10).write.bucketBy(4, 
"id").sortBy("id").mode(SaveMode.Overwrite).saveAsTable(bucketedTableName)
val bucketedTable = spark.table(bucketedTableName)
val df = spark.range(4)

df.join(bucketedTable, "id").explain()

spark.conf.set("spark.sql.adaptive.enabled", true)
spark.conf.set("spark.sql.adaptive.shuffle.maxNumPostShufflePartitions", 5)
df.join(bucketedTable, "id").explain()
{code}

Output:
{noformat}
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- Project [id#5L]
   +- SortMergeJoin [id#5L], [id#3L], Inner
  :- Sort [id#5L ASC NULLS FIRST], false, 0
  :  +- Exchange hashpartitioning(id#5L, 5), true, [id=#92]
  : +- Range (0, 4, step=1, splits=16)
  +- Sort [id#3L ASC NULLS FIRST], false, 0
 +- Exchange hashpartitioning(id#3L, 5), true, [id=#93]
+- Project [id#3L]
   +- Filter isnotnull(id#3L)
  +- FileScan parquet default.bucketed_table[id#3L] Batched: 
true, DataFilters: [isnotnull(id#3L)], Format: Parquet, Location: 
InMemoryFileIndex[file:/root/spark-3.0.0-preview-bin-hadoop3.2/spark-warehouse/bucketed_table],
 PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: 
struct<id:bigint>, SelectedBucketsCount: 4 out of 4
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29636) Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp

2019-10-30 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962865#comment-16962865
 ] 

Aman Omer edited comment on SPARK-29636 at 10/30/19 9:51 AM:
-

[~DylanGuedes]

 

Query 1-
{code:java}
select cast ('2000-10-19 10:23:54+01' as timestamp);{code}
PostgreSQL: 19.10.2000 10:23:54

SparkSQL: 2000-10-19 14:53:54

 

Query 2-
{code:java}
select cast ('11:00 BST' as timestamp);{code}
PostgreSQL: invalid input syntax for type timestamp: "11:00 BST"

SparkSQL: NULL

 

However, the output of PostgreSQL differs from Spark SQL's, but Spark is able 
to parse both queries.

 


was (Author: aman_omer):
[~DylanGuedes]

Query 1-

 
{code:java}
select cast ('2000-10-19 10:23:54+01' as timestamp);{code}
PostgreSQL: 19.10.2000 10:23:54

 

SparkSQL: 2000-10-19 14:53:54

 

Query 2-
{code:java}
select cast ('11:00 BST' as timestamp);{code}
PostgreSQL: invalid input syntax for type timestamp: "11:00 BST"

SparkSQL: NULL

 

However, the output of PostgreSQL differs from Spark SQL's, but Spark is able 
to parse both queries.

 

> Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp
> ---
>
> Key: SPARK-29636
> URL: https://issues.apache.org/jira/browse/SPARK-29636
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark can't parse a string such as '11:00 BST' or '2000-10-19 
> 10:23:54+01' to timestamp:
> {code:sql}
> spark-sql> select cast ('11:00 BST' as timestamp);
> NULL
> Time taken: 2.248 seconds, Fetched 1 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29636) Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp

2019-10-30 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962865#comment-16962865
 ] 

Aman Omer commented on SPARK-29636:
---

[~DylanGuedes]

Query 1-

 
{code:java}
select cast ('2000-10-19 10:23:54+01' as timestamp);{code}
PostgreSQL: 19.10.2000 10:23:54

 

SparkSQL: 2000-10-19 14:53:54

 

Query 2-
{code:java}
select cast ('11:00 BST' as timestamp);{code}
PostgreSQL: invalid input syntax for type timestamp: "11:00 BST"

SparkSQL: NULL

 

However, the output of PostgreSQL differs from Spark SQL's, but Spark is able 
to parse both queries.

 

> Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp
> ---
>
> Key: SPARK-29636
> URL: https://issues.apache.org/jira/browse/SPARK-29636
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark can't parse a string such as '11:00 BST' or '2000-10-19 
> 10:23:54+01' to timestamp:
> {code:sql}
> spark-sql> select cast ('11:00 BST' as timestamp);
> NULL
> Time taken: 2.248 seconds, Fetched 1 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29654) Add configuration to allow disabling registration of static sources to the metrics system

2019-10-30 Thread Luca Canali (Jira)
Luca Canali created SPARK-29654:
---

 Summary: Add configuration to allow disabling registration of 
static sources to the metrics system
 Key: SPARK-29654
 URL: https://issues.apache.org/jira/browse/SPARK-29654
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Luca Canali


The Spark metrics system produces many different metrics, and not all of them 
are used at the same time. This proposes introducing a configuration parameter 
to allow disabling the registration of metrics in the "static sources" 
category, in order to reduce the load and clutter on the sink in cases where 
the metrics in question are not needed. The metrics registered as "static 
sources" are under the CodeGenerator and HiveExternalCatalog namespaces and can 
produce a significant amount of data, as they are registered for both the 
driver and the executors.
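
As a sketch of what this could look like for an end user, with a hypothetical 
configuration name used only for illustration (the actual name is whatever the 
proposed change introduces):
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// "spark.metrics.staticSources.enabled" is a placeholder property name.
val conf = new SparkConf()
  .setAppName("metrics-demo")
  .set("spark.metrics.staticSources.enabled", "false") // skip CodeGenerator / HiveExternalCatalog sources

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}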



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29604) SessionState is initialized with isolated classloader for Hive if spark.sql.hive.metastore.jars is being set

2019-10-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962859#comment-16962859
 ] 

Dongjoon Hyun commented on SPARK-29604:
---

BTW, [~kabhwan]. Could you check the old version (at least `2.3.x`) and update 
`Affects Version/s:`, too?

> SessionState is initialized with isolated classloader for Hive if 
> spark.sql.hive.metastore.jars is being set
> 
>
> Key: SPARK-29604
> URL: https://issues.apache.org/jira/browse/SPARK-29604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> I've observed that external listeners cannot be loaded properly when we run 
> spark-sql with the "spark.sql.hive.metastore.jars" configuration set.
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: Error while 
> instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1102)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:154)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:153)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:153)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:150)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$2.apply(SparkSession.scala:104)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:104)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:103)
>   at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:149)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$client(HiveClientImpl.scala:282)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:306)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:247)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:246)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:296)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:386)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
>   at 
> org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:53)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> 
