[jira] [Assigned] (SPARK-31957) cleanup hive scratch dir should work for the developer api startWithContext

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31957:
-

Assignee: Kent Yao

> cleanup hive scratch dir should work for the developer api startWithContext
> ---
>
> Key: SPARK-31957
> URL: https://issues.apache.org/jira/browse/SPARK-31957
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Compared with the long-running ThriftServer launched via the start script, we are
> more likely to hit the issue https://issues.apache.org/jira/browse/HIVE-10415 /
> https://issues.apache.org/jira/browse/SPARK-31626 when using the developer API
> startWithContext.
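> For context, a minimal sketch of how this developer API is typically invoked
> (hedged; assumes a SparkSession built with Hive support, and the app name is
> illustrative):
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
>
> val spark = SparkSession.builder()
>   .appName("embedded-thrift-server")   // illustrative name
>   .enableHiveSupport()
>   .getOrCreate()
>
> // Start a Thrift server inside this application instead of via the start script;
> // this is the code path where the Hive scratch dir cleanup should also apply.
> HiveThriftServer2.startWithContext(spark.sqlContext)
> {code}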






[jira] [Resolved] (SPARK-31957) cleanup hive scratch dir should work for the developer api startWithContext

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31957.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28784
[https://github.com/apache/spark/pull/28784]

> cleanup hive scratch dir should work for the developer api startWithContext
> ---
>
> Key: SPARK-31957
> URL: https://issues.apache.org/jira/browse/SPARK-31957
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> Compared with the long-running ThriftServer launched via the start script, we are
> more likely to hit the issue https://issues.apache.org/jira/browse/HIVE-10415 /
> https://issues.apache.org/jira/browse/SPARK-31626 when using the developer API
> startWithContext.






[jira] [Resolved] (SPARK-32021) make_interval does not accept seconds >100

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32021.
---
Fix Version/s: 3.1.0
 Assignee: Maxim Gekk
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/28873

> make_interval does not accept seconds >100
> --
>
> Key: SPARK-32021
> URL: https://issues.apache.org/jira/browse/SPARK-32021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> In make_interval(years, months, weeks, days, hours, mins, secs), secs is
> defined as Decimal(8, 6), so the expression returns null whenever the seconds
> value reaches 100 or more.
> Larger seconds values should be allowed.
> This has been reported by Simba, who wants to use make_interval to implement
> the translation of the TIMESTAMP_ADD ODBC function in Spark 3.0.
> ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp)} fails when integer_exp
> yields seconds values >= 100.
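> A minimal spark-shell illustration of the limit (hedged sketch; the 3.0.0
> behaviour is as described above):
> {code:scala}
> // A seconds value below 100 fits in Decimal(8, 6)
> spark.sql("select make_interval(0, 0, 0, 0, 0, 0, 99.999999)").show(false)
>
> // A seconds value of 100 or more overflows Decimal(8, 6) and comes back as NULL in 3.0.0
> spark.sql("select make_interval(0, 0, 0, 0, 0, 0, 100)").show(false)
> {code}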






[jira] [Updated] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31980:
--
Affects Version/s: 2.4.0
   2.4.1
   2.4.2
   2.4.3
   2.4.5
   2.4.6

> Spark sequence() fails if start and end of range are identical dates
> 
>
> Key: SPARK-31980
> URL: https://issues.apache.org/jira/browse/SPARK-31980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6
> Environment: Spark 2.4.4 standalone and on AWS EMR
>Reporter: Dave DeCaprio
>Assignee: JinxinTang
>Priority: Minor
> Fix For: 3.0.1, 3.1.0, 2.4.7
>
>
>  
> The following Spark SQL query throws an exception
> {code:java}
> select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), 
> interval 1 month)
> {code}
> The error is:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>  at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92)
>  at org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681)
>  at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514)
>  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389)
> {noformat}
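> For reference, the expected (non-failing) behaviour would be a single-element
> sequence (hedged sketch):
> {code:scala}
> // With identical start and end dates the sequence should contain just the start date
> spark.sql("""select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date),
>   interval 1 month)""").show(false)
> // expected: [2011-03-01]
> {code}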






[jira] [Comment Edited] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-06-19 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140898#comment-17140898
 ] 

Navin Viswanath edited comment on SPARK-30876 at 6/20/20, 2:27 AM:
---

[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this for the following query:

 
{noformat}
val x = testRelation.subquery('x)
val y = testRelation1.subquery('y)
val z = testRelation.subquery('z)
val query = x.join(y).join(z)
  .where(("x.a".attr === "y.b".attr) && ("y.b".attr === "z.c".attr) && 
("z.c".attr === 1)){noformat}
 

Unoptimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
 :- Join Inner
 : :- SubqueryAlias x
 : : +- LocalRelation , [a#0, b#1, c#2]
 : +- SubqueryAlias y
 : +- LocalRelation , [d#3]
 +- SubqueryAlias z
 +- LocalRelation , [a#0, b#1, c#2]{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
 :- Join Inner
 : :- LocalRelation , [a#0, b#1, c#2]
 : +- LocalRelation , [d#3]
 +- LocalRelation , [a#0, b#1, c#2]{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 


was (Author: navinvishy):
[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this.

Unoptimized:

 
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
 :- Join Inner
 : :- SubqueryAlias x
 : : +- LocalRelation , [a#0, b#1, c#2]
 : +- SubqueryAlias y
 : +- LocalRelation , [d#3]
 +- SubqueryAlias z
 +- LocalRelation , [a#0, b#1, c#2]{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
 :- Join Inner
 : :- LocalRelation , [a#0, b#1, c#2]
 : +- LocalRelation , [d#3]
 +- LocalRelation , [a#0, b#1, c#2]{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> 

[jira] [Resolved] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31980.
---
Fix Version/s: 3.1.0
   2.4.7
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28819
[https://github.com/apache/spark/pull/28819]

> Spark sequence() fails if start and end of range are identical dates
> 
>
> Key: SPARK-31980
> URL: https://issues.apache.org/jira/browse/SPARK-31980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Spark 2.4.4 standalone and on AWS EMR
>Reporter: Dave DeCaprio
>Assignee: JinxinTang
>Priority: Minor
> Fix For: 3.0.1, 2.4.7, 3.1.0
>
>
>  
> The following Spark SQL query throws an exception
> {code:java}
> select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), 
> interval 1 month)
> {code}
> The error is:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>  at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92)
>  at org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681)
>  at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514)
>  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389)
> {noformat}






[jira] [Assigned] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31980:
-

Assignee: JinxinTang

> Spark sequence() fails if start and end of range are identical dates
> 
>
> Key: SPARK-31980
> URL: https://issues.apache.org/jira/browse/SPARK-31980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Spark 2.4.4 standalone and on AWS EMR
>Reporter: Dave DeCaprio
>Assignee: JinxinTang
>Priority: Minor
>
>  
> The following Spark SQL query throws an exception
> {code:java}
> select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), 
> interval 1 month)
> {code}
> The error is:
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>  at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92)
>  at org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681)
>  at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514)
>  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389)
> {noformat}






[jira] [Assigned] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32030:


Assignee: (was: Apache Spark)

> Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
> ---
>
> Key: SPARK-32030
> URL: https://issues.apache.org/jira/browse/SPARK-32030
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Xianyin Xin
>Priority: Major
>
> Now the {{MERGE INTO}} syntax is,
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON <merge_condition>
>  [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
>  [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
>  [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]{code}
> It would be nice to support an unlimited number of {{MATCHED}} and {{NOT MATCHED}}
> clauses in the {{MERGE INTO}} statement, because users may want to handle
> different "{{AND <condition>}}"s, much like a series of
> "{{CASE WHEN}}"s. The expected syntax looks like
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON <merge_condition>
>  [when_clause [, ...]]
> {code}
>  where {{when_clause}} is
> {code:java}
> WHEN MATCHED [ AND <condition> ] THEN <matched_action>{code}
> or
> {code:java}
> WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action>{code}
>  






[jira] [Assigned] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32030:


Assignee: Apache Spark

> Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
> ---
>
> Key: SPARK-32030
> URL: https://issues.apache.org/jira/browse/SPARK-32030
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Xianyin Xin
>Assignee: Apache Spark
>Priority: Major
>
> Now the {{MERGE INTO}} syntax is,
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON <merge_condition>
>  [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
>  [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
>  [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]{code}
> It would be nice to support an unlimited number of {{MATCHED}} and {{NOT MATCHED}}
> clauses in the {{MERGE INTO}} statement, because users may want to handle
> different "{{AND <condition>}}"s, much like a series of
> "{{CASE WHEN}}"s. The expected syntax looks like
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON <merge_condition>
>  [when_clause [, ...]]
> {code}
>  where {{when_clause}} is
> {code:java}
> WHEN MATCHED [ AND <condition> ] THEN <matched_action>{code}
> or
> {code:java}
> WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action>{code}
>  






[jira] [Commented] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140908#comment-17140908
 ] 

Apache Spark commented on SPARK-32030:
--

User 'xianyinxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/28875

> Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
> ---
>
> Key: SPARK-32030
> URL: https://issues.apache.org/jira/browse/SPARK-32030
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Xianyin Xin
>Priority: Major
>
> Now the {{MERGE INTO}} syntax is,
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON <merge_condition>
>  [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
>  [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
>  [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]{code}
> It would be nice to support an unlimited number of {{MATCHED}} and {{NOT MATCHED}}
> clauses in the {{MERGE INTO}} statement, because users may want to handle
> different "{{AND <condition>}}"s, much like a series of
> "{{CASE WHEN}}"s. The expected syntax looks like
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON <merge_condition>
>  [when_clause [, ...]]
> {code}
>  where {{when_clause}} is
> {code:java}
> WHEN MATCHED [ AND <condition> ] THEN <matched_action>{code}
> or
> {code:java}
> WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action>{code}
>  






[jira] [Commented] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140900#comment-17140900
 ] 

Apache Spark commented on SPARK-32036:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/28874

> Remove references to "blacklist"/"whitelist" language (outside of 
> blacklisting feature)
> ---
>
> Key: SPARK-32036
> URL: https://issues.apache.org/jira/browse/SPARK-32036
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist" and 
> "whitelist". While it seems to me that there is some valid debate as to 
> whether these terms have racist origins, the cultural connotations are 
> inescapable in today's world.
> Renaming the entire blacklisting feature would be a large effort with lots of 
> care needed to maintain public-facing APIs and configurations. Though I think 
> this will be a very rewarding effort for which I've filed SPARK-32037, I'd 
> like to start by tackling all of the other references to such terminology in 
> the codebase, of which there are still dozens or hundreds beyond the 
> blacklisting feature.
> I'm not sure what the best "Component" is for this so I put Spark Core for 
> now.






[jira] [Assigned] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32036:


Assignee: Apache Spark

> Remove references to "blacklist"/"whitelist" language (outside of 
> blacklisting feature)
> ---
>
> Key: SPARK-32036
> URL: https://issues.apache.org/jira/browse/SPARK-32036
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Apache Spark
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist" and 
> "whitelist". While it seems to me that there is some valid debate as to 
> whether these terms have racist origins, the cultural connotations are 
> inescapable in today's world.
> Renaming the entire blacklisting feature would be a large effort with lots of 
> care needed to maintain public-facing APIs and configurations. Though I think 
> this will be a very rewarding effort for which I've filed SPARK-32037, I'd 
> like to start by tackling all of the other references to such terminology in 
> the codebase, of which there are still dozens or hundreds beyond the 
> blacklisting feature.
> I'm not sure what the best "Component" is for this so I put Spark Core for 
> now.






[jira] [Commented] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140899#comment-17140899
 ] 

Apache Spark commented on SPARK-32036:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/28874

> Remove references to "blacklist"/"whitelist" language (outside of 
> blacklisting feature)
> ---
>
> Key: SPARK-32036
> URL: https://issues.apache.org/jira/browse/SPARK-32036
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist" and 
> "whitelist". While it seems to me that there is some valid debate as to 
> whether these terms have racist origins, the cultural connotations are 
> inescapable in today's world.
> Renaming the entire blacklisting feature would be a large effort with lots of 
> care needed to maintain public-facing APIs and configurations. Though I think 
> this will be a very rewarding effort for which I've filed SPARK-32037, I'd 
> like to start by tackling all of the other references to such terminology in 
> the codebase, of which there are still dozens or hundreds beyond the 
> blacklisting feature.
> I'm not sure what the best "Component" is for this so I put Spark Core for 
> now.






[jira] [Assigned] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32036:


Assignee: (was: Apache Spark)

> Remove references to "blacklist"/"whitelist" language (outside of 
> blacklisting feature)
> ---
>
> Key: SPARK-32036
> URL: https://issues.apache.org/jira/browse/SPARK-32036
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist" and 
> "whitelist". While it seems to me that there is some valid debate as to 
> whether these terms have racist origins, the cultural connotations are 
> inescapable in today's world.
> Renaming the entire blacklisting feature would be a large effort with lots of 
> care needed to maintain public-facing APIs and configurations. Though I think 
> this will be a very rewarding effort for which I've filed SPARK-32037, I'd 
> like to start by tackling all of the other references to such terminology in 
> the codebase, of which there are still dozens or hundreds beyond the 
> blacklisting feature.
> I'm not sure what the best "Component" is for this so I put Spark Core for 
> now.






[jira] [Updated] (SPARK-32038) Regression in handling NaN values in COUNT(DISTINCT)

2020-06-19 Thread Mithun Radhakrishnan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated SPARK-32038:
-
Description: 
There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
illustration:
{code:scala}
case class Test( uid:String, score:Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(

 Test("mithunr",  Float.NaN),   
 Test("mithunr",  POS_NAN_1),
 Test("mithunr",  POS_NAN_2),
 Test("abellina", 1.0f),
 Test("abellina", 2.0f)

).toDF.createOrReplaceTempView("mytable")

spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 
1 asc ").show
{code}
Here are the results under Spark 3.0.0:
{code:java|title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}
Note that the count against {{mithunr}} is {{3}}, accounting for each distinct 
value for {{NaN}}.
 The right results are returned when another aggregation is added to the GBY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql(" select uid, count(distinct score), max(score) from mytable 
group by 1 order by 1 asc ").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}
Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql(" select uid, count(distinct score) from mytable group by 1 
order by 1 asc ").show

+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}

  was:
There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
illustration:
{code:scala}
case class Test( uid:String, score:Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(

 Test("mithunr",  Float.NaN),   
 Test("mithunr",  POS_NAN_1),
 Test("mithunr",  POS_NAN_2),
 Test("abellina", 1.0f),
 Test("abellina", 2.0f)

).toDF.createOrReplaceTempView("mytable")

spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 
1 asc ").show
{code}
Here are the results under Spark 3.0.0:
{code:java|title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}
Note that the count against {{mithunr}} is {{3}}, accounting for each distinct 
value for {{NaN}}.
 The right results are returned when another aggregation is added to the GBY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql(" select uid, count(distinct score), max(score) from mytable 
group by 1 order by 1 asc ").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}
Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql(" select uid, count(distinct score) from mytable 
group by 1 order by 1 asc ").show

+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}


> Regression in handling NaN values in COUNT(DISTINCT)
> 
>
> Key: SPARK-32038
> URL: https://issues.apache.org/jira/browse/SPARK-32038
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Priority: Major
>
> There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
> values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
> illustration:
> {code:scala}
> case class Test( uid:String, score:Float)
> val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
> val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)
> val rows = Seq(
>  Test("mithunr",  

[jira] [Updated] (SPARK-32038) Regression in handling NaN values in COUNT(DISTINCT)

2020-06-19 Thread Mithun Radhakrishnan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated SPARK-32038:
-
Description: 
There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
illustration:
{code:scala}
case class Test( uid:String, score:Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(

 Test("mithunr",  Float.NaN),   
 Test("mithunr",  POS_NAN_1),
 Test("mithunr",  POS_NAN_2),
 Test("abellina", 1.0f),
 Test("abellina", 2.0f)

).toDF.createOrReplaceTempView("mytable")

spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 
1 asc ").show
{code}
Here are the results under Spark 3.0.0:
{code:java|title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}
Note that the count against {{mithunr}} is {{3}}, accounting for each distinct 
value for {{NaN}}.
 The right results are returned when another aggregation is added to the GBY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql(" select uid, count(distinct score), max(score) from mytable 
group by 1 order by 1 asc ").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}
Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql(" select uid, count(distinct score) from mytable 
group by 1 order by 1 asc ").show

+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}

  was:
There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
illustration:
{code:scala}
case class Test( uid:String, score:Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(

 Test("mithunr",  Float.NaN),   
 Test("mithunr",  POS_NAN_1),
 Test("mithunr",  POS_NAN_2),
 Test("abellina", 1.0f),
 Test("abellina", 2.0f)

).toDF.createOrReplaceTempView("mytable")

spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 
1 asc ").show
{code}

Here are the results under Spark 3.0.0:
{code:title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}

Note that the count against {{mithunr}} is {{3}}, accounting for each distinct 
value for {{NaN}}.
The right results are returned when another aggregation is added to the GBY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql(" select uid, count(distinct score), max(score) from mytable 
group by 1 order by 1 asc ").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}

Also, note that Spark 2.4.6 normalizes the `DISTINCT` expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql(" select uid, count(distinct score) from mytable 
group by 1 order by 1 asc ").show

+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}


> Regression in handling NaN values in COUNT(DISTINCT)
> 
>
> Key: SPARK-32038
> URL: https://issues.apache.org/jira/browse/SPARK-32038
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Priority: Major
>
> There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
> values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
> illustration:
> {code:scala}
> case class Test( uid:String, score:Float)
> val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
> val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)
> val rows = Seq(
>  Test("mithunr",  

[jira] [Commented] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-06-19 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140898#comment-17140898
 ] 

Navin Viswanath commented on SPARK-30876:
-

[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this.

Unoptimized:

 
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
 :- Join Inner
 : :- SubqueryAlias x
 : : +- LocalRelation , [a#0, b#1, c#2]
 : +- SubqueryAlias y
 : +- LocalRelation , [d#3]
 +- SubqueryAlias z
 +- LocalRelation , [a#0, b#1, c#2]{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
 :- Join Inner
 : :- LocalRelation , [a#0, b#1, c#2]
 : +- LocalRelation , [d#3]
 +- LocalRelation , [a#0, b#1, c#2]{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> *HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition
>+- *HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *Project
>  +- *SortMergeJoin [b#19], [c#23], Inner
> :- *Project [b#19]
> :  +- *SortMergeJoin [a#15], [b#19], Inner
> : :- *Sort [a#15 ASC NULLS FIRST], false, 0
> : :  +- Exchange hashpartitioning(a#15, 200)
> : : +- *Filter (isnotnull(a#15) && (a#15 = 1))
> : :+- HiveTableScan [a#15], HiveTableRelation 
> `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, 
> b#16, c#17]
> : +- *Sort [b#19 ASC NULLS FIRST], false, 0
> :+- Exchange hashpartitioning(b#19, 200)
> :   +- *Filter (isnotnull(b#19) && (b#19 = 1))
> :  +- HiveTableScan [b#19], HiveTableRelation 
> `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, 
> b#19, c#20]
> +- *Sort [c#23 ASC NULLS FIRST], false, 0
>   

[jira] [Created] (SPARK-32038) Regression in handling NaN values in COUNT(DISTINCT)

2020-06-19 Thread Mithun Radhakrishnan (Jira)
Mithun Radhakrishnan created SPARK-32038:


 Summary: Regression in handling NaN values in COUNT(DISTINCT)
 Key: SPARK-32038
 URL: https://issues.apache.org/jira/browse/SPARK-32038
 Project: Spark
  Issue Type: Bug
  Components: Optimizer, SQL
Affects Versions: 3.0.0
Reporter: Mithun Radhakrishnan


There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} 
values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an 
illustration:
{code:scala}
case class Test( uid:String, score:Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(

 Test("mithunr",  Float.NaN),   
 Test("mithunr",  POS_NAN_1),
 Test("mithunr",  POS_NAN_2),
 Test("abellina", 1.0f),
 Test("abellina", 2.0f)

).toDF.createOrReplaceTempView("mytable")

spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 
1 asc ").show
{code}

Here are the results under Spark 3.0.0:
{code:title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}

Note that the count against {{mithunr}} is {{3}}, accounting for each distinct 
value for {{NaN}}.
The right results are returned when another aggregation is added to the GBY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql(" select uid, count(distinct score), max(score) from mytable 
group by 1 order by 1 asc ").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}

Also, note that Spark 2.4.6 normalizes the `DISTINCT` expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql(" select uid, count(distinct score) from mytable 
group by 1 order by 1 asc ").show

+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}
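
For reference, a small sketch of why a plain distinct over-counts here (assuming the
corrected bit patterns above): all three values are NaN, yet their raw bit patterns
differ, so they must be normalized before a distinct count.
{code:scala}
val nans = Seq(
  Float.NaN,
  java.lang.Float.intBitsToFloat(0x7f800001),
  java.lang.Float.intBitsToFloat(0x7fffffff))

// Every value reports isNaN, but the raw bit patterns are distinct
nans.foreach { f =>
  println(s"isNaN=${f.isNaN}, bits=0x${Integer.toHexString(java.lang.Float.floatToRawIntBits(f))}")
}

// Distinct on raw bits keeps 3 values; a NaN-normalizing distinct should keep 1
println(nans.map(f => java.lang.Float.floatToRawIntBits(f)).distinct.size)
{code}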






[jira] [Resolved] (SPARK-31350) Coalesce bucketed tables for join if applicable

2020-06-19 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-31350.
--
Fix Version/s: 3.1.0
 Assignee: Terry Kim
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28123

> Coalesce bucketed tables for join if applicable
> ---
>
> Key: SPARK-31350
> URL: https://issues.apache.org/jira/browse/SPARK-31350
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.1.0
>
>
> The following example of joining two bucketed tables introduces a full 
> shuffle:
> {code:java}
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "0")
> val df1 = (0 until 20).map(i => (i % 5, i % 13, i.toString)).toDF("i", "j", 
> "k")
> val df2 = (0 until 20).map(i => (i % 7, i % 11, i.toString)).toDF("i", "j", 
> "k")
> df1.write.format("parquet").bucketBy(8, "i").saveAsTable("t1")
> df2.write.format("parquet").bucketBy(4, "i").saveAsTable("t2")
> val t1 = spark.table("t1")
> val t2 = spark.table("t2")
> val joined = t1.join(t2, t1("i") === t2("i"))
> joined.explain(true)
> == Physical Plan ==
> *(5) SortMergeJoin [i#44], [i#50], Inner
> :- *(2) Sort [i#44 ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(i#44, 200), true, [id=#105]
> :     +- *(1) Project [i#44, j#45, k#46]
> :        +- *(1) Filter isnotnull(i#44)
> :           +- *(1) ColumnarToRow
> :              +- FileScan parquet default.t1[i#44,j#45,k#46] Batched: true, 
> DataFilters: [isnotnull(i#44)], Format: Parquet, Location: 
> InMemoryFileIndex[...], PartitionFilters: [], PushedFilters: [IsNotNull(i)], 
> ReadSchema: struct, SelectedBucketsCount: 8 out of 8
> +- *(4) Sort [i#50 ASC NULLS FIRST], false, 0
>    +- Exchange hashpartitioning(i#50, 200), true, [id=#115]
>       +- *(3) Project [i#50, j#51, k#52]
>          +- *(3) Filter isnotnull(i#50)
>             +- *(3) ColumnarToRow
>                +- FileScan parquet default.t2[i#50,j#51,k#52] Batched: true, 
> DataFilters: [isnotnull(i#50)], Format: Parquet, Location: 
> InMemoryFileIndex[...], PartitionFilters: [], PushedFilters: [IsNotNull(i)], 
> ReadSchema: struct, SelectedBucketsCount: 4 out of 4
> {code}
> But one side can be coalesced to eliminate the shuffle.
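> A conceptual sketch of the coalescing idea (hedged; not the actual implementation):
> when one bucket count divides the other, each bucket on the smaller-bucketed side
> corresponds to a fixed group of buckets on the larger-bucketed side, so the join can
> read those groups together instead of shuffling.
> {code:scala}
> // Bucket id is hash % numBuckets, and (h % 8) % 4 == h % 4 whenever 4 divides 8,
> // so buckets of t1 (8 buckets) can be grouped to line up with t2 (4 buckets).
> val largerNumBuckets = 8
> val smallerNumBuckets = 4
> require(largerNumBuckets % smallerNumBuckets == 0)
> val coalescedGroups = (0 until largerNumBuckets).groupBy(_ % smallerNumBuckets)
> // e.g. 0 -> Vector(0, 4), 1 -> Vector(1, 5), 2 -> Vector(2, 6), 3 -> Vector(3, 7)
> {code}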






[jira] [Updated] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation

2020-06-19 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated SPARK-32037:

Description: 
As per [discussion on the Spark dev 
list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
 it will be beneficial to remove references to problematic language that can 
alienate potential community members. One such reference is "blacklist". While 
it seems to me that there is some valid debate as to whether this term has 
racist origins, the cultural connotations are inescapable in today's world.

I've created a separate task, SPARK-32036, to remove references outside of this 
feature. Given the large surface area of this feature and the public-facing UI 
/ configs / etc., more care will need to be taken here.

I'd like to start by opening up debate on what the best replacement name would 
be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", 
but I'm not sure that any of them work well for this situation.

  was:
As per [discussion on the Spark dev 
list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
 it will be beneficial to remove references to problematic language that can 
alienate potential community members. One such reference is "blacklist". While 
it seems to me that there is some valid debate as to whether these terms have 
racist origins, the cultural connotations are inescapable in today's world.

I've created a separate task, SPARK-32036, to remove references outside of this 
feature. Given the large surface area of this feature and the public-facing UI 
/ configs / etc., more care will need to be taken here.

I'd like to start by opening up debate on what the best replacement name would 
be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", 
but I'm not sure that any of them work well for this situation.


> Rename blacklisting feature to avoid language with racist connotation
> -
>
> Key: SPARK-32037
> URL: https://issues.apache.org/jira/browse/SPARK-32037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist". 
> While it seems to me that there is some valid debate as to whether this term 
> has racist origins, the cultural connotations are inescapable in today's 
> world.
> I've created a separate task, SPARK-32036, to remove references outside of 
> this feature. Given the large surface area of this feature and the 
> public-facing UI / configs / etc., more care will need to be taken here.
> I'd like to start by opening up debate on what the best replacement name 
> would be. Reject-/deny-/ignore-/block-list are common replacements for 
> "blacklist", but I'm not sure that any of them work well for this situation.






[jira] [Commented] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation

2020-06-19 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140876#comment-17140876
 ] 

Erik Krogen commented on SPARK-32037:
-

+1 from me, I agree that this feature is basically a health tracker.

> Rename blacklisting feature to avoid language with racist connotation
> -
>
> Key: SPARK-32037
> URL: https://issues.apache.org/jira/browse/SPARK-32037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist". 
> While it seems to me that there is some valid debate as to whether this term 
> has racist origins, the cultural connotations are inescapable in today's 
> world.
> I've created a separate task, SPARK-32036, to remove references outside of 
> this feature. Given the large surface area of this feature and the 
> public-facing UI / configs / etc., more care will need to be taken here.
> I'd like to start by opening up debate on what the best replacement name 
> would be. Reject-/deny-/ignore-/block-list are common replacements for 
> "blacklist", but I'm not sure that any of them work well for this situation.






[jira] [Commented] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression

2020-06-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140859#comment-17140859
 ] 

Dongjoon Hyun commented on SPARK-32031:
---

Hi, [~Ngone51]. This is filed as `Improvement`, but the title is `Fix ...`. Is 
this a bug fix?

> Fix the wrong references of PartialMerge/Final AggregateExpression
> --
>
> Key: SPARK-32031
> URL: https://issues.apache.org/jira/browse/SPARK-32031
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> For the PartialMerge/Final AggregateExpression, it should reference the 
> `inputAggBufferAttributes` instead of `aggBufferAttributes` according to 
> `AggUtils.planAggXXX`






[jira] [Resolved] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32033.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28871
[https://github.com/apache/spark/pull/28871]

> Use new poll API in Kafka connector executor side to avoid infinite wait
> 
>
> Key: SPARK-32033
> URL: https://issues.apache.org/jira/browse/SPARK-32033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>
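> For reference on the title (hedged sketch; not Spark's connector code): the old
> KafkaConsumer.poll(long) can block indefinitely while waiting for metadata, whereas
> the newer poll(java.time.Duration) bounds the total wait.
> {code:scala}
> import java.time.Duration
> import org.apache.kafka.clients.consumer.KafkaConsumer
>
> def fetchOnce(consumer: KafkaConsumer[Array[Byte], Array[Byte]]): Unit = {
>   // consumer.poll(1000L)                              // old API, deprecated
>   val records = consumer.poll(Duration.ofMillis(1000)) // new API with a bounded wait
>   records.forEach(r => println(r.offset()))
> }
> {code}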







[jira] [Assigned] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32033:
-

Assignee: Gabor Somogyi

> Use new poll API in Kafka connector executor side to avoid infinite wait
> 
>
> Key: SPARK-32033
> URL: https://issues.apache.org/jira/browse/SPARK-32033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>







[jira] [Commented] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation

2020-06-19 Thread Ryan Blue (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140825#comment-17140825
 ] 

Ryan Blue commented on SPARK-32037:
---

What about "healthy" and "unhealthy"? That's basically what we are trying to 
keep track of -- whether a node is healthy enough to run tasks, or if it should 
not be used for some period of time.

I think "trusted" and "untrusted" may also work, but "healthy" is a bit closer 
to what we want.

> Rename blacklisting feature to avoid language with racist connotation
> -
>
> Key: SPARK-32037
> URL: https://issues.apache.org/jira/browse/SPARK-32037
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist". 
> While it seems to me that there is some valid debate as to whether these 
> terms have racist origins, the cultural connotations are inescapable in 
> today's world.
> I've created a separate task, SPARK-32036, to remove references outside of 
> this feature. Given the large surface area of this feature and the 
> public-facing UI / configs / etc., more care will need to be taken here.
> I'd like to start by opening up debate on what the best replacement name 
> would be. Reject-/deny-/ignore-/block-list are common replacements for 
> "blacklist", but I'm not sure that any of them work well for this situation.






[jira] [Created] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation

2020-06-19 Thread Erik Krogen (Jira)
Erik Krogen created SPARK-32037:
---

 Summary: Rename blacklisting feature to avoid language with racist 
connotation
 Key: SPARK-32037
 URL: https://issues.apache.org/jira/browse/SPARK-32037
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Erik Krogen


As per [discussion on the Spark dev 
list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
 it will be beneficial to remove references to problematic language that can 
alienate potential community members. One such reference is "blacklist". While 
it seems to me that there is some valid debate as to whether these terms have 
racist origins, the cultural connotations are inescapable in today's world.

I've created a separate task, SPARK-32036, to remove references outside of this 
feature. Given the large surface area of this feature and the public-facing UI 
/ configs / etc., more care will need to be taken here.

I'd like to start by opening up debate on what the best replacement name would 
be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", 
but I'm not sure that any of them work well for this situation.






[jira] [Updated] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)

2020-06-19 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated SPARK-32036:

Description: 
As per [discussion on the Spark dev 
list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
 it will be beneficial to remove references to problematic language that can 
alienate potential community members. One such reference is "blacklist" and 
"whitelist". While it seems to me that there is some valid debate as to whether 
these terms have racist origins, the cultural connotations are inescapable in 
today's world.

Renaming the entire blacklisting feature would be a large effort with lots of 
care needed to maintain public-facing APIs and configurations. Though I think 
this will be a very rewarding effort for which I've filed SPARK-32037, I'd like 
to start by tackling all of the other references to such terminology in the 
codebase, of which there are still dozens or hundreds beyond the blacklisting 
feature.


I'm not sure what the best "Component" is for this so I put Spark Core for now.

  was:
As per [discussion on the Spark dev 
list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
 it will be beneficial to remove references to problematic language that can 
alienate potential community members. One such reference is "blacklist" and 
"whitelist". While it seems to me that there is some valid debate as to whether 
these terms have racist origins, the cultural connotations are inescapable in 
today's world.

Renaming the entire blacklisting feature would be a large effort with lots of 
care needed to maintain public-facing APIs and configurations. Though I think 
this will be a very rewarding effort, I'd like to start by tackling all of the 
other references to such terminology in the codebase, of which there are still 
dozens or hundreds beyond the blacklisting feature.


I'm not sure what the best "Component" is for this so I put Spark Core for now.


> Remove references to "blacklist"/"whitelist" language (outside of 
> blacklisting feature)
> ---
>
> Key: SPARK-32036
> URL: https://issues.apache.org/jira/browse/SPARK-32036
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Priority: Minor
>
> As per [discussion on the Spark dev 
> list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
>  it will be beneficial to remove references to problematic language that can 
> alienate potential community members. One such reference is "blacklist" and 
> "whitelist". While it seems to me that there is some valid debate as to 
> whether these terms have racist origins, the cultural connotations are 
> inescapable in today's world.
> Renaming the entire blacklisting feature would be a large effort with lots of 
> care needed to maintain public-facing APIs and configurations. Though I think 
> this will be a very rewarding effort for which I've filed SPARK-32037, I'd 
> like to start by tackling all of the other references to such terminology in 
> the codebase, of which there are still dozens or hundreds beyond the 
> blacklisting feature.
> I'm not sure what the best "Component" is for this so I put Spark Core for 
> now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)

2020-06-19 Thread Erik Krogen (Jira)
Erik Krogen created SPARK-32036:
---

 Summary: Remove references to "blacklist"/"whitelist" language 
(outside of blacklisting feature)
 Key: SPARK-32036
 URL: https://issues.apache.org/jira/browse/SPARK-32036
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Erik Krogen


As per [discussion on the Spark dev 
list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E],
 it will be beneficial to remove references to problematic language that can 
alienate potential community members. One such reference is "blacklist" and 
"whitelist". While it seems to me that there is some valid debate as to whether 
these terms have racist origins, the cultural connotations are inescapable in 
today's world.

Renaming the entire blacklisting feature would be a large effort with lots of 
care needed to maintain public-facing APIs and configurations. Though I think 
this will be a very rewarding effort, I'd like to start by tackling all of the 
other references to such terminology in the codebase, of which there are still 
dozens or hundreds beyond the blacklisting feature.


I'm not sure what the best "Component" is for this so I put Spark Core for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32021) make_interval does not accept seconds >100

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32021:


Assignee: Apache Spark

> make_interval does not accept seconds >100
> --
>
> Key: SPARK-32021
> URL: https://issues.apache.org/jira/browse/SPARK-32021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Assignee: Apache Spark
>Priority: Major
>
> In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
> defined as Decimal(8, 6), which turns into null if the value of the 
> expression overflows 100 seconds.
> Larger seconds values should be allowed.
> This has been reported by Simba, who wants to use make_interval to implement 
> translation for TIMESTAMP_ADD ODBC function in Spark 3.0.
> ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp} fails when integer_exp 
> returns seconds values >= 100.
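
As a quick illustration of the limit described above (a sketch for spark-shell; `spark` is the usual SparkSession and the exact rendering of the result may differ):

{code:scala}
// secs is currently parsed as Decimal(8, 6), so any value of 100 seconds or more
// overflows the type and the whole expression comes back as NULL.
spark.sql("SELECT make_interval(0, 0, 0, 0, 0, 0, 123.5)").show(false)  // NULL under the current definition
spark.sql("SELECT make_interval(0, 0, 0, 0, 0, 1, 39.5)").show(false)   // within Decimal(8, 6) range, returns an interval
{code}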



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32021) make_interval does not accept seconds >100

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140775#comment-17140775
 ] 

Apache Spark commented on SPARK-32021:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28873

> make_interval does not accept seconds >100
> --
>
> Key: SPARK-32021
> URL: https://issues.apache.org/jira/browse/SPARK-32021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
> defined as Decimal(8, 6), which turns into null if the value of the 
> expression overflows 100 seconds.
> Larger seconds values should be allowed.
> This has been reported by Simba, who wants to use make_interval to implement 
> translation for TIMESTAMP_ADD ODBC function in Spark 3.0.
> ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp} fails when integer_exp 
> returns seconds values >= 100.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32021) make_interval does not accept seconds >100

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32021:


Assignee: (was: Apache Spark)

> make_interval does not accept seconds >100
> --
>
> Key: SPARK-32021
> URL: https://issues.apache.org/jira/browse/SPARK-32021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> In make_interval(years, months, weeks, days, hours, mins, secs), secs are 
> defined as Decimal(8, 6), which turns into null if the value of the 
> expression overflows 100 seconds.
> Larger seconds values should be allowed.
> This has been reported by Simba, who wants to use make_interval to implement 
> translation for TIMESTAMP_ADD ODBC function in Spark 3.0.
> ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp} fails when integer_exp 
> returns seconds values >= 100.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7101) Spark SQL should support java.sql.Time

2020-06-19 Thread YoungGyu Chun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132694#comment-17132694
 ] 

YoungGyu Chun edited comment on SPARK-7101 at 6/19/20, 6:35 PM:


I will try to get this done, but there is a ton of work to do ;)


was (Author: younggyuchun):
I will try to get this done but there are a ton of work ;)

> Spark SQL should support java.sql.Time
> --
>
> Key: SPARK-7101
> URL: https://issues.apache.org/jira/browse/SPARK-7101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
> Environment: All
>Reporter: Peter Hagelund
>Priority: Major
>
> Several RDBMSes support the TIME data type; for more exact mapping between 
> those and Spark SQL, support for java.sql.Time with an associated 
> DataType.TimeType would be helpful.
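
Until such a type exists, one possible workaround (a sketch only, not part of this proposal; the helper name is made up) is to carry the time of day as an integer column:

{code:scala}
import java.sql.Time

// Hypothetical helper: represent a JDBC TIME value as seconds since midnight,
// which maps cleanly onto Spark's IntegerType until a dedicated TimeType exists.
def timeToSecondOfDay(t: Time): Int = t.toLocalTime.toSecondOfDay  // 0 .. 86399
{code}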



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29679) Make interval type comparable and orderable

2020-06-19 Thread Bart Samwel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140651#comment-17140651
 ] 

Bart Samwel commented on SPARK-29679:
-

That's a good question. I think it would make sense to do it that way. That 
means that all ANSI SQL-compliant queries will run, and if you mix month-type 
and seconds-type intervals, you get an error as soon as you use any operation 
that depends on them being comparable (including things like GROUP BY). 

> Make interval type comparable and orderable
> ---
>
> Key: SPARK-29679
> URL: https://issues.apache.org/jira/browse/SPARK-29679
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> {code:sql}
> postgres=# select INTERVAL '9 years 1 months -1 weeks -4 days -10 hours -46 
> minutes' > interval '1 s';
>  ?column?
> --
>  t
> (1 row)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31029) Occasional class not found error in user's Future code using global ExecutionContext

2020-06-19 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-31029.
---
Fix Version/s: 3.1.0
 Assignee: shanyu zhao
   Resolution: Fixed

> Occasional class not found error in user's Future code using global 
> ExecutionContext
> 
>
> Key: SPARK-31029
> URL: https://issues.apache.org/jira/browse/SPARK-31029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.4.5
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Fix For: 3.1.0
>
>
> *Problem:*
> When running the TPC-DS test (https://github.com/databricks/spark-sql-perf), we 
> occasionally see an error related to a class not being found:
> 2020-02-04 20:00:26,673 ERROR yarn.ApplicationMaster: User class threw 
> exception: scala.ScalaReflectionException: class 
> com.databricks.spark.sql.perf.ExperimentRun in JavaMirror with 
> sun.misc.Launcher$AppClassLoader@28ba21f3 of type class 
> sun.misc.Launcher$AppClassLoader with classpath [...] 
> and parent being sun.misc.Launcher$ExtClassLoader@3ff5d147 of type class 
> sun.misc.Launcher$ExtClassLoader with classpath [...] 
> and parent being primordial classloader with boot classpath [...] not found.
> *Root cause:*
> The Spark driver starts the ApplicationMaster in the main thread, which starts a user 
> thread and sets MutableURLClassLoader as that thread's ContextClassLoader.
>   userClassThread = startUserApplication()
> The main thread then sets up the YarnSchedulerBackend RPC endpoints, which handle 
> these calls using Scala Futures on the default global ExecutionContext:
> - doRequestTotalExecutors
> - doKillExecutors
> If the main thread starts a future to handle doKillExecutors() before the user 
> thread starts one, the thread pool's ContextClassLoader will be the default 
> (AppClassLoader). 
> If the user thread starts a future first, the pool threads will have 
> MutableURLClassLoader.
> So if the user's code uses a future which references a user-provided class (which 
> only MutableURLClassLoader can load), and an executor is lost before that future 
> runs, you will see class-not-found errors.
> *Proposed Solution:*
> We can potentially solve this problem in one of two ways:
> 1) Set the same class loader (userClassLoader) to both the main thread and 
> user thread in ApplicationMaster.scala
> 2) Do not use "ExecutionContext.Implicits.global" in YarnSchedulerBackend
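
For illustration, a minimal sketch of one possible direction (not the actual Spark patch; `userClassLoader` stands in for the MutableURLClassLoader created for the user class): give those RPC handlers an ExecutionContext whose threads are pinned to a known class loader, instead of relying on ExecutionContext.Implicits.global.

{code:scala}
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.ExecutionContext

// Sketch: an ExecutionContext whose worker threads always carry the given class
// loader, so it no longer matters which thread happens to create the pool first.
def classLoaderAwareEc(userClassLoader: ClassLoader, threads: Int = 4): ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(threads, new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true)
      t.setContextClassLoader(userClassLoader)  // every pool thread sees the user's loader
      t
    }
  }))
{code}

Futures created for doRequestTotalExecutors / doKillExecutors would then run on this context, and user classes stay loadable regardless of which thread touched the pool first.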



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31826) Support composed type of case class for typed Scala UDF

2020-06-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31826.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28645
[https://github.com/apache/spark/pull/28645]

> Support composed type of case class for typed Scala UDF
> ---
>
> Key: SPARK-31826
> URL: https://issues.apache.org/jira/browse/SPARK-31826
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.1.0
>
>
> After SPARK-30127, typed Scala UDFs now support accepting a case class as an 
> input parameter. However, they still do not support composed types such as 
> Seq[T] or Array[T], where T is a case class.
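
For context, a sketch of the two shapes involved (illustrative names only; the second UDF is the composed-type case this ticket adds support for):

{code:scala}
import org.apache.spark.sql.functions.udf

case class Point(x: Double, y: Double)

// Already supported after SPARK-30127: a case class as the input parameter.
val norm = udf((p: Point) => math.sqrt(p.x * p.x + p.y * p.y))

// Target of this ticket: composed types such as Seq of a case class.
val pathLength = udf((ps: Seq[Point]) =>
  ps.sliding(2).collect { case Seq(a, b) => math.hypot(b.x - a.x, b.y - a.y) }.sum)
{code}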



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31826) Support composed type of case class for typed Scala UDF

2020-06-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31826:
---

Assignee: wuyi

> Support composed type of case class for typed Scala UDF
> ---
>
> Key: SPARK-31826
> URL: https://issues.apache.org/jira/browse/SPARK-31826
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> After SPARK-30127, typed Scala UDFs now support accepting a case class as an 
> input parameter. However, they still do not support composed types such as 
> Seq[T] or Array[T], where T is a case class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31056) Add CalendarIntervals division

2020-06-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31056.
--
Resolution: Won't Fix

> Add CalendarIntervals division
> --
>
> Key: SPARK-31056
> URL: https://issues.apache.org/jira/browse/SPARK-31056
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Enrico Minack
>Priority: Major
>
> {{CalendarInterval}} should support division. A {{CalendarInterval}} consists of 
> three time components: {{months}}, {{days}} and {{microseconds}}. Division can 
> only be defined between intervals that each have a single non-zero time 
> component, and only when both intervals have the same non-zero component; 
> otherwise the division expression would be ambiguous.
> This allows evaluating the magnitude of a {{CalendarInterval}} in SQL 
> expressions:
> {code}
> Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 
> 13:30:25")))
>   .toDF("start", "end")
>   .withColumn("interval", $"end" - $"start")
>   .withColumn("interval [h]", $"interval" / lit("1 
> hour").cast(CalendarIntervalType))
>   .withColumn("rate [€/h]", lit(1.45))
>   .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
>   .show(false)
> +-------------------+-------------------+-----------------------------+------------+----------+----------+
> |start              |end                |interval                     |interval [h]|rate [€/h]|price [€] |
> +-------------------+-------------------+-----------------------------+------------+----------+----------+
> |2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 seconds|1.5069      |1.45      |2.18506943|
> +-------------------+-------------------+-----------------------------+------------+----------+----------+
> {code}
> The currently available approach is
> {code}
> Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 
> 13:30:25")))
>   .toDF("start", "end")
>   .withColumn("interval [s]", unix_timestamp($"end") - 
> unix_timestamp($"start"))
>   .withColumn("interval [h]", $"interval [s]" / 3600)
>   .withColumn("rate [€/h]", lit(1.45))
>   .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
>   .show(false)
> {code}
> Going through {{unix_timestamp}} is a hack that pollutes the SQL query with 
> unrelated semantics (the unix timestamp is completely irrelevant for this 
> computation). It is only there because there is currently no way to access the 
> length of a {{CalendarInterval}}. Dividing one interval by another provides a 
> means to measure that length in an arbitrary unit (minutes, hours, quarter 
> hours).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32035) Inconsistent AWS environment variables in documentation

2020-06-19 Thread Ondrej Kokes (Jira)
Ondrej Kokes created SPARK-32035:


 Summary: Inconsistent AWS environment variables in documentation
 Key: SPARK-32035
 URL: https://issues.apache.org/jira/browse/SPARK-32035
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.0.0, 2.4.6
Reporter: Ondrej Kokes


Looking at the actual Scala code, the environment variables used to log into 
AWS are:
 - AWS_ACCESS_KEY_ID
 - AWS_SECRET_ACCESS_KEY
 - AWS_SESSION_TOKEN

These are the same variables that the AWS SDKs themselves use.

However, looking through the Spark documentation and comments, I see that these 
are not named consistently across the board:

docs/cloud-integration.md
 106:1. `spark-submit` reads the `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` *<-- both 
different*
 107:and `AWS_SESSION_TOKEN` environment variables and sets the associated 
authentication options

docs/streaming-kinesis-integration.md
 232:- Set up the environment variables `AWS_ACCESS_KEY_ID` and 
`AWS_SECRET_KEY` with your AWS credentials. *<-- secret key different*

external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py
 34: $ export AWS_ACCESS_KEY_ID=
 35: $ export AWS_SECRET_KEY= *<-- different*
 48: Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret 
key different*

core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
 438: val keyId = System.getenv("AWS_ACCESS_KEY_ID")
 439: val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
 448: val sessionToken = System.getenv("AWS_SESSION_TOKEN")

external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala
 53: * $ export AWS_ACCESS_KEY_ID=
 54: * $ export AWS_SECRET_KEY= *<-- different*
 65: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret 
key different*

external/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java
 59: * $ export AWS_ACCESS_KEY_ID=[your-access-key]
 60: * $ export AWS_SECRET_KEY= *<-- different*
 71: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret 
key different*
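
For reference, a condensed sketch of the lookups quoted from SparkHadoopUtil.scala above (the canonical names the code actually reads):

{code:scala}
val keyId        = sys.env.get("AWS_ACCESS_KEY_ID")
val accessKey    = sys.env.get("AWS_SECRET_ACCESS_KEY")  // not AWS_SECRET_KEY as some docs say
val sessionToken = sys.env.get("AWS_SESSION_TOKEN")      // only needed for temporary credentials
{code}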



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11150) Dynamic partition pruning

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140378#comment-17140378
 ] 

Apache Spark commented on SPARK-11150:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/28872

> Dynamic partition pruning
> -
>
> Key: SPARK-11150
> URL: https://issues.apache.org/jira/browse/SPARK-11150
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.1, 1.6.0, 2.0.0, 2.1.2, 2.2.1, 2.3.0
>Reporter: Younes
>Assignee: Wei Xue
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
> Attachments: image-2019-10-04-11-20-02-616.png
>
>
> Implements dynamic partition pruning by adding a dynamic-partition-pruning 
> filter if there is a partitioned table and a filter on the dimension table. 
> The filter is then planned using a heuristic approach:
>  # As a broadcast relation if it is a broadcast hash join. The broadcast 
> relation will then be transformed into a reused broadcast exchange by the 
> {{ReuseExchange}} rule; or
>  # As a subquery duplicate if the estimated benefit of partition table scan 
> being saved is greater than the estimated cost of the extra scan of the 
> duplicated subquery; otherwise
>  # As a bypassed condition ({{true}}).
>  Below shows a basic example of DPP.
> !image-2019-10-04-11-20-02-616.png|width=521,height=225!
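
As a rough illustration of the query shape DPP targets (table and column names here are hypothetical), a selective filter on a small dimension table is reused to prune partitions of the fact table before it is scanned:

{code:scala}
import org.apache.spark.sql.functions.sum

// Hypothetical tables: `sales` is partitioned by sale_date; `dates` is a small dimension.
val q = spark.table("sales").as("s")
  .join(spark.table("dates").as("d"), "sale_date")
  .where("d.fiscal_quarter = 'Q1-2020'")  // DPP derives a partition filter on sales.sale_date from this
  .groupBy("s.item_id")
  .agg(sum("s.amount"))
{code}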



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11150) Dynamic partition pruning

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140374#comment-17140374
 ] 

Apache Spark commented on SPARK-11150:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/28872

> Dynamic partition pruning
> -
>
> Key: SPARK-11150
> URL: https://issues.apache.org/jira/browse/SPARK-11150
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.1, 1.6.0, 2.0.0, 2.1.2, 2.2.1, 2.3.0
>Reporter: Younes
>Assignee: Wei Xue
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
> Attachments: image-2019-10-04-11-20-02-616.png
>
>
> Implements dynamic partition pruning by adding a dynamic-partition-pruning 
> filter if there is a partitioned table and a filter on the dimension table. 
> The filter is then planned using a heuristic approach:
>  # As a broadcast relation if it is a broadcast hash join. The broadcast 
> relation will then be transformed into a reused broadcast exchange by the 
> {{ReuseExchange}} rule; or
>  # As a subquery duplicate if the estimated benefit of partition table scan 
> being saved is greater than the estimated cost of the extra scan of the 
> duplicated subquery; otherwise
>  # As a bypassed condition ({{true}}).
>  Below shows a basic example of DPP.
> !image-2019-10-04-11-20-02-616.png|width=521,height=225!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140360#comment-17140360
 ] 

Apache Spark commented on SPARK-32033:
--

User 'gaborgsomogyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/28871

> Use new poll API in Kafka connector executor side to avoid infinite wait
> 
>
> Key: SPARK-32033
> URL: https://issues.apache.org/jira/browse/SPARK-32033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32033:


Assignee: Apache Spark

> Use new poll API in Kafka connector executor side to avoid infinite wait
> 
>
> Key: SPARK-32033
> URL: https://issues.apache.org/jira/browse/SPARK-32033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32033:


Assignee: (was: Apache Spark)

> Use new poll API in Kafka connector executor side to avoid infinite wait
> 
>
> Key: SPARK-32033
> URL: https://issues.apache.org/jira/browse/SPARK-32033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140356#comment-17140356
 ] 

Apache Spark commented on SPARK-32033:
--

User 'gaborgsomogyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/28871

> Use new poll API in Kafka connector executor side to avoid infinite wait
> 
>
> Key: SPARK-32033
> URL: https://issues.apache.org/jira/browse/SPARK-32033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16659) use Maven project to submit spark application via yarn-client

2020-06-19 Thread Jack Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Jiang updated SPARK-16659:
---
Description: (was: i want to use spark sql to execute hive sql in my 
maven project,here is the main code:
System.setProperty("hadoop.home.dir",
"D:\\hadoop-common-2.2.0-bin-master");
SparkConf sparkConf = new SparkConf()
.setAppName("test").setMaster("yarn-client");
// .set("hive.metastore.uris", "thrift://172.30.115.59:9083");
SparkContext ctx = new SparkContext(sparkConf);
// ctx.addJar("lib/hive-hbase-handler-0.14.0.2.2.6.0-2800.jar");
HiveContext sqlContext = new 
org.apache.spark.sql.hive.HiveContext(ctx);
String[] tables = sqlContext.tableNames();
for (String tablename : tables) {
System.out.println("tablename : " + tablename);
}
when i run it,it comes to a error:
10:16:17,496  INFO Client:59 - 
 client token: N/A
 diagnostics: Application application_1468409747983_0280 failed 2 times 
due to AM Container for appattempt_1468409747983_0280_02 exited with  
exitCode: -1000
For more detailed output, check application tracking 
page:http://hadoop003.icccuat.com:8088/proxy/application_1468409747983_0280/Then,
 click on links to logs of each attempt.
Diagnostics: File 
file:/C:/Users/uatxj990267/AppData/Local/Temp/spark-8874c486-893d-4ac3-a088-48e4cdb484e1/__spark_conf__9007071161920501082.zip
 does not exist
java.io.FileNotFoundException: File 
file:/C:/Users/uatxj990267/AppData/Local/Temp/spark-8874c486-893d-4ac3-a088-48e4cdb484e1/__spark_conf__9007071161920501082.zip
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:608)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:821)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:598)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:414)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: default
 start time: 1469067373412
 final status: FAILED
 tracking URL: 
http://hadoop003.icccuat.com:8088/cluster/app/application_1468409747983_0280
 user: uatxj990267
10:16:17,496 ERROR SparkContext:96 - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might 
have been killed or unable to launch application master.
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:123)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at com.huateng.test.SparkSqlDemo.main(SparkSqlDemo.java:33)
but when i change this code setMaster("yarn-client") to 
setMaster(local[2]),it's OK?what's wrong with it ?can anyone help me?)

> use Maven project to submit spark application via yarn-client
> -
>
> Key: SPARK-16659
> URL: https://issues.apache.org/jira/browse/SPARK-16659
> Project: Spark
>  Issue Type: Question
>Reporter: Jack Jiang
>Priority: Major
>  Labels: newbie
>




--
This message was sent by Atlassian Jira

[jira] [Commented] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140344#comment-17140344
 ] 

Apache Spark commented on SPARK-32034:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28870

> Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly 
> upon shutdown
> -
>
> Key: SPARK-32034
> URL: https://issues.apache.org/jira/browse/SPARK-32034
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> When stopping the HiveServer2, the non-daemon thread stops the server from 
> terminating
> {code:java}
> "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 
> tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> It also causes the issues described in HIVE-14817.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32034:


Assignee: (was: Apache Spark)

> Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly 
> upon shutdown
> -
>
> Key: SPARK-32034
> URL: https://issues.apache.org/jira/browse/SPARK-32034
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> When stopping the HiveServer2, the non-daemon thread stops the server from 
> terminating
> {code:java}
> "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 
> tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> It also causes the issues described in HIVE-14817.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140343#comment-17140343
 ] 

Apache Spark commented on SPARK-32034:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28870

> Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly 
> upon shutdown
> -
>
> Key: SPARK-32034
> URL: https://issues.apache.org/jira/browse/SPARK-32034
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> When stopping the HiveServer2, the non-daemon thread stops the server from 
> terminating
> {code:java}
> "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 
> tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> It also causes the issues described in HIVE-14817.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32034:


Assignee: Apache Spark

> Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly 
> upon shutdown
> -
>
> Key: SPARK-32034
> URL: https://issues.apache.org/jira/browse/SPARK-32034
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> When stopping the HiveServer2, the non-daemon thread stops the server from 
> terminating
> {code:java}
> "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 
> tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178)
>   at 
> org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> It also causes the issues described in HIVE-14817.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown

2020-06-19 Thread Kent Yao (Jira)
Kent Yao created SPARK-32034:


 Summary: Port HIVE-14817: Shutdown the SessionManager 
timeoutChecker thread properly upon shutdown
 Key: SPARK-32034
 URL: https://issues.apache.org/jira/browse/SPARK-32034
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0, 3.1.0
Reporter: Kent Yao


When stopping the HiveServer2, the non-daemon thread stops the server from 
terminating

{code:java}
"HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 
tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178)
at 
org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


{code}


It also causes the issues described in HIVE-14817.
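
A minimal sketch of the direction of the fix (not the actual HIVE-14817 patch; closeExpiredSessions() is a stand-in for the real timeout check): run the checker on a daemon thread that is shut down and interrupted when the service stops.

{code:scala}
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}

object SessionTimeoutChecker {
  // Hypothetical stand-in for the real expired-session cleanup.
  def closeExpiredSessions(): Unit = ()

  private val checker = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "session-timeout-checker")
      t.setDaemon(true)  // a daemon thread cannot keep the JVM alive on shutdown
      t
    }
  })

  def start(): Unit =
    checker.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = closeExpiredSessions()
    }, 3000L, 3000L, TimeUnit.MILLISECONDS)

  // Cancels the schedule and interrupts a sleeping run, so stop() cannot hang.
  def stop(): Unit = checker.shutdownNow()
}
{code}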




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait

2020-06-19 Thread Gabor Somogyi (Jira)
Gabor Somogyi created SPARK-32033:
-

 Summary: Use new poll API in Kafka connector executor side to 
avoid infinite wait
 Key: SPARK-32033
 URL: https://issues.apache.org/jira/browse/SPARK-32033
 Project: Spark
  Issue Type: Sub-task
  Components: Structured Streaming
Affects Versions: 3.1.0
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32032) Use new poll API in Kafka connector driver side to avoid infinite wait

2020-06-19 Thread Gabor Somogyi (Jira)
Gabor Somogyi created SPARK-32032:
-

 Summary: Use new poll API in Kafka connector driver side to avoid 
infinite wait
 Key: SPARK-32032
 URL: https://issues.apache.org/jira/browse/SPARK-32032
 Project: Spark
  Issue Type: Sub-task
  Components: Structured Streaming
Affects Versions: 3.1.0
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28367) Kafka connector infinite wait because metadata never updated

2020-06-19 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140319#comment-17140319
 ] 

Gabor Somogyi commented on SPARK-28367:
---

I think we can split the problem into two pieces: the driver side and the 
executor side.
The executor side is not problematic and can be handled without the new API.
The driver side requires further consideration and effort. I'm creating subtasks 
and a PR for the executor side.

> Kafka connector infinite wait because metadata never updated
> 
>
> Key: SPARK-28367
> URL: https://issues.apache.org/jira/browse/SPARK-28367
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.3, 2.2.3, 2.3.3, 2.4.3, 3.0.0
>Reporter: Gabor Somogyi
>Priority: Critical
>
> Spark uses an old, deprecated API, poll(long), which never returns and stays in 
> a live lock if metadata is not updated (for instance, when the broker disappears 
> at consumer creation).
> I've created a small standalone application to test it and the alternatives: 
> https://github.com/gaborgsomogyi/kafka-get-assignment
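
For reference, the difference between the two APIs in a standalone consumer (a sketch with made-up connection settings, not the connector code itself): the deprecated poll(long) can wait for assignment metadata without any bound, while poll(java.time.Duration), available since Kafka 2.0, includes the metadata wait in its timeout and therefore always returns.

{code:scala}
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // assumption: a reachable test broker
props.put("group.id", "poll-timeout-demo")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("some-topic"))

// consumer.poll(0L) could block forever if the broker vanishes before metadata arrives;
// poll(Duration) returns (possibly with no records) once the timeout elapses.
val records = consumer.poll(Duration.ofSeconds(2))
consumer.close()
{code}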



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140289#comment-17140289
 ] 

Apache Spark commented on SPARK-32031:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/28869

> Fix the wrong references of PartialMerge/Final AggregateExpression
> --
>
> Key: SPARK-32031
> URL: https://issues.apache.org/jira/browse/SPARK-32031
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> For the PartialMerge/Final AggregateExpression, it should reference the 
> `inputAggBufferAttributes` instead of `aggBufferAttributes` according to 
> `AggUtils.planAggXXX`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32031:


Assignee: Apache Spark

> Fix the wrong references of PartialMerge/Final AggregateExpression
> --
>
> Key: SPARK-32031
> URL: https://issues.apache.org/jira/browse/SPARK-32031
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> For the PartialMerge/Final AggregateExpression, it should reference the 
> `inputAggBufferAttributes` instead of `aggBufferAttributes` according to 
> `AggUtils.planAggXXX`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32031:


Assignee: (was: Apache Spark)

> Fix the wrong references of PartialMerge/Final AggregateExpression
> --
>
> Key: SPARK-32031
> URL: https://issues.apache.org/jira/browse/SPARK-32031
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> For the PartialMerge/Final AggregateExpression, it should reference the 
> `inputAggBufferAttributes` instead of `aggBufferAttributes` according to 
> `AggUtils.planAggXXX`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression

2020-06-19 Thread wuyi (Jira)
wuyi created SPARK-32031:


 Summary: Fix the wrong references of PartialMerge/Final 
AggregateExpression
 Key: SPARK-32031
 URL: https://issues.apache.org/jira/browse/SPARK-32031
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: wuyi


For the PartialMerge/Final AggregateExpression, it should reference the 
`inputAggBufferAttributes` instead of `aggBufferAttributes` according to 
`AggUtils.planAggXXX`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-19 Thread Xianyin Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated SPARK-32030:

Description: 
Now the {{MERGE INTO}} syntax is,
{code:sql}
MERGE INTO [db_name.]target_table [AS target_alias]
 USING [db_name.]source_table [] [AS source_alias]
 ON 
 [ WHEN MATCHED [ AND  ] THEN  ]
 [ WHEN MATCHED [ AND  ] THEN  ]
 [ WHEN NOT MATCHED [ AND  ] THEN  ]{code}
It would be nice to support an unlimited number of {{MATCHED}} and {{NOT MATCHED}} 
clauses in the {{MERGE INTO}} statement, because users may want to deal with 
different "{{AND }}" conditions, the result of which behaves just like a series of 
"{{CASE WHEN}}"s. The expected syntax looks like
{code:sql}
MERGE INTO [db_name.]target_table [AS target_alias]
 USING [db_name.]source_table [] [AS source_alias]
 ON 
 [when_clause [, ...]]
{code}

 where {{when_clause}} is 
{code:java}
WHEN MATCHED [ AND  ] THEN {code}
or
{code:java}
WHEN NOT MATCHED [ AND  ] THEN {code}
 

  was:
Now the MERGE INTO syntax is,
```
MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [] [AS source_alias]
ON 
[ WHEN MATCHED [ AND  ] THEN  ]
[ WHEN MATCHED [ AND  ] THEN  ]
[ WHEN NOT MATCHED [ AND  ]  THEN  ]
```
It would be nice if we support unlimited MATCHED and NOT MATCHED clauses in 
MERGE INTO statement, because users may want to deal with different "AND 
"s, the result of which just like a series of "CASE WHEN"s. The 
expected syntax looks like
```
MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [] [AS source_alias]
ON 
[when_clause [, ...]]
```
where `when_clause` is 
```
WHEN MATCHED [ AND  ] THEN 
```
or
```
WHEN NOT MATCHED [ AND  ]  THEN 
```



> Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
> ---
>
> Key: SPARK-32030
> URL: https://issues.apache.org/jira/browse/SPARK-32030
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Xianyin Xin
>Priority: Major
>
> Now the {{MERGE INTO}} syntax is,
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON 
>  [ WHEN MATCHED [ AND  ] THEN  ]
>  [ WHEN MATCHED [ AND  ] THEN  ]
>  [ WHEN NOT MATCHED [ AND  ] THEN  ]{code}
> It would be nice to support an unlimited number of {{MATCHED}} and {{NOT MATCHED}} 
> clauses in the {{MERGE INTO}} statement, because users may want to deal with 
> different "{{AND }}" conditions, the result of which behaves just like a series of 
> "{{CASE WHEN}}"s. The expected syntax looks like
> {code:sql}
> MERGE INTO [db_name.]target_table [AS target_alias]
>  USING [db_name.]source_table [] [AS source_alias]
>  ON 
>  [when_clause [, ...]]
> {code}
>  where {{when_clause}} is 
> {code:java}
> WHEN MATCHED [ AND  ] THEN {code}
> or
> {code:java}
> WHEN NOT MATCHED [ AND  ] THEN {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-19 Thread Xianyin Xin (Jira)
Xianyin Xin created SPARK-32030:
---

 Summary: Support unlimited MATCHED and NOT MATCHED clauses in 
MERGE INTO
 Key: SPARK-32030
 URL: https://issues.apache.org/jira/browse/SPARK-32030
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.1
Reporter: Xianyin Xin


Now the MERGE INTO syntax is,
```
MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [] [AS source_alias]
ON 
[ WHEN MATCHED [ AND  ] THEN  ]
[ WHEN MATCHED [ AND  ] THEN  ]
[ WHEN NOT MATCHED [ AND  ]  THEN  ]
```
It would be nice to support an unlimited number of MATCHED and NOT MATCHED 
clauses in the MERGE INTO statement, because users may want to deal with different 
"AND "s, the result of which behaves just like a series of "CASE WHEN"s. The 
expected syntax looks like
```
MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [] [AS source_alias]
ON 
[when_clause [, ...]]
```
where `when_clause` is 
```
WHEN MATCHED [ AND  ] THEN 
```
or
```
WHEN NOT MATCHED [ AND  ]  THEN 
```




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31993) Generated code in 'concat_ws' fails to compile when splitting method is in effect

2020-06-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31993.
-
Fix Version/s: 3.1.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> Generated code in 'concat_ws' fails to compile when splitting method is in 
> effect
> -
>
> Key: SPARK-31993
> URL: https://issues.apache.org/jira/browse/SPARK-31993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.1.0
>
>
> https://github.com/apache/spark/blob/a0187cd6b59a6b6bb2cadc6711bb663d4d35a844/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L88-L195
> There are three parts of generated code in concat_ws (codes, varargCounts, 
> varargBuilds), and each part tries to split methods on its own, while 
> `varargCounts` and `varargBuilds` refer to the generated code in `codes`; hence 
> the overall generated code fails to compile if any part actually gets split.
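
A repro-shaped sketch (hypothetical sizes; the exact threshold depends on the codegen method-size limits): a concat_ws over many array-typed columns makes all three generated parts large enough for method splitting to be attempted.

{code:scala}
import org.apache.spark.sql.functions.{array, col, concat_ws, lit}

// Hundreds of array<string> columns inflate the generated code for concat_ws.
val wide = spark.range(1)
  .select((0 until 300).map(i => array(lit(i.toString)).as(s"a$i")): _*)

wide.select(concat_ws(",", wide.columns.map(col): _*)).collect()
{code}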



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32029) Make activeSession null when the application ends

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140235#comment-17140235
 ] 

Apache Spark commented on SPARK-32029:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/28868

> Make activeSession null when the application ends
> 
>
> Key: SPARK-32029
> URL: https://issues.apache.org/jira/browse/SPARK-32029
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32029) Make activeSession null when the application ends

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32029:


Assignee: (was: Apache Spark)

> Make activeSession null when the application ends
> 
>
> Key: SPARK-32029
> URL: https://issues.apache.org/jira/browse/SPARK-32029
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32029) Make activeSession null when the application ends

2020-06-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32029:


Assignee: Apache Spark

> Make activeSession null when the application ends
> 
>
> Key: SPARK-32029
> URL: https://issues.apache.org/jira/browse/SPARK-32029
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32029) Make activeSession null when the application ends

2020-06-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140234#comment-17140234
 ] 

Apache Spark commented on SPARK-32029:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/28868

> Make activeSession null when the application ends
> 
>
> Key: SPARK-32029
> URL: https://issues.apache.org/jira/browse/SPARK-32029
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org