[jira] [Assigned] (SPARK-31957) cleanup hive scratch dir should work for the developer api startWithContext
[ https://issues.apache.org/jira/browse/SPARK-31957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31957: - Assignee: Kent Yao > cleanup hive scratch dir should work for the developer api startWithContext > --- > > Key: SPARK-31957 > URL: https://issues.apache.org/jira/browse/SPARK-31957 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Compared with the long-running ThriftServer launched via the start script, we are more > likely to hit the issue https://issues.apache.org/jira/browse/HIVE-10415 / > https://issues.apache.org/jira/browse/SPARK-31626 in the developer API > startWithContext -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
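For illustration only, here is a minimal Python sketch (not the actual Scala patch in the pull request) of what "clean up the Hive scratch dir" amounts to: removing stale per-session directories before the server starts accepting connections. The helper name and directory layout are hypothetical.

```python
import os
import shutil
import tempfile

def cleanup_scratch_dir(scratch_dir: str) -> int:
    """Remove leftover per-session subdirectories from a scratch dir.

    Returns the number of entries removed. Hypothetical helper, not
    the actual Spark/Hive implementation.
    """
    if not os.path.isdir(scratch_dir):
        return 0
    removed = 0
    for entry in os.listdir(scratch_dir):
        # Stale session dirs accumulate across restarts; drop them all.
        shutil.rmtree(os.path.join(scratch_dir, entry), ignore_errors=True)
        removed += 1
    return removed

# Example: a fake scratch dir holding two stale session directories.
scratch = tempfile.mkdtemp()
os.mkdir(os.path.join(scratch, "session-1"))
os.mkdir(os.path.join(scratch, "session-2"))
print(cleanup_scratch_dir(scratch))  # 2
```

The point of the issue is simply that this cleanup step should run on the `startWithContext` code path too, not only when the server is launched via the start script.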
[jira] [Resolved] (SPARK-31957) cleanup hive scratch dir should work for the developer api startWithContext
[ https://issues.apache.org/jira/browse/SPARK-31957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31957. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28784 [https://github.com/apache/spark/pull/28784] > cleanup hive scratch dir should work for the developer api startWithContext > --- > > Key: SPARK-31957 > URL: https://issues.apache.org/jira/browse/SPARK-31957 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.0 > > > Compared with the long-running ThriftServer launched via the start script, we are more > likely to hit the issue https://issues.apache.org/jira/browse/HIVE-10415 / > https://issues.apache.org/jira/browse/SPARK-31626 in the developer API > startWithContext
[jira] [Resolved] (SPARK-32021) make_interval does not accept seconds >100
[ https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32021. --- Fix Version/s: 3.1.0 Assignee: Maxim Gekk Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/28873 > make_interval does not accept seconds >100 > -- > > Key: SPARK-32021 > URL: https://issues.apache.org/jira/browse/SPARK-32021 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > In make_interval(years, months, weeks, days, hours, mins, secs), secs are > defined as Decimal(8, 6), which evaluates to null if the value of the > expression reaches or exceeds 100 seconds. > Larger seconds values should be allowed. > This has been reported by Simba, who wants to use make_interval to implement > translation for the TIMESTAMP_ADD ODBC function in Spark 3.0. > ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp)} fails when integer_exp > returns seconds values >= 100.
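The root cause is easy to demonstrate outside Spark: a DECIMAL(8, 6) has eight total digits with six after the point, which leaves only two integer digits, so 99.999999 is the largest representable seconds value. A quick plain-Python check (the helper is hypothetical, not Spark's code path):

```python
from decimal import Decimal

def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Check whether a value fits a SQL DECIMAL(precision, scale):
    at most `precision` total digits, `scale` of them after the point."""
    sign, digits, exponent = Decimal(value).as_tuple()
    frac_digits = max(0, -exponent)               # digits after the point
    int_digits = max(0, len(digits) + exponent)   # digits before the point
    return frac_digits <= scale and int_digits <= precision - scale

print(fits_decimal("99.999999", 8, 6))  # True: 2 integer digits suffice
print(fits_decimal("100", 8, 6))        # False: needs 3 integer digits
```

Any seconds value of 100 or more therefore overflows the type, and Spark turns the overflow into null instead of widening the type or raising an error.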
[jira] [Updated] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates
[ https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31980: -- Affects Version/s: 2.4.0 2.4.1 2.4.2 2.4.3 2.4.5 2.4.6 > Spark sequence() fails if start and end of range are identical dates > > > Key: SPARK-31980 > URL: https://issues.apache.org/jira/browse/SPARK-31980 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6 > Environment: Spark 2.4.4 standalone and on AWS EMR >Reporter: Dave DeCaprio >Assignee: JinxinTang >Priority: Minor > Fix For: 3.0.1, 3.1.0, 2.4.7 > > > > The following Spark SQL query throws an exception > {code:java} > select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), > interval 1 month) > {code} > The error is: > > > {noformat} > java.lang.ArrayIndexOutOfBoundsException: 1 at > scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92) at > org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681) > at > org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389){noformat} >
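As a rough sketch of the intended semantics (plain Python, not Spark's `TemporalSequenceImpl`): an inclusive month-stepped sequence should return a one-element result when start equals stop, rather than failing with an out-of-bounds write. Day-of-month clamping and other calendar edge cases are deliberately omitted here.

```python
from datetime import date

def month_sequence(start: date, stop: date, step_months: int = 1):
    """Inclusive sequence of dates separated by whole months.

    Illustrative only: when start == stop the result is a
    single-element list, never an error.
    """
    if step_months == 0:
        raise ValueError("step must be non-zero")
    out = []
    y, m = start.year, start.month
    current = start
    while (step_months > 0 and current <= stop) or \
          (step_months < 0 and current >= stop):
        out.append(current)
        # Advance by whole months using a flat month counter.
        total = y * 12 + (m - 1) + step_months
        y, m = divmod(total, 12)
        m += 1
        current = date(y, m, start.day)
    return out

print(month_sequence(date(2011, 3, 1), date(2011, 3, 1)))  # [datetime.date(2011, 3, 1)]
```

The reported crash was effectively an off-by-one in sizing the result array for this degenerate one-element case.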
[jira] [Comment Edited] (SPARK-30876) Optimizer cannot infer from inferred constraints with join
[ https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140898#comment-17140898 ] Navin Viswanath edited comment on SPARK-30876 at 6/20/20, 2:27 AM: --- [~yumwang] would this be in the logical plan optimization? I was looking into the logical plans and got this for the following query: {noformat} val x = testRelation.subquery('x) val y = testRelation1.subquery('y) val z = testRelation.subquery('z) val query = x.join(y).join(z) .where(("x.a".attr === "y.b".attr) && ("y.b".attr === "z.c".attr) && ("z.c".attr === 1)){noformat} Unoptimized: {noformat} 'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1)) +- 'Join Inner :- Join Inner : :- SubqueryAlias x : : +- LocalRelation , [a#0, b#1, c#2] : +- SubqueryAlias y : +- LocalRelation , [d#3] +- SubqueryAlias z +- LocalRelation , [a#0, b#1, c#2]{noformat} Optimized: {noformat} 'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1)) +- 'Join Inner :- Join Inner : :- LocalRelation , [a#0, b#1, c#2] : +- LocalRelation , [d#3] +- LocalRelation , [a#0, b#1, c#2]{noformat} Or was this supposed to be in the physical plan? Any pointers would help. Thanks! was (Author: navinvishy): [~yumwang] would this be in the logical plan optimization? I was looking into the logical plans and got this. Unoptimized: {noformat} 'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1)) +- 'Join Inner :- Join Inner : :- SubqueryAlias x : : +- LocalRelation , [a#0, b#1, c#2] : +- SubqueryAlias y : +- LocalRelation , [d#3] +- SubqueryAlias z +- LocalRelation , [a#0, b#1, c#2]{noformat} Optimized: {noformat} 'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1)) +- 'Join Inner :- Join Inner : :- LocalRelation , [a#0, b#1, c#2] : +- LocalRelation , [d#3] +- LocalRelation , [a#0, b#1, c#2]{noformat} Or was this supposed to be in the physical plan? Any pointers would help. Thanks! 
> Optimizer cannot infer from inferred constraints with join > -- > > Key: SPARK-30876 > URL: https://issues.apache.org/jira/browse/SPARK-30876 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > create table t1(a int, b int, c int); > create table t2(a int, b int, c int); > create table t3(a int, b int, c int); > select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and > t3.c = 1); > {code} > Spark 2.3+: > {noformat} > == Physical Plan == > *(4) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, true, [id=#102] >+- *(3) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(3) Project > +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight > :- *(3) Project [b#10] > : +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight > : :- *(3) Project [a#6] > : : +- *(3) Filter isnotnull(a#6) > : : +- *(3) ColumnarToRow > : :+- FileScan parquet default.t1[a#6] Batched: true, > DataFilters: [isnotnull(a#6)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: > struct > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#87] > :+- *(1) Project [b#10] > : +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1)) > : +- *(1) ColumnarToRow > : +- FileScan parquet default.t2[b#10] Batched: > true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], > ReadSchema: struct > +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#96] >+- *(2) Project [c#14] > +- *(2) Filter 
(isnotnull(c#14) AND (c#14 = 1)) > +- *(2) ColumnarToRow > +- FileScan parquet default.t3[c#14] Batched: true, > DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], > ReadSchema: struct > Time taken: 3.785 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2.x: > {noformat} > == Physical Plan == >
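The inference being asked for in this ticket — deriving `t1.a = 1` and `t2.b = 1` from `t1.a = t2.b`, `t2.b = t3.c`, and `t3.c = 1` — is a transitive propagation of constants over equality classes. A minimal union-find sketch of the idea (illustrative only, not Spark's actual `InferFiltersFromConstraints` rule):

```python
def infer_constant_filters(equalities, constants):
    """Propagate constant filters through join equality conditions.

    equalities: list of (col_a, col_b) pairs from join conditions.
    constants:  dict col -> literal from explicit filters.
    Returns the dict after transitive propagation.
    """
    # Union-find over columns connected by equality predicates.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for a, b in equalities:
        union(a, b)
    # Any column in the same class as a constant-bound column
    # inherits that constant.
    inferred = dict(constants)
    roots = {find(c): v for c, v in constants.items()}
    for col in list(parent):
        r = find(col)
        if r in roots:
            inferred[col] = roots[r]
    return inferred

# t1.a = t2.b, t2.b = t3.c, t3.c = 1  =>  every column is bound to 1.
print(infer_constant_filters([("t1.a", "t2.b"), ("t2.b", "t3.c")], {"t3.c": 1}))
```

With all three constants inferred, each scan can be filtered before the join, which is the pushed-down `EqualTo` behavior the Spark 2.3+ plan above shows only for `t2.b` and `t3.c`.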
[jira] [Resolved] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates
[ https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31980. --- Fix Version/s: 3.1.0 2.4.7 3.0.1 Resolution: Fixed Issue resolved by pull request 28819 [https://github.com/apache/spark/pull/28819] > Spark sequence() fails if start and end of range are identical dates > > > Key: SPARK-31980 > URL: https://issues.apache.org/jira/browse/SPARK-31980 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Spark 2.4.4 standalone and on AWS EMR >Reporter: Dave DeCaprio >Assignee: JinxinTang >Priority: Minor > Fix For: 3.0.1, 2.4.7, 3.1.0 > > > > The following Spark SQL query throws an exception > {code:java} > select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), > interval 1 month) > {code} > The error is: > > > {noformat} > java.lang.ArrayIndexOutOfBoundsException: 1 at > scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92) at > org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681) > at > org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389){noformat} >
[jira] [Assigned] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates
[ https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31980: - Assignee: JinxinTang > Spark sequence() fails if start and end of range are identical dates > > > Key: SPARK-31980 > URL: https://issues.apache.org/jira/browse/SPARK-31980 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Spark 2.4.4 standalone and on AWS EMR >Reporter: Dave DeCaprio >Assignee: JinxinTang >Priority: Minor > > > The following Spark SQL query throws an exception > {code:java} > select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), > interval 1 month) > {code} > The error is: > > > {noformat} > java.lang.ArrayIndexOutOfBoundsException: 1 at > scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92) at > org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681) > at > org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389){noformat} >
[jira] [Assigned] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
[ https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32030: Assignee: (was: Apache Spark) > Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO > --- > > Key: SPARK-32030 > URL: https://issues.apache.org/jira/browse/SPARK-32030 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Xianyin Xin >Priority: Major > > Now the {{MERGE INTO}} syntax is, > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN NOT MATCHED [ AND ] THEN ]{code} > It would be nice if we supported unlimited {{MATCHED}} and {{NOT MATCHED}} > clauses in the {{MERGE INTO}} statement, because users may want to deal with > different "{{AND }}"s, the result of which is just like a series of > "{{CASE WHEN}}"s. The expected syntax looks like > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [when_clause [, ...]] > {code} > where {{when_clause}} is > {code:java} > WHEN MATCHED [ AND ] THEN {code} > or > {code:java} > WHEN NOT MATCHED [ AND ] THEN {code} >
[jira] [Assigned] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
[ https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32030: Assignee: Apache Spark > Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO > --- > > Key: SPARK-32030 > URL: https://issues.apache.org/jira/browse/SPARK-32030 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Xianyin Xin >Assignee: Apache Spark >Priority: Major > > Now the {{MERGE INTO}} syntax is, > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN NOT MATCHED [ AND ] THEN ]{code} > It would be nice if we supported unlimited {{MATCHED}} and {{NOT MATCHED}} > clauses in the {{MERGE INTO}} statement, because users may want to deal with > different "{{AND }}"s, the result of which is just like a series of > "{{CASE WHEN}}"s. The expected syntax looks like > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [when_clause [, ...]] > {code} > where {{when_clause}} is > {code:java} > WHEN MATCHED [ AND ] THEN {code} > or > {code:java} > WHEN NOT MATCHED [ AND ] THEN {code} >
[jira] [Commented] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
[ https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140908#comment-17140908 ] Apache Spark commented on SPARK-32030: -- User 'xianyinxin' has created a pull request for this issue: https://github.com/apache/spark/pull/28875 > Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO > --- > > Key: SPARK-32030 > URL: https://issues.apache.org/jira/browse/SPARK-32030 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Xianyin Xin >Priority: Major > > Now the {{MERGE INTO}} syntax is, > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN NOT MATCHED [ AND ] THEN ]{code} > It would be nice if we supported unlimited {{MATCHED}} and {{NOT MATCHED}} > clauses in the {{MERGE INTO}} statement, because users may want to deal with > different "{{AND }}"s, the result of which is just like a series of > "{{CASE WHEN}}"s. The expected syntax looks like > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [when_clause [, ...]] > {code} > where {{when_clause}} is > {code:java} > WHEN MATCHED [ AND ] THEN {code} > or > {code:java} > WHEN NOT MATCHED [ AND ] THEN {code} >
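The "series of CASE WHENs" semantics can be sketched for a single row pair: clauses are tried in order and the first one whose optional AND condition holds fires. This is an illustrative Python model, not Spark's implementation; all names are hypothetical.

```python
def apply_merge_row(target_row, source_row, matched_clauses, not_matched_clauses):
    """Evaluate MERGE WHEN clauses for one row pair; the first match wins.

    Each clause is (condition, action). A None condition always matches;
    an action returning None means the row is deleted.
    """
    if target_row is not None and source_row is not None:
        for cond, action in matched_clauses:
            if cond is None or cond(target_row, source_row):
                return action(target_row, source_row)
        return target_row  # no MATCHED clause fired: keep target unchanged
    if target_row is None and source_row is not None:
        for cond, action in not_matched_clauses:
            if cond is None or cond(source_row):
                return action(source_row)
        return None
    return target_row

# Two MATCHED clauses with different AND conditions, as the proposal allows.
matched = [
    (lambda t, s: s["qty"] == 0, lambda t, s: None),                   # delete
    (None,                       lambda t, s: {**t, "qty": s["qty"]}), # update
]
not_matched = [(None, lambda s: dict(s))]                              # insert

# The first clause matches (qty == 0), so the row is deleted.
print(apply_merge_row({"id": 1, "qty": 5}, {"id": 1, "qty": 0}, matched, not_matched))  # None
```

The limitation being lifted is purely syntactic: today the grammar caps the number of WHEN clauses, while the evaluation order shown above generalizes naturally to any number of them.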
[jira] [Commented] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)
[ https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140900#comment-17140900 ] Apache Spark commented on SPARK-32036: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/28874 > Remove references to "blacklist"/"whitelist" language (outside of > blacklisting feature) > --- > > Key: SPARK-32036 > URL: https://issues.apache.org/jira/browse/SPARK-32036 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist" and > "whitelist". While it seems to me that there is some valid debate as to > whether these terms have racist origins, the cultural connotations are > inescapable in today's world. > Renaming the entire blacklisting feature would be a large effort with lots of > care needed to maintain public-facing APIs and configurations. Though I think > this will be a very rewarding effort for which I've filed SPARK-32037, I'd > like to start by tackling all of the other references to such terminology in > the codebase, of which there are still dozens or hundreds beyond the > blacklisting feature. > I'm not sure what the best "Component" is for this so I put Spark Core for > now.
[jira] [Assigned] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)
[ https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32036: Assignee: Apache Spark > Remove references to "blacklist"/"whitelist" language (outside of > blacklisting feature) > --- > > Key: SPARK-32036 > URL: https://issues.apache.org/jira/browse/SPARK-32036 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist" and > "whitelist". While it seems to me that there is some valid debate as to > whether these terms have racist origins, the cultural connotations are > inescapable in today's world. > Renaming the entire blacklisting feature would be a large effort with lots of > care needed to maintain public-facing APIs and configurations. Though I think > this will be a very rewarding effort for which I've filed SPARK-32037, I'd > like to start by tackling all of the other references to such terminology in > the codebase, of which there are still dozens or hundreds beyond the > blacklisting feature. > I'm not sure what the best "Component" is for this so I put Spark Core for > now.
[jira] [Commented] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)
[ https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140899#comment-17140899 ] Apache Spark commented on SPARK-32036: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/28874 > Remove references to "blacklist"/"whitelist" language (outside of > blacklisting feature) > --- > > Key: SPARK-32036 > URL: https://issues.apache.org/jira/browse/SPARK-32036 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist" and > "whitelist". While it seems to me that there is some valid debate as to > whether these terms have racist origins, the cultural connotations are > inescapable in today's world. > Renaming the entire blacklisting feature would be a large effort with lots of > care needed to maintain public-facing APIs and configurations. Though I think > this will be a very rewarding effort for which I've filed SPARK-32037, I'd > like to start by tackling all of the other references to such terminology in > the codebase, of which there are still dozens or hundreds beyond the > blacklisting feature. > I'm not sure what the best "Component" is for this so I put Spark Core for > now.
[jira] [Assigned] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)
[ https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32036: Assignee: (was: Apache Spark) > Remove references to "blacklist"/"whitelist" language (outside of > blacklisting feature) > --- > > Key: SPARK-32036 > URL: https://issues.apache.org/jira/browse/SPARK-32036 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist" and > "whitelist". While it seems to me that there is some valid debate as to > whether these terms have racist origins, the cultural connotations are > inescapable in today's world. > Renaming the entire blacklisting feature would be a large effort with lots of > care needed to maintain public-facing APIs and configurations. Though I think > this will be a very rewarding effort for which I've filed SPARK-32037, I'd > like to start by tackling all of the other references to such terminology in > the codebase, of which there are still dozens or hundreds beyond the > blacklisting feature. > I'm not sure what the best "Component" is for this so I put Spark Core for > now.
[jira] [Updated] (SPARK-32038) Regression in handling NaN values in COUNT(DISTINCT)
[ https://issues.apache.org/jira/browse/SPARK-32038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated SPARK-32038: - Description: There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an illustration: {code:scala} case class Test( uid:String, score:Float) val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f81) val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fff) val rows = Seq( Test("mithunr", Float.NaN), Test("mithunr", POS_NAN_1), Test("mithunr", POS_NAN_2), Test("abellina", 1.0f), Test("abellina", 2.0f) ).toDF.createOrReplaceTempView("mytable") spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 1 asc ").show {code} Here are the results under Spark 3.0.0: {code:java|title=Spark 3.0.0 (single aggregation)} ++-+ | uid|count(DISTINCT score)| ++-+ |abellina|2| | mithunr|3| ++-+ {code} Note that the count against {{mithunr}} is {{3}}, accounting for each distinct value for {{NaN}}. The right results are returned when another aggregation is added to the GBY: {code:scala|title=Spark 3.0.0 (multiple aggregations)} scala> spark.sql(" select uid, count(distinct score), max(score) from mytable group by 1 order by 1 asc ").show ++-+--+ | uid|count(DISTINCT score)|max(score)| ++-+--+ |abellina|2| 2.0| | mithunr|1| NaN| ++-+--+ {code} Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly: {code:scala|title=Spark 2.4.6} scala> spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 1 asc ").show ++-+ | uid|count(DISTINCT score)| ++-+ |abellina|2| | mithunr|1| ++-+ {code} was: There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} values are normalized/handled in {{COUNT(DISTINCT ...)}}. 
Here is an illustration: {code:scala} case class Test( uid:String, score:Float) val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f81) val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fff) val rows = Seq( Test("mithunr", Float.NaN), Test("mithunr", POS_NAN_1), Test("mithunr", POS_NAN_2), Test("abellina", 1.0f), Test("abellina", 2.0f) ).toDF.createOrReplaceTempView("mytable") spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 1 asc ").show {code} Here are the results under Spark 3.0.0: {code:java|title=Spark 3.0.0 (single aggregation)} ++-+ | uid|count(DISTINCT score)| ++-+ |abellina|2| | mithunr|3| ++-+ {code} Note that the count against {{mithunr}} is {{3}}, accounting for each distinct value for {{NaN}}. The right results are returned when another aggregation is added to the GBY: {code:scala|title=Spark 3.0.0 (multiple aggregations)} scala> spark.sql(" select uid, count(distinct score), max(score) from mytable group by 1 order by 1 asc ").show ++-+--+ | uid|count(DISTINCT score)|max(score)| ++-+--+ |abellina|2| 2.0| | mithunr|1| NaN| ++-+--+ {code} Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly: {code:scala|title=Spark 2.4.6} scala> spark.sql(" select uid, count(distinct score) from mytable group by 1 order by 1 asc ").show ++-+ | uid|count(DISTINCT score)| ++-+ |abellina|2| | mithunr|1| ++-+ {code} > Regression in handling NaN values in COUNT(DISTINCT) > > > Key: SPARK-32038 > URL: https://issues.apache.org/jira/browse/SPARK-32038 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: Mithun Radhakrishnan >Priority: Major > > There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} > values are normalized/handled in {{COUNT(DISTINCT ...)}}. 
Here is an > illustration: > {code:scala} > case class Test( uid:String, score:Float) > val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f81) > val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fff) > val rows = Seq( > Test("mithunr",
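The symptom can be reproduced outside Spark: NaN has many bit-level encodings and never compares equal to itself, so a distinct count that does not normalize NaN sees each payload as a separate value. A plain-Python illustration follows; the NaN payload constants here are my own choices, since the bit patterns quoted above appear truncated.

```python
import math
import struct

def int_bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float, like Java's Float.intBitsToFloat."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Two non-canonical NaN payloads (illustrative choices).
POS_NAN_1 = int_bits_to_float(0x7F800001)
POS_NAN_2 = int_bits_to_float(0x7FFFFFFF)
scores = [float("nan"), POS_NAN_1, POS_NAN_2]

# Without normalization each NaN object counts as its own distinct
# value, which is the "count = 3" symptom reported above.
print(len(set(scores)))  # 3

# Collapsing every NaN to one canonical value restores "count = 1".
CANON_NAN = float("nan")
normalized = [CANON_NAN if math.isnan(x) else x for x in scores]
print(len(set(normalized)))  # 1
```

This is presumably why the regression surfaces only in the single-aggregation plan: the plan shape that handles multiple aggregates appears to normalize NaN before the distinct, while the single-distinct path does not.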
[jira] [Updated] (SPARK-32038) Regression in handling NaN values in COUNT(DISTINCT)
[ https://issues.apache.org/jira/browse/SPARK-32038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated SPARK-32038:
-----------------------------------------
    Description: 
There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an illustration:
{code:scala}
case class Test(uid: String, score: Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(
  Test("mithunr", Float.NaN),
  Test("mithunr", POS_NAN_1),
  Test("mithunr", POS_NAN_2),
  Test("abellina", 1.0f),
  Test("abellina", 2.0f)
).toDF.createOrReplaceTempView("mytable")

spark.sql("select uid, count(distinct score) from mytable group by 1 order by 1 asc").show
{code}
Here are the results under Spark 3.0.0:
{code:java|title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}
Note that the count against {{mithunr}} is {{3}}, accounting for each distinct value of {{NaN}}. The right results are returned when another aggregation is added to the GROUP BY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql("select uid, count(distinct score), max(score) from mytable group by 1 order by 1 asc").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}
Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql("select uid, count(distinct score) from mytable group by 1 order by 1 asc").show
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}

  was:
There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} values are normalized/handled in {{COUNT(DISTINCT ...)}}. (Previous revision; otherwise identical to the description above, apart from minor markup.)

> Regression in handling NaN values in COUNT(DISTINCT)
> -----------------------------------------------------
>
>                 Key: SPARK-32038
>                 URL: https://issues.apache.org/jira/browse/SPARK-32038
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.0.0
>            Reporter: Mithun Radhakrishnan
>            Priority: Major
>
> There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an illustration:
> {code:scala}
> case class Test(uid: String, score: Float)
> val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
> val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)
> val rows = Seq(
>   Test("mithunr",
[jira] [Commented] (SPARK-30876) Optimizer cannot infer from inferred constraints with join
[ https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140898#comment-17140898 ] Navin Viswanath commented on SPARK-30876: - [~yumwang] would this be in the logical plan optimization? I was looking into the logical plans and got this. Unoptimized: {noformat} 'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1)) +- 'Join Inner :- Join Inner : :- SubqueryAlias x : : +- LocalRelation , [a#0, b#1, c#2] : +- SubqueryAlias y : +- LocalRelation , [d#3] +- SubqueryAlias z +- LocalRelation , [a#0, b#1, c#2]{noformat} Optimized: {noformat} 'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1)) +- 'Join Inner :- Join Inner : :- LocalRelation , [a#0, b#1, c#2] : +- LocalRelation , [d#3] +- LocalRelation , [a#0, b#1, c#2]{noformat} Or was this supposed to be in the physical plan? Any pointers would help. Thanks! > Optimizer cannot infer from inferred constraints with join > -- > > Key: SPARK-30876 > URL: https://issues.apache.org/jira/browse/SPARK-30876 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > create table t1(a int, b int, c int); > create table t2(a int, b int, c int); > create table t3(a int, b int, c int); > select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and > t3.c = 1); > {code} > Spark 2.3+: > {noformat} > == Physical Plan == > *(4) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, true, [id=#102] >+- *(3) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(3) Project > +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight > :- *(3) Project [b#10] > : +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight > : :- *(3) Project [a#6] > : : +- *(3) Filter isnotnull(a#6) > : : +- *(3) ColumnarToRow > : :+- FileScan parquet default.t1[a#6] Batched: true, > DataFilters: [isnotnull(a#6)], Format: Parquet, 
Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: > struct > : +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#87] > :+- *(1) Project [b#10] > : +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1)) > : +- *(1) ColumnarToRow > : +- FileScan parquet default.t2[b#10] Batched: > true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], > ReadSchema: struct > +- BroadcastExchange > HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), > [id=#96] >+- *(2) Project [c#14] > +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1)) > +- *(2) ColumnarToRow > +- FileScan parquet default.t3[c#14] Batched: true, > DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: > InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., > PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], > ReadSchema: struct > Time taken: 3.785 seconds, Fetched 1 row(s) > {noformat} > Spark 2.2.x: > {noformat} > == Physical Plan == > *HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition >+- *HashAggregate(keys=[], functions=[partial_count(1)]) > +- *Project > +- *SortMergeJoin [b#19], [c#23], Inner > :- *Project [b#19] > : +- *SortMergeJoin [a#15], [b#19], Inner > : :- *Sort [a#15 ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(a#15, 200) > : : +- *Filter (isnotnull(a#15) && (a#15 = 1)) > : :+- HiveTableScan [a#15], HiveTableRelation > `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, > b#16, c#17] > : +- *Sort [b#19 ASC NULLS FIRST], false, 0 > :+- Exchange hashpartitioning(b#19, 200) > : +- 
*Filter (isnotnull(b#19) && (b#19 = 1)) > : +- HiveTableScan [b#19], HiveTableRelation > `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, > b#19, c#20] > +- *Sort [c#23 ASC NULLS FIRST], false, 0 >
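The inference the issue asks for — deriving {{t1.a = 1}} and {{t2.b = 1}} from {{t1.a = t2.b AND t2.b = t3.c AND t3.c = 1}}, as the Spark 2.2.x plan above does — is essentially a transitive closure over equality constraints. A minimal, Spark-independent sketch in Python ({{infer_constants}} is a hypothetical helper, not a Catalyst API):

```python
# Sketch: propagate a constant through a chain of equality constraints
# using union-find, so filters like t1.a = 1 can be pushed below the joins.

def infer_constants(equalities, constants):
    """equalities: (col, col) pairs; constants: {col: literal}.
    Returns constants extended through transitive equality."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equalities:                # union the equality classes
        parent[find(a)] = find(b)

    # A constant bound to any member of a class binds the whole class.
    root_const = {find(c): v for c, v in constants.items()}
    return {c: root_const[find(c)] for c in parent if find(c) in root_const}

inferred = infer_constants(
    equalities=[("t1.a", "t2.b"), ("t2.b", "t3.c")],
    constants={"t3.c": 1},
)
# t1.a and t2.b each inherit the literal 1 from t3.c.
```

In Catalyst terms, this closure would happen during logical optimization (constraint propagation / filter inference), before physical planning.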
[jira] [Created] (SPARK-32038) Regression in handling NaN values in COUNT(DISTINCT)
Mithun Radhakrishnan created SPARK-32038:
--------------------------------------------

             Summary: Regression in handling NaN values in COUNT(DISTINCT)
                 Key: SPARK-32038
                 URL: https://issues.apache.org/jira/browse/SPARK-32038
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 3.0.0
            Reporter: Mithun Radhakrishnan

There seems to be a regression in Spark 3.0.0, with regard to how {{NaN}} values are normalized/handled in {{COUNT(DISTINCT ...)}}. Here is an illustration:
{code:scala}
case class Test(uid: String, score: Float)

val POS_NAN_1 = java.lang.Float.intBitsToFloat(0x7f800001)
val POS_NAN_2 = java.lang.Float.intBitsToFloat(0x7fffffff)

val rows = Seq(
  Test("mithunr", Float.NaN),
  Test("mithunr", POS_NAN_1),
  Test("mithunr", POS_NAN_2),
  Test("abellina", 1.0f),
  Test("abellina", 2.0f)
).toDF.createOrReplaceTempView("mytable")

spark.sql("select uid, count(distinct score) from mytable group by 1 order by 1 asc").show
{code}
Here are the results under Spark 3.0.0:
{code:title=Spark 3.0.0 (single aggregation)}
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    3|
+--------+---------------------+
{code}
Note that the count against {{mithunr}} is {{3}}, accounting for each distinct value of {{NaN}}. The right results are returned when another aggregation is added to the GROUP BY:
{code:scala|title=Spark 3.0.0 (multiple aggregations)}
scala> spark.sql("select uid, count(distinct score), max(score) from mytable group by 1 order by 1 asc").show
+--------+---------------------+----------+
|     uid|count(DISTINCT score)|max(score)|
+--------+---------------------+----------+
|abellina|                    2|       2.0|
| mithunr|                    1|       NaN|
+--------+---------------------+----------+
{code}
Also, note that Spark 2.4.6 normalizes the {{DISTINCT}} expression correctly:
{code:scala|title=Spark 2.4.6}
scala> spark.sql("select uid, count(distinct score) from mytable group by 1 order by 1 asc").show
+--------+---------------------+
|     uid|count(DISTINCT score)|
+--------+---------------------+
|abellina|                    2|
| mithunr|                    1|
+--------+---------------------+
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
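The root cause behind this bug report: IEEE-754 floats have many distinct NaN bit patterns that all behave as NaN, so a distinct count over raw bit patterns sees several "different" NaNs unless each is first canonicalized — the kind of normalization Catalyst's NormalizeFloatingNumbers rule is meant to apply. A Spark-free Python sketch of the semantics (the canonical pattern 0x7FC00000 and the helper names are illustrative):

```python
# Demonstrate why COUNT(DISTINCT score) can see "three" NaNs: distinct
# NaN bit patterns must be normalized to one canonical NaN before hashing.
import math
import struct

def float_bits(x):
    """32-bit IEEE-754 representation of a float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_float(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

# Two NaN bit patterns at the boundaries of the positive NaN range.
NAN_A = bits_float(0x7F800001)
NAN_B = bits_float(0x7FFFFFFF)
assert math.isnan(NAN_A) and math.isnan(NAN_B)

def normalize(x):
    """Map every NaN to a single canonical quiet NaN, so distinct-counting
    by bit pattern sees one NaN, not many."""
    return bits_float(0x7FC00000) if math.isnan(x) else x

scores = [float("nan"), NAN_A, NAN_B]

# Distinct by raw bit pattern: 3 "values" -- the buggy Spark 3.0.0 result.
raw_distinct = len({float_bits(x) for x in scores})

# Distinct after normalization: 1 value -- the Spark 2.4.6 result.
norm_distinct = len({float_bits(normalize(x)) for x in scores})
```

The multiple-aggregation plan presumably takes a code path where the normalization still fires, which is why adding {{max(score)}} "fixes" the count.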
[jira] [Resolved] (SPARK-31350) Coalesce bucketed tables for join if applicable
[ https://issues.apache.org/jira/browse/SPARK-31350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-31350. -- Fix Version/s: 3.1.0 Assignee: Terry Kim Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28123 > Coalesce bucketed tables for join if applicable > --- > > Key: SPARK-31350 > URL: https://issues.apache.org/jira/browse/SPARK-31350 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.1.0 > > > The following example of joining two bucketed tables introduces a full > shuffle: > {code:java} > spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "0") > val df1 = (0 until 20).map(i => (i % 5, i % 13, i.toString)).toDF("i", "j", > "k") > val df2 = (0 until 20).map(i => (i % 7, i % 11, i.toString)).toDF("i", "j", > "k") > df1.write.format("parquet").bucketBy(8, "i").saveAsTable("t1") > df2.write.format("parquet").bucketBy(4, "i").saveAsTable("t2") > val t1 = spark.table("t1") > val t2 = spark.table("t2") > val joined = t1.join(t2, t1("i") === t2("i")) > joined.explain(true) > == Physical Plan == > *(5) SortMergeJoin [i#44], [i#50], Inner > :- *(2) Sort [i#44 ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(i#44, 200), true, [id=#105] > : +- *(1) Project [i#44, j#45, k#46] > : +- *(1) Filter isnotnull(i#44) > : +- *(1) ColumnarToRow > : +- FileScan parquet default.t1[i#44,j#45,k#46] Batched: true, > DataFilters: [isnotnull(i#44)], Format: Parquet, Location: > InMemoryFileIndex[...], PartitionFilters: [], PushedFilters: [IsNotNull(i)], > ReadSchema: struct, SelectedBucketsCount: 8 out of 8 > +- *(4) Sort [i#50 ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(i#50, 200), true, [id=#115] > +- *(3) Project [i#50, j#51, k#52] > +- *(3) Filter isnotnull(i#50) > +- *(3) ColumnarToRow > +- FileScan parquet default.t2[i#50,j#51,k#52] Batched: true, > DataFilters: [isnotnull(i#50)], 
Format: Parquet, Location: > InMemoryFileIndex[...], PartitionFilters: [], PushedFilters: [IsNotNull(i)], > ReadSchema: struct, SelectedBucketsCount: 4 out of 4 > {code} > But one side can be coalesced to eliminate the shuffle. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
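Coalescing is sound because Spark assigns a row to bucket {{hash(key) mod numBuckets}} (Murmur3 in Spark; Python's built-in {{hash}} stands in below). When one side has 8 buckets and the other 4, the 4-bucket id is fully determined by the 8-bucket id, so buckets {{i}} and {{i + 4}} of the larger side together hold exactly the keys of bucket {{i}} on the smaller side — no shuffle needed. A quick sketch:

```python
# Sketch: with bucket counts where one divides the other, the coarse
# bucket id is a function of the fine bucket id, so buckets can be
# coalesced instead of reshuffled.

def bucket_id(key, num_buckets):
    # Stand-in for Spark's hash-based bucketing: bucket = hash(key) mod n.
    return hash(key) % num_buckets

for key in range(1000):
    b8 = bucket_id(key, 8)   # bucket in the 8-bucket table (t1)
    b4 = bucket_id(key, 4)   # bucket in the 4-bucket table (t2)
    # Since 4 divides 8, (h mod 8) mod 4 == h mod 4 for every key.
    assert b8 % 4 == b4
```

This only works when one bucket count divides the other; 8-vs-3 buckets, say, could not be coalesced this way.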
[jira] [Updated] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation
[ https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-32037: Description: As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist". While it seems to me that there is some valid debate as to whether this term has racist origins, the cultural connotations are inescapable in today's world. I've created a separate task, SPARK-32036, to remove references outside of this feature. Given the large surface area of this feature and the public-facing UI / configs / etc., more care will need to be taken here. I'd like to start by opening up debate on what the best replacement name would be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", but I'm not sure that any of them work well for this situation. was: As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. I've created a separate task, SPARK-32036, to remove references outside of this feature. Given the large surface area of this feature and the public-facing UI / configs / etc., more care will need to be taken here. I'd like to start by opening up debate on what the best replacement name would be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", but I'm not sure that any of them work well for this situation. 
> Rename blacklisting feature to avoid language with racist connotation > - > > Key: SPARK-32037 > URL: https://issues.apache.org/jira/browse/SPARK-32037 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist". > While it seems to me that there is some valid debate as to whether this term > has racist origins, the cultural connotations are inescapable in today's > world. > I've created a separate task, SPARK-32036, to remove references outside of > this feature. Given the large surface area of this feature and the > public-facing UI / configs / etc., more care will need to be taken here. > I'd like to start by opening up debate on what the best replacement name > would be. Reject-/deny-/ignore-/block-list are common replacements for > "blacklist", but I'm not sure that any of them work well for this situation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation
[ https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140876#comment-17140876 ] Erik Krogen commented on SPARK-32037: - +1 from me, I agree that this feature is basically a health tracker. > Rename blacklisting feature to avoid language with racist connotation > - > > Key: SPARK-32037 > URL: https://issues.apache.org/jira/browse/SPARK-32037 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist". > While it seems to me that there is some valid debate as to whether this term > has racist origins, the cultural connotations are inescapable in today's > world. > I've created a separate task, SPARK-32036, to remove references outside of > this feature. Given the large surface area of this feature and the > public-facing UI / configs / etc., more care will need to be taken here. > I'd like to start by opening up debate on what the best replacement name > would be. Reject-/deny-/ignore-/block-list are common replacements for > "blacklist", but I'm not sure that any of them work well for this situation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression
[ https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140859#comment-17140859 ] Dongjoon Hyun commented on SPARK-32031: --- Hi, [~Ngone51]. This is filed as `Improvement`, but the title is `Fix ...`. Is this a bug fix? > Fix the wrong references of PartialMerge/Final AggregateExpression > -- > > Key: SPARK-32031 > URL: https://issues.apache.org/jira/browse/SPARK-32031 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > For the PartialMerge/Final AggregateExpression, it should reference the > `inputAggBufferAttributes` instead of `aggBufferAttributes` according to > `AggUtils.planAggXXX` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
[ https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32033. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28871 [https://github.com/apache/spark/pull/28871] > Use new poll API in Kafka connector executor side to avoid infinite wait > > > Key: SPARK-32033 > URL: https://issues.apache.org/jira/browse/SPARK-32033 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
[ https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32033: - Assignee: Gabor Somogyi > Use new poll API in Kafka connector executor side to avoid infinite wait > > > Key: SPARK-32033 > URL: https://issues.apache.org/jira/browse/SPARK-32033 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation
[ https://issues.apache.org/jira/browse/SPARK-32037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140825#comment-17140825 ] Ryan Blue commented on SPARK-32037: --- What about "healthy" and "unhealthy"? That's basically what we are trying to keep track of -- whether a node is healthy enough to run tasks, or if it should not be used for some period of time. I think "trusted" and "untrusted" may also work, but "healthy" is a bit closer to what we want. > Rename blacklisting feature to avoid language with racist connotation > - > > Key: SPARK-32037 > URL: https://issues.apache.org/jira/browse/SPARK-32037 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist". > While it seems to me that there is some valid debate as to whether these > terms have racist origins, the cultural connotations are inescapable in > today's world. > I've created a separate task, SPARK-32036, to remove references outside of > this feature. Given the large surface area of this feature and the > public-facing UI / configs / etc., more care will need to be taken here. > I'd like to start by opening up debate on what the best replacement name > would be. Reject-/deny-/ignore-/block-list are common replacements for > "blacklist", but I'm not sure that any of them work well for this situation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32037) Rename blacklisting feature to avoid language with racist connotation
Erik Krogen created SPARK-32037: --- Summary: Rename blacklisting feature to avoid language with racist connotation Key: SPARK-32037 URL: https://issues.apache.org/jira/browse/SPARK-32037 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.1 Reporter: Erik Krogen As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. I've created a separate task, SPARK-32036, to remove references outside of this feature. Given the large surface area of this feature and the public-facing UI / configs / etc., more care will need to be taken here. I'd like to start by opening up debate on what the best replacement name would be. Reject-/deny-/ignore-/block-list are common replacements for "blacklist", but I'm not sure that any of them work well for this situation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)
[ https://issues.apache.org/jira/browse/SPARK-32036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-32036: Description: As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. Renaming the entire blacklisting feature would be a large effort with lots of care needed to maintain public-facing APIs and configurations. Though I think this will be a very rewarding effort for which I've filed SPARK-32037, I'd like to start by tackling all of the other references to such terminology in the codebase, of which there are still dozens or hundreds beyond the blacklisting feature. I'm not sure what the best "Component" is for this so I put Spark Core for now. was: As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. Renaming the entire blacklisting feature would be a large effort with lots of care needed to maintain public-facing APIs and configurations. Though I think this will be a very rewarding effort, I'd like to start by tackling all of the other references to such terminology in the codebase, of which there are still dozens or hundreds beyond the blacklisting feature. 
I'm not sure what the best "Component" is for this so I put Spark Core for now. > Remove references to "blacklist"/"whitelist" language (outside of > blacklisting feature) > --- > > Key: SPARK-32036 > URL: https://issues.apache.org/jira/browse/SPARK-32036 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Erik Krogen >Priority: Minor > > As per [discussion on the Spark dev > list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], > it will be beneficial to remove references to problematic language that can > alienate potential community members. One such reference is "blacklist" and > "whitelist". While it seems to me that there is some valid debate as to > whether these terms have racist origins, the cultural connotations are > inescapable in today's world. > Renaming the entire blacklisting feature would be a large effort with lots of > care needed to maintain public-facing APIs and configurations. Though I think > this will be a very rewarding effort for which I've filed SPARK-32037, I'd > like to start by tackling all of the other references to such terminology in > the codebase, of which there are still dozens or hundreds beyond the > blacklisting feature. > I'm not sure what the best "Component" is for this so I put Spark Core for > now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32036) Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature)
Erik Krogen created SPARK-32036: --- Summary: Remove references to "blacklist"/"whitelist" language (outside of blacklisting feature) Key: SPARK-32036 URL: https://issues.apache.org/jira/browse/SPARK-32036 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.1 Reporter: Erik Krogen As per [discussion on the Spark dev list|https://lists.apache.org/thread.html/rf6b2cdcba4d3875350517a2339619e5d54e12e66626a88553f9fe275%40%3Cdev.spark.apache.org%3E], it will be beneficial to remove references to problematic language that can alienate potential community members. One such reference is "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world. Renaming the entire blacklisting feature would be a large effort with lots of care needed to maintain public-facing APIs and configurations. Though I think this will be a very rewarding effort, I'd like to start by tackling all of the other references to such terminology in the codebase, of which there are still dozens or hundreds beyond the blacklisting feature. I'm not sure what the best "Component" is for this so I put Spark Core for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32021) make_interval does not accept seconds >100
[ https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32021: Assignee: Apache Spark > make_interval does not accept seconds >100 > -- > > Key: SPARK-32021 > URL: https://issues.apache.org/jira/browse/SPARK-32021 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Apache Spark >Priority: Major > > In make_interval(years, months, weeks, days, hours, mins, secs), secs are > defined as Decimal(8, 6), which turns into null if the value of the > expression overflows 100 seconds. > Larger seconds values should be allowed. > This has been reported by Simba, who wants to use make_interval to implement > translation for TIMESTAMP_ADD ODBC function in Spark 3.0. > ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp} fails when integer_exp > returns seconds values >= 100. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32021) make_interval does not accept seconds >100
[ https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140775#comment-17140775 ] Apache Spark commented on SPARK-32021: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28873 > make_interval does not accept seconds >100 > -- > > Key: SPARK-32021 > URL: https://issues.apache.org/jira/browse/SPARK-32021 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Priority: Major > > In make_interval(years, months, weeks, days, hours, mins, secs), secs are > defined as Decimal(8, 6), which turns into null if the value of the > expression overflows 100 seconds. > Larger seconds values should be allowed. > This has been reported by Simba, who wants to use make_interval to implement > translation for TIMESTAMP_ADD ODBC function in Spark 3.0. > ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp} fails when integer_exp > returns seconds values >= 100. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32021) make_interval does not accept seconds >100
[ https://issues.apache.org/jira/browse/SPARK-32021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32021: Assignee: (was: Apache Spark) > make_interval does not accept seconds >100 > -- > > Key: SPARK-32021 > URL: https://issues.apache.org/jira/browse/SPARK-32021 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Priority: Major > > In make_interval(years, months, weeks, days, hours, mins, secs), secs are > defined as Decimal(8, 6), which turns into null if the value of the > expression overflows 100 seconds. > Larger seconds values should be allowed. > This has been reported by Simba, who wants to use make_interval to implement > translation for TIMESTAMP_ADD ODBC function in Spark 3.0. > ODBC {fn TIMESTAMPADD(SECOND, integer_exp, timestamp} fails when integer_exp > returns seconds values >= 100. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
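The bound follows directly from the type: {{Decimal(8, 6)}} has 8 significant digits with 6 after the decimal point, leaving only 2 integral digits, so 99.999999 is the largest representable seconds value and 100 or more overflows (which Spark turns into {{null}}). A quick Python check ({{fits_decimal}} is a hypothetical helper, not Spark code):

```python
# Verify the DECIMAL(8, 6) bound that makes make_interval reject secs >= 100.
from decimal import Decimal

def fits_decimal(value, precision, scale):
    """True if `value` fits SQL DECIMAL(precision, scale): at most
    `precision` digits total, `scale` of them after the decimal point."""
    quantized = Decimal(str(value)).quantize(Decimal(1).scaleb(-scale))
    return len(quantized.as_tuple().digits) <= precision

# 2 integral digits available: anything below 100 seconds fits...
assert fits_decimal("99.999999", 8, 6)
# ...but 100 needs 3 integral + 6 fractional = 9 digits > 8, so it
# overflows the type and Spark yields null instead of an interval.
assert not fits_decimal("100", 8, 6)
```

Widening the parameter type (e.g. to a larger precision) is the natural fix the issue asks for.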
[jira] [Comment Edited] (SPARK-7101) Spark SQL should support java.sql.Time
[ https://issues.apache.org/jira/browse/SPARK-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132694#comment-17132694 ] YoungGyu Chun edited comment on SPARK-7101 at 6/19/20, 6:35 PM: I will try to get this done but there are tons of work to do ;) was (Author: younggyuchun): I will try to get this done but there are a ton of work ;) > Spark SQL should support java.sql.Time > -- > > Key: SPARK-7101 > URL: https://issues.apache.org/jira/browse/SPARK-7101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 > Environment: All >Reporter: Peter Hagelund >Priority: Major > > Several RDBMSes support the TIME data type; for more exact mapping between > those and Spark SQL, support for java.sql.Time with an associated > DataType.TimeType would be helpful. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29679) Make interval type camparable and orderable
[ https://issues.apache.org/jira/browse/SPARK-29679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140651#comment-17140651 ] Bart Samwel commented on SPARK-29679: - That's a good question. I think it would make sense to do it that way. That means that all ANSI SQL compliant queries will run, and if you mix month-type and seconds-type intervals then you get an error if you use any operation that depends on them being comparable (including things like GROUP BY). > Make interval type camparable and orderable > --- > > Key: SPARK-29679 > URL: https://issues.apache.org/jira/browse/SPARK-29679 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > {code:sql} > postgres=# select INTERVAL '9 years 1 months -1 weeks -4 days -10 hours -46 > minutes' > interval '1 s'; > ?column? > -- > t > (1 row) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
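The suggestion to raise an error when month-type and seconds-type intervals are mixed follows from months having no fixed length in seconds, so no total order exists across the two kinds. A quick illustration:

```python
# Why INTERVAL '1 month' is neither <, =, nor > INTERVAL '30 days':
# the number of days spanned by "1 month" depends on the anchor date.
import calendar

feb_days = calendar.monthrange(2021, 2)[1]  # 1 month from Feb 1 spans 28 days
jan_days = calendar.monthrange(2021, 1)[1]  # 1 month from Jan 1 spans 31 days

# "1 month" is shorter than 30 days in February and longer in January,
# so comparisons (and hence GROUP BY / ORDER BY) across the two interval
# kinds have no well-defined answer.
assert feb_days < 30 < jan_days
```

Erroring only when a comparison-dependent operation actually mixes the two kinds, as suggested above, keeps ANSI-compliant queries working.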
[jira] [Resolved] (SPARK-31029) Occasional class not found error in user's Future code using global ExecutionContext
[ https://issues.apache.org/jira/browse/SPARK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-31029. --- Fix Version/s: 3.1.0 Assignee: shanyu zhao Resolution: Fixed > Occasional class not found error in user's Future code using global > ExecutionContext > > > Key: SPARK-31029 > URL: https://issues.apache.org/jira/browse/SPARK-31029 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 2.4.5 >Reporter: shanyu zhao >Assignee: shanyu zhao >Priority: Major > Fix For: 3.1.0 > > > *Problem:* > When running tpc-ds test (https://github.com/databricks/spark-sql-perf), > occasionally we see error related to class not found: > 2020-02-04 20:00:26,673 ERROR yarn.ApplicationMaster: User class threw > exception: scala.ScalaReflectionException: class > com.databricks.spark.sql.perf.ExperimentRun in JavaMirror with > sun.misc.Launcher$AppClassLoader@28ba21f3 of type class > sun.misc.Launcher$AppClassLoader with classpath [...] > and parent being sun.misc.Launcher$ExtClassLoader@3ff5d147 of type class > sun.misc.Launcher$ExtClassLoader with classpath [...] > and parent being primordial classloader with boot classpath [...] not found. > *Root cause:* > Spark driver starts ApplicationMaster in the main thread, which starts a user > thread and set MutableURLClassLoader to that thread's ContextClassLoader. > userClassThread = startUserApplication() > The main thread then setup YarnSchedulerBackend RPC endpoints, which handles > these calls using scala Future with the default global ExecutionContext: > - doRequestTotalExecutors > - doKillExecutors > If main thread starts a future to handle doKillExecutors() before user thread > does then the default thread pool thread's ContextClassLoader would be the > default (AppClassLoader). > If user thread starts a future first then the thread pool thread will have > MutableURLClassLoader. 
> So if the user's code uses a future which references a user-provided class (one > only MutableURLClassLoader can load), and executors are lost before that future > runs, you will see class-not-found errors. > *Proposed Solution:* > We can potentially solve this problem in one of two ways: > 1) Set the same class loader (userClassLoader) on both the main thread and the > user thread in ApplicationMaster.scala > 2) Do not use "ExecutionContext.Implicits.global" in YarnSchedulerBackend -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
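The inheritance behavior behind this race can be reproduced outside Spark. The sketch below is plain Java, not Spark code (the class and variable names are invented for illustration); it shows that a lazily created thread-pool worker inherits the context classloader of whichever thread submits the first task, which is exactly why the main thread and the user thread race to determine what the pool's workers can load.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextLoaderDemo {
    // Returns true if a lazily created pool worker inherits the context
    // classloader of the thread that submitted the first task to the pool.
    static boolean workerSeesCustomLoader() throws Exception {
        // Stand-in for Spark's MutableURLClassLoader.
        ClassLoader custom = new URLClassLoader(new URL[0],
                ContextLoaderDemo.class.getClassLoader());

        // Pool is created here, but its worker thread is only spawned when
        // the first task is submitted -- by the submitting thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        final ClassLoader[] seen = new ClassLoader[1];
        Thread user = new Thread(() -> {
            // The "user thread" sets its context classloader, then submits
            // first, so the new worker inherits the custom loader.
            Thread.currentThread().setContextClassLoader(custom);
            try {
                seen[0] = pool.submit(
                        () -> Thread.currentThread().getContextClassLoader()).get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        user.start();
        user.join();
        pool.shutdown();
        return seen[0] == custom;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(workerSeesCustomLoader());
    }
}
```

Had the main thread submitted a task first (as YarnSchedulerBackend's global-ExecutionContext futures can), the worker would instead carry the default AppClassLoader, reproducing the reported failure.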
[jira] [Resolved] (SPARK-31826) Support composed type of case class for typed Scala UDF
[ https://issues.apache.org/jira/browse/SPARK-31826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31826. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28645 [https://github.com/apache/spark/pull/28645] > Support composed type of case class for typed Scala UDF > --- > > Key: SPARK-31826 > URL: https://issues.apache.org/jira/browse/SPARK-31826 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.1.0 > > > After SPARK-30127, typed Scala UDF supports accepting a case class as an input > parameter. However, it still does not support composed types such as Seq[T] and > Array[T], where T is a case class. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31826) Support composed type of case class for typed Scala UDF
[ https://issues.apache.org/jira/browse/SPARK-31826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31826: --- Assignee: wuyi > Support composed type of case class for typed Scala UDF > --- > > Key: SPARK-31826 > URL: https://issues.apache.org/jira/browse/SPARK-31826 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > After SPARK-30127, typed Scala UDF supports accepting a case class as an input > parameter. However, it still does not support composed types such as Seq[T] and > Array[T], where T is a case class. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31056) Add CalendarIntervals division
[ https://issues.apache.org/jira/browse/SPARK-31056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31056. -- Resolution: Won't Fix
> Add CalendarIntervals division
> ------------------------------
>
> Key: SPARK-31056
> URL: https://issues.apache.org/jira/browse/SPARK-31056
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Enrico Minack
> Priority: Major
>
> {{CalendarInterval}} should be allowed for division. A {{CalendarInterval}} consists
> of three time components: {{months}}, {{days}} and {{microseconds}}. Division can
> only be defined between two intervals that have the same single non-zero time
> component; otherwise the division expression would be ambiguous.
> This allows evaluating the magnitude of a {{CalendarInterval}} in SQL expressions:
> {code}
> Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
>   .toDF("start", "end")
>   .withColumn("interval", $"end" - $"start")
>   .withColumn("interval [h]", $"interval" / lit("1 hour").cast(CalendarIntervalType))
>   .withColumn("rate [€/h]", lit(1.45))
>   .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
>   .show(false)
>
> +-------------------+-------------------+-----------------------------+------------+----------+----------+
> |start              |end                |interval                     |interval [h]|rate [€/h]|price [€] |
> +-------------------+-------------------+-----------------------------+------------+----------+----------+
> |2020-02-01 12:00:00|2020-02-01 13:30:25|1 hours 30 minutes 25 seconds|1.5069      |1.45      |2.18506943|
> +-------------------+-------------------+-----------------------------+------------+----------+----------+
> {code}
> The currently available approach is
> {code}
> Seq((Timestamp.valueOf("2020-02-01 12:00:00"), Timestamp.valueOf("2020-02-01 13:30:25")))
>   .toDF("start", "end")
>   .withColumn("interval [s]", unix_timestamp($"end") - unix_timestamp($"start"))
>   .withColumn("interval [h]", $"interval [s]" / 3600)
>   .withColumn("rate [€/h]", lit(1.45))
>   .withColumn("price [€]", $"interval [h]" * $"rate [€/h]")
>   .show(false)
> {code}
> Going through {{unix_timestamp}} is a hack and pollutes the SQL query with unrelated
> semantics (the unix timestamp is completely irrelevant to this computation). It is
> merely there because there is currently no way to access the length of a
> {{CalendarInterval}}. Dividing one interval by another provides a means to measure
> its length in an arbitrary unit (minutes, hours, quarter hours).
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32035) Inconsistent AWS environment variables in documentation
Ondrej Kokes created SPARK-32035: Summary: Inconsistent AWS environment variables in documentation Key: SPARK-32035 URL: https://issues.apache.org/jira/browse/SPARK-32035 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.0.0, 2.4.6 Reporter: Ondrej Kokes Looking at the actual Scala code, the environment variables used to log into AWS are: - AWS_ACCESS_KEY_ID - AWS_SECRET_ACCESS_KEY - AWS_SESSION_TOKEN These are the same that AWS uses in their libraries. However, looking through the Spark documentation and comments, I see that these are not denoted correctly across the board: docs/cloud-integration.md 106:1. `spark-submit` reads the `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` *<-- both different* 107:and `AWS_SESSION_TOKEN` environment variables and sets the associated authentication options docs/streaming-kinesis-integration.md 232:- Set up the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_KEY` with your AWS credentials. *<-- secret key different* external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py 34: $ export AWS_ACCESS_KEY_ID= 35: $ export AWS_SECRET_KEY= *<-- different* 48: Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret key different* core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 438: val keyId = System.getenv("AWS_ACCESS_KEY_ID") 439: val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY") 448: val sessionToken = System.getenv("AWS_SESSION_TOKEN") external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala 53: * $ export AWS_ACCESS_KEY_ID= 54: * $ export AWS_SECRET_KEY= *<-- different* 65: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret key different* external/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java 59: * $ export AWS_ACCESS_KEY_ID=[your-access-key] 60: * $ export AWS_SECRET_KEY= *<-- different* 71: * Environment Variables - AWS_ACCESS_KEY_ID 
and AWS_SECRET_KEY *<-- secret key different* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
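For reference, a quick shell sketch of the canonical variable names that SparkHadoopUtil.scala and the AWS SDKs actually read (the credential values below are placeholders, not real credentials):

```shell
# Canonical AWS credential variables, as read by SparkHadoopUtil.scala and the
# AWS SDKs. The values are placeholders for illustration only.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"
export AWS_SESSION_TOKEN="example-token"   # only needed for temporary credentials

# AWS_ACCESS_KEY and AWS_SECRET_KEY are the stale spellings the docs should avoid.
echo "$AWS_ACCESS_KEY_ID"
```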
[jira] [Commented] (SPARK-11150) Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140378#comment-17140378 ] Apache Spark commented on SPARK-11150: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28872 > Dynamic partition pruning > - > > Key: SPARK-11150 > URL: https://issues.apache.org/jira/browse/SPARK-11150 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.1, 1.6.0, 2.0.0, 2.1.2, 2.2.1, 2.3.0 >Reporter: Younes >Assignee: Wei Xue >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > Attachments: image-2019-10-04-11-20-02-616.png > > > Implements dynamic partition pruning by adding a dynamic-partition-pruning > filter if there is a partitioned table and a filter on the dimension table. > The filter is then planned using a heuristic approach: > # As a broadcast relation if it is a broadcast hash join. The broadcast > relation will then be transformed into a reused broadcast exchange by the > {{ReuseExchange}} rule; or > # As a subquery duplicate if the estimated benefit of partition table scan > being saved is greater than the estimated cost of the extra scan of the > duplicated subquery; otherwise > # As a bypassed condition ({{true}}). > Below shows a basic example of DPP. > !image-2019-10-04-11-20-02-616.png|width=521,height=225! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
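The pruning idea behind the heuristic above can be illustrated with a toy model in plain Python (no Spark; the tables and values are invented for illustration): evaluate the dimension-side filter once, then scan only the fact partitions whose partition key survives it.

```python
# Toy model of dynamic partition pruning: the fact table is partitioned by
# date, and the dimension-table filter decides which partitions get scanned.
fact_partitions = {
    "2020-01-01": [("2020-01-01", 10)],
    "2020-01-02": [("2020-01-02", 20)],
    "2020-01-03": [("2020-01-03", 30)],
}
dim = [("2020-01-01", "holiday"), ("2020-01-02", "weekday"), ("2020-01-03", "holiday")]

# Evaluate the dimension-side filter once (the role of the reused broadcast
# relation or the duplicated subquery in the real planner).
pruning_keys = {date for date, kind in dim if kind == "holiday"}

# Scan only the fact partitions whose key survives the filter; the
# "2020-01-02" partition is never read.
scanned = [row
           for key, rows in fact_partitions.items() if key in pruning_keys
           for row in rows]
print(scanned)
```

In the real planner the trade-off is whether computing `pruning_keys` up front (an extra scan of the dimension side) costs less than the partition scans it saves, which is what the heuristic's three cases decide.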
[jira] [Commented] (SPARK-11150) Dynamic partition pruning
[ https://issues.apache.org/jira/browse/SPARK-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140374#comment-17140374 ] Apache Spark commented on SPARK-11150: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28872 > Dynamic partition pruning > - > > Key: SPARK-11150 > URL: https://issues.apache.org/jira/browse/SPARK-11150 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.5.1, 1.6.0, 2.0.0, 2.1.2, 2.2.1, 2.3.0 >Reporter: Younes >Assignee: Wei Xue >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > Attachments: image-2019-10-04-11-20-02-616.png > > > Implements dynamic partition pruning by adding a dynamic-partition-pruning > filter if there is a partitioned table and a filter on the dimension table. > The filter is then planned using a heuristic approach: > # As a broadcast relation if it is a broadcast hash join. The broadcast > relation will then be transformed into a reused broadcast exchange by the > {{ReuseExchange}} rule; or > # As a subquery duplicate if the estimated benefit of partition table scan > being saved is greater than the estimated cost of the extra scan of the > duplicated subquery; otherwise > # As a bypassed condition ({{true}}). > Below shows a basic example of DPP. > !image-2019-10-04-11-20-02-616.png|width=521,height=225! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
[ https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140360#comment-17140360 ] Apache Spark commented on SPARK-32033: -- User 'gaborgsomogyi' has created a pull request for this issue: https://github.com/apache/spark/pull/28871 > Use new poll API in Kafka connector executor side to avoid infinite wait > > > Key: SPARK-32033 > URL: https://issues.apache.org/jira/browse/SPARK-32033 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
[ https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32033: Assignee: Apache Spark > Use new poll API in Kafka connector executor side to avoid infinite wait > > > Key: SPARK-32033 > URL: https://issues.apache.org/jira/browse/SPARK-32033 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
[ https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32033: Assignee: (was: Apache Spark) > Use new poll API in Kafka connector executor side to avoid infinite wait > > > Key: SPARK-32033 > URL: https://issues.apache.org/jira/browse/SPARK-32033 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
[ https://issues.apache.org/jira/browse/SPARK-32033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140356#comment-17140356 ] Apache Spark commented on SPARK-32033: -- User 'gaborgsomogyi' has created a pull request for this issue: https://github.com/apache/spark/pull/28871 > Use new poll API in Kafka connector executor side to avoid infinite wait > > > Key: SPARK-32033 > URL: https://issues.apache.org/jira/browse/SPARK-32033 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16659) use Maven project to submit spark application via yarn-client
[ https://issues.apache.org/jira/browse/SPARK-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Jiang updated SPARK-16659: --- Description: (was: i want to use spark sql to execute hive sql in my maven project,here is the main code: System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin-master"); SparkConf sparkConf = new SparkConf() .setAppName("test").setMaster("yarn-client"); // .set("hive.metastore.uris", "thrift://172.30.115.59:9083"); SparkContext ctx = new SparkContext(sparkConf); // ctx.addJar("lib/hive-hbase-handler-0.14.0.2.2.6.0-2800.jar"); HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx); String[] tables = sqlContext.tableNames(); for (String tablename : tables) { System.out.println("tablename : " + tablename); } when i run it,it comes to a error: 10:16:17,496 INFO Client:59 - client token: N/A diagnostics: Application application_1468409747983_0280 failed 2 times due to AM Container for appattempt_1468409747983_0280_02 exited with exitCode: -1000 For more detailed output, check application tracking page:http://hadoop003.icccuat.com:8088/proxy/application_1468409747983_0280/Then, click on links to logs of each attempt. 
Diagnostics: File file:/C:/Users/uatxj990267/AppData/Local/Temp/spark-8874c486-893d-4ac3-a088-48e4cdb484e1/__spark_conf__9007071161920501082.zip does not exist java.io.FileNotFoundException: File file:/C:/Users/uatxj990267/AppData/Local/Temp/spark-8874c486-893d-4ac3-a088-48e4cdb484e1/__spark_conf__9007071161920501082.zip does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:608) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:821) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:598) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:414) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Failing this attempt. Failing the application. 
ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1469067373412 final status: FAILED tracking URL: http://hadoop003.icccuat.com:8088/cluster/app/application_1468409747983_0280 user: uatxj990267 10:16:17,496 ERROR SparkContext:96 - Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:123) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.(SparkContext.scala:523) at com.huateng.test.SparkSqlDemo.main(SparkSqlDemo.java:33) but when i change this code setMaster("yarn-client") to setMaster(local[2]),it's OK?what's wrong with it ?can anyone help me?) > use Maven project to submit spark application via yarn-client > - > > Key: SPARK-16659 > URL: https://issues.apache.org/jira/browse/SPARK-16659 > Project: Spark > Issue Type: Question >Reporter: Jack Jiang >Priority: Major > Labels: newbie > -- This message was sent by Atlassian Jira
[jira] [Commented] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140344#comment-17140344 ] Apache Spark commented on SPARK-32034: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28870 > Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly > upon shutdown > - > > Key: SPARK-32034 > URL: https://issues.apache.org/jira/browse/SPARK-32034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > When stopping the HiveServer2, the non-daemon thread stops the server from > terminating > {code:java} > "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 > tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178) > at > org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Also, causes issues as HIVE-14817 described -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
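The underlying JVM behavior — a non-daemon thread keeps the process alive after main returns — can be shown with a standalone sketch (plain Java, not the HiveServer2 code; all names are illustrative):

```java
public class DaemonDemo {
    // Starts a sleepy "timeoutChecker" thread. With daemon == false, such a
    // thread keeps the JVM alive after main() returns -- the behavior
    // HIVE-14817 fixes by shutting the checker down properly. Marking it a
    // daemon, or interrupting it on stop, lets the process exit.
    static Thread startChecker(boolean daemon) {
        Thread checker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(100);   // stand-in for the session-timeout check loop
                } catch (InterruptedException e) {
                    return;              // a proper shutdown interrupts the thread
                }
            }
        }, "timeoutChecker");
        checker.setDaemon(daemon);
        checker.start();
        return checker;
    }

    public static void main(String[] args) throws Exception {
        Thread checker = startChecker(true);  // daemon: will not block JVM exit
        System.out.println("checker is daemon: " + checker.isDaemon());
        checker.interrupt();                  // explicit shutdown, as the port does
        checker.join();
    }
}
```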
[jira] [Assigned] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32034: Assignee: (was: Apache Spark) > Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly > upon shutdown > - > > Key: SPARK-32034 > URL: https://issues.apache.org/jira/browse/SPARK-32034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > When stopping the HiveServer2, the non-daemon thread stops the server from > terminating > {code:java} > "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 > tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178) > at > org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Also, causes issues as HIVE-14817 described -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140343#comment-17140343 ] Apache Spark commented on SPARK-32034: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28870 > Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly > upon shutdown > - > > Key: SPARK-32034 > URL: https://issues.apache.org/jira/browse/SPARK-32034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > When stopping the HiveServer2, the non-daemon thread stops the server from > terminating > {code:java} > "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 > tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178) > at > org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Also, causes issues as HIVE-14817 described -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
[ https://issues.apache.org/jira/browse/SPARK-32034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32034: Assignee: Apache Spark > Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly > upon shutdown > - > > Key: SPARK-32034 > URL: https://issues.apache.org/jira/browse/SPARK-32034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > When stopping the HiveServer2, the non-daemon thread stops the server from > terminating > {code:java} > "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 > tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178) > at > org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Also, causes issues as HIVE-14817 described -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32034) Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
Kent Yao created SPARK-32034: Summary: Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown Key: SPARK-32034 URL: https://issues.apache.org/jira/browse/SPARK-32034 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0, 3.1.0 Reporter: Kent Yao When stopping the HiveServer2, the non-daemon thread stops the server from terminating {code:java} "HiveServer2-Background-Pool: Thread-79" #79 prio=5 os_prio=31 tid=0x7fde26138800 nid=0x13713 waiting on condition [0x700010c32000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hive.service.cli.session.SessionManager$1.sleepInterval(SessionManager.java:178) at org.apache.hive.service.cli.session.SessionManager$1.run(SessionManager.java:156) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} Also, causes issues as HIVE-14817 described -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32033) Use new poll API in Kafka connector executor side to avoid infinite wait
Gabor Somogyi created SPARK-32033: - Summary: Use new poll API in Kafka connector executor side to avoid infinite wait Key: SPARK-32033 URL: https://issues.apache.org/jira/browse/SPARK-32033 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32032) Use new poll API in Kafka connector driver side to avoid infinite wait

Gabor Somogyi created SPARK-32032: - Summary: Use new poll API in Kafka connector driver side to avoid infinite wait Key: SPARK-32032 URL: https://issues.apache.org/jira/browse/SPARK-32032 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28367) Kafka connector infinite wait because metadata never updated
[ https://issues.apache.org/jira/browse/SPARK-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140319#comment-17140319 ] Gabor Somogyi commented on SPARK-28367: --- I think we can split the problem into 2 pieces. Driver and executor side. The executor side is not problematic and can be done w/o new API. The driver side requires further consideration and effort. Creating subtasks and PR for executor side. > Kafka connector infinite wait because metadata never updated > > > Key: SPARK-28367 > URL: https://issues.apache.org/jira/browse/SPARK-28367 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.3, 2.2.3, 2.3.3, 2.4.3, 3.0.0 >Reporter: Gabor Somogyi >Priority: Critical > > Spark uses an old and deprecated API named poll(long) which never returns and > stays in live lock if metadata is not updated (for instance when broker > disappears at consumer creation). > I've created a small standalone application to test it and the alternatives: > https://github.com/gaborgsomogyi/kafka-get-assignment -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
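As a generic sketch of the bounded-wait pattern that the newer poll(java.time.Duration) API enables (plain Java with no Kafka dependency; every name here is illustrative, not the connector's actual code): each attempt may come back empty, but an overall deadline guarantees the caller regains control instead of blocking forever as the deprecated poll(long) can.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BoundedPoll {
    // Repeatedly invokes a fetch that may return empty (e.g. metadata not yet
    // available), but gives up after an overall deadline instead of waiting
    // indefinitely.
    static <T> T pollWithDeadline(Supplier<Optional<T>> fetch,
                                  Duration perAttempt,
                                  Duration overall) throws Exception {
        long deadline = System.nanoTime() + overall.toNanos();
        while (System.nanoTime() - deadline < 0) {   // overflow-safe comparison
            Optional<T> result = fetch.get();
            if (result.isPresent()) {
                return result.get();
            }
            Thread.sleep(perAttempt.toMillis());     // back off between attempts
        }
        throw new TimeoutException("no data before deadline");
    }

    public static void main(String[] args) throws Exception {
        // Succeeds on the third attempt, well inside the overall deadline.
        final int[] attempts = {0};
        String value = pollWithDeadline(
                () -> ++attempts[0] >= 3 ? Optional.of("assignment") : Optional.empty(),
                Duration.ofMillis(10), Duration.ofSeconds(2));
        System.out.println(value);
    }
}
```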
[jira] [Commented] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression
[ https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140289#comment-17140289 ] Apache Spark commented on SPARK-32031: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28869 > Fix the wrong references of PartialMerge/Final AggregateExpression > -- > > Key: SPARK-32031 > URL: https://issues.apache.org/jira/browse/SPARK-32031 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > For the PartialMerge/Final AggregateExpression, it should reference the > `inputAggBufferAttributes` instead of `aggBufferAttributes` according to > `AggUtils.planAggXXX` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression
[ https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32031: Assignee: Apache Spark > Fix the wrong references of PartialMerge/Final AggregateExpression > -- > > Key: SPARK-32031 > URL: https://issues.apache.org/jira/browse/SPARK-32031 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > For the PartialMerge/Final AggregateExpression, it should reference the > `inputAggBufferAttributes` instead of `aggBufferAttributes` according to > `AggUtils.planAggXXX` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression
[ https://issues.apache.org/jira/browse/SPARK-32031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32031: Assignee: (was: Apache Spark) > Fix the wrong references of PartialMerge/Final AggregateExpression > -- > > Key: SPARK-32031 > URL: https://issues.apache.org/jira/browse/SPARK-32031 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > For the PartialMerge/Final AggregateExpression, it should reference the > `inputAggBufferAttributes` instead of `aggBufferAttributes` according to > `AggUtils.planAggXXX` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32031) Fix the wrong references of PartialMerge/Final AggregateExpression
wuyi created SPARK-32031: Summary: Fix the wrong references of PartialMerge/Final AggregateExpression Key: SPARK-32031 URL: https://issues.apache.org/jira/browse/SPARK-32031 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: wuyi For the PartialMerge/Final AggregateExpression, it should reference the `inputAggBufferAttributes` instead of `aggBufferAttributes` according to `AggUtils.planAggXXX` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
[ https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-32030: Description: Now the {{MERGE INTO}} syntax is, {code:sql} MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND ] THEN ] [ WHEN MATCHED [ AND ] THEN ] [ WHEN NOT MATCHED [ AND ] THEN ]{code} It would be nice to support unlimited {{MATCHED}} and {{NOT MATCHED}} clauses in the {{MERGE INTO}} statement, because users may want to handle several distinct "{{AND}}" conditions, the result of which behaves like a series of "{{CASE WHEN}}"s. The expected syntax looks like {code:sql} MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [when_clause [, ...]] {code} where {{when_clause}} is {code:java} WHEN MATCHED [ AND ] THEN {code} or {code:java} WHEN NOT MATCHED [ AND ] THEN {code} was: Now the MERGE INTO syntax is, ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND ] THEN ] [ WHEN MATCHED [ AND ] THEN ] [ WHEN NOT MATCHED [ AND ] THEN ] ``` It would be nice to support unlimited MATCHED and NOT MATCHED clauses in the MERGE INTO statement, because users may want to handle several distinct "AND" conditions, the result of which behaves like a series of "CASE WHEN"s. 
The expected syntax looks like ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [when_clause [, ...]] ``` where `when_clause` is ``` WHEN MATCHED [ AND ] THEN ``` or ``` WHEN NOT MATCHED [ AND ] THEN ``` > Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO > --- > > Key: SPARK-32030 > URL: https://issues.apache.org/jira/browse/SPARK-32030 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Xianyin Xin >Priority: Major > > Now the {{MERGE INTO}} syntax is, > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN NOT MATCHED [ AND ] THEN ]{code} > It would be nice to support unlimited {{MATCHED}} and {{NOT MATCHED}} > clauses in the {{MERGE INTO}} statement, because users may want to handle > several distinct "{{AND}}" conditions, the result of which behaves like a series of > "{{CASE WHEN}}"s. The expected syntax looks like > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [when_clause [, ...]] > {code} > where {{when_clause}} is > {code:java} > WHEN MATCHED [ AND ] THEN {code} > or > {code:java} > WHEN NOT MATCHED [ AND ] THEN {code}
[jira] [Created] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
Xianyin Xin created SPARK-32030: --- Summary: Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO Key: SPARK-32030 URL: https://issues.apache.org/jira/browse/SPARK-32030 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: Xianyin Xin Now the MERGE INTO syntax is, ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND ] THEN ] [ WHEN MATCHED [ AND ] THEN ] [ WHEN NOT MATCHED [ AND ] THEN ] ``` It would be nice to support unlimited MATCHED and NOT MATCHED clauses in the MERGE INTO statement, because users may want to handle several distinct "AND" conditions, the result of which behaves like a series of "CASE WHEN"s. The expected syntax looks like ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [when_clause [, ...]] ``` where `when_clause` is ``` WHEN MATCHED [ AND ] THEN ``` or ``` WHEN NOT MATCHED [ AND ] THEN ```
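To make the proposal concrete, here is a hypothetical example of the extended syntax (the table and column names are invented for illustration; the current grammar quoted above allows at most two {{MATCHED}} clauses and one {{NOT MATCHED}} clause):

{code:sql}
-- Three WHEN MATCHED clauses, which the current grammar cannot express;
-- as with CASE WHEN, clauses would be tried in order and the first whose
-- condition holds would be applied to that row.
MERGE INTO accounts AS t
USING events AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED AND s.op = 'update' THEN UPDATE SET t.balance = s.balance
WHEN MATCHED THEN UPDATE SET t.last_seen = s.ts
WHEN NOT MATCHED THEN INSERT (id, balance, last_seen) VALUES (s.id, s.balance, s.ts)
{code}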
[jira] [Resolved] (SPARK-31993) Generated code in 'concat_ws' fails to compile when splitting method is in effect
[ https://issues.apache.org/jira/browse/SPARK-31993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31993. - Fix Version/s: 3.1.0 Assignee: Jungtaek Lim Resolution: Fixed > Generated code in 'concat_ws' fails to compile when splitting method is in > effect > - > > Key: SPARK-31993 > URL: https://issues.apache.org/jira/browse/SPARK-31993 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.1.0 > > > https://github.com/apache/spark/blob/a0187cd6b59a6b6bb2cadc6711bb663d4d35a844/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L88-L195 > There are three parts of generated code in concat_ws (codes, varargCounts, > varargBuilds), and each part tries to split methods on its own, while > `varargCounts` and `varargBuilds` refer to the generated code in `codes`; > hence the overall generated code fails to compile if any part is actually > split.
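A sketch of how the bug would surface (table and column names here are hypothetical, not from the issue): concat_ws only emits the varargCounts/varargBuilds sections when some arguments are array-typed, so a query of roughly this shape, with enough arguments to push the generated method past the split threshold, is the kind that fails to compile:

{code:sql}
-- Hypothetical repro sketch: with many array-typed arguments, codegen splits
-- the `codes` section into helper methods, and `varargCounts`/`varargBuilds`
-- then reference local variables that are no longer in scope.
SELECT concat_ws(',', arr_col_1, arr_col_2, arr_col_3 /* ... many more array columns ... */)
FROM wide_table
{code}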
[jira] [Commented] (SPARK-32029) Make activeSession null when application end
[ https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140235#comment-17140235 ] Apache Spark commented on SPARK-32029: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/28868 > Make activeSession null when application end > > > Key: SPARK-32029 > URL: https://issues.apache.org/jira/browse/SPARK-32029 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor >
[jira] [Assigned] (SPARK-32029) Make activeSession null when application end
[ https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32029: Assignee: (was: Apache Spark) > Make activeSession null when application end > > > Key: SPARK-32029 > URL: https://issues.apache.org/jira/browse/SPARK-32029 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor >
[jira] [Assigned] (SPARK-32029) Make activeSession null when application end
[ https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32029: Assignee: Apache Spark > Make activeSession null when application end > > > Key: SPARK-32029 > URL: https://issues.apache.org/jira/browse/SPARK-32029 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Assignee: Apache Spark >Priority: Minor >
[jira] [Commented] (SPARK-32029) Make activeSession null when application end
[ https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140234#comment-17140234 ] Apache Spark commented on SPARK-32029: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/28868 > Make activeSession null when application end > > > Key: SPARK-32029 > URL: https://issues.apache.org/jira/browse/SPARK-32029 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor >