[jira] [Commented] (SPARK-32852) spark.sql.hive.metastore.jars support HDFS location

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234318#comment-17234318
 ] 

Apache Spark commented on SPARK-32852:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30407

> spark.sql.hive.metastore.jars support HDFS location
> ---
>
> Key: SPARK-32852
> URL: https://issues.apache.org/jira/browse/SPARK-32852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> It would be great if {{spark.sql.hive.metastore.jars}} supported HDFS 
> locations; this would be very convenient in cluster mode.
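
For illustration, a minimal sketch of how this could look in 
spark-defaults.conf; the {{path}} mode and the companion 
{{spark.sql.hive.metastore.jars.path}} option follow the linked pull request 
and should be read as assumptions:

{code:java}
# Use a standalone Hive metastore client version...
spark.sql.hive.metastore.version=2.3.7
# ...and load its jars from HDFS instead of a local directory, so drivers
# launched in cluster mode do not need the jars pre-installed on every node.
spark.sql.hive.metastore.jars=path
spark.sql.hive.metastore.jars.path=hdfs://nameservice1/hive/lib/*.jar
{code}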



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32852) spark.sql.hive.metastore.jars support HDFS location

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234319#comment-17234319
 ] 

Apache Spark commented on SPARK-32852:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30407

> spark.sql.hive.metastore.jars support HDFS location
> ---
>
> Key: SPARK-32852
> URL: https://issues.apache.org/jira/browse/SPARK-32852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> It would be great if {{spark.sql.hive.metastore.jars}} supported HDFS 
> locations; this would be very convenient in cluster mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33473) Extend interpreted subexpression elimination to other interpreted projection

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234286#comment-17234286
 ] 

Apache Spark commented on SPARK-33473:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30406

> Extend interpreted subexpression elimination to other interpreted projection
> 
>
> Key: SPARK-33473
> URL: https://issues.apache.org/jira/browse/SPARK-33473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Similar to InterpretedUnsafeProjection, we can extend interpreted 
> subexpression elimination to InterpretedMutableProjection and 
> InterpretedSafeProjection.
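
For readers unfamiliar with the technique, a minimal standalone sketch of 
subexpression elimination over an expression tree; this is illustrative Scala, 
not Spark's projection code, and the {{Expr}} hierarchy and {{eval}} helper 
are hypothetical:

{code:java}
import scala.collection.mutable

sealed trait Expr
case class Attr(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Mul(l: Expr, r: Expr) extends Expr

// Case-class equality lets identical subtrees share one cache entry,
// so a repeated subexpression is evaluated only once per row.
def eval(e: Expr, row: Map[String, Int], cache: mutable.Map[Expr, Int]): Int =
  cache.getOrElseUpdate(e, e match {
    case Attr(n)   => row(n)
    case Add(l, r) => eval(l, row, cache) + eval(r, row, cache)
    case Mul(l, r) => eval(l, row, cache) * eval(r, row, cache)
  })

val shared     = Add(Attr("a"), Attr("b"))          // appears in both outputs
val projection = Seq(Mul(shared, Attr("c")), Add(shared, Attr("c")))
val row        = Map("a" -> 1, "b" -> 2, "c" -> 3)
val cache      = mutable.Map.empty[Expr, Int]
println(projection.map(eval(_, row, cache)))        // List(9, 6)
{code}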



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33473) Extend interpreted subexpression elimination to other interpreted projection

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33473:


Assignee: L. C. Hsieh  (was: Apache Spark)

> Extend interpreted subexpression elimination to other interpreted projection
> 
>
> Key: SPARK-33473
> URL: https://issues.apache.org/jira/browse/SPARK-33473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Similar to InterpretedUnsafeProjection, we can extend interpreted 
> subexpression elimination to InterpretedMutableProjection and 
> InterpretedSafeProjection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33473) Extend interpreted subexpression elimination to other interpreted projection

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234285#comment-17234285
 ] 

Apache Spark commented on SPARK-33473:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30406

> Extend interpreted subexpression elimination to other interpreted projection
> 
>
> Key: SPARK-33473
> URL: https://issues.apache.org/jira/browse/SPARK-33473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Similar to InterpretedUnsafeProjection, we can extend interpreted 
> subexpression elimination to InterpretedMutableProjection and 
> InterpretedSafeProjection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33473) Extend interpreted subexpression elimination to other interpreted projection

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33473:


Assignee: Apache Spark  (was: L. C. Hsieh)

> Extend interpreted subexpression elimination to other interpreted projection
> 
>
> Key: SPARK-33473
> URL: https://issues.apache.org/jira/browse/SPARK-33473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> Similar to InterpretedUnsafeProjection, we can extend interpreted 
> subexpression elimination to InterpretedMutableProjection and 
> InterpretedSafeProjection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33476) Generalize ExecutorSource to expose user-given file system schemes

2020-11-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33476:
--
Summary: Generalize ExecutorSource to expose user-given file system schemes 
 (was: Generalize filesystem metric to support user-given file scheme)

> Generalize ExecutorSource to expose user-given file system schemes
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
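
As a sketch of what "user-given schemes" means in practice, a configuration 
along these lines would opt additional filesystems into the executor metrics 
(the option name and values follow the linked work and should be read as 
assumptions):

{code:java}
# Report executor filesystem metrics (read bytes, write ops, ...) for these
# schemes rather than a hard-coded set.
spark.executor.metrics.fileSystemSchemes=file,hdfs,s3a
{code}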




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33476) Generalize filesystem metric to support user-given file scheme

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234248#comment-17234248
 ] 

Apache Spark commented on SPARK-33476:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30405

> Generalize filesystem metric to support user-given file scheme
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33475) Bump ANTLR runtime version to 4.8-1

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234247#comment-17234247
 ] 

Apache Spark commented on SPARK-33475:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/30404

> Bump ANTLR runtime version to 4.8-1
> ---
>
> Key: SPARK-33475
> URL: https://issues.apache.org/jira/browse/SPARK-33475
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This PR intends to upgrade the ANTLR runtime from 4.7.1 to 4.8-1.
> Release notes for v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug 
> fixes for Java targets):
>  - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
>  - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2
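
For context, the bump itself is a one-line version change in the Maven build; 
a sketch of the relevant coordinates (the {{antlr4.version}} property name is 
illustrative, not necessarily Spark's exact pom layout):

{code:java}
<properties>
  <antlr4.version>4.8-1</antlr4.version>
</properties>

<dependency>
  <groupId>org.antlr</groupId>
  <artifactId>antlr4-runtime</artifactId>
  <version>${antlr4.version}</version>
</dependency>
{code}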



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33476) Generalize filesystem metric to support user-given file scheme

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33476:


Assignee: (was: Apache Spark)

> Generalize filesystem metric to support user-given file scheme
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33476) Generalize filesystem metric to support user-given file scheme

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234246#comment-17234246
 ] 

Apache Spark commented on SPARK-33476:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30405

> Generalize filesystem metric to support user-given file scheme
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33475) Bump ANTLR runtime version to 4.8-1

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33475:


Assignee: (was: Apache Spark)

> Bump ANTLR runtime version to 4.8-1
> ---
>
> Key: SPARK-33475
> URL: https://issues.apache.org/jira/browse/SPARK-33475
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This PR intends to upgrade the ANTLR runtime from 4.7.1 to 4.8-1.
> Release notes for v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug 
> fixes for Java targets):
>  - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
>  - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33475) Bump ANTLR runtime version to 4.8-1

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33475:


Assignee: Apache Spark

> Bump ANTLR runtime version to 4.8-1
> ---
>
> Key: SPARK-33475
> URL: https://issues.apache.org/jira/browse/SPARK-33475
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Major
>
> This PR intends to upgrade the ANTLR runtime from 4.7.1 to 4.8-1.
> Release notes for v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug 
> fixes for Java targets):
>  - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
>  - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33476) Generalize filesystem metric to support user-given file scheme

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33476:


Assignee: Apache Spark

> Generalize filesystem metric to support user-given file scheme
> --
>
> Key: SPARK-33476
> URL: https://issues.apache.org/jira/browse/SPARK-33476
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33476) Generalize filesystem metric to support user-given file scheme

2020-11-17 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33476:
-

 Summary: Generalize filesystem metric to support user-given file 
scheme
 Key: SPARK-33476
 URL: https://issues.apache.org/jira/browse/SPARK-33476
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33475) Bump ANTLR runtime version to 4.8-1

2020-11-17 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-33475:


 Summary: Bump ANTLR runtime version to 4.8-1
 Key: SPARK-33475
 URL: https://issues.apache.org/jira/browse/SPARK-33475
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.0
Reporter: Takeshi Yamamuro


This PR intends to upgrade the ANTLR runtime from 4.7.1 to 4.8-1.
Release notes for v4.8 and v4.7.2 (the v4.7.2 release has a few minor bug 
fixes for Java targets):

 - v4.8: https://github.com/antlr/antlr4/releases/tag/4.8
 - v4.7.2: https://github.com/antlr/antlr4/releases/tag/4.7.2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33448) Migrate CACHE/UNCACHE TABLE to new resolution framework

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33448:


Assignee: Apache Spark

> Migrate CACHE/UNCACHE TABLE to new resolution framework
> ---
>
> Key: SPARK-33448
> URL: https://issues.apache.org/jira/browse/SPARK-33448
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Minor
>
> Migrate CACHE/UNCACHE TABLE to new resolution framework.
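
For reference, the statements affected by this migration look like the 
following (a sketch using documented CACHE TABLE syntax; table names are 
placeholders):

{code:java}
CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testData;
UNCACHE TABLE IF EXISTS testCache;
{code}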



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33448) Migrate CACHE/UNCACHE TABLE to new resolution framework

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33448:


Assignee: (was: Apache Spark)

> Migrate CACHE/UNCACHE TABLE to new resolution framework
> ---
>
> Key: SPARK-33448
> URL: https://issues.apache.org/jira/browse/SPARK-33448
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate CACHE/UNCACHE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33448) Migrate CACHE/UNCACHE TABLE to new resolution framework

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234223#comment-17234223
 ] 

Apache Spark commented on SPARK-33448:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/30403

> Migrate CACHE/UNCACHE TABLE to new resolution framework
> ---
>
> Key: SPARK-33448
> URL: https://issues.apache.org/jira/browse/SPARK-33448
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate CACHE/UNCACHE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33474) Incorrect value when inserting into date type partition table with date type value

2020-11-17 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33474:

Description: 
{code:java}
create table test_part (name STRING) partitioned by (part date) STORED AS 
PARQUET;
insert into test_part partition(part = date '2019-01-02') values('a');
select * from test_part;
{code}


{noformat}
spark-sql> select * from test_part;
a   NULL
{noformat}



  was:

{code:java}
create table test_part (name STRING) partitioned by (part date) STORED AS 
PARQUET;
insert into test_part partition(part = date '2019-01-02') values('a');
select * from test_part;
{code}


{noformat}
spark-sql> select * from test_part;
a   NUL
{noformat}




> Incorrect value when inserting into date type partition table with date type 
> value
> --
>
> Key: SPARK-33474
> URL: https://issues.apache.org/jira/browse/SPARK-33474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:java}
> create table test_part (name STRING) partitioned by (part date) STORED AS 
> PARQUET;
> insert into test_part partition(part = date '2019-01-02') values('a');
> select * from test_part;
> {code}
> {noformat}
> spark-sql> select * from test_part;
> a NULL
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33474) Incorrect value when inserting into date type partition table with date type value

2020-11-17 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-33474:
---

 Summary: Incorrect value when inserting into date type partition 
table with date type value
 Key: SPARK-33474
 URL: https://issues.apache.org/jira/browse/SPARK-33474
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang



{code:java}
create table test_part (name STRING) partitioned by (part date) STORED AS 
PARQUET;
insert into test_part partition(part = date '2019-01-02') values('a');
select * from test_part;
{code}


{noformat}
spark-sql> select * from test_part;
a   NUL
{noformat}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33473) Extend interpreted subexpression elimination to other interpreted projection

2020-11-17 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-33473:
---

 Summary: Extend interpreted subexpression elimination to other 
interpreted projection
 Key: SPARK-33473
 URL: https://issues.apache.org/jira/browse/SPARK-33473
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


Similar to InterpretedUnsafeProjection, we can extend interpreted subexpression 
elimination to InterpretedMutableProjection and InterpretedSafeProjection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31069) high cpu caused by chunksBeingTransferred in external shuffle service

2020-11-17 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-31069:
---

Assignee: angerszhu

> high cpu caused by chunksBeingTransferred in external shuffle service
> -
>
> Key: SPARK-31069
> URL: https://issues.apache.org/jira/browse/SPARK-31069
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Xiaoju Wu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> "shuffle-chunk-fetch-handler-2-40" #250 daemon prio=5 os_prio=0 
> tid=0x02ac nid=0xb9b3 runnable [0x7ff20a1af000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339)
> at 
> java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439)
> at 
> org.apache.spark.network.server.OneForOneStreamManager.chunksBeingTransferred(OneForOneStreamManager.java:184)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:85)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:51)
> at 
> org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353)
> at 
> org.spark_project.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at 
> org.spark_project.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.spark_project.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
>  
>  
>  
> "shuffle-chunk-fetch-handler-2-48" #235 daemon prio=5 os_prio=0 
> tid=0x7ff2302ec800 nid=0xb9ad runnable [0x7ff20a7b4000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.spark.network.server.OneForOneStreamManager.chunksBeingTransferred(OneForOneStreamManager.java:186)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:85)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:51)
> at 
> org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353)
> at 
> org.spark_project.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at 
> org.spark_project.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.spark_project.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
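
Both traces are spinning inside 
{{OneForOneStreamManager.chunksBeingTransferred}}, which scans every open 
stream on every chunk-fetch request. A hedged sketch of the mitigation 
direction, with the config lookup and the stream-manager scan abstracted into 
parameters (a paraphrase, not the actual patch):

{code:java}
// Only pay for the O(n) scan of open streams when a transfer limit
// (spark.shuffle.maxChunksBeingTransferred) is actually configured.
def shouldRejectFetch(maxChunksBeingTransferred: Long,
                      chunksBeingTransferred: () => Long): Boolean = {
  if (maxChunksBeingTransferred == Long.MaxValue) {
    false // no limit configured: skip the expensive scan entirely
  } else {
    chunksBeingTransferred() >= maxChunksBeingTransferred
  }
}
{code}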



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31069) high cpu caused by chunksBeingTransferred in external shuffle service

2020-11-17 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-31069:
---

Assignee: angerszhu  (was: angerszhu)

> high cpu caused by chunksBeingTransferred in external shuffle service
> -
>
> Key: SPARK-31069
> URL: https://issues.apache.org/jira/browse/SPARK-31069
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Xiaoju Wu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> "shuffle-chunk-fetch-handler-2-40" #250 daemon prio=5 os_prio=0 
> tid=0x02ac nid=0xb9b3 runnable [0x7ff20a1af000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339)
> at 
> java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439)
> at 
> org.apache.spark.network.server.OneForOneStreamManager.chunksBeingTransferred(OneForOneStreamManager.java:184)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:85)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:51)
> at 
> org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353)
> at 
> org.spark_project.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at 
> org.spark_project.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.spark_project.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
>  
>  
>  
> "shuffle-chunk-fetch-handler-2-48" #235 daemon prio=5 os_prio=0 
> tid=0x7ff2302ec800 nid=0xb9ad runnable [0x7ff20a7b4000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.spark.network.server.OneForOneStreamManager.chunksBeingTransferred(OneForOneStreamManager.java:186)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:85)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:51)
> at 
> org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353)
> at 
> org.spark_project.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at 
> org.spark_project.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.spark_project.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31069) high cpu caused by chunksBeingTransferred in external shuffle service

2020-11-17 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-31069.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30139
[https://github.com/apache/spark/pull/30139]

> high cpu caused by chunksBeingTransferred in external shuffle service
> -
>
> Key: SPARK-31069
> URL: https://issues.apache.org/jira/browse/SPARK-31069
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Xiaoju Wu
>Priority: Major
> Fix For: 3.1.0
>
>
> "shuffle-chunk-fetch-handler-2-40" #250 daemon prio=5 os_prio=0 
> tid=0x02ac nid=0xb9b3 runnable [0x7ff20a1af000]
>java.lang.Thread.State: RUNNABLE
> at 
> java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339)
> at 
> java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439)
> at 
> org.apache.spark.network.server.OneForOneStreamManager.chunksBeingTransferred(OneForOneStreamManager.java:184)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:85)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:51)
> at 
> org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353)
> at 
> org.spark_project.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at 
> org.spark_project.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.spark_project.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
>  
>  
>  
> "shuffle-chunk-fetch-handler-2-48" #235 daemon prio=5 os_prio=0 
> tid=0x7ff2302ec800 nid=0xb9ad runnable [0x7ff20a7b4000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.spark.network.server.OneForOneStreamManager.chunksBeingTransferred(OneForOneStreamManager.java:186)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:85)
> at 
> org.apache.spark.network.server.ChunkFetchRequestHandler.channelRead0(ChunkFetchRequestHandler.java:51)
> at 
> org.spark_project.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38)
> at 
> org.spark_project.io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353)
> at 
> org.spark_project.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at 
> org.spark_project.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> org.spark_project.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> org.spark_project.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33465) RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead

2020-11-17 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-33465.
-
Resolution: Not A Problem

> RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead
> ---
>
> Key: SPARK-33465
> URL: https://issues.apache.org/jira/browse/SPARK-33465
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> The {{RDD.takeOrdered}} API sorts the elements in each partition and puts 
> them into a {{BoundedPriorityQueue}}, so each partition of the intermediate 
> RDD holds exactly one priority queue. The API then calls {{RDD.reduce}} to 
> merge the queues. Since each partition contains only a single element (the 
> queue), it doesn't make sense to call {{reduce}} over them.
> We should either simplify the {{RDD.reduce}} call in {{RDD.takeOrdered}} or 
> replace it with {{treeReduce}}, which can actually perform partial reduction 
> for this case.
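
To make the shape concrete, a simplified sketch of the mechanics described 
above, using plain collections in place of RDDs (names are illustrative):

{code:java}
// Each "partition" yields exactly one bounded, sorted buffer (standing in
// for BoundedPriorityQueue), so the final reduce only merges P
// single-element partitions -- the step this issue questions.
def takeOrderedSketch(partitions: Seq[Seq[Int]], num: Int): Seq[Int] = {
  val queues: Seq[Seq[Int]] = partitions.map(_.sorted.take(num))
  queues.reduce((q1, q2) => (q1 ++ q2).sorted.take(num))
}

// takeOrderedSketch(Seq(Seq(5, 1, 9), Seq(4, 2), Seq(8, 3)), 3) == Seq(1, 2, 3)
{code}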



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33472:


Assignee: (was: Apache Spark)

> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Priority: Major
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  
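
The failing check is the constructor-level {{require}} in 
{{PartitioningCollection}}; a self-contained rendering of its shape (the case 
class here is a simplified stand-in, not Spark's definition):

{code:java}
// All member partitionings must agree on numPartitions, which is only
// guaranteed once EnsureRequirements has inserted the needed shuffles.
case class PartitioningCollectionSketch(numPartitionsPerChild: Seq[Int]) {
  require(
    numPartitionsPerChild.distinct.length == 1,
    "PartitioningCollection requires all of its partitionings have the same numPartitions.")
}

// PartitioningCollectionSketch(Seq(200, 200))  // ok
// PartitioningCollectionSketch(Seq(200, 5))    // IllegalArgumentException
{code}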



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234163#comment-17234163
 ] 

Apache Spark commented on SPARK-33472:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/30373

> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Priority: Major
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234162#comment-17234162
 ] 

Apache Spark commented on SPARK-33472:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/30373

> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Priority: Major
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33472:


Assignee: Apache Spark

> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-33472:
-
Description: 
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
whether a sort node is redundant. Currently, it is added before 
`EnsureRequirements`. Since `PartitioningCollection` requires left and right 
partitioning to have the same number of partitions, which is not necessarily 
true before applying `EnsureRequirements`, the rule can fail with the following 
exception:

{{IllegalArgumentException: requirement failed: PartitioningCollection requires 
all of its partitionings have the same numPartitions.}}

We should switch the order between these two rules to satisfy the requirement 
when instantiating `PartitioningCollection`.

 

  was:
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
whether a sort node is redundant. Currently, it is added before 
`EnsureRequirements`. Since `PartitioningCollection` requires left and right 
partitioning to have the same number of partitions, which is not necessarily 
true before `EnsureRequirements`, the rule can fail with the following 
exception:

{{IllegalArgumentException: requirement failed: PartitioningCollection requires 
all of its partitionings have the same numPartitions.}}

We should switch the order between these two rules to satisfy the requirement 
when instantiating `PartitioningCollection`.

 


> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Priority: Major
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-33472:
-
Description: 
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
whether a sort node is redundant. Currently, it is added before 
`EnsureRequirements`. Since `PartitioningCollection` requires left and right 
partitioning to have the same number of partitions, which is not necessarily 
true before `EnsureRequirements`, the rule can fail with the following 
exception:

{{IllegalArgumentException: requirement failed: PartitioningCollection requires 
all of its partitionings have the same numPartitions.}}

We should switch the order between these two rules to satisfy the requirement 
when instantiating `PartitioningCollection`.

 

  was:
`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
whether a sort node is redundant. Currently, it is added before 
`EnsureRequirements`. Since `PartitioningCollection` requires left and right 
partitioning to have the same number of partitions, which is not necessarily 
true before `EnsureRequirements`, the rule can fail with the following 
exception:

 IllegalArgumentException: requirement failed: PartitioningCollection 
requires all of its partitionings have the same numPartitions.

We should switch the order between these two rules to satisfy the requirement 
when instantiating `PartitioningCollection`.



 


> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Priority: Major
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before `EnsureRequirements`, the rule can fail with the following 
> exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-17 Thread Allison Wang (Jira)
Allison Wang created SPARK-33472:


 Summary: IllegalArgumentException when applying 
RemoveRedundantSorts before EnsureRequirements
 Key: SPARK-33472
 URL: https://issues.apache.org/jira/browse/SPARK-33472
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.8, 3.0.2, 3.1.0
Reporter: Allison Wang


`RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
whether a sort node is redundant. Currently, it is added before 
`EnsureRequirements`. Since `PartitioningCollection` requires left and right 
partitioning to have the same number of partitions, which is not necessarily 
true before `EnsureRequirements`, the rule can fail with the following 
exception:

 IllegalArgumentException: requirement failed: PartitioningCollection 
requires all of its partitionings have the same numPartitions.

We should switch the order between these two rules to satisfy the requirement 
when instantiating `PartitioningCollection`.



 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33470) HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 2.10.1

2020-11-17 Thread Dino Tufekcic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dino Tufekcic updated SPARK-33470:
--
Description: 
History Server starts correctly and shows "No completed applications found!" 
message when /tmp/spark-events directory is empty. As soon as there is an event 
log in the folder generated by a Spark streaming app (haven't tested other 
apps) the History Server gets stuck showing the "Loading history summary..." 
popup.

Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
application shown in History Server))

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7

Command to reproduce:

 
{code:java}
spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
/usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
spark-defaults.conf:

 

 
{code:java}
spark.eventLog.dir=/tmp/spark-events
spark.eventLog.enabled=true
spark.eventLog.rolling.enabled=true
spark.ui.enabled=true
spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
{code}
 

Stack trace generated with jstack of the HistoryServer process is attached

  was:
History Server starts correctly and shows "No completed applications found!" 
message when /tmp/spark-events directory is empty. As soon as there is an event 
log in the folder generated by a Spark streaming app (haven't tested other 
apps) the History Server gets stuck showing the "Loading history summary..." 
popup.

Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
application shown in History Server))

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7

Command to reproduce:

 
{code:java}
spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
/usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
spark-defaults.conf:

 

 
{code:java}
spark.eventLog.dir=/tmp/spark-events
spark.eventLog.enabled=true
spark.eventLog.rolling.enabled=true
spark.ui.enabled=true
spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
{code}
 


> HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 
> 2.10.1
> 
>
> Key: SPARK-33470
> URL: https://issues.apache.org/jira/browse/SPARK-33470
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
> Environment: Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as 
> expected (running application shown in History Server)
> Hadoop version: 2.10.0 and 2.10.1 and 3.1.4
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
>Reporter: Dino Tufekcic
>Priority: Major
> Attachments: stack.trace
>
>
> History Server starts correctly and shows "No completed applications found!" 
> message when /tmp/spark-events directory is empty. As soon as there is an 
> event log in the folder generated by a Spark streaming app (haven't tested 
> other apps) the History Server gets stuck showing the "Loading history 
> summary..." popup.
> Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
> application shown in History Server))
> Hadoop version: 2.10.0 and 2.10.1
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
> Command to reproduce:
>  
> {code:java}
> spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
> /usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
> spark-defaults.conf:
>  
>  
> {code:java}
> spark.eventLog.dir=/tmp/spark-events
> spark.eventLog.enabled=true
> spark.eventLog.rolling.enabled=true
> spark.ui.enabled=true
> spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
> {code}
>  
> Stack trace generated with jstack of the HistoryServer process is attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33470) HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 2.10.1

2020-11-17 Thread Dino Tufekcic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dino Tufekcic updated SPARK-33470:
--
Attachment: stack.trace

> HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 
> 2.10.1
> 
>
> Key: SPARK-33470
> URL: https://issues.apache.org/jira/browse/SPARK-33470
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
> Environment: Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as 
> expected (running application shown in History Server)
> Hadoop version: 2.10.0 and 2.10.1 and 3.1.4
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
>Reporter: Dino Tufekcic
>Priority: Major
> Attachments: stack.trace
>
>
> History Server starts correctly and shows "No completed applications found!" 
> message when /tmp/spark-events directory is empty. As soon as there is an 
> event log in the folder generated by a Spark streaming app (haven't tested 
> other apps) the History Server gets stuck showing the "Loading history 
> summary..." popup.
> Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
> application shown in History Server))
> Hadoop version: 2.10.0 and 2.10.1
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
> Command to reproduce:
>  
> {code:java}
> spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
> /usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
> spark-defaults.conf:
>  
>  
> {code:java}
> spark.eventLog.dir=/tmp/spark-events
> spark.eventLog.enabled=true
> spark.eventLog.rolling.enabled=true
> spark.ui.enabled=true
> spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33461) Propagating SPARK_CONF_DIR in K8s and tests

2020-11-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33461:
--
Target Version/s: 3.1.0

> Propagating SPARK_CONF_DIR in K8s and tests
> ---
>
> Key: SPARK-33461
> URL: https://issues.apache.org/jira/browse/SPARK-33461
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
>
> Foundational work for propagating SPARK_CONF_DIR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33471.
---
Resolution: Fixed

Issue resolved by pull request 30401
[https://github.com/apache/spark/pull/30401]

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Assignee: rameshkrishnan muthusamy
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33471:
-

Assignee: rameshkrishnan muthusamy

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Assignee: rameshkrishnan muthusamy
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32862) Left semi stream-stream join

2020-11-17 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-32862:

Labels: release-notes  (was: )

> Left semi stream-stream join
> 
>
> Key: SPARK-32862
> URL: https://issues.apache.org/jira/browse/SPARK-32862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Major
>  Labels: release-notes
> Fix For: 3.1.0
>
>
> Current stream-stream join supports inner, left outer and right outer join 
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]
>  ). Internally we see a lot of users relying on left semi stream-stream 
> joins (though not in Spark Structured Streaming), e.g. to get the ad 
> impressions (join left side) that have a click (join right side) without 
> caring how many clicks each ad received (left semi semantics).
>  
> Left semi stream-stream join will work as follows (a usage sketch follows 
> this description):
> (1). For each left side input row, check if there's a match in the right 
> side state store.
>   (1.1). If there's a match, output the left side row.
>   (1.2). If there's no match, put the row in the left side state store 
> (with the "matched" field set to false).
> (2). For each right side input row, check if there's a match in the left 
> side state store. If there's a match, update the left side row state by 
> setting its "matched" field to true. Put the right side row in the right 
> side state store.
> (3). When a left side row is to be evicted from the state store, output 
> the row if its "matched" field is true.
> (4). When a right side row is to be evicted from the state store, do 
> nothing.
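
A minimal usage sketch of the left semi stream-stream join described above. 
This is an illustration only: the rate source, the column names, and the 
10-minute window are assumptions for the example, not part of the ticket.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("LeftSemiStreamStream").getOrCreate()

// Hypothetical streams: ad impressions on the left, clicks on the right.
val impressions = spark.readStream.format("rate").load()
  .selectExpr("value AS adId", "timestamp AS impressionTime")
  .withWatermark("impressionTime", "10 minutes")
  .as("i")

val clicks = spark.readStream.format("rate").load()
  .selectExpr("value AS adId", "timestamp AS clickTime")
  .withWatermark("clickTime", "10 minutes")
  .as("c")

// Left semi semantics: each impression is emitted at most once if a matching
// click arrives; multiple clicks per ad do not duplicate the impression.
val matchedImpressions = impressions.join(
  clicks,
  expr("i.adId = c.adId AND " +
    "c.clickTime BETWEEN i.impressionTime AND i.impressionTime + interval 10 minutes"),
  "left_semi")
{code}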



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33470) HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 2.10.1

2020-11-17 Thread Dino Tufekcic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dino Tufekcic updated SPARK-33470:
--
Environment: 
Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as expected (running 
application shown in History Server)

Hadoop version: 2.10.0 and 2.10.1 and 3.1.4

Java version: openjdk 1.8.0_272

OS: CentOS 7.7

  was:
Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as expected (running 
application shown in History Server)

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7


> HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 
> 2.10.1
> 
>
> Key: SPARK-33470
> URL: https://issues.apache.org/jira/browse/SPARK-33470
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
> Environment: Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as 
> expected (running application shown in History Server)
> Hadoop version: 2.10.0 and 2.10.1 and 3.1.4
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
>Reporter: Dino Tufekcic
>Priority: Major
>
> The History Server starts correctly and shows the "No completed applications 
> found!" message when the /tmp/spark-events directory is empty. As soon as 
> the folder contains an event log generated by a Spark streaming app (haven't 
> tested other apps), the History Server gets stuck showing the "Loading 
> history summary..." popup.
> Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
> application shown in History Server))
> Hadoop version: 2.10.0 and 2.10.1
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
> Command to reproduce:
>  
> {code:java}
> spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
> /usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
> spark-defaults.conf:
>  
>  
> {code:java}
> spark.eventLog.dir=/tmp/spark-events
> spark.eventLog.enabled=true
> spark.eventLog.rolling.enabled=true
> spark.ui.enabled=true
> spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33471:
--
Parent: SPARK-33005
Issue Type: Sub-task  (was: Improvement)

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33471:


Assignee: Apache Spark

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233931#comment-17233931
 ] 

Apache Spark commented on SPARK-33471:
--

User 'ramesh-muthusamy' has created a pull request for this issue:
https://github.com/apache/spark/pull/30401

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33471:


Assignee: (was: Apache Spark)

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread rameshkrishnan muthusamy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233928#comment-17233928
 ] 

rameshkrishnan muthusamy commented on SPARK-33471:
--

https://github.com/apache/spark/pull/30401/files

> Upgrade kubernetes-client to 4.12.0
> ---
>
> Key: SPARK-33471
> URL: https://issues.apache.org/jira/browse/SPARK-33471
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: rameshkrishnan muthusamy
>Priority: Major
> Fix For: 3.1.0
>
>
> Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33471) Upgrade kubernetes-client to 4.12.0

2020-11-17 Thread rameshkrishnan muthusamy (Jira)
rameshkrishnan muthusamy created SPARK-33471:


 Summary: Upgrade kubernetes-client to 4.12.0
 Key: SPARK-33471
 URL: https://issues.apache.org/jira/browse/SPARK-33471
 Project: Spark
  Issue Type: Improvement
  Components: Build, Kubernetes
Affects Versions: 3.1.0
Reporter: rameshkrishnan muthusamy
 Fix For: 3.1.0


Upgrade kubernetes-client to 4.12.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33400) Normalize sameOrderExpressions in SortOrder

2020-11-17 Thread Prakhar Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakhar Jain updated SPARK-33400:
-
Summary: Normalize sameOrderExpressions in SortOrder  (was: Reduce unneeded 
sorts between two SortMergeJoins)

> Normalize sameOrderExpressions in SortOrder
> ---
>
> Key: SPARK-33400
> URL: https://issues.apache.org/jira/browse/SPARK-33400
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.0.1
>Reporter: Prakhar Jain
>Priority: Major
>
> When a SortMergeJoin is followed by a Project with aliases, the 
> outputOrdering is not propagated properly, and in some cases this leads to 
> an unnecessary Sort operation:
>  
>  
> {noformat}
> spark.range(10).repartition($"id").createTempView("t1")
> spark.range(20).repartition($"id").createTempView("t2")
> spark.range(30).repartition($"id").createTempView("t3")
> val planned = sql(
>"""
>  |SELECT t2id, t3.id as t3id
>  |FROM (
>  |SELECT t1.id as t1id, t2.id as t2id
>  |FROM t1, t2
>  |WHERE t1.id = t2.id
>  |) t12, t3
>  |WHERE t1id = t3.id
>""".stripMargin).queryExecution.executedPlan
> *(8) Project [t2id#1059L, id#1004L AS t3id#1060L]
> +- *(8) SortMergeJoin [t2id#1059L], [id#1004L], Inner
>:- *(5) Sort [t2id#1059L ASC NULLS FIRST ], false, 0  
> <---
>:  +- *(5) Project [id#1000L AS t2id#1059L]
>: +- *(5) SortMergeJoin [id#996L], [id#1000L], Inner
>::- *(2) Sort [id#996L ASC NULLS FIRST ], false, 0
>::  +- Exchange hashpartitioning(id#996L, 5), true, [id=#1426]
>:: +- *(1) Range (0, 10, step=1, splits=2)
>:+- *(4) Sort [id#1000L ASC NULLS FIRST ], false, 0
>:   +- Exchange hashpartitioning(id#1000L, 5), true, [id=#1432]
>:  +- *(3) Range (0, 20, step=1, splits=2)
>+- *(7) Sort [id#1004L ASC NULLS FIRST ], false, 0
>   +- Exchange hashpartitioning(id#1004L, 5), true, [id=#1443]
>  +- *(6) Range (0, 30, step=1, splits=2)
> {noformat}
> The Sort node marked above could have been avoided.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33470) HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 2.10.1

2020-11-17 Thread Dino Tufekcic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dino Tufekcic updated SPARK-33470:
--
Description: 
The History Server starts correctly and shows the "No completed applications 
found!" message when the /tmp/spark-events directory is empty. As soon as the 
folder contains an event log generated by a Spark streaming app (haven't tested 
other apps), the History Server gets stuck showing the "Loading history 
summary..." popup.

Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
application shown in History Server))

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7

Command to reproduce:

 
{code:java}
spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
/usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
spark-defaults.conf:

 

 
{code:java}
spark.eventLog.dir=/tmp/spark-events
spark.eventLog.enabled=true
spark.eventLog.rolling.enabled=true
spark.ui.enabled=true
spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
{code}
 

  was:
History Server starts correctly and shows "No completed applications found!" 
message when /tmp/spark-events directory is empty. As soon as there is an event 
log in the folder generated by a Spark streaming app (haven't tested other 
apps) the History Server gets stuck showing the "Loading history summary..." 
popup.

Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as expected (running 
application shown in History Server)

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7

Command to reproduce:

 
{code:java}
spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
/usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}

spark-defaults.conf:

 

 
{code:java}
spark.eventLog.dir=/tmp/spark-events
spark.eventLog.enabled=true
spark.eventLog.rolling.enabled=true
spark.ui.enabled=true
spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
{code}
 


> HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 
> 2.10.1
> 
>
> Key: SPARK-33470
> URL: https://issues.apache.org/jira/browse/SPARK-33470
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
> Environment: Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as 
> expected (running application shown in History Server)
> Hadoop version: 2.10.0 and 2.10.1
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
>Reporter: Dino Tufekcic
>Priority: Major
>
> The History Server starts correctly and shows the "No completed applications 
> found!" message when the /tmp/spark-events directory is empty. As soon as 
> the folder contains an event log generated by a Spark streaming app (haven't 
> tested other apps), the History Server gets stuck showing the "Loading 
> history summary..." popup.
> Spark version: 3.0.1 and 3.0.0. (Spark 2.4.5 works as expected (running 
> application shown in History Server))
> Hadoop version: 2.10.0 and 2.10.1
> Java version: openjdk 1.8.0_272
> OS: CentOS 7.7
> Command to reproduce:
>  
> {code:java}
> spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
> /usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}
> spark-defaults.conf:
>  
>  
> {code:java}
> spark.eventLog.dir=/tmp/spark-events
> spark.eventLog.enabled=true
> spark.eventLog.rolling.enabled=true
> spark.ui.enabled=true
> spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33470) HistoryServer stuck "Loading history summary" with Spark 3.0.1 and hadoop 2.10.1

2020-11-17 Thread Dino Tufekcic (Jira)
Dino Tufekcic created SPARK-33470:
-

 Summary: HistoryServer stuck "Loading history summary" with Spark 
3.0.1 and hadoop 2.10.1
 Key: SPARK-33470
 URL: https://issues.apache.org/jira/browse/SPARK-33470
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1, 3.0.0
 Environment: Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as 
expected (running application shown in History Server)

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7
Reporter: Dino Tufekcic


The History Server starts correctly and shows the "No completed applications 
found!" message when the /tmp/spark-events directory is empty. As soon as the 
folder contains an event log generated by a Spark streaming app (haven't tested 
other apps), the History Server gets stuck showing the "Loading history 
summary..." popup.

Spark version: 3.0.1 and 3.0.0. Spark 2.4.5 works as expected (running 
application shown in History Server)

Hadoop version: 2.10.0 and 2.10.1

Java version: openjdk 1.8.0_272

OS: CentOS 7.7

Command to reproduce:

 
{code:java}
spark-submit --class MyClass --master yarn --deploy-mode cluster --jars 
/usr/local/hadoop/share/hadoop/libs/*.jar myclass.jar{code}

spark-defaults.conf:

 

 
{code:java}
spark.eventLog.dir=/tmp/spark-events
spark.eventLog.enabled=true
spark.eventLog.rolling.enabled=true
spark.ui.enabled=true
spark.yarn.stagingDir=hdfs://10.0.1.4:54310/
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19609) Broadcast joins should pushdown join constraints as Filter to the larger relation

2020-11-17 Thread Jackson Westeen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233785#comment-17233785
 ] 

Jackson Westeen commented on SPARK-19609:
-

For what it's worth, I think we're running into this in Spark 3.0 right now: 
the inner join column between two DataFrames is not pushed down to the 
datasource when the broadcast() hint is used. Manually collecting the results 
of the smaller DF's .select(joinColumn) back to the driver and using 
.isInCollection seems to be an effective workaround (see the sketch below), 
but it would be nice if this weren't necessary.
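
A minimal sketch of that workaround, under illustrative assumptions (the two 
DataFrames stand in for a datasource-backed large table and a small dimension 
table; names are hypothetical):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder.appName("PushdownWorkaround").getOrCreate()
import spark.implicits._

val largeDf = spark.range(0, 1000000).toDF("joinColumn")  // stands in for the datasource table
val smallDf = spark.range(0, 10).toDF("joinColumn")

// Collect the small side's join keys to the driver...
val smallKeys = smallDf.as[Long].collect().toSeq
// ...push them down manually as an IN filter, then do the broadcast join.
val prefiltered = largeDf.filter($"joinColumn".isInCollection(smallKeys))
val joined = prefiltered.join(broadcast(smallDf), Seq("joinColumn"))
{code}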

> Broadcast joins should pushdown join constraints as Filter to the larger 
> relation
> -
>
> Key: SPARK-19609
> URL: https://issues.apache.org/jira/browse/SPARK-19609
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Nick Dimiduk
>Priority: Major
>  Labels: bulk-closed
>
> For broadcast inner-joins, where the smaller relation is known to be small 
> enough to materialize on a worker, the set of values for all join columns is 
> known and fits in memory. Spark should translate these values into a 
> {{Filter}} pushed down to the datasource. The common join condition of 
> equality, i.e. {{lhs.a == rhs.a}}, can be written as an {{a in ...}} clause. 
> An example of pushing such filters is already present in the form of 
> {{IsNotNull}} filters via [~sameerag]'s work on SPARK-12957 subtasks.
> This optimization could even work when the smaller relation does not fit 
> entirely in memory. This could be done by partitioning the smaller relation 
> into N pieces, applying this predicate pushdown for each piece, and unioning 
> the results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33469) Add current_timezone function

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233773#comment-17233773
 ] 

Apache Spark commented on SPARK-33469:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30400

> Add current_timezone function
> -
>
> Key: SPARK-33469
> URL: https://issues.apache.org/jira/browse/SPARK-33469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Add current_timezone function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33469) Add current_timezone function

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33469:


Assignee: Apache Spark

> Add current_timezone function
> -
>
> Key: SPARK-33469
> URL: https://issues.apache.org/jira/browse/SPARK-33469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> Add current_timezone function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33469) Add current_timezone function

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233771#comment-17233771
 ] 

Apache Spark commented on SPARK-33469:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30400

> Add current_timezone function
> -
>
> Key: SPARK-33469
> URL: https://issues.apache.org/jira/browse/SPARK-33469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Add current_timezone function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33469) Add current_timezone function

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33469:


Assignee: (was: Apache Spark)

> Add current_timezone function
> -
>
> Key: SPARK-33469
> URL: https://issues.apache.org/jira/browse/SPARK-33469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Add current_timezone function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33469) Add current_timezone function

2020-11-17 Thread ulysses you (Jira)
ulysses you created SPARK-33469:
---

 Summary: Add current_timezone function
 Key: SPARK-33469
 URL: https://issues.apache.org/jira/browse/SPARK-33469
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: ulysses you


Add current_timezone function.
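
A sketch of the intended usage, run in spark-shell. The behavior shown (the 
function surfacing the session time zone from spark.sql.session.timeZone) is 
an assumption based on this proposal, not settled API:

{code:scala}
spark.sql("SET spark.sql.session.timeZone = America/Los_Angeles")
spark.sql("SELECT current_timezone()").show(false)
// Assumed output: America/Los_Angeles
{code}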



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32222) Add K8s IT for conf propagation

2020-11-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32222.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30388
[https://github.com/apache/spark/pull/30388]

> Add K8s IT for conf propagation
> ---
>
> Key: SPARK-32222
> URL: https://issues.apache.org/jira/browse/SPARK-32222
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> An integration test that places a configuration file in SPARK_CONF_DIR and 
> verifies it is loaded on the executors in both client and cluster deploy 
> modes.
> For this, a log4j.properties file is a good candidate for testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33468) ParseUrl should fail if input string is not a valid url

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33468:


Assignee: Apache Spark

> ParseUrl should fail if input string is not a valid url
> ---
>
> Key: SPARK-33468
> URL: https://issues.apache.org/jira/browse/SPARK-33468
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> ParseUrl should fail if input string is not a valid url.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33468) ParseUrl should fail if input string is not a valid url

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33468:


Assignee: (was: Apache Spark)

> ParseUrl should fail if input string is not a valid url
> ---
>
> Key: SPARK-33468
> URL: https://issues.apache.org/jira/browse/SPARK-33468
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> ParseUrl should fail if input string is not a valid url.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33468) ParseUrl should fail if input string is not a valid url

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233714#comment-17233714
 ] 

Apache Spark commented on SPARK-33468:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30399

> ParseUrl should fail if input string is not a valid url
> ---
>
> Key: SPARK-33468
> URL: https://issues.apache.org/jira/browse/SPARK-33468
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> ParseUrl should fail if input string is not a valid url.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33427) Interpreted subexpression elimination

2020-11-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33427.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30341
[https://github.com/apache/spark/pull/30341]

> Interpreted subexpression elimination
> -
>
> Key: SPARK-33427
> URL: https://issues.apache.org/jira/browse/SPARK-33427
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently we only do subexpression elimination for codegen. For some 
> reasons we may need to run interpreted expression evaluation, for example 
> when codegen fails to compile and falls back to interpreted mode. This is 
> commonly seen with complex schemas produced by expressions, possibly caused 
> by the query optimizer too, e.g. SPARK-32945.
> We should also support subexpression elimination for interpreted evaluation. 
> That could reduce the performance difference when Spark falls back from 
> codegen to interpreted expression evaluation.
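
To illustrate the idea, here is a toy interpreter sketch (not Spark's actual 
implementation): the shared subexpression (a + b) is evaluated once per input 
row and its result is reused by both output expressions:

{code:scala}
sealed trait Expr { def eval(row: Map[String, Int]): Int }
case class Attr(name: String) extends Expr {
  def eval(row: Map[String, Int]): Int = row(name)
}
case class Add(l: Expr, r: Expr) extends Expr {
  def eval(row: Map[String, Int]): Int = l.eval(row) + r.eval(row)
}
case class Mul(l: Expr, r: Expr) extends Expr {
  def eval(row: Map[String, Int]): Int = l.eval(row) * r.eval(row)
}

// Wrapper that caches the child's result so it is computed at most once per row.
class CachedExpr(child: Expr) extends Expr {
  private var cached: Option[Int] = None
  def reset(): Unit = { cached = None }  // call once per input row
  def eval(row: Map[String, Int]): Int = {
    if (cached.isEmpty) cached = Some(child.eval(row))
    cached.get
  }
}

val common = new CachedExpr(Add(Attr("a"), Attr("b")))  // the shared (a + b)
val projection = Seq(Mul(common, Attr("x")), Add(common, Attr("y")))
val row = Map("a" -> 1, "b" -> 2, "x" -> 10, "y" -> 100)
common.reset()
println(projection.map(_.eval(row)))  // List(30, 103), with (a + b) evaluated once
{code}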



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33468) ParseUrl should fail if input string is not a valid url

2020-11-17 Thread ulysses you (Jira)
ulysses you created SPARK-33468:
---

 Summary: ParseUrl should fail if input string is not a valid url
 Key: SPARK-33468
 URL: https://issues.apache.org/jira/browse/SPARK-33468
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: ulysses you


ParseUrl should fail if input string is not a valid url.
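
For context, a sketch of the current behavior in spark-shell (parse_url 
returns NULL for unparsable input today); the proposed change would make the 
second query fail instead, presumably under the ANSI-mode umbrella this 
sub-task belongs to:

{code:scala}
spark.sql("SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')").show()
// -> spark.apache.org
spark.sql("SELECT parse_url('inva lid://url', 'HOST')").show()
// -> NULL today; under this proposal the query would fail with an error.
{code}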



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33467) ParseUrl should fail if input string is not a valid url

2020-11-17 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you resolved SPARK-33467.
-
Resolution: Invalid

> ParseUrl should fail if input string is not a valid url
> ---
>
> Key: SPARK-33467
> URL: https://issues.apache.org/jira/browse/SPARK-33467
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> ParseUrl should fail if input string is not a valid url



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33467) ParseUrl should fail if input string is not a valid url

2020-11-17 Thread ulysses you (Jira)
ulysses you created SPARK-33467:
---

 Summary: ParseUrl should fail if input string is not a valid url
 Key: SPARK-33467
 URL: https://issues.apache.org/jira/browse/SPARK-33467
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: ulysses you


ParseUrl should fail if input string is not a valid url



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33445) Can't parse decimal type from csv file

2020-11-17 Thread Punit Shah (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233589#comment-17233589
 ] 

Punit Shah edited comment on SPARK-33445 at 11/17/20, 1:38 PM:
---

[~dongjoon]

As per the issue description, it is the call to spark_session.schema that 
results in an error, not spark_session.printSchema.

Please test again.

And I know for a fact that in 2.4.3 printSchema succeeds, whereas schema fails.


was (Author: bullsoverbears):
As per the issue description, it is the call to spark_session.schema that 
results in an error, not spark_session.printSchema.

Please test again.

And I know for a fact that in 2.4.3 printSchema succeeds, whereas schema fails.

> Can't parse decimal type from csv file
> --
>
> Key: SPARK-33445
> URL: https://issues.apache.org/jira/browse/SPARK-33445
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 2.4.7, 3.0.0
>Reporter: Punit Shah
>Priority: Major
> Attachments: tsd.csv
>
>
> The attached file is a one-column CSV file containing decimals.
> Execute: {color:#de350b}mydf2 = spark_session.read.csv("tsd.csv", 
> header=True, inferSchema=True){color}
> Then invoking {color:#de350b}mydf2.schema{color} will result in error:
> {color:#ff8b00}ValueError: Could not parse datatype: decimal(6,-7){color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-33445) Can't parse decimal type from csv file

2020-11-17 Thread Punit Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Punit Shah reopened SPARK-33445:


As per the issue description, it is the call to spark_session.schema that 
results in an error, not spark_session.printSchema.

Please test again.

And I know for a fact that in 2.4.3 printSchema succeeds, whereas schema fails.

> Can't parse decimal type from csv file
> --
>
> Key: SPARK-33445
> URL: https://issues.apache.org/jira/browse/SPARK-33445
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 2.4.7, 3.0.0
>Reporter: Punit Shah
>Priority: Major
> Attachments: tsd.csv
>
>
> The attached file is a one-column CSV file containing decimals.
> Execute: {color:#de350b}mydf2 = spark_session.read.csv("tsd.csv", 
> header=True, inferSchema=True){color}
> Then invoking {color:#de350b}mydf2.schema{color} will result in error:
> {color:#ff8b00}ValueError: Could not parse datatype: decimal(6,-7){color}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2020-11-17 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32221:

Description: 
This would avoid failures in case the files are a bit large or a user places a 
binary file inside the SPARK_CONF_DIR.

Neither of these is supported at the moment.

The reason is that the underlying etcd store limits the size of each entry to 
only 1 MiB (recent versions of K8s have moved to etcd 3.4.x, which allows a 
1.5 MiB limit; see [https://etcd.io/docs/v3.4.0/dev-guide/limit/]). Once etcd 
is upgraded in all the popular k8s clusters, we can hope to overcome this 
limitation.

Even if that does not happen, there are other ways to overcome this limitation; 
for example, config files could be split across multiple configMaps. That needs 
to be discussed and prioritised separately; this issue takes the 
straightforward approach of skipping files that cannot be accommodated within 
the 1.5 MiB limit and WARNING the user about the same.

  was:
This would avoid failures, in case the files are a bit large or a user places a 
binary file inside the SPARK_CONF_DIR.

Both of which are not supported at the moment.

The reason is, underlying etcd store does limit the size of each entry to only 
1 MiB. Once etcd is upgraded in all the popular k8s clusters, then we can hope 
to overcome this limitation. e.g. 
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] version of etcd allows for 
higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation, 
for example, we can have config files split across multiple configMaps. We need 
to discuss, and prioritise, this issue takes the straightforward approach of 
skipping files that cannot be accommodated within 1MiB limit and WARNING the 
user about the same.


> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> 
>
> Key: SPARK-32221
> URL: https://issues.apache.org/jira/browse/SPARK-32221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> This would avoid failures in case the files are a bit large or a user 
> places a binary file inside the SPARK_CONF_DIR.
> Neither of these is supported at the moment.
> The reason is that the underlying etcd store limits the size of each entry 
> to only 1 MiB (recent versions of K8s have moved to etcd 3.4.x, which 
> allows a 1.5 MiB limit; see 
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/]). Once etcd is upgraded in 
> all the popular k8s clusters, we can hope to overcome this limitation.
> Even if that does not happen, there are other ways to overcome this 
> limitation; for example, config files could be split across multiple 
> configMaps. That needs to be discussed and prioritised separately; this 
> issue takes the straightforward approach of skipping files that cannot be 
> accommodated within the 1.5 MiB limit and WARNING the user about the same.
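
A minimal sketch of that skip-and-warn idea (the 1.5 MiB budget mirrors the 
etcd 3.4.x limit mentioned above; the variable names and the check itself are 
illustrative, not the actual patch):

{code:scala}
import java.nio.file.{Files, Paths}

val limitBytes: Long = (1.5 * 1024 * 1024).toLong
val confDir = Paths.get(sys.env.getOrElse("SPARK_CONF_DIR", "conf"))

// Warn about (and skip) any conf file that would not fit in the ConfigMap.
Files.list(confDir).forEach { f =>
  if (Files.isRegularFile(f) && Files.size(f) > limitBytes) {
    println(s"WARNING: skipping ${f.getFileName}: exceeds the ConfigMap size budget")
  }
}
{code}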



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233502#comment-17233502
 ] 

Apache Spark commented on SPARK-33452:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30398

> Create a V2 SHOW PARTITIONS execution node
> --
>
> Key: SPARK-33452
> URL: https://issues.apache.org/jira/browse/SPARK-33452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> There is the V1 SHOW PARTITIONS implementation:
> https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975
> The ticket aims to add a V2 implementation with similar behavior.
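
A sketch of the command the new execution node should serve, mirroring the V1 
behavior; the catalog/table names and the "foo" provider below are 
illustrative only:

{code:scala}
spark.sql("CREATE TABLE testcat.ns.tbl (id BIGINT, part STRING) " +
  "USING foo PARTITIONED BY (part)")
spark.sql("SHOW PARTITIONS testcat.ns.tbl").show()
// Expected: one row per partition, e.g. part=a, part=b, ...
{code}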



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233499#comment-17233499
 ] 

Apache Spark commented on SPARK-33452:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30398

> Create a V2 SHOW PARTITIONS execution node
> --
>
> Key: SPARK-33452
> URL: https://issues.apache.org/jira/browse/SPARK-33452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> There is the V1 SHOW PARTITIONS implementation:
> https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975
> The ticket aims to add a V2 implementation with similar behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33452:


Assignee: Apache Spark

> Create a V2 SHOW PARTITIONS execution node
> --
>
> Key: SPARK-33452
> URL: https://issues.apache.org/jira/browse/SPARK-33452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> There is the V1 SHOW PARTITIONS implementation:
> https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975
> The ticket aims to add a V2 implementation with similar behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33452:


Assignee: (was: Apache Spark)

> Create a V2 SHOW PARTITIONS execution node
> --
>
> Key: SPARK-33452
> URL: https://issues.apache.org/jira/browse/SPARK-33452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> There is the V1 SHOW PARTITIONS implementation:
> https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975
> The ticket aims to add a V2 implementation with similar behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng updated SPARK-33466:
-
Component/s: PySpark

> Imputer support mode(most_frequent) strategy
> 
>
> Key: SPARK-33466
> URL: https://issues.apache.org/jira/browse/SPARK-33466
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>
> [sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
>  supports *most_frequent(mode)*, which replaces missing values using the 
> most frequent value along each column.
> It should be easy to implement in MLlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33466:


Assignee: Apache Spark

> Imputer support mode(most_frequent) strategy
> 
>
> Key: SPARK-33466
> URL: https://issues.apache.org/jira/browse/SPARK-33466
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Minor
>
> [sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
>  supports *most_frequent(mode)*, which replaces missing values using the 
> most frequent value along each column.
> It should be easy to implement in MLlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33466:


Assignee: (was: Apache Spark)

> Imputer support mode(most_frequent) strategy
> 
>
> Key: SPARK-33466
> URL: https://issues.apache.org/jira/browse/SPARK-33466
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>
> [sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
>  supports *most_frequent(mode)*, which replaces missing values using the 
> most frequent value along each column.
> It should be easy to implement in MLlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233472#comment-17233472
 ] 

Apache Spark commented on SPARK-33466:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/30397

> Imputer support mode(most_frequent) strategy
> 
>
> Key: SPARK-33466
> URL: https://issues.apache.org/jira/browse/SPARK-33466
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>
> [sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
>  supports *most_frequent(mode)*, which replaces missing values using the 
> most frequent value along each column.
> It should be easy to implement in MLlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-17 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-33466:


 Summary: Imputer support mode(most_frequent) strategy
 Key: SPARK-33466
 URL: https://issues.apache.org/jira/browse/SPARK-33466
 Project: Spark
  Issue Type: New Feature
  Components: ML
Affects Versions: 3.1.0
Reporter: zhengruifeng


[sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
 supports *most_frequent(mode)*, which replaces missing values using the most 
frequent value along each column.

It should be easy to implement in MLlib.
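
A sketch of the proposed API in spark-shell, by analogy with the existing 
"mean" and "median" strategies; the "mode" strategy value is the proposal 
here, not a shipped API:

{code:scala}
import org.apache.spark.ml.feature.Imputer
import spark.implicits._

val df = Seq((1.0, Double.NaN), (2.0, 3.0), (2.0, 3.0), (Double.NaN, 3.0))
  .toDF("a", "b")

val imputer = new Imputer()
  .setInputCols(Array("a", "b"))
  .setOutputCols(Array("a_imputed", "b_imputed"))
  .setStrategy("mode")  // replace missing values with the per-column most frequent value

imputer.fit(df).transform(df).show()
// a_imputed fills NaN with 2.0 (most frequent in a); b_imputed fills NaN with 3.0.
{code}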

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32863) Full outer stream-stream join

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32863:


Assignee: (was: Apache Spark)

> Full outer stream-stream join
> -
>
> Key: SPARK-32863
> URL: https://issues.apache.org/jira/browse/SPARK-32863
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Major
>
> Current stream-stream join supports inner, left outer and right outer join 
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]
>  ). With the current design of stream-stream join (which marks in the state 
> store whether a row has been matched or not), it would be very easy to 
> support full outer join as well.
>  
> Full outer stream-stream join will work as follows (a usage sketch follows 
> this description):
> (1). For each left side input row, check if there's a match in the right 
> side state store. If there's a match, output all matched rows. Put the row 
> in the left side state store.
> (2). For each right side input row, check if there's a match in the left 
> side state store. If there's a match, output all matched rows and update 
> the left side rows' state by setting the "matched" field to true. Put the 
> right side row in the right side state store.
> (3). When a left side row is to be evicted from the state store, output 
> the row if its "matched" field is false.
> (4). When a right side row is to be evicted from the state store, output 
> the row if its "matched" field is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32863) Full outer stream-stream join

2020-11-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32863:


Assignee: Apache Spark

> Full outer stream-stream join
> -
>
> Key: SPARK-32863
> URL: https://issues.apache.org/jira/browse/SPARK-32863
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Major
>
> Current stream-stream join supports inner, left outer and right outer join 
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]
>  ). With the current design of stream-stream join (which marks in the state 
> store whether a row has been matched or not), it would be very easy to 
> support full outer join as well.
>  
> Full outer stream-stream join will work as follows:
> (1). For each left side input row, check if there's a match in the right 
> side state store. If there's a match, output all matched rows. Put the row 
> in the left side state store.
> (2). For each right side input row, check if there's a match in the left 
> side state store. If there's a match, output all matched rows and update 
> the left side rows' state by setting the "matched" field to true. Put the 
> right side row in the right side state store.
> (3). When a left side row is to be evicted from the state store, output 
> the row if its "matched" field is false.
> (4). When a right side row is to be evicted from the state store, output 
> the row if its "matched" field is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32863) Full outer stream-stream join

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233424#comment-17233424
 ] 

Apache Spark commented on SPARK-32863:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/30395

> Full outer stream-stream join
> -
>
> Key: SPARK-32863
> URL: https://issues.apache.org/jira/browse/SPARK-32863
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Major
>
> Current stream-stream join supports inner, left outer and right outer join 
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]
>  ). With the current design of stream-stream join (which marks in the state 
> store whether a row has been matched or not), it would be very easy to 
> support full outer join as well.
>  
> Full outer stream-stream join will work as follows:
> (1). For each left side input row, check if there's a match in the right 
> side state store. If there's a match, output all matched rows. Put the row 
> in the left side state store.
> (2). For each right side input row, check if there's a match in the left 
> side state store. If there's a match, output all matched rows and update 
> the left side rows' state by setting the "matched" field to true. Put the 
> right side row in the right side state store.
> (3). When a left side row is to be evicted from the state store, output 
> the row if its "matched" field is false.
> (4). When a right side row is to be evicted from the state store, output 
> the row if its "matched" field is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32863) Full outer stream-stream join

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233425#comment-17233425
 ] 

Apache Spark commented on SPARK-32863:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/30395

> Full outer stream-stream join
> -
>
> Key: SPARK-32863
> URL: https://issues.apache.org/jira/browse/SPARK-32863
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Major
>
> Current stream-stream join supports inner, left outer and right outer join 
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]
>  ). With the current design of stream-stream join (which marks in the state 
> store whether a row has been matched or not), it would be very easy to 
> support full outer join as well.
>  
> Full outer stream-stream join will work as follows:
> (1). For each left side input row, check if there's a match in the right 
> side state store. If there's a match, output all matched rows. Put the row 
> in the left side state store.
> (2). For each right side input row, check if there's a match in the left 
> side state store. If there's a match, output all matched rows and update 
> the left side rows' state by setting the "matched" field to true. Put the 
> right side row in the right side state store.
> (3). When a left side row is to be evicted from the state store, output 
> the row if its "matched" field is false.
> (4). When a right side row is to be evicted from the state store, output 
> the row if its "matched" field is false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33464) Add/remove (un)necessary cache and restructure GitHub Actions yaml

2020-11-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233408#comment-17233408
 ] 

Apache Spark commented on SPARK-33464:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30394

> Add/remove (un)necessary cache and restructure GitHub Actions yaml
> --
>
> Key: SPARK-33464
> URL: https://issues.apache.org/jira/browse/SPARK-33464
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently, the GitHub Actions build has some unnecessary caches/commands. 
> For example, if you only run SBT, the .m2 cache is not needed. We should 
> clean up and re-organize.
> Also, we should add {{~/.sbt}} to the cache. See 
> https://github.com/sbt/sbt/issues/3681



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org