[jira] [Resolved] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33048.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29927
[https://github.com/apache/spark/pull/29927]

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>             Fix For: 3.1.0
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
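For readers following the Scala 2.13 build work: the shape of the fix is that the Scala binary version must follow the active build profile instead of being hardcoded, since bin/spark-class interpolates SPARK_SCALA_VERSION into the assembly and launcher paths shown above. A minimal sketch of that idea (the object name and profile-detection mechanism are illustrative assumptions, not the actual SparkBuild.scala patch):

{code:scala}
// Illustrative sketch: derive the Scala binary version from the enabled
// build profiles instead of hardcoding "2.12".
object ScalaBinaryVersionSketch {
  def scalaBinaryVersion(enabledProfiles: Seq[String]): String =
    if (enabledProfiles.contains("scala-2.13")) "2.13" else "2.12"

  def main(args: Array[String]): Unit = {
    // bin/spark-class expands this value into
    // assembly/target/scala-$SPARK_SCALA_VERSION/jars, so it must match
    // the directories the build actually produced.
    println(s"SPARK_SCALA_VERSION=${scalaBinaryVersion(Seq("scala-2.13"))}")
  }
}
{code}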
[jira] [Resolved] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33051.
----------------------------------
    Fix Version/s: 2.4.8
                   3.0.2
                   3.1.0
       Resolution: Fixed

Issue resolved by pull request 29931
[https://github.com/apache/spark/pull/29931]

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.1.0, 3.0.2, 2.4.8
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Assigned] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-33051:
------------------------------------

    Assignee: Hyukjin Kwon

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Resolved] (SPARK-33026) Add numRows to metric of BroadcastExchangeExec
[ https://issues.apache.org/jira/browse/SPARK-33026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-33026.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29904
[https://github.com/apache/spark/pull/29904]

> Add numRows to metric of BroadcastExchangeExec
> ----------------------------------------------
>
>                 Key: SPARK-33026
>                 URL: https://issues.apache.org/jira/browse/SPARK-33026
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 3.1.0
>
> {{numRows}} can be used here:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L55-L156
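As background on the metric itself: the broadcast side of a join is fully materialized, so counting its rows is cheap bookkeeping that statistics code such as JoinEstimation can consume. A rough stand-in for the idea using a plain accumulator (Spark's operators use the internal SQLMetrics machinery instead; this is not the actual BroadcastExchangeExec change):

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch: count rows with a long accumulator, the same bookkeeping idea
// behind a numRows metric on an exchange operator.
object NumRowsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("numRows").getOrCreate()
    val numRows = spark.sparkContext.longAccumulator("numRows")
    spark.range(0, 10000).toDF("id").foreach(_ => numRows.add(1))
    println(s"numRows = ${numRows.value}")  // 10000
    spark.stop()
  }
}
{code}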
[jira] [Assigned] (SPARK-33026) Add numRows to metric of BroadcastExchangeExec
[ https://issues.apache.org/jira/browse/SPARK-33026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33026:
-------------------------------------

    Assignee: Yuming Wang

> Add numRows to metric of BroadcastExchangeExec
> ----------------------------------------------
>
>                 Key: SPARK-33026
>                 URL: https://issues.apache.org/jira/browse/SPARK-33026
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>
> {{numRows}} can be used here:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L55-L156
[jira] [Commented] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205991#comment-17205991 ]

Apache Spark commented on SPARK-33052:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29932

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Commented] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205990#comment-17205990 ]

Apache Spark commented on SPARK-33052:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29932

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Assigned] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33052:
------------------------------------

    Assignee: Apache Spark

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Apache Spark
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Assigned] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33052:
------------------------------------

    Assignee: (was: Apache Spark)

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Created] (SPARK-33052) Make database versions up-to-date for integration tests
Takeshi Yamamuro created SPARK-33052:
----------------------------------------

             Summary: Make database versions up-to-date for integration tests
                 Key: SPARK-33052
                 URL: https://issues.apache.org/jira/browse/SPARK-33052
             Project: Spark
          Issue Type: Test
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Takeshi Yamamuro

This ticket aims at updating database versions below for integration tests;
- ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
- mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
- postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Assigned] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33051:
------------------------------------

    Assignee: Apache Spark

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Commented] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205984#comment-17205984 ]

Apache Spark commented on SPARK-33051:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29931

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Assigned] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33051:
------------------------------------

    Assignee: (was: Apache Spark)

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Created] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
Hyukjin Kwon created SPARK-33051:
------------------------------------

             Summary: Uses setup-r to install R in GitHub Actions build
                 Key: SPARK-33051
                 URL: https://issues.apache.org/jira/browse/SPARK-33051
             Project: Spark
          Issue Type: Test
          Components: Project Infra, SparkR
    Affects Versions: 3.0.1, 2.4.7, 3.1.0
            Reporter: Hyukjin Kwon

At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Resolved] (SPARK-32001) Create Kerberos authentication provider API in JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-32001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-32001.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29024
[https://github.com/apache/spark/pull/29024]

> Create Kerberos authentication provider API in JDBC connector
> --------------------------------------------------------------
>
>                 Key: SPARK-32001
>                 URL: https://issues.apache.org/jira/browse/SPARK-32001
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Gabor Somogyi
>            Assignee: Gabor Somogyi
>            Priority: Major
>             Fix For: 3.1.0
>
> Adding an embedded provider for every possible database would create a high maintenance cost on the Spark side. Instead, an API can be introduced that allows further providers to be implemented independently.
> One important requirement I suggest is: JDBC connection providers must be loaded independently, just like delegation token providers.
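The independent-loading requirement in the description maps naturally onto Java's ServiceLoader, which is also how delegation token providers are discovered. A hedged sketch of that shape (the trait and method names are illustrative assumptions, not necessarily the exact interface that was merged):

{code:scala}
import java.sql.{Connection, Driver}
import java.util.ServiceLoader

// A provider says whether it can handle a driver/options pair and, if so,
// produces a connection (performing e.g. Kerberos authentication inside).
trait ConnectionProviderSketch {
  def canHandle(driver: Driver, options: Map[String, String]): Boolean
  def getConnection(driver: Driver, options: Map[String, String]): Connection
}

object ConnectionProviderRegistry {
  // Providers registered under META-INF/services are loaded independently,
  // without Spark having to know each database up front.
  def loadProviders(): Seq[ConnectionProviderSketch] = {
    val it = ServiceLoader.load(classOf[ConnectionProviderSketch]).iterator()
    val buf = scala.collection.mutable.ArrayBuffer.empty[ConnectionProviderSketch]
    while (it.hasNext) buf += it.next()
    buf.toSeq
  }

  def connect(driver: Driver, options: Map[String, String]): Connection =
    loadProviders().find(_.canHandle(driver, options))
      .map(_.getConnection(driver, options))
      .getOrElse(throw new IllegalArgumentException("no suitable provider"))
}
{code}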
[jira] [Assigned] (SPARK-32001) Create Kerberos authentication provider API in JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-32001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-32001:
------------------------------------

    Assignee: Gabor Somogyi

> Create Kerberos authentication provider API in JDBC connector
> --------------------------------------------------------------
>
>                 Key: SPARK-32001
>                 URL: https://issues.apache.org/jira/browse/SPARK-32001
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Gabor Somogyi
>            Assignee: Gabor Somogyi
>            Priority: Major
>
> Adding an embedded provider for every possible database would create a high maintenance cost on the Spark side. Instead, an API can be introduced that allows further providers to be implemented independently.
> One important requirement I suggest is: JDBC connection providers must be loaded independently, just like delegation token providers.
[jira] [Commented] (SPARK-33044) Add a Jenkins build and test job for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205947#comment-17205947 ]

Hyukjin Kwon commented on SPARK-33044:
--------------------------------------

Thanks [~dongjoon] for cc'ing me. Yeah, setting up a Jenkins job sounds good.

> Add a Jenkins build and test job for Scala 2.13
> -----------------------------------------------
>
>                 Key: SPARK-33044
>                 URL: https://issues.apache.org/jira/browse/SPARK-33044
>             Project: Spark
>          Issue Type: Sub-task
>          Components: jenkins
>    Affects Versions: 3.1.0
>            Reporter: Yang Jie
>            Priority: Major
>
> The {{master}} branch seems to be almost ready for Scala 2.13 now, so we need a Jenkins test job to verify the current work results and CI.
[jira] [Updated] (SPARK-32996) Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
[ https://issues.apache.org/jira/browse/SPARK-32996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh updated SPARK-32996:
--------------------------------
    Fix Version/s: 3.0.2

> Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
> ---------------------------------------------------------
>
>                 Key: SPARK-32996
>                 URL: https://issues.apache.org/jira/browse/SPARK-32996
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Shruti Gumma
>            Assignee: Shruti Gumma
>            Priority: Major
>             Fix For: 3.0.2, 3.1.0
>
> When {{peakMemoryMetrics}} in {{ExecutorSummary}} is {{Option.empty}}, the {{ExecutorMetricsJsonSerializer#serialize}} method does not execute the {{jsonGenerator.writeObject}} method. This causes the JSON to be generated with the {{peakMemoryMetrics}} key added to the serialized string, but no corresponding value.
> This causes an error to be thrown when it is the next key's ({{attributes}}) turn to be added to the JSON:
> {{com.fasterxml.jackson.core.JsonGenerationException: Can not write a field name, expecting a value.}}
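The failure mode described above is the classic custom-serializer pitfall: Jackson has already written the field name when serialize() is invoked, so writing nothing leaves the generator expecting a value. A minimal sketch of the fix idea (the class name is illustrative, not the actual ExecutorMetricsJsonSerializer):

{code:scala}
import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.databind.{JsonSerializer, SerializerProvider}

// When the Option is empty, still emit a value (null here) so the field
// name written by Jackson is never left dangling.
class OptionValueSerializerSketch extends JsonSerializer[Option[AnyRef]] {
  override def serialize(
      value: Option[AnyRef],
      gen: JsonGenerator,
      serializers: SerializerProvider): Unit = value match {
    case Some(v) => gen.writeObject(v)
    case None    => gen.writeNull()  // never skip the value entirely
  }
}
{code}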
[jira] [Assigned] (SPARK-33050) Upgrade Apache ORC to 1.5.12
[ https://issues.apache.org/jira/browse/SPARK-33050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33050:
------------------------------------

    Assignee: (was: Apache Spark)

> Upgrade Apache ORC to 1.5.12
> ----------------------------
>
>                 Key: SPARK-33050
>                 URL: https://issues.apache.org/jira/browse/SPARK-33050
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-33050) Upgrade Apache ORC to 1.5.12
[ https://issues.apache.org/jira/browse/SPARK-33050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33050:
------------------------------------

    Assignee: Apache Spark

> Upgrade Apache ORC to 1.5.12
> ----------------------------
>
>                 Key: SPARK-33050
>                 URL: https://issues.apache.org/jira/browse/SPARK-33050
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-33050) Upgrade Apache ORC to 1.5.12
[ https://issues.apache.org/jira/browse/SPARK-33050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205921#comment-17205921 ]

Apache Spark commented on SPARK-33050:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29930

> Upgrade Apache ORC to 1.5.12
> ----------------------------
>
>                 Key: SPARK-33050
>                 URL: https://issues.apache.org/jira/browse/SPARK-33050
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Created] (SPARK-33050) Upgrade Apache ORC to 1.5.12
Dongjoon Hyun created SPARK-33050:
-------------------------------------

             Summary: Upgrade Apache ORC to 1.5.12
                 Key: SPARK-33050
                 URL: https://issues.apache.org/jira/browse/SPARK-33050
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 3.1.0
            Reporter: Dongjoon Hyun
[jira] [Comment Edited] (SPARK-32007) Spark Driver Supervise does not work reliably
[ https://issues.apache.org/jira/browse/SPARK-32007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205918#comment-17205918 ]

Aoyuan Liao edited comment on SPARK-32007 at 10/2/20, 1:40 AM:
---------------------------------------------------------------

[~surajs21] Can you please post the master's log for the first behavior?

was (Author: eveliao):
[~surajs21] Can you please post the master's log for more information?

> Spark Driver Supervise does not work reliably
> ----------------------------------------------
>
>                 Key: SPARK-32007
>                 URL: https://issues.apache.org/jira/browse/SPARK-32007
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: |Java Version|1.8.0_121 (Oracle Corporation)|
>                      |Java Home|/usr/java/jdk1.8.0_121/jre|
>                      |Scala Version|version 2.11.12|
>                      |OS|Amazon Linux|
>            Reporter: Suraj Sharma
>            Priority: Critical
>
> I have a standalone cluster setup. I DO NOT have a streaming use case. I use AWS EC2 machines to run the spark master and worker processes.
> *Problem*: If a spark worker machine running some drivers and executors dies, then the driver is not spawned again on other healthy machines.
> *Below are my findings:*
> ||Action/Behaviour||Executor||Driver||
> |Worker Machine Stop|Relaunches on an active machine|NO Relaunch|
> |kill -9 to process|Relaunches on other machines|Relaunches on other machines|
> |kill to process|Relaunches on other machines|Relaunches on other machines|
> *Cluster Setup:*
> # I have a spark standalone cluster
> # {{spark.driver.supervise=true}}
> # Spark Master HA is enabled and is backed by zookeeper
> # Spark version = 2.4.4
> # I am using a systemd script for the spark worker process
[jira] [Commented] (SPARK-32007) Spark Driver Supervise does not work reliably
[ https://issues.apache.org/jira/browse/SPARK-32007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205918#comment-17205918 ]

Aoyuan Liao commented on SPARK-32007:
-------------------------------------

[~surajs21] Can you please post the master's log for more information?

> Spark Driver Supervise does not work reliably
> ----------------------------------------------
>
>                 Key: SPARK-32007
>                 URL: https://issues.apache.org/jira/browse/SPARK-32007
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: |Java Version|1.8.0_121 (Oracle Corporation)|
>                      |Java Home|/usr/java/jdk1.8.0_121/jre|
>                      |Scala Version|version 2.11.12|
>                      |OS|Amazon Linux|
>            Reporter: Suraj Sharma
>            Priority: Critical
>
> I have a standalone cluster setup. I DO NOT have a streaming use case. I use AWS EC2 machines to run the spark master and worker processes.
> *Problem*: If a spark worker machine running some drivers and executors dies, then the driver is not spawned again on other healthy machines.
> *Below are my findings:*
> ||Action/Behaviour||Executor||Driver||
> |Worker Machine Stop|Relaunches on an active machine|NO Relaunch|
> |kill -9 to process|Relaunches on other machines|Relaunches on other machines|
> |kill to process|Relaunches on other machines|Relaunches on other machines|
> *Cluster Setup:*
> # I have a spark standalone cluster
> # {{spark.driver.supervise=true}}
> # Spark Master HA is enabled and is backed by zookeeper
> # Spark version = 2.4.4
> # I am using a systemd script for the spark worker process
[jira] [Commented] (SPARK-33049) Decommission Core Integration Test is flaky.
[ https://issues.apache.org/jira/browse/SPARK-33049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205916#comment-17205916 ]

Apache Spark commented on SPARK-33049:
--------------------------------------

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29929

> Decommission Core Integration Test is flaky.
> --------------------------------------------
>
>                 Key: SPARK-33049
>                 URL: https://issues.apache.org/jira/browse/SPARK-33049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Holden Karau
>            Priority: Trivial
>
> See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Assigned] (SPARK-33049) Decommission Core Integration Test is flaky.
[ https://issues.apache.org/jira/browse/SPARK-33049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33049:
------------------------------------

    Assignee: (was: Apache Spark)

> Decommission Core Integration Test is flaky.
> --------------------------------------------
>
>                 Key: SPARK-33049
>                 URL: https://issues.apache.org/jira/browse/SPARK-33049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Holden Karau
>            Priority: Trivial
>
> See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Assigned] (SPARK-33049) Decommission Core Integration Test is flaky.
[ https://issues.apache.org/jira/browse/SPARK-33049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33049:
------------------------------------

    Assignee: Apache Spark

> Decommission Core Integration Test is flaky.
> --------------------------------------------
>
>                 Key: SPARK-33049
>                 URL: https://issues.apache.org/jira/browse/SPARK-33049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Holden Karau
>            Assignee: Apache Spark
>            Priority: Trivial
>
> See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Created] (SPARK-33049) Decommission Core Integration Test is flaky.
Holden Karau created SPARK-33049:
------------------------------------

             Summary: Decommission Core Integration Test is flaky.
                 Key: SPARK-33049
                 URL: https://issues.apache.org/jira/browse/SPARK-33049
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Tests
    Affects Versions: 3.1.0
            Reporter: Holden Karau

See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Commented] (SPARK-32741) Check if the same ExprId refers to the unique attribute in logical plans
[ https://issues.apache.org/jira/browse/SPARK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205908#comment-17205908 ]

Apache Spark commented on SPARK-32741:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29928

> Check if the same ExprId refers to the unique attribute in logical plans
> -------------------------------------------------------------------------
>
>                 Key: SPARK-32741
>                 URL: https://issues.apache.org/jira/browse/SPARK-32741
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>             Fix For: 3.1.0
>
> Some plan transformations (e.g., `RemoveNoopOperators`) implicitly assume that the same `ExprId` refers to a unique attribute, but `RuleExecutor` does not check this integrity between logical plan transformations. So, this ticket targets adding this check to `isPlanIntegral` in `Analyzer`/`Optimizer`.
> This PR comes from the talk with @cloud-fan @viirya in https://github.com/apache/spark/pull/29485#discussion_r475346278
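To make the invariant concrete: the check has to assert that one ExprId is never bound to two different attributes anywhere in a plan. A small illustrative sketch of such a check (not the exact isPlanIntegral code):

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object PlanIntegritySketch {
  // True when every ExprId in the plan's outputs maps to a single
  // attribute (same name and data type everywhere it appears).
  def sameExprIdMeansSameAttribute(plan: LogicalPlan): Boolean = {
    val attrs = plan.collect { case p => p.output }.flatten
    attrs.groupBy(_.exprId).forall { case (_, as) =>
      as.map(a => (a.name, a.dataType)).distinct.size == 1
    }
  }
}
{code}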
[jira] [Commented] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205899#comment-17205899 ]

Apache Spark commented on SPARK-33048:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/29927

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Assigned] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33048:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Assigned] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33048:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Apache Spark
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Commented] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205898#comment-17205898 ]

Apache Spark commented on SPARK-33048:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/29927

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
--------------------------------
    Description: 
We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was:
We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

> Implement built-in LIKE ANY and LIKE ALL UDF
> --------------------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
> {noformat}
> java.lang.StackOverflowError
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>   at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
--------------------------------
    Description: 
We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF to fix this issue.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was:
We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

> Implement built-in LIKE ANY and LIKE ALL UDF
> --------------------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF to fix this issue.
> {noformat}
> java.lang.StackOverflowError
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>   at scala.collection.generic.GenericCompanion.
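For intuition on why a built-in expression fixes the overflow: `c LIKE ANY (p1, ..., pN)` is currently expanded into nested `Or(Like(c, p1), Or(Like(c, p2), ...))`, an expression tree roughly N levels deep, and tree traversals recurse once per level, hence the StackOverflowError above. A single expression holding all patterns keeps the tree flat. A toy illustration of the flat evaluation (supporting only % and _; this is not Spark's implementation):

{code:scala}
object LikeAnySketch {
  // Translate a SQL LIKE pattern into a Java regex (simplified).
  private def likeToRegex(pattern: String): String =
    java.util.regex.Pattern.quote(pattern)
      .replace("%", "\\E.*\\Q")
      .replace("_", "\\E.\\Q")

  // Flat evaluation over all patterns: no nested Or tree, no deep recursion.
  def likeAny(value: String, patterns: Seq[String]): Boolean =
    patterns.exists(p => value.matches(likeToRegex(p)))

  def likeAll(value: String, patterns: Seq[String]): Boolean =
    patterns.forall(p => value.matches(likeToRegex(p)))

  def main(args: Array[String]): Unit = {
    println(likeAny("spark", Seq("sp%", "%zzz")))  // true
    println(likeAll("spark", Seq("sp%", "%rk")))   // true
  }
}
{code}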
[jira] [Resolved] (SPARK-32859) Introduce SQL physical plan rule to decide enable/disable bucketing
[ https://issues.apache.org/jira/browse/SPARK-32859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-32859.
--------------------------------------
    Fix Version/s: 3.1.0
         Assignee: Cheng Su
       Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29804

> Introduce SQL physical plan rule to decide enable/disable bucketing
> --------------------------------------------------------------------
>
>                 Key: SPARK-32859
>                 URL: https://issues.apache.org/jira/browse/SPARK-32859
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Cheng Su
>            Assignee: Cheng Su
>            Priority: Minor
>             Fix For: 3.1.0
>
> Discussed with [~cloud_fan] offline: it would be better if we could decide to enable/disable SQL bucketing automatically according to the query plan. Currently bucketing is enabled by default ({{spark.sql.sources.bucketing.enabled}}=true), so for all bucketed tables in the query plan, we will use a bucket table scan (all input files per bucket will be read by the same task). This has the drawback that if the bucket table scan is not benefiting at all (no join/groupby/etc. in the query), we don't need to use a bucket table scan, as it would restrict the number of tasks to the number of buckets and might hurt parallelism.
>
> The proposed change is to introduce a physical plan rule (right before `ensureRequirements`):
> (1). transformUp() the physical plan, matching a SparkPlan operator which is FileSourceScanExec; if optionalBucketSet is set, enable bucket scan (bucket filter in this case).
> (2). transformUp() the physical plan, matching a SparkPlan operator which is SparkPlanWithInterestingPartitioning.
> SparkPlanWithInterestingPartitioning: the plan is in {SortMergeJoinExec, ShuffledHashJoinExec, HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec, etc., which has HashClusteredDistribution/ClusteredDistribution in requiredChildDistribution}, and its requiredChildDistribution is a HashClusteredDistribution/ClusteredDistribution on its underlying FileSourceScanExec's bucketed columns.
> (3). For any child of SparkPlanWithInterestingPartitioning which does not satisfy the plan's requiredChildDistribution, go through the child's sub query plan tree:
> if (3.1). every node's outputPartitioning is the same as the child's, and every node's requiredChildDistribution is UnspecifiedDistribution,
> and (3.2). the leaf node is a FileSourceScanExec on a bucketed table,
> and (3.3). when enabling bucket scan for this FileSourceScanExec, the outputPartitioning of the FileSourceScanExec satisfies the requiredChildDistribution of SparkPlanWithInterestingPartitioning,
> then enable bucket scan for this FileSourceScanExec, and double-check that the new child of SparkPlanWithInterestingPartitioning satisfies the requiredChildDistribution.
>
> The idea of SparkPlanWithInterestingPartitioning is inspired by "interesting order" in "Access Path Selection in a Relational Database Management System" (http://www.inf.ed.ac.uk/teaching/courses/adbs/AccessPath.pdf).
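A toy model of the rule shape described in the ticket, reduced to its decision logic (illustrative only; the real rule walks SparkPlan/FileSourceScanExec and handles the distribution classes properly):

{code:scala}
sealed trait Plan { def children: Seq[Plan] }
case class Scan(bucketCols: Set[String], bucketed: Boolean = false) extends Plan {
  def children: Seq[Plan] = Nil
}
case class Join(keys: Set[String], left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}
case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }

object DecideBucketingSketch {
  // requiredKeys carries the "interesting partitioning" down the tree.
  def apply(plan: Plan, requiredKeys: Option[Set[String]] = None): Plan = plan match {
    case s: Scan =>
      // Enable the bucketed scan only when a parent actually needs the
      // scan's bucket columns (simplified here to an exact match).
      s.copy(bucketed = requiredKeys.contains(s.bucketCols))
    case j: Join =>
      // A join introduces an interesting partitioning on its keys.
      Join(j.keys, apply(j.left, Some(j.keys)), apply(j.right, Some(j.keys)))
    case p: Project =>
      // Pass-through operators preserve the requirement unchanged.
      Project(apply(p.child, requiredKeys))
  }
}
{code}

With this model, DecideBucketingSketch(Project(Scan(Set("a")))) leaves the scan unbucketed because nothing upstream cares, while the same scan under a join on "a" keeps bucketing on.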
[jira] [Created] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
Kousuke Saruta created SPARK-33048:
--------------------------------------

             Summary: Fix SparkBuild.scala to recognize build settings for Scala 2.13
                 Key: SPARK-33048
                 URL: https://issues.apache.org/jira/browse/SPARK-33048
             Project: Spark
          Issue Type: Sub-task
          Components: Build
    Affects Versions: 3.0.1, 3.1.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta

In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
{code}
= TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
{code}
The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
{code}
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
{code}
[jira] [Resolved] (SPARK-33046) Update how to build doc for Scala 2.13 with sbt
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-33046.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29921
[https://github.com/apache/spark/pull/29921]

> Update how to build doc for Scala 2.13 with sbt
> -----------------------------------------------
>
>                 Key: SPARK-33046
>                 URL: https://issues.apache.org/jira/browse/SPARK-33046
>             Project: Spark
>          Issue Type: Sub-task
>          Components: docs
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Minor
>             Fix For: 3.1.0
>
> In the current doc, how to build Spark for Scala 2.13 with sbt is described as:
> {code}
> ./build/sbt -Dscala.version=2.13.0
> {code}
> But the build fails with this command because the scala-2.13 profile is not enabled and scala-parallel-collections is absent.
[jira] [Updated] (SPARK-33046) Update how to build doc for Scala 2.13 with sbt
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta updated SPARK-33046:
-----------------------------------
    Summary: Update how to build doc for Scala 2.13 with sbt  (was: How to build for Scala 2.13 with sbt in the doc is wrong.)

> Update how to build doc for Scala 2.13 with sbt
> -----------------------------------------------
>
>                 Key: SPARK-33046
>                 URL: https://issues.apache.org/jira/browse/SPARK-33046
>             Project: Spark
>          Issue Type: Sub-task
>          Components: docs
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Minor
>
> In the current doc, how to build Spark for Scala 2.13 with sbt is described as:
> {code}
> ./build/sbt -Dscala.version=2.13.0
> {code}
> But the build fails with this command because the scala-2.13 profile is not enabled and scala-parallel-collections is absent.
[jira] [Resolved] (SPARK-32585) Support scala enumeration in ScalaReflection
[ https://issues.apache.org/jira/browse/SPARK-32585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-32585.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Done

> Support scala enumeration in ScalaReflection
> --------------------------------------------
>
>                 Key: SPARK-32585
>                 URL: https://issues.apache.org/jira/browse/SPARK-32585
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: ulysses you
>            Priority: Minor
>             Fix For: 3.1.0
>
> Add code in {{ScalaReflection}} to support Scala enumerations and make the enumeration type a string type in Spark.
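A hedged usage sketch of what this enables, assuming the behavior the ticket describes (an Enumeration-typed case class field becomes a string column); the Color/Pixel definitions are examples, not Spark code:

{code:scala}
import org.apache.spark.sql.SparkSession

object Color extends Enumeration { val Red, Green, Blue = Value }
case class Pixel(id: Int, color: Color.Value)

object EnumEncodingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("enum-encoding").getOrCreate()
    import spark.implicits._
    // With the new ScalaReflection support, the color field is expected
    // to be encoded as StringType ("Red", "Blue", ...).
    val ds = Seq(Pixel(1, Color.Red), Pixel(2, Color.Blue)).toDS()
    ds.printSchema()
    ds.show()
    spark.stop()
  }
}
{code}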
[jira] [Assigned] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33047:
-------------------------------------

    Assignee: Dongjoon Hyun

> Upgrade hive-storage-api to 2.7.2
> ---------------------------------
>
>                 Key: SPARK-33047
>                 URL: https://issues.apache.org/jira/browse/SPARK-33047
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
[jira] [Commented] (SPARK-32996) Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
[ https://issues.apache.org/jira/browse/SPARK-32996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205791#comment-17205791 ]

Apache Spark commented on SPARK-32996:
--------------------------------------

User 'shrutig' has created a pull request for this issue:
https://github.com/apache/spark/pull/29926

> Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
> ---------------------------------------------------------
>
>                 Key: SPARK-32996
>                 URL: https://issues.apache.org/jira/browse/SPARK-32996
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Shruti Gumma
>            Assignee: Shruti Gumma
>            Priority: Major
>             Fix For: 3.1.0
>
> When {{peakMemoryMetrics}} in {{ExecutorSummary}} is {{Option.empty}}, the {{ExecutorMetricsJsonSerializer#serialize}} method does not execute the {{jsonGenerator.writeObject}} method. This causes the JSON to be generated with the {{peakMemoryMetrics}} key added to the serialized string, but no corresponding value.
> This causes an error to be thrown when it is the next key's ({{attributes}}) turn to be added to the JSON:
> {{com.fasterxml.jackson.core.JsonGenerationException: Can not write a field name, expecting a value.}}
[jira] [Resolved] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33047. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29923 [https://github.com/apache/spark/pull/29923] > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205772#comment-17205772 ] Apache Spark commented on SPARK-33043: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/29925 > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
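The suggested change amounts to guarding the check; a hedged sketch of that guard (the merged patch may differ):
{code:scala}
// Treat spark.driver.maxResultSize = 0 as "unlimited" and skip the check.
if (maxDriverResultSizeInBytes > 0) {
  require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
    s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
      s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
}
{code}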
[jira] [Assigned] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33043: Assignee: (was: Apache Spark) > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33043: Assignee: Apache Spark > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Assignee: Apache Spark >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205771#comment-17205771 ] Apache Spark commented on SPARK-33043: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/29925 > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33037) Remove knownManagers hardcoded list
[ https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205743#comment-17205743 ] BoYang commented on SPARK-33037: After discussion, we feel it is better to remove the knownManagers list. That makes the code cleaner and also supports users' custom shuffle manager implementations. PR: https://github.com/apache/spark/pull/29916 > Remove knownManagers hardcoded list > --- > > Key: SPARK-33037 > URL: https://issues.apache.org/jira/browse/SPARK-33037 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.7, 3.0.1 >Reporter: BoYang >Priority: Major > > Spark has a hardcoded list of known shuffle managers, which currently has two > values. It does not contain a user's custom shuffle manager, which is set > through the Spark config "spark.shuffle.manager". > > We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager > plugin (Uber Remote Shuffle Service implementation, > [https://github.com/uber/RemoteShuffleService]). Other users will hit the same > issue when they implement their own shuffle manager. > > The "spark.shuffle.manager" config value needs to be added to the known managers list > as well. > > The known managers list is in the code: > common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java > {quote}private final List<String> knownManagers = Arrays.asList( > "org.apache.spark.shuffle.sort.SortShuffleManager", > "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager"); > {quote} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
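For context, a custom shuffle manager is plugged in purely through configuration; a hedged sketch (the class name below is illustrative, not the Uber plugin's actual class):
{code:scala}
import org.apache.spark.SparkConf

// With the hardcoded knownManagers list removed, the external shuffle
// service no longer rejects third-party shuffle manager class names.
val conf = new SparkConf()
  .set("spark.shuffle.manager", "org.example.shuffle.MyShuffleManager") // illustrative class name
{code}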
[jira] [Updated] (SPARK-33037) Remove knownManagers hardcoded list
[ https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BoYang updated SPARK-33037: --- Summary: Remove knownManagers hardcoded list (was: Add "spark.shuffle.manager" value to knownManagers) > Remove knownManagers hardcoded list > --- > > Key: SPARK-33037 > URL: https://issues.apache.org/jira/browse/SPARK-33037 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.7, 3.0.1 >Reporter: BoYang >Priority: Major > > Spark has a hardcoded list of known shuffle managers, which currently has two > values. It does not contain a user's custom shuffle manager, which is set > through the Spark config "spark.shuffle.manager". > > We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager > plugin (Uber Remote Shuffle Service implementation, > [https://github.com/uber/RemoteShuffleService]). Other users will hit the same > issue when they implement their own shuffle manager. > > The "spark.shuffle.manager" config value needs to be added to the known managers list > as well. > > The known managers list is in the code: > common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java > {quote}private final List<String> knownManagers = Arrays.asList( > "org.apache.spark.shuffle.sort.SortShuffleManager", > "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager"); > {quote} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24554) Add MapType Support for Arrow in PySpark
[ https://issues.apache.org/jira/browse/SPARK-24554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205719#comment-17205719 ] Bryan Cutler commented on SPARK-24554: -- I started working on this, but ran into an issue at https://issues.apache.org/jira/browse/ARROW-10151 which needs to be resolved first. > Add MapType Support for Arrow in PySpark > > > Key: SPARK-24554 > URL: https://issues.apache.org/jira/browse/SPARK-24554 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 2.3.1 >Reporter: Bryan Cutler >Priority: Major > Labels: bulk-closed > > Add support for MapType in Arrow related classes in Scala/Java and pyarrow > functionality in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
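For orientation, a hedged sketch of the kind of schema this targets: a {{MapType}} column that the Arrow transfer path (e.g. {{toPandas()}} with Arrow enabled on the Python side) would need to handle. An active SparkSession named {{spark}} is assumed.
{code:scala}
import org.apache.spark.sql.functions.{col, lit, map}

// A column of MapType(StringType, LongType); carrying it through Arrow
// is what SPARK-24554 is about.
val df = spark.range(3).select(map(lit("id"), col("id")).as("m"))
df.printSchema() // m: map<string,bigint>
{code}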
[jira] [Commented] (SPARK-30821) Executor pods with multiple containers will not be rescheduled unless all containers fail
[ https://issues.apache.org/jira/browse/SPARK-30821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205696#comment-17205696 ] Apache Spark commented on SPARK-30821: -- User 'huskysun' has created a pull request for this issue: https://github.com/apache/spark/pull/29924 > Executor pods with multiple containers will not be rescheduled unless all > containers fail > - > > Key: SPARK-30821 > URL: https://issues.apache.org/jira/browse/SPARK-30821 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Kevin Hogeland >Assignee: Apache Spark >Priority: Major > > Since the restart policy of launched pods is Never, additional handling is > required for pods that may have sidecar containers. The executor should be > considered failed if any containers have terminated and have a non-zero exit > code, but Spark currently only checks the pod phase. The pod phase will > remain "running" as long as _any_ pods are still running. Kubernetes sidecar > support in 1.18/1.19 does not address this situation, as sidecar containers > are excluded from pod phase calculation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
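A hedged sketch of the idea (not the merged patch), assuming the fabric8 Kubernetes client that Spark's K8s module builds on and Scala 2.13 collection converters: inspect container statuses rather than only the pod phase.
{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import scala.jdk.CollectionConverters._

// A pod counts as failed if any container terminated with a non-zero
// exit code, even while a sidecar keeps the pod phase at "Running".
def anyContainerFailed(pod: Pod): Boolean =
  pod.getStatus.getContainerStatuses.asScala.exists { status =>
    val terminated = status.getState.getTerminated
    terminated != null && terminated.getExitCode != null && terminated.getExitCode != 0
  }
{code}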
[jira] [Commented] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205662#comment-17205662 ] Apache Spark commented on SPARK-33047: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29923 > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33047: Assignee: (was: Apache Spark) > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33047: Assignee: Apache Spark > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
Dongjoon Hyun created SPARK-33047: - Summary: Upgrade hive-storage-api to 2.7.2 Key: SPARK-33047 URL: https://issues.apache.org/jira/browse/SPARK-33047 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32723) Upgrade to jQuery 3.5.1
[ https://issues.apache.org/jira/browse/SPARK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205607#comment-17205607 ] Apache Spark commented on SPARK-32723: -- User 'n-marion' has created a pull request for this issue: https://github.com/apache/spark/pull/29922 > Upgrade to jQuery 3.5.1 > --- > > Key: SPARK-32723 > URL: https://issues.apache.org/jira/browse/SPARK-32723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ashish Kumar Singh >Assignee: Peter Toth >Priority: Major > Labels: Security > Fix For: 3.1.0 > > > Spark 3.0 and Spark 2.4.x use a jQuery version < 3.5, which has known security > vulnerabilities in the Spark Master UI and Spark Worker UI. > Can we please upgrade jQuery to 3.5 or above? > [https://www.tenable.com/plugins/nessus/136929] > ??According to the self-reported version in the script, the version of JQuery > hosted on the remote web server is greater than or equal to 1.2 and prior to > 3.5.0. It is, therefore, affected by multiple cross site scripting > vulnerabilities.?? > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205606#comment-17205606 ] Yuming Wang commented on SPARK-30186: - For our internal TPC-DS q77, enabling AQE and DPP together does not work properly: {code:sql} WITH ss AS ( SELECT s_store_sk, Sum(ss_ext_sales_price) AS sales, Sum(ss_net_profit) AS profit FROM store_sales, date_dim, store WHERE ss_sold_date_sk = d_date_sk AND d_date BETWEEN Cast('2000-08-23' AS DATE) AND ( Cast('2000-08-23' AS DATE) + interval '30' day) AND ss_store_sk = s_store_sk GROUP BY s_store_sk), sr AS ( SELECT s_store_sk, sum(sr_return_amt) AS returns, sum(sr_net_loss) AS profit_loss FROM store_returns, date_dim, store WHERE sr_returned_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) AND sr_store_sk = s_store_sk GROUP BY s_store_sk), cs AS ( SELECT cs_call_center_sk, sum(cs_ext_sales_price) AS sales, sum(cs_net_profit) AS profit FROM catalog_sales, date_dim WHERE cs_sold_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) GROUP BY cs_call_center_sk), cr AS ( SELECT cr_call_center_sk, sum(cr_return_amount) AS returns, sum(cr_net_loss) AS profit_loss FROM catalog_returns, date_dim WHERE cr_returned_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) GROUP BY cr_call_center_sk), ws AS ( SELECT wp_web_page_sk, sum(ws_ext_sales_price) AS sales, sum(ws_net_profit) AS profit FROM web_sales, date_dim, web_page WHERE ws_sold_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) AND ws_web_page_sk = wp_web_page_sk GROUP BY wp_web_page_sk), wr AS ( SELECT wp_web_page_sk, sum(wr_return_amt) AS returns, sum(wr_net_loss) AS profit_loss FROM web_returns, date_dim, web_page WHERE wr_returned_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) AND wr_web_page_sk = wp_web_page_sk GROUP BY wp_web_page_sk) SELECT channel, id, sum(sales) AS sales, sum(returns) AS returns, sum(profit) AS profit FROM ( SELECT 'store channel' AS channel, ss.s_store_sk AS id, sales, COALESCE(returns, 0) AS returns, (profit - COALESCE(profit_loss,0)) AS profit FROM ss LEFT JOIN sr ON ss.s_store_sk = sr.s_store_sk UNION ALL SELECT 'catalog channel' AS channel, cs_call_center_sk AS id, sales, returns, (profit - profit_loss) AS profit FROM cs CROSS JOIN cr UNION ALL SELECT 'web channel' AS channel, ws.wp_web_page_sk AS id, sales, COALESCE(returns, 0) returns, (profit - COALESCE(profit_loss,0)) AS profit FROM ws LEFT JOIN wr ON ws.wp_web_page_sk = wr.wp_web_page_sk ) x GROUP BY rollup(channel, id) ORDER BY channel, id limit 100 {code} > support Dynamic Partition Pruning in Adaptive Execution > --- > > Key: SPARK-30186 > URL: https://issues.apache.org/jira/browse/SPARK-30186 > Project: Spark > Issue Type: Improvement > Components: SQL
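For reproduction, both features are toggled by configuration; a minimal sketch (config keys as of Spark 3.0, set before running the query above):
{code:scala}
// Enable adaptive query execution and dynamic partition pruning together.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
{code}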
[jira] [Commented] (SPARK-27318) Join operation on bucketing table fails with base adaptive enabled
[ https://issues.apache.org/jira/browse/SPARK-27318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205558#comment-17205558 ] Yuming Wang commented on SPARK-27318: - Could you try Spark 3.x? > Join operation on bucketing table fails with base adaptive enabled > -- > > Key: SPARK-27318 > URL: https://issues.apache.org/jira/browse/SPARK-27318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Supritha >Priority: Major > > Join operation on a bucketed table is failing. > Steps to reproduce the issue. > {code} > spark.sql("set spark.sql.adaptive.enabled=true") > {code} > 1. Create tables bucket3 and bucket4 as below and load the data. > {code} > sql("create table bucket3(id3 int,country3 String, sports3 String) row format > delimited fields terminated by ','").show() > sql("create table bucket4(id4 int,country4 String) row format delimited > fields terminated by ','").show() > sql("load data local inpath '/opt/abhidata/bucket2.txt' into table > bucket3").show() > sql("load data local inpath '/opt/abhidata/bucket3.txt' into table > bucket4").show() > {code} > 2. Create bucketed tables as below > {code} > spark.sqlContext.table("bucket3").write.bucketBy(3, > "id3").saveAsTable("bucketed_table_3"); > spark.sqlContext.table("bucket4").write.bucketBy(4, > "id4").saveAsTable("bucketed_table_4"); > {code} > 3. Execute the join query on the bucketed tables > {code} > sql("select * from bucketed_table_3 join bucketed_table_4 on > bucketed_table_3.id3 = bucketed_table_4.id4").show() > {code} > > {code:java} > java.lang.IllegalArgumentException: requirement failed: > PartitioningCollection requires all of its partitionings have the same > numPartitions. at scala.Predef$.require(Predef.scala:224) at > org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.<init>(partitioning.scala:291) > at > org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputPartitioning(SortMergeJoinExec.scala:69) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:150) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:149) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:392) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.immutable.List.map(List.scala:296) at > org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering(EnsureRequirements.scala:149) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:304) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:296) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at >
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:281) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:296) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:38) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) at > org.apache.spark.sql.execution.QueryExec
[jira] [Commented] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205528#comment-17205528 ] Apache Spark commented on SPARK-33046: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/29921 > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205526#comment-17205526 ] Apache Spark commented on SPARK-33046: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/29921 > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33046: Assignee: Apache Spark (was: Kousuke Saruta) > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33046: Assignee: Kousuke Saruta (was: Apache Spark) > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33024) Fix CodeGen fallback issue of UDFSuite in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-33024: Assignee: Yang Jie > Fix CodeGen fallback issue of UDFSuite in Scala 2.13 > > > Key: SPARK-33024 > URL: https://issues.apache.org/jira/browse/SPARK-33024 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > After SPARK-32851 set `CODEGEN_FACTORY_MODE` to `CODEGEN_ONLY` in the > sparkConf of SharedSparkSessionBase used to construct the SparkSession in tests, > the test `SPARK-32459: UDF should not fail on WrappedArray` in > s.sql.UDFSuite exposed a codegen fallback issue in Scala 2.13 as follows: > {code:java} > - SPARK-32459: UDF should not fail on WrappedArray *** FAILED *** > Caused by: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 47, Column 99: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 47, Column 99: No applicable constructor/method found for zero actual > parameters; candidates are: "public scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(java.lang.Object)", "public > scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(scala.reflect.ClassTag)", > "public abstract scala.collection.mutable.Builder > scala.collection.EvidenceIterableFactory.newBuilder(java.lang.Object)" > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33024) Fix CodeGen fallback issue of UDFSuite in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-33024. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29903 [https://github.com/apache/spark/pull/29903] > Fix CodeGen fallback issue of UDFSuite in Scala 2.13 > > > Key: SPARK-33024 > URL: https://issues.apache.org/jira/browse/SPARK-33024 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.1.0 > > > After SPARK-32851 set `CODEGEN_FACTORY_MODE` to `CODEGEN_ONLY` in the > sparkConf of SharedSparkSessionBase used to construct the SparkSession in tests, > the test `SPARK-32459: UDF should not fail on WrappedArray` in > s.sql.UDFSuite exposed a codegen fallback issue in Scala 2.13 as follows: > {code:java} > - SPARK-32459: UDF should not fail on WrappedArray *** FAILED *** > Caused by: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 47, Column 99: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 47, Column 99: No applicable constructor/method found for zero actual > parameters; candidates are: "public scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(java.lang.Object)", "public > scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(scala.reflect.ClassTag)", > "public abstract scala.collection.mutable.Builder > scala.collection.EvidenceIterableFactory.newBuilder(java.lang.Object)" > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
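A hedged sketch of the kind of UDF the referenced test exercises (not the suite's exact code; assumes an active SparkSession named {{spark}}):
{code:scala}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// An array column reaches the UDF as WrappedArray on Scala 2.12 and as
// mutable.ArraySeq on Scala 2.13; the generated code must pick a
// builder that actually exists for the collection in use.
val sumUdf = udf((xs: Seq[Int]) => xs.sum)
val df = Seq(Seq(1, 2, 3)).toDF("xs").select(sumUdf(col("xs")))
df.show() // expected: 6
{code}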
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Description: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF. {noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} was: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement LIKE ANY and LIKE ALL built-in UDF. 
{noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} > Implement built-in LIKE ANY and LIKE ALL UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement built-in LIKE ANY and LIKE ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.
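For reference, a hedged sketch of the target syntax (table and column names are illustrative; the final surface may differ): {{LIKE ANY}} is true if at least one pattern matches, {{LIKE ALL}} only if every pattern matches, without building the deeply nested expression tree that overflows the stack here.
{code:scala}
// Illustrative only; assumes a table `logs` with a string column `msg`.
spark.sql("""
  SELECT * FROM logs
  WHERE msg LIKE ANY ('%error%', '%fatal%')
    AND msg LIKE ALL ('%2020%', '%spark%')
""")
{code}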
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Summary: Implement built-in LIKE ANY and LIKE ALL UDF (was: Implement LIKE ANY and LIKE ALL built-in UDF) > Implement built-in LIKE ANY and LIKE ALL UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements (more than 14378 elements). > We should implement LIKE ANY and LIKE ALL built-in UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
Kousuke Saruta created SPARK-33046: -- Summary: How to build for Scala 2.13 with sbt in the doc is wrong. Key: SPARK-33046 URL: https://issues.apache.org/jira/browse/SPARK-33046 Project: Spark Issue Type: Sub-task Components: docs Affects Versions: 3.0.1, 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the current doc, how to build Spark for Scala 2.13 with sbt is described like: {code} ./build/sbt -Dscala.version=2.13.0 {code} But build fails with this command because scala-2.13 profile is not enabled and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY and LIKE ALL built-in UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Description: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement LIKE ANY and LIKE ALL built-in UDF. {noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} was: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement LIKE ANY/SOME/ALL UDF. 
{noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} > Implement LIKE ANY and LIKE ALL built-in UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement LIKE ANY and LIKE ALL built-in UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.
[jira] [Updated] (SPARK-33045) Implement LIKE ANY and LIKE ALL built-in UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Summary: Implement LIKE ANY and LIKE ALL built-in UDF (was: Implement LIKE ANY and LIKE ALL UDF) > Implement LIKE ANY and LIKE ALL built-in UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement LIKE ANY/SOME/ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Summary: Implement LIKE ANY and LIKE ALL UDF (was: Implement LIKE ANY/SOME/ALL UDF) > Implement LIKE ANY and LIKE ALL UDF > --- > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement LIKE ANY/SOME/ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-32965) pyspark reading csv files with utf_16le encoding
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah reopened SPARK-32965: The linked duplicate issue won't be fixed because that issue was mixed up with a multiline feature issue. However, my ticket deals exclusively with utf-16le and utf-16be encodings not being handled correctly via PySpark. Therefore this issue is still open and unresolved. > pyspark reading csv files with utf_16le encoding > > > Key: SPARK-32965 > URL: https://issues.apache.org/jira/browse/SPARK-32965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.0.1 >Reporter: Punit Shah >Priority: Major > Attachments: 16le.csv, 32965.png > > > If you have a file encoded in utf_16le or utf_16be and try to use > spark.read.csv("", encoding="utf_16le"), the dataframe isn't > rendered properly. > If you use Python decoding like: > prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : > x.decode("utf_16le").splitlines()) > and then do spark.read.csv(prdd), it works. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
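A Scala rendering of the reporter's workaround (a hedged sketch; the path is illustrative): decode the UTF-16LE bytes explicitly, then hand the decoded lines to the CSV reader.
{code:scala}
import java.nio.charset.StandardCharsets
import spark.implicits._ // assumes an active SparkSession named `spark`

// Decode the bytes ourselves, then parse the decoded lines as CSV.
val lines = spark.sparkContext
  .binaryFiles("/path/to/16le.csv") // illustrative path
  .values
  .flatMap(p => new String(p.toArray, StandardCharsets.UTF_16LE).split("\r?\n").toSeq)
val df = spark.read.csv(lines.toDS())
{code}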
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Description: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14738 elements). We should implement LIKE ANY/SOME/ALL UDF. {noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} was: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements. We should implement LIKE ANY/SOME/ALL UDF. 
{noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} > Implement LIKE ANY/SOME/ALL UDF > --- > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14738 elements). > We should implement LIKE ANY/SOME/ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) >
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: 
We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements (more than 14378). We should implement a LIKE ANY/SOME/ALL UDF.
{noformat}
java.lang.StackOverflowError
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was:
We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements (more than 14738). We should implement a LIKE ANY/SOME/ALL UDF.
{noformat}
java.lang.StackOverflowError
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements (more than 14378). We
> should implement a LIKE ANY/SOME/ALL UDF.
> {noformat}
> java.lang.StackOverflowError
>     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>     at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
>     at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: 
We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement a LIKE ANY/SOME/ALL UDF.
{noformat}
java.lang.StackOverflowError
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement a LIKE ANY/SOME/ALL UDF.

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements. We should implement a LIKE
> ANY/SOME/ALL UDF.
> {noformat}
> java.lang.StackOverflowError
>     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>     at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
>     at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
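[Editor's note] For readers trying to reproduce the overflow described in the updates above, a minimal sketch follows. It is not code from the ticket: it assumes Spark 3.1+ (where the LIKE ANY syntax is available) running in local mode, and the table name, column name, and pattern count are illustrative.

{code:scala}
// Hypothetical repro sketch: a LIKE ANY predicate with a very large pattern list.
// Assumptions: Spark 3.1+ in local mode; names and the 20000 count are illustrative.
import org.apache.spark.sql.SparkSession

object LikeAnyOverflow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("like-any-overflow")
      .getOrCreate()

    // A tiny string-typed table to run the predicate against.
    spark.range(10).selectExpr("CAST(id AS STRING) AS col1").createOrReplaceTempView("t")

    // Each pattern adds another node to the parsed expression chain, so a
    // large enough list overflows the recursive traversal of the plan tree.
    val patterns = (1 to 20000).map(i => s"'%p$i%'").mkString(", ")
    spark.sql(s"SELECT * FROM t WHERE col1 LIKE ANY ($patterns)").show()

    spark.stop()
  }
}
{code}

With a handful of patterns the same query runs fine; the failure mode only appears around the element count mentioned in the ticket.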
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement a LIKE ANY/SOME/ALL UDF.  (was: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement )

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements. We should implement a LIKE
> ANY/SOME/ALL UDF.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements. We should implement

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
Yuming Wang created SPARK-33045:
-----------------------------------

             Summary: Implement LIKE ANY/SOME/ALL UDF
                 Key: SPARK-33045
                 URL: https://issues.apache.org/jira/browse/SPARK-33045
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Yuming Wang

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
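[Editor's note] As a toy illustration of why the traversal overflows, assume (this is an assumption about the rewrite, not confirmed by the ticket) that a LIKE ANY over N patterns is desugared into a left-deep chain of OR expressions. Expr, Like, and Or below are hypothetical stand-ins for Catalyst's expression classes, and the snippet is runnable as-is in a Scala REPL:

{code:scala}
// Toy model of the assumed desugaring: N patterns become a left-deep Or chain of depth ~N.
sealed trait Expr
case class Like(col: String, pattern: String) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

def expandLikeAny(col: String, patterns: Seq[String]): Expr =
  patterns.map(p => Like(col, p): Expr).reduceLeft[Expr]((l, r) => Or(l, r))

// Depth grows linearly with the pattern count, so a non-tail-recursive
// visitor (like the TreeNode.foreach in the stack trace) eventually
// runs out of stack on a large enough pattern list.
def depth(e: Expr): Int = e match {
  case Or(l, r) => 1 + math.max(depth(l), depth(r))
  case _        => 1
}

val tree = expandLikeAny("col1", (1 to 1000).map(i => s"%p$i%"))
println(depth(tree)) // prints 1000
{code}

Rewriting the predicate as a single function over the whole pattern list (the UDF proposed in this ticket) keeps the tree depth constant regardless of how many patterns there are.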
[jira] [Resolved] (SPARK-33025) Empty file for the first partition
[ https://issues.apache.org/jira/browse/SPARK-33025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-33025.
--------------------------------------
    Resolution: Won't Fix

> Empty file for the first partition
> ----------------------------------
>
>                 Key: SPARK-33025
>                 URL: https://issues.apache.org/jira/browse/SPARK-33025
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1
>            Reporter: Evgenii Samusenko
>            Priority: Minor
>
> If I create a DataFrame with 1 row, Spark will create an empty file for the
> first partition.
>
> Example:
> val df = Seq(1).toDF("col1").repartition(8)
> df.write.csv("/csv")
>
> I got 2 files: the first contains the (empty) first partition and the second
> contains the single row from another partition. The same also applies to
> parquet, text, etc.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33025) Empty file for the first partition
[ https://issues.apache.org/jira/browse/SPARK-33025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205328#comment-17205328 ]

Takeshi Yamamuro commented on SPARK-33025:
------------------------------------------

This is not a bug but expected behaviour; please see [https://github.com/apache/spark/pull/18654#issuecomment-315928986] for more details. Yeah, we might be able to remove it for csv/json/text, but that is a minor fix, so I personally don't think we need to do so.

> Empty file for the first partition
> ----------------------------------
>
>                 Key: SPARK-33025
>                 URL: https://issues.apache.org/jira/browse/SPARK-33025
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1
>            Reporter: Evgenii Samusenko
>            Priority: Minor
>
> If I create a DataFrame with 1 row, Spark will create an empty file for the
> first partition.
>
> Example:
> val df = Seq(1).toDF("col1").repartition(8)
> df.write.csv("/csv")
>
> I got 2 files: the first contains the (empty) first partition and the second
> contains the single row from another partition. The same also applies to
> parquet, text, etc.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
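[Editor's note] For anyone who wants to see the reported behaviour and sidestep it, a small sketch follows. It is an illustration, not code from the ticket: the output paths and local master are made up, and the "first partition always gets a file" behaviour is the one discussed in the PR linked in the comment above.

{code:scala}
// Sketch of the reported behaviour and a simple workaround.
// Assumptions: local mode; the /tmp output paths are illustrative.
import org.apache.spark.sql.SparkSession

object EmptyFirstPartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-first-partition")
      .getOrCreate()
    import spark.implicits._

    // 1 row spread over 8 partitions: 7 are empty, 1 holds the row. Spark still
    // writes a part file for the first partition, so two files appear: one of
    // them empty.
    Seq(1).toDF("col1").repartition(8).write.csv("/tmp/csv-out")

    // Workaround: collapse to a single partition before writing, so the only
    // part file produced is the non-empty one.
    Seq(1).toDF("col1").coalesce(1).write.csv("/tmp/csv-single")

    spark.stop()
  }
}
{code}

Since the ticket was closed as Won't Fix, coalescing (or repartitioning to match the actual row count) on the caller's side is the practical way to avoid the empty file.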