[jira] [Resolved] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33048.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29927
[https://github.com/apache/spark/pull/29927]

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>             Fix For: 3.1.0
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
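For readers following the Scala 2.13 build work: the shape of the fix is that the Scala binary version must follow the active build profile instead of being hardcoded, since bin/spark-class interpolates SPARK_SCALA_VERSION into the assembly and launcher paths shown above. A minimal sketch of that idea (the object name and profile-detection mechanism are illustrative assumptions, not the actual SparkBuild.scala patch):

{code:scala}
// Illustrative sketch: derive the Scala binary version from the enabled
// build profiles instead of hardcoding "2.12".
object ScalaBinaryVersionSketch {
  def scalaBinaryVersion(enabledProfiles: Seq[String]): String =
    if (enabledProfiles.contains("scala-2.13")) "2.13" else "2.12"

  def main(args: Array[String]): Unit = {
    // bin/spark-class expands this value into
    // assembly/target/scala-$SPARK_SCALA_VERSION/jars, so it must match
    // the directories the build actually produced.
    println(s"SPARK_SCALA_VERSION=${scalaBinaryVersion(Seq("scala-2.13"))}")
  }
}
{code}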
[jira] [Resolved] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33051.
----------------------------------
    Fix Version/s: 2.4.8
                   3.0.2
                   3.1.0
       Resolution: Fixed

Issue resolved by pull request 29931
[https://github.com/apache/spark/pull/29931]

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.1.0, 3.0.2, 2.4.8
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Assigned] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-33051:
------------------------------------

    Assignee: Hyukjin Kwon

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Resolved] (SPARK-33026) Add numRows to metric of BroadcastExchangeExec
[ https://issues.apache.org/jira/browse/SPARK-33026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-33026.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29904
[https://github.com/apache/spark/pull/29904]

> Add numRows to metric of BroadcastExchangeExec
> ----------------------------------------------
>
>                 Key: SPARK-33026
>                 URL: https://issues.apache.org/jira/browse/SPARK-33026
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 3.1.0
>
> {{numRows}} can be used here:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L55-L156
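As background on the metric itself: the broadcast side of a join is fully materialized, so counting its rows is cheap bookkeeping that statistics code such as JoinEstimation can consume. A rough stand-in for the idea using a plain accumulator (Spark's operators use the internal SQLMetrics machinery instead; this is not the actual BroadcastExchangeExec change):

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch: count rows with a long accumulator, the same bookkeeping idea
// behind a numRows metric on an exchange operator.
object NumRowsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("numRows").getOrCreate()
    val numRows = spark.sparkContext.longAccumulator("numRows")
    spark.range(0, 10000).toDF("id").foreach(_ => numRows.add(1))
    println(s"numRows = ${numRows.value}")  // 10000
    spark.stop()
  }
}
{code}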
[jira] [Assigned] (SPARK-33026) Add numRows to metric of BroadcastExchangeExec
[ https://issues.apache.org/jira/browse/SPARK-33026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33026:
-------------------------------------

    Assignee: Yuming Wang

> Add numRows to metric of BroadcastExchangeExec
> ----------------------------------------------
>
>                 Key: SPARK-33026
>                 URL: https://issues.apache.org/jira/browse/SPARK-33026
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>
> {{numRows}} can be used here:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L55-L156
[jira] [Commented] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205991#comment-17205991 ]

Apache Spark commented on SPARK-33052:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29932

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Commented] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205990#comment-17205990 ]

Apache Spark commented on SPARK-33052:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29932

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Assigned] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33052:
------------------------------------

    Assignee: Apache Spark

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Apache Spark
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Assigned] (SPARK-33052) Make database versions up-to-date for integration tests
[ https://issues.apache.org/jira/browse/SPARK-33052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33052:
------------------------------------

    Assignee: (was: Apache Spark)

> Make database versions up-to-date for integration tests
> --------------------------------------------------------
>
>                 Key: SPARK-33052
>                 URL: https://issues.apache.org/jira/browse/SPARK-33052
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at updating database versions below for integration tests;
> - ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
> - mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
> - postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Created] (SPARK-33052) Make database versions up-to-date for integration tests
Takeshi Yamamuro created SPARK-33052:
----------------------------------------

             Summary: Make database versions up-to-date for integration tests
                 Key: SPARK-33052
                 URL: https://issues.apache.org/jira/browse/SPARK-33052
             Project: Spark
          Issue Type: Test
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Takeshi Yamamuro

This ticket aims at updating database versions below for integration tests;
- ibmcom/db2:11.5.0.0a => ibmcom/db2:11.5.4.0 in DB2[Krb]IntegrationSuite
- mysql:5.7.28 => mysql:5.7.31 in MySQLIntegrationSuite
- postgres:12.0 => postgres:13.0 in Postgres[Krb]IntegrationSuite
[jira] [Assigned] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33051:
------------------------------------

    Assignee: Apache Spark

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Commented] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205984#comment-17205984 ]

Apache Spark commented on SPARK-33051:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29931

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Assigned] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
[ https://issues.apache.org/jira/browse/SPARK-33051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33051:
------------------------------------

    Assignee: (was: Apache Spark)

> Uses setup-r to install R in GitHub Actions build
> -------------------------------------------------
>
>                 Key: SPARK-33051
>                 URL: https://issues.apache.org/jira/browse/SPARK-33051
>             Project: Spark
>          Issue Type: Test
>          Components: Project Infra, SparkR
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Created] (SPARK-33051) Uses setup-r to install R in GitHub Actions build
Hyukjin Kwon created SPARK-33051:
------------------------------------

             Summary: Uses setup-r to install R in GitHub Actions build
                 Key: SPARK-33051
                 URL: https://issues.apache.org/jira/browse/SPARK-33051
             Project: Spark
          Issue Type: Test
          Components: Project Infra, SparkR
    Affects Versions: 3.0.1, 2.4.7, 3.1.0
            Reporter: Hyukjin Kwon

At SPARK-32493, the R installation was switched to a manual installation because setup-r was broken. This seems to be fixed upstream, so we should switch it back.
[jira] [Resolved] (SPARK-32001) Create Kerberos authentication provider API in JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-32001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-32001.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29024
[https://github.com/apache/spark/pull/29024]

> Create Kerberos authentication provider API in JDBC connector
> --------------------------------------------------------------
>
>                 Key: SPARK-32001
>                 URL: https://issues.apache.org/jira/browse/SPARK-32001
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Gabor Somogyi
>            Assignee: Gabor Somogyi
>            Priority: Major
>             Fix For: 3.1.0
>
> Adding an embedded provider for every possible database would create a high maintenance cost on the Spark side. Instead, an API can be introduced that allows further providers to be implemented independently.
> One important requirement I suggest is: JDBC connection providers must be loaded independently, just like delegation token providers.
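The independent-loading requirement in the description maps naturally onto Java's ServiceLoader, which is also how delegation token providers are discovered. A hedged sketch of that shape (the trait and method names are illustrative assumptions, not necessarily the exact interface that was merged):

{code:scala}
import java.sql.{Connection, Driver}
import java.util.ServiceLoader

// A provider says whether it can handle a driver/options pair and, if so,
// produces a connection (performing e.g. Kerberos authentication inside).
trait ConnectionProviderSketch {
  def canHandle(driver: Driver, options: Map[String, String]): Boolean
  def getConnection(driver: Driver, options: Map[String, String]): Connection
}

object ConnectionProviderRegistry {
  // Providers registered under META-INF/services are loaded independently,
  // without Spark having to know each database up front.
  def loadProviders(): Seq[ConnectionProviderSketch] = {
    val it = ServiceLoader.load(classOf[ConnectionProviderSketch]).iterator()
    val buf = scala.collection.mutable.ArrayBuffer.empty[ConnectionProviderSketch]
    while (it.hasNext) buf += it.next()
    buf.toSeq
  }

  def connect(driver: Driver, options: Map[String, String]): Connection =
    loadProviders().find(_.canHandle(driver, options))
      .map(_.getConnection(driver, options))
      .getOrElse(throw new IllegalArgumentException("no suitable provider"))
}
{code}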
[jira] [Assigned] (SPARK-32001) Create Kerberos authentication provider API in JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-32001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-32001:
------------------------------------

    Assignee: Gabor Somogyi

> Create Kerberos authentication provider API in JDBC connector
> --------------------------------------------------------------
>
>                 Key: SPARK-32001
>                 URL: https://issues.apache.org/jira/browse/SPARK-32001
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Gabor Somogyi
>            Assignee: Gabor Somogyi
>            Priority: Major
>
> Adding an embedded provider for every possible database would create a high maintenance cost on the Spark side. Instead, an API can be introduced that allows further providers to be implemented independently.
> One important requirement I suggest is: JDBC connection providers must be loaded independently, just like delegation token providers.
[jira] [Commented] (SPARK-33044) Add a Jenkins build and test job for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205947#comment-17205947 ]

Hyukjin Kwon commented on SPARK-33044:
--------------------------------------

Thanks [~dongjoon] for cc'ing me. Yeah, setting up a Jenkins job sounds good.

> Add a Jenkins build and test job for Scala 2.13
> -----------------------------------------------
>
>                 Key: SPARK-33044
>                 URL: https://issues.apache.org/jira/browse/SPARK-33044
>             Project: Spark
>          Issue Type: Sub-task
>          Components: jenkins
>    Affects Versions: 3.1.0
>            Reporter: Yang Jie
>            Priority: Major
>
> The {{master}} branch seems to be almost ready for Scala 2.13 now, so we need a Jenkins test job to verify the current work results and CI.
[jira] [Updated] (SPARK-32996) Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
[ https://issues.apache.org/jira/browse/SPARK-32996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh updated SPARK-32996:
--------------------------------
    Fix Version/s: 3.0.2

> Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
> ---------------------------------------------------------
>
>                 Key: SPARK-32996
>                 URL: https://issues.apache.org/jira/browse/SPARK-32996
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Shruti Gumma
>            Assignee: Shruti Gumma
>            Priority: Major
>             Fix For: 3.0.2, 3.1.0
>
> When {{peakMemoryMetrics}} in {{ExecutorSummary}} is {{Option.empty}}, the {{ExecutorMetricsJsonSerializer#serialize}} method does not execute the {{jsonGenerator.writeObject}} method. This causes the JSON to be generated with the {{peakMemoryMetrics}} key added to the serialized string, but no corresponding value.
> This causes an error to be thrown when it is the next key's ({{attributes}}) turn to be added to the JSON:
> {{com.fasterxml.jackson.core.JsonGenerationException: Can not write a field name, expecting a value.}}
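The failure mode described above is the classic custom-serializer pitfall: Jackson has already written the field name when serialize() is invoked, so writing nothing leaves the generator expecting a value. A minimal sketch of the fix idea (the class name is illustrative, not the actual ExecutorMetricsJsonSerializer):

{code:scala}
import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.databind.{JsonSerializer, SerializerProvider}

// When the Option is empty, still emit a value (null here) so the field
// name written by Jackson is never left dangling.
class OptionValueSerializerSketch extends JsonSerializer[Option[AnyRef]] {
  override def serialize(
      value: Option[AnyRef],
      gen: JsonGenerator,
      serializers: SerializerProvider): Unit = value match {
    case Some(v) => gen.writeObject(v)
    case None    => gen.writeNull()  // never skip the value entirely
  }
}
{code}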
[jira] [Assigned] (SPARK-33050) Upgrade Apache ORC to 1.5.12
[ https://issues.apache.org/jira/browse/SPARK-33050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33050:
------------------------------------

    Assignee: (was: Apache Spark)

> Upgrade Apache ORC to 1.5.12
> ----------------------------
>
>                 Key: SPARK-33050
>                 URL: https://issues.apache.org/jira/browse/SPARK-33050
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Assigned] (SPARK-33050) Upgrade Apache ORC to 1.5.12
[ https://issues.apache.org/jira/browse/SPARK-33050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33050:
------------------------------------

    Assignee: Apache Spark

> Upgrade Apache ORC to 1.5.12
> ----------------------------
>
>                 Key: SPARK-33050
>                 URL: https://issues.apache.org/jira/browse/SPARK-33050
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-33050) Upgrade Apache ORC to 1.5.12
[ https://issues.apache.org/jira/browse/SPARK-33050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205921#comment-17205921 ]

Apache Spark commented on SPARK-33050:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29930

> Upgrade Apache ORC to 1.5.12
> ----------------------------
>
>                 Key: SPARK-33050
>                 URL: https://issues.apache.org/jira/browse/SPARK-33050
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
[jira] [Created] (SPARK-33050) Upgrade Apache ORC to 1.5.12
Dongjoon Hyun created SPARK-33050:
-------------------------------------

             Summary: Upgrade Apache ORC to 1.5.12
                 Key: SPARK-33050
                 URL: https://issues.apache.org/jira/browse/SPARK-33050
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 3.1.0
            Reporter: Dongjoon Hyun
[jira] [Comment Edited] (SPARK-32007) Spark Driver Supervise does not work reliably
[ https://issues.apache.org/jira/browse/SPARK-32007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205918#comment-17205918 ]

Aoyuan Liao edited comment on SPARK-32007 at 10/2/20, 1:40 AM:
---------------------------------------------------------------

[~surajs21] Can you please post the master's log for the first behavior?

was (Author: eveliao):
[~surajs21] Can you please post the master's log for more information?

> Spark Driver Supervise does not work reliably
> ----------------------------------------------
>
>                 Key: SPARK-32007
>                 URL: https://issues.apache.org/jira/browse/SPARK-32007
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: |Java Version|1.8.0_121 (Oracle Corporation)|
>                      |Java Home|/usr/java/jdk1.8.0_121/jre|
>                      |Scala Version|version 2.11.12|
>                      |OS|Amazon Linux|
>            Reporter: Suraj Sharma
>            Priority: Critical
>
> I have a standalone cluster setup. I DO NOT have a streaming use case. I use AWS EC2 machines to run the spark master and worker processes.
> *Problem*: If a spark worker machine running some drivers and executors dies, then the driver is not spawned again on other healthy machines.
> *Below are my findings:*
> ||Action/Behaviour||Executor||Driver||
> |Worker Machine Stop|Relaunches on an active machine|NO Relaunch|
> |kill -9 to process|Relaunches on other machines|Relaunches on other machines|
> |kill to process|Relaunches on other machines|Relaunches on other machines|
> *Cluster Setup:*
> # I have a spark standalone cluster
> # {{spark.driver.supervise=true}}
> # Spark Master HA is enabled and is backed by zookeeper
> # Spark version = 2.4.4
> # I am using a systemd script for the spark worker process
[jira] [Commented] (SPARK-32007) Spark Driver Supervise does not work reliably
[ https://issues.apache.org/jira/browse/SPARK-32007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205918#comment-17205918 ]

Aoyuan Liao commented on SPARK-32007:
-------------------------------------

[~surajs21] Can you please post the master's log for more information?

> Spark Driver Supervise does not work reliably
> ----------------------------------------------
>
>                 Key: SPARK-32007
>                 URL: https://issues.apache.org/jira/browse/SPARK-32007
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: |Java Version|1.8.0_121 (Oracle Corporation)|
>                      |Java Home|/usr/java/jdk1.8.0_121/jre|
>                      |Scala Version|version 2.11.12|
>                      |OS|Amazon Linux|
>            Reporter: Suraj Sharma
>            Priority: Critical
>
> I have a standalone cluster setup. I DO NOT have a streaming use case. I use AWS EC2 machines to run the spark master and worker processes.
> *Problem*: If a spark worker machine running some drivers and executors dies, then the driver is not spawned again on other healthy machines.
> *Below are my findings:*
> ||Action/Behaviour||Executor||Driver||
> |Worker Machine Stop|Relaunches on an active machine|NO Relaunch|
> |kill -9 to process|Relaunches on other machines|Relaunches on other machines|
> |kill to process|Relaunches on other machines|Relaunches on other machines|
> *Cluster Setup:*
> # I have a spark standalone cluster
> # {{spark.driver.supervise=true}}
> # Spark Master HA is enabled and is backed by zookeeper
> # Spark version = 2.4.4
> # I am using a systemd script for the spark worker process
[jira] [Commented] (SPARK-33049) Decommission Core Integration Test is flaky.
[ https://issues.apache.org/jira/browse/SPARK-33049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205916#comment-17205916 ]

Apache Spark commented on SPARK-33049:
--------------------------------------

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29929

> Decommission Core Integration Test is flaky.
> --------------------------------------------
>
>                 Key: SPARK-33049
>                 URL: https://issues.apache.org/jira/browse/SPARK-33049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Holden Karau
>            Priority: Trivial
>
> See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Assigned] (SPARK-33049) Decommission Core Integration Test is flaky.
[ https://issues.apache.org/jira/browse/SPARK-33049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33049:
------------------------------------

    Assignee: (was: Apache Spark)

> Decommission Core Integration Test is flaky.
> --------------------------------------------
>
>                 Key: SPARK-33049
>                 URL: https://issues.apache.org/jira/browse/SPARK-33049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Holden Karau
>            Priority: Trivial
>
> See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Assigned] (SPARK-33049) Decommission Core Integration Test is flaky.
[ https://issues.apache.org/jira/browse/SPARK-33049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33049:
------------------------------------

    Assignee: Apache Spark

> Decommission Core Integration Test is flaky.
> --------------------------------------------
>
>                 Key: SPARK-33049
>                 URL: https://issues.apache.org/jira/browse/SPARK-33049
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Holden Karau
>            Assignee: Apache Spark
>            Priority: Trivial
>
> See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Created] (SPARK-33049) Decommission Core Integration Test is flaky.
Holden Karau created SPARK-33049:
------------------------------------

             Summary: Decommission Core Integration Test is flaky.
                 Key: SPARK-33049
                 URL: https://issues.apache.org/jira/browse/SPARK-33049
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, Tests
    Affects Versions: 3.1.0
            Reporter: Holden Karau

See https://github.com/apache/spark/pull/29923#issuecomment-702344724
[jira] [Commented] (SPARK-32741) Check if the same ExprId refers to the unique attribute in logical plans
[ https://issues.apache.org/jira/browse/SPARK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205908#comment-17205908 ]

Apache Spark commented on SPARK-32741:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29928

> Check if the same ExprId refers to the unique attribute in logical plans
> -------------------------------------------------------------------------
>
>                 Key: SPARK-32741
>                 URL: https://issues.apache.org/jira/browse/SPARK-32741
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>             Fix For: 3.1.0
>
> Some plan transformations (e.g., `RemoveNoopOperators`) implicitly assume that the same `ExprId` refers to a unique attribute, but `RuleExecutor` does not check this integrity between logical plan transformations. So, this ticket targets adding this check to `isPlanIntegral` in `Analyzer`/`Optimizer`.
> This PR comes from the talk with @cloud-fan @viirya in https://github.com/apache/spark/pull/29485#discussion_r475346278
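To make the invariant concrete: the check has to assert that one ExprId is never bound to two different attributes anywhere in a plan. A small illustrative sketch of such a check (not the exact isPlanIntegral code):

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object PlanIntegritySketch {
  // True when every ExprId in the plan's outputs maps to a single
  // attribute (same name and data type everywhere it appears).
  def sameExprIdMeansSameAttribute(plan: LogicalPlan): Boolean = {
    val attrs = plan.collect { case p => p.output }.flatten
    attrs.groupBy(_.exprId).forall { case (_, as) =>
      as.map(a => (a.name, a.dataType)).distinct.size == 1
    }
  }
}
{code}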
[jira] [Commented] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205899#comment-17205899 ]

Apache Spark commented on SPARK-33048:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/29927

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Assigned] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33048:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Assigned] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33048:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Apache Spark
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Commented] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205898#comment-17205898 ]

Apache Spark commented on SPARK-33048:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/29927

> Fix SparkBuild.scala to recognize build settings for Scala 2.13
> ---------------------------------------------------------------
>
>                 Key: SPARK-33048
>                 URL: https://issues.apache.org/jira/browse/SPARK-33048
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Build
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
> This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
> {code}
> = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
> 20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
> 20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
> {code}
> The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
> {code}
> SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
> LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
> {code}
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
--------------------------------
    Description: 
We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was:
We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

> Implement built-in LIKE ANY and LIKE ALL UDF
> --------------------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
> {noformat}
> java.lang.StackOverflowError
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>   at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
--------------------------------
    Description: 
We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF to fix this issue.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was:
We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF.
{noformat}
java.lang.StackOverflowError
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
  at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
  at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
  at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
  at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

> Implement built-in LIKE ANY and LIKE ALL UDF
> --------------------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support LIKE ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements (more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF to fix this issue.
> {noformat}
> java.lang.StackOverflowError
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>   at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>   at scala.collection.generic.GenericCompanion.
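For intuition on why a built-in expression fixes the overflow: `c LIKE ANY (p1, ..., pN)` is currently expanded into nested `Or(Like(c, p1), Or(Like(c, p2), ...))`, an expression tree roughly N levels deep, and tree traversals recurse once per level, hence the StackOverflowError above. A single expression holding all patterns keeps the tree flat. A toy illustration of the flat evaluation (supporting only % and _; this is not Spark's implementation):

{code:scala}
object LikeAnySketch {
  // Translate a SQL LIKE pattern into a Java regex (simplified).
  private def likeToRegex(pattern: String): String =
    java.util.regex.Pattern.quote(pattern)
      .replace("%", "\\E.*\\Q")
      .replace("_", "\\E.\\Q")

  // Flat evaluation over all patterns: no nested Or tree, no deep recursion.
  def likeAny(value: String, patterns: Seq[String]): Boolean =
    patterns.exists(p => value.matches(likeToRegex(p)))

  def likeAll(value: String, patterns: Seq[String]): Boolean =
    patterns.forall(p => value.matches(likeToRegex(p)))

  def main(args: Array[String]): Unit = {
    println(likeAny("spark", Seq("sp%", "%zzz")))  // true
    println(likeAll("spark", Seq("sp%", "%rk")))   // true
  }
}
{code}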
[jira] [Resolved] (SPARK-32859) Introduce SQL physical plan rule to decide enable/disable bucketing
[ https://issues.apache.org/jira/browse/SPARK-32859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-32859.
--------------------------------------
    Fix Version/s: 3.1.0
         Assignee: Cheng Su
       Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29804

> Introduce SQL physical plan rule to decide enable/disable bucketing
> --------------------------------------------------------------------
>
>                 Key: SPARK-32859
>                 URL: https://issues.apache.org/jira/browse/SPARK-32859
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Cheng Su
>            Assignee: Cheng Su
>            Priority: Minor
>             Fix For: 3.1.0
>
> Discussed with [~cloud_fan] offline: it would be better if we could decide to enable/disable SQL bucketing automatically according to the query plan. Currently bucketing is enabled by default ({{spark.sql.sources.bucketing.enabled}}=true), so for all bucketed tables in the query plan, we will use a bucket table scan (all input files per bucket will be read by the same task). This has the drawback that if the bucket table scan is not benefiting at all (no join/groupby/etc. in the query), we don't need to use a bucket table scan, as it would restrict the number of tasks to the number of buckets and might hurt parallelism.
>
> The proposed change is to introduce a physical plan rule (right before `ensureRequirements`):
> (1). transformUp() the physical plan, matching a SparkPlan operator which is FileSourceScanExec; if optionalBucketSet is set, enable bucket scan (bucket filter in this case).
> (2). transformUp() the physical plan, matching a SparkPlan operator which is SparkPlanWithInterestingPartitioning.
> SparkPlanWithInterestingPartitioning: the plan is in {SortMergeJoinExec, ShuffledHashJoinExec, HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec, etc., which has HashClusteredDistribution/ClusteredDistribution in requiredChildDistribution}, and its requiredChildDistribution is a HashClusteredDistribution/ClusteredDistribution on its underlying FileSourceScanExec's bucketed columns.
> (3). For any child of SparkPlanWithInterestingPartitioning which does not satisfy the plan's requiredChildDistribution, go through the child's sub query plan tree:
> if (3.1). every node's outputPartitioning is the same as the child's, and every node's requiredChildDistribution is UnspecifiedDistribution,
> and (3.2). the leaf node is a FileSourceScanExec on a bucketed table,
> and (3.3). when enabling bucket scan for this FileSourceScanExec, the outputPartitioning of the FileSourceScanExec satisfies the requiredChildDistribution of SparkPlanWithInterestingPartitioning,
> then enable bucket scan for this FileSourceScanExec, and double-check that the new child of SparkPlanWithInterestingPartitioning satisfies the requiredChildDistribution.
>
> The idea of SparkPlanWithInterestingPartitioning is inspired by "interesting order" in "Access Path Selection in a Relational Database Management System" (http://www.inf.ed.ac.uk/teaching/courses/adbs/AccessPath.pdf).
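A toy model of the rule shape described in the ticket, reduced to its decision logic (illustrative only; the real rule walks SparkPlan/FileSourceScanExec and handles the distribution classes properly):

{code:scala}
sealed trait Plan { def children: Seq[Plan] }
case class Scan(bucketCols: Set[String], bucketed: Boolean = false) extends Plan {
  def children: Seq[Plan] = Nil
}
case class Join(keys: Set[String], left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}
case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }

object DecideBucketingSketch {
  // requiredKeys carries the "interesting partitioning" down the tree.
  def apply(plan: Plan, requiredKeys: Option[Set[String]] = None): Plan = plan match {
    case s: Scan =>
      // Enable the bucketed scan only when a parent actually needs the
      // scan's bucket columns (simplified here to an exact match).
      s.copy(bucketed = requiredKeys.contains(s.bucketCols))
    case j: Join =>
      // A join introduces an interesting partitioning on its keys.
      Join(j.keys, apply(j.left, Some(j.keys)), apply(j.right, Some(j.keys)))
    case p: Project =>
      // Pass-through operators preserve the requirement unchanged.
      Project(apply(p.child, requiredKeys))
  }
}
{code}

With this model, DecideBucketingSketch(Project(Scan(Set("a")))) leaves the scan unbucketed because nothing upstream cares, while the same scan under a join on "a" keeps bucketing on.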
[jira] [Created] (SPARK-33048) Fix SparkBuild.scala to recognize build settings for Scala 2.13
Kousuke Saruta created SPARK-33048:
--------------------------------------

             Summary: Fix SparkBuild.scala to recognize build settings for Scala 2.13
                 Key: SPARK-33048
                 URL: https://issues.apache.org/jira/browse/SPARK-33048
             Project: Spark
          Issue Type: Sub-task
          Components: Build
    Affects Versions: 3.0.1, 3.1.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta

In SparkBuild.scala, the variable 'scalaBinaryVersion' is hardcoded as '2.12', so the environment variable 'SPARK_SCALA_VERSION' is also set to '2.12'.
This issue causes some test suites (e.g. SparkSubmitSuite) to fail.
{code}
= TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'user classpath first in driver' =
20/10/02 08:55:30.234 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: Error: Could not find or load main class org.apache.spark.launcher.Main
20/10/02 08:55:30.235 redirect stderr for command /home/kou/work/oss/spark-scala-2.13/bin/spark-submit INFO Utils: /home/kou/work/oss/spark-scala-2.13/bin/spark-class: line 96: CMD: bad array subscript
{code}
The reason for this error is that the environment variables 'SPARK_JARS_DIR' and 'LAUNCH_CLASSPATH' are defined in bin/spark-class as follows.
{code}
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
{code}
[jira] [Resolved] (SPARK-33046) Update how to build doc for Scala 2.13 with sbt
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-33046.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29921
[https://github.com/apache/spark/pull/29921]

> Update how to build doc for Scala 2.13 with sbt
> -----------------------------------------------
>
>                 Key: SPARK-33046
>                 URL: https://issues.apache.org/jira/browse/SPARK-33046
>             Project: Spark
>          Issue Type: Sub-task
>          Components: docs
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Minor
>             Fix For: 3.1.0
>
> In the current doc, how to build Spark for Scala 2.13 with sbt is described as:
> {code}
> ./build/sbt -Dscala.version=2.13.0
> {code}
> But the build fails with this command because the scala-2.13 profile is not enabled and scala-parallel-collections is absent.
[jira] [Updated] (SPARK-33046) Update how to build doc for Scala 2.13 with sbt
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta updated SPARK-33046:
-----------------------------------
    Summary: Update how to build doc for Scala 2.13 with sbt  (was: How to build for Scala 2.13 with sbt in the doc is wrong.)

> Update how to build doc for Scala 2.13 with sbt
> -----------------------------------------------
>
>                 Key: SPARK-33046
>                 URL: https://issues.apache.org/jira/browse/SPARK-33046
>             Project: Spark
>          Issue Type: Sub-task
>          Components: docs
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Minor
>
> In the current doc, how to build Spark for Scala 2.13 with sbt is described as:
> {code}
> ./build/sbt -Dscala.version=2.13.0
> {code}
> But the build fails with this command because the scala-2.13 profile is not enabled and scala-parallel-collections is absent.
[jira] [Resolved] (SPARK-32585) Support scala enumeration in ScalaReflection
[ https://issues.apache.org/jira/browse/SPARK-32585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-32585.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Done

> Support scala enumeration in ScalaReflection
> --------------------------------------------
>
>                 Key: SPARK-32585
>                 URL: https://issues.apache.org/jira/browse/SPARK-32585
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: ulysses you
>            Priority: Minor
>             Fix For: 3.1.0
>
> Add code in {{ScalaReflection}} to support Scala enumerations and make the enumeration type a string type in Spark.
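A hedged usage sketch of what this enables, assuming the behavior the ticket describes (an Enumeration-typed case class field becomes a string column); the Color/Pixel definitions are examples, not Spark code:

{code:scala}
import org.apache.spark.sql.SparkSession

object Color extends Enumeration { val Red, Green, Blue = Value }
case class Pixel(id: Int, color: Color.Value)

object EnumEncodingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("enum-encoding").getOrCreate()
    import spark.implicits._
    // With the new ScalaReflection support, the color field is expected
    // to be encoded as StringType ("Red", "Blue", ...).
    val ds = Seq(Pixel(1, Color.Red), Pixel(2, Color.Blue)).toDS()
    ds.printSchema()
    ds.show()
    spark.stop()
  }
}
{code}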
[jira] [Assigned] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33047:
-------------------------------------

    Assignee: Dongjoon Hyun

> Upgrade hive-storage-api to 2.7.2
> ---------------------------------
>
>                 Key: SPARK-33047
>                 URL: https://issues.apache.org/jira/browse/SPARK-33047
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
[jira] [Commented] (SPARK-32996) Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
[ https://issues.apache.org/jira/browse/SPARK-32996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205791#comment-17205791 ]

Apache Spark commented on SPARK-32996:
--------------------------------------

User 'shrutig' has created a pull request for this issue:
https://github.com/apache/spark/pull/29926

> Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
> ---------------------------------------------------------
>
>                 Key: SPARK-32996
>                 URL: https://issues.apache.org/jira/browse/SPARK-32996
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 3.0.1, 3.1.0
>            Reporter: Shruti Gumma
>            Assignee: Shruti Gumma
>            Priority: Major
>             Fix For: 3.1.0
>
> When {{peakMemoryMetrics}} in {{ExecutorSummary}} is {{Option.empty}}, the {{ExecutorMetricsJsonSerializer#serialize}} method does not execute the {{jsonGenerator.writeObject}} method. This causes the JSON to be generated with the {{peakMemoryMetrics}} key added to the serialized string, but no corresponding value.
> This causes an error to be thrown when it is the next key's ({{attributes}}) turn to be added to the JSON:
> {{com.fasterxml.jackson.core.JsonGenerationException: Can not write a field name, expecting a value.}}
[jira] [Resolved] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33047. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29923 [https://github.com/apache/spark/pull/29923] > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205772#comment-17205772 ] Apache Spark commented on SPARK-33043: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/29925 > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
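The suggested change amounts to guarding the check; a hedged sketch of that guard (the merged patch may differ):
{code:scala}
// Treat spark.driver.maxResultSize = 0 as "unlimited" and skip the check.
if (maxDriverResultSizeInBytes > 0) {
  require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
    s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
      s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
}
{code}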
[jira] [Assigned] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33043: Assignee: (was: Apache Spark) > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33043: Assignee: Apache Spark > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Assignee: Apache Spark >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0
[ https://issues.apache.org/jira/browse/SPARK-33043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205771#comment-17205771 ] Apache Spark commented on SPARK-33043: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/29925 > RowMatrix is incompatible with spark.driver.maxResultSize=0 > --- > > Key: SPARK-33043 > URL: https://issues.apache.org/jira/browse/SPARK-33043 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.0.0, 3.0.1 >Reporter: Karen Feng >Priority: Minor > > RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement > breaks: > > {code:java} > require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes, > s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " > + s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)") > {code} > > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795.] > > This check should likely only happen if maxDriverResultSizeInBytes > 0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33037) Remove knownManagers hardcoded list
[ https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205743#comment-17205743 ] BoYang commented on SPARK-33037: After discussion, we feel it is better to remove the knownManagers list. That makes the code cleaner and also supports users' custom shuffle manager implementations. PR: https://github.com/apache/spark/pull/29916 > Remove knownManagers hardcoded list > --- > > Key: SPARK-33037 > URL: https://issues.apache.org/jira/browse/SPARK-33037 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.7, 3.0.1 >Reporter: BoYang >Priority: Major > > Spark has a hardcoded list of known shuffle managers, which currently has two > values. It does not contain a user's custom shuffle manager, which is set > through the Spark config "spark.shuffle.manager". > > We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager > plugin (Uber Remote Shuffle Service implementation, > [https://github.com/uber/RemoteShuffleService]). Other users will hit the same > issue when they implement their own shuffle manager. > > The "spark.shuffle.manager" config value needs to be added to the known managers list > as well. > > The known managers list is in the code: > common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java > {quote}private final List<String> knownManagers = Arrays.asList( > "org.apache.spark.shuffle.sort.SortShuffleManager", > "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager"); > {quote} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
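For context, a custom shuffle manager is plugged in purely through configuration; a hedged sketch (the class name below is illustrative, not the Uber plugin's actual class):
{code:scala}
import org.apache.spark.SparkConf

// With the hardcoded knownManagers list removed, the external shuffle
// service no longer rejects third-party shuffle manager class names.
val conf = new SparkConf()
  .set("spark.shuffle.manager", "org.example.shuffle.MyShuffleManager") // illustrative class name
{code}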
[jira] [Updated] (SPARK-33037) Remove knownManagers hardcoded list
[ https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BoYang updated SPARK-33037: --- Summary: Remove knownManagers hardcoded list (was: Add "spark.shuffle.manager" value to knownManagers) > Remove knownManagers hardcoded list > --- > > Key: SPARK-33037 > URL: https://issues.apache.org/jira/browse/SPARK-33037 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.7, 3.0.1 >Reporter: BoYang >Priority: Major > > Spark has a hardcoded list of known shuffle managers, which currently has two > values. It does not contain a user's custom shuffle manager, which is set > through the Spark config "spark.shuffle.manager". > > We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager > plugin (Uber Remote Shuffle Service implementation, > [https://github.com/uber/RemoteShuffleService]). Other users will hit the same > issue when they implement their own shuffle manager. > > The "spark.shuffle.manager" config value needs to be added to the known managers list > as well. > > The known managers list is in the code: > common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java > {quote}private final List<String> knownManagers = Arrays.asList( > "org.apache.spark.shuffle.sort.SortShuffleManager", > "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager"); > {quote} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24554) Add MapType Support for Arrow in PySpark
[ https://issues.apache.org/jira/browse/SPARK-24554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205719#comment-17205719 ] Bryan Cutler commented on SPARK-24554: -- I started working on this, but ran into an issue at https://issues.apache.org/jira/browse/ARROW-10151 which needs to be resolved first. > Add MapType Support for Arrow in PySpark > > > Key: SPARK-24554 > URL: https://issues.apache.org/jira/browse/SPARK-24554 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 2.3.1 >Reporter: Bryan Cutler >Priority: Major > Labels: bulk-closed > > Add support for MapType in Arrow related classes in Scala/Java and pyarrow > functionality in Python. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
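For orientation, a hedged sketch of the kind of schema this targets: a {{MapType}} column that the Arrow transfer path (e.g. {{toPandas()}} with Arrow enabled on the Python side) would need to handle. An active SparkSession named {{spark}} is assumed.
{code:scala}
import org.apache.spark.sql.functions.{col, lit, map}

// A column of MapType(StringType, LongType); carrying it through Arrow
// is what SPARK-24554 is about.
val df = spark.range(3).select(map(lit("id"), col("id")).as("m"))
df.printSchema() // m: map<string,bigint>
{code}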
[jira] [Commented] (SPARK-30821) Executor pods with multiple containers will not be rescheduled unless all containers fail
[ https://issues.apache.org/jira/browse/SPARK-30821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205696#comment-17205696 ] Apache Spark commented on SPARK-30821: -- User 'huskysun' has created a pull request for this issue: https://github.com/apache/spark/pull/29924 > Executor pods with multiple containers will not be rescheduled unless all > containers fail > - > > Key: SPARK-30821 > URL: https://issues.apache.org/jira/browse/SPARK-30821 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Kevin Hogeland >Assignee: Apache Spark >Priority: Major > > Since the restart policy of launched pods is Never, additional handling is > required for pods that may have sidecar containers. The executor should be > considered failed if any containers have terminated and have a non-zero exit > code, but Spark currently only checks the pod phase. The pod phase will > remain "running" as long as _any_ pods are still running. Kubernetes sidecar > support in 1.18/1.19 does not address this situation, as sidecar containers > are excluded from pod phase calculation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
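A hedged sketch of the idea (not the merged patch), assuming the fabric8 Kubernetes client that Spark's K8s module builds on and Scala 2.13 collection converters: inspect container statuses rather than only the pod phase.
{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import scala.jdk.CollectionConverters._

// A pod counts as failed if any container terminated with a non-zero
// exit code, even while a sidecar keeps the pod phase at "Running".
def anyContainerFailed(pod: Pod): Boolean =
  pod.getStatus.getContainerStatuses.asScala.exists { status =>
    val terminated = status.getState.getTerminated
    terminated != null && terminated.getExitCode != null && terminated.getExitCode != 0
  }
{code}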
[jira] [Commented] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205662#comment-17205662 ] Apache Spark commented on SPARK-33047: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29923 > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33047: Assignee: (was: Apache Spark) > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
[ https://issues.apache.org/jira/browse/SPARK-33047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33047: Assignee: Apache Spark > Upgrade hive-storage-api to 2.7.2 > - > > Key: SPARK-33047 > URL: https://issues.apache.org/jira/browse/SPARK-33047 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33047) Upgrade hive-storage-api to 2.7.2
Dongjoon Hyun created SPARK-33047: - Summary: Upgrade hive-storage-api to 2.7.2 Key: SPARK-33047 URL: https://issues.apache.org/jira/browse/SPARK-33047 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32723) Upgrade to jQuery 3.5.1
[ https://issues.apache.org/jira/browse/SPARK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205607#comment-17205607 ] Apache Spark commented on SPARK-32723: -- User 'n-marion' has created a pull request for this issue: https://github.com/apache/spark/pull/29922 > Upgrade to jQuery 3.5.1 > --- > > Key: SPARK-32723 > URL: https://issues.apache.org/jira/browse/SPARK-32723 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ashish Kumar Singh >Assignee: Peter Toth >Priority: Major > Labels: Security > Fix For: 3.1.0 > > > Spark 3.0 and Spark 2.4.x use a jQuery version < 3.5, which has known security > vulnerabilities in the Spark Master UI and Spark Worker UI. > Can we please upgrade jQuery to 3.5 or above? > [https://www.tenable.com/plugins/nessus/136929] > ??According to the self-reported version in the script, the version of JQuery > hosted on the remote web server is greater than or equal to 1.2 and prior to > 3.5.0. It is, therefore, affected by multiple cross site scripting > vulnerabilities.?? > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205606#comment-17205606 ] Yuming Wang commented on SPARK-30186: - For our internal TPC-DS q77, enabling AQE and DPP together does not work properly: {code:sql} WITH ss AS ( SELECT s_store_sk, Sum(ss_ext_sales_price) AS sales, Sum(ss_net_profit) AS profit FROM store_sales, date_dim, store WHERE ss_sold_date_sk = d_date_sk AND d_date BETWEEN Cast('2000-08-23' AS DATE) AND ( Cast('2000-08-23' AS DATE) + interval '30' day) AND ss_store_sk = s_store_sk GROUP BY s_store_sk), sr AS ( SELECT s_store_sk, sum(sr_return_amt) AS returns, sum(sr_net_loss) AS profit_loss FROM store_returns, date_dim, store WHERE sr_returned_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) AND sr_store_sk = s_store_sk GROUP BY s_store_sk), cs AS ( SELECT cs_call_center_sk, sum(cs_ext_sales_price) AS sales, sum(cs_net_profit) AS profit FROM catalog_sales, date_dim WHERE cs_sold_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) GROUP BY cs_call_center_sk), cr AS ( SELECT cr_call_center_sk, sum(cr_return_amount) AS returns, sum(cr_net_loss) AS profit_loss FROM catalog_returns, date_dim WHERE cr_returned_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) GROUP BY cr_call_center_sk), ws AS ( SELECT wp_web_page_sk, sum(ws_ext_sales_price) AS sales, sum(ws_net_profit) AS profit FROM web_sales, date_dim, web_page WHERE ws_sold_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) AND ws_web_page_sk = wp_web_page_sk GROUP BY wp_web_page_sk), wr AS ( SELECT wp_web_page_sk, sum(wr_return_amt) AS returns, sum(wr_net_loss) AS profit_loss FROM web_returns, date_dim, web_page WHERE wr_returned_date_sk = d_date_sk AND d_date BETWEEN cast('2000-08-23' AS date) AND ( cast('2000-08-23' AS date) + interval '30' day) AND wr_web_page_sk = wp_web_page_sk GROUP BY wp_web_page_sk) SELECT channel, id, sum(sales) AS sales, sum(returns) AS returns, sum(profit) AS profit FROM ( SELECT 'store channel' AS channel, ss.s_store_sk AS id, sales, COALESCE(returns, 0) AS returns, (profit - COALESCE(profit_loss,0)) AS profit FROM ss LEFT JOIN sr ON ss.s_store_sk = sr.s_store_sk UNION ALL SELECT 'catalog channel' AS channel, cs_call_center_sk AS id, sales, returns, (profit - profit_loss) AS profit FROM cs CROSS JOIN cr UNION ALL SELECT 'web channel' AS channel, ws.wp_web_page_sk AS id, sales, COALESCE(returns, 0) returns, (profit - COALESCE(profit_loss,0)) AS profit FROM ws LEFT JOIN wr ON ws.wp_web_page_sk = wr.wp_web_page_sk ) x GROUP BY rollup(channel, id) ORDER BY channel, id limit 100 {code} > support Dynamic Partition Pruning in Adaptive Execution > --- > > Key: SPARK-30186 > URL: https://issues.apache.org/jira/browse/SPARK-30186 > Project: Spark > Issue Type: Improvement > Components: SQL
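For reproduction, both features are toggled by configuration; a minimal sketch (config keys as of Spark 3.0, set before running the query above):
{code:scala}
// Enable adaptive query execution and dynamic partition pruning together.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
{code}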
[jira] [Commented] (SPARK-27318) Join operation on bucketing table fails with base adaptive enabled
[ https://issues.apache.org/jira/browse/SPARK-27318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205558#comment-17205558 ] Yuming Wang commented on SPARK-27318: - Could you try Spark 3.x? > Join operation on bucketing table fails with base adaptive enabled > -- > > Key: SPARK-27318 > URL: https://issues.apache.org/jira/browse/SPARK-27318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Supritha >Priority: Major > > Join operation on a bucketed table is failing. > Steps to reproduce the issue. > {code} > spark.sql("set spark.sql.adaptive.enabled=true") > {code} > 1. Create tables bucket3 and bucket4 as below and load the data. > {code} > sql("create table bucket3(id3 int,country3 String, sports3 String) row format > delimited fields terminated by ','").show() > sql("create table bucket4(id4 int,country4 String) row format delimited > fields terminated by ','").show() > sql("load data local inpath '/opt/abhidata/bucket2.txt' into table > bucket3").show() > sql("load data local inpath '/opt/abhidata/bucket3.txt' into table > bucket4").show() > {code} > 2. Create bucketed tables as below > {code} > spark.sqlContext.table("bucket3").write.bucketBy(3, > "id3").saveAsTable("bucketed_table_3"); > spark.sqlContext.table("bucket4").write.bucketBy(4, > "id4").saveAsTable("bucketed_table_4"); > {code} > 3. Execute the join query on the bucketed tables > {code} > sql("select * from bucketed_table_3 join bucketed_table_4 on > bucketed_table_3.id3 = bucketed_table_4.id4").show() > {code} > > {code:java} > java.lang.IllegalArgumentException: requirement failed: > PartitioningCollection requires all of its partitionings have the same > numPartitions. at scala.Predef$.require(Predef.scala:224) at > org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.<init>(partitioning.scala:291) > at > org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputPartitioning(SortMergeJoinExec.scala:69) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:150) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering$1.apply(EnsureRequirements.scala:149) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:392) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.immutable.List.map(List.scala:296) at > org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering(EnsureRequirements.scala:149) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:304) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:296) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:282) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at >
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:281) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:296) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:38) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:87) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) at > org.apache.spark.sql.execution.QueryExec
[jira] [Commented] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205528#comment-17205528 ] Apache Spark commented on SPARK-33046: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/29921 > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205526#comment-17205526 ] Apache Spark commented on SPARK-33046: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/29921 > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33046: Assignee: Apache Spark (was: Kousuke Saruta) > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
[ https://issues.apache.org/jira/browse/SPARK-33046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33046: Assignee: Kousuke Saruta (was: Apache Spark) > How to build for Scala 2.13 with sbt in the doc is wrong. > - > > Key: SPARK-33046 > URL: https://issues.apache.org/jira/browse/SPARK-33046 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.0.1, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current doc, how to build Spark for Scala 2.13 with sbt is described > like: > {code} > ./build/sbt -Dscala.version=2.13.0 > {code} > But build fails with this command because scala-2.13 profile is not enabled > and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33024) Fix CodeGen fallback issue of UDFSuite in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-33024: Assignee: Yang Jie > Fix CodeGen fallback issue of UDFSuite in Scala 2.13 > > > Key: SPARK-33024 > URL: https://issues.apache.org/jira/browse/SPARK-33024 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > After SPARK-32851 set `CODEGEN_FACTORY_MODE` to `CODEGEN_ONLY` in the > sparkConf of SharedSparkSessionBase used to construct the SparkSession in tests, > the test `SPARK-32459: UDF should not fail on WrappedArray` in > s.sql.UDFSuite exposed a codegen fallback issue in Scala 2.13 as follows: > {code:java} > - SPARK-32459: UDF should not fail on WrappedArray *** FAILED *** > Caused by: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 47, Column 99: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 47, Column 99: No applicable constructor/method found for zero actual > parameters; candidates are: "public scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(java.lang.Object)", "public > scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(scala.reflect.ClassTag)", > "public abstract scala.collection.mutable.Builder > scala.collection.EvidenceIterableFactory.newBuilder(java.lang.Object)" > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33024) Fix CodeGen fallback issue of UDFSuite in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-33024. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29903 [https://github.com/apache/spark/pull/29903] > Fix CodeGen fallback issue of UDFSuite in Scala 2.13 > > > Key: SPARK-33024 > URL: https://issues.apache.org/jira/browse/SPARK-33024 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.1.0 > > > After SPARK-32851 set `CODEGEN_FACTORY_MODE` to `CODEGEN_ONLY` in the > sparkConf of SharedSparkSessionBase used to construct the SparkSession in tests, > the test `SPARK-32459: UDF should not fail on WrappedArray` in > s.sql.UDFSuite exposed a codegen fallback issue in Scala 2.13 as follows: > {code:java} > - SPARK-32459: UDF should not fail on WrappedArray *** FAILED *** > Caused by: org.codehaus.commons.compiler.CompileException: File > 'generated.java', Line 47, Column 99: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 47, Column 99: No applicable constructor/method found for zero actual > parameters; candidates are: "public scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(java.lang.Object)", "public > scala.collection.mutable.Builder > scala.collection.mutable.ArraySeq$.newBuilder(scala.reflect.ClassTag)", > "public abstract scala.collection.mutable.Builder > scala.collection.EvidenceIterableFactory.newBuilder(java.lang.Object)" > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
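A hedged sketch of the kind of UDF the referenced test exercises (not the suite's exact code; assumes an active SparkSession named {{spark}}):
{code:scala}
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// An array column reaches the UDF as WrappedArray on Scala 2.12 and as
// mutable.ArraySeq on Scala 2.13; the generated code must pick a
// builder that actually exists for the collection in use.
val sumUdf = udf((xs: Seq[Int]) => xs.sum)
val df = Seq(Seq(1, 2, 3)).toDF("xs").select(sumUdf(col("xs")))
df.show() // expected: 6
{code}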
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Description: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement built-in LIKE ANY and LIKE ALL UDF. {noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} was: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement LIKE ANY and LIKE ALL built-in UDF. 
{noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} > Implement built-in LIKE ANY and LIKE ALL UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement built-in LIKE ANY and LIKE ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.
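For reference, a hedged sketch of the target syntax (table and column names are illustrative; the final surface may differ): {{LIKE ANY}} is true if at least one pattern matches, {{LIKE ALL}} only if every pattern matches, without building the deeply nested expression tree that overflows the stack here.
{code:scala}
// Illustrative only; assumes a table `logs` with a string column `msg`.
spark.sql("""
  SELECT * FROM logs
  WHERE msg LIKE ANY ('%error%', '%fatal%')
    AND msg LIKE ALL ('%2020%', '%spark%')
""")
{code}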
[jira] [Updated] (SPARK-33045) Implement built-in LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Summary: Implement built-in LIKE ANY and LIKE ALL UDF (was: Implement LIKE ANY and LIKE ALL built-in UDF) > Implement built-in LIKE ANY and LIKE ALL UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements (more than 14378 elements). > We should implement LIKE ANY and LIKE ALL built-in UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33046) How to build for Scala 2.13 with sbt in the doc is wrong.
Kousuke Saruta created SPARK-33046: -- Summary: How to build for Scala 2.13 with sbt in the doc is wrong. Key: SPARK-33046 URL: https://issues.apache.org/jira/browse/SPARK-33046 Project: Spark Issue Type: Sub-task Components: docs Affects Versions: 3.0.1, 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the current doc, how to build Spark for Scala 2.13 with sbt is described like: {code} ./build/sbt -Dscala.version=2.13.0 {code} But build fails with this command because scala-2.13 profile is not enabled and scala-parallel-collections is absent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY and LIKE ALL built-in UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Description: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement LIKE ANY and LIKE ALL built-in UDF. {noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} was: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14378 elements). We should implement LIKE ANY/SOME/ALL UDF. 
{noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} > Implement LIKE ANY and LIKE ALL built-in UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement LIKE ANY and LIKE ALL built-in UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.
[jira] [Updated] (SPARK-33045) Implement LIKE ANY and LIKE ALL built-in UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Summary: Implement LIKE ANY and LIKE ALL built-in UDF (was: Implement LIKE ANY and LIKE ALL UDF) > Implement LIKE ANY and LIKE ALL built-in UDF > > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement LIKE ANY/SOME/ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY and LIKE ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Summary: Implement LIKE ANY and LIKE ALL UDF (was: Implement LIKE ANY/SOME/ALL UDF) > Implement LIKE ANY and LIKE ALL UDF > --- > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14378 elements). > We should implement LIKE ANY/SOME/ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) > at scala.collection.immutable.List.foreach(List.scala:392) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-32965) pyspark reading csv files with utf_16le encoding
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah reopened SPARK-32965: The linked duplicate issue won't be fixed because that issue was mixed up with a multiline feature issue. However, my ticket deals exclusively with utf-16le and utf-16be encodings not being handled correctly via PySpark. Therefore this issue is still open and unresolved. > pyspark reading csv files with utf_16le encoding > > > Key: SPARK-32965 > URL: https://issues.apache.org/jira/browse/SPARK-32965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.0.1 >Reporter: Punit Shah >Priority: Major > Attachments: 16le.csv, 32965.png > > > If you have a file encoded in utf_16le or utf_16be and try to use > spark.read.csv("", encoding="utf_16le"), the dataframe isn't > rendered properly. > If you use Python decoding like: > prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : > x.decode("utf_16le").splitlines()) > and then do spark.read.csv(prdd), it works. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
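A Scala rendering of the reporter's workaround (a hedged sketch; the path is illustrative): decode the UTF-16LE bytes explicitly, then hand the decoded lines to the CSV reader.
{code:scala}
import java.nio.charset.StandardCharsets
import spark.implicits._ // assumes an active SparkSession named `spark`

// Decode the bytes ourselves, then parse the decoded lines as CSV.
val lines = spark.sparkContext
  .binaryFiles("/path/to/16le.csv") // illustrative path
  .values
  .flatMap(p => new String(p.toArray, StandardCharsets.UTF_16LE).split("\r?\n").toSeq)
val df = spark.read.csv(lines.toDS())
{code}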
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33045: Description: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements(more than 14738 elements). We should implement LIKE ANY/SOME/ALL UDF. {noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} was: We already support ANY / SOME / ALL syntax, but it will throw {{StackOverflowError}} if there are many elements. We should implement LIKE ANY/SOME/ALL UDF. 
{noformat} java.lang.StackOverflowError at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175) at scala.collection.immutable.List.foreach(List.scala:392) {noformat} > Implement LIKE ANY/SOME/ALL UDF > --- > > Key: SPARK-33045 > URL: https://issues.apache.org/jira/browse/SPARK-33045 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > We already support ANY / SOME / ALL syntax, but it will throw > {{StackOverflowError}} if there are many elements(more than 14738 elements). > We should implement LIKE ANY/SOME/ALL UDF. > {noformat} > java.lang.StackOverflowError > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184) > at > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47) > at > scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549) >
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: 
We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements (more than 14378). We should implement a LIKE ANY/SOME/ALL UDF.
{noformat}
java.lang.StackOverflowError
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was:
We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements (more than 14738). We should implement a LIKE ANY/SOME/ALL UDF.
{noformat}
java.lang.StackOverflowError
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements (more than 14378). We
> should implement a LIKE ANY/SOME/ALL UDF.
> {noformat}
> java.lang.StackOverflowError
>     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>     at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
>     at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: 
We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement a LIKE ANY/SOME/ALL UDF.
{noformat}
java.lang.StackOverflowError
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
    at scala.collection.immutable.List.foreach(List.scala:392)
{noformat}

  was: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement a LIKE ANY/SOME/ALL UDF.

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements. We should implement a LIKE
> ANY/SOME/ALL UDF.
> {noformat}
> java.lang.StackOverflowError
>     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
>     at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
>     at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:53)
>     at org.apache.spark.sql.catalyst.expressions.BinaryExpression.children(Expression.scala:549)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:175)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:175)
>     at scala.collection.immutable.List.foreach(List.scala:392)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
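[Editor's note] For readers trying to reproduce the overflow described in the updates above, a minimal sketch follows. It is not code from the ticket: it assumes Spark 3.1+ (where the LIKE ANY syntax is available) running in local mode, and the table name, column name, and pattern count are illustrative.

{code:scala}
// Hypothetical repro sketch: a LIKE ANY predicate with a very large pattern list.
// Assumptions: Spark 3.1+ in local mode; names and the 20000 count are illustrative.
import org.apache.spark.sql.SparkSession

object LikeAnyOverflow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("like-any-overflow")
      .getOrCreate()

    // A tiny string-typed table to run the predicate against.
    spark.range(10).selectExpr("CAST(id AS STRING) AS col1").createOrReplaceTempView("t")

    // Each pattern adds another node to the parsed expression chain, so a
    // large enough list overflows the recursive traversal of the plan tree.
    val patterns = (1 to 20000).map(i => s"'%p$i%'").mkString(", ")
    spark.sql(s"SELECT * FROM t WHERE col1 LIKE ANY ($patterns)").show()

    spark.stop()
  }
}
{code}

With a handful of patterns the same query runs fine; the failure mode only appears around the element count mentioned in the ticket.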
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement a LIKE ANY/SOME/ALL UDF.  (was: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement )

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements. We should implement a LIKE
> ANY/SOME/ALL UDF.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
[ https://issues.apache.org/jira/browse/SPARK-33045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33045:
    Description: We already support the ANY / SOME / ALL syntax, but it will throw a {{StackOverflowError}} if there are many elements. We should implement

> Implement LIKE ANY/SOME/ALL UDF
> -------------------------------
>
>                 Key: SPARK-33045
>                 URL: https://issues.apache.org/jira/browse/SPARK-33045
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> We already support the ANY / SOME / ALL syntax, but it will throw a
> {{StackOverflowError}} if there are many elements. We should implement

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33045) Implement LIKE ANY/SOME/ALL UDF
Yuming Wang created SPARK-33045:
-----------------------------------

             Summary: Implement LIKE ANY/SOME/ALL UDF
                 Key: SPARK-33045
                 URL: https://issues.apache.org/jira/browse/SPARK-33045
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Yuming Wang

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
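[Editor's note] As a toy illustration of why the traversal overflows, assume (this is an assumption about the rewrite, not confirmed by the ticket) that a LIKE ANY over N patterns is desugared into a left-deep chain of OR expressions. Expr, Like, and Or below are hypothetical stand-ins for Catalyst's expression classes, and the snippet is runnable as-is in a Scala REPL:

{code:scala}
// Toy model of the assumed desugaring: N patterns become a left-deep Or chain of depth ~N.
sealed trait Expr
case class Like(col: String, pattern: String) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

def expandLikeAny(col: String, patterns: Seq[String]): Expr =
  patterns.map(p => Like(col, p): Expr).reduceLeft[Expr]((l, r) => Or(l, r))

// Depth grows linearly with the pattern count, so a non-tail-recursive
// visitor (like the TreeNode.foreach in the stack trace) eventually
// runs out of stack on a large enough pattern list.
def depth(e: Expr): Int = e match {
  case Or(l, r) => 1 + math.max(depth(l), depth(r))
  case _        => 1
}

val tree = expandLikeAny("col1", (1 to 1000).map(i => s"%p$i%"))
println(depth(tree)) // prints 1000
{code}

Rewriting the predicate as a single function over the whole pattern list (the UDF proposed in this ticket) keeps the tree depth constant regardless of how many patterns there are.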
[jira] [Resolved] (SPARK-33025) Empty file for the first partition
[ https://issues.apache.org/jira/browse/SPARK-33025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-33025.
--------------------------------------
    Resolution: Won't Fix

> Empty file for the first partition
> ----------------------------------
>
>                 Key: SPARK-33025
>                 URL: https://issues.apache.org/jira/browse/SPARK-33025
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1
>            Reporter: Evgenii Samusenko
>            Priority: Minor
>
> If I create a DataFrame with 1 row, Spark will create an empty file for the
> first partition.
>
> Example:
> val df = Seq(1).toDF("col1").repartition(8)
> df.write.csv("/csv")
>
> I got 2 files: the first contains the (empty) first partition and the second
> contains the single row from another partition. The same also applies to
> parquet, text, etc.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33025) Empty file for the first partition
[ https://issues.apache.org/jira/browse/SPARK-33025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205328#comment-17205328 ]

Takeshi Yamamuro commented on SPARK-33025:
------------------------------------------

This is not a bug but expected behaviour; please see [https://github.com/apache/spark/pull/18654#issuecomment-315928986] for more details. Yeah, we might be able to remove it for csv/json/text, but that is a minor fix, so I personally don't think we need to do so.

> Empty file for the first partition
> ----------------------------------
>
>                 Key: SPARK-33025
>                 URL: https://issues.apache.org/jira/browse/SPARK-33025
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.0.1
>            Reporter: Evgenii Samusenko
>            Priority: Minor
>
> If I create a DataFrame with 1 row, Spark will create an empty file for the
> first partition.
>
> Example:
> val df = Seq(1).toDF("col1").repartition(8)
> df.write.csv("/csv")
>
> I got 2 files: the first contains the (empty) first partition and the second
> contains the single row from another partition. The same also applies to
> parquet, text, etc.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
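[Editor's note] For anyone who wants to see the reported behaviour and sidestep it, a small sketch follows. It is an illustration, not code from the ticket: the output paths and local master are made up, and the "first partition always gets a file" behaviour is the one discussed in the PR linked in the comment above.

{code:scala}
// Sketch of the reported behaviour and a simple workaround.
// Assumptions: local mode; the /tmp output paths are illustrative.
import org.apache.spark.sql.SparkSession

object EmptyFirstPartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-first-partition")
      .getOrCreate()
    import spark.implicits._

    // 1 row spread over 8 partitions: 7 are empty, 1 holds the row. Spark still
    // writes a part file for the first partition, so two files appear: one of
    // them empty.
    Seq(1).toDF("col1").repartition(8).write.csv("/tmp/csv-out")

    // Workaround: collapse to a single partition before writing, so the only
    // part file produced is the non-empty one.
    Seq(1).toDF("col1").coalesce(1).write.csv("/tmp/csv-single")

    spark.stop()
  }
}
{code}

Since the ticket was closed as Won't Fix, coalescing (or repartitioning to match the actual row count) on the caller's side is the practical way to avoid the empty file.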