[jira] [Updated] (SPARK-45306) Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans
[ https://issues.apache.org/jira/browse/SPARK-45306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45306: --- Labels: pull-request-available (was: ) > Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans > - > > Key: SPARK-45306 > URL: https://issues.apache.org/jira/browse/SPARK-45306 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0, 3.5.1 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > After SPARK-42768, the default value of > `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` has changed from > false to true, so we should use AQE-aware utils to collect plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45306) Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans
[ https://issues.apache.org/jira/browse/SPARK-45306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45306: - Affects Version/s: 3.5.1 > Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans > - > > Key: SPARK-45306 > URL: https://issues.apache.org/jira/browse/SPARK-45306 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0, 3.5.1 >Reporter: Yang Jie >Priority: Major > > After SPARK-42768, the default value of > `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` has changed from > false to true, so we should use AQE-aware utils to collect plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45306) Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans
Yang Jie created SPARK-45306: Summary: Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans Key: SPARK-45306 URL: https://issues.apache.org/jira/browse/SPARK-45306 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Yang Jie After SPARK-42768, the default value of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` has changed from false to true, so we should use AQE-aware utils to collect plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
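For context, a minimal sketch (not the benchmark's actual code) of what "AQE-aware" collection means here: with adaptive execution on, the executed plan is wrapped in AdaptiveSparkPlanExec, so a plain TreeNode.collect on the top-level plan can miss the cached scan, while the collect provided by AdaptiveSparkPlanHelper descends into the adaptive subtree.
{code}
// Sketch only: collect InMemoryTableScanExec nodes in an AQE-wrapped plan.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec

object CollectCachedScans extends AdaptiveSparkPlanHelper {
  def inMemoryScans(df: DataFrame): Seq[InMemoryTableScanExec] =
    // AdaptiveSparkPlanHelper.collect recurses through AdaptiveSparkPlanExec
    // nodes, which the plain TreeNode.collect does not.
    collect(df.queryExecution.executedPlan) {
      case scan: InMemoryTableScanExec => scan
    }
}
{code}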
[jira] [Updated] (SPARK-45305) Remove JDK 8 workaround added in SPARK-32999
[ https://issues.apache.org/jira/browse/SPARK-45305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45305: --- Labels: pull-request-available (was: ) > Remove JDK 8 workaround added in SPARK-32999 > - > > Key: SPARK-45305 > URL: https://issues.apache.org/jira/browse/SPARK-45305 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > SPARK-32999 added a test that is only for JDK 8. We should remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45305) Remove JDK 8 workaround added in SPARK-32999
Hyukjin Kwon created SPARK-45305: Summary: Remove JDK 8 workaround added in SPARK-32999 Key: SPARK-45305 URL: https://issues.apache.org/jira/browse/SPARK-45305 Project: Spark Issue Type: Test Components: Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-32999 added a test that is only for JDK 8. We should remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45304) Remove test classloader workaround for SBT build
[ https://issues.apache.org/jira/browse/SPARK-45304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45304: --- Labels: pull-request-available (was: ) > Remove test classloader workaround for SBT build > > > Key: SPARK-45304 > URL: https://issues.apache.org/jira/browse/SPARK-45304 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > Revert https://github.com/apache/spark/pull/30198. We don't need it anymore > since we dropped JDK 8 and 11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37508) Add CONTAINS() function
[ https://issues.apache.org/jira/browse/SPARK-37508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-37508: --- Labels: pull-request-available (was: ) > Add CONTAINS() function > --- > > Key: SPARK-37508 > URL: https://issues.apache.org/jira/browse/SPARK-37508 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 3.3.0 > > > {{contains()}} is a common convenience function supported by a number of > database systems: > # > [https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#contains_substr] > # [CONTAINS — Snowflake > Documentation|https://docs.snowflake.com/en/sql-reference/functions/contains.html] > Proposed syntax: > {code:java} > contains(haystack, needle) > return type: boolean {code} > It is semantically equivalent to {{haystack like '%needle%'}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
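A quick, hedged illustration of the proposed syntax (the literal values are arbitrary; the function shipped with the Fix Version above):
{code}
// contains(haystack, needle) returns a boolean.
spark.sql("SELECT contains('Spark SQL', 'SQL')").show()    // true
spark.sql("SELECT contains('Spark SQL', 'Flink')").show()  // false
// Semantically equivalent to: SELECT 'Spark SQL' LIKE '%SQL%'
{code}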
[jira] [Created] (SPARK-45304) Remove test classloader workaround for SBT build
Hyukjin Kwon created SPARK-45304: Summary: Remove test classloader workaround for SBT build Key: SPARK-45304 URL: https://issues.apache.org/jira/browse/SPARK-45304 Project: Spark Issue Type: Test Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Revert https://github.com/apache/spark/pull/30198. We don't need it anymore since we dropped JDK 8 and 11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45303) Remove JDK 8/11 workaround in KryoSerializerBenchmark
[ https://issues.apache.org/jira/browse/SPARK-45303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45303: --- Labels: pull-request-available (was: ) > Remove JDK 8/11 workaround in KryoSerializerBenchmark > - > > Key: SPARK-45303 > URL: https://issues.apache.org/jira/browse/SPARK-45303 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > https://github.com/apache/spark/pull/25966 added a few extra flags for JDK > 8/11 consistency. We don't need them anymore because we dropped JDK 8/11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45303) Remove JDK 8/11 workaround in KryoSerializerBenchmark
Hyukjin Kwon created SPARK-45303: Summary: Remove JDK 8/11 workaround in KryoSerializerBenchmark Key: SPARK-45303 URL: https://issues.apache.org/jira/browse/SPARK-45303 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/25966 added a few extra flags for JDK 8/11 consistency. We don't need them anymore because we dropped JDK 8/11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45302) Remove PID communication between Python workers when no daemon is used
[ https://issues.apache.org/jira/browse/SPARK-45302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45302: --- Labels: pull-request-available (was: ) > Remove PID communication between Python workers when no daemon is used > - > > Key: SPARK-45302 > URL: https://issues.apache.org/jira/browse/SPARK-45302 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We don't need to send the PID around when JDK 9+ is used because we can get > the PID directly from the API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45302) Remove PID communication between Python workers when no daemon is used
Hyukjin Kwon created SPARK-45302: Summary: Remove PID communication between Python workers when no daemon is used Key: SPARK-45302 URL: https://issues.apache.org/jira/browse/SPARK-45302 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We don't need to send the PID around when JDK 9+ is used because we can get the PID directly from the API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
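For reference, a minimal sketch of the JDK 9+ API the description alludes to (the worker command line is a placeholder): the JVM can read a spawned worker's PID from java.lang.Process directly, instead of the Python side writing os.getpid() back over the socket.
{code}
// Sketch only: Process.pid() is available since JDK 9.
val builder = new ProcessBuilder("python3", "-c", "import time; time.sleep(60)")
val worker: Process = builder.start()
val workerPid: Long = worker.pid() // no PID handshake needed on JDK 9+
println(s"worker pid: $workerPid")
worker.destroy()
{code}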
[jira] [Updated] (SPARK-28932) Maven install fails on JDK11
[ https://issues.apache.org/jira/browse/SPARK-28932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-28932: --- Labels: pull-request-available (was: ) > Maven install fails on JDK11 > > > Key: SPARK-28932 > URL: https://issues.apache.org/jira/browse/SPARK-28932 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Labels: pull-request-available > Fix For: 3.0.0 > > > {code} > mvn clean install -pl common/network-common -DskipTests > error: fatal error: object scala in compiler mirror not found. > one error found > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45301) Remove org.scala-lang scala-library added for JDK 11 workaround
Hyukjin Kwon created SPARK-45301: Summary: Remove org.scala-lang scala-library added for JDK 11 workaround Key: SPARK-45301 URL: https://issues.apache.org/jira/browse/SPARK-45301 Project: Spark Issue Type: Test Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/25800 added
{code}
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
</dependency>
{code}
Now with JDK 17 it works without them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44539) Upgrade RoaringBitmap to 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-44539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44539: --- Labels: pull-request-available (was: ) > Upgrade RoaringBitmap to 1.0.0 > --- > > Key: SPARK-44539 > URL: https://issues.apache.org/jira/browse/SPARK-44539 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45300) Remove JDK 8 workaround in TimestampFormatterSuite
[ https://issues.apache.org/jira/browse/SPARK-45300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45300: --- Labels: pull-request-available (was: ) > Remove JDK 8 workaround in TimestampFormatterSuite > -- > > Key: SPARK-45300 > URL: https://issues.apache.org/jira/browse/SPARK-45300 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45300) Remove JDK 8 workaround in TimestampFormatterSuite
Hyukjin Kwon created SPARK-45300: Summary: Remove JDK 8 workaround in TimestampFormatterSuite Key: SPARK-45300 URL: https://issues.apache.org/jira/browse/SPARK-45300 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45300) Remove JDK 8 workaround in TimestampFormatterSuite
[ https://issues.apache.org/jira/browse/SPARK-45300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45300: - Priority: Minor (was: Major) > Remove JDK 8 workaround in TimestampFormatterSuite > -- > > Key: SPARK-45300 > URL: https://issues.apache.org/jira/browse/SPARK-45300 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45299) Remove JDK 8 workaround in UtilsSuite
[ https://issues.apache.org/jira/browse/SPARK-45299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45299: --- Labels: pull-request-available (was: ) > Remove JDK 8 workaround in UtilsSuite > - > > Key: SPARK-45299 > URL: https://issues.apache.org/jira/browse/SPARK-45299 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45299) Remove JDK 8 workaround in UtilsSuite
[ https://issues.apache.org/jira/browse/SPARK-45299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45299: - Description: (was: In "Kill process", we don't need the JDK 8 workaround anymore) > Remove JDK 8 workaround in UtilsSuite > - > > Key: SPARK-45299 > URL: https://issues.apache.org/jira/browse/SPARK-45299 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45299) Remove JDK 8 workaround in UtilsSuite
Hyukjin Kwon created SPARK-45299: Summary: Remove JDK 8 workaround in UtilsSuite Key: SPARK-45299 URL: https://issues.apache.org/jira/browse/SPARK-45299 Project: Spark Issue Type: Test Components: Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon In "Kill process", we don't need the JDK 8 workaround anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45298) Remove the workaround for JDK-8228469 in SPARK-31959 test
[ https://issues.apache.org/jira/browse/SPARK-45298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45298: --- Labels: pull-request-available (was: ) > Remove the workaround for JDK-8228469 in SPARK-31959 test > - > > Key: SPARK-45298 > URL: https://issues.apache.org/jira/browse/SPARK-45298 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > In https://issues.apache.org/jira/browse/SPARK-31959, we added a > workaround for an outdated timezone in the tests. We can now remove it because > we dropped JDK 11 in SPARK-44112. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45298) Remove the workaround for JDK-8228469 in SPARK-31959 test
Hyukjin Kwon created SPARK-45298: Summary: Remove the workaround for JDK-8228469 in SPARK-31959 test Key: SPARK-45298 URL: https://issues.apache.org/jira/browse/SPARK-45298 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon In https://issues.apache.org/jira/browse/SPARK-31959, we added a workaround for an outdated timezone in the tests. We can now remove it because we dropped JDK 11 in SPARK-44112. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45298) Remove the workaround for JDK-8228469 in SPARK-31959 test
[ https://issues.apache.org/jira/browse/SPARK-45298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45298: - Issue Type: Test (was: Improvement) > Remove the workaround for JDK-8228469 in SPARK-31959 test > - > > Key: SPARK-45298 > URL: https://issues.apache.org/jira/browse/SPARK-45298 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > In https://issues.apache.org/jira/browse/SPARK-31959, we added a > workaround for an outdated timezone in the tests. We can now remove it because > we dropped JDK 11 in SPARK-44112. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45297) Remove workaround for DateFormatter added in SPARK-31827
[ https://issues.apache.org/jira/browse/SPARK-45297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45297: --- Labels: pull-request-available (was: ) > Remove workaround for DateFormatter added in SPARK-31827 > > > Key: SPARK-45297 > URL: https://issues.apache.org/jira/browse/SPARK-45297 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We dropped JDK 8 in SPARK-44112, and we don't need the workaround for > SPARK-31827 anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45297) Remove workaround for DateFormatter added in SPARK-31827
Hyukjin Kwon created SPARK-45297: Summary: Remove workaround for DateFormatter added in SPARK-31827 Key: SPARK-45297 URL: https://issues.apache.org/jira/browse/SPARK-45297 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We dropped JDK 8 in SPARK-44112, and we don't need the workaround for SPARK-31827 anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45296) Comment out unused JDK 11 related code in dev/run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-45296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45296: --- Labels: pull-request-available (was: ) > Comment out unused JDK 11 related code in dev/run-tests.py > - > > Key: SPARK-45296 > URL: https://issues.apache.org/jira/browse/SPARK-45296 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > >
> {code}
> # set up java11 env if this is a pull request build with 'test-java11' in the title
> if "ghprbPullTitle" in os.environ:
>     if "test-java11" in os.environ["ghprbPullTitle"].lower():
>         os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
>         os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
>         test_profiles += ["-Djava.version=11"]
> {code}
> We don't need this anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45296) Comment out unused JDK 11 related code in dev/run-tests.py
Hyukjin Kwon created SPARK-45296: Summary: Comment out unused JDK 11 related code in dev/run-tests.py Key: SPARK-45296 URL: https://issues.apache.org/jira/browse/SPARK-45296 Project: Spark Issue Type: Improvement Components: Build, Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
{code}
# set up java11 env if this is a pull request build with 'test-java11' in the title
if "ghprbPullTitle" in os.environ:
    if "test-java11" in os.environ["ghprbPullTitle"].lower():
        os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
        os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
        test_profiles += ["-Djava.version=11"]
{code}
We don't need this anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45295) Remove Utils.isMemberClass workaround for JDK 8
[ https://issues.apache.org/jira/browse/SPARK-45295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45295: --- Labels: pull-request-available (was: ) > Remove Utils.isMemberClass workaround for JDK 8 > --- > > Key: SPARK-45295 > URL: https://issues.apache.org/jira/browse/SPARK-45295 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We dropped JDK 8 and 11 in SPARK-44112. We don't need the workaround anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45295) Remove Utils.isMemberClass workaround for JDK 8
Hyukjin Kwon created SPARK-45295: Summary: Remove Utils.isMemberClass workaround for JDK 8 Key: SPARK-45295 URL: https://issues.apache.org/jira/browse/SPARK-45295 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We dropped JDK 8 and 11 in SPARK-44112. We don't need the workaround anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
[ https://issues.apache.org/jira/browse/SPARK-45294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45294. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43077 [https://github.com/apache/spark/pull/43077] > Use JDK 17 in Binder integration for PySpark live notebooks > --- > > Key: SPARK-45294 > URL: https://issues.apache.org/jira/browse/SPARK-45294 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
[ https://issues.apache.org/jira/browse/SPARK-45294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45294: Assignee: Hyukjin Kwon > Use JDK 17 in Binder integration for PySpark live notebooks > --- > > Key: SPARK-45294 > URL: https://issues.apache.org/jira/browse/SPARK-45294 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
[ https://issues.apache.org/jira/browse/SPARK-45294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45294: --- Labels: pull-request-available (was: ) > Use JDK 17 in Binder integration for PySpark live notebooks > --- > > Key: SPARK-45294 > URL: https://issues.apache.org/jira/browse/SPARK-45294 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44539) Upgrade RoaringBitmap to 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-44539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-44539: Summary: Upgrade RoaringBitmap to 1.0.0 (was: Upgrade RoaringBitmap to 0.9.46) > Upgrade RoaringBitmap to 1.0.0 > --- > > Key: SPARK-44539 > URL: https://issues.apache.org/jira/browse/SPARK-44539 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
Hyukjin Kwon created SPARK-45294: Summary: Use JDK 17 in Binder integration for PySpark live notebooks Key: SPARK-45294 URL: https://issues.apache.org/jira/browse/SPARK-45294 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45276) Replace Java 8 and Java 11 installed in the Dockerfile with Java 17
[ https://issues.apache.org/jira/browse/SPARK-45276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45276: --- Labels: pull-request-available (was: ) > Replace Java 8 and Java 11 installed in the Dockerfile with Java 17 > > > Key: SPARK-45276 > URL: https://issues.apache.org/jira/browse/SPARK-45276 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > Including dev/create-release/spark-rm/Dockerfile and > connector/docker/spark-test/base/Dockerfile. > There might be others as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45293) Install Java 17 for docker
[ https://issues.apache.org/jira/browse/SPARK-45293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan resolved SPARK-45293. - Resolution: Duplicate https://issues.apache.org/jira/browse/SPARK-45276 > Install Java 17 for docker > -- > > Key: SPARK-45293 > URL: https://issues.apache.org/jira/browse/SPARK-45293 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45293) Install Java 17 for docker
[ https://issues.apache.org/jira/browse/SPARK-45293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45293: --- Labels: pull-request-available (was: ) > Install Java 17 for docker > -- > > Key: SPARK-45293 > URL: https://issues.apache.org/jira/browse/SPARK-45293 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45293) Install Java 17 for docker
BingKun Pan created SPARK-45293: --- Summary: Install Java 17 for docker Key: SPARK-45293 URL: https://issues.apache.org/jira/browse/SPARK-45293 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45207) Implement Error Enrichment for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-45207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45207: Assignee: Yihong He > Implement Error Enrichment for Scala Client > --- > > Key: SPARK-45207 > URL: https://issues.apache.org/jira/browse/SPARK-45207 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45240) Implement Error Enrichment for Python Client
[ https://issues.apache.org/jira/browse/SPARK-45240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45240: Assignee: Yihong He > Implement Error Enrichment for Python Client > > > Key: SPARK-45240 > URL: https://issues.apache.org/jira/browse/SPARK-45240 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45207) Implement Error Enrichment for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-45207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45207. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42987 [https://github.com/apache/spark/pull/42987] > Implement Error Enrichment for Scala Client > --- > > Key: SPARK-45207 > URL: https://issues.apache.org/jira/browse/SPARK-45207 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45240) Implement Error Enrichment for Python Client
[ https://issues.apache.org/jira/browse/SPARK-45240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45240. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43034 [https://github.com/apache/spark/pull/43034] > Implement Error Enrichment for Python Client > > > Key: SPARK-45240 > URL: https://issues.apache.org/jira/browse/SPARK-45240 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43259: --- Labels: pull-request-available starter (was: starter) > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. That function > checks only the valuable error fields and avoids depending on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
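A hedged sketch of the checkError() pattern the description refers to (the error class, query, and parameters below are placeholders, not the actual trigger for _LEGACY_ERROR_TEMP_2024); it would live in a suite that extends QueryTest:
{code}
// Assert on stable error fields instead of the rendered message text.
checkError(
  exception = intercept[org.apache.spark.SparkRuntimeException] {
    sql("SELECT ...").collect()
  },
  errorClass = "PROPER_ERROR_CLASS_NAME",
  parameters = Map("someParam" -> "someValue"))
{code}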
[jira] [Updated] (SPARK-43900) Support optimizing skewed partitions even if it introduces an extra shuffle
[ https://issues.apache.org/jira/browse/SPARK-43900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43900: --- Labels: pull-request-available (was: ) > Support optimizing skewed partitions even if it introduces an extra shuffle > -- > > Key: SPARK-43900 > URL: https://issues.apache.org/jira/browse/SPARK-43900 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Zhen Wang >Priority: Major > Labels: pull-request-available > > Similar to [SPARK-33832|https://issues.apache.org/jira/browse/SPARK-33832], > OptimizeSkewInRebalancePartitions will not apply if skew mitigation causes a > new shuffle. > Test case (data skew in RebalancePartition): > {code:java} > *(2) HashAggregate(keys=[c1#226], functions=[count(1)], output=[c1#226, > count(1)#231L]) > +- *(2) HashAggregate(keys=[c1#226], functions=[partial_count(1)], > output=[c1#226, count#235L]) > +- AQEShuffleRead coalesced > +- ShuffleQueryStage 0 > +- Exchange hashpartitioning(c1#226, 5), > REBALANCE_PARTITIONS_BY_COL, [plan_id=106] > +- *(1) Project [key#221 AS c1#226] > +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#221] > +- Scan[obj#220] {code} > expect: > {code:java} > HashAggregate(keys=[c1#226], functions=[count(1)], output=[c1#226, > count(1)#231L]) > +- AQEShuffleRead coalesced > +- ShuffleQueryStage 1 > +- Exchange hashpartitioning(c1#226, 5), ENSURE_REQUIREMENTS, > [plan_id=140] > +- *(2) HashAggregate(keys=[c1#226], functions=[partial_count(1)], > output=[c1#226, count#235L]) > +- AQEShuffleRead coalesced and skewed > +- ShuffleQueryStage 0 > +- Exchange hashpartitioning(c1#226, 5), > REBALANCE_PARTITIONS_BY_COL, [plan_id=106] > +- *(1) Project [key#221 AS c1#226] > +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#221] > +- Scan[obj#220] {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
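A hedged reconstruction of a query shape that produces the first plan above (not the exact test code; the table and values are made up): a REBALANCE hint below an aggregate yields the REBALANCE_PARTITIONS_BY_COL exchange, and AQE reuses its hash partitioning for the parent HashAggregate, so splitting a skewed partition would only be possible with an extra shuffle.
{code}
// Sketch: skewed key 0 under a rebalance feeding an aggregate.
val df = spark.range(0, 1000000)
  .selectExpr("CASE WHEN id % 100 = 0 THEN 0 ELSE id END AS c1")
df.createOrReplaceTempView("t")
spark.sql(
  """SELECT c1, count(*)
    |FROM (SELECT /*+ REBALANCE(c1) */ c1 FROM t)
    |GROUP BY c1""".stripMargin).collect()
{code}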
[jira] [Resolved] (SPARK-45279) Attach plan_id for all logical plan
[ https://issues.apache.org/jira/browse/SPARK-45279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45279. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43055 [https://github.com/apache/spark/pull/43055] > Attach plan_id for all logical plan > --- > > Key: SPARK-45279 > URL: https://issues.apache.org/jira/browse/SPARK-45279 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42947: --- Labels: pull-request-available (was: ) > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > Labels: pull-request-available > > When the LDAP provider has domain configuration, such as Active Directory, > the principal should not be constructed according to the DN pattern, but the > username containing the domain should be directly passed to the LDAP provider > as the principal. We can refer to the implementation of Hive LdapUtils. > When the username contains a domain or a domain is passed via the > hive.server2.authentication.ldap.Domain configuration, if we construct the > principal according to the DN pattern (for example, > uid=user@domain,dc=test,dc=com), we will get the following error:
> {code:java}
> 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: Error validating the login
> at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
> at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
> Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user
> at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> ... 8 more
> Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580]
> at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352]
> at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352]
> at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352]
> at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352]
> at javax.naming.InitialContext.<init>(InitialContext.java:216) ~[?:1.8.0_352]
> at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) ~[?:1.8.0_352]
> at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at
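A hedged sketch of the proposed check (helper names are made up; Hive's LdapUtils is the model): expand the DN pattern only when the login name carries no domain, and otherwise pass the name to the LDAP provider as-is.
{code}
// Sketch only: choose the bind principal for LDAP authentication.
def hasDomain(user: String): Boolean =
  user.contains("@") || user.contains("\\")

// dnPattern is e.g. "uid=%s,dc=test,dc=com" as in
// hive.server2.authentication.ldap.userDNPattern.
def bindPrincipal(user: String, dnPattern: String): String =
  if (hasDomain(user)) user              // e.g. user@domain, passed through
  else dnPattern.replace("%s", user)     // plain name, expand the pattern
{code}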
[jira] [Assigned] (SPARK-45279) Attach plan_id for all logical plan
[ https://issues.apache.org/jira/browse/SPARK-45279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45279: - Assignee: Ruifeng Zheng > Attach plan_id for all logical plan > --- > > Key: SPARK-45279 > URL: https://issues.apache.org/jira/browse/SPARK-45279 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44077) Session Configs were not getting honored in RDDs
[ https://issues.apache.org/jira/browse/SPARK-44077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44077: --- Labels: pull-request-available (was: ) > Session Configs were not getting honored in RDDs > > > Key: SPARK-44077 > URL: https://issues.apache.org/jira/browse/SPARK-44077 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Kapil Singh >Priority: Major > Labels: pull-request-available > > When calling SQLConf.get on executors, the configs are read from the local > properties on the TaskContext. The local properties are populated driver-side > when scheduling the job, using the properties found in > sparkContext.localProperties. For RDD actions, local properties were not > getting populated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
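To make the mechanism concrete, a hedged sketch (the conf key is just an example): on executors, SQLConf.get rebuilds the session conf from the TaskContext local properties that the driver populates when scheduling the job, so if RDD actions skip that population, tasks see default values instead of session ones.
{code}
// Sketch: reading a session conf inside an RDD action.
import org.apache.spark.sql.internal.SQLConf

spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sparkContext.parallelize(Seq(1), 1).foreach { _ =>
  // Resolved from TaskContext local properties on the executor; without the
  // fix described above, an RDD action would print the default time zone.
  println(SQLConf.get.sessionLocalTimeZone)
}
{code}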
[jira] [Commented] (SPARK-31177) DataFrameReader.csv incorrectly reads gzip encoded CSV from S3 when it has non-".gz" extension
[ https://issues.apache.org/jira/browse/SPARK-31177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768464#comment-17768464 ] Mark Waddle commented on SPARK-31177: - [~Minskya] the resolution is “incomplete”, so I don’t think it’s fixed. I worked around it by renaming files to end in the .gz extension. > DataFrameReader.csv incorrectly reads gzip encoded CSV from S3 when it has > non-".gz" extension > -- > > Key: SPARK-31177 > URL: https://issues.apache.org/jira/browse/SPARK-31177 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.4 >Reporter: Mark Waddle >Priority: Major > Labels: bulk-closed > > i have large CSV files that are gzipped and uploaded to S3 with > Content-Encoding=gzip. the files have file extension ".csv", as most web > clients will automatically decompress the file based on the Content-Encoding > header. using pyspark to read these CSV files does not mimic this behavior. > works as expected: > {code:java} > df = spark.read.csv('s3://bucket/large.csv.gz', header=True) > {code} > does not decompress and tries to load entire contents of file as the first > row: > {code:java} > df = spark.read.csv('s3://bucket/large.csv', header=True) > {code} > it looks like it's relying on the file extension to determine if the file is > gzip compressed or not. it would be great if S3 resources, and any other http > based resources, could consult the Content-Encoding response header as well. > i tried to find the code that determines this, but i'm not familiar with the > code base. any pointers would be helpful. and i can look into fixing it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
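A hedged sketch of that rename workaround (bucket and key names are placeholders; assumes an s3a filesystem is configured): give the object a .gz suffix so Hadoop's compression codec factory, which keys off the file extension, picks the gzip codec.
{code}
// Sketch: rename, then read.
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("s3a://bucket"), spark.sparkContext.hadoopConfiguration)
fs.rename(new Path("s3a://bucket/large.csv"), new Path("s3a://bucket/large.csv.gz"))
val df = spark.read.option("header", "true").csv("s3a://bucket/large.csv.gz")
{code}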
[jira] [Resolved] (SPARK-45286) Add back Matomo analytics to release docs
[ https://issues.apache.org/jira/browse/SPARK-45286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45286. -- Fix Version/s: 3.3.4 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43063 [https://github.com/apache/spark/pull/43063] > Add back Matomo analytics to release docs > - > > Key: SPARK-45286 > URL: https://issues.apache.org/jira/browse/SPARK-45286 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2 > > > We had previously removed Google Analytics from the website and release docs, > per ASF policy: https://github.com/apache/spark/pull/36310 > We just restored analytics using the ASF-hosted Matomo service on the website: > https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 > This change would put the same new tracking code back into the release docs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45292) Remove Guava from shared classes in IsolatedClientLoader
Cheng Pan created SPARK-45292: - Summary: Remove Guava from shared classes in IsolatedClientLoader Key: SPARK-45292 URL: https://issues.apache.org/jira/browse/SPARK-45292 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+
[ https://issues.apache.org/jira/browse/SPARK-44366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44366: --- Labels: pull-request-available (was: ) > Migrate antlr4 from 4.9 to 4.10+ > > > Key: SPARK-44366 > URL: https://issues.apache.org/jira/browse/SPARK-44366 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44170) Migrating JUnit 4 to JUnit 5
[ https://issues.apache.org/jira/browse/SPARK-44170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44170: --- Labels: pull-request-available (was: ) > Migrating JUnit 4 to JUnit 5 > -- > > Key: SPARK-44170 > URL: https://issues.apache.org/jira/browse/SPARK-44170 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > JUnit 5 is a powerful and flexible update to the JUnit framework, and it > provides a variety of improvements and new features to organize and > describe test cases, as well as help in understanding test results: > # JUnit 5 leverages features from Java 8 or later, such as lambda functions, > making tests more powerful and easier to maintain, while JUnit 4 is still a Java 7 > compatible version > # JUnit 5 has added some useful new features for describing, organizing, and > executing tests. For example: [Parameterized > Tests|https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests] > and [Conditional Test > Execution|https://junit.org/junit5/docs/current/user-guide/#extensions-conditions] > may make our test code look simpler, and [Parallel > Execution|https://junit.org/junit5/docs/current/user-guide/#writing-tests-parallel-execution] > may make our tests faster. > > More importantly, JUnit 4 is currently an inactive project, which has not > released a new version for more than two years. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45291) Use unknown query execution id instead of no such app when id is invalid
[ https://issues.apache.org/jira/browse/SPARK-45291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45291: --- Labels: pull-request-available (was: ) > Use unknown query execution id instead of no such app when id is invalid > > > Key: SPARK-45291 > URL: https://issues.apache.org/jira/browse/SPARK-45291 > Project: Spark > Issue Type: Bug > Components: SQL, UI >Affects Versions: 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45291) Use unknown query execution id instead of no such app when id is invalid
Kent Yao created SPARK-45291: Summary: Use unknown query execution id instead of no such app when id is invalid Key: SPARK-45291 URL: https://issues.apache.org/jira/browse/SPARK-45291 Project: Spark Issue Type: Bug Components: SQL, UI Affects Versions: 3.5.0, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768399#comment-17768399 ] Yuming Wang commented on SPARK-45282: - cc [~ulysses] [~cloud_fan] > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Major > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in a distributed environment. i cannot replicate it in a unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems that in spark 3.4.1 some Exchanges in the query plan are dropped as an > optimization, while in spark 3.3.1 these Exchanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on a distributed cluster the settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 1000000).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! > println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 1000000 > number of right 1000000 > number of (left join right) 1000000 > number of left1 1000000 > number of right1 1000000 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+
[ https://issues.apache.org/jira/browse/SPARK-44366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768393#comment-17768393 ] Yang Jie commented on SPARK-44366: -- +1 > Migrate antlr4 from 4.9 to 4.10+ > > > Key: SPARK-44366 > URL: https://issues.apache.org/jira/browse/SPARK-44366 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45289) ClassCastException when reading Delta table on AWS S3
[ https://issues.apache.org/jira/browse/SPARK-45289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanawat Panmongkol updated SPARK-45289: --- Description: When attempting to read a Delta table from S3 using version 3.5.0, a _*{{ClassCastException}}*_ occurs involving _*{{org.apache.hadoop.fs.s3a.S3AFileStatus}}*_ and _*{{org.apache.spark.sql.execution.datasources.FileStatusWithMetadata}}*_. The error appears to be related to the new feature SPARK-43039. _*Steps to Reproduce:*_
{code:java}
export AWS_ACCESS_KEY_ID=''
export AWS_SECRET_ACCESS_KEY=''
export AWS_REGION=''

docker run --rm -it apache/spark:3.5.0-scala2.12-java11-ubuntu /opt/spark/bin/spark-shell \
  --packages 'org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-core_2.12:2.4.0' \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.hadoop.aws.region=$AWS_REGION" \
  --conf "spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID" \
  --conf "spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" \
  --conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
  --conf "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
  --conf "spark.hadoop.fs.s3a.path.style.access=true" \
  --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \
  --conf "spark.jars.ivy=/tmp/ivy/cache"{code}
{code:java}
scala> spark.read.format("delta").load("s3:").show()
{code}
*Logs:*
{code:java}
java.lang.ClassCastException: class org.apache.hadoop.fs.s3a.S3AFileStatus cannot be cast to class org.apache.spark.sql.execution.datasources.FileStatusWithMetadata (org.apache.hadoop.fs.s3a.S3AFileStatus is in unnamed module of loader scala.reflect.internal.util.ScalaClassLoader$URLClassLoader @4552f905; org.apache.spark.sql.execution.datasources.FileStatusWithMetadata is in unnamed module of loader 'app')
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2(DataSourceScanExec.scala:466)
  at org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2$adapted(DataSourceScanExec.scala:466)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.execution.FileSourceScanLike.setFilesNumAndSizeMetric(DataSourceScanExec.scala:466)
  at org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions(DataSourceScanExec.scala:257)
  at org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions$(DataSourceScanExec.scala:251)
  at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions$lzycompute(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions(DataSourceScanExec.scala:286)
  at org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions$(DataSourceScanExec.scala:267)
  at org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions$lzycompute(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:553)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:537)
  at org.apache.spark.sql.execution.FileSourceScanExec.doExecute(DataSourceScanExec.scala:575)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
  at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:527)
  at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:455)
  at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:454)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:498)
  at