[jira] [Updated] (SPARK-45306) Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans
[ https://issues.apache.org/jira/browse/SPARK-45306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45306: --- Labels: pull-request-available (was: ) > Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans > - > > Key: SPARK-45306 > URL: https://issues.apache.org/jira/browse/SPARK-45306 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0, 3.5.1 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > After SPARK-42768, the default value of > `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` has changed from > false to true, so we should use AQE-aware utils to collect plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45306) Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans
[ https://issues.apache.org/jira/browse/SPARK-45306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45306: - Affects Version/s: 3.5.1 > Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans > - > > Key: SPARK-45306 > URL: https://issues.apache.org/jira/browse/SPARK-45306 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0, 3.5.1 >Reporter: Yang Jie >Priority: Major > > After SPARK-42768, the default value of > `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` has changed from > false to true, so we should use AQE-aware utils to collect plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45306) Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans
Yang Jie created SPARK-45306: Summary: Make `InMemoryColumnarBenchmark` use AQE-aware utils to collect plans Key: SPARK-45306 URL: https://issues.apache.org/jira/browse/SPARK-45306 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Yang Jie After SPARK-42768, the default value of `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` has changed from false to true, so we should use AQE-aware utils to collect plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
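For context, a minimal sketch (not the benchmark's actual code) of what "AQE-aware" collection means here: with adaptive execution on, the executed plan is wrapped in AdaptiveSparkPlanExec, so a plain TreeNode.collect on the top-level plan can miss the cached scan, while the collect provided by AdaptiveSparkPlanHelper descends into the adaptive subtree.
{code}
// Sketch only: collect InMemoryTableScanExec nodes in an AQE-wrapped plan.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec

object CollectCachedScans extends AdaptiveSparkPlanHelper {
  def inMemoryScans(df: DataFrame): Seq[InMemoryTableScanExec] =
    // AdaptiveSparkPlanHelper.collect recurses through AdaptiveSparkPlanExec
    // nodes, which the plain TreeNode.collect does not.
    collect(df.queryExecution.executedPlan) {
      case scan: InMemoryTableScanExec => scan
    }
}
{code}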
[jira] [Updated] (SPARK-45305) Remove JDK 8 workaround added in SPARK-32999
[ https://issues.apache.org/jira/browse/SPARK-45305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45305: --- Labels: pull-request-available (was: ) > Remove JDK 8 workaround added in SPARK-32999 > - > > Key: SPARK-45305 > URL: https://issues.apache.org/jira/browse/SPARK-45305 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > SPARK-32999 added a test that is only for JDK 8. We should remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45305) Remove JDK 8 workaround added in SPARK-32999
Hyukjin Kwon created SPARK-45305: Summary: Remove JDK 8 workaround added in SPARK-32999 Key: SPARK-45305 URL: https://issues.apache.org/jira/browse/SPARK-45305 Project: Spark Issue Type: Test Components: Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-32999 added a test that is only for JDK 8. We should remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45304) Remove test classloader workaround for SBT build
[ https://issues.apache.org/jira/browse/SPARK-45304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45304: --- Labels: pull-request-available (was: ) > Remove test classloader workaround for SBT build > > > Key: SPARK-45304 > URL: https://issues.apache.org/jira/browse/SPARK-45304 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > Revert https://github.com/apache/spark/pull/30198. We don't need it anymore > since we dropped JDK 8 and 11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37508) Add CONTAINS() function
[ https://issues.apache.org/jira/browse/SPARK-37508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-37508: --- Labels: pull-request-available (was: ) > Add CONTAINS() function > --- > > Key: SPARK-37508 > URL: https://issues.apache.org/jira/browse/SPARK-37508 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 3.3.0 > > > {{contains()}} is a common convenience function supported by a number of > database systems: > # > [https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#contains_substr] > # [CONTAINS — Snowflake > Documentation|https://docs.snowflake.com/en/sql-reference/functions/contains.html] > Proposed syntax: > {code:java} > contains(haystack, needle) > return type: boolean {code} > It is semantically equivalent to {{haystack like '%needle%'}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
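A quick, hedged illustration of the proposed syntax (the literal values are arbitrary; the function shipped with the Fix Version above):
{code}
// contains(haystack, needle) returns a boolean.
spark.sql("SELECT contains('Spark SQL', 'SQL')").show()    // true
spark.sql("SELECT contains('Spark SQL', 'Flink')").show()  // false
// Semantically equivalent to: SELECT 'Spark SQL' LIKE '%SQL%'
{code}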
[jira] [Created] (SPARK-45304) Remove test classloader workaround for SBT build
Hyukjin Kwon created SPARK-45304: Summary: Remove test classloader workaround for SBT build Key: SPARK-45304 URL: https://issues.apache.org/jira/browse/SPARK-45304 Project: Spark Issue Type: Test Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Revert https://github.com/apache/spark/pull/30198. We don't need it anymore since we dropped JDK 8 and 11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45303) Remove JDK 8/11 workaround in KryoSerializerBenchmark
[ https://issues.apache.org/jira/browse/SPARK-45303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45303: --- Labels: pull-request-available (was: ) > Remove JDK 8/11 workaround in KryoSerializerBenchmark > - > > Key: SPARK-45303 > URL: https://issues.apache.org/jira/browse/SPARK-45303 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > https://github.com/apache/spark/pull/25966 added a few extra flags for JDK > 8/11 consistency. We don't need them anymore because we dropped JDK 8/11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45303) Remove JDK 8/11 workaround in KryoSerializerBenchmark
Hyukjin Kwon created SPARK-45303: Summary: Remove JDK 8/11 workaround in KryoSerializerBenchmark Key: SPARK-45303 URL: https://issues.apache.org/jira/browse/SPARK-45303 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/25966 added a few extra flags for JDK 8/11 consistency. We don't need them anymore because we dropped JDK 8/11. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45302) Remove PID communication between Python workers when no daemon is used
[ https://issues.apache.org/jira/browse/SPARK-45302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45302: --- Labels: pull-request-available (was: ) > Remove PID communication between Python workers when no daemon is used > - > > Key: SPARK-45302 > URL: https://issues.apache.org/jira/browse/SPARK-45302 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We don't need to send the PID around when JDK 9+ is used because we can get > the PID directly from the API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45302) Remove PID communication between Python workers when no daemon is used
Hyukjin Kwon created SPARK-45302: Summary: Remove PID communication between Python workers when no daemon is used Key: SPARK-45302 URL: https://issues.apache.org/jira/browse/SPARK-45302 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We don't need to send the PID around when JDK 9+ is used because we can get the PID directly from the API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
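For reference, a minimal sketch of the JDK 9+ API the description alludes to (the worker command line is a placeholder): the JVM can read a spawned worker's PID from java.lang.Process directly, instead of the Python side writing os.getpid() back over the socket.
{code}
// Sketch only: Process.pid() is available since JDK 9.
val builder = new ProcessBuilder("python3", "-c", "import time; time.sleep(60)")
val worker: Process = builder.start()
val workerPid: Long = worker.pid() // no PID handshake needed on JDK 9+
println(s"worker pid: $workerPid")
worker.destroy()
{code}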
[jira] [Updated] (SPARK-28932) Maven install fails on JDK11
[ https://issues.apache.org/jira/browse/SPARK-28932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-28932: --- Labels: pull-request-available (was: ) > Maven install fails on JDK11 > > > Key: SPARK-28932 > URL: https://issues.apache.org/jira/browse/SPARK-28932 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Labels: pull-request-available > Fix For: 3.0.0 > > > {code} > mvn clean install -pl common/network-common -DskipTests > error: fatal error: object scala in compiler mirror not found. > one error found > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45301) Remove org.scala-lang scala-library added for JDK 11 workaround
Hyukjin Kwon created SPARK-45301: Summary: Remove org.scala-lang scala-library added for JDK 11 workaround Key: SPARK-45301 URL: https://issues.apache.org/jira/browse/SPARK-45301 Project: Spark Issue Type: Test Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/25800 added
{code}
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
</dependency>
{code}
Now with JDK 17 it works without them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44539) Upgrade RoaringBitmap to 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-44539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44539: --- Labels: pull-request-available (was: ) > Upgrade RoaringBitmap to 1.0.0 > --- > > Key: SPARK-44539 > URL: https://issues.apache.org/jira/browse/SPARK-44539 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45300) Remove JDK 8 workaround in TimestampFormatterSuite
[ https://issues.apache.org/jira/browse/SPARK-45300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45300: --- Labels: pull-request-available (was: ) > Remove JDK 8 workaround in TimestampFormatterSuite > -- > > Key: SPARK-45300 > URL: https://issues.apache.org/jira/browse/SPARK-45300 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45300) Remove JDK 8 workaround in TimestampFormatterSuite
Hyukjin Kwon created SPARK-45300: Summary: Remove JDK 8 workaround in TimestampFormatterSuite Key: SPARK-45300 URL: https://issues.apache.org/jira/browse/SPARK-45300 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45300) Remove JDK 8 workaround in TimestampFormatterSuite
[ https://issues.apache.org/jira/browse/SPARK-45300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45300: - Priority: Minor (was: Major) > Remove JDK 8 workaround in TimestampFormatterSuite > -- > > Key: SPARK-45300 > URL: https://issues.apache.org/jira/browse/SPARK-45300 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45299) Remove JDK 8 workaround in UtilsSuite
[ https://issues.apache.org/jira/browse/SPARK-45299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45299: --- Labels: pull-request-available (was: ) > Remove JDK 8 workaround in UtilsSuite > - > > Key: SPARK-45299 > URL: https://issues.apache.org/jira/browse/SPARK-45299 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45299) Remove JDK 8 workaround in UtilsSuite
[ https://issues.apache.org/jira/browse/SPARK-45299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45299: - Description: (was: In "Kill process", we don't need the JDK 8 workaround anymore) > Remove JDK 8 workaround in UtilsSuite > - > > Key: SPARK-45299 > URL: https://issues.apache.org/jira/browse/SPARK-45299 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45299) Remove JDK 8 workaround in UtilsSuite
Hyukjin Kwon created SPARK-45299: Summary: Remove JDK 8 workaround in UtilsSuite Key: SPARK-45299 URL: https://issues.apache.org/jira/browse/SPARK-45299 Project: Spark Issue Type: Test Components: Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon In "Kill process", we don't need the JDK 8 workaround anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45298) Remove the workaround for JDK-8228469 in SPARK-31959 test
[ https://issues.apache.org/jira/browse/SPARK-45298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45298: --- Labels: pull-request-available (was: ) > Remove the workaround for JDK-8228469 in SPARK-31959 test > - > > Key: SPARK-45298 > URL: https://issues.apache.org/jira/browse/SPARK-45298 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > In https://issues.apache.org/jira/browse/SPARK-31959, we added a > workaround for an outdated timezone in the tests. We can now remove it because > we dropped JDK 11 in SPARK-44112. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45298) Remove the workaround for JDK-8228469 in SPARK-31959 test
Hyukjin Kwon created SPARK-45298: Summary: Remove the workaround for JDK-8228469 in SPARK-31959 test Key: SPARK-45298 URL: https://issues.apache.org/jira/browse/SPARK-45298 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon In https://issues.apache.org/jira/browse/SPARK-31959, we added a workaround for an outdated timezone in the tests. We can now remove it because we dropped JDK 11 in SPARK-44112. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45298) Remove the workaround for JDK-8228469 in SPARK-31959 test
[ https://issues.apache.org/jira/browse/SPARK-45298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45298: - Issue Type: Test (was: Improvement) > Remove the workaround for JDK-8228469 in SPARK-31959 test > - > > Key: SPARK-45298 > URL: https://issues.apache.org/jira/browse/SPARK-45298 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > In https://issues.apache.org/jira/browse/SPARK-31959, we added a > workaround for an outdated timezone in the tests. We can now remove it because > we dropped JDK 11 in SPARK-44112. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45297) Remove workaround for DateFormatter added in SPARK-31827
[ https://issues.apache.org/jira/browse/SPARK-45297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45297: --- Labels: pull-request-available (was: ) > Remove workaround for DateFormatter added in SPARK-31827 > > > Key: SPARK-45297 > URL: https://issues.apache.org/jira/browse/SPARK-45297 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We dropped JDK 8 in SPARK-44112, and we don't need the workaround for > SPARK-31827 anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45297) Remove workaround for DateFormatter added in SPARK-31827
Hyukjin Kwon created SPARK-45297: Summary: Remove workaround for DateFormatter added in SPARK-31827 Key: SPARK-45297 URL: https://issues.apache.org/jira/browse/SPARK-45297 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We dropped JDK 8 in SPARK-44112, and we don't need the workaround for SPARK-31827 anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45296) Comment out unused JDK 11 related code in dev/run-tests.py
[ https://issues.apache.org/jira/browse/SPARK-45296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45296: --- Labels: pull-request-available (was: ) > Comment out unused JDK 11 related code in dev/run-tests.py > - > > Key: SPARK-45296 > URL: https://issues.apache.org/jira/browse/SPARK-45296 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > >
> {code}
> # set up java11 env if this is a pull request build with 'test-java11' in the title
> if "ghprbPullTitle" in os.environ:
>     if "test-java11" in os.environ["ghprbPullTitle"].lower():
>         os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
>         os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
>         test_profiles += ["-Djava.version=11"]
> {code}
> We don't need this anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45296) Comment out unused JDK 11 related code in dev/run-tests.py
Hyukjin Kwon created SPARK-45296: Summary: Comment out unused JDK 11 related code in dev/run-tests.py Key: SPARK-45296 URL: https://issues.apache.org/jira/browse/SPARK-45296 Project: Spark Issue Type: Improvement Components: Build, Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
{code}
# set up java11 env if this is a pull request build with 'test-java11' in the title
if "ghprbPullTitle" in os.environ:
    if "test-java11" in os.environ["ghprbPullTitle"].lower():
        os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
        os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
        test_profiles += ["-Djava.version=11"]
{code}
We don't need this anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45295) Remove Utils.isMemberClass workaround for JDK 8
[ https://issues.apache.org/jira/browse/SPARK-45295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45295: --- Labels: pull-request-available (was: ) > Remove Utils.isMemberClass workaround for JDK 8 > --- > > Key: SPARK-45295 > URL: https://issues.apache.org/jira/browse/SPARK-45295 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We dropped JDK 8 and 11 in SPARK-44112. We don't need the workaround anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45295) Remove Utils.isMemberClass workaround for JDK 8
Hyukjin Kwon created SPARK-45295: Summary: Remove Utils.isMemberClass workaround for JDK 8 Key: SPARK-45295 URL: https://issues.apache.org/jira/browse/SPARK-45295 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We dropped JDK 8 and 11 in SPARK-44112. We don't need the workaround anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
[ https://issues.apache.org/jira/browse/SPARK-45294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45294. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43077 [https://github.com/apache/spark/pull/43077] > Use JDK 17 in Binder integration for PySpark live notebooks > --- > > Key: SPARK-45294 > URL: https://issues.apache.org/jira/browse/SPARK-45294 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
[ https://issues.apache.org/jira/browse/SPARK-45294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45294: Assignee: Hyukjin Kwon > Use JDK 17 in Binder integration for PySpark live notebooks > --- > > Key: SPARK-45294 > URL: https://issues.apache.org/jira/browse/SPARK-45294 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
[ https://issues.apache.org/jira/browse/SPARK-45294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45294: --- Labels: pull-request-available (was: ) > Use JDK 17 in Binder integration for PySpark live notebooks > --- > > Key: SPARK-45294 > URL: https://issues.apache.org/jira/browse/SPARK-45294 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44539) Upgrade RoaringBitmap to 1.0.0
[ https://issues.apache.org/jira/browse/SPARK-44539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-44539: Summary: Upgrade RoaringBitmap to 1.0.0 (was: Upgrade RoaringBitmap to 0.9.46) > Upgrade RoaringBitmap to 1.0.0 > --- > > Key: SPARK-44539 > URL: https://issues.apache.org/jira/browse/SPARK-44539 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45294) Use JDK 17 in Binder integration for PySpark live notebooks
Hyukjin Kwon created SPARK-45294: Summary: Use JDK 17 in Binder integration for PySpark live notebooks Key: SPARK-45294 URL: https://issues.apache.org/jira/browse/SPARK-45294 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://github.com/apache/spark/blob/master/binder/apt.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45276) Replace Java 8 and Java 11 installed in the Dockerfile with Java 17
[ https://issues.apache.org/jira/browse/SPARK-45276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45276: --- Labels: pull-request-available (was: ) > Replace Java 8 and Java 11 installed in the Dockerfile with Java 17 > > > Key: SPARK-45276 > URL: https://issues.apache.org/jira/browse/SPARK-45276 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > Including dev/create-release/spark-rm/Dockerfile and > connector/docker/spark-test/base/Dockerfile. > There might be others as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45293) Install Java 17 for docker
[ https://issues.apache.org/jira/browse/SPARK-45293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan resolved SPARK-45293. - Resolution: Duplicate https://issues.apache.org/jira/browse/SPARK-45276 > Install Java 17 for docker > -- > > Key: SPARK-45293 > URL: https://issues.apache.org/jira/browse/SPARK-45293 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45293) Install Java 17 for docker
[ https://issues.apache.org/jira/browse/SPARK-45293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45293: --- Labels: pull-request-available (was: ) > Install Java 17 for docker > -- > > Key: SPARK-45293 > URL: https://issues.apache.org/jira/browse/SPARK-45293 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45293) Install Java 17 for docker
BingKun Pan created SPARK-45293: --- Summary: Install Java 17 for docker Key: SPARK-45293 URL: https://issues.apache.org/jira/browse/SPARK-45293 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45207) Implement Error Enrichment for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-45207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45207: Assignee: Yihong He > Implement Error Enrichment for Scala Client > --- > > Key: SPARK-45207 > URL: https://issues.apache.org/jira/browse/SPARK-45207 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45240) Implement Error Enrichment for Python Client
[ https://issues.apache.org/jira/browse/SPARK-45240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45240: Assignee: Yihong He > Implement Error Enrichment for Python Client > > > Key: SPARK-45240 > URL: https://issues.apache.org/jira/browse/SPARK-45240 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45207) Implement Error Enrichment for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-45207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45207. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42987 [https://github.com/apache/spark/pull/42987] > Implement Error Enrichment for Scala Client > --- > > Key: SPARK-45207 > URL: https://issues.apache.org/jira/browse/SPARK-45207 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45240) Implement Error Enrichment for Python Client
[ https://issues.apache.org/jira/browse/SPARK-45240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45240. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43034 [https://github.com/apache/spark/pull/43034] > Implement Error Enrichment for Python Client > > > Key: SPARK-45240 > URL: https://issues.apache.org/jira/browse/SPARK-45240 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43259: --- Labels: pull-request-available starter (was: starter) > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > already exist. Check exception fields by using {*}checkError(){*}. That function > checks only the valuable error fields and avoids depending on the error text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear. Propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
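A hedged sketch of the checkError() pattern the description refers to (the error class, query, and parameters below are placeholders, not the actual trigger for _LEGACY_ERROR_TEMP_2024); it would live in a suite that extends QueryTest:
{code}
// Assert on stable error fields instead of the rendered message text.
checkError(
  exception = intercept[org.apache.spark.SparkRuntimeException] {
    sql("SELECT ...").collect()
  },
  errorClass = "PROPER_ERROR_CLASS_NAME",
  parameters = Map("someParam" -> "someValue"))
{code}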
[jira] [Updated] (SPARK-43900) Support optimizing skewed partitions even if it introduces an extra shuffle
[ https://issues.apache.org/jira/browse/SPARK-43900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43900: --- Labels: pull-request-available (was: ) > Support optimizing skewed partitions even if it introduces an extra shuffle > -- > > Key: SPARK-43900 > URL: https://issues.apache.org/jira/browse/SPARK-43900 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Zhen Wang >Priority: Major > Labels: pull-request-available > > Similar to [SPARK-33832|https://issues.apache.org/jira/browse/SPARK-33832], > OptimizeSkewInRebalancePartitions will not apply if skew mitigation causes a > new shuffle. > Test case (data skew in RebalancePartition): > {code:java} > *(2) HashAggregate(keys=[c1#226], functions=[count(1)], output=[c1#226, > count(1)#231L]) > +- *(2) HashAggregate(keys=[c1#226], functions=[partial_count(1)], > output=[c1#226, count#235L]) > +- AQEShuffleRead coalesced > +- ShuffleQueryStage 0 > +- Exchange hashpartitioning(c1#226, 5), > REBALANCE_PARTITIONS_BY_COL, [plan_id=106] > +- *(1) Project [key#221 AS c1#226] > +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#221] > +- Scan[obj#220] {code} > expect: > {code:java} > HashAggregate(keys=[c1#226], functions=[count(1)], output=[c1#226, > count(1)#231L]) > +- AQEShuffleRead coalesced > +- ShuffleQueryStage 1 > +- Exchange hashpartitioning(c1#226, 5), ENSURE_REQUIREMENTS, > [plan_id=140] > +- *(2) HashAggregate(keys=[c1#226], functions=[partial_count(1)], > output=[c1#226, count#235L]) > +- AQEShuffleRead coalesced and skewed > +- ShuffleQueryStage 0 > +- Exchange hashpartitioning(c1#226, 5), > REBALANCE_PARTITIONS_BY_COL, [plan_id=106] > +- *(1) Project [key#221 AS c1#226] > +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#221] > +- Scan[obj#220] {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
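A hedged reconstruction of a query shape that produces the first plan above (not the exact test code; the table and values are made up): a REBALANCE hint below an aggregate yields the REBALANCE_PARTITIONS_BY_COL exchange, and AQE reuses its hash partitioning for the parent HashAggregate, so splitting a skewed partition would only be possible with an extra shuffle.
{code}
// Sketch: skewed key 0 under a rebalance feeding an aggregate.
val df = spark.range(0, 1000000)
  .selectExpr("CASE WHEN id % 100 = 0 THEN 0 ELSE id END AS c1")
df.createOrReplaceTempView("t")
spark.sql(
  """SELECT c1, count(*)
    |FROM (SELECT /*+ REBALANCE(c1) */ c1 FROM t)
    |GROUP BY c1""".stripMargin).collect()
{code}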
[jira] [Resolved] (SPARK-45279) Attach plan_id for all logical plan
[ https://issues.apache.org/jira/browse/SPARK-45279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45279. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43055 [https://github.com/apache/spark/pull/43055] > Attach plan_id for all logical plan > --- > > Key: SPARK-45279 > URL: https://issues.apache.org/jira/browse/SPARK-45279 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42947: --- Labels: pull-request-available (was: ) > Spark Thriftserver LDAP should not use DN pattern if user contains domain > - > > Key: SPARK-42947 > URL: https://issues.apache.org/jira/browse/SPARK-42947 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiayi Liu >Priority: Major > Labels: pull-request-available > > When the LDAP provider has domain configuration, such as Active Directory, > the principal should not be constructed according to the DN pattern, but the > username containing the domain should be directly passed to the LDAP provider > as the principal. We can refer to the implementation of Hive LdapUtils. > When the username contains a domain or a domain is passed via the > hive.server2.authentication.ldap.Domain configuration, if we construct the > principal according to the DN pattern (for example, > uid=user@domain,dc=test,dc=com), we will get the following error:
> {code:java}
> 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: Error validating the login
> at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0]
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
> at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
> Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user
> at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> ... 8 more
> Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580]
> at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352]
> at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352]
> at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352]
> at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352]
> at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352]
> at javax.naming.InitialContext.<init>(InitialContext.java:216) ~[?:1.8.0_352]
> at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) ~[?:1.8.0_352]
> at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
> at
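A hedged sketch of the proposed check (helper names are made up; Hive's LdapUtils is the model): expand the DN pattern only when the login name carries no domain, and otherwise pass the name to the LDAP provider as-is.
{code}
// Sketch only: choose the bind principal for LDAP authentication.
def hasDomain(user: String): Boolean =
  user.contains("@") || user.contains("\\")

// dnPattern is e.g. "uid=%s,dc=test,dc=com" as in
// hive.server2.authentication.ldap.userDNPattern.
def bindPrincipal(user: String, dnPattern: String): String =
  if (hasDomain(user)) user              // e.g. user@domain, passed through
  else dnPattern.replace("%s", user)     // plain name, expand the pattern
{code}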
[jira] [Assigned] (SPARK-45279) Attach plan_id for all logical plan
[ https://issues.apache.org/jira/browse/SPARK-45279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45279: - Assignee: Ruifeng Zheng > Attach plan_id for all logical plan > --- > > Key: SPARK-45279 > URL: https://issues.apache.org/jira/browse/SPARK-45279 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44077) Session Configs were not getting honored in RDDs
[ https://issues.apache.org/jira/browse/SPARK-44077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44077: --- Labels: pull-request-available (was: ) > Session Configs were not getting honored in RDDs > > > Key: SPARK-44077 > URL: https://issues.apache.org/jira/browse/SPARK-44077 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Kapil Singh >Priority: Major > Labels: pull-request-available > > When calling SQLConf.get on executors, the configs are read from the local > properties on the TaskContext. The local properties are populated driver-side > when scheduling the job, using the properties found in > sparkContext.localProperties. For RDD actions, local properties were not > getting populated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
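To make the mechanism concrete, a hedged sketch (the conf key is just an example): on executors, SQLConf.get rebuilds the session conf from the TaskContext local properties that the driver populates when scheduling the job, so if RDD actions skip that population, tasks see default values instead of session ones.
{code}
// Sketch: reading a session conf inside an RDD action.
import org.apache.spark.sql.internal.SQLConf

spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sparkContext.parallelize(Seq(1), 1).foreach { _ =>
  // Resolved from TaskContext local properties on the executor; without the
  // fix described above, an RDD action would print the default time zone.
  println(SQLConf.get.sessionLocalTimeZone)
}
{code}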
[jira] [Commented] (SPARK-31177) DataFrameReader.csv incorrectly reads gzip encoded CSV from S3 when it has non-".gz" extension
[ https://issues.apache.org/jira/browse/SPARK-31177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768464#comment-17768464 ] Mark Waddle commented on SPARK-31177: - [~Minskya] the resolution is “incomplete”, so I don’t think it’s fixed. I worked around it by renaming files to end in the .gz extension. > DataFrameReader.csv incorrectly reads gzip encoded CSV from S3 when it has > non-".gz" extension > -- > > Key: SPARK-31177 > URL: https://issues.apache.org/jira/browse/SPARK-31177 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.4 >Reporter: Mark Waddle >Priority: Major > Labels: bulk-closed > > i have large CSV files that are gzipped and uploaded to S3 with > Content-Encoding=gzip. the files have file extension ".csv", as most web > clients will automatically decompress the file based on the Content-Encoding > header. using pyspark to read these CSV files does not mimic this behavior. > works as expected: > {code:java} > df = spark.read.csv('s3://bucket/large.csv.gz', header=True) > {code} > does not decompress and tries to load entire contents of file as the first > row: > {code:java} > df = spark.read.csv('s3://bucket/large.csv', header=True) > {code} > it looks like it's relying on the file extension to determine if the file is > gzip compressed or not. it would be great if S3 resources, and any other http > based resources, could consult the Content-Encoding response header as well. > i tried to find the code that determines this, but i'm not familiar with the > code base. any pointers would be helpful. and i can look into fixing it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
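A hedged sketch of that rename workaround (bucket and key names are placeholders; assumes an s3a filesystem is configured): give the object a .gz suffix so Hadoop's compression codec factory, which keys off the file extension, picks the gzip codec.
{code}
// Sketch: rename, then read.
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("s3a://bucket"), spark.sparkContext.hadoopConfiguration)
fs.rename(new Path("s3a://bucket/large.csv"), new Path("s3a://bucket/large.csv.gz"))
val df = spark.read.option("header", "true").csv("s3a://bucket/large.csv.gz")
{code}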
[jira] [Resolved] (SPARK-45286) Add back Matomo analytics to release docs
[ https://issues.apache.org/jira/browse/SPARK-45286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45286. -- Fix Version/s: 3.3.4 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43063 [https://github.com/apache/spark/pull/43063] > Add back Matomo analytics to release docs > - > > Key: SPARK-45286 > URL: https://issues.apache.org/jira/browse/SPARK-45286 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2 > > > We had previously removed Google Analytics from the website and release docs, > per ASF policy: https://github.com/apache/spark/pull/36310 > We just restored analytics using the ASF-hosted Matomo service on the website: > https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 > This change would put the same new tracking code back into the release docs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45292) Remove Guava from shared classes in IsolatedClientLoader
Cheng Pan created SPARK-45292: - Summary: Remove Guava from shared classes in IsolatedClientLoader Key: SPARK-45292 URL: https://issues.apache.org/jira/browse/SPARK-45292 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+
[ https://issues.apache.org/jira/browse/SPARK-44366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44366: --- Labels: pull-request-available (was: ) > Migrate antlr4 from 4.9 to 4.10+ > > > Key: SPARK-44366 > URL: https://issues.apache.org/jira/browse/SPARK-44366 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44170) Migrating JUnit 4 to JUnit 5
[ https://issues.apache.org/jira/browse/SPARK-44170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44170: --- Labels: pull-request-available (was: ) > Migrating JUnit 4 to JUnit 5 > -- > > Key: SPARK-44170 > URL: https://issues.apache.org/jira/browse/SPARK-44170 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > JUnit 5 is a powerful and flexible update to the JUnit framework, and it > provides a variety of improvements and new features to organize and > describe test cases, as well as help in understanding test results: > # JUnit 5 leverages features from Java 8 or later, such as lambda functions, > making tests more powerful and easier to maintain, while JUnit 4 is still a Java 7 > compatible version > # JUnit 5 has added some useful new features for describing, organizing, and > executing tests. For example: [Parameterized > Tests|https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests] > and [Conditional Test > Execution|https://junit.org/junit5/docs/current/user-guide/#extensions-conditions] > may make our test code look simpler, and [Parallel > Execution|https://junit.org/junit5/docs/current/user-guide/#writing-tests-parallel-execution] > may make our tests faster. > > More importantly, JUnit 4 is currently an inactive project, which has not > released a new version for more than two years. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45291) Use unknown query execution id instead of no such app when id is invalid
[ https://issues.apache.org/jira/browse/SPARK-45291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45291: --- Labels: pull-request-available (was: ) > Use unknown query execution id instead of no such app when id is invalid > > > Key: SPARK-45291 > URL: https://issues.apache.org/jira/browse/SPARK-45291 > Project: Spark > Issue Type: Bug > Components: SQL, UI >Affects Versions: 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45291) Use unknown query execution id instead of no such app when id is invalid
Kent Yao created SPARK-45291: Summary: Use unknown query execution id instead of no such app when id is invalid Key: SPARK-45291 URL: https://issues.apache.org/jira/browse/SPARK-45291 Project: Spark Issue Type: Bug Components: SQL, UI Affects Versions: 3.5.0, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768399#comment-17768399 ] Yuming Wang commented on SPARK-45282: - cc [~ulysses] [~cloud_fan] > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Major > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in a distributed environment. i cannot replicate it in a unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems that in spark 3.4.1 some Exchanges in the query plan are dropped as an > optimization, while in spark 3.3.1 these Exchanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on a distributed cluster the settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 1000000).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! > println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 1000000 > number of right 1000000 > number of (left join right) 1000000 > number of left1 1000000 > number of right1 1000000 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+
[ https://issues.apache.org/jira/browse/SPARK-44366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768393#comment-17768393 ] Yang Jie commented on SPARK-44366: -- +1 > Migrate antlr4 from 4.9 to 4.10+ > > > Key: SPARK-44366 > URL: https://issues.apache.org/jira/browse/SPARK-44366 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45289) ClassCastException when reading Delta table on AWS S3
[ https://issues.apache.org/jira/browse/SPARK-45289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanawat Panmongkol updated SPARK-45289: --- Description: When attempting to read a Delta table from S3 using version 3.5.0, a _*{{ClassCastException}}*_ occurs involving _*{{org.apache.hadoop.fs.s3a.S3AFileStatus}}*_ and _*{{org.apache.spark.sql.execution.datasources.FileStatusWithMetadata}}*_. The error appears to be related to the new feature SPARK-43039. _*Steps to Reproduce:*_
{code:java}
export AWS_ACCESS_KEY_ID=''
export AWS_SECRET_ACCESS_KEY=''
export AWS_REGION=''

docker run --rm -it apache/spark:3.5.0-scala2.12-java11-ubuntu /opt/spark/bin/spark-shell \
  --packages 'org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-core_2.12:2.4.0' \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.hadoop.aws.region=$AWS_REGION" \
  --conf "spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID" \
  --conf "spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" \
  --conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
  --conf "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \
  --conf "spark.hadoop.fs.s3a.path.style.access=true" \
  --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \
  --conf "spark.jars.ivy=/tmp/ivy/cache"{code}
{code:java}
scala> spark.read.format("delta").load("s3:").show()
{code}
*Logs:*
{code:java}
java.lang.ClassCastException: class org.apache.hadoop.fs.s3a.S3AFileStatus cannot be cast to class org.apache.spark.sql.execution.datasources.FileStatusWithMetadata (org.apache.hadoop.fs.s3a.S3AFileStatus is in unnamed module of loader scala.reflect.internal.util.ScalaClassLoader$URLClassLoader @4552f905; org.apache.spark.sql.execution.datasources.FileStatusWithMetadata is in unnamed module of loader 'app')
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2(DataSourceScanExec.scala:466)
  at org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2$adapted(DataSourceScanExec.scala:466)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.execution.FileSourceScanLike.setFilesNumAndSizeMetric(DataSourceScanExec.scala:466)
  at org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions(DataSourceScanExec.scala:257)
  at org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions$(DataSourceScanExec.scala:251)
  at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions$lzycompute(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions(DataSourceScanExec.scala:286)
  at org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions$(DataSourceScanExec.scala:267)
  at org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions$lzycompute(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions(DataSourceScanExec.scala:506)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:553)
  at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:537)
  at org.apache.spark.sql.execution.FileSourceScanExec.doExecute(DataSourceScanExec.scala:575)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
  at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:527)
  at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:455)
  at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:454)
  at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:498)
  at