[jira] [Resolved] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47163.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45253
[https://github.com/apache/spark/pull/45253]

> Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence 
> first
> -
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47163:
-

Assignee: Dongjoon Hyun

> Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence 
> first
> -
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47162:
--
Description: This is resolved via SPARK-47152.

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer 
> classpaths
> ---
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>
> This is resolved via SPARK-47152.






[jira] [Resolved] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47160.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45251
[https://github.com/apache/spark/pull/45251]

> Update K8s `Dockerfile` to include `hive-jackson` directory if exists
> -
>
> Key: SPARK-47160
> URL: https://issues.apache.org/jira/browse/SPARK-47160
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47161) Uses hash key properly for SparkR build on Windows

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47161.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45252
[https://github.com/apache/spark/pull/45252]

> Uses hash key properly for SparkR build on Windows
> --
>
> Key: SPARK-47161
> URL: https://issues.apache.org/jira/browse/SPARK-47161
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The cache is not being used:
> https://github.com/apache/spark/actions/runs/8039485831/job/2195633






[jira] [Assigned] (SPARK-47161) Uses hash key properly for SparkR build on Windows

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47161:
-

Assignee: Hyukjin Kwon

> Uses hash key properly for SparkR build on Windows
> --
>
> Key: SPARK-47161
> URL: https://issues.apache.org/jira/browse/SPARK-47161
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> The cache is not being used:
> https://github.com/apache/spark/actions/runs/8039485831/job/2195633






[jira] [Updated] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47163:
--
Summary: Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` 
existence first  (was: Fix `make-distribution.sh` to check 
`jackson-*-asl-*.jar` existence first)

> Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence 
> first
> -
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first

2024-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47163:
---
Labels: pull-request-available  (was: )

> Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first
> -
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first

2024-02-25 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47163:
-

 Summary: Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` 
existence first
 Key: SPARK-47163
 URL: https://issues.apache.org/jira/browse/SPARK-47163
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47162.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer 
> classpaths
> ---
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47161) Uses hash key properly for SparkR build on Windows

2024-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47161:
---
Labels: pull-request-available  (was: )

> Uses hash key properly for SparkR build on Windows
> --
>
> Key: SPARK-47161
> URL: https://issues.apache.org/jira/browse/SPARK-47161
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> The cache is not being used:
> https://github.com/apache/spark/actions/runs/8039485831/job/2195633






[jira] [Updated] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer class paths

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47162:
--
Summary: Exclude `CodeHaus Jackson` dependencies from 
Master/Worker/HistoryServer class paths  (was: Exclude `CodeHaus Jackson` 
dependencies from Master/Worker/HistoryServer)

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer 
> class paths
> 
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer

2024-02-25 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47162:
-

 Summary: Exclude `CodeHaus Jackson` dependencies from 
Master/Worker/HistoryServer
 Key: SPARK-47162
 URL: https://issues.apache.org/jira/browse/SPARK-47162
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47162:
--
Summary: Exclude `CodeHaus Jackson` dependencies from 
Master/Worker/HistoryServer classpaths  (was: Exclude `CodeHaus Jackson` 
dependencies from Master/Worker/HistoryServer class paths)

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer 
> classpaths
> ---
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-47161) Uses hash key properly for SparkR build on Windows

2024-02-25 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47161:


 Summary: Uses hash key properly for SparkR build on Windows
 Key: SPARK-47161
 URL: https://issues.apache.org/jira/browse/SPARK-47161
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


The cache is not being used:

https://github.com/apache/spark/actions/runs/8039485831/job/2195633






[jira] [Updated] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists

2024-02-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47160:
--
Parent: SPARK-47046
Issue Type: Sub-task  (was: Improvement)

> Update K8s `Dockerfile` to include `hive-jackson` directory if exists
> -
>
> Key: SPARK-47160
> URL: https://issues.apache.org/jira/browse/SPARK-47160
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists

2024-02-25 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47160:
-

 Summary: Update K8s `Dockerfile` to include `hive-jackson` 
directory if exists
 Key: SPARK-47160
 URL: https://issues.apache.org/jira/browse/SPARK-47160
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists

2024-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47160:
---
Labels: pull-request-available  (was: )

> Update K8s `Dockerfile` to include `hive-jackson` directory if exists
> -
>
> Key: SPARK-47160
> URL: https://issues.apache.org/jira/browse/SPARK-47160
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job

2024-02-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47159.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45249
[https://github.com/apache/spark/pull/45249]

> Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
> --
>
> Key: SPARK-47159
> URL: https://issues.apache.org/jira/browse/SPARK-47159
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job

2024-02-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47159:


Assignee: Dongjoon Hyun

> Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
> --
>
> Key: SPARK-47159
> URL: https://issues.apache.org/jira/browse/SPARK-47159
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job

2024-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47159:
---
Labels: pull-request-available  (was: )

> Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
> --
>
> Key: SPARK-47159
> URL: https://issues.apache.org/jira/browse/SPARK-47159
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job

2024-02-25 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47159:
-

 Summary: Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` 
GitHub Action Job
 Key: SPARK-47159
 URL: https://issues.apache.org/jira/browse/SPARK-47159
 Project: Spark
  Issue Type: Test
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231

2024-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47158:
---
Labels: pull-request-available  (was: )

> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
> -
>
> Key: SPARK-47158
> URL: https://issues.apache.org/jira/browse/SPARK-47158
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231






[jira] [Updated] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231

2024-02-25 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-47158:

Summary: Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231  
(was: Assign proper name to top LEGACY errors)

> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
> -
>
> Key: SPARK-47158
> URL: https://issues.apache.org/jira/browse/SPARK-47158
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231






[jira] [Created] (SPARK-47158) Assign proper name to top LEGACY errors

2024-02-25 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-47158:
---

 Summary: Assign proper name to top LEGACY errors
 Key: SPARK-47158
 URL: https://issues.apache.org/jira/browse/SPARK-47158
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Haejoon Lee


Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231






[jira] [Resolved] (SPARK-47120) Null comparison push-down data filter from subquery produces NPE in Parquet filter

2024-02-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47120.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45202
[https://github.com/apache/spark/pull/45202]

> Null comparison push-down data filter from subquery produces NPE in
> Parquet filter
> -
>
> Key: SPARK-47120
> URL: https://issues.apache.org/jira/browse/SPARK-47120
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Cosmin Dumitru
>Assignee: Cosmin Dumitru
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This issue was introduced in
> [https://github.com/apache/spark/pull/41088], where we convert scalar
> subqueries to literals and then convert the literals to
> {{org.apache.spark.sql.sources.Filter}} instances. These filters are then
> pushed down to Parquet.
> If the literal is a comparison with {{null}}, the Parquet filter
> conversion code throws an NPE.
>  
> Repro code which results in an NPE:
> {code:java}
> create table t1(d date) using parquet
> create table t2(d date) using parquet
> insert into t1 values date'2021-01-01'
> insert into t2 values (null)
> select * from t1 where 1=1 and d > (select d from t2){code}
> [fix PR |https://github.com/apache/spark/pull/45202/files]
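For reference, a minimal Java sketch that drives the same repro through a
programmatic SparkSession; this is illustrative only (it assumes a local
session with a default warehouse) and follows the table names in the SQL above:
{code:java}
import org.apache.spark.sql.SparkSession;

public class NullPushdownRepro {
    public static void main(String[] args) {
        // Illustrative sketch: mirrors the SQL repro above, not the fix itself.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .getOrCreate();

        spark.sql("CREATE TABLE t1(d DATE) USING parquet");
        spark.sql("CREATE TABLE t2(d DATE) USING parquet");
        spark.sql("INSERT INTO t1 VALUES (DATE'2021-01-01')");
        spark.sql("INSERT INTO t2 VALUES (NULL)");

        // The scalar subquery collapses to a null literal; converting the
        // resulting comparison into a Parquet pushdown filter is where the
        // reported NPE occurred before the fix.
        spark.sql("SELECT * FROM t1 WHERE 1=1 AND d > (SELECT d FROM t2)").show();

        spark.stop();
    }
}
{code}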






[jira] [Resolved] (SPARK-47157) Introduce abstraction that facilitates more flexible management of file listings within the system.

2024-02-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47157.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45224
[https://github.com/apache/spark/pull/45224]

> Introduce abstraction that facilitates more flexible management of file 
> listings within the system.
> ---
>
> Key: SPARK-47157
> URL: https://issues.apache.org/jira/browse/SPARK-47157
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Costas Zarifis
>Assignee: Costas Zarifis
>Priority: Major
>  Labels: pull-request-available, sql
> Fix For: 4.0.0
>
>
> The introduction of these constructs is crucial for defining a standardized
> API for file listing operations, regardless of the underlying representation
> used for files and partitions. By introducing these abstractions we improve
> the modularity of the code and enable future changes that can benefit both
> runtime and memory usage.
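Purely as an illustration of the kind of abstraction described (the names
below are hypothetical, not the API actually introduced by the PR), such a
representation-agnostic file-listing interface could look like this in Java:
{code:java}
import java.util.List;

// Hypothetical sketch: callers see partitions and files through one API,
// independent of how the listing is materialized underneath.
interface FileListing {
    List<String> partitionNames();             // logical partitions
    List<ListedFile> files(String partition);  // files under one partition
}

final class ListedFile {
    final String path;
    final long sizeInBytes;

    ListedFile(String path, long sizeInBytes) {
        this.path = path;
        this.sizeInBytes = sizeInBytes;
    }
}
{code}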






[jira] [Commented] (SPARK-41125) Simple call to createDataFrame fails with PicklingError but only on python3.11

2024-02-25 Thread Jamie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820502#comment-17820502
 ] 

Jamie commented on SPARK-41125:
---

I can confirm this now works on Python 3.11 using the latest available version
of Spark.

> Simple call to createDataFrame fails with PicklingError but only on python3.11
> --
>
> Key: SPARK-41125
> URL: https://issues.apache.org/jira/browse/SPARK-41125
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Jamie
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> I am using Python's pytest library to write unit tests for a PySpark library
> I am building. pytest has a popular capability called fixtures, which allows
> us to write reusable preparation steps for our tests. I have a simple fixture
> that creates a pyspark.sql.DataFrame; it works on Python 3.7, 3.8, 3.9, and
> 3.10 but fails on Python 3.11.
> The failing code is in a fixture called {{dataframe_of_purchases}}. Here
> is my fixture code:
> {code:python}
> from decimal import Decimal
>
> import pytest
> from pyspark.sql import DataFrame, SparkSession
> from pyspark.sql.types import (
>     DecimalType,
>     IntegerType,
>     StringType,
>     StructField,
>     StructType,
> )
>
>
> @pytest.fixture(scope="session")
> def purchases_schema():
>     return StructType(
>         [
>             StructField("Customer", StringType(), True),
>             StructField("Store", StringType(), True),
>             StructField("Channel", StringType(), True),
>             StructField("Product", StringType(), True),
>             StructField("Quantity", IntegerType(), True),
>             StructField("Basket", StringType(), True),
>             StructField("GrossSpend", DecimalType(10, 2), True),
>         ]
>     )
>
>
> @pytest.fixture(scope="session")
> def dataframe_of_purchases(purchases_schema) -> DataFrame:
>     spark = SparkSession.builder.getOrCreate()
>     return spark.createDataFrame(
>         data=[
>             ("Leia", "Hammersmith", "Instore", "Cheddar", 2, "Basket1",
>              Decimal(2.50))
>         ],
>         schema=purchases_schema,
>     )
> {code}
> This code can be seen here: 
> [https://github.com/jamiekt/jstark/blob/9e1d0e654195932a0765f66db6c8359ed8b60a3b/tests/conftest.py]
> The tests run in a GitHub Actions CI pipeline against many different versions
> of Python on Linux, Windows & macOS. The tests only fail for Python 3.11, and
> on all platforms:
>  !screenshot-1.png! 
> This run can be seen at: 
> https://github.com/jamiekt/jstark/actions/runs/3457011099
> The error is
> {quote}_pickle.PicklingError: Could not serialize object: IndexError: tuple 
> index out of range{quote}
> The full stacktrace is:
>  
> {code}
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/sql/session.py:894:
>  in createDataFrame
> return self._create_dataframe(
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/sql/session.py:938:
>  in _create_dataframe
> jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3113:
>  in _to_java_object_rdd
> return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3505:
>  in _jrdd
> wrapped_func = _wrap_function(
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3362:
>  in _wrap_function
> pickled_command, broadcast_vars, env, includes = 
> _prepare_for_python_RDD(sc, command)
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3345:
>  in _prepare_for_python_RDD
> pickled_command = ser.dumps(command)
> Traceback (most recent call last):
>   File 
> "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/serializers.py",
>  line 458, in dumps
> return cloudpickle.dumps(obj, pickle_protocol)
>^^^
>   File 
> "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
>  line 73, in dumps
> cp.dump(obj)
>   File 
> "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py",
>  line 602, in dump
> return Pickler.dump(self, obj)
>^^^
>   File 
> "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/

[jira] [Updated] (SPARK-47156) SparkSession returns a null context during a dataset creation

2024-02-25 Thread Marc Le Bihan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Le Bihan updated SPARK-47156:
--
Description: 
First, I need to know whether I'm facing a bug or not.

If it is, I'll create a test to help you reproduce it, but if it isn't, maybe
the Spark documentation could explain when {{sparkSession.getContext()}} can
return {{null}}.

 

I want to ease my development by separating:
 * Parquet file management (checking existence, then loading the files as a
cache, or saving data to them),
 * from dataset creation, when the dataset doesn't exist yet and should be
constituted from scratch.

 

The method I'm using is this one:
{code:java}
protected Dataset<Row> constitutionStandard(OptionsCreationLecture optionsCreationLecture,
      Supplier<Dataset<Row>> worker, CacheParqueteur<Row> cacheParqueteur) {
   OptionsCreationLecture options = optionsCreationLecture != null ?
      optionsCreationLecture : optionsCreationLecture();

   Dataset<Row> dataset = cacheParqueteur.call(options.useCache());
   return dataset == null ?
      cacheParqueteur.save(cacheParqueteur.appliquer(worker.get())) : dataset;
}
{code}
If the dataset doesn't exist in Parquet files (the cache) yet, it starts its
creation by calling {{worker.get()}}, which is a {{Supplier}} of
{{Dataset<Row>}}.

 

A concrete usage is this one:
{code:java}
public Dataset<Row> rowEtablissements(OptionsCreationLecture optionsCreationLecture,
      HistoriqueExecution historiqueExecution, int anneeCOG, int anneeSIRENE,
      boolean actifsSeulement, boolean communesValides, boolean nomenclaturesNAF2Valides) {
   OptionsCreationLecture options = optionsCreationLecture != null ?
      optionsCreationLecture : optionsCreationLecture();

   Supplier<Dataset<Row>> worker = () -> {
      super.setStageDescription(this.messageSource,
         "row.etablissements.libelle.long", "row.etablissements.libelle.court",
         anneeSIRENE, anneeCOG, actifsSeulement, communesValides, nomenclaturesNAF2Valides);

      Map indexs = new HashMap<>();
      Dataset<Row> etablissements = etablissementsNonFiltres(optionsCreationLecture, anneeSIRENE);

      etablissements = etablissements.filter(
         (FilterFunction<Row>) etablissement ->
            this.validator.validationEtablissement(this.session, historiqueExecution,
               etablissement, actifsSeulement, nomenclaturesNAF2Valides, indexs));

      // If filtering on valid communes was requested, apply it.
      if (communesValides) {
         etablissements = rowRestreindreAuxCommunesValides(etablissements,
            anneeCOG, anneeSIRENE, indexs);
      }
      else {
         etablissements = etablissements.withColumn("codeDepartement",
            substring(CODE_COMMUNE.col(), 1, 2));
      }

      // Attach the labels of the APE/NAF codes.
      Dataset<Row> nomenclatureNAF = this.nafDataset.rowNomenclatureNAF(anneeSIRENE);
      etablissements = etablissements.join(nomenclatureNAF,
            etablissements.col("activitePrincipale").equalTo(nomenclatureNAF.col("codeNAF")),
            "left_outer")
         .drop("codeNAF", "niveauNAF");

      // The dataset is now considered valid, and its fields can be cast
      // to their final types.
      return this.validator.cast(etablissements);
   };

   return constitutionStandard(options, () -> worker.get()
         .withColumn("partitionSiren", SIREN_ENTREPRISE.col().substr(1, 2)),
      new CacheParqueteur<>(options, this.session,
         "etablissements",
         "annee_{0,number,#0}-actifs_{1}-communes_verifiees_{2}-nafs_verifies_{3}",
         DEPARTEMENT_SIREN_SIRET,
         anneeSIRENE, anneeCOG, actifsSeulement, communesValides));
}
{code}
 

In the worker, a filter calls {{validationEtablissement(SparkSession,
HistoriqueExecution, Row, ...)}} on each row to perform complete checking
(eight rules verifying an establishment's validity).

When a check fails, along with a warning log, I also count in the
{{historiqueExecution}} object the number of problems of that kind I've
encountered.

That function increments a {{LongAccumulator}} value, creating that
accumulator first and storing it in a {{Map<String, LongAccumulator>
accumulators}} if needed.
{code:java}
public void incrementerOccurrences(SparkSession session, String 
codeOuFormatMessage, boolean creerSiAbsent) {
   LongAccumulator accumulator = accumulators.get(codeOuFormatMessage);

   if (accumulator == null && creerSiAbsent) {
  accumulator = session.sparkContext().longAccumulator(codeOuFormatMessage);
  accumulators.put(codeOuFormatMessage, accumulator);
   }

   if (accumulator != null) {
  accumulator.add(1);
   }
}
{code}
 

Or at least, it should. But my problem is that it doesn't.

During Dataset constitution:

*1)* If I initialize the {{historiqueExecution}} variable with the exhaustive
list of messages it may have to count, +*before*+ {{worker.get()}} is
called by the {{constitutionStandard}} method, the dataset is perfectly
constituted and I
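A minimal Java sketch of the workaround described in point *1)*: register
every accumulator on the driver before the worker runs, so executor-side code
only increments existing accumulators and never touches the SparkContext. The
{{registerAll}} helper and the message-code list are illustrative, not the
reporter's actual API:
{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public class HistoriqueExecutionSketch {
    private final Map<String, LongAccumulator> accumulators = new ConcurrentHashMap<>();

    // Driver side: create every accumulator while the SparkContext is live,
    // i.e. before constitutionStandard() triggers worker.get().
    public void registerAll(SparkSession session, List<String> messageCodes) {
        for (String code : messageCodes) {
            accumulators.put(code, session.sparkContext().longAccumulator(code));
        }
    }

    // Executor side: only increments; never creates accumulators lazily,
    // which is the step that fails once the context is out of reach.
    public void incrementerOccurrences(String codeOuFormatMessage) {
        LongAccumulator accumulator = accumulators.get(codeOuFormatMessage);
        if (accumulator != null) {
            accumulator.add(1);
        }
    }
}
{code}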

[jira] [Created] (SPARK-47156) SparkSession returns a null context during a dataset creation

2024-02-25 Thread Marc Le Bihan (Jira)
Marc Le Bihan created SPARK-47156:
-

 Summary: SparkSession returns a null context during a dataset 
creation
 Key: SPARK-47156
 URL: https://issues.apache.org/jira/browse/SPARK-47156
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.2
 Environment: Debian 12

Java 17
Reporter: Marc Le Bihan


First, I need to know whether I'm facing a bug or not.

If it is, I'll create a test to help you reproduce it, but if it isn't, maybe
the Spark documentation could explain when {{sparkSession.getContext()}} can
return {{null}}.

 

I want to ease my development by separating:
 * Parquet file management (checking existence, then loading the files as a
cache, or saving data to them),
 * from dataset creation, when the dataset doesn't exist yet and should be
constituted from scratch.

 

The method I'm using is this one:

 
{code:java}
protected Dataset<Row> constitutionStandard(OptionsCreationLecture optionsCreationLecture,
      Supplier<Dataset<Row>> worker, CacheParqueteur<Row> cacheParqueteur) {
   OptionsCreationLecture options = optionsCreationLecture != null ?
      optionsCreationLecture : optionsCreationLecture();

   Dataset<Row> dataset = cacheParqueteur.call(options.useCache());
   return dataset == null ?
      cacheParqueteur.save(cacheParqueteur.appliquer(worker.get())) : dataset;
}
{code}
If the dataset doesn't exist in Parquet files (the cache) yet, it starts its
creation by calling {{worker.get()}}, which is a {{Supplier}} of
{{Dataset<Row>}}.

 

 

A concrete usage is this one:
{code:java}
public Dataset<Row> rowEtablissements(OptionsCreationLecture optionsCreationLecture,
      HistoriqueExecution historiqueExecution, int anneeCOG, int anneeSIRENE,
      boolean actifsSeulement, boolean communesValides, boolean nomenclaturesNAF2Valides) {
   OptionsCreationLecture options = optionsCreationLecture != null ?
      optionsCreationLecture : optionsCreationLecture();

   Supplier<Dataset<Row>> worker = () -> {
      super.setStageDescription(this.messageSource,
         "row.etablissements.libelle.long", "row.etablissements.libelle.court",
         anneeSIRENE, anneeCOG, actifsSeulement, communesValides, nomenclaturesNAF2Valides);

      Map indexs = new HashMap<>();
      Dataset<Row> etablissements = etablissementsNonFiltres(optionsCreationLecture, anneeSIRENE);

      etablissements = etablissements.filter(
         (FilterFunction<Row>) etablissement ->
            this.validator.validationEtablissement(this.session, historiqueExecution,
               etablissement, actifsSeulement, nomenclaturesNAF2Valides, indexs));

      // If filtering on valid communes was requested, apply it.
      if (communesValides) {
         etablissements = rowRestreindreAuxCommunesValides(etablissements,
            anneeCOG, anneeSIRENE, indexs);
      }
      else {
         etablissements = etablissements.withColumn("codeDepartement",
            substring(CODE_COMMUNE.col(), 1, 2));
      }

      // Attach the labels of the APE/NAF codes.
      Dataset<Row> nomenclatureNAF = this.nafDataset.rowNomenclatureNAF(anneeSIRENE);
      etablissements = etablissements.join(nomenclatureNAF,
            etablissements.col("activitePrincipale").equalTo(nomenclatureNAF.col("codeNAF")),
            "left_outer")
         .drop("codeNAF", "niveauNAF");

      // The dataset is now considered valid, and its fields can be cast
      // to their final types.
      return this.validator.cast(etablissements);
   };

   return constitutionStandard(options, () -> worker.get()
         .withColumn("partitionSiren", SIREN_ENTREPRISE.col().substr(1, 2)),
      new CacheParqueteur<>(options, this.session,
         "etablissements",
         "annee_{0,number,#0}-actifs_{1}-communes_verifiees_{2}-nafs_verifies_{3}",
         DEPARTEMENT_SIREN_SIRET,
         anneeSIRENE, anneeCOG, actifsSeulement, communesValides));
}
{code}
 

In the worker, a filter calls {{validationEtablissement(SparkSession,
HistoriqueExecution, Row, ...)}} on each row to perform complete checking
(eight rules verifying an establishment's validity).

 

When a check fails, along with a warning log, I also count in the
{{historiqueExecution}} object the number of problems of that kind I've
encountered.

That function increments a {{LongAccumulator}} value, creating that
accumulator first and storing it in a {{Map<String, LongAccumulator>
accumulators}} if needed.
{code:java}
public void incrementerOccurrences(SparkSession session, String 
codeOuFormatMessage, boolean creerSiAbsent) {
   LongAccumulator accumulator = accumulators.get(codeOuFormatMessage);

   if (accumulator == null && creerSiAbsent) {
  accumulator = session.sparkContext().longAccumulator(codeOuFormatMessage);
  accumulators.put(codeOuFormatMessage, accumulator);
   }

   if (accumulator != null) {
  accumulator.add(1);
   }
}
{code}
 

 

Or at least, it should. But my problem is that it doesn't.

During Dataset cons