[jira] [Resolved] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
[ https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47163.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 45253
[https://github.com/apache/spark/pull/45253]

> Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
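The shape of such a fix can be pictured as a shell guard: test that the CodeHaus Jackson jar actually exists before populating the optional `hive-jackson` directory, so a build produced without it does not fail. The sketch below is illustrative only; the paths and the skip behavior are assumptions, not the actual patch.

```shell
# Hedged sketch of an "existence check first" guard (hypothetical paths;
# not the real make-distribution.sh change).
JAR_DIR="${JAR_DIR:-assembly/target/jars}"
JACKSON_JAR="$JAR_DIR/jackson-core-asl-1.9.13.jar"

if [ -f "$JACKSON_JAR" ]; then
  # The jar was produced by the build: ship it in the optional directory.
  mkdir -p dist/hive-jackson
  cp "$JACKSON_JAR" dist/hive-jackson/
  echo "hive-jackson populated"
else
  # Check existence first, so a build without the jar skips this step
  # instead of failing on the cp.
  echo "jackson-core-asl-1.9.13.jar not found; skipping hive-jackson step"
fi
```

Run from a directory without the jar, the guard takes the skip branch and creates nothing.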
[jira] [Assigned] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
[ https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47163:
Assignee: Dongjoon Hyun

> Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths
[ https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47162:
Description: This is resolved via SPARK-47152.

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Fix For: 4.0.0
>
> This is resolved via SPARK-47152.
[jira] [Resolved] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists
[ https://issues.apache.org/jira/browse/SPARK-47160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47160.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 45251
[https://github.com/apache/spark/pull/45251]

> Update K8s `Dockerfile` to include `hive-jackson` directory if exists
>
> Key: SPARK-47160
> URL: https://issues.apache.org/jira/browse/SPARK-47160
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
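A plain Dockerfile `COPY` fails when its source directory is absent, so an "include if exists" step is usually expressed as a shell guard inside the build. Below is a minimal sketch of that general pattern with hypothetical paths (`TARGET` stands in for the image filesystem); it is not the actual Dockerfile change.

```shell
# "Include the directory only if it exists" pattern, sketched in shell.
# DIST and TARGET are hypothetical stand-ins for the distribution and the
# image filesystem root.
DIST="${DIST:-dist}"
TARGET="$(mktemp -d)"

mkdir -p "$TARGET/opt/spark"
if [ -d "$DIST/hive-jackson" ]; then
  # Optional directory present: include it in the image.
  cp -r "$DIST/hive-jackson" "$TARGET/opt/spark/hive-jackson"
  echo "hive-jackson included"
else
  # Absent: build proceeds without it instead of erroring out.
  echo "hive-jackson absent; image built without it"
fi
```

With no `dist/hive-jackson` present the guard leaves the image root without the directory; creating it and re-running the guard copies it in.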
[jira] [Resolved] (SPARK-47161) Uses hash key properly for SparkR build on Windows
[ https://issues.apache.org/jira/browse/SPARK-47161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47161.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 45252
[https://github.com/apache/spark/pull/45252]

> Uses hash key properly for SparkR build on Windows
>
> Key: SPARK-47161
> URL: https://issues.apache.org/jira/browse/SPARK-47161
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> The cache is not being used
> https://github.com/apache/spark/actions/runs/8039485831/job/2195633
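When a CI cache "is not being used", the usual cause is a cache key that never matches because it is not derived from the right inputs. The sketch below shows the general idea in shell terms, with `sha256sum` playing the role of GitHub Actions' `hashFiles()`; the input file, its contents, and the key prefix are hypothetical, not the actual workflow fix.

```shell
# Sketch: derive a cache key from a hash of the files that determine the
# build inputs. The DESCRIPTION file here is a hypothetical stand-in
# created only for the demonstration.
workdir="$(mktemp -d)"
printf 'Package: SparkR\nVersion: 4.0.0\n' > "$workdir/DESCRIPTION"

# Hash the input file; any change to it yields a different key,
# any identical file yields the same key (so the cache is reused).
hash="$(sha256sum "$workdir/DESCRIPTION" | cut -d' ' -f1)"
key="sparkr-windows-$hash"
echo "$key"
```

The point of the pattern is that the key is stable across runs with identical inputs, which is what lets the cache actually be restored.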
[jira] [Assigned] (SPARK-47161) Uses hash key properly for SparkR build on Windows
[ https://issues.apache.org/jira/browse/SPARK-47161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47161:
Assignee: Hyukjin Kwon

> Uses hash key properly for SparkR build on Windows
>
> Key: SPARK-47161
> URL: https://issues.apache.org/jira/browse/SPARK-47161
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> The cache is not being used
> https://github.com/apache/spark/actions/runs/8039485831/job/2195633
[jira] [Updated] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
[ https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47163:
Summary: Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first (was: Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first)

> Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first
[ https://issues.apache.org/jira/browse/SPARK-47163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47163:
Labels: pull-request-available (was: )

> Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first
>
> Key: SPARK-47163
> URL: https://issues.apache.org/jira/browse/SPARK-47163
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47163) Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first
Dongjoon Hyun created SPARK-47163:
Summary: Fix `make-distribution.sh` to check `jackson-*-asl-*.jar` existence first
Key: SPARK-47163
URL: https://issues.apache.org/jira/browse/SPARK-47163
Project: Spark
Issue Type: Sub-task
Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths
[ https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47162.
Fix Version/s: 4.0.0
Resolution: Fixed

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Fix For: 4.0.0
[jira] [Updated] (SPARK-47161) Uses hash key properly for SparkR build on Windows
[ https://issues.apache.org/jira/browse/SPARK-47161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47161:
Labels: pull-request-available (was: )

> Uses hash key properly for SparkR build on Windows
>
> Key: SPARK-47161
> URL: https://issues.apache.org/jira/browse/SPARK-47161
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> The cache is not being used
> https://github.com/apache/spark/actions/runs/8039485831/job/2195633
[jira] [Updated] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer class paths
[ https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47162:
Summary: Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer class paths (was: Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer)

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer class paths
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
[jira] [Created] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer
Dongjoon Hyun created SPARK-47162:
Summary: Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer
Key: SPARK-47162
URL: https://issues.apache.org/jira/browse/SPARK-47162
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47162) Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths
[ https://issues.apache.org/jira/browse/SPARK-47162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47162:
Summary: Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths (was: Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer class paths)

> Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths
>
> Key: SPARK-47162
> URL: https://issues.apache.org/jira/browse/SPARK-47162
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
[jira] [Created] (SPARK-47161) Uses hash key properly for SparkR build on Windows
Hyukjin Kwon created SPARK-47161:
Summary: Uses hash key properly for SparkR build on Windows
Key: SPARK-47161
URL: https://issues.apache.org/jira/browse/SPARK-47161
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

The cache is not being used
https://github.com/apache/spark/actions/runs/8039485831/job/2195633
[jira] [Updated] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists
[ https://issues.apache.org/jira/browse/SPARK-47160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47160:
Parent: SPARK-47046
Issue Type: Sub-task (was: Improvement)

> Update K8s `Dockerfile` to include `hive-jackson` directory if exists
>
> Key: SPARK-47160
> URL: https://issues.apache.org/jira/browse/SPARK-47160
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists
Dongjoon Hyun created SPARK-47160:
Summary: Update K8s `Dockerfile` to include `hive-jackson` directory if exists
Key: SPARK-47160
URL: https://issues.apache.org/jira/browse/SPARK-47160
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47160) Update K8s `Dockerfile` to include `hive-jackson` directory if exists
[ https://issues.apache.org/jira/browse/SPARK-47160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47160:
Labels: pull-request-available (was: )

> Update K8s `Dockerfile` to include `hive-jackson` directory if exists
>
> Key: SPARK-47160
> URL: https://issues.apache.org/jira/browse/SPARK-47160
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
[ https://issues.apache.org/jira/browse/SPARK-47159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47159.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 45249
[https://github.com/apache/spark/pull/45249]

> Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
>
> Key: SPARK-47159
> URL: https://issues.apache.org/jira/browse/SPARK-47159
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
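For context, the change amounts to exporting one environment variable before the tests run. On macOS, the Objective-C runtime aborts a fork()ed child process that touches the runtime without exec()ing, which commonly crashes Python multiprocessing workloads; setting this variable disables that safety check, and other systems simply ignore it. A minimal sketch of the export:

```shell
# What a CI job sets before running tests on macOS runners.
# On other OSes the variable is inert, so exporting it unconditionally
# is harmless.
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
echo "OBJC_DISABLE_INITIALIZE_FORK_SAFETY=$OBJC_DISABLE_INITIALIZE_FORK_SAFETY"
```

Any process started after this export (test runners, Python workers) inherits the setting.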
[jira] [Assigned] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
[ https://issues.apache.org/jira/browse/SPARK-47159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47159:
Assignee: Dongjoon Hyun

> Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
>
> Key: SPARK-47159
> URL: https://issues.apache.org/jira/browse/SPARK-47159
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
[ https://issues.apache.org/jira/browse/SPARK-47159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47159:
Labels: pull-request-available (was: )

> Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
>
> Key: SPARK-47159
> URL: https://issues.apache.org/jira/browse/SPARK-47159
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47159) Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
Dongjoon Hyun created SPARK-47159:
Summary: Set `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in `MacOS` GitHub Action Job
Key: SPARK-47159
URL: https://issues.apache.org/jira/browse/SPARK-47159
Project: Spark
Issue Type: Test
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47158:
Labels: pull-request-available (was: )

> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
>
> Key: SPARK-47158
> URL: https://issues.apache.org/jira/browse/SPARK-47158
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[jira] [Updated] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haejoon Lee updated SPARK-47158:
Summary: Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 (was: Assign proper name to top LEGACY errors)

> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
>
> Key: SPARK-47158
> URL: https://issues.apache.org/jira/browse/SPARK-47158
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[jira] [Created] (SPARK-47158) Assign proper name to top LEGACY errors
Haejoon Lee created SPARK-47158:
Summary: Assign proper name to top LEGACY errors
Key: SPARK-47158
URL: https://issues.apache.org/jira/browse/SPARK-47158
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Haejoon Lee

Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[jira] [Resolved] (SPARK-47120) Null comparison push down data filter from subquery produces in NPE in Parquet filter
[ https://issues.apache.org/jira/browse/SPARK-47120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47120.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 45202
[https://github.com/apache/spark/pull/45202]

> Null comparison push down data filter from subquery produces in NPE in Parquet filter
>
> Key: SPARK-47120
> URL: https://issues.apache.org/jira/browse/SPARK-47120
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: Cosmin Dumitru
> Assignee: Cosmin Dumitru
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> This issue has been introduced in [https://github.com/apache/spark/pull/41088] where we convert scalar subqueries to literals and then convert the literals to {{org.apache.spark.sql.sources.Filters}}. These filters are then pushed down to parquet.
> If the literal is a comparison with {{null}} then the parquet filter conversion code throws NPE.
>
> repro code which results in NPE
> {code:java}
> create table t1(d date) using parquet
> create table t2(d date) using parquet
> insert into t1 values date'2021-01-01'
> insert into t2 values (null)
> select * from t1 where 1=1 and d > (select d from t2)
> {code}
> [fix PR|https://github.com/apache/spark/pull/45202/files]
[jira] [Resolved] (SPARK-47157) Introduce abstraction that facilitates more flexible management of file listings within the system.
[ https://issues.apache.org/jira/browse/SPARK-47157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47157.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 45224
[https://github.com/apache/spark/pull/45224]

> Introduce abstraction that facilitates more flexible management of file listings within the system.
>
> Key: SPARK-47157
> URL: https://issues.apache.org/jira/browse/SPARK-47157
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: Costas Zarifis
> Assignee: Costas Zarifis
> Priority: Major
> Labels: pull-request-available, sql
> Fix For: 4.0.0
>
> The introduction of these constructs is crucial for defining a standardized API for file listing operations, regardless of the underlying representation that's used to represent files and partitions. By introducing said abstractions we improve the modularity of the code and enable future improvements that can prove to be beneficial both for runtime and for memory.
[jira] [Commented] (SPARK-41125) Simple call to createDataFrame fails with PicklingError but only on python3.11
[ https://issues.apache.org/jira/browse/SPARK-41125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820502#comment-17820502 ]

Jamie commented on SPARK-41125:
I can confirm this now works on python 3.11 using the latest available version of spark.

> Simple call to createDataFrame fails with PicklingError but only on python3.11
>
> Key: SPARK-41125
> URL: https://issues.apache.org/jira/browse/SPARK-41125
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.1
> Reporter: Jamie
> Priority: Minor
> Attachments: screenshot-1.png
>
> I am using python's pytest library to write unit tests for a pyspark library I am building. pytest has a popular capability called fixtures which allow us to write reusable preparation steps for our tests. I have a simple fixture that creates a pyspark.sql.DataFrame which works on python 3.7, 3.8, 3.9, 3.10 but fails on python 3.11.
> The failing code is in a fixture called {{dataframe_of_purchases}}. Here is my fixtures code:
> {code:python}
> from decimal import Decimal
>
> import pytest
> from pyspark.sql import DataFrame, SparkSession
> from pyspark.sql.types import (
>     DecimalType,
>     IntegerType,
>     StringType,
>     StructField,
>     StructType,
> )
>
> @pytest.fixture(scope="session")
> def purchases_schema():
>     return StructType(
>         [
>             StructField("Customer", StringType(), True),
>             StructField("Store", StringType(), True),
>             StructField("Channel", StringType(), True),
>             StructField("Product", StringType(), True),
>             StructField("Quantity", IntegerType(), True),
>             StructField("Basket", StringType(), True),
>             StructField("GrossSpend", DecimalType(10, 2), True),
>         ]
>     )
>
> @pytest.fixture(scope="session")
> def dataframe_of_purchases(purchases_schema) -> DataFrame:
>     spark = SparkSession.builder.getOrCreate()
>     return spark.createDataFrame(
>         data=[
>             ("Leia", "Hammersmith", "Instore", "Cheddar", 2, "Basket1", Decimal(2.50))
>         ],
>         schema=purchases_schema,
>     )
> {code}
> This code can be seen here:
> [https://github.com/jamiekt/jstark/blob/9e1d0e654195932a0765f66db6c8359ed8b60a3b/tests/conftest.py]
> The tests run in a GitHub Actions CI pipeline against many different versions of python on linux, Windows & MacOS. The tests only fail for python 3.11, and on all platforms:
> !screenshot-1.png!
> This run can be seen at: https://github.com/jamiekt/jstark/actions/runs/3457011099
> The error is
> {quote}_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range{quote}
> The full stacktrace is:
> {code}
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/sql/session.py:894: in createDataFrame
>     return self._create_dataframe(
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/sql/session.py:938: in _create_dataframe
>     jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3113: in _to_java_object_rdd
>     return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3505: in _jrdd
>     wrapped_func = _wrap_function(
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3362: in _wrap_function
>     pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
> ../../../.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/rdd.py:3345: in _prepare_for_python_RDD
>     pickled_command = ser.dumps(command)
> Traceback (most recent call last):
>   File "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/serializers.py", line 458, in dumps
>     return cloudpickle.dumps(obj, pickle_protocol)
>   File "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
>     cp.dump(obj)
>   File "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/lib/python3.11/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 602, in dump
>     return Pickler.dump(self, obj)
>   File "/home/runner/.local/share/hatch/env/virtual/jstark/fjzPEUEi/jstark/
[jira] [Updated] (SPARK-47156) SparkSession returns a null context during a dataset creation
[ https://issues.apache.org/jira/browse/SPARK-47156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Le Bihan updated SPARK-47156:
Description:

I need first to know if I'm in front of a bug or not. If it's the case, I'll manage to create a test to help you reproduce the case, but if it isn't, maybe the Spark documentation could explain when {{sparkSession.getContext()}} can return {{null}}.

I'm willing to ease my development by separating:
* parquet files management { checking existence, then loading them as cache, or saving data to them },
* from dataset creation, when it doesn't exist yet and should be constituted from scratch.

The method I'm using is this one:

{code:java}
protected Dataset<Row> constitutionStandard(OptionsCreationLecture optionsCreationLecture,
      Supplier<Dataset<Row>> worker, CacheParqueteur cacheParqueteur) {
   OptionsCreationLecture options = optionsCreationLecture != null ? optionsCreationLecture : optionsCreationLecture();

   Dataset<Row> dataset = cacheParqueteur.call(options.useCache());
   return dataset == null ? cacheParqueteur.save(cacheParqueteur.appliquer(worker.get())) : dataset;
}
{code}

In case the dataset doesn't exist in parquet files (= cache) yet, it starts its creation by calling {{worker.get()}}, which is a {{Supplier}} of {{Dataset}}. A concrete usage is this one:

{code:java}
public Dataset<Row> rowEtablissements(OptionsCreationLecture optionsCreationLecture, HistoriqueExecution historiqueExecution,
      int anneeCOG, int anneeSIRENE, boolean actifsSeulement, boolean communesValides, boolean nomenclaturesNAF2Valides) {
   OptionsCreationLecture options = optionsCreationLecture != null ? optionsCreationLecture : optionsCreationLecture();

   Supplier<Dataset<Row>> worker = () -> {
      super.setStageDescription(this.messageSource, "row.etablissements.libelle.long", "row.etablissements.libelle.court",
         anneeSIRENE, anneeCOG, actifsSeulement, communesValides, nomenclaturesNAF2Valides);

      Map indexs = new HashMap<>();

      Dataset<Row> etablissements = etablissementsNonFiltres(optionsCreationLecture, anneeSIRENE);
      etablissements = etablissements.filter((FilterFunction)etablissement ->
         this.validator.validationEtablissement(this.session, historiqueExecution, etablissement,
            actifsSeulement, nomenclaturesNAF2Valides, indexs));

      // If filtering to valid communes was requested, apply it.
      if (communesValides) {
         etablissements = rowRestreindreAuxCommunesValides(etablissements, anneeCOG, anneeSIRENE, indexs);
      }
      else {
         etablissements = etablissements.withColumn("codeDepartement", substring(CODE_COMMUNE.col(), 1, 2));
      }

      // Attach the labels of the APE/NAF codes.
      Dataset<Row> nomenclatureNAF = this.nafDataset.rowNomenclatureNAF(anneeSIRENE);
      etablissements = etablissements.join(nomenclatureNAF,
            etablissements.col("activitePrincipale").equalTo(nomenclatureNAF.col("codeNAF")), "left_outer")
         .drop("codeNAF", "niveauNAF");

      // The dataset is now considered valid, and its fields can be cast to their final types.
      return this.validator.cast(etablissements);
   };

   return constitutionStandard(options, () -> worker.get().withColumn("partitionSiren", SIREN_ENTREPRISE.col().substr(1, 2)),
      new CacheParqueteur<>(options, this.session, "etablissements",
         "annee_{0,number,#0}-actifs_{1}-communes_verifiees_{2}-nafs_verifies_{3}",
         DEPARTEMENT_SIREN_SIRET, anneeSIRENE, anneeCOG, actifsSeulement, communesValides));
}
{code}

In the worker, a filter calls {{validationEtablissement(SparkSession, HistoriqueExecution, Row, ...)}} on each row to perform complete checking (eight rules to check for an establishment's validity).

When a check fails, along with a warning log, I also count in the {{historiqueExecution}} object the number of problems of that kind I've encountered. That function increases a {{LongAccumulator}} value, creating the accumulator first, and storing it in a {{Map accumulators}}, if needed:

{code:java}
public void incrementerOccurrences(SparkSession session, String codeOuFormatMessage, boolean creerSiAbsent) {
   LongAccumulator accumulator = accumulators.get(codeOuFormatMessage);

   if (accumulator == null && creerSiAbsent) {
      accumulator = session.sparkContext().longAccumulator(codeOuFormatMessage);
      accumulators.put(codeOuFormatMessage, accumulator);
   }

   if (accumulator != null) {
      accumulator.add(1);
   }
}
{code}

Or at least, it should. But my problem is that it isn't the case. During Dataset constitution:

*1)* If I initialize the {{historiqueExecution}} variable with the exhaustive list of messages it can have to count, +*before*+ the {{worker.get()}} is called by the {{constitutionStandard}} method, the dataset is perfectly constituted and I
[jira] [Created] (SPARK-47156) SparkSession returns a null context during a dataset creation
Marc Le Bihan created SPARK-47156:
-------------------------------------

             Summary: SparkSession returns a null context during a dataset creation
                 Key: SPARK-47156
                 URL: https://issues.apache.org/jira/browse/SPARK-47156
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.2
         Environment: Debian 12
Java 17
            Reporter: Marc Le Bihan


I first need to know whether I'm facing a bug or not. If I am, I'll try to create a test to help you reproduce it; if I'm not, maybe the Spark documentation could explain when {{sparkSession.sparkContext()}} can return {{null}}.

To ease my development, I separate:
 * Parquet file management (checking for their existence, then loading them as a cache, or saving data to them),
 * from dataset creation, when the dataset doesn't exist yet and must be built from scratch.

The method I'm using is this one:
{code:java}
protected Dataset<Row> constitutionStandard(OptionsCreationLecture optionsCreationLecture,
   Supplier<Dataset<Row>> worker, CacheParqueteur<Row> cacheParqueteur) {
   OptionsCreationLecture options = optionsCreationLecture != null ? optionsCreationLecture : optionsCreationLecture();

   Dataset<Row> dataset = cacheParqueteur.call(options.useCache());
   return dataset == null ? cacheParqueteur.save(cacheParqueteur.appliquer(worker.get())) : dataset;
}
{code}
If the dataset doesn't yet exist in Parquet files (= the cache), its creation starts with a call to {{worker.get()}}, the worker being a {{Supplier}} of {{Dataset<Row>}}. A concrete usage is this one:
{code:java}
public Dataset<Row> rowEtablissements(OptionsCreationLecture optionsCreationLecture, HistoriqueExecution historiqueExecution,
   int anneeCOG, int anneeSIRENE, boolean actifsSeulement, boolean communesValides, boolean nomenclaturesNAF2Valides) {
   OptionsCreationLecture options = optionsCreationLecture != null ? optionsCreationLecture : optionsCreationLecture();

   Supplier<Dataset<Row>> worker = () -> {
      super.setStageDescription(this.messageSource, "row.etablissements.libelle.long", "row.etablissements.libelle.court",
         anneeSIRENE, anneeCOG, actifsSeulement, communesValides, nomenclaturesNAF2Valides);

      Map indexs = new HashMap<>();
      Dataset<Row> etablissements = etablissementsNonFiltres(optionsCreationLecture, anneeSIRENE);
      etablissements = etablissements.filter((FilterFunction<Row>) etablissement ->
         this.validator.validationEtablissement(this.session, historiqueExecution, etablissement,
            actifsSeulement, nomenclaturesNAF2Valides, indexs));

      // If filtering by valid communes has been requested, apply it.
      if (communesValides) {
         etablissements = rowRestreindreAuxCommunesValides(etablissements, anneeCOG, anneeSIRENE, indexs);
      }
      else {
         etablissements = etablissements.withColumn("codeDepartement", substring(CODE_COMMUNE.col(), 1, 2));
      }

      // Attach the labels of the APE/NAF codes.
      Dataset<Row> nomenclatureNAF = this.nafDataset.rowNomenclatureNAF(anneeSIRENE);
      etablissements = etablissements.join(nomenclatureNAF,
            etablissements.col("activitePrincipale").equalTo(nomenclatureNAF.col("codeNAF")), "left_outer")
         .drop("codeNAF", "niveauNAF");

      // The dataset is now considered valid, and its fields can be cast to their final types.
      return this.validator.cast(etablissements);
   };

   return constitutionStandard(options,
      () -> worker.get().withColumn("partitionSiren", SIREN_ENTREPRISE.col().substr(1, 2)),
      new CacheParqueteur<>(options, this.session,
         "etablissements", "annee_{0,number,#0}-actifs_{1}-communes_verifiees_{2}-nafs_verifies_{3}",
         DEPARTEMENT_SIREN_SIRET, anneeSIRENE, anneeCOG, actifsSeulement, communesValides));
}
{code}
In the worker, a filter calls {{validationEtablissement(SparkSession, HistoriqueExecution, Row, ...)}} on each row to perform complete checking (eight rules to verify an establishment's validity).

When a check fails, along with a warning log, I also count in the {{historiqueExecution}} object the number of problems of that kind I've encountered. That function increments a {{LongAccumulator}} value, creating the accumulator first, if needed, and storing it in a {{Map<String, LongAccumulator> accumulators}}:
{code:java}
public void incrementerOccurrences(SparkSession session, String codeOuFormatMessage, boolean creerSiAbsent) {
   LongAccumulator accumulator = accumulators.get(codeOuFormatMessage);

   if (accumulator == null && creerSiAbsent) {
      accumulator = session.sparkContext().longAccumulator(codeOuFormatMessage);
      accumulators.put(codeOuFormatMessage, accumulator);
   }

   if (accumulator != null) {
      accumulator.add(1);
   }
}
{code}
Or at least, it should. But my problem is that it doesn't. During dataset constitution:

*1)* If I initialize the {{historiqueExecution}} variable with the exhaustive list of messages it may have to count, +*before*+ {{worker.get()}} is called by the {{constitutionStandard}} method, the dataset is perfectly constituted and I
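Stripped of the Spark and Parquet specifics, the cache-or-build flow of {{constitutionStandard}} is: consult a cache first, and only on a miss invoke the {{Supplier}} and store what it built. A minimal plain-Java sketch of that pattern (the {{CacheOrBuild}} class and all names in it are illustrative, not from the project):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative stand-in for the cache-or-build pattern: look the value up in a
// cache first; only on a miss, build it with the supplier and save the result,
// the way constitutionStandard falls back to worker.get() then cacheParqueteur.save(...).
public class CacheOrBuild {
    private final Map<String, String> cache = new HashMap<>();
    public int builds = 0; // counts how many times the expensive builder actually ran

    public String getOrBuild(String key, Supplier<String> worker) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;           // cache hit: the worker is never invoked
        }
        String built = worker.get(); // cache miss: build from scratch...
        builds++;
        cache.put(key, built);       // ...and save it for the next call
        return built;
    }
}
```

The worker being a {{Supplier}} keeps the build lazy: nothing is computed unless the cache misses, which is exactly why any side effects inside the supplier (such as accumulator creation) only happen on a miss.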
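As a side note, the create-if-absent-then-increment logic of {{incrementerOccurrences}} can be written more compactly, and safely under concurrency, with {{Map.computeIfAbsent}}. A sketch with a plain {{AtomicLong}} standing in for Spark's {{LongAccumulator}} (the class and method names here are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative counter registry: AtomicLong stands in for Spark's LongAccumulator.
// computeIfAbsent creates the counter atomically on first use, replacing the
// get / null-check / put sequence of incrementerOccurrences.
public class OccurrenceCounters {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public void increment(String code, boolean createIfAbsent) {
        AtomicLong counter = createIfAbsent
            ? counters.computeIfAbsent(code, k -> new AtomicLong())
            : counters.get(code);
        if (counter != null) {
            counter.incrementAndGet();
        }
    }

    public long count(String code) {
        AtomicLong counter = counters.get(code);
        return counter == null ? 0L : counter.get();
    }
}
```

With real Spark accumulators this map manipulation is only half the story: an accumulator must be registered with the driver's {{SparkContext}} before the job runs, and a usable {{SparkContext}} is generally not available inside code executing on executors, such as the body of a {{FilterFunction}}.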