[jira] [Commented] (SPARK-40099) Merge adjacent CaseWhen branches if their values are the same
[ https://issues.apache.org/jira/browse/SPARK-40099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686249#comment-17686249 ] daile commented on SPARK-40099: --- [~yumwang] can you help review it again? > Merge adjacent CaseWhen branches if their values are the same > - > > Key: SPARK-40099 > URL: https://issues.apache.org/jira/browse/SPARK-40099 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > CASE > WHEN f1.buyer_id IS NOT NULL THEN 1 > WHEN f2.buyer_id IS NOT NULL THEN 1 > ELSE 0 > END > {code} > The expected result: > {code:sql} > CASE > WHEN f1.buyer_id IS NOT NULL OR f2.buyer_id IS NOT NULL > THEN 1 > ELSE 0 > END > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
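The rewrite the ticket asks for can be sketched outside Spark: model the CASE expression as an ordered list of (condition, value) branches and OR together the conditions of adjacent branches that share the same value. This is an illustrative Python model, not Spark's actual Catalyst rule; the string-based conditions and the `merge_adjacent_branches` helper are assumptions for the example.

```python
# Illustrative sketch (not Spark's optimizer code): a CASE expression is
# modeled as a list of (condition, value) branches. Adjacent branches with
# equal values are merged by OR-ing their conditions.
def merge_adjacent_branches(branches):
    merged = []
    for cond, value in branches:
        if merged and merged[-1][1] == value:
            # Same value as the previous branch: combine the conditions.
            prev_cond, _ = merged[-1]
            merged[-1] = (f"({prev_cond}) OR ({cond})", value)
        else:
            merged.append((cond, value))
    return merged

branches = [
    ("f1.buyer_id IS NOT NULL", 1),
    ("f2.buyer_id IS NOT NULL", 1),
]
print(merge_adjacent_branches(branches))
# → [('(f1.buyer_id IS NOT NULL) OR (f2.buyer_id IS NOT NULL)', 1)]
```

Only adjacent branches can be merged safely: CASE evaluates branches in order, so merging across an intervening branch with a different value could change which branch wins when conditions overlap.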
[jira] [Comment Edited] (SPARK-4073) Parquet+Snappy can cause significant off-heap memory usage
[ https://issues.apache.org/jira/browse/SPARK-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686239#comment-17686239 ] shufan edited comment on SPARK-4073 at 2/9/23 7:46 AM: --- I had a similar problem. When I submitted a Hive on Spark task that joined two tables, one of the big tables was Parquet + Snappy, about 5G in size with 100 million rows of data, and the executor was killed by k8s. The configuration was: set spark.executor.memoryOverhead=6g; set spark.executor.memory=5g; set spark.executor.cores=4; set spark.executor.instances=2; set spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=4096m -Dio.netty.maxDirectMemory=104857600; With the above configuration, the JVM memory usage exceeds 11G. The executor has less than 5G of heap memory and the Direct ByteBuffers less than 4G, which together are around 9G. 11 - 9 = 2G of memory is unaccounted for. Can you tell me which part of the remaining 2G of non-heap memory is used? Is there any way to limit it? was (Author: shufan084): I had a similar problem. When I submitted a Hive on Spark task that joined two tables, one of the big tables was Parquet + Snappy, about 5G in size with 100 million rows of data, and the executor was killed by k8s. The configuration was: set spark.executor.memoryOverhead=6g; set spark.executor.memory=5g; set spark.executor.cores=4; set spark.executor.instances=2; set spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=4096m -Dio.netty.maxDirectMemory=104857600; With the above configuration, the JVM memory usage exceeds 11G. The executor has less than 5G of heap memory and the Direct ByteBuffers less than 4G, which together are around 9G. 11 - 9 = 2G of memory is unaccounted for. Can you tell me which part of the remaining 2G of memory is used? Is there any way to limit it? 
> Parquet+Snappy can cause significant off-heap memory usage > -- > > Key: SPARK-4073 > URL: https://issues.apache.org/jira/browse/SPARK-4073 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Patrick Wendell >Priority: Critical > > The parquet snappy codec allocates off-heap buffers for decompression[1]. In > one case the observed size of these buffers was high enough to add several > GB of data to the overall virtual memory usage of the Spark executor process. > I don't understand enough about our use of Snappy to fully grok how much data > we would _expect_ to be present in these buffers at any given time, but I can > say a few things. > 1. The dataset had individual rows that were fairly large, e.g. megabytes. > 2. Direct buffers are not cleaned up until GC events, and overall there was > not much heap contention. So maybe they just weren't being cleaned. > I opened PARQUET-118 to see if they can provide an option to use on-heap > buffers for decompression. In the meantime, we could consider changing the > default back to gzip, or we could do nothing (not sure how many other users > will hit this). > [1] > https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4073) Parquet+Snappy can cause significant off-heap memory usage
[ https://issues.apache.org/jira/browse/SPARK-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686239#comment-17686239 ] shufan commented on SPARK-4073: --- I had a similar problem. When I submitted a Hive on Spark task that joined two tables, one of the big tables was Parquet + Snappy, about 5G in size with 100 million rows of data, and the executor was killed by k8s. The configuration was: set spark.executor.memoryOverhead=6g; set spark.executor.memory=5g; set spark.executor.cores=4; set spark.executor.instances=2; set spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=4096m -Dio.netty.maxDirectMemory=104857600; With the above configuration, the JVM memory usage exceeds 11G. The executor has less than 5G of heap memory and the Direct ByteBuffers less than 4G, which together are around 9G. 11 - 9 = 2G of memory is unaccounted for. Can you tell me which part of the remaining 2G of memory is used? Is there any way to limit it? > Parquet+Snappy can cause significant off-heap memory usage > -- > > Key: SPARK-4073 > URL: https://issues.apache.org/jira/browse/SPARK-4073 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Patrick Wendell >Priority: Critical > > The parquet snappy codec allocates off-heap buffers for decompression[1]. In > one case the observed size of these buffers was high enough to add several > GB of data to the overall virtual memory usage of the Spark executor process. > I don't understand enough about our use of Snappy to fully grok how much data > we would _expect_ to be present in these buffers at any given time, but I can > say a few things. > 1. The dataset had individual rows that were fairly large, e.g. megabytes. > 2. Direct buffers are not cleaned up until GC events, and overall there was > not much heap contention. So maybe they just weren't being cleaned. > I opened PARQUET-118 to see if they can provide an option to use on-heap > buffers for decompression. 
In the meantime, we could consider changing the > default back to gzip, or we could do nothing (not sure how many other users > will hit this). > [1] > https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/codec/SnappyDecompressor.java#L28 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
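The 2G gap the commenter asks about can be laid out as simple arithmetic. The sketch below assumes the usual Spark-on-Kubernetes sizing, where the pod memory limit is roughly spark.executor.memory plus spark.executor.memoryOverhead; the variable names and the breakdown of the remainder are illustrative, not a definitive account of this workload.

```python
# Rough sketch of the executor memory budget from the comment above.
# Assumption: k8s pod limit = executor heap + memoryOverhead (typical
# Spark-on-Kubernetes sizing); exact accounting varies by Spark version.
GIB = 1024 ** 3

heap = 5 * GIB                 # spark.executor.memory=5g
overhead = 6 * GIB             # spark.executor.memoryOverhead=6g
pod_limit = heap + overhead    # ~11G container limit

direct_cap = 4096 * 1024 ** 2  # -XX:MaxDirectMemorySize=4096m (= 4 GiB)
accounted = heap + direct_cap  # ~9G the commenter can explain

unaccounted = pod_limit - accounted
print(unaccounted / GIB)       # → 2.0
```

The remaining ~2G typically comes from JVM non-heap areas (Metaspace, code cache, thread stacks) and native allocations made via malloc/mmap by libraries such as the Parquet/Snappy decompressor, which -XX:MaxDirectMemorySize does not bound.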
[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized parquet reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-42388: - Summary: Avoid unnecessary parquet footer reads when no filters in vectorized parquet reader (was: Avoid unnecessary parquet footer reads when no filters) > Avoid unnecessary parquet footer reads when no filters in vectorized parquet > reader > --- > > Key: SPARK-42388 > URL: https://issues.apache.org/jira/browse/SPARK-42388 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Mars >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters
Mars created SPARK-42388: Summary: Avoid unnecessary parquet footer reads when no filters Key: SPARK-42388 URL: https://issues.apache.org/jira/browse/SPARK-42388 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Mars -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42387) Avoid unnecessary parquet footer reads when no filters
Mars created SPARK-42387: Summary: Avoid unnecessary parquet footer reads when no filters Key: SPARK-42387 URL: https://issues.apache.org/jira/browse/SPARK-42387 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Mars -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
[ https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686222#comment-17686222 ] Apache Spark commented on SPARK-42386: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/39949 > Rewrite HiveGenericUDF with Invoke > -- > > Key: SPARK-42386 > URL: https://issues.apache.org/jira/browse/SPARK-42386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
[ https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42386: Assignee: (was: Apache Spark) > Rewrite HiveGenericUDF with Invoke > -- > > Key: SPARK-42386 > URL: https://issues.apache.org/jira/browse/SPARK-42386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
[ https://issues.apache.org/jira/browse/SPARK-42386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42386: Assignee: Apache Spark > Rewrite HiveGenericUDF with Invoke > -- > > Key: SPARK-42386 > URL: https://issues.apache.org/jira/browse/SPARK-42386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42386) Rewrite HiveGenericUDF with Invoke
BingKun Pan created SPARK-42386: --- Summary: Rewrite HiveGenericUDF with Invoke Key: SPARK-42386 URL: https://issues.apache.org/jira/browse/SPARK-42386 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
[ https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686190#comment-17686190 ] Apache Spark commented on SPARK-42385: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39948 > Upgrade RoaringBitmap to 0.9.39 > --- > > Key: SPARK-42385 > URL: https://issues.apache.org/jira/browse/SPARK-42385 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39] > * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] > in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
[ https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42385: Assignee: (was: Apache Spark) > Upgrade RoaringBitmap to 0.9.39 > --- > > Key: SPARK-42385 > URL: https://issues.apache.org/jira/browse/SPARK-42385 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39] > * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] > in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
[ https://issues.apache.org/jira/browse/SPARK-42385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42385: Assignee: Apache Spark > Upgrade RoaringBitmap to 0.9.39 > --- > > Key: SPARK-42385 > URL: https://issues.apache.org/jira/browse/SPARK-42385 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39] > * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] > in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686163#comment-17686163 ] Apache Spark commented on SPARK-41715: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39947 > Catch specific exceptions for both Spark Connect and PySpark > > > Key: SPARK-41715 > URL: https://issues.apache.org/jira/browse/SPARK-41715 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > In python/pyspark/sql/tests/test_catalog.py, we should catch more specific > exceptions such as AnalysisException. The test is shared in both Spark > Connect and PySpark, so we should figure out a way to share it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686164#comment-17686164 ] Apache Spark commented on SPARK-41715: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39947 > Catch specific exceptions for both Spark Connect and PySpark > > > Key: SPARK-41715 > URL: https://issues.apache.org/jira/browse/SPARK-41715 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > In python/pyspark/sql/tests/test_catalog.py, we should catch more specific > exceptions such as AnalysisException. The test is shared in both Spark > Connect and PySpark, so we should figure out a way to share it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41715: Assignee: (was: Apache Spark) > Catch specific exceptions for both Spark Connect and PySpark > > > Key: SPARK-41715 > URL: https://issues.apache.org/jira/browse/SPARK-41715 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > In python/pyspark/sql/tests/test_catalog.py, we should catch more specific > exceptions such as AnalysisException. The test is shared in both Spark > Connect and PySpark, so we should figure out a way to share it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686162#comment-17686162 ] Apache Spark commented on SPARK-41715: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39947 > Catch specific exceptions for both Spark Connect and PySpark > > > Key: SPARK-41715 > URL: https://issues.apache.org/jira/browse/SPARK-41715 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > In python/pyspark/sql/tests/test_catalog.py, we should catch more specific > exceptions such as AnalysisException. The test is shared in both Spark > Connect and PySpark, so we should figure out a way to share it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41715) Catch specific exceptions for both Spark Connect and PySpark
[ https://issues.apache.org/jira/browse/SPARK-41715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41715: Assignee: Apache Spark > Catch specific exceptions for both Spark Connect and PySpark > > > Key: SPARK-41715 > URL: https://issues.apache.org/jira/browse/SPARK-41715 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > In python/pyspark/sql/tests/test_catalog.py, we should catch more specific > exceptions such as AnalysisException. The test is shared in both Spark > Connect and PySpark, so we should figure out a way to share it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
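The pattern SPARK-41715 asks for can be shown without a Spark installation: assert the specific exception type rather than a broad `except Exception`. In the sketch below, `AnalysisException` and `lookup_table` are local stand-ins for pyspark.errors.AnalysisException and a catalog lookup, so the example is self-contained; the real test would live in a shared TestCase used by both Spark Connect and classic PySpark.

```python
# Sketch: catch a *specific* exception type in a shared test.
# AnalysisException is a stand-in for pyspark.errors.AnalysisException.
class AnalysisException(Exception):
    """Stand-in for the analysis error raised by both Spark backends."""

def lookup_table(name, catalog):
    # Toy catalog lookup that fails "analysis" for unknown tables.
    if name not in catalog:
        raise AnalysisException(f"Table or view not found: {name}")
    return catalog[name]

# Too broad: this would also swallow unrelated bugs (TypeError, KeyError, ...).
try:
    lookup_table("nope", catalog={})
except Exception:
    pass

# Specific: only the expected analysis failure satisfies the check.
try:
    lookup_table("nope", catalog={})
except AnalysisException as e:
    print(f"caught expected error: {e}")
```

In the actual suite this would typically be `self.assertRaises(AnalysisException)` inside a shared unittest mixin, so both the Connect and non-Connect runs exercise the same assertion.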
[jira] [Assigned] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40453: Assignee: Apache Spark > Improve error handling for GRPC server > -- > > Key: SPARK-40453 > URL: https://issues.apache.org/jira/browse/SPARK-40453 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.2.2 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Major > > Right now the errors are handled in a very rudimentary way and do not produce proper > GRPC errors. This issue addresses the work needed to return proper errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686161#comment-17686161 ] Apache Spark commented on SPARK-40453: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39947 > Improve error handling for GRPC server > -- > > Key: SPARK-40453 > URL: https://issues.apache.org/jira/browse/SPARK-40453 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.2.2 >Reporter: Martin Grund >Priority: Major > > Right now the errors are handled in a very rudimentary way and do not produce proper > GRPC errors. This issue addresses the work needed to return proper errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40453: Assignee: (was: Apache Spark) > Improve error handling for GRPC server > -- > > Key: SPARK-40453 > URL: https://issues.apache.org/jira/browse/SPARK-40453 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.2.2 >Reporter: Martin Grund >Priority: Major > > Right now the errors are handled in a very rudimentary way and do not produce proper > GRPC errors. This issue addresses the work needed to return proper errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40453) Improve error handling for GRPC server
[ https://issues.apache.org/jira/browse/SPARK-40453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686160#comment-17686160 ] Apache Spark commented on SPARK-40453: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39947 > Improve error handling for GRPC server > -- > > Key: SPARK-40453 > URL: https://issues.apache.org/jira/browse/SPARK-40453 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.2.2 >Reporter: Martin Grund >Priority: Major > > Right now the errors are handled in a very rudimentary way and do not produce proper > GRPC errors. This issue addresses the work needed to return proper errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
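"Returning proper gRPC errors" usually means mapping server-side exceptions to well-defined gRPC status codes with a descriptive message, instead of letting everything surface as an opaque UNKNOWN/INTERNAL error. The sketch below illustrates that idea in plain Python; the status-code names follow the gRPC specification, but the `to_grpc_status` helper and its exception mapping are hypothetical, not Spark Connect's actual implementation.

```python
# Illustrative sketch: translate server-side exceptions into gRPC-style
# (status code, message) pairs. The mapping is hypothetical.
def to_grpc_status(exc):
    mapping = {
        ValueError: "INVALID_ARGUMENT",
        KeyError: "NOT_FOUND",
        NotImplementedError: "UNIMPLEMENTED",
        PermissionError: "PERMISSION_DENIED",
    }
    # Fall back to INTERNAL for anything the server did not anticipate.
    code = mapping.get(type(exc), "INTERNAL")
    return code, f"{type(exc).__name__}: {exc}"

print(to_grpc_status(ValueError("negative limit: -1")))
# → ('INVALID_ARGUMENT', 'ValueError: negative limit: -1')
```

In a real grpcio server this translation would typically happen in the handler (or an interceptor), e.g. by calling `context.abort(...)` with the chosen status code, so clients can branch on the code rather than parse message strings.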
[jira] [Created] (SPARK-42385) Upgrade RoaringBitmap to 0.9.39
Yang Jie created SPARK-42385: Summary: Upgrade RoaringBitmap to 0.9.39 Key: SPARK-42385 URL: https://issues.apache.org/jira/browse/SPARK-42385 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie [https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.39] * ForAllInRange Fixes Yet Again by [@larsk-db|https://github.com/larsk-db] in [#614|https://github.com/RoaringBitmap/RoaringBitmap/pull/614] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42355) Upgrade some maven-plugins
[ https://issues.apache.org/jira/browse/SPARK-42355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-42355: Assignee: Yang Jie > Upgrade some maven-plugins > -- > > Key: SPARK-42355 > URL: https://issues.apache.org/jira/browse/SPARK-42355 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > [INFO] maven-checkstyle-plugin 3.2.0 -> 3.2.1 > [INFO] maven-clean-plugin . 3.1.0 -> 3.2.0 > [INFO] maven-dependency-plugin 3.3.0 -> 3.5.0 > [INFO] maven-enforcer-plugin ... 3.0.0-M2 -> 3.2.1 > [INFO] maven-source-plugin 3.1.0 -> 3.2.1 > [INFO] maven-surefire-plugin 3.0.0-M7 -> 3.0.0-M8 > [INFO] maven-jar-plugin ... 3.2.2 -> 3.3.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42355) Upgrade some maven-plugins
[ https://issues.apache.org/jira/browse/SPARK-42355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42355. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39899 [https://github.com/apache/spark/pull/39899] > Upgrade some maven-plugins > -- > > Key: SPARK-42355 > URL: https://issues.apache.org/jira/browse/SPARK-42355 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.5.0 > > > [INFO] maven-checkstyle-plugin 3.2.0 -> 3.2.1 > [INFO] maven-clean-plugin . 3.1.0 -> 3.2.0 > [INFO] maven-dependency-plugin 3.3.0 -> 3.5.0 > [INFO] maven-enforcer-plugin ... 3.0.0-M2 -> 3.2.1 > [INFO] maven-source-plugin 3.1.0 -> 3.2.1 > [INFO] maven-surefire-plugin 3.0.0-M7 -> 3.0.0-M8 > [INFO] maven-jar-plugin ... 3.2.2 -> 3.3.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42350) Replace `get().getOrElse` with `getOrElse`
[ https://issues.apache.org/jira/browse/SPARK-42350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42350. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39893 [https://github.com/apache/spark/pull/39893] > Replace `get().getOrElse` with `getOrElse` > --- > > Key: SPARK-42350 > URL: https://issues.apache.org/jira/browse/SPARK-42350 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42350) Replace `get().getOrElse` with `getOrElse`
[ https://issues.apache.org/jira/browse/SPARK-42350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-42350: Assignee: Yang Jie > Replace `get().getOrElse` with `getOrElse` > --- > > Key: SPARK-42350 > URL: https://issues.apache.org/jira/browse/SPARK-42350 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
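The cleanup in SPARK-42350 is the Scala idiom `map.get(key).getOrElse(default)` collapsing into the single call `map.getOrElse(key, default)`. Python's dict has the same shape, so the refactor can be sketched there; the config keys below are just placeholders for the illustration.

```python
# Python analogue of the Scala cleanup in SPARK-42350:
#   conf.get(key).getOrElse(default)  ->  conf.getOrElse(key, default)
# dict.get already accepts a default, so the two-step lookup collapses.
conf = {"spark.executor.cores": "4"}

# Two-step form (the pattern being replaced):
value = conf.get("spark.executor.memory")
memory = value if value is not None else "1g"

# One-step form (the replacement):
memory2 = conf.get("spark.executor.memory", "1g")

assert memory == memory2 == "1g"
```

Beyond brevity, the one-step form avoids building an intermediate Option (in Scala) and removes a place where the default could drift out of sync.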
[jira] [Updated] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch
[ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40770: - Fix Version/s: 3.5.0 (was: 3.4.0) > Improved error messages for applyInPandas for schema mismatch > - > > Key: SPARK-40770 > URL: https://issues.apache.org/jira/browse/SPARK-40770 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Minor > Fix For: 3.5.0 > > > Error messages raised by `applyInPandas` are very generic or useless when > used with complex schemata: > {code} > KeyError: 'val' > {code} > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 > {code} > {code} > java.lang.IllegalArgumentException: not all nodes and buffers were consumed. > nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304], > address:139860828549160, length:0, ArrowBuf[305], address:139860828549160, > length:24] > {code} > {code} > pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 > {code} > {code} > pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to > convert to double > {code} > These should be improved by adding column names or descriptive messages (in > the same order as above): > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. Missing: val Unexpected: v Schema: id, val > {code} > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. Missing: val Unexpected: foo, v Schema: id, val > {code} > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. 
Unexpected: v Schema: id, id > {code} > {code} > pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 > The above exception was the direct cause of the following exception: > TypeError: Exception thrown when converting pandas.Series (int64) with name > 'val' to Arrow Array (string). > {code} > {code} > pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to > convert to double > The above exception was the direct cause of the following exception: > ValueError: Exception thrown when converting pandas.Series (object) with name > 'val' to Arrow Array (double). > {code} > When no column names are given, the following error was returned: > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 > {code} > Where it should contain the output schema: > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 Schema: id, val > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
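The improved messages proposed above boil down to diffing the returned column names against the declared schema and reporting the difference by name. This sketch reproduces that logic in plain Python; `check_columns` is an illustrative helper, not Spark's actual implementation.

```python
# Sketch of the improved applyInPandas schema-mismatch message: report
# which columns are missing/unexpected rather than just a column count.
def check_columns(returned_cols, schema_cols):
    missing = [c for c in schema_cols if c not in returned_cols]
    unexpected = [c for c in returned_cols if c not in schema_cols]
    if missing or unexpected:
        raise RuntimeError(
            "Column names of the returned pandas.DataFrame do not match "
            f"specified schema. Missing: {', '.join(missing)} "
            f"Unexpected: {', '.join(unexpected)} "
            f"Schema: {', '.join(schema_cols)}"
        )

# Matches the ticket's second example: the function returned (id, foo, v)
# but the declared schema is (id, val).
try:
    check_columns(returned_cols=["id", "foo", "v"], schema_cols=["id", "val"])
except RuntimeError as e:
    print(e)
```

Naming the offending columns turns a generic "Expected: 2 Actual: 3" into an error the user can act on without re-reading their function.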
[jira] [Assigned] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch
[ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40770: Assignee: Enrico Minack > Improved error messages for applyInPandas for schema mismatch > - > > Key: SPARK-40770 > URL: https://issues.apache.org/jira/browse/SPARK-40770 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Minor > > Error messages raised by `applyInPandas` are very generic or useless when > used with complex schemata: > {code} > KeyError: 'val' > {code} > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 > {code} > {code} > java.lang.IllegalArgumentException: not all nodes and buffers were consumed. > nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304], > address:139860828549160, length:0, ArrowBuf[305], address:139860828549160, > length:24] > {code} > {code} > pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 > {code} > {code} > pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to > convert to double > {code} > These should be improved by adding column names or descriptive messages (in > the same order as above): > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. Missing: val Unexpected: v Schema: id, val > {code} > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. Missing: val Unexpected: foo, v Schema: id, val > {code} > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. 
Unexpected: v Schema: id, id > {code} > {code} > pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 > The above exception was the direct cause of the following exception: > TypeError: Exception thrown when converting pandas.Series (int64) with name > 'val' to Arrow Array (string). > {code} > {code} > pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to > convert to double > The above exception was the direct cause of the following exception: > ValueError: Exception thrown when converting pandas.Series (object) with name > 'val' to Arrow Array (double). > {code} > When no column names are given, the following error was returned: > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 > {code} > Where it should contain the output schema: > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 Schema: id, val > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
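The improved messages above boil down to comparing the returned pandas.DataFrame's column names against the expected schema and reporting the missing and unexpected names. A minimal Python sketch of that comparison (illustrative only; `format_mismatch` is a hypothetical helper, not Spark's actual implementation in pyspark.sql.pandas):

```python
def format_mismatch(returned_cols, schema_cols):
    """Build an error message listing missing and unexpected columns.

    Illustrative sketch of the message format proposed in SPARK-40770;
    Spark's real implementation differs in detail.
    """
    missing = [c for c in schema_cols if c not in returned_cols]
    unexpected = [c for c in returned_cols if c not in schema_cols]
    parts = ["Column names of the returned pandas.DataFrame do not match specified schema."]
    if missing:
        parts.append("Missing: " + ", ".join(missing))
    if unexpected:
        parts.append("Unexpected: " + ", ".join(unexpected))
    parts.append("Schema: " + ", ".join(schema_cols))
    return " ".join(parts)
```

For example, a function that returns columns `id, foo, v` against schema `id, val` would produce the "Missing: val Unexpected: foo, v Schema: id, val" message from the ticket.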
[jira] [Resolved] (SPARK-40770) Improved error messages for applyInPandas for schema mismatch
[ https://issues.apache.org/jira/browse/SPARK-40770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40770. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38223 [https://github.com/apache/spark/pull/38223] > Improved error messages for applyInPandas for schema mismatch > - > > Key: SPARK-40770 > URL: https://issues.apache.org/jira/browse/SPARK-40770 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Minor > Fix For: 3.4.0 > > > Error messages raised by `applyInPandas` are very generic or useless when > used with complex schemata: > {code} > KeyError: 'val' > {code} > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 > {code} > {code} > java.lang.IllegalArgumentException: not all nodes and buffers were consumed. > nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304], > address:139860828549160, length:0, ArrowBuf[305], address:139860828549160, > length:24] > {code} > {code} > pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 > {code} > {code} > pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to > convert to double > {code} > These should be improved by adding column names or descriptive messages (in > the same order as above): > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. Missing: val Unexpected: v Schema: id, val > {code} > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. Missing: val Unexpected: foo, v Schema: id, val > {code} > {code} > RuntimeError: Column names of the returned pandas.DataFrame do not match > specified schema. 
Unexpected: v Schema: id, id > {code} > {code} > pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64 > The above exception was the direct cause of the following exception: > TypeError: Exception thrown when converting pandas.Series (int64) with name > 'val' to Arrow Array (string). > {code} > {code} > pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to > convert to double > The above exception was the direct cause of the following exception: > ValueError: Exception thrown when converting pandas.Series (object) with name > 'val' to Arrow Array (double). > {code} > When no column names are given, the following error was returned: > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 > {code} > Where it should contain the output schema: > {code} > RuntimeError: Number of columns of the returned pandas.DataFrame doesn't > match specified schema. Expected: 2 Actual: 3 Schema: id, val > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
[ https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-42379. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39936 [https://github.com/apache/spark/pull/39936] > Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists > > > Key: SPARK-42379 > URL: https://issues.apache.org/jira/browse/SPARK-42379 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.5.0 > > > The other methods in FileSystemBasedCheckpointFileManager already use > FileSystem.exists whenever they check for the existence of a path. Use > FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well, to keep it > consistent with the other methods in FileSystemBasedCheckpointFileManager. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
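The change described above is a simple delegation: `exists` should make the same single capability call the manager's other methods make. A minimal Python sketch of the shape (class and method names here are illustrative stand-ins, not Spark's actual Scala code):

```python
# Sketch of the SPARK-42379 idea: have exists() delegate directly to
# FileSystem.exists, like the manager's other methods, rather than probing
# the path some other way. Hypothetical class shapes for illustration.

class CheckpointFileManager:
    def __init__(self, fs):
        self.fs = fs  # a Hadoop-FileSystem-like object

    def exists(self, path):
        # consistent with the other methods: one direct exists() call
        return self.fs.exists(path)

class InMemoryFS:
    """Tiny stand-in for a Hadoop FileSystem, for demonstration only."""
    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        return path in self.paths
```

Under these assumptions, `CheckpointFileManager(InMemoryFS(["/checkpoint/offsets/0"])).exists("/checkpoint/offsets/0")` returns True and any other path returns False.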
[jira] [Assigned] (SPARK-42379) Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists
[ https://issues.apache.org/jira/browse/SPARK-42379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-42379: Assignee: Jungtaek Lim > Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists > > > Key: SPARK-42379 > URL: https://issues.apache.org/jira/browse/SPARK-42379 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > The other methods in FileSystemBasedCheckpointFileManager already use > FileSystem.exists whenever they check for the existence of a path. Use > FileSystem.exists in FileSystemBasedCheckpointFileManager.exists as well, to keep it > consistent with the other methods in FileSystemBasedCheckpointFileManager. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41962) Update the import order of scala package in class SpecificParquetRecordReaderBase
[ https://issues.apache.org/jira/browse/SPARK-41962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-41962: -- Fix Version/s: (was: 3.2.4) > Update the import order of scala package in class > SpecificParquetRecordReaderBase > - > > Key: SPARK-41962 > URL: https://issues.apache.org/jira/browse/SPARK-41962 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: shuyouZZ >Assignee: shuyouZZ >Priority: Major > Fix For: 3.3.2, 3.4.0 > > > There is a checkstyle issue in class {{SpecificParquetRecordReaderBase}}: > the import order of the scala package is not correct. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42335) Pass the comment option through to univocity if users set it explicitly in CSV dataSource
[ https://issues.apache.org/jira/browse/SPARK-42335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42335. -- Fix Version/s: 3.5.0 (was: 3.4.0) Resolution: Fixed Issue resolved by pull request 39878 [https://github.com/apache/spark/pull/39878] > Pass the comment option through to univocity if users set it explicitly in > CSV dataSource > - > > Key: SPARK-42335 > URL: https://issues.apache.org/jira/browse/SPARK-42335 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Fix For: 3.5.0 > > Attachments: image-2023-02-03-18-56-01-596.png, > image-2023-02-03-18-56-10-083.png > > > In PR [https://github.com/apache/spark/pull/29516], univocity-parsers, which the CSV > dataSource uses, was upgraded from 2.8.3 to 2.9.0 to fix some bugs. The upgrade also > brought in a new univocity-parsers feature that quotes values of the first column > when they start with the comment character. This was a breaking change for downstream > users that handle a whole row as input. > > For this code: > {code:java} > Seq(("#abc", 1)).toDF.write.csv("/Users/guowei/comment_test") {code} > Before Spark 3.0, the content of the output CSV files was: > !image-2023-02-03-18-56-01-596.png! > After this change, the content is: > !image-2023-02-03-18-56-10-083.png! > Users can't set the comment option to '\u' to keep the previous behavior because of > the newly added `isCommentSet` check logic, shown below: > {code:java} > val isCommentSet = this.comment != '\u' > def asWriterSettings: CsvWriterSettings = { > // other code > if (isCommentSet) { > format.setComment(comment) > } > // other code > } > {code} > It's better to pass the comment option through to univocity if users set it > explicitly in the CSV dataSource. 
> > After this change, the behavior is as follows: > |id|code|2.4 and before|3.0 and after|this update|remark| > |1|Seq("#abc", "\udef", "xyz").toDF() > .write.{color:#57d9a3}option("comment", "\u"){color}.csv(path)|#abc > *def* > xyz|{color:#4c9aff}"#abc"{color} > {color:#4c9aff}*def*{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}#abc{color} > {color:#4c9aff}*"def"*{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}this update differs slightly from > 3.0{color}| > |2|Seq("#abc", "\udef", "xyz").toDF() > .write{color:#57d9a3}.option("comment", "#"){color}.csv(path)|#abc > *def* > xyz|"#abc" > *def* > xyz|"#abc" > *def* > xyz|the same| > |3|Seq("#abc", "\udef", "xyz").toDF() > .write.csv(path)|#abc > *def* > xyz|"#abc" > *def* > xyz|"#abc" > *def* > xyz|default behavior: the same| > |4|{_}Seq{_}("#abc", "\udef", "xyz").toDF().write.text(path) > spark.read.{color:#57d9a3}option("comment", "\u"){color}.csv(path)|#abc > xyz|{color:#4c9aff}#abc{color} > {color:#4c9aff}\udef{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}#abc{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}this update differs slightly from > 3.0{color}| > |5|{_}Seq{_}("#abc", "\udef", "xyz").toDF().write.text(path) > spark.read.{color:#57d9a3}option("comment", "#"){color}.csv(path)|\udef > xyz|\udef > xyz|\udef > xyz|the same| > |6|{_}Seq{_}("#abc", "\udef", "xyz").toDF().write.text(path) > spark.read.csv(path)|#abc > xyz|#abc > \udef > xyz|#abc > \udef > xyz|default behavior: the same| > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
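The fix proposed above amounts to forwarding the comment option to the underlying writer only when the user set it explicitly, instead of gating on whether the value differs from a default sentinel character (which made the pre-3.0 behavior impossible to restore). A minimal Python sketch of that distinction (`CsvWriteOptions` and `as_writer_settings` are hypothetical names, not Spark's actual CSVOptions API):

```python
# Sketch of the "pass through only when explicitly set" idea from SPARK-42335.
# Hypothetical class, for illustration of the option-handling pattern only.

class CsvWriteOptions:
    def __init__(self, user_options):
        # Remember explicitness rather than comparing the value against a
        # default sentinel character.
        self.comment_set_explicitly = "comment" in user_options
        self.comment = user_options.get("comment")

    def as_writer_settings(self):
        settings = {}
        if self.comment_set_explicitly:
            # Forward whatever the user chose, even the default value,
            # so the old (unquoted) behavior can be opted back into.
            settings["comment"] = self.comment
        return settings
```

With this shape, an untouched option dict yields no comment setting, while an explicit `option("comment", "#")` is always forwarded.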
[jira] [Assigned] (SPARK-42335) Pass the comment option through to univocity if users set it explicitly in CSV dataSource
[ https://issues.apache.org/jira/browse/SPARK-42335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-42335: Assignee: Wei Guo > Pass the comment option through to univocity if users set it explicitly in > CSV dataSource > - > > Key: SPARK-42335 > URL: https://issues.apache.org/jira/browse/SPARK-42335 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Minor > Fix For: 3.4.0 > > Attachments: image-2023-02-03-18-56-01-596.png, > image-2023-02-03-18-56-10-083.png > > > In PR [https://github.com/apache/spark/pull/29516], univocity-parsers, which the CSV > dataSource uses, was upgraded from 2.8.3 to 2.9.0 to fix some bugs. The upgrade also > brought in a new univocity-parsers feature that quotes values of the first column > when they start with the comment character. This was a breaking change for downstream > users that handle a whole row as input. > > For this code: > {code:java} > Seq(("#abc", 1)).toDF.write.csv("/Users/guowei/comment_test") {code} > Before Spark 3.0, the content of the output CSV files was: > !image-2023-02-03-18-56-01-596.png! > After this change, the content is: > !image-2023-02-03-18-56-10-083.png! > Users can't set the comment option to '\u' to keep the previous behavior because of > the newly added `isCommentSet` check logic, shown below: > {code:java} > val isCommentSet = this.comment != '\u' > def asWriterSettings: CsvWriterSettings = { > // other code > if (isCommentSet) { > format.setComment(comment) > } > // other code > } > {code} > It's better to pass the comment option through to univocity if users set it > explicitly in the CSV dataSource. 
> > After this change, the behavior is as follows: > |id|code|2.4 and before|3.0 and after|this update|remark| > |1|Seq("#abc", "\udef", "xyz").toDF() > .write.{color:#57d9a3}option("comment", "\u"){color}.csv(path)|#abc > *def* > xyz|{color:#4c9aff}"#abc"{color} > {color:#4c9aff}*def*{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}#abc{color} > {color:#4c9aff}*"def"*{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}this update differs slightly from > 3.0{color}| > |2|Seq("#abc", "\udef", "xyz").toDF() > .write{color:#57d9a3}.option("comment", "#"){color}.csv(path)|#abc > *def* > xyz|"#abc" > *def* > xyz|"#abc" > *def* > xyz|the same| > |3|Seq("#abc", "\udef", "xyz").toDF() > .write.csv(path)|#abc > *def* > xyz|"#abc" > *def* > xyz|"#abc" > *def* > xyz|default behavior: the same| > |4|{_}Seq{_}("#abc", "\udef", "xyz").toDF().write.text(path) > spark.read.{color:#57d9a3}option("comment", "\u"){color}.csv(path)|#abc > xyz|{color:#4c9aff}#abc{color} > {color:#4c9aff}\udef{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}#abc{color} > {color:#4c9aff}xyz{color}|{color:#4c9aff}this update differs slightly from > 3.0{color}| > |5|{_}Seq{_}("#abc", "\udef", "xyz").toDF().write.text(path) > spark.read.{color:#57d9a3}option("comment", "#"){color}.csv(path)|\udef > xyz|\udef > xyz|\udef > xyz|the same| > |6|{_}Seq{_}("#abc", "\udef", "xyz").toDF().write.text(path) > spark.read.csv(path)|#abc > xyz|#abc > \udef > xyz|#abc > \udef > xyz|default behavior: the same| > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41695) Upgrade netty to 4.1.86.Final
[ https://issues.apache.org/jira/browse/SPARK-41695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-41695. - Resolution: Won't Fix > Upgrade netty to 4.1.86.Final > - > > Key: SPARK-41695 > URL: https://issues.apache.org/jira/browse/SPARK-41695 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.3.1 >Reporter: Tobias Stadler >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41696) Upgrade Hadoop to 3.3.4
[ https://issues.apache.org/jira/browse/SPARK-41696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-41696. - Resolution: Won't Fix > Upgrade Hadoop to 3.3.4 > --- > > Key: SPARK-41696 > URL: https://issues.apache.org/jira/browse/SPARK-41696 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.3.1 >Reporter: Tobias Stadler >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug
[ https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686081#comment-17686081 ] Ritika Maheshwari commented on SPARK-42346: --- Yes that caused the error to appear. Thanks > distinct(count colname) with UNION ALL causes query analyzer bug > > > Key: SPARK-42346 > URL: https://issues.apache.org/jira/browse/SPARK-42346 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.4.0, 3.5.0 >Reporter: Robin >Assignee: Peter Toth >Priority: Major > Fix For: 3.3.2, 3.4.0, 3.5.0 > > > If you combine a UNION ALL with a count(distinct colname) you get a query > analyzer bug. > > This behaviour is introduced in 3.3.0. The bug was not present in 3.2.1. > > Here is a reprex in PySpark: > {{df_pd = pd.DataFrame([}} > {{ \{'surname': 'a', 'first_name': 'b'}}} > {{])}} > {{df_spark = spark.createDataFrame(df_pd)}} > {{df_spark.createOrReplaceTempView("input_table")}} > {{sql = """}} > {{SELECT }} > {{ (SELECT Count(DISTINCT first_name) FROM input_table) }} > {{ AS distinct_value_count}} > {{FROM input_table}} > {{UNION ALL}} > {{SELECT }} > {{ (SELECT Count(DISTINCT surname) FROM input_table) }} > {{ AS distinct_value_count}} > {{FROM input_table """}} > {{spark.sql(sql).toPandas()}} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42318) Assign name to _LEGACY_ERROR_TEMP_2125
[ https://issues.apache.org/jira/browse/SPARK-42318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42318. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39891 [https://github.com/apache/spark/pull/39891] > Assign name to _LEGACY_ERROR_TEMP_2125 > -- > > Key: SPARK-42318 > URL: https://issues.apache.org/jira/browse/SPARK-42318 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42318) Assign name to _LEGACY_ERROR_TEMP_2125
[ https://issues.apache.org/jira/browse/SPARK-42318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42318: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_2125 > -- > > Key: SPARK-42318 > URL: https://issues.apache.org/jira/browse/SPARK-42318 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42319) Assign name to _LEGACY_ERROR_TEMP_2123
[ https://issues.apache.org/jira/browse/SPARK-42319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42319. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39891 [https://github.com/apache/spark/pull/39891] > Assign name to _LEGACY_ERROR_TEMP_2123 > -- > > Key: SPARK-42319 > URL: https://issues.apache.org/jira/browse/SPARK-42319 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42319) Assign name to _LEGACY_ERROR_TEMP_2123
[ https://issues.apache.org/jira/browse/SPARK-42319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42319: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_2123 > -- > > Key: SPARK-42319 > URL: https://issues.apache.org/jira/browse/SPARK-42319 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42310: Assignee: (was: Apache Spark) > Assign name to _LEGACY_ERROR_TEMP_1289 > -- > > Key: SPARK-42310 > URL: https://issues.apache.org/jira/browse/SPARK-42310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42310: Assignee: Apache Spark > Assign name to _LEGACY_ERROR_TEMP_1289 > -- > > Key: SPARK-42310 > URL: https://issues.apache.org/jira/browse/SPARK-42310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686036#comment-17686036 ] Apache Spark commented on SPARK-42310: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39946 > Assign name to _LEGACY_ERROR_TEMP_1289 > -- > > Key: SPARK-42310 > URL: https://issues.apache.org/jira/browse/SPARK-42310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42314) Assign name to _LEGACY_ERROR_TEMP_2127
[ https://issues.apache.org/jira/browse/SPARK-42314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42314: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_2127 > -- > > Key: SPARK-42314 > URL: https://issues.apache.org/jira/browse/SPARK-42314 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42314) Assign name to _LEGACY_ERROR_TEMP_2127
[ https://issues.apache.org/jira/browse/SPARK-42314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42314. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39890 [https://github.com/apache/spark/pull/39890] > Assign name to _LEGACY_ERROR_TEMP_2127 > -- > > Key: SPARK-42314 > URL: https://issues.apache.org/jira/browse/SPARK-42314 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685995#comment-17685995 ] Apache Spark commented on SPARK-42384: -- User 'bersprockets' has created a pull request for this issue: https://github.com/apache/spark/pull/39945 > Mask function's generated code does not handle null input > - > > Key: SPARK-42384 > URL: https://issues.apache.org/jira/browse/SPARK-42384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Example: > {noformat} > create or replace temp view v1 as > select * from values > (null), > ('AbCD123-@$#') > as data(col1); > cache table v1; > select mask(col1) from v1; > {noformat} > This query results in a {{NullPointerException}}: > {noformat} > 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > {noformat} > The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of > whether {{Mask.transformInput}} returns null or not. The > {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null > pointer. > {noformat} > /* 031 */ boolean isNull_1 = i.isNullAt(0); > /* 032 */ UTF8String value_1 = isNull_1 ? 
> /* 033 */ null : (i.getUTF8String(0)); > /* 034 */ > /* 035 */ > /* 036 */ > /* 037 */ > /* 038 */ UTF8String value_0 = null; > /* 039 */ value_0 = > org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, > ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* > literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) > references[3] /* literal */));; > /* 040 */ if (false) { > /* 041 */ mutableStateArray_0[0].setNullAt(0); > /* 042 */ } else { > /* 043 */ mutableStateArray_0[0].write(0, value_0); > /* 044 */ } > /* 045 */ return (mutableStateArray_0[0].getRow()); > /* 046 */ } > {noformat} > The bug is not exercised by a literal null input value, since there appears > to be some optimization that simply replaces the entire function call with a > null literal: > {noformat} > spark-sql> explain SELECT mask(NULL); > == Physical Plan == > *(1) Project [null AS mask(NULL, X, x, n, NULL)#47] > +- *(1) Scan OneRowRelation[] > Time taken: 0.026 seconds, Fetched 1 row(s) > spark-sql> SELECT mask(NULL); > NULL > Time taken: 0.042 seconds, Fetched 1 row(s) > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
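The generated code quoted above hard-codes `if (false)` around the null branch, so a null returned by `Mask.transformInput` flows straight into `UnsafeWriter.write`, which does not expect it. A minimal Python sketch of the null-safe write pattern the codegen needs (the `RowWriter` class here is a stand-in for illustration, not Spark's UnsafeWriter API):

```python
class RowWriter:
    """Stub standing in for Spark's UnsafeWriter, for illustration only."""
    def __init__(self):
        self.cells = {}

    def set_null_at(self, ordinal):
        self.cells[ordinal] = None

    def write(self, ordinal, value):
        if value is None:  # like UnsafeWriter.write, refuses null input
            raise TypeError("null passed to write()")
        self.cells[ordinal] = value

def null_safe_write(writer, ordinal, value):
    # The fix: branch on the actual computed result instead of a
    # hard-coded `false`, so a null result takes the setNullAt path.
    if value is None:
        writer.set_null_at(ordinal)
    else:
        writer.write(ordinal, value)
```

With this guard, a null result is recorded via `set_null_at` instead of crashing inside `write`, mirroring what the corrected generated code should do.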
[jira] [Assigned] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42384: Assignee: (was: Apache Spark) > Mask function's generated code does not handle null input > - > > Key: SPARK-42384 > URL: https://issues.apache.org/jira/browse/SPARK-42384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Example: > {noformat} > create or replace temp view v1 as > select * from values > (null), > ('AbCD123-@$#') > as data(col1); > cache table v1; > select mask(col1) from v1; > {noformat} > This query results in a {{NullPointerException}}: > {noformat} > 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > {noformat} > The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of > whether {{Mask.transformInput}} returns null or not. The > {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null > pointer. > {noformat} > /* 031 */ boolean isNull_1 = i.isNullAt(0); > /* 032 */ UTF8String value_1 = isNull_1 ? 
> /* 033 */ null : (i.getUTF8String(0)); > /* 034 */ > /* 035 */ > /* 036 */ > /* 037 */ > /* 038 */ UTF8String value_0 = null; > /* 039 */ value_0 = > org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, > ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* > literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) > references[3] /* literal */));; > /* 040 */ if (false) { > /* 041 */ mutableStateArray_0[0].setNullAt(0); > /* 042 */ } else { > /* 043 */ mutableStateArray_0[0].write(0, value_0); > /* 044 */ } > /* 045 */ return (mutableStateArray_0[0].getRow()); > /* 046 */ } > {noformat} > The bug is not exercised by a literal null input value, since there appears > to be some optimization that simply replaces the entire function call with a > null literal: > {noformat} > spark-sql> explain SELECT mask(NULL); > == Physical Plan == > *(1) Project [null AS mask(NULL, X, x, n, NULL)#47] > +- *(1) Scan OneRowRelation[] > Time taken: 0.026 seconds, Fetched 1 row(s) > spark-sql> SELECT mask(NULL); > NULL > Time taken: 0.042 seconds, Fetched 1 row(s) > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42384: Assignee: Apache Spark > Mask function's generated code does not handle null input > - > > Key: SPARK-42384 > URL: https://issues.apache.org/jira/browse/SPARK-42384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Assignee: Apache Spark >Priority: Major > > Example: > {noformat} > create or replace temp view v1 as > select * from values > (null), > ('AbCD123-@$#') > as data(col1); > cache table v1; > select mask(col1) from v1; > {noformat} > This query results in a {{NullPointerException}}: > {noformat} > 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > {noformat} > The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of > whether {{Mask.transformInput}} returns null or not. The > {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null > pointer. > {noformat} > /* 031 */ boolean isNull_1 = i.isNullAt(0); > /* 032 */ UTF8String value_1 = isNull_1 ? 
> /* 033 */ null : (i.getUTF8String(0)); > /* 034 */ > /* 035 */ > /* 036 */ > /* 037 */ > /* 038 */ UTF8String value_0 = null; > /* 039 */ value_0 = > org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, > ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* > literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) > references[3] /* literal */));; > /* 040 */ if (false) { > /* 041 */ mutableStateArray_0[0].setNullAt(0); > /* 042 */ } else { > /* 043 */ mutableStateArray_0[0].write(0, value_0); > /* 044 */ } > /* 045 */ return (mutableStateArray_0[0].getRow()); > /* 046 */ } > {noformat} > The bug is not exercised by a literal null input value, since there appears > to be some optimization that simply replaces the entire function call with a > null literal: > {noformat} > spark-sql> explain SELECT mask(NULL); > == Physical Plan == > *(1) Project [null AS mask(NULL, X, x, n, NULL)#47] > +- *(1) Scan OneRowRelation[] > Time taken: 0.026 seconds, Fetched 1 row(s) > spark-sql> SELECT mask(NULL); > NULL > Time taken: 0.042 seconds, Fetched 1 row(s) > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42384) Mask function's generated code does not handle null input
[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-42384: -- Affects Version/s: 3.4.0 > Mask function's generated code does not handle null input > - > > Key: SPARK-42384 > URL: https://issues.apache.org/jira/browse/SPARK-42384 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Priority: Major > > Example: > {noformat} > create or replace temp view v1 as > select * from values > (null), > ('AbCD123-@$#') > as data(col1); > cache table v1; > select mask(col1) from v1; > {noformat} > This query results in a {{NullPointerException}}: > {noformat} > 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > {noformat} > The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of > whether {{Mask.transformInput}} returns null or not. The > {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null > pointer. > {noformat} > /* 031 */ boolean isNull_1 = i.isNullAt(0); > /* 032 */ UTF8String value_1 = isNull_1 ? 
> /* 033 */ null : (i.getUTF8String(0)); > /* 034 */ > /* 035 */ > /* 036 */ > /* 037 */ > /* 038 */ UTF8String value_0 = null; > /* 039 */ value_0 = > org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, > ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* > literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) > references[3] /* literal */));; > /* 040 */ if (false) { > /* 041 */ mutableStateArray_0[0].setNullAt(0); > /* 042 */ } else { > /* 043 */ mutableStateArray_0[0].write(0, value_0); > /* 044 */ } > /* 045 */ return (mutableStateArray_0[0].getRow()); > /* 046 */ } > {noformat} > The bug is not exercised by a literal null input value, since there appears > to be some optimization that simply replaces the entire function call with a > null literal: > {noformat} > spark-sql> explain SELECT mask(NULL); > == Physical Plan == > *(1) Project [null AS mask(NULL, X, x, n, NULL)#47] > +- *(1) Scan OneRowRelation[] > Time taken: 0.026 seconds, Fetched 1 row(s) > spark-sql> SELECT mask(NULL); > NULL > Time taken: 0.042 seconds, Fetched 1 row(s) > spark-sql> > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42384) Mask function's generated code does not handle null input
Bruce Robbins created SPARK-42384: - Summary: Mask function's generated code does not handle null input Key: SPARK-42384 URL: https://issues.apache.org/jira/browse/SPARK-42384 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Bruce Robbins Example: {noformat} create or replace temp view v1 as select * from values (null), ('AbCD123-@$#') as data(col1); cache table v1; select mask(col1) from v1; {noformat} This query results in a {{NullPointerException}}: {noformat} 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) {noformat} The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of whether {{Mask.transformInput}} returns null or not. The {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null pointer. {noformat} /* 031 */ boolean isNull_1 = i.isNullAt(0); /* 032 */ UTF8String value_1 = isNull_1 ? 
/* 033 */ null : (i.getUTF8String(0)); /* 034 */ /* 035 */ /* 036 */ /* 037 */ /* 038 */ UTF8String value_0 = null; /* 039 */ value_0 = org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) references[3] /* literal */));; /* 040 */ if (false) { /* 041 */ mutableStateArray_0[0].setNullAt(0); /* 042 */ } else { /* 043 */ mutableStateArray_0[0].write(0, value_0); /* 044 */ } /* 045 */ return (mutableStateArray_0[0].getRow()); /* 046 */ } {noformat} The bug is not exercised by a literal null input value, since there appears to be some optimization that simply replaces the entire function call with a null literal: {noformat} spark-sql> explain SELECT mask(NULL); == Physical Plan == *(1) Project [null AS mask(NULL, X, x, n, NULL)#47] +- *(1) Scan OneRowRelation[] Time taken: 0.026 seconds, Fetched 1 row(s) spark-sql> SELECT mask(NULL); NULL Time taken: 0.042 seconds, Fetched 1 row(s) spark-sql> {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
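The failure mode described above can be reproduced outside Spark with a small sketch (plain Python standing in for the generated Java; `transform_input` and `unsafe_write` are hypothetical stand-ins for `Mask.transformInput` and `UnsafeWriter.write`, not the real implementations): the generated null guard is the constant `false`, so a null result flows straight into a writer that assumes non-null input.

```python
def transform_input(s):
    """Stand-in for Mask.transformInput with the default masks:
    uppercase -> 'X', lowercase -> 'x', digit -> 'n', other chars kept."""
    if s is None:
        return None  # null propagates, as in the real expression
    return "".join(
        "X" if c.isupper() else "x" if c.islower() else "n" if c.isdigit() else c
        for c in s
    )

def unsafe_write(value):
    """Stand-in for UnsafeWriter.write: assumes a non-null value."""
    return len(value)  # raises TypeError on None, mirroring the NPE

def project_buggy(col):
    value = transform_input(col)
    if False:                   # the generated code hard-codes `if (false)`
        return None
    return unsafe_write(value)  # crashes when value is None

def project_fixed(col):
    value = transform_input(col)
    if value is None:           # check the actual result instead
        return None
    return unsafe_write(value)
```

With this sketch, `project_fixed(None)` returns `None` while `project_buggy(None)` raises, matching the reported behavior for a null column value.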
[jira] [Assigned] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
[ https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42383: Assignee: Apache Spark > Protobuf serializer for RocksDB.TypeAliases > --- > > Key: SPARK-42383 > URL: https://issues.apache.org/jira/browse/SPARK-42383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
[ https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42383: Assignee: (was: Apache Spark) > Protobuf serializer for RocksDB.TypeAliases > --- > > Key: SPARK-42383 > URL: https://issues.apache.org/jira/browse/SPARK-42383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
[ https://issues.apache.org/jira/browse/SPARK-42383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685950#comment-17685950 ] Apache Spark commented on SPARK-42383: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39944 > Protobuf serializer for RocksDB.TypeAliases > --- > > Key: SPARK-42383 > URL: https://issues.apache.org/jira/browse/SPARK-42383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42383) Protobuf serializer for RocksDB.TypeAliases
Yang Jie created SPARK-42383: Summary: Protobuf serializer for RocksDB.TypeAliases Key: SPARK-42383 URL: https://issues.apache.org/jira/browse/SPARK-42383 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685928#comment-17685928 ] Apache Spark commented on SPARK-40819: -- User 'awdavidson' has created a pull request for this issue: https://github.com/apache/spark/pull/39943 > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type > instead of automatically converting to LongType > > > Key: SPARK-40819 > URL: https://issues.apache.org/jira/browse/SPARK-40819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.3.2, 3.4.0 >Reporter: Alfred Davidson >Assignee: Alfred Davidson >Priority: Critical > Labels: regression > Fix For: 3.2.4, 3.3.2, 3.4.0 > > > Since 3.2 parquet files containing attributes with type "INT64 > (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read > throws: > > {code:java} > Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: > INT64 (TIMESTAMP(NANOS,true)) > at > org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at 
scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528) > at scala.collection.immutable.Stream.map(Stream.scala:418) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521) > at > org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76) > {code} > Prior to 3.2 successfully reads the parquet automatically converting to a > LongType. 
> I believe work part of https://issues.apache.org/jira/browse/SPARK-34661 > introduced the change in behaviour, more specifically here: > [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154] > which throws the QueryCompilationErrors.illegalParquetTypeError -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
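The behavior change can be sketched as follows (a hypothetical Python model of the INT64 branch of `ParquetToSparkSchemaConverter.convertPrimitiveField`, not the actual Scala; the type-name strings are illustrative):

```python
def convert_int64(logical_type, legacy_pre_3_2=False):
    """Sketch: map a Parquet INT64 column's logical type to a Spark type name."""
    if logical_type is None:
        return "LongType"  # plain INT64 with no annotation
    if logical_type in ("TIMESTAMP(MILLIS,true)", "TIMESTAMP(MICROS,true)"):
        return "TimestampType"
    if logical_type.startswith("TIMESTAMP(NANOS"):
        if legacy_pre_3_2:
            # Before 3.2: nanosecond timestamps silently fell back to LongType.
            return "LongType"
        # 3.2+: the converter rejects the annotation instead.
        raise ValueError(f"Illegal Parquet type: INT64 ({logical_type})")
    raise ValueError(f"Illegal Parquet type: INT64 ({logical_type})")
```

The fix restores a readable mapping for `TIMESTAMP(NANOS, true)` rather than throwing `illegalParquetTypeError`.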
[jira] [Assigned] (SPARK-42305) Assign name to _LEGACY_ERROR_TEMP_1229
[ https://issues.apache.org/jira/browse/SPARK-42305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42305: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_1229 > -- > > Key: SPARK-42305 > URL: https://issues.apache.org/jira/browse/SPARK-42305 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42305) Assign name to _LEGACY_ERROR_TEMP_1229
[ https://issues.apache.org/jira/browse/SPARK-42305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42305. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39875 [https://github.com/apache/spark/pull/39875] > Assign name to _LEGACY_ERROR_TEMP_1229 > -- > > Key: SPARK-42305 > URL: https://issues.apache.org/jira/browse/SPARK-42305 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42131) Extract the function that construct the select statement for JDBC dialect.
[ https://issues.apache.org/jira/browse/SPARK-42131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42131. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39667 [https://github.com/apache/spark/pull/39667] > Extract the function that construct the select statement for JDBC dialect. > -- > > Key: SPARK-42131 > URL: https://issues.apache.org/jira/browse/SPARK-42131 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > Currently, JDBCRDD uses fixed format for SELECT statement. > {code:java} > val sqlText = options.prepareQuery + > s"SELECT $columnList FROM ${options.tableOrQuery} $myTableSampleClause" > + > s" $myWhereClause $getGroupByClause $getOrderByClause $myLimitClause > $myOffsetClause" > {code} > But some databases have different syntax that uses different keyword or sort. > For example, MS SQL Server uses keyword TOP to describe LIMIT clause or Top N. > The LIMIT clause of MS SQL Server. > {code:java} > SELECT TOP(1) Model, Color, Price > FROM dbo.Cars > WHERE Color = 'blue' > {code} > The Top N of MS SQL Server. > {code:java} > SELECT TOP(1) Model, Color, Price > FROM dbo.Cars > WHERE Color = 'blue' > ORDER BY Price ASC > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42131) Extract the function that construct the select statement for JDBC dialect.
[ https://issues.apache.org/jira/browse/SPARK-42131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42131: --- Assignee: jiaan.geng > Extract the function that construct the select statement for JDBC dialect. > -- > > Key: SPARK-42131 > URL: https://issues.apache.org/jira/browse/SPARK-42131 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, JDBCRDD uses fixed format for SELECT statement. > {code:java} > val sqlText = options.prepareQuery + > s"SELECT $columnList FROM ${options.tableOrQuery} $myTableSampleClause" > + > s" $myWhereClause $getGroupByClause $getOrderByClause $myLimitClause > $myOffsetClause" > {code} > But some databases have different syntax that uses different keyword or sort. > For example, MS SQL Server uses keyword TOP to describe LIMIT clause or Top N. > The LIMIT clause of MS SQL Server. > {code:java} > SELECT TOP(1) Model, Color, Price > FROM dbo.Cars > WHERE Color = 'blue' > {code} > The Top N of MS SQL Server. > {code:java} > SELECT TOP(1) Model, Color, Price > FROM dbo.Cars > WHERE Color = 'blue' > ORDER BY Price ASC > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
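The refactoring direction can be illustrated with a small sketch (hypothetical Python, not Spark's actual `JdbcDialect` API): extract statement assembly into a method each dialect can override, so MS SQL Server can emit `TOP(n)` where the default dialect emits a trailing `LIMIT n`.

```python
class Dialect:
    """Default: LIMIT goes at the end of the statement."""
    def build_select(self, columns, table, where="", limit=None):
        where_clause = f" WHERE {where}" if where else ""
        limit_clause = f" LIMIT {limit}" if limit is not None else ""
        return f"SELECT {columns} FROM {table}{where_clause}{limit_clause}"

class MsSqlServerDialect(Dialect):
    """MS SQL Server: TOP(n) goes right after SELECT instead of LIMIT."""
    def build_select(self, columns, table, where="", limit=None):
        top = f"TOP({limit}) " if limit is not None else ""
        where_clause = f" WHERE {where}" if where else ""
        return f"SELECT {top}{columns} FROM {table}{where_clause}"
```

With the query from the issue, the default dialect produces a `LIMIT 1` suffix while the MS SQL Server dialect produces `SELECT TOP(1) ...`, which is the kind of per-dialect variation the extracted function is meant to accommodate.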
[jira] [Assigned] (SPARK-42303) Assign name to _LEGACY_ERROR_TEMP_1326
[ https://issues.apache.org/jira/browse/SPARK-42303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42303: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_1326 > -- > > Key: SPARK-42303 > URL: https://issues.apache.org/jira/browse/SPARK-42303 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42303) Assign name to _LEGACY_ERROR_TEMP_1326
[ https://issues.apache.org/jira/browse/SPARK-42303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42303. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39873 [https://github.com/apache/spark/pull/39873] > Assign name to _LEGACY_ERROR_TEMP_1326 > -- > > Key: SPARK-42303 > URL: https://issues.apache.org/jira/browse/SPARK-42303 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes
[ https://issues.apache.org/jira/browse/SPARK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685897#comment-17685897 ] Hussein Awala commented on SPARK-34645: --- I am facing a similar problem with Spark 3.2.1 and JDK 8. I run the jobs in client mode on arm64 nodes; in about 10% of them, after the executor pods and the created PVCs are deleted, the driver pod gets stuck in the Running state with this log: {code:java} 23/02/08 13:04:38 INFO SparkUI: Stopped Spark web UI at http://172.17.45.51:4040 23/02/08 13:04:38 INFO KubernetesClusterSchedulerBackend: Shutting down all executors 23/02/08 13:04:38 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down 23/02/08 13:04:38 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed. 23/02/08 13:04:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 23/02/08 13:04:39 INFO MemoryStore: MemoryStore cleared 23/02/08 13:04:39 INFO BlockManager: BlockManager stopped 23/02/08 13:04:39 INFO BlockManagerMaster: BlockManagerMaster stopped 23/02/08 13:04:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 23/02/08 13:04:39 INFO SparkContext: Successfully stopped SparkContext {code} JDK: {code:java} root@***:/# java -version | tail -n3 openjdk version "1.8.0_362" OpenJDK Runtime Environment (Temurin)(build 1.8.0_362-b09) OpenJDK 64-Bit Server VM (Temurin)(build 25.362-b09, mixed mode) {code} I tried: * removing the conf _spark.kubernetes.driver.reusePersistentVolumeClaim_ and the PVC entirely * applying the patch [https://github.com/apache/spark/commit/457b75ea2bca6b5811d61ce9f1d28c94b0dde3a2] proposed by [~mickayg] on Spark 3.2.1 * upgrading to 3.2.3 but I still have the same problem. I didn't find any relevant fix in the Spark 3.3.0 and 3.3.1 release notes except the Kubernetes client upgrade. Do you have any tips for investigating the issue? 
> [K8S] Driver pod stuck in Running state after job completes > --- > > Key: SPARK-34645 > URL: https://issues.apache.org/jira/browse/SPARK-34645 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.2 > Environment: Kubernetes: > {code:java} > Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", > GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", > BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", > Platform:"linux/amd64"} > Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", > GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", > BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", > Platform:"linux/amd64"} > {code} >Reporter: Andy Grove >Priority: Major > > I am running automated benchmarks in k8s, using spark-submit in cluster mode, > so the driver runs in a pod. > When running with Spark 3.0.1 and 3.1.1 everything works as expected and I > see the Spark context being shut down after the job completes. > However, when running with Spark 3.0.2 I do not see the context get shut down > and the driver pod is stuck in the Running state indefinitely. > This is the output I see after job completion with 3.0.1 and 3.1.1 and this > output does not appear with 3.0.2. With 3.0.2 there is no output at all after > the job completes. 
> {code:java} > 2021-03-05 20:09:24,576 INFO spark.SparkContext: Invoking stop() from > shutdown hook > 2021-03-05 20:09:24,592 INFO server.AbstractConnector: Stopped > Spark@784499d0{HTTP/1.1, (http/1.1)}{0.0.0.0:4040} > 2021-03-05 20:09:24,594 INFO ui.SparkUI: Stopped Spark web UI at > http://benchmark-runner-3e8a38780400e0d1-driver-svc.default.svc:4040 > 2021-03-05 20:09:24,599 INFO k8s.KubernetesClusterSchedulerBackend: Shutting > down all executors > 2021-03-05 20:09:24,600 INFO > k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 2021-03-05 20:09:24,609 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes > client has been closed (this is expected if the application is shutting down.) > 2021-03-05 20:09:24,719 INFO spark.MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 2021-03-05 20:09:24,736 INFO memory.MemoryStore: MemoryStore cleared > 2021-03-05 20:09:24,738 INFO storage.BlockManager: BlockManager stopped > 2021-03-05 20:09:24,744 INFO storage.BlockManagerMaster: BlockManagerMaster > stopped > 2021-03-05 20:09:24,752 INFO > scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! > 2021-03-05 20:09:24,768 INFO spark.SparkContext: Successfully stopped > SparkContext > 2021-03-05 20:09:24,768 INFO util.Shutdow
[jira] [Comment Edited] (SPARK-42380) Upgrade maven to 3.9.0
[ https://issues.apache.org/jira/browse/SPARK-42380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685881#comment-17685881 ] Yang Jie edited comment on SPARK-42380 at 2/8/23 12:24 PM: --- Upgrade `cyclonedx-maven-plugin` to 2.7.4 can avoid this error, but `cyclonedx-maven-plugin` 2.7.4 has another issue waiting to be fixed https://github.com/CycloneDX/cyclonedx-maven-plugin/issues/272 was (Author: luciferyang): Upgrade `cyclonedx-maven-plugin` to 2.7.4 can avoid this error, but `cyclonedx-maven-plugin` 2.7.4 has another issue waiting to be fixed > Upgrade maven to 3.9.0 > -- > > Key: SPARK-42380 > URL: https://issues.apache.org/jira/browse/SPARK-42380 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > {code:java} > [ERROR] An error occurred attempting to read POM > org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml > decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen version="1.0" encoding="ISO-8859-1"... 
@1:42) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion > (MXParser.java:3423) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl > (MXParser.java:3345) > at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197) > at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog > (MXParser.java:1828) > at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl > (MXParser.java:1757) > at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:3940) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:612) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:627) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:759) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:746) > at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject > (BaseCycloneDxMojo.java:694) > at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata > (BaseCycloneDxMojo.java:524) > at org.cyclonedx.maven.BaseCycloneDxMojo.convert > (BaseCycloneDxMojo.java:481) > at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70) > at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:126) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 > (MojoExecutor.java:342) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute > (MojoExecutor.java:330) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:213) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:175) > at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 > (MojoExecutor.java:76) > at org.apache.maven.lifecycle.internal.MojoExecutor$1.run > (MojoExecutor.java:163) > at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute > 
(DefaultMojosExecutionStrategy.java:39) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:160) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:105) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:73) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:53) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:118) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:192) > at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:498) > at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:282) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:225) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:406) > at org.codeh
[jira] [Commented] (SPARK-42380) Upgrade maven to 3.9.0
[ https://issues.apache.org/jira/browse/SPARK-42380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685881#comment-17685881 ] Yang Jie commented on SPARK-42380: -- Upgrade `cyclonedx-maven-plugin` to 2.7.4 can avoid this error, but `cyclonedx-maven-plugin` 2.7.4 has another issue waiting to be fixed > Upgrade maven to 3.9.0 > -- > > Key: SPARK-42380 > URL: https://issues.apache.org/jira/browse/SPARK-42380 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > {code:java} > [ERROR] An error occurred attempting to read POM > org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml > decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen version="1.0" encoding="ISO-8859-1"... @1:42) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion > (MXParser.java:3423) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl > (MXParser.java:3345) > at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197) > at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog > (MXParser.java:1828) > at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl > (MXParser.java:1757) > at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:3940) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:612) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:627) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:759) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:746) > at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject > (BaseCycloneDxMojo.java:694) > at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata > (BaseCycloneDxMojo.java:524) > at org.cyclonedx.maven.BaseCycloneDxMojo.convert > 
(BaseCycloneDxMojo.java:481) > at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70) > at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:126) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 > (MojoExecutor.java:342) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute > (MojoExecutor.java:330) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:213) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:175) > at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 > (MojoExecutor.java:76) > at org.apache.maven.lifecycle.internal.MojoExecutor$1.run > (MojoExecutor.java:163) > at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute > (DefaultMojosExecutionStrategy.java:39) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:160) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:105) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:73) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:53) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:118) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:192) > at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > 
(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:498) > at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:282) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:225) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:406) > at org.codehaus.plexus.classworlds.launcher.Launcher.main > (Launcher.java:347) > {code} > An existing problem -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.
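The `XmlPullParserException` above is raised because the POM file starts with a UTF-8 byte-order mark while its XML declaration claims ISO-8859-1, which the parser treats as incompatible. The mismatch can be detected with a few lines of stdlib Python; this is an illustrative sketch (the `check_pom_encoding` helper is made up for this example, not part of Maven or the plugin):

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

def check_pom_encoding(raw: bytes) -> str:
    """Return a short diagnosis of BOM / xml-declaration encoding agreement."""
    has_bom = raw.startswith(UTF8_BOM)
    body = raw[len(UTF8_BOM):] if has_bom else raw
    # Look for the declared encoding in the XML declaration, e.g.
    # <?xml version="1.0" encoding="ISO-8859-1"?>
    m = re.match(rb'<\?xml[^>]*encoding="([^"]+)"', body)
    declared = m.group(1).decode("ascii") if m else None
    if has_bom and declared and declared.upper() not in ("UTF-8", "UTF8"):
        return f"mismatch: UTF-8 BOM but declared {declared}"
    return "ok"

# A POM shaped like the one the plugin choked on:
raw = UTF8_BOM + b'<?xml version="1.0" encoding="ISO-8859-1"?><project/>'
print(check_pom_encoding(raw))  # mismatch: UTF-8 BOM but declared ISO-8859-1
```

Re-saving the offending POM without the BOM (or with a UTF-8 declaration) would make the two agree.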
[jira] [Updated] (SPARK-42380) Upgrade maven to 3.9.0
[ https://issues.apache.org/jira/browse/SPARK-42380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42380: - Description: {code:java} [ERROR] An error occurred attempting to read POM org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen Upgrade maven to 3.9.0 > -- > > Key: SPARK-42380 > URL: https://issues.apache.org/jira/browse/SPARK-42380 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > {code:java} > [ERROR] An error occurred attempting to read POM > org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml > decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen version="1.0" encoding="ISO-8859-1"... @1:42) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion > (MXParser.java:3423) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl > (MXParser.java:3345) > at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3197) > at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog > (MXParser.java:1828) > at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl > (MXParser.java:1757) > at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:3940) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:612) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:627) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:759) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:746) > at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject > (BaseCycloneDxMojo.java:694) > at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata > (BaseCycloneDxMojo.java:524) > at 
org.cyclonedx.maven.BaseCycloneDxMojo.convert > (BaseCycloneDxMojo.java:481) > at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70) > at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:126) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 > (MojoExecutor.java:342) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute > (MojoExecutor.java:330) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:213) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:175) > at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 > (MojoExecutor.java:76) > at org.apache.maven.lifecycle.internal.MojoExecutor$1.run > (MojoExecutor.java:163) > at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute > (DefaultMojosExecutionStrategy.java:39) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:160) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:105) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:73) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:53) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:118) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:260) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:172) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:100) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:821) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:270) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:192) > at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > at 
sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:498) > at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:282) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:225) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:406) > at org.codehaus.plexus.classworlds.launcher.Launcher.main > (Launcher.java:347) > {code} > An existing problem -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
[jira] [Commented] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.4
[ https://issues.apache.org/jira/browse/SPARK-42382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685880#comment-17685880 ] Yang Jie commented on SPARK-42382: -- We need to wait for [https://github.com/CycloneDX/cyclonedx-maven-plugin/issues/272] to be fixed > Upgrade `cyclonedx-maven-plugin` to 2.7.4 > - > > Key: SPARK-42382 > URL: https://issues.apache.org/jira/browse/SPARK-42382 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42342) Introduce base hierarchy to exceptions.
[ https://issues.apache.org/jira/browse/SPARK-42342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42342. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39882 [https://github.com/apache/spark/pull/39882] > Introduce base hierarchy to exceptions. > --- > > Key: SPARK-42342 > URL: https://issues.apache.org/jira/browse/SPARK-42342 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42342) Introduce base hierarchy to exceptions.
[ https://issues.apache.org/jira/browse/SPARK-42342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42342: Assignee: Takuya Ueshin > Introduce base hierarchy to exceptions. > --- > > Key: SPARK-42342 > URL: https://issues.apache.org/jira/browse/SPARK-42342 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30022) Supporting Parsing of Simple Hive Virtual View created from Presto
[ https://issues.apache.org/jira/browse/SPARK-30022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685861#comment-17685861 ] jorge arada commented on SPARK-30022: - We are facing the same problem. The Presto/Trino views are not "readable" from Spark. What can we do to push this request forward? > Supporting Parsing of Simple Hive Virtual View created from Presto > -- > > Key: SPARK-30022 > URL: https://issues.apache.org/jira/browse/SPARK-30022 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Arun Ravi M V >Priority: Major > > > We have an environment where we use Apache Spark and Presto (both backed by > Apache Hive Metastore). Currently, views created from Presto fail to get > parsed in Apache Spark. This is because Presto stores the view definition and > view schema in a base64-encoded fashion and Spark is unable to process it. I > would like to propose a minor change that will allow us to read these encoded > definitions created by Presto in a Spark program. > Assuming that the UDFs are made available, the user should be able to read > Presto views after the fix. > > I would like to propose a change to > [https://github.com/apache/spark/blob/9459833eae7fae887af560f3127997e023c51d00/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L440] > to support creation of a CatalogTable for views created from Presto. 
> > Hive Metastore DB, the table definition for Presto views (*select* * *from* > TBLS *where* `TBL_TYPE` *like* '%VIRTUAL_VIEW%') shows that the > VIEW_EXPANDED_TEXT is hardcoded as `/* Presto View **/` and > VIEW_ORIGINAL_TEXT is `/** Presto View: base64({ "originalSql": "" "catalog": > "", "schema": "", "columns": [ > { "name": "", "type": "" } > ], "owner": ""}) */` > Refer: > [https://github.com/prestodb/presto/blob/3242715959a169dbcdd88946c28488d2365c8886/presto-hive/src/main/java/com/facebook/presto/hive/HiveUtil.java#L614] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
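Given the `/* Presto View: base64({...}) */` format of VIEW_ORIGINAL_TEXT quoted above, the encoded definition can be unpacked with the Python stdlib. A minimal sketch, assuming the JSON shape described in the issue (`originalSql`, `catalog`, `schema`, `columns`, `owner`); the sample payload here is made up for illustration:

```python
import base64
import json
import re

def decode_presto_view(view_original_text: str) -> dict:
    """Extract and decode the base64 JSON payload Presto stores in
    VIEW_ORIGINAL_TEXT, per the format quoted in the issue description."""
    m = re.match(r"/\*\s*Presto View:\s*([A-Za-z0-9+/=]+)\s*\*/",
                 view_original_text)
    if m is None:
        raise ValueError("not a Presto view definition")
    return json.loads(base64.b64decode(m.group(1)))

# Round-trip with a made-up payload in the documented shape:
payload = {
    "originalSql": "SELECT buyer_id FROM sales",
    "catalog": "hive",
    "schema": "default",
    "columns": [{"name": "buyer_id", "type": "bigint"}],
    "owner": "etl",
}
encoded = "/* Presto View: %s */" % base64.b64encode(
    json.dumps(payload).encode()).decode()
print(decode_presto_view(encoded)["originalSql"])  # SELECT buyer_id FROM sales
```

The proposed HiveClientImpl change would do essentially this decoding when building the CatalogTable for such views.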
[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42381: - Assignee: Ruifeng Zheng > `CreateDataFrame` should accept objects > --- > > Key: SPARK-42381 > URL: https://issues.apache.org/jira/browse/SPARK-42381 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug
[ https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685858#comment-17685858 ] Peter Toth edited comment on SPARK-42346 at 2/8/23 11:16 AM: - [~ritikam], you also need to disable the "ConvertToLocalRelation" rule optimization `--conf "spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation"` to get the error from spark-shell. was (Author: petertoth): [~ritikam], you also need to disable the "ConvertToLocalRelation" rule optimization `--conf "spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation"` to get the error from spark-schell. > distinct(count colname) with UNION ALL causes query analyzer bug > > > Key: SPARK-42346 > URL: https://issues.apache.org/jira/browse/SPARK-42346 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.4.0, 3.5.0 >Reporter: Robin >Assignee: Peter Toth >Priority: Major > Fix For: 3.3.2, 3.4.0, 3.5.0 > > > If you combine a UNION ALL with a count(distinct colname) you get a query > analyzer bug. > > This behaviour is introduced in 3.3.0. The bug was not present in 3.2.1. > > Here is a reprex in PySpark: > {{df_pd = pd.DataFrame([}} > {{ \{'surname': 'a', 'first_name': 'b'}}} > {{])}} > {{df_spark = spark.createDataFrame(df_pd)}} > {{df_spark.createOrReplaceTempView("input_table")}} > {{sql = """}} > {{SELECT }} > {{ (SELECT Count(DISTINCT first_name) FROM input_table) }} > {{ AS distinct_value_count}} > {{FROM input_table}} > {{UNION ALL}} > {{SELECT }} > {{ (SELECT Count(DISTINCT surname) FROM input_table) }} > {{ AS distinct_value_count}} > {{FROM input_table """}} > {{spark.sql(sql).toPandas()}} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42346) distinct(count colname) with UNION ALL causes query analyzer bug
[ https://issues.apache.org/jira/browse/SPARK-42346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685858#comment-17685858 ] Peter Toth commented on SPARK-42346: [~ritikam], you also need to disable the "ConvertToLocalRelation" optimizer rule `--conf "spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation"` to get the error from spark-shell. > distinct(count colname) with UNION ALL causes query analyzer bug > > > Key: SPARK-42346 > URL: https://issues.apache.org/jira/browse/SPARK-42346 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.4.0, 3.5.0 >Reporter: Robin >Assignee: Peter Toth >Priority: Major > Fix For: 3.3.2, 3.4.0, 3.5.0 > > > If you combine a UNION ALL with a count(distinct colname) you get a query > analyzer bug. > > This behaviour was introduced in 3.3.0. The bug was not present in 3.2.1. > > Here is a reprex in PySpark: > {{df_pd = pd.DataFrame([}} > {{ \{'surname': 'a', 'first_name': 'b'}}} > {{])}} > {{df_spark = spark.createDataFrame(df_pd)}} > {{df_spark.createOrReplaceTempView("input_table")}} > {{sql = """}} > {{SELECT }} > {{ (SELECT Count(DISTINCT first_name) FROM input_table) }} > {{ AS distinct_value_count}} > {{FROM input_table}} > {{UNION ALL}} > {{SELECT }} > {{ (SELECT Count(DISTINCT surname) FROM input_table) }} > {{ AS distinct_value_count}} > {{FROM input_table """}} > {{spark.sql(sql).toPandas()}} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42381. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39939 [https://github.com/apache/spark/pull/39939] > `CreateDataFrame` should accept objects > --- > > Key: SPARK-42381 > URL: https://issues.apache.org/jira/browse/SPARK-42381 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42382) Upgrade `cyclonedx-maven-plugin` to 2.7.4
Yang Jie created SPARK-42382: Summary: Upgrade `cyclonedx-maven-plugin` to 2.7.4 Key: SPARK-42382 URL: https://issues.apache.org/jira/browse/SPARK-42382 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Yang Jie https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.4 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications
[ https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685765#comment-17685765 ] Gengliang Wang commented on SPARK-41053: [~dongjoon] sure. [~LuciferYang] [~techaddict] [~panbingkun] [~mridul] [~dongjoon] [~cloud_fan] Thanks all for the contributions and reviews! > Better Spark UI scalability and Driver stability for large applications > --- > > Key: SPARK-41053 > URL: https://issues.apache.org/jira/browse/SPARK-41053 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > Labels: releasenotes > Attachments: Better Spark UI scalability and Driver stability for > large applications.pdf > > > After SPARK-18085, the Spark history server (SHS) becomes more scalable for > processing large applications by supporting a persistent > KV-store (LevelDB/RocksDB) as the storage layer. > As for the live Spark UI, all the data is still stored in memory, which can > bring memory pressure to the Spark driver for large applications. > For better Spark UI scalability and Driver stability, I propose to > * {*}Support storing all the UI data in a persistent KV store{*}. > RocksDB/LevelDB provides low memory overhead. Their write/read performance is > fast enough to serve the write/read workload for live UI. SHS can leverage > the persistent KV store to speed up its startup. > * *Support a new Protobuf serializer for all the UI data.* The new > serializer is supposed to be faster, according to benchmarks. It will be the > default serializer for the persistent KV store of live UI. As for event logs, > it is optional. The current serializer for UI data is JSON. When writing > persistent KV-store, there is GZip compression. Since there is compression > support in RocksDB/LevelDB, the new serializer won’t compress the output > before writing to the persistent KV store. 
Here is a benchmark of > writing/reading 100,000 SQLExecutionUIData to/from RocksDB: > > |*Serializer*|*Avg Write time(μs)*|*Avg Read time(μs)*|*RocksDB File Total > Size(MB)*|*Result total size in memory(MB)*| > |*Spark’s KV Serializer(JSON+gzip)*|352.2|119.26|837|868| > |*Protobuf*|109.9|34.3|858|2105| > I am also proposing to support RocksDB instead of both LevelDB & RocksDB in > the live UI. > SPIP: > [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing] > SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
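From the benchmark table above, the relative improvement of the Protobuf serializer over the JSON+gzip one is simple arithmetic on the quoted per-operation times (only the numbers from the table are used here):

```python
# Numbers quoted from the benchmark table in the issue description
# (average microseconds per SQLExecutionUIData write/read to RocksDB).
json_gzip = {"write_us": 352.2, "read_us": 119.26}
protobuf = {"write_us": 109.9, "read_us": 34.3}

write_speedup = json_gzip["write_us"] / protobuf["write_us"]
read_speedup = json_gzip["read_us"] / protobuf["read_us"]
print(f"write {write_speedup:.1f}x, read {read_speedup:.1f}x faster")
```

So writes are roughly 3.2x and reads roughly 3.5x faster, at the cost of a larger in-memory result size (2105 MB vs 868 MB in the same table).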
[jira] [Resolved] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications
[ https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41053. Resolution: Fixed > Better Spark UI scalability and Driver stability for large applications > --- > > Key: SPARK-41053 > URL: https://issues.apache.org/jira/browse/SPARK-41053 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > Labels: releasenotes > Attachments: Better Spark UI scalability and Driver stability for > large applications.pdf > > > After SPARK-18085, the Spark history server (SHS) becomes more scalable for > processing large applications by supporting a persistent > KV-store (LevelDB/RocksDB) as the storage layer. > As for the live Spark UI, all the data is still stored in memory, which can > bring memory pressure to the Spark driver for large applications. > For better Spark UI scalability and Driver stability, I propose to > * {*}Support storing all the UI data in a persistent KV store{*}. > RocksDB/LevelDB provides low memory overhead. Their write/read performance is > fast enough to serve the write/read workload for live UI. SHS can leverage > the persistent KV store to speed up its startup. > * *Support a new Protobuf serializer for all the UI data.* The new > serializer is supposed to be faster, according to benchmarks. It will be the > default serializer for the persistent KV store of live UI. As for event logs, > it is optional. The current serializer for UI data is JSON. When writing > persistent KV-store, there is GZip compression. Since there is compression > support in RocksDB/LevelDB, the new serializer won’t compress the output > before writing to the persistent KV store. 
Here is a benchmark of > writing/reading 100,000 SQLExecutionUIData to/from RocksDB: > > |*Serializer*|*Avg Write time(μs)*|*Avg Read time(μs)*|*RocksDB File Total > Size(MB)*|*Result total size in memory(MB)*| > |*Spark’s KV Serializer(JSON+gzip)*|352.2|119.26|837|868| > |*Protobuf*|109.9|34.3|858|2105| > I am also proposing to support RocksDB instead of both LevelDB & RocksDB in > the live UI. > SPIP: > [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing] > SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications
[ https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41053: -- Assignee: Apache Spark > Better Spark UI scalability and Driver stability for large applications > --- > > Key: SPARK-41053 > URL: https://issues.apache.org/jira/browse/SPARK-41053 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > Labels: releasenotes > Attachments: Better Spark UI scalability and Driver stability for > large applications.pdf > > > After SPARK-18085, the Spark history server (SHS) becomes more scalable for > processing large applications by supporting a persistent > KV-store (LevelDB/RocksDB) as the storage layer. > As for the live Spark UI, all the data is still stored in memory, which can > bring memory pressure to the Spark driver for large applications. > For better Spark UI scalability and Driver stability, I propose to > * {*}Support storing all the UI data in a persistent KV store{*}. > RocksDB/LevelDB provides low memory overhead. Their write/read performance is > fast enough to serve the write/read workload for live UI. SHS can leverage > the persistent KV store to speed up its startup. > * *Support a new Protobuf serializer for all the UI data.* The new > serializer is supposed to be faster, according to benchmarks. It will be the > default serializer for the persistent KV store of live UI. As for event logs, > it is optional. The current serializer for UI data is JSON. When writing > persistent KV-store, there is GZip compression. Since there is compression > support in RocksDB/LevelDB, the new serializer won’t compress the output > before writing to the persistent KV store. 
Here is a benchmark of > writing/reading 100,000 SQLExecutionUIData to/from RocksDB: > > |*Serializer*|*Avg Write time(μs)*|*Avg Read time(μs)*|*RocksDB File Total > Size(MB)*|*Result total size in memory(MB)*| > |*Spark’s KV Serializer(JSON+gzip)*|352.2|119.26|837|868| > |*Protobuf*|109.9|34.3|858|2105| > I am also proposing to support RocksDB instead of both LevelDB & RocksDB in > the live UI. > SPIP: > [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing] > SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-38005: - Fix Version/s: 3.4.0 > Support cleaning up merged shuffle files and state from external shuffle > service > > > Key: SPARK-38005 > URL: https://issues.apache.org/jira/browse/SPARK-38005 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Major > Fix For: 3.4.0 > > > Currently merged shuffle files and state is not cleaned up until an > application ends. SPARK-37618 handles the cleanup of regular shuffle files. > This jira will address cleaning up of merged shuffle files/state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars resolved SPARK-38005. -- Resolution: Fixed > Support cleaning up merged shuffle files and state from external shuffle > service > > > Key: SPARK-38005 > URL: https://issues.apache.org/jira/browse/SPARK-38005 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Major > > Currently merged shuffle files and state is not cleaned up until an > application ends. SPARK-37618 handles the cleanup of regular shuffle files. > This jira will address cleaning up of merged shuffle files/state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685758#comment-17685758 ] Apache Spark commented on SPARK-42267: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39940 > Support left_outer join > --- > > Key: SPARK-42267 > URL: https://issues.apache.org/jira/browse/SPARK-42267 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > ``` > >>> df = spark.range(1) > >>> df2 = spark.range(2) > >>> df.join(df2, how="left_outer") > Traceback (most recent call last): > File "", line 1, in > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", > line 438, in join > plan.Join(left=self._plan, right=other._plan, on=on, how=how), > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line > 730, in __init__ > raise NotImplementedError( > NotImplementedError: > Unsupported join type: left_outer. Supported join types > include: > "inner", "outer", "full", "fullouter", "full_outer", > "leftouter", "left", "left_outer", "rightouter", > "right", "right_outer", "leftsemi", "left_semi", > "semi", "leftanti", "left_anti", "anti", "cross", > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685760#comment-17685760 ] Apache Spark commented on SPARK-42267: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39940 > Support left_outer join > --- > > Key: SPARK-42267 > URL: https://issues.apache.org/jira/browse/SPARK-42267 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > ``` > >>> df = spark.range(1) > >>> df2 = spark.range(2) > >>> df.join(df2, how="left_outer") > Traceback (most recent call last): > File "", line 1, in > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", > line 438, in join > plan.Join(left=self._plan, right=other._plan, on=on, how=how), > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line > 730, in __init__ > raise NotImplementedError( > NotImplementedError: > Unsupported join type: left_outer. Supported join types > include: > "inner", "outer", "full", "fullouter", "full_outer", > "leftouter", "left", "left_outer", "rightouter", > "right", "right_outer", "leftsemi", "left_semi", > "semi", "leftanti", "left_anti", "anti", "cross", > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42267: Assignee: Ruifeng Zheng > Support left_outer join > --- > > Key: SPARK-42267 > URL: https://issues.apache.org/jira/browse/SPARK-42267 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > ``` > >>> df = spark.range(1) > >>> df2 = spark.range(2) > >>> df.join(df2, how="left_outer") > Traceback (most recent call last): > File "", line 1, in > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", > line 438, in join > plan.Join(left=self._plan, right=other._plan, on=on, how=how), > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line > 730, in __init__ > raise NotImplementedError( > NotImplementedError: > Unsupported join type: left_outer. Supported join types > include: > "inner", "outer", "full", "fullouter", "full_outer", > "leftouter", "left", "left_outer", "rightouter", > "right", "right_outer", "leftsemi", "left_semi", > "semi", "leftanti", "left_anti", "anti", "cross", > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42267) Support left_outer join
[ https://issues.apache.org/jira/browse/SPARK-42267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42267. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39938 [https://github.com/apache/spark/pull/39938] > Support left_outer join > --- > > Key: SPARK-42267 > URL: https://issues.apache.org/jira/browse/SPARK-42267 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > ``` > >>> df = spark.range(1) > >>> df2 = spark.range(2) > >>> df.join(df2, how="left_outer") > Traceback (most recent call last): > File "", line 1, in > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/dataframe.py", > line 438, in join > plan.Join(left=self._plan, right=other._plan, on=on, how=how), > File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/plan.py", line > 730, in __init__ > raise NotImplementedError( > NotImplementedError: > Unsupported join type: left_outer. Supported join types > include: > "inner", "outer", "full", "fullouter", "full_outer", > "leftouter", "left", "left_outer", "rightouter", > "right", "right_outer", "leftsemi", "left_semi", > "semi", "leftanti", "left_anti", "anti", "cross", > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
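The `NotImplementedError` quoted above lists the join-type strings Spark Connect accepts; the fix makes `left_outer` resolve like its aliases. A rough stdlib sketch of that kind of alias normalization, modeled on the strings in the error message (the mapping and helper name are illustrative, not Spark's actual implementation):

```python
# Canonical join types and their accepted aliases, modeled on the list
# in the NotImplementedError message above (illustrative only).
JOIN_ALIASES = {
    "inner": {"inner"},
    "full_outer": {"outer", "full", "fullouter", "full_outer"},
    "left_outer": {"leftouter", "left", "left_outer"},
    "right_outer": {"rightouter", "right", "right_outer"},
    "left_semi": {"leftsemi", "left_semi", "semi"},
    "left_anti": {"leftanti", "left_anti", "anti"},
    "cross": {"cross"},
}

def normalize_join_type(how: str) -> str:
    """Map a user-supplied join-type string to its canonical name."""
    key = how.strip().lower()
    for canonical, aliases in JOIN_ALIASES.items():
        if key in aliases:
            return canonical
    supported = sorted(a for s in JOIN_ALIASES.values() for a in s)
    raise NotImplementedError(
        f"Unsupported join type: {how}. Supported join types include: {supported}")

print(normalize_join_type("left_outer"))  # left_outer
```

The bug amounted to an alias (`left_outer`) being listed as supported in the message but missing from the lookup, which is why the error itself named the string it rejected.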
[jira] [Commented] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685752#comment-17685752 ]

Apache Spark commented on SPARK-42381:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39939

> `CreateDataFrame` should accept objects
> ---------------------------------------
>
>                 Key: SPARK-42381
>                 URL: https://issues.apache.org/jira/browse/SPARK-42381
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42381:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685750#comment-17685750 ]

Apache Spark commented on SPARK-42381:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39939
[jira] [Assigned] (SPARK-42381) `CreateDataFrame` should accept objects
[ https://issues.apache.org/jira/browse/SPARK-42381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42381:
------------------------------------
    Assignee: Apache Spark
[jira] [Created] (SPARK-42381) `CreateDataFrame` should accept objects
Ruifeng Zheng created SPARK-42381:
-------------------------------------

             Summary: `CreateDataFrame` should accept objects
                 Key: SPARK-42381
                 URL: https://issues.apache.org/jira/browse/SPARK-42381
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, PySpark
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng
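SPARK-42381 carries no description beyond its title, so the following is only a sketch of what "accept objects" could mean for a `createDataFrame`-style entry point: deriving column names and row tuples from plain Python objects. Everything here (the `User` class, `objects_to_rows`, the tuple-based output) is a hypothetical illustration in pure Python, not Spark Connect's actual conversion code.

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class User:
    id: int
    name: str


def objects_to_rows(objs):
    """Turn a homogeneous list of dataclass instances into
    (column_names, row_tuples) suitable for a DataFrame-style
    constructor. Schema is inferred from the first element."""
    if not objs:
        raise ValueError("cannot infer schema from an empty list")
    cols = [f.name for f in fields(objs[0])]
    rows = [astuple(o) for o in objs]
    return cols, rows


cols, rows = objects_to_rows([User(1, "a"), User(2, "b")])
print(cols, rows)  # ['id', 'name'] [(1, 'a'), (2, 'b')]
```

Inferring the schema from the first object keeps the sketch simple; a real implementation would also have to handle mixed input types and nested fields.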