[jira] [Commented] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529233#comment-17529233 ] Apache Spark commented on SPARK-39047: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/36390 > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39013) Parser changes to enforce `()` for creating table without any columns
[ https://issues.apache.org/jira/browse/SPARK-39013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jackie Zhang resolved SPARK-39013. -- Resolution: Won't Fix > Parser changes to enforce `()` for creating table without any columns > - > > Key: SPARK-39013 > URL: https://issues.apache.org/jira/browse/SPARK-39013 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jackie Zhang >Priority: Major > > We would like to enforce the `()` for `CREATE TABLE` queries to explicitly > indicate that a table without any columns will be created. > E.g. `CREATE TABLE table () USING DELTA`. > The existing behavior of CTAS and CREATE external table at a location is not > affected. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
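For context, a minimal sketch of the syntax this ticket proposed (the issue was closed as Won't Fix, so none of this was merged); the table names and the `parquet` source below are illustrative assumptions, not from the ticket, whose own example used `USING DELTA`:

{code:scala}
// Proposed (never implemented): an explicit empty column list for a column-less table.
spark.sql("CREATE TABLE empty_tbl () USING parquet")

// Explicitly out of scope per the ticket: CTAS and CREATE external table at a location
// keep their existing behavior of deriving the schema.
spark.sql("CREATE TABLE ctas_tbl USING parquet AS SELECT 1 AS id")
spark.sql("CREATE TABLE ext_tbl USING parquet LOCATION '/tmp/existing_data'")
{code}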
[jira] [Assigned] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-39052: Assignee: Hyukjin Kwon > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-39052. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36389 [https://github.com/apache/spark/pull/36389] > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
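As a rough illustration of what the fix enables — a minimal sketch against the internal Catalyst API; the exact overload behavior shown is an assumption based on the issue text and the linked commit, not a verified description of the merged change:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.StringType

// Literal.apply already maps a Scala Char to a string literal (commit 54fcaafb):
val fromApply = Literal('x')

// With this fix, Literal.create is expected to handle Char the same way
// instead of failing to derive a Catalyst type for it:
val fromCreate = Literal.create('x')

assert(fromApply.dataType == StringType)
assert(fromCreate.dataType == StringType)
{code}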
[jira] [Assigned] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-39047: Assignee: Max Gekk > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-39047. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36380 [https://github.com/apache/spark/pull/36380] > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39046) Return an empty context string if TreeNode.origin is wrongly set
[ https://issues.apache.org/jira/browse/SPARK-39046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-39046. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36379 [https://github.com/apache/spark/pull/36379] > Return an empty context string if TreeNode.origin is wrongly set > > > Key: SPARK-39046 > URL: https://issues.apache.org/jira/browse/SPARK-39046 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39052: Assignee: (was: Apache Spark) > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529147#comment-17529147 ] Apache Spark commented on SPARK-39052: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36389 > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39052: Assignee: Apache Spark > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39052: - Affects Version/s: 3.4.0 (was: 3.3.0) > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Minor > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39052) Support Char in Literal.create
[ https://issues.apache.org/jira/browse/SPARK-39052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39052: - Priority: Minor (was: Major) > Support Char in Literal.create > -- > > Key: SPARK-39052 > URL: https://issues.apache.org/jira/browse/SPARK-39052 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 > added support for Char in Literal. Literal.create should work with > this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39052) Support Char in Literal.create
Hyukjin Kwon created SPARK-39052: Summary: Support Char in Literal.create Key: SPARK-39052 URL: https://issues.apache.org/jira/browse/SPARK-39052 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/commit/54fcaafb094e299f21c18370fddb4a727c88d875 added support for Char in Literal. Literal.create should work with this too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39036) Support Alter Table/Partition Concatenate command
[ https://issues.apache.org/jira/browse/SPARK-39036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39036: - Summary: Support Alter Table/Partition Concatenate command (was: support Alter Table/Partition Concatenate command) > Support Alter Table/Partition Concatenate command > - > > Key: SPARK-39036 > URL: https://issues.apache.org/jira/browse/SPARK-39036 > Project: Spark > Issue Type: New Feature > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: gabrywu >Priority: Major > > Hi, folks, > In Hive, we can use the following command to merge small files; however, Spark does > not have a corresponding command. > I believe it's useful, and using AQE alone is not enough. Is anyone working > on merging small files? If not, I'd like to create a PR to implement it. > > {code:java} > ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, > ...])] CONCATENATE;{code} > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate] > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529144#comment-17529144 ] Hyukjin Kwon commented on SPARK-39044: -- [~rshkv] it would be much easier to debug if there were a self-contained reproducer > AggregatingAccumulator with TypedImperativeAggregate throwing > NullPointerException > -- > > Key: SPARK-39044 > URL: https://issues.apache.org/jira/browse/SPARK-39044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Willi Raschkowski >Priority: Major > > We're using a custom TypedImperativeAggregate inside an > AggregatingAccumulator (via {{observe()}}) and get the error below. It looks > like we're trying to serialize an aggregation buffer that hasn't been > initialized yet. > {code} > Caused by: org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186) > at > org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:540) > ... Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in > stage 1.0 (TID 32) (10.0.134.136 executor 3): java.io.IOException: > java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) > at > org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) > at > java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) > at > java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) > at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeEx
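For readers unfamiliar with the code path in the trace above, here is a minimal sketch of how `observe()` is typically invoked. It uses a built-in aggregate and a hypothetical output path; the custom TypedImperativeAggregate from the report relies on internal APIs and is not reproduced here:

{code:scala}
import org.apache.spark.sql.functions._

// Observed metrics are collected via an AggregatingAccumulator under the hood;
// the NullPointerException above is thrown while serializing that accumulator's buffer.
val df = spark.range(0, 1000).toDF("id")
val observed = df.observe("stats", count(lit(1)).as("rows"), sum(col("id")).as("idSum"))

// Any action exercises the accumulator; the write below is only an example action.
observed.write.mode("overwrite").parquet("/tmp/observe_example")  // hypothetical path
{code}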
[jira] [Assigned] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38988: Assignee: Xinrong Meng > Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get > printed many times. > --- > > Key: SPARK-38988 > URL: https://issues.apache.org/jira/browse/SPARK-38988 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Xinrong Meng >Priority: Major > Attachments: Untitled.html, info.txt, warning printed.txt > > > I add a file and a notebook with the info msg I get when I run df.info() > Spark master build from 13.04.22. > df.shape > (763300, 224) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38988. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36367 [https://github.com/apache/spark/pull/36367] > Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get > printed many times. > --- > > Key: SPARK-38988 > URL: https://issues.apache.org/jira/browse/SPARK-38988 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > Attachments: Untitled.html, info.txt, warning printed.txt > > > I add a file and a notebook with the info msg I get when I run df.info() > Spark master build from 13.04.22. > df.shape > (763300, 224) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39049) Remove unneeded pass
[ https://issues.apache.org/jira/browse/SPARK-39049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39049: Assignee: Bjørn Jørgensen > Remove unneeded pass > > > Key: SPARK-39049 > URL: https://issues.apache.org/jira/browse/SPARK-39049 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > > Remove unneeded pass -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39049) Remove unneeded pass
[ https://issues.apache.org/jira/browse/SPARK-39049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39049. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36383 [https://github.com/apache/spark/pull/36383] > Remove unneeded pass > > > Key: SPARK-39049 > URL: https://issues.apache.org/jira/browse/SPARK-39049 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.3.0 > > > Remove unneeded pass -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39051) Minor refactoring of `python/pyspark/sql/pandas/conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-39051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39051: Assignee: Xinrong Meng > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` > -- > > Key: SPARK-39051 > URL: https://issues.apache.org/jira/browse/SPARK-39051 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39051) Minor refactoring of `python/pyspark/sql/pandas/conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-39051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39051. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36384 [https://github.com/apache/spark/pull/36384] > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` > -- > > Key: SPARK-39051 > URL: https://issues.apache.org/jira/browse/SPARK-39051 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39038) Skip reporting test results if triggering workflow was skipped
[ https://issues.apache.org/jira/browse/SPARK-39038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39038: Assignee: Enrico Minack > Skip reporting test results if triggering workflow was skipped > -- > > Key: SPARK-39038 > URL: https://issues.apache.org/jira/browse/SPARK-39038 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Major > > The `"Report test results"` workflow is triggered when either `"Build and > test"` or `"Build and test (ANSI)"` completes. On fork repositories, the workflow > `"Build and test (ANSI)"` is always skipped. > The triggered `"Report test results"` workflow downloads artifacts from the > triggering workflow and fails because there are no artifacts. > Therefore, the `"Report test results"` workflow should be skipped when the > triggering workflow completed with conclusion `'skipped'`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39038) Skip reporting test results if triggering workflow was skipped
[ https://issues.apache.org/jira/browse/SPARK-39038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39038. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36371 [https://github.com/apache/spark/pull/36371] > Skip reporting test results if triggering workflow was skipped > -- > > Key: SPARK-39038 > URL: https://issues.apache.org/jira/browse/SPARK-39038 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Major > Fix For: 3.4.0 > > > The `"Report test results"` workflow is triggered when either `"Build and > test"` or `"Build and test (ANSI)"` completes. On fork repositories, the workflow > `"Build and test (ANSI)"` is always skipped. > The triggered `"Report test results"` workflow downloads artifacts from the > triggering workflow and fails because there are no artifacts. > Therefore, the `"Report test results"` workflow should be skipped when the > triggering workflow completed with conclusion `'skipped'`. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation
[ https://issues.apache.org/jira/browse/SPARK-38918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529116#comment-17529116 ] Apache Spark commented on SPARK-38918: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/36388 > Nested column pruning should filter out attributes that do not belong to the > current relation > - > > Key: SPARK-38918 > URL: https://issues.apache.org/jira/browse/SPARK-38918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > `SchemaPruning` currently does not check if the root field of a nested column > belongs to the current relation. This can happen when the filter contains > correlated subqueries, where the children field can contain attributes from > both the inner and the outer query. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
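As a rough illustration only — the tables and nested columns below are hypothetical, not from the ticket — this is the shape of query the issue describes: a filter containing a correlated subquery whose predicate references nested fields of both the inner and the outer relation, which schema pruning previously mixed up:

{code:scala}
// Hypothetical schemas: orders(item STRUCT<id: LONG, owner_id: LONG>),
// events(payload STRUCT<user_id: LONG, data: STRING>).
spark.sql("""
  SELECT o.item.id
  FROM   orders o
  WHERE  EXISTS (
    SELECT 1
    FROM   events e
    WHERE  e.payload.user_id = o.item.owner_id  -- nested fields from inner and outer query
  )
""")
{code}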
[jira] [Commented] (SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation
[ https://issues.apache.org/jira/browse/SPARK-38918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529114#comment-17529114 ] Apache Spark commented on SPARK-38918: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/36387 > Nested column pruning should filter out attributes that do not belong to the > current relation > - > > Key: SPARK-38918 > URL: https://issues.apache.org/jira/browse/SPARK-38918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > `SchemaPruning` currently does not check if the root field of a nested column > belongs to the current relation. This can happen when the filter contains > correlated subqueries, where the children field can contain attributes from > both the inner and the outer query. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation
[ https://issues.apache.org/jira/browse/SPARK-38918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529113#comment-17529113 ] Apache Spark commented on SPARK-38918: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/36386 > Nested column pruning should filter out attributes that do not belong to the > current relation > - > > Key: SPARK-38918 > URL: https://issues.apache.org/jira/browse/SPARK-38918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > `SchemaPruning` currently does not check if the root field of a nested column > belongs to the current relation. This can happen when the filter contains > correlated subqueries, where the children field can contain attributes from > both the inner and the outer query. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39051) Minor refactoring of `python/pyspark/sql/pandas/conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-39051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39051: Assignee: (was: Apache Spark) > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` > -- > > Key: SPARK-39051 > URL: https://issues.apache.org/jira/browse/SPARK-39051 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39051) Minor refactoring of `python/pyspark/sql/pandas/conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-39051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39051: Assignee: Apache Spark > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` > -- > > Key: SPARK-39051 > URL: https://issues.apache.org/jira/browse/SPARK-39051 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39051) Minor refactoring of `python/pyspark/sql/pandas/conversion.py`
[ https://issues.apache.org/jira/browse/SPARK-39051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529109#comment-17529109 ] Apache Spark commented on SPARK-39051: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36384 > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` > -- > > Key: SPARK-39051 > URL: https://issues.apache.org/jira/browse/SPARK-39051 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Minor refactoring of `python/pyspark/sql/pandas/conversion.py` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39051) Minor refactoring of `python/pyspark/sql/pandas/conversion.py`
Xinrong Meng created SPARK-39051: Summary: Minor refactoring of `python/pyspark/sql/pandas/conversion.py` Key: SPARK-39051 URL: https://issues.apache.org/jira/browse/SPARK-39051 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Minor refactoring of `python/pyspark/sql/pandas/conversion.py` -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38388) Repartition + Stage retries could lead to incorrect data
[ https://issues.apache.org/jira/browse/SPARK-38388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529097#comment-17529097 ] Jason Xu commented on SPARK-38388: -- Hi [~cloud_fan], could you assign this ticket to me? I have bandwidth to work on it in May. Another possible solution: Since the root cause is related to non-deterministic data in shuffling, is it possible to let the driver keep checksums of all shuffle blocks? If a map task re-attempt generates a shuffle block with a different checksum, Spark can detect it on the fly and rerun all reduce tasks to avoid the correctness issue. I feel this could be a better solution because it is transparent to users; it doesn't require them to explicitly mark their data as non-deterministic. There are challenges with the other solution: 1. It wouldn't be easy to educate regular Spark users about the issue; they might not see the advice or understand the importance of marking DeterministicLevel. 2. Even if they understand, it's hard for users to always remember to mark their data as non-deterministic. What do you think? > Repartition + Stage retries could lead to incorrect data > - > > Key: SPARK-38388 > URL: https://issues.apache.org/jira/browse/SPARK-38388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 3.1.1 > Environment: Spark 2.4 and 3.1 >Reporter: Jason Xu >Priority: Major > Labels: correctness, data-loss > > Spark repartition uses RoundRobinPartitioning; the generated result is > non-deterministic when the data has some randomness and stage/task retries happen. > The bug can be triggered when upstream data has some randomness, a > repartition is called on it, and a result stage (possibly more stages) follows. > As the pattern shows below: > upstream stage (data with randomness) -> (repartition shuffle) -> result stage > When one executor goes down at the result stage, some tasks of that stage might > have finished while others fail; shuffle files on that executor also get > lost, and some tasks from the previous stage (upstream data generation, repartition) > need to rerun to regenerate the dependent shuffle data files. > Because the data has some randomness, the data regenerated by the retried upstream tasks > is slightly different, repartition then produces an inconsistent ordering, and the > retried tasks at the result stage generate different data. > This is similar to but different from > https://issues.apache.org/jira/browse/SPARK-23207; the fix there uses an extra > local sort to make the row ordering deterministic, and the sorting algorithm it > uses simply compares row/record hashes. But in this case the upstream data has > some randomness, the sort doesn't keep the order stable, and thus > RoundRobinPartitioning introduces a non-deterministic result. > The following code returns 986415, instead of 1000000: > {code:java} > import scala.sys.process._ > import org.apache.spark.TaskContext > case class TestObject(id: Long, value: Double) > val ds = spark.range(0, 1000 * 1000, 1).repartition(100, > $"id").withColumn("val", rand()).repartition(100).map { > row => if (TaskContext.get.stageAttemptNumber == 0 && > TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId > 97) { > throw new Exception("pkill -f java".!!) > } > TestObject(row.getLong(0), row.getDouble(1)) > } > ds.toDF("id", "value").write.mode("overwrite").saveAsTable("tmp.test_table") > spark.sql("select count(distinct id) from tmp.test_table").show{code} > Command: > {code:java} > spark-shell --num-executors 10 (--conf spark.dynamicAllocation.enabled=false > --conf spark.shuffle.service.enabled=false){code} > To simulate the issue, disabling the external shuffle service is needed (if it's > also enabled by default in your environment); this is to trigger shuffle > file loss and previous-stage retries. > In our production environment, we have the external shuffle service enabled; this data > correctness issue happened when there were node losses. > Although there's some non-deterministic factor in the upstream data, users > wouldn't expect to see an incorrect result. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
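Not from the ticket or either of the proposed fixes — purely an illustrative, user-side mitigation sketch for the pattern above: partitioning on a deterministic key instead of round-robin keeps each row's partition assignment stable across task re-attempts, which avoids lost or duplicated rows even though the rand() values themselves still differ on retry:

{code:scala}
import org.apache.spark.sql.functions._

// Hash-partition on the deterministic "id" column rather than round-robin repartition(100).
// Re-attempted upstream tasks then route every id to the same partition again.
val stable = spark.range(0, 1000 * 1000)
  .withColumn("val", rand())
  .repartition(100, col("id"))
{code}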
[jira] [Commented] (SPARK-39050) Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE
[ https://issues.apache.org/jira/browse/SPARK-39050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529093#comment-17529093 ] Apache Spark commented on SPARK-39050: -- User 'srielau' has created a pull request for this issue: https://github.com/apache/spark/pull/36385 > Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE > > > Key: SPARK-39050 > URL: https://issues.apache.org/jira/browse/SPARK-39050 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Serge Rielau >Priority: Major > > UNSUPPORTED_OPERATION is very similar to UNSUPPORTED_FEATURE. > We can just roll them together -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39050) Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE
[ https://issues.apache.org/jira/browse/SPARK-39050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39050: Assignee: Apache Spark > Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE > > > Key: SPARK-39050 > URL: https://issues.apache.org/jira/browse/SPARK-39050 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Serge Rielau >Assignee: Apache Spark >Priority: Major > > UNSUPPORTED_OPERATION is very similar to UNSUPPORTED_FEATURE. > We can just roll them together -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39050) Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE
[ https://issues.apache.org/jira/browse/SPARK-39050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39050: Assignee: (was: Apache Spark) > Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE > > > Key: SPARK-39050 > URL: https://issues.apache.org/jira/browse/SPARK-39050 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Serge Rielau >Priority: Major > > UNSUPPORTED_OPERATION is very similar to UNSUPPORTED_FEATURE. > We can just roll them together -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39050) Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE
Serge Rielau created SPARK-39050: Summary: Convert UNSUPPORTED_OPERATION to UNSUPPORTED_FEATURE Key: SPARK-39050 URL: https://issues.apache.org/jira/browse/SPARK-39050 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Serge Rielau UNSUPPORTED_OPERATION is very similar to UNSUPPORTED_FEATURE. We can just roll them together -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39049) Remove unneeded pass
[ https://issues.apache.org/jira/browse/SPARK-39049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39049: Assignee: (was: Apache Spark) > Remove unneeded pass > > > Key: SPARK-39049 > URL: https://issues.apache.org/jira/browse/SPARK-39049 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Remove unneeded pass -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39049) Remove unneeded pass
[ https://issues.apache.org/jira/browse/SPARK-39049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528941#comment-17528941 ] Apache Spark commented on SPARK-39049: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/36383 > Remove unneeded pass > > > Key: SPARK-39049 > URL: https://issues.apache.org/jira/browse/SPARK-39049 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Remove unneeded pass -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39049) Remove unneeded pass
[ https://issues.apache.org/jira/browse/SPARK-39049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39049: Assignee: Apache Spark > Remove unneeded pass > > > Key: SPARK-39049 > URL: https://issues.apache.org/jira/browse/SPARK-39049 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Apache Spark >Priority: Major > > Remove unneeded pass -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39049) Remove unneeded pass
Bjørn Jørgensen created SPARK-39049: --- Summary: Remove unneeded pass Key: SPARK-39049 URL: https://issues.apache.org/jira/browse/SPARK-39049 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Bjørn Jørgensen Remove unneeded pass -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38988: Assignee: (was: Apache Spark) > Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get > printed many times. > --- > > Key: SPARK-38988 > URL: https://issues.apache.org/jira/browse/SPARK-38988 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > Attachments: Untitled.html, info.txt, warning printed.txt > > > I add a file and a notebook with the info msg I get when I run df.info() > Spark master build from 13.04.22. > df.shape > (763300, 224) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38988: Assignee: Apache Spark > Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get > printed many times. > --- > > Key: SPARK-38988 > URL: https://issues.apache.org/jira/browse/SPARK-38988 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Apache Spark >Priority: Major > Attachments: Untitled.html, info.txt, warning printed.txt > > > I add a file and a notebook with the info msg I get when I run df.info() > Spark master build from 13.04.22. > df.shape > (763300, 224) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528933#comment-17528933 ] Apache Spark commented on SPARK-38988: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36367 > Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get > printed many times. > --- > > Key: SPARK-38988 > URL: https://issues.apache.org/jira/browse/SPARK-38988 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0, 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > Attachments: Untitled.html, info.txt, warning printed.txt > > > I add a file and a notebook with the info msg I get when I run df.info() > Spark master build from 13.04.22. > df.shape > (763300, 224) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39048) Refactor `GroupBy._reduce_for_stat_function` on accepted data types
[ https://issues.apache.org/jira/browse/SPARK-39048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-39048: - Summary: Refactor `GroupBy._reduce_for_stat_function` on accepted data types (was: Refactor GroupBy._reduce_for_stat_function on accepted data types ) > Refactor `GroupBy._reduce_for_stat_function` on accepted data types > > > Key: SPARK-39048 > URL: https://issues.apache.org/jira/browse/SPARK-39048 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > `Groupby._reduce_for_stat_function` is a common helper function leveraged by > multiple statistical functions of GroupBy objects. > It defines parameters `only_numeric` and `bool_as_numeric` to control > accepted Spark types. > To be consistent with pandas API, we may also have to introduce > `str_as_numeric` for `sum` for example. > Instead of introducing parameters designated for each Spark type, the PR is > proposed to introduce a parameter `accepted_spark_types` to specify accepted > types of Spark columns to be aggregated. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39048) Refactor GroupBy._reduce_for_stat_function on accepted data types
[ https://issues.apache.org/jira/browse/SPARK-39048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528928#comment-17528928 ] Apache Spark commented on SPARK-39048: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36382 > Refactor GroupBy._reduce_for_stat_function on accepted data types > -- > > Key: SPARK-39048 > URL: https://issues.apache.org/jira/browse/SPARK-39048 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > `Groupby._reduce_for_stat_function` is a common helper function leveraged by > multiple statistical functions of GroupBy objects. > It defines parameters `only_numeric` and `bool_as_numeric` to control > accepted Spark types. > To be consistent with pandas API, we may also have to introduce > `str_as_numeric` for `sum` for example. > Instead of introducing parameters designated for each Spark type, the PR is > proposed to introduce a parameter `accepted_spark_types` to specify accepted > types of Spark columns to be aggregated. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39048) Refactor GroupBy._reduce_for_stat_function on accepted data types
[ https://issues.apache.org/jira/browse/SPARK-39048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39048: Assignee: Apache Spark > Refactor GroupBy._reduce_for_stat_function on accepted data types > -- > > Key: SPARK-39048 > URL: https://issues.apache.org/jira/browse/SPARK-39048 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > `Groupby._reduce_for_stat_function` is a common helper function leveraged by > multiple statistical functions of GroupBy objects. > It defines parameters `only_numeric` and `bool_as_numeric` to control > accepted Spark types. > To be consistent with pandas API, we may also have to introduce > `str_as_numeric` for `sum` for example. > Instead of introducing parameters designated for each Spark type, the PR is > proposed to introduce a parameter `accepted_spark_types` to specify accepted > types of Spark columns to be aggregated. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39048) Refactor GroupBy._reduce_for_stat_function on accepted data types
[ https://issues.apache.org/jira/browse/SPARK-39048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39048: Assignee: (was: Apache Spark) > Refactor GroupBy._reduce_for_stat_function on accepted data types > -- > > Key: SPARK-39048 > URL: https://issues.apache.org/jira/browse/SPARK-39048 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > `Groupby._reduce_for_stat_function` is a common helper function leveraged by > multiple statistical functions of GroupBy objects. > It defines parameters `only_numeric` and `bool_as_numeric` to control > accepted Spark types. > To be consistent with pandas API, we may also have to introduce > `str_as_numeric` for `sum` for example. > Instead of introducing parameters designated for each Spark type, the PR is > proposed to introduce a parameter `accepted_spark_types` to specify accepted > types of Spark columns to be aggregated. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39048) Refactor GroupBy._reduce_for_stat_function on accepted data types
[ https://issues.apache.org/jira/browse/SPARK-39048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528927#comment-17528927 ] Apache Spark commented on SPARK-39048: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36382 > Refactor GroupBy._reduce_for_stat_function on accepted data types > -- > > Key: SPARK-39048 > URL: https://issues.apache.org/jira/browse/SPARK-39048 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > `Groupby._reduce_for_stat_function` is a common helper function leveraged by > multiple statistical functions of GroupBy objects. > It defines parameters `only_numeric` and `bool_as_numeric` to control > accepted Spark types. > To be consistent with pandas API, we may also have to introduce > `str_as_numeric` for `sum` for example. > Instead of introducing parameters designated for each Spark type, the PR is > proposed to introduce a parameter `accepted_spark_types` to specify accepted > types of Spark columns to be aggregated. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39048) Refactor GroupBy._reduce_for_stat_function on accepted data types
Xinrong Meng created SPARK-39048: Summary: Refactor GroupBy._reduce_for_stat_function on accepted data types Key: SPARK-39048 URL: https://issues.apache.org/jira/browse/SPARK-39048 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng `Groupby._reduce_for_stat_function` is a common helper function leveraged by multiple statistical functions of GroupBy objects. It defines parameters `only_numeric` and `bool_as_numeric` to control accepted Spark types. To be consistent with pandas API, we may also have to introduce `str_as_numeric` for `sum` for example. Instead of introducing parameters designated for each Spark type, the PR is proposed to introduce a parameter `accepted_spark_types` to specify accepted types of Spark columns to be aggregated. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
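The ticket names the consolidated parameter `accepted_spark_types`; as a rough sketch only (the surrounding signature and the example type tuples below are assumptions, not the actual pandas-on-Spark internals), the refactor could look roughly like this:

{code:python}
from typing import Optional, Tuple, Type
from pyspark.sql.types import BooleanType, DataType, NumericType, StringType

def _reduce_for_stat_function(
    sfun,  # the per-column Spark aggregate to apply (sum, mean, ...)
    accepted_spark_types: Optional[Tuple[Type[DataType], ...]] = None,
):
    """Apply `sfun` to every grouped column whose Spark data type is an instance of one of
    `accepted_spark_types`; columns of other types are skipped (None means accept all types)."""
    ...

# A single parameter then replaces only_numeric/bool_as_numeric and covers future cases:
SUM_TYPES = (NumericType, BooleanType, StringType)   # e.g. if sum gains str_as_numeric behaviour
MEAN_TYPES = (NumericType, BooleanType)
{code}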
[jira] [Assigned] (SPARK-37942) Use error classes in the compilation errors of properties
[ https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37942: Assignee: Apache Spark > Use error classes in the compilation errors of properties > - > > Key: SPARK-37942 > URL: https://issues.apache.org/jira/browse/SPARK-37942 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * cannotReadCorruptedTablePropertyError > * cannotCreateJDBCNamespaceWithPropertyError > * cannotSetJDBCNamespaceWithPropertyError > * cannotUnsetJDBCNamespaceWithPropertyError > * alterTableSerDePropertiesNotSupportedForV2TablesError > * unsetNonExistentPropertyError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37942) Use error classes in the compilation errors of properties
[ https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528896#comment-17528896 ] Apache Spark commented on SPARK-37942: -- User 'jerqi' has created a pull request for this issue: https://github.com/apache/spark/pull/36381 > Use error classes in the compilation errors of properties > - > > Key: SPARK-37942 > URL: https://issues.apache.org/jira/browse/SPARK-37942 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * cannotReadCorruptedTablePropertyError > * cannotCreateJDBCNamespaceWithPropertyError > * cannotSetJDBCNamespaceWithPropertyError > * cannotUnsetJDBCNamespaceWithPropertyError > * alterTableSerDePropertiesNotSupportedForV2TablesError > * unsetNonExistentPropertyError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37942) Use error classes in the compilation errors of properties
[ https://issues.apache.org/jira/browse/SPARK-37942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37942: Assignee: (was: Apache Spark) > Use error classes in the compilation errors of properties > - > > Key: SPARK-37942 > URL: https://issues.apache.org/jira/browse/SPARK-37942 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * cannotReadCorruptedTablePropertyError > * cannotCreateJDBCNamespaceWithPropertyError > * cannotSetJDBCNamespaceWithPropertyError > * cannotUnsetJDBCNamespaceWithPropertyError > * alterTableSerDePropertiesNotSupportedForV2TablesError > * unsetNonExistentPropertyError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
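As a hedged illustration of one error from the list above surfacing to end users (the triggering statement is my assumption, and the post-migration error class name is not specified in the ticket, so nothing is asserted about it here):

{code:python}
from pyspark.sql.utils import AnalysisException

spark.sql("CREATE TABLE t_props (id INT) USING parquet")
try:
    # unsetting a property that was never set should go through unsetNonExistentPropertyError
    spark.sql("ALTER TABLE t_props UNSET TBLPROPERTIES ('no_such_property')")
except AnalysisException as e:
    # after the migration, this exception is expected to carry an error class (SparkThrowable)
    print(e)
{code}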
[jira] [Commented] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1752#comment-1752 ] Apache Spark commented on SPARK-39047: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/36380 > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39047: Assignee: Apache Spark > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39047: Assignee: (was: Apache Spark) > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-39047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528886#comment-17528886 ] Apache Spark commented on SPARK-39047: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/36380 > Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE > > > Key: SPARK-39047 > URL: https://issues.apache.org/jira/browse/SPARK-39047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and > remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38997) DS V2 aggregate push-down supports group by expressions
[ https://issues.apache.org/jira/browse/SPARK-38997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-38997. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36325 [https://github.com/apache/spark/pull/36325] > DS V2 aggregate push-down supports group by expressions > --- > > Key: SPARK-38997 > URL: https://issues.apache.org/jira/browse/SPARK-38997 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark DS V2 aggregate push-down only supports group by column. > But the SQL show below is very useful and common. > SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" > ELSE 0.00 END AS key, SUM("SALARY") FROM "test"."employee" GROUP BY key -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
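A hedged sketch of issuing that kind of query from PySpark against a JDBC source with aggregate push-down enabled; the connection details and salary thresholds below are placeholders rather than values from the ticket:

{code:python}
employees = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/test")  # placeholder connection
    .option("dbtable", "test.employee")
    .option("pushDownAggregate", "true")                    # opt in to DS V2 aggregate push-down
    .load())
employees.createOrReplaceTempView("employee")

spark.sql("""
    SELECT CASE WHEN salary > 8000.00 AND salary < 10000.00 THEN salary ELSE 0.00 END AS key,
           SUM(salary)
    FROM employee
    GROUP BY key
""").show()
{code}

With this change, the aggregate including the CASE WHEN grouping expression can be evaluated by the remote database instead of being pulled back into Spark, provided the connector's dialect supports the expression.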
[jira] [Assigned] (SPARK-38997) DS V2 aggregate push-down supports group by expressions
[ https://issues.apache.org/jira/browse/SPARK-38997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-38997: --- Assignee: jiaan.geng > DS V2 aggregate push-down supports group by expressions > --- > > Key: SPARK-38997 > URL: https://issues.apache.org/jira/browse/SPARK-38997 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, Spark DS V2 aggregate push-down only supports group by column. > But the SQL show below is very useful and common. > SELECT CASE WHEN ("SALARY" > 8000.00) AND ("SALARY" < 1.00) THEN "SALARY" > ELSE 0.00 END AS key, SUM("SALARY") FROM "test"."employee" GROUP BY key -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39047) Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE
Max Gekk created SPARK-39047: Summary: Replace the error class ILLEGAL_SUBSTRING by INVALID_PARAMETER_VALUE Key: SPARK-39047 URL: https://issues.apache.org/jira/browse/SPARK-39047 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Use the INVALID_PARAMETER_VALUE error class instead of ILLEGAL_SUBSTRING and remove the last one because it duplicates INVALID_PARAMETER_VALUE. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
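Purely as a hedged illustration of the kind of call site involved (the ticket does not list the affected functions; picking split_part with an empty delimiter here is my assumption): after this change such a parameter-validation failure is expected to report INVALID_PARAMETER_VALUE instead of ILLEGAL_SUBSTRING.

{code:python}
try:
    # empty delimiter: an invalid parameter value (assumed trigger, see note above)
    spark.sql("SELECT split_part('a,b,c', '', 1)").collect()
except Exception as e:
    print(e)
{code}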
[jira] [Comment Edited] (SPARK-25177) When dataframe decimal type column having scale higher than 6, 0 values are shown in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528877#comment-17528877 ] Vinod KC edited comment on SPARK-25177 at 4/27/22 4:29 PM: --- In case, if anyone looking for a workaround to convert 0 in scientific notation to plaintext, this code snippet may help. {code:java} import org.apache.spark.sql.types.Decimal val handleBigDecZeroUDF = udf((decimalVal:Decimal) => { if (decimalVal.scale > 6) { decimalVal.toBigDecimal.bigDecimal.toPlainString() } else { decimalVal.toString() } }) spark.sql("create table testBigDec (a decimal(10,7), b decimal(10,6), c decimal(10,8))") spark.sql("insert into testBigDec values(0, 0,0)") spark.sql("insert into testBigDec values(1, 1, 1)") val df = spark.table("testBigDec") df.show(false) // this will show scientific notation // use custom UDF `handleBigDecZeroUDF` to convert zero into plainText notation df.select(handleBigDecZeroUDF(col("a")).as("a"),col("b"),handleBigDecZeroUDF(col("c")).as("c")).show(false) // Result of df.show(false) +-++--+ |a |b |c | +-++--+ |0E-7 |0.00|0E-8 | |1.000|1.00|1.| +-++--+ // Result using handleBigDecZeroUDF +-++--+ |a |b |c | +-++--+ |0.000|0.00|0.| |1.000|1.00|1.| +-++--+ {code} was (Author: vinodkc): In case, if anyone looking for a workaround to convert 0 in scientific notation to plaintext, this code snippet may help. {code:java} import org.apache.spark.sql.types.Decimal val handleBigDecZeroUDF = udf((decimalVal:Decimal) => { if (decimalVal.scale > 6) { decimalVal.toBigDecimal.bigDecimal.toPlainString() } else { decimalVal.toString() } }) spark.sql("create table testBigDec (a decimal(10,7), b decimal(10,6), c decimal(10,8))") spark.sql("insert into testBigDec values(0, 0,0)") spark.sql("insert into testBigDec values(1, 1, 1)") val df = spark.table("testBigDec") df.show(false) // this will show scientific notation // use custom UDF `handleBigDecZeroUDF` to convert zero into plainText notation df.select(handleBigDecZeroUDF(col("a")).as("a"),md5(handleBigDecZeroUDF(col("a"))).as("a-md5"),col("b"),handleBigDecZeroUDF(col("c")).as("c")).show(false) {code} > When dataframe decimal type column having scale higher than 6, 0 values are > shown in scientific notation > > > Key: SPARK-25177 > URL: https://issues.apache.org/jira/browse/SPARK-25177 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Vinod KC >Priority: Minor > Labels: bulk-closed > > If scale of decimal type is > 6 , 0 value will be shown in scientific > notation and hence, when the dataframe output is saved to external database, > it fails due to scientific notation on "0" values. > Eg: In Spark > -- > spark.sql("create table test (a decimal(10,7), b decimal(10,6), c > decimal(10,8))") > spark.sql("insert into test values(0, 0,0)") > spark.sql("insert into test values(1, 1, 1)") > spark.table("test").show() > | a | b | c | > | 0E-7 |0.00| 0E-8 |//If scale > 6, zero is displayed in > scientific notation| > |1.000|1.00|1.| > > Eg: In Postgress > -- > CREATE TABLE Testdec (a DECIMAL(10,7), b DECIMAL(10,6), c DECIMAL(10,8)); > INSERT INTO Testdec VALUES (0,0,0); > INSERT INTO Testdec VALUES (1,1,1); > select * from Testdec; > Result: > a | b | c > ---++--- > 0.000 | 0.00 | 0. > 1.000 | 1.00 | 1. 
> We can make Spark SQL results consistent with other databases like PostgreSQL. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25177) When dataframe decimal type column having scale higher than 6, 0 values are shown in scientific notation
[ https://issues.apache.org/jira/browse/SPARK-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528877#comment-17528877 ] Vinod KC commented on SPARK-25177: -- In case, if anyone looking for a workaround to convert 0 in scientific notation to plaintext, this code snippet may help. {code:java} import org.apache.spark.sql.types.Decimal val handleBigDecZeroUDF = udf((decimalVal:Decimal) => { if (decimalVal.scale > 6) { decimalVal.toBigDecimal.bigDecimal.toPlainString() } else { decimalVal.toString() } }) spark.sql("create table testBigDec (a decimal(10,7), b decimal(10,6), c decimal(10,8))") spark.sql("insert into testBigDec values(0, 0,0)") spark.sql("insert into testBigDec values(1, 1, 1)") val df = spark.table("testBigDec") df.show(false) // this will show scientific notation // use custom UDF `handleBigDecZeroUDF` to convert zero into plainText notation df.select(handleBigDecZeroUDF(col("a")).as("a"),md5(handleBigDecZeroUDF(col("a"))).as("a-md5"),col("b"),handleBigDecZeroUDF(col("c")).as("c")).show(false) {code} > When dataframe decimal type column having scale higher than 6, 0 values are > shown in scientific notation > > > Key: SPARK-25177 > URL: https://issues.apache.org/jira/browse/SPARK-25177 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Vinod KC >Priority: Minor > Labels: bulk-closed > > If scale of decimal type is > 6 , 0 value will be shown in scientific > notation and hence, when the dataframe output is saved to external database, > it fails due to scientific notation on "0" values. > Eg: In Spark > -- > spark.sql("create table test (a decimal(10,7), b decimal(10,6), c > decimal(10,8))") > spark.sql("insert into test values(0, 0,0)") > spark.sql("insert into test values(1, 1, 1)") > spark.table("test").show() > | a | b | c | > | 0E-7 |0.00| 0E-8 |//If scale > 6, zero is displayed in > scientific notation| > |1.000|1.00|1.| > > Eg: In Postgress > -- > CREATE TABLE Testdec (a DECIMAL(10,7), b DECIMAL(10,6), c DECIMAL(10,8)); > INSERT INTO Testdec VALUES (0,0,0); > INSERT INTO Testdec VALUES (1,1,1); > select * from Testdec; > Result: > a | b | c > ---++--- > 0.000 | 0.00 | 0. > 1.000 | 1.00 | 1. > We can make spark SQL result consistent with other Databases like Postgresql > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
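For PySpark users, a hedged equivalent of the Scala workaround above (the function name is mine, and it assumes DecimalType values reach the UDF as Python decimal.Decimal, which is the standard conversion):

{code:python}
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def handle_big_dec_zero(dec):
    if dec is None:
        return None
    # the Decimal's scale is the negated exponent; render plainly when it exceeds 6
    if -dec.as_tuple().exponent > 6:
        return format(dec, "f")  # plain, non-scientific notation
    return str(dec)

df = spark.table("testBigDec")
df.select(handle_big_dec_zero("a").alias("a"), "b",
          handle_big_dec_zero("c").alias("c")).show(truncate=False)
{code}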
[jira] [Commented] (SPARK-39022) Spark SQL - Combination of HAVING and SORT not resolved correctly
[ https://issues.apache.org/jira/browse/SPARK-39022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528863#comment-17528863 ] Lukas Grasmann commented on SPARK-39022: This is my first contribution. How do I assign this issue to myself? > Spark SQL - Combination of HAVING and SORT not resolved correctly > - > > Key: SPARK-39022 > URL: https://issues.apache.org/jira/browse/SPARK-39022 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1, 3.4.0 >Reporter: Lukas Grasmann >Priority: Major > Attachments: explain_new.txt, explain_old.txt > > > Example: Given a simple relation {{test}} with two relevant columns {{hotel}} > and {{price}} where {{hotel}} is a unique identifier of a hotel and {{price}} > is the cost of a night's stay. We would then like to order the {{{}hotel{}}}s > by their cumulative prices but only for hotels where the cumulative price is > higher than {{{}150{}}}. > h2. Current Behavior > To achieve the goal specified above, we give a simple query that works in > most common database systems. Note that we only retrieve {{hotel}} in the > {{SELECT ... FROM}} statement which means that the aggregate has to be > removed from the result attributes using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel FROM test GROUP BY hotel HAVING sum(price) > 150 > ORDER BY sum(price)").show{code} > Currently, this yields an {{AnalysisException}} since the aggregate > {{sum(price)}} in {{Sort}} is not resolved correctly. Note that the child of > {{Sort}} is a (premature) {{Project}} node which only provides {{hotel}} as > its output. This prevents the aggregate values from being passed to > {{{}Sort{}}}. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? [test.hotel]; line 1 pos 75; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Project [hotel#17] >+- Filter (sum(cast(price#18 as double))#22 > cast(150 as double)) > +- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(cast(price#18 as double))#22] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) >+- Relation [hotel#17,price#18] csv > {code} > The {{AnalysisException}} itself, however, is not caused by the introduced > {{Project}} as can be seen in the following example. Here, {{sum(price)}} is > part of the result and therefore *not* removed using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel, sum(price) FROM test GROUP BY hotel HAVING > sum(price) > 150 ORDER BY sum(price)").show{code} > Resolving the aggregate {{sum(price)}} (i.e., resolving it to the aggregate > introduced by the {{Aggregate}} node) is still not successful even if there > is no {{{}Project{}}}. Spark still throws the following {{AnalysisException}} > which is similar to the exception from before. It follows that there is a > second error in the analyzer that still prevents successful resolution even > if the problem regarding the {{Project}} node is fixed. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? 
[sum(price), test.hotel]; line 1 pos 87; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Filter (sum(price)#24 > cast(150 as double)) >+- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(price)#24] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) > +- Relation [hotel#17,price#18] csv > {code} > > This error occurs (at least) in Spark versions 3.1.2, 3.2.1, as well as the > latest version from the GitHub {{master}} branch. > h2. Current Workaround > The issue can currently be worked around by using a subquery to first > retrieve only the hotels which fulfill the condition and then ordering them > in the outer query: > {code:sql} > SELECT hotel, sum_price FROM > (SELECT hotel, sum(price) AS sum_price FROM test GROUP BY hotel HAVING > sum(price) > 150) sub > ORDER BY sum_price; > {code} > h2. Proposed Solution(s) > The first change fixes the (premature) insertion of {{Project}} before a > {{Sort}} by moving the {{Project}} up in the plan such that the {{Project}} > is then parent of the {{Sort}} instead of vice versa. This does not change > the results of the computations since both {{Sort}} and {{Project}} do not > add or remove tuples from the result. > There are two potential side-effects to this solution: > * May change some plans generated by DataFrame/DataSet which previously also > produced similar errors such that they now yield a result instead. However, > this is unlikely to produce unexpected/undesired results (see above). >
[jira] [Commented] (SPARK-38648) SPIP: Simplified API for DL Inferencing
[ https://issues.apache.org/jira/browse/SPARK-38648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528862#comment-17528862 ] Xiangrui Meng commented on SPARK-38648: --- I think it is beneficial to both Spark and DL frameworks if Spark has state-of-the-art DL capabilities. We did some work in the past to make Spark work better with DL frameworks, e.g., iterator Scalar Pandas UDF, barrier mode, and GPU scheduling. But most of them are low level APIs for developers, not end users. Our Spark user guide contains little about DL and AI. The dependency on DL frameworks might create issues. One idea is to develop in the Spark repo and Spark namespace but publish to PyPI independently. For example, in order to use DL features, users need to explicitly install `pyspark-dl` and then use the features under `pyspark.dl` namespace. Putting development inside Spark and publishing under the spark namespace would help drive both development and adoption. > SPIP: Simplified API for DL Inferencing > --- > > Key: SPARK-38648 > URL: https://issues.apache.org/jira/browse/SPARK-38648 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: Lee Yang >Priority: Minor > > h1. Background and Motivation > The deployment of deep learning (DL) models to Spark clusters can be a point > of friction today. DL practitioners often aren't well-versed with Spark, and > Spark experts often aren't well-versed with the fast-changing DL frameworks. > Currently, the deployment of trained DL models is done in a fairly ad-hoc > manner, with each model integration usually requiring significant effort. > To simplify this process, we propose adding an integration layer for each > major DL framework that can introspect their respective saved models to > more-easily integrate these models into Spark applications. You can find a > detailed proposal here: > [https://docs.google.com/document/d/1n7QPHVZfmQknvebZEXxzndHPV2T71aBsDnP4COQa_v0] > h1. Goals > - Simplify the deployment of pre-trained single-node DL models to Spark > inference applications. > - Follow pandas_udf for simple inference use-cases. > - Follow Spark ML Pipelines APIs for transfer-learning use-cases. > - Enable integrations with popular third-party DL frameworks like > TensorFlow, PyTorch, and Huggingface. > - Focus on PySpark, since most of the DL frameworks use Python. > - Take advantage of built-in Spark features like GPU scheduling and Arrow > integration. > - Enable inference on both CPU and GPU. > h1. Non-goals > - DL model training. > - Inference w/ distributed models, i.e. "model parallel" inference. > h1. Target Personas > - Data scientists who need to deploy DL models on Spark. > - Developers who need to deploy DL models on Spark. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
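For readers unfamiliar with the status quo the SPIP wants to simplify, a minimal hedged sketch of today's pattern using the iterator-based scalar pandas UDF mentioned above; the model is a trivial stand-in for a real TensorFlow/PyTorch/Huggingface model:

{code:python}
from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

@pandas_udf(DoubleType())
def predict(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # a real model would be loaded here, once per task, then reused for every Arrow batch
    def model(s: pd.Series) -> pd.Series:  # stand-in for model.predict(...)
        return s * 2.0
    for batch in batches:
        yield model(batch)

df = spark.range(10).selectExpr("CAST(id AS DOUBLE) AS feature")
df.withColumn("prediction", predict("feature")).show()
{code}

The SPIP's point is that users currently hand-roll this wrapper (plus model loading, batching, and GPU placement) for every framework, whereas the proposed integration layer would derive it from a saved model.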
[jira] [Assigned] (SPARK-38914) Allow user to insert specified columns into insertable view
[ https://issues.apache.org/jira/browse/SPARK-38914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-38914: -- Assignee: morvenhuang > Allow user to insert specified columns into insertable view > --- > > Key: SPARK-38914 > URL: https://issues.apache.org/jira/browse/SPARK-38914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: morvenhuang >Assignee: morvenhuang >Priority: Minor > > The option `spark.sql.defaultColumn.useNullsForMissingDefautValues` allows us > to insert specified columns into table (SPARK-38795), but currently this > option does not work for insertable view, > Below INSERT INTO will result in AnalysisException even when the > useNullsForMissingDefautValues option is true, > {code:java} > spark.sql("CREATE TEMPORARY VIEW v1 (c1 int, c2 string) USING > org.apache.spark.sql.json.DefaultSource OPTIONS ( path 'json_dir')"); > spark.sql("INSERT INTO v1(c1) VALUES(100)"); > org.apache.spark.sql.AnalysisException: unknown requires that the data to be > inserted have the same number of columns as the target table: target table > has 2 column(s) but the inserted data has 1 column(s), including 0 partition > column(s) having constant value(s). > {code} > > I can provide a fix for this issue. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38914) Allow user to insert specified columns into insertable view
[ https://issues.apache.org/jira/browse/SPARK-38914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-38914. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36212 [https://github.com/apache/spark/pull/36212] > Allow user to insert specified columns into insertable view > --- > > Key: SPARK-38914 > URL: https://issues.apache.org/jira/browse/SPARK-38914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: morvenhuang >Assignee: morvenhuang >Priority: Minor > Fix For: 3.4.0 > > > The option `spark.sql.defaultColumn.useNullsForMissingDefautValues` allows us > to insert specified columns into table (SPARK-38795), but currently this > option does not work for insertable view, > Below INSERT INTO will result in AnalysisException even when the > useNullsForMissingDefautValues option is true, > {code:java} > spark.sql("CREATE TEMPORARY VIEW v1 (c1 int, c2 string) USING > org.apache.spark.sql.json.DefaultSource OPTIONS ( path 'json_dir')"); > spark.sql("INSERT INTO v1(c1) VALUES(100)"); > org.apache.spark.sql.AnalysisException: unknown requires that the data to be > inserted have the same number of columns as the target table: target table > has 2 column(s) but the inserted data has 1 column(s), including 0 partition > column(s) having constant value(s). > {code} > > I can provide a fix for this issue. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39046) Return an empty context string if TreeNode.origin is wrongly set
[ https://issues.apache.org/jira/browse/SPARK-39046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39046: Assignee: Gengliang Wang (was: Apache Spark) > Return an empty context string if TreeNode.origin is wrongly set > > > Key: SPARK-39046 > URL: https://issues.apache.org/jira/browse/SPARK-39046 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39046) Return an empty context string if TreeNode.origin is wrongly set
[ https://issues.apache.org/jira/browse/SPARK-39046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528859#comment-17528859 ] Apache Spark commented on SPARK-39046: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/36379 > Return an empty context string if TreeNode.origin is wrongly set > > > Key: SPARK-39046 > URL: https://issues.apache.org/jira/browse/SPARK-39046 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39046) Return an empty context string if TreeNode.origin is wrongly set
[ https://issues.apache.org/jira/browse/SPARK-39046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39046: Assignee: Apache Spark (was: Gengliang Wang) > Return an empty context string if TreeNode.origin is wrongly set > > > Key: SPARK-39046 > URL: https://issues.apache.org/jira/browse/SPARK-39046 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39022) Spark SQL - Combination of HAVING and SORT not resolved correctly
[ https://issues.apache.org/jira/browse/SPARK-39022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528856#comment-17528856 ] Apache Spark commented on SPARK-39022: -- User 'Lukas-Grasmann' has created a pull request for this issue: https://github.com/apache/spark/pull/36378 > Spark SQL - Combination of HAVING and SORT not resolved correctly > - > > Key: SPARK-39022 > URL: https://issues.apache.org/jira/browse/SPARK-39022 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1, 3.4.0 >Reporter: Lukas Grasmann >Priority: Major > Attachments: explain_new.txt, explain_old.txt > > > Example: Given a simple relation {{test}} with two relevant columns {{hotel}} > and {{price}} where {{hotel}} is a unique identifier of a hotel and {{price}} > is the cost of a night's stay. We would then like to order the {{{}hotel{}}}s > by their cumulative prices but only for hotels where the cumulative price is > higher than {{{}150{}}}. > h2. Current Behavior > To achieve the goal specified above, we give a simple query that works in > most common database systems. Note that we only retrieve {{hotel}} in the > {{SELECT ... FROM}} statement which means that the aggregate has to be > removed from the result attributes using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel FROM test GROUP BY hotel HAVING sum(price) > 150 > ORDER BY sum(price)").show{code} > Currently, this yields an {{AnalysisException}} since the aggregate > {{sum(price)}} in {{Sort}} is not resolved correctly. Note that the child of > {{Sort}} is a (premature) {{Project}} node which only provides {{hotel}} as > its output. This prevents the aggregate values from being passed to > {{{}Sort{}}}. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? [test.hotel]; line 1 pos 75; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Project [hotel#17] >+- Filter (sum(cast(price#18 as double))#22 > cast(150 as double)) > +- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(cast(price#18 as double))#22] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) >+- Relation [hotel#17,price#18] csv > {code} > The {{AnalysisException}} itself, however, is not caused by the introduced > {{Project}} as can be seen in the following example. Here, {{sum(price)}} is > part of the result and therefore *not* removed using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel, sum(price) FROM test GROUP BY hotel HAVING > sum(price) > 150 ORDER BY sum(price)").show{code} > Resolving the aggregate {{sum(price)}} (i.e., resolving it to the aggregate > introduced by the {{Aggregate}} node) is still not successful even if there > is no {{{}Project{}}}. Spark still throws the following {{AnalysisException}} > which is similar to the exception from before. It follows that there is a > second error in the analyzer that still prevents successful resolution even > if the problem regarding the {{Project}} node is fixed. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? 
[sum(price), test.hotel]; line 1 pos 87; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Filter (sum(price)#24 > cast(150 as double)) >+- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(price)#24] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) > +- Relation [hotel#17,price#18] csv > {code} > > This error occurs (at least) in Spark versions 3.1.2, 3.2.1, as well as the > latest version from the GitHub {{master}} branch. > h2. Current Workaround > The issue can currently be worked around by using a subquery to first > retrieve only the hotels which fulfill the condition and then ordering them > in the outer query: > {code:sql} > SELECT hotel, sum_price FROM > (SELECT hotel, sum(price) AS sum_price FROM test GROUP BY hotel HAVING > sum(price) > 150) sub > ORDER BY sum_price; > {code} > h2. Proposed Solution(s) > The first change fixes the (premature) insertion of {{Project}} before a > {{Sort}} by moving the {{Project}} up in the plan such that the {{Project}} > is then parent of the {{Sort}} instead of vice versa. This does not change > the results of the computations since both {{Sort}} and {{Project}} do not > add or remove tuples from the result. > There are two potential side-effects to this solution: > * May change some plans generated by DataFrame/DataSet which previously also > produced similar errors such that they now yield a result instead. However, > this is unlikely to produce unexpecte
[jira] [Assigned] (SPARK-39022) Spark SQL - Combination of HAVING and SORT not resolved correctly
[ https://issues.apache.org/jira/browse/SPARK-39022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39022: Assignee: (was: Apache Spark) > Spark SQL - Combination of HAVING and SORT not resolved correctly > - > > Key: SPARK-39022 > URL: https://issues.apache.org/jira/browse/SPARK-39022 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1, 3.4.0 >Reporter: Lukas Grasmann >Priority: Major > Attachments: explain_new.txt, explain_old.txt > > > Example: Given a simple relation {{test}} with two relevant columns {{hotel}} > and {{price}} where {{hotel}} is a unique identifier of a hotel and {{price}} > is the cost of a night's stay. We would then like to order the {{{}hotel{}}}s > by their cumulative prices but only for hotels where the cumulative price is > higher than {{{}150{}}}. > h2. Current Behavior > To achieve the goal specified above, we give a simple query that works in > most common database systems. Note that we only retrieve {{hotel}} in the > {{SELECT ... FROM}} statement which means that the aggregate has to be > removed from the result attributes using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel FROM test GROUP BY hotel HAVING sum(price) > 150 > ORDER BY sum(price)").show{code} > Currently, this yields an {{AnalysisException}} since the aggregate > {{sum(price)}} in {{Sort}} is not resolved correctly. Note that the child of > {{Sort}} is a (premature) {{Project}} node which only provides {{hotel}} as > its output. This prevents the aggregate values from being passed to > {{{}Sort{}}}. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? [test.hotel]; line 1 pos 75; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Project [hotel#17] >+- Filter (sum(cast(price#18 as double))#22 > cast(150 as double)) > +- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(cast(price#18 as double))#22] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) >+- Relation [hotel#17,price#18] csv > {code} > The {{AnalysisException}} itself, however, is not caused by the introduced > {{Project}} as can be seen in the following example. Here, {{sum(price)}} is > part of the result and therefore *not* removed using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel, sum(price) FROM test GROUP BY hotel HAVING > sum(price) > 150 ORDER BY sum(price)").show{code} > Resolving the aggregate {{sum(price)}} (i.e., resolving it to the aggregate > introduced by the {{Aggregate}} node) is still not successful even if there > is no {{{}Project{}}}. Spark still throws the following {{AnalysisException}} > which is similar to the exception from before. It follows that there is a > second error in the analyzer that still prevents successful resolution even > if the problem regarding the {{Project}} node is fixed. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? [sum(price), test.hotel]; line 1 pos 87; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Filter (sum(price)#24 > cast(150 as double)) >+- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(price)#24] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) > +- Relation [hotel#17,price#18] csv > {code} > > This error occurs (at least) in Spark versions 3.1.2, 3.2.1, as well as the > latest version from the GitHub {{master}} branch. > h2. 
Current Workaround > The issue can currently be worked around by using a subquery to first > retrieve only the hotels which fulfill the condition and then ordering them > in the outer query: > {code:sql} > SELECT hotel, sum_price FROM > (SELECT hotel, sum(price) AS sum_price FROM test GROUP BY hotel HAVING > sum(price) > 150) sub > ORDER BY sum_price; > {code} > h2. Proposed Solution(s) > The first change fixes the (premature) insertion of {{Project}} before a > {{Sort}} by moving the {{Project}} up in the plan such that the {{Project}} > is then parent of the {{Sort}} instead of vice versa. This does not change > the results of the computations since both {{Sort}} and {{Project}} do not > add or remove tuples from the result. > There are two potential side-effects to this solution: > * May change some plans generated by DataFrame/DataSet which previously also > produced similar errors such that they now yield a result instead. However, > this is unlikely to produce unexpected/undesired results (see above). > * Moving the projection might reduce performance for {{Sort}} since the > input is p
[jira] [Assigned] (SPARK-39022) Spark SQL - Combination of HAVING and SORT not resolved correctly
[ https://issues.apache.org/jira/browse/SPARK-39022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39022: Assignee: Apache Spark > Spark SQL - Combination of HAVING and SORT not resolved correctly > - > > Key: SPARK-39022 > URL: https://issues.apache.org/jira/browse/SPARK-39022 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1, 3.4.0 >Reporter: Lukas Grasmann >Assignee: Apache Spark >Priority: Major > Attachments: explain_new.txt, explain_old.txt > > > Example: Given a simple relation {{test}} with two relevant columns {{hotel}} > and {{price}} where {{hotel}} is a unique identifier of a hotel and {{price}} > is the cost of a night's stay. We would then like to order the {{{}hotel{}}}s > by their cumulative prices but only for hotels where the cumulative price is > higher than {{{}150{}}}. > h2. Current Behavior > To achieve the goal specified above, we give a simple query that works in > most common database systems. Note that we only retrieve {{hotel}} in the > {{SELECT ... FROM}} statement which means that the aggregate has to be > removed from the result attributes using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel FROM test GROUP BY hotel HAVING sum(price) > 150 > ORDER BY sum(price)").show{code} > Currently, this yields an {{AnalysisException}} since the aggregate > {{sum(price)}} in {{Sort}} is not resolved correctly. Note that the child of > {{Sort}} is a (premature) {{Project}} node which only provides {{hotel}} as > its output. This prevents the aggregate values from being passed to > {{{}Sort{}}}. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? [test.hotel]; line 1 pos 75; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Project [hotel#17] >+- Filter (sum(cast(price#18 as double))#22 > cast(150 as double)) > +- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(cast(price#18 as double))#22] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) >+- Relation [hotel#17,price#18] csv > {code} > The {{AnalysisException}} itself, however, is not caused by the introduced > {{Project}} as can be seen in the following example. Here, {{sum(price)}} is > part of the result and therefore *not* removed using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel, sum(price) FROM test GROUP BY hotel HAVING > sum(price) > 150 ORDER BY sum(price)").show{code} > Resolving the aggregate {{sum(price)}} (i.e., resolving it to the aggregate > introduced by the {{Aggregate}} node) is still not successful even if there > is no {{{}Project{}}}. Spark still throws the following {{AnalysisException}} > which is similar to the exception from before. It follows that there is a > second error in the analyzer that still prevents successful resolution even > if the problem regarding the {{Project}} node is fixed. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? 
[sum(price), test.hotel]; line 1 pos 87; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Filter (sum(price)#24 > cast(150 as double)) >+- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(price)#24] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) > +- Relation [hotel#17,price#18] csv > {code} > > This error occurs (at least) in Spark versions 3.1.2, 3.2.1, as well as the > latest version from the GitHub {{master}} branch. > h2. Current Workaround > The issue can currently be worked around by using a subquery to first > retrieve only the hotels which fulfill the condition and then ordering them > in the outer query: > {code:sql} > SELECT hotel, sum_price FROM > (SELECT hotel, sum(price) AS sum_price FROM test GROUP BY hotel HAVING > sum(price) > 150) sub > ORDER BY sum_price; > {code} > h2. Proposed Solution(s) > The first change fixes the (premature) insertion of {{Project}} before a > {{Sort}} by moving the {{Project}} up in the plan such that the {{Project}} > is then parent of the {{Sort}} instead of vice versa. This does not change > the results of the computations since both {{Sort}} and {{Project}} do not > add or remove tuples from the result. > There are two potential side-effects to this solution: > * May change some plans generated by DataFrame/DataSet which previously also > produced similar errors such that they now yield a result instead. However, > this is unlikely to produce unexpected/undesired results (see above). > * Moving the projection might reduce performance for {{Sort}
[jira] [Commented] (SPARK-39022) Spark SQL - Combination of HAVING and SORT not resolved correctly
[ https://issues.apache.org/jira/browse/SPARK-39022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528855#comment-17528855 ] Apache Spark commented on SPARK-39022: -- User 'Lukas-Grasmann' has created a pull request for this issue: https://github.com/apache/spark/pull/36378 > Spark SQL - Combination of HAVING and SORT not resolved correctly > - > > Key: SPARK-39022 > URL: https://issues.apache.org/jira/browse/SPARK-39022 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1, 3.4.0 >Reporter: Lukas Grasmann >Priority: Major > Attachments: explain_new.txt, explain_old.txt > > > Example: Given a simple relation {{test}} with two relevant columns {{hotel}} > and {{price}} where {{hotel}} is a unique identifier of a hotel and {{price}} > is the cost of a night's stay. We would then like to order the {{{}hotel{}}}s > by their cumulative prices but only for hotels where the cumulative price is > higher than {{{}150{}}}. > h2. Current Behavior > To achieve the goal specified above, we give a simple query that works in > most common database systems. Note that we only retrieve {{hotel}} in the > {{SELECT ... FROM}} statement which means that the aggregate has to be > removed from the result attributes using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel FROM test GROUP BY hotel HAVING sum(price) > 150 > ORDER BY sum(price)").show{code} > Currently, this yields an {{AnalysisException}} since the aggregate > {{sum(price)}} in {{Sort}} is not resolved correctly. Note that the child of > {{Sort}} is a (premature) {{Project}} node which only provides {{hotel}} as > its output. This prevents the aggregate values from being passed to > {{{}Sort{}}}. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? [test.hotel]; line 1 pos 75; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Project [hotel#17] >+- Filter (sum(cast(price#18 as double))#22 > cast(150 as double)) > +- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(cast(price#18 as double))#22] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) >+- Relation [hotel#17,price#18] csv > {code} > The {{AnalysisException}} itself, however, is not caused by the introduced > {{Project}} as can be seen in the following example. Here, {{sum(price)}} is > part of the result and therefore *not* removed using a {{Project}} node. > {code:scala} > sqlcontext.sql("SELECT hotel, sum(price) FROM test GROUP BY hotel HAVING > sum(price) > 150 ORDER BY sum(price)").show{code} > Resolving the aggregate {{sum(price)}} (i.e., resolving it to the aggregate > introduced by the {{Aggregate}} node) is still not successful even if there > is no {{{}Project{}}}. Spark still throws the following {{AnalysisException}} > which is similar to the exception from before. It follows that there is a > second error in the analyzer that still prevents successful resolution even > if the problem regarding the {{Project}} node is fixed. > {code:scala} > org.apache.spark.sql.AnalysisException: Column 'price' does not exist. Did > you mean one of the following? 
[sum(price), test.hotel]; line 1 pos 87; > 'Sort ['sum('price) ASC NULLS FIRST], true > +- Filter (sum(price)#24 > cast(150 as double)) >+- Aggregate [HOTEL#17], [hotel#17, sum(cast(price#18 as double)) AS > sum(price)#24] > +- SubqueryAlias test > +- View (`test`, [hotel#17,price#18]) > +- Relation [hotel#17,price#18] csv > {code} > > This error occurs (at least) in Spark versions 3.1.2, 3.2.1, as well as the > latest version from the GitHub {{master}} branch. > h2. Current Workaround > The issue can currently be worked around by using a subquery to first > retrieve only the hotels which fulfill the condition and then ordering them > in the outer query: > {code:sql} > SELECT hotel, sum_price FROM > (SELECT hotel, sum(price) AS sum_price FROM test GROUP BY hotel HAVING > sum(price) > 150) sub > ORDER BY sum_price; > {code} > h2. Proposed Solution(s) > The first change fixes the (premature) insertion of {{Project}} before a > {{Sort}} by moving the {{Project}} up in the plan such that the {{Project}} > is then parent of the {{Sort}} instead of vice versa. This does not change > the results of the computations since both {{Sort}} and {{Project}} do not > add or remove tuples from the result. > There are two potential side-effects to this solution: > * May change some plans generated by DataFrame/DataSet which previously also > produced similar errors such that they now yield a result instead. However, > this is unlikely to produce unexpecte
[jira] [Updated] (SPARK-39046) Return an empty context string if TreeNode.origin is wrongly set
[ https://issues.apache.org/jira/browse/SPARK-39046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-39046: --- Parent: SPARK-38615 Issue Type: Sub-task (was: Improvement) > Return an empty context string if TreeNode.origin is wrongly set > > > Key: SPARK-39046 > URL: https://issues.apache.org/jira/browse/SPARK-39046 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39046) Return an empty context string if TreeNode.origin is wrongly set
Gengliang Wang created SPARK-39046: -- Summary: Return an empty context string if TreeNode.origin is wrongly set Key: SPARK-39046 URL: https://issues.apache.org/jira/browse/SPARK-39046 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39045) INTERNAL_ERROR for "all" internal errors
Serge Rielau created SPARK-39045: Summary: INTERNAL_ERROR for "all" internal errors Key: SPARK-39045 URL: https://issues.apache.org/jira/browse/SPARK-39045 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Serge Rielau We should be able to inject the [SYSTEM_ERROR] class for most cases without waiting to label the long tail of user-facing error classes. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38979) Improve error log readability in OrcUtils.requestedColumnIds
[ https://issues.apache.org/jira/browse/SPARK-38979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-38979: Assignee: dzcxzl > Improve error log readability in OrcUtils.requestedColumnIds > > > Key: SPARK-38979 > URL: https://issues.apache.org/jira/browse/SPARK-38979 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Trivial > > In OrcUtils#requestedColumnIds sometimes it fails because > orcFieldNames.length > dataSchema.length, the log is not very clear. > {code:java} > java.lang.AssertionError: assertion failed: The given data schema > struct has less fields than the actual ORC physical schema, no > idea which columns were dropped, fail to read. {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38979) Improve error log readability in OrcUtils.requestedColumnIds
[ https://issues.apache.org/jira/browse/SPARK-38979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-38979. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36296 [https://github.com/apache/spark/pull/36296] > Improve error log readability in OrcUtils.requestedColumnIds > > > Key: SPARK-38979 > URL: https://issues.apache.org/jira/browse/SPARK-38979 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Trivial > Fix For: 3.4.0 > > > In OrcUtils#requestedColumnIds sometimes it fails because > orcFieldNames.length > dataSchema.length, the log is not very clear. > {code:java} > java.lang.AssertionError: assertion failed: The given data schema > struct has less fields than the actual ORC physical schema, no > idea which columns were dropped, fail to read. {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39015) SparkRuntimeException when trying to get non-existent key in a map
[ https://issues.apache.org/jira/browse/SPARK-39015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-39015: - Fix Version/s: 3.3.0 > SparkRuntimeException when trying to get non-existent key in a map > -- > > Key: SPARK-39015 > URL: https://issues.apache.org/jira/browse/SPARK-39015 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Raza Jafri >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > [~maxgekk] submitted a > [commit|https://github.com/apache/spark/commit/bc8c264851457d8ef59f5b332c79296651ec5d1e] > that tries to convert the key to SQL but that part of the code is blowing > up. > {code:java} > scala> :pa > // Entering paste mode (ctrl-D to finish) > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.StructType > import org.apache.spark.sql.types.StringType > import org.apache.spark.sql.types.DataTypes > val arrayStructureData = Seq( > Row(Map("hair"->"black", "eye"->"brown")), > Row(Map("hair"->"blond", "eye"->"blue")), > Row(Map())) > val mapType = DataTypes.createMapType(StringType,StringType) > val arrayStructureSchema = new StructType() > .add("properties", mapType) > val mapTypeDF = spark.createDataFrame( > spark.sparkContext.parallelize(arrayStructureData),arrayStructureSchema) > mapTypeDF.selectExpr("element_at(properties, 'hair')").show > // Exiting paste mode, now interpreting. > ++ > |element_at(properties, hair)| > ++ > | black| > | blond| > |null| > ++ > scala> spark.conf.set("spark.sql.ansi.enabled", true) > scala> mapTypeDF.selectExpr("element_at(properties, 'hair')").show > 22/04/25 18:26:01 ERROR Executor: Exception in task 6.0 in stage 5.0 (TID 23) > org.apache.spark.SparkRuntimeException: The feature is not supported: literal > for 'hair' of class org.apache.spark.unsafe.types.UTF8String. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:240) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:44) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue$(QueryErrorsBase.scala:43) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryExecutionErrors$.toSQLValue(QueryExecutionErrors.scala:69) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > {code} > Seems like it's trying to convert UTF8String to a sql literal -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528751#comment-17528751 ] Willi Raschkowski commented on SPARK-39044: --- This worked on Spark 3.0. [~beliefer], given we're hitting this in {{withBufferSerialized}}, I think this might be related to SPARK-37203. > AggregatingAccumulator with TypedImperativeAggregate throwing > NullPointerException > -- > > Key: SPARK-39044 > URL: https://issues.apache.org/jira/browse/SPARK-39044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Willi Raschkowski >Priority: Major > > We're using a custom TypedImperativeAggregate inside an > AggregatingAccumulator (via {{observe()}} and get the error below. It looks > like we're trying to serialize an aggregation buffer that hasn't been > initialized yet. > {code} > Caused by: org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186) > at > org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:540) > ... > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in > stage 1.0 (TID 32) (10.0.134.136 executor 3): java.io.IOException: > java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) > at > org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) > at > java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) > at > java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) > at > org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) > at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) > at > java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) > at > java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at
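For readers unfamiliar with the accumulator mentioned above: AggregatingAccumulator is what backs Dataset.observe. A minimal sketch of that pattern is shown below with a built-in aggregate for brevity; the report instead plugs in a custom TypedImperativeAggregate, and the output path here is a placeholder:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("observe-sketch").getOrCreate()

// observe() registers an AggregatingAccumulator; its buffer is serialized with every
// task result, which is the point where the NullPointerException above is raised.
val observed = spark.range(100).observe("metrics", count(lit(1)).as("rows"))
observed.write.mode("overwrite").parquet("/tmp/observe-sketch")  // placeholder output path
{code}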
[jira] [Updated] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-39044: -- Description: We're using a custom TypedImperativeAggregate inside an AggregatingAccumulator (via {{observe()}} and get the error below. It looks like we're trying to serialize an aggregation buffer that hasn't been initialized yet. {code} Caused by: org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:496) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:251) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186) at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:540) ... Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 32) (10.0.134.136 executor 3): java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) at java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) at org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) at org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) at org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) at 
scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1428) ... 11 more {code} was: We're using a custom TypedImperativeAggregate inside an AggregatingAccumulator (via {{observe()}} and get the error below. It looks like we're trying to serialize an aggregation buffer that hasn't been initialized yet. {code} Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) at org.apache.spark.scheduler.DirectTaskResult.writeExternal(Ta
[jira] [Updated] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-39044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-39044: -- Description: We're using a custom TypedImperativeAggregate inside an AggregatingAccumulator (via {{observe()}} and get the error below. It looks like we're trying to serialize an aggregation buffer that hasn't been initialized yet. {code} Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) at java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ... 1 more Caused by: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) at org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) at org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) at org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1428) ... 
11 more {code} was: We're using a custom TypedImperativeAggregate inside an AggregatingAccumulator (via {{observe()}} and get this error below: {code} Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) at java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.ru
[jira] [Created] (SPARK-39044) AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException
Willi Raschkowski created SPARK-39044: - Summary: AggregatingAccumulator with TypedImperativeAggregate throwing NullPointerException Key: SPARK-39044 URL: https://issues.apache.org/jira/browse/SPARK-39044 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Willi Raschkowski We're using a custom TypedImperativeAggregate inside an AggregatingAccumulator (via {{observe()}} and get this error below: {code} Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1435) at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) at java.base/java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:633) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ... 1 more Caused by: java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:638) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:599) at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:621) at org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) at org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) at org.apache.spark.util.AccumulatorV2.writeReplace(AccumulatorV2.scala:186) at jdk.internal.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at java.base/java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1235) at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1137) at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1428) ... 
11 more {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39043) Hive client should not gather statistic by default.
[ https://issues.apache.org/jira/browse/SPARK-39043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39043: Assignee: (was: Apache Spark) > Hive client should not gather statistic by default. > --- > > Key: SPARK-39043 > URL: https://issues.apache.org/jira/browse/SPARK-39043 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: angerszhu >Priority: Major > > When use `InsertIntoHiveTable`, when insert overwrite partition, it will call > Hive.loadPartition(), in this method, when `hive.stats.autogather` is > true(default is true) > > {code:java} > // Some comments here > public String getFoo() > if (oldPart == null) { > newTPart.getTPartition().setParameters(new HashMap()); > if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) > { > > StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(), > StatsSetupConst.TRUE); > } > public static void setBasicStatsStateForCreateTable(Map > params, String setting) { > if (TRUE.equals(setting)) { > for (String stat : StatsSetupConst.supportedStats) { > params.put(stat, "0"); > } > } > setBasicStatsState(params, setting); > } > public static final String[] supportedStats = > {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE}; > {code} > Then it set default rowNum as 0, but since spark will update numFiles and > rawSize, so rowNum remain 0. > This impact other system like presto's CBO. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39043) Hive client should not gather statistic by default.
[ https://issues.apache.org/jira/browse/SPARK-39043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39043: Assignee: Apache Spark > Hive client should not gather statistic by default. > --- > > Key: SPARK-39043 > URL: https://issues.apache.org/jira/browse/SPARK-39043 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > When use `InsertIntoHiveTable`, when insert overwrite partition, it will call > Hive.loadPartition(), in this method, when `hive.stats.autogather` is > true(default is true) > > {code:java} > // Some comments here > public String getFoo() > if (oldPart == null) { > newTPart.getTPartition().setParameters(new HashMap()); > if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) > { > > StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(), > StatsSetupConst.TRUE); > } > public static void setBasicStatsStateForCreateTable(Map > params, String setting) { > if (TRUE.equals(setting)) { > for (String stat : StatsSetupConst.supportedStats) { > params.put(stat, "0"); > } > } > setBasicStatsState(params, setting); > } > public static final String[] supportedStats = > {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE}; > {code} > Then it set default rowNum as 0, but since spark will update numFiles and > rawSize, so rowNum remain 0. > This impact other system like presto's CBO. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39043) Hive client should not gather statistic by default.
[ https://issues.apache.org/jira/browse/SPARK-39043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528734#comment-17528734 ] Apache Spark commented on SPARK-39043: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/36377 > Hive client should not gather statistic by default. > --- > > Key: SPARK-39043 > URL: https://issues.apache.org/jira/browse/SPARK-39043 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: angerszhu >Priority: Major > > When use `InsertIntoHiveTable`, when insert overwrite partition, it will call > Hive.loadPartition(), in this method, when `hive.stats.autogather` is > true(default is true) > > {code:java} > // Some comments here > public String getFoo() > if (oldPart == null) { > newTPart.getTPartition().setParameters(new HashMap()); > if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) > { > > StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(), > StatsSetupConst.TRUE); > } > public static void setBasicStatsStateForCreateTable(Map > params, String setting) { > if (TRUE.equals(setting)) { > for (String stat : StatsSetupConst.supportedStats) { > params.put(stat, "0"); > } > } > setBasicStatsState(params, setting); > } > public static final String[] supportedStats = > {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE}; > {code} > Then it set default rowNum as 0, but since spark will update numFiles and > rawSize, so rowNum remain 0. > This impact other system like presto's CBO. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination
[ https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528725#comment-17528725 ] Apache Spark commented on SPARK-39040: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/36376 > Respect NaNvl in EquivalentExpressions for expression elimination > - > > Key: SPARK-39040 > URL: https://issues.apache.org/jira/browse/SPARK-39040 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > For example the query will fail: > {code:java} > set spark.sql.ansi.enabled=true; > set > spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding; > SELECT nanvl(1, 1/0 + 1/0); {code} > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 > (TID 4) (10.221.98.68 executor driver): > org.apache.spark.SparkArithmeticException: divide by zero. To return NULL > instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false > (except for ANSI interval type) to bypass this error. > == SQL(line 1, position 17) == > select nanvl(1 , 1/0 + 1/0) > ^^^ at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151) > {code} > We should respect the ordering of conditional expression that always evaluate > the predicate branch first, so the query above should not fail. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
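To make the intended behaviour concrete: common-subexpression elimination normally recurses only into the children of a conditional expression that are always evaluated (for example the predicate of If). The free-standing sketch below extends that idea to NaNvl; the helper name is assumed for illustration and this is not the actual patch:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{CaseWhen, Expression, If, NaNvl}

// Illustrative sketch, not the merged change: only unconditionally-evaluated children
// are considered for elimination, so the lazily evaluated right branch of NaNvl
// (e.g. 1/0 + 1/0 in the query above) is never hoisted out and pre-evaluated.
def alwaysEvaluatedChildren(expr: Expression): Seq[Expression] = expr match {
  case i: If       => i.predicate :: Nil        // the condition is always evaluated
  case c: CaseWhen => c.children.head :: Nil    // only the first condition is unconditional
  case n: NaNvl    => n.left :: Nil             // right is evaluated only when left is NaN
  case other       => other.children
}
{code}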
[jira] [Assigned] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination
[ https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39040: Assignee: (was: Apache Spark) > Respect NaNvl in EquivalentExpressions for expression elimination > - > > Key: SPARK-39040 > URL: https://issues.apache.org/jira/browse/SPARK-39040 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > For example the query will fail: > {code:java} > set spark.sql.ansi.enabled=true; > set > spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding; > SELECT nanvl(1, 1/0 + 1/0); {code} > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 > (TID 4) (10.221.98.68 executor driver): > org.apache.spark.SparkArithmeticException: divide by zero. To return NULL > instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false > (except for ANSI interval type) to bypass this error. > == SQL(line 1, position 17) == > select nanvl(1 , 1/0 + 1/0) > ^^^ at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151) > {code} > We should respect the ordering of conditional expression that always evaluate > the predicate branch first, so the query above should not fail. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination
[ https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528723#comment-17528723 ] Apache Spark commented on SPARK-39040: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/36376 > Respect NaNvl in EquivalentExpressions for expression elimination > - > > Key: SPARK-39040 > URL: https://issues.apache.org/jira/browse/SPARK-39040 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > For example the query will fail: > {code:java} > set spark.sql.ansi.enabled=true; > set > spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding; > SELECT nanvl(1, 1/0 + 1/0); {code} > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 > (TID 4) (10.221.98.68 executor driver): > org.apache.spark.SparkArithmeticException: divide by zero. To return NULL > instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false > (except for ANSI interval type) to bypass this error. > == SQL(line 1, position 17) == > select nanvl(1 , 1/0 + 1/0) > ^^^ at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151) > {code} > We should respect the ordering of conditional expression that always evaluate > the predicate branch first, so the query above should not fail. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination
[ https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39040: Assignee: Apache Spark > Respect NaNvl in EquivalentExpressions for expression elimination > - > > Key: SPARK-39040 > URL: https://issues.apache.org/jira/browse/SPARK-39040 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > For example the query will fail: > {code:java} > set spark.sql.ansi.enabled=true; > set > spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding; > SELECT nanvl(1, 1/0 + 1/0); {code} > {code:java} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 > (TID 4) (10.221.98.68 executor driver): > org.apache.spark.SparkArithmeticException: divide by zero. To return NULL > instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false > (except for ANSI interval type) to bypass this error. > == SQL(line 1, position 17) == > select nanvl(1 , 1/0 + 1/0) > ^^^ at > org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151) > {code} > We should respect the ordering of conditional expression that always evaluate > the predicate branch first, so the query above should not fail. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39043) Hive client should not gather statistic by default.
angerszhu created SPARK-39043: - Summary: Hive client should not gather statistic by default. Key: SPARK-39043 URL: https://issues.apache.org/jira/browse/SPARK-39043 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0, 3.1.2, 3.3.0 Reporter: angerszhu When `InsertIntoHiveTable` performs an insert overwrite of a partition, it calls Hive.loadPartition(); in this method, when `hive.stats.autogather` is true (the default), the following Hive code runs: {code:java} if (oldPart == null) { newTPart.getTPartition().setParameters(new HashMap()); if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) { StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(), StatsSetupConst.TRUE); } public static void setBasicStatsStateForCreateTable(Map<String, String> params, String setting) { if (TRUE.equals(setting)) { for (String stat : StatsSetupConst.supportedStats) { params.put(stat, "0"); } } setBasicStatsState(params, setting); } public static final String[] supportedStats = {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE}; {code} Hive thus seeds the row count statistic with 0; Spark later updates only numFiles and rawSize, so the row count stays 0. This misleads other systems, such as Presto's CBO. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
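If the stale row count is causing problems before a fix lands, one possible mitigation is to disable autogather explicitly. This is an assumption, not part of the report: it relies on the embedded Hive client honouring Hadoop configuration passed through spark.hadoop.*, so verify it against your deployment:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hedged mitigation sketch: pass hive.stats.autogather=false to the Hive client so
// loadPartition() does not seed ROW_COUNT with 0 on insert overwrite.
val spark = SparkSession.builder()
  .appName("insert-overwrite-without-autogather")
  .config("spark.hadoop.hive.stats.autogather", "false")
  .enableHiveSupport()
  .getOrCreate()
{code}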
[jira] [Commented] (SPARK-39015) SparkRuntimeException when trying to get non-existent key in a map
[ https://issues.apache.org/jira/browse/SPARK-39015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528717#comment-17528717 ] Apache Spark commented on SPARK-39015: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36375 > SparkRuntimeException when trying to get non-existent key in a map > -- > > Key: SPARK-39015 > URL: https://issues.apache.org/jira/browse/SPARK-39015 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Raza Jafri >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > [~maxgekk] submitted a > [commit|https://github.com/apache/spark/commit/bc8c264851457d8ef59f5b332c79296651ec5d1e] > that tries to convert the key to SQL but that part of the code is blowing > up. > {code:java} > scala> :pa > // Entering paste mode (ctrl-D to finish) > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.StructType > import org.apache.spark.sql.types.StringType > import org.apache.spark.sql.types.DataTypes > val arrayStructureData = Seq( > Row(Map("hair"->"black", "eye"->"brown")), > Row(Map("hair"->"blond", "eye"->"blue")), > Row(Map())) > val mapType = DataTypes.createMapType(StringType,StringType) > val arrayStructureSchema = new StructType() > .add("properties", mapType) > val mapTypeDF = spark.createDataFrame( > spark.sparkContext.parallelize(arrayStructureData),arrayStructureSchema) > mapTypeDF.selectExpr("element_at(properties, 'hair')").show > // Exiting paste mode, now interpreting. > ++ > |element_at(properties, hair)| > ++ > | black| > | blond| > |null| > ++ > scala> spark.conf.set("spark.sql.ansi.enabled", true) > scala> mapTypeDF.selectExpr("element_at(properties, 'hair')").show > 22/04/25 18:26:01 ERROR Executor: Exception in task 6.0 in stage 5.0 (TID 23) > org.apache.spark.SparkRuntimeException: The feature is not supported: literal > for 'hair' of class org.apache.spark.unsafe.types.UTF8String. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:240) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:44) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue$(QueryErrorsBase.scala:43) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryExecutionErrors$.toSQLValue(QueryExecutionErrors.scala:69) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > {code} > Seems like it's trying to convert UTF8String to a sql literal -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39015) SparkRuntimeException when trying to get non-existent key in a map
[ https://issues.apache.org/jira/browse/SPARK-39015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528716#comment-17528716 ] Apache Spark commented on SPARK-39015: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36375 > SparkRuntimeException when trying to get non-existent key in a map > -- > > Key: SPARK-39015 > URL: https://issues.apache.org/jira/browse/SPARK-39015 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Raza Jafri >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > [~maxgekk] submitted a > [commit|https://github.com/apache/spark/commit/bc8c264851457d8ef59f5b332c79296651ec5d1e] > that tries to convert the key to SQL but that part of the code is blowing > up. > {code:java} > scala> :pa > // Entering paste mode (ctrl-D to finish) > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.StructType > import org.apache.spark.sql.types.StringType > import org.apache.spark.sql.types.DataTypes > val arrayStructureData = Seq( > Row(Map("hair"->"black", "eye"->"brown")), > Row(Map("hair"->"blond", "eye"->"blue")), > Row(Map())) > val mapType = DataTypes.createMapType(StringType,StringType) > val arrayStructureSchema = new StructType() > .add("properties", mapType) > val mapTypeDF = spark.createDataFrame( > spark.sparkContext.parallelize(arrayStructureData),arrayStructureSchema) > mapTypeDF.selectExpr("element_at(properties, 'hair')").show > // Exiting paste mode, now interpreting. > ++ > |element_at(properties, hair)| > ++ > | black| > | blond| > |null| > ++ > scala> spark.conf.set("spark.sql.ansi.enabled", true) > scala> mapTypeDF.selectExpr("element_at(properties, 'hair')").show > 22/04/25 18:26:01 ERROR Executor: Exception in task 6.0 in stage 5.0 (TID 23) > org.apache.spark.SparkRuntimeException: The feature is not supported: literal > for 'hair' of class org.apache.spark.unsafe.types.UTF8String. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:240) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:44) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue$(QueryErrorsBase.scala:43) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.sql.errors.QueryExecutionErrors$.toSQLValue(QueryExecutionErrors.scala:69) > ~[spark-catalyst_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > {code} > Seems like it's trying to convert UTF8String to a sql literal -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39006) Show a directional error message for PVC Dynamic Allocation Failure
[ https://issues.apache.org/jira/browse/SPARK-39006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528683#comment-17528683 ] Apache Spark commented on SPARK-39006: -- User 'dcoliversun' has created a pull request for this issue: https://github.com/apache/spark/pull/36374 > Show a directional error message for PVC Dynamic Allocation Failure > --- > > Key: SPARK-39006 > URL: https://issues.apache.org/jira/browse/SPARK-39006 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: qian >Priority: Major > > When spark application requires multiple executors and not set pvc claimName > with onDemand or SPARK_EXECUTOR_ID, it always create executor pods. Because > pvc has be created by first executor pod. > {noformat} > 22/04/22 08:55:47 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: > https://kubernetes.default.svc/api/v1/namespaces/default/persistentvolumeclaims. > Message: persistentvolumeclaims "test-1" already exists. Received status: > Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, > kind=persistentvolumeclaims, name=test-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=persistentvolumeclaims > "test-1" already exists, metadata=ListMeta(_continue=null, > remainingItemCount=null, resourceVersion=null, selfLink=null, > additionalProperties={}), reason=AlreadyExists, status=Failure, > additionalProperties={}). > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:697) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:676) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:629) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:566) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:527) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:315) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:651) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:91) > ~[kubernetes-client-5.10.1.jar:?] > at > io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61) > ~[kubernetes-client-5.10.1.jar:?] > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$3(ExecutorPodsAllocator.scala:415) > ~[spark-kubernetes_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at scala.collection.immutable.List.foreach(List.scala:431) > ~[scala-library-2.12.15.jar:?] > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:408) > ~[spark-kubernetes_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > ~[scala-library-2.12.15.jar:?] 
> at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:385) > ~[spark-kubernetes_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$35(ExecutorPodsAllocator.scala:349) > ~[spark-kubernetes_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$35$adapted(ExecutorPodsAllocator.scala:342) > ~[spark-kubernetes_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > ~[scala-library-2.12.15.jar:?] > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > ~[scala-library-2.12.15.jar:?] > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > ~[scala-library-2.12.15.jar:?] > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:342) > ~[spark-kubernetes_2.12-3.3.0-SNAPSHOT.jar:3.3.0-SNAPSHOT] > at > org.apache.spark.scheduler.cluster.k8s.Exec
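For reference, the dynamically allocated PVC path referred to above is driven by the executor volume options. A configuration sketch follows; the option names are as documented for Spark on Kubernetes, while the volume name "data", storage class, size and mount path are placeholders, and the usual Kubernetes master/image settings are omitted:

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch of per-executor on-demand PVCs: with claimName=OnDemand each executor gets its
// own claim, instead of every pod trying to create the same statically named PVC.
val spark = SparkSession.builder()
  .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName", "OnDemand")
  .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass", "standard")
  .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit", "10Gi")
  .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path", "/data")
  .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly", "false")
  .getOrCreate()
{code}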
[jira] [Commented] (SPARK-39041) Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly
[ https://issues.apache.org/jira/browse/SPARK-39041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528678#comment-17528678 ] Apache Spark commented on SPARK-39041: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36373 > Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly > - > > Key: SPARK-39041 > URL: https://issues.apache.org/jira/browse/SPARK-39041 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39041) Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly
[ https://issues.apache.org/jira/browse/SPARK-39041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528679#comment-17528679 ] Apache Spark commented on SPARK-39041: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36373 > Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly > - > > Key: SPARK-39041 > URL: https://issues.apache.org/jira/browse/SPARK-39041 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39041) Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly
[ https://issues.apache.org/jira/browse/SPARK-39041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39041: Assignee: (was: Apache Spark) > Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly > - > > Key: SPARK-39041 > URL: https://issues.apache.org/jira/browse/SPARK-39041 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39041) Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly
[ https://issues.apache.org/jira/browse/SPARK-39041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39041: Assignee: Apache Spark > Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly > - > > Key: SPARK-39041 > URL: https://issues.apache.org/jira/browse/SPARK-39041 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org