[jira] [Assigned] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions
[ https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40225: Assignee: (was: Apache Spark) > PySpark rdd.takeOrdered should check num and numPartitions > -- > > Key: SPARK-40225 > URL: https://issues.apache.org/jira/browse/SPARK-40225 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions
[ https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585148#comment-17585148 ] Apache Spark commented on SPARK-40225: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/37669 > PySpark rdd.takeOrdered should check num and numPartitions > -- > > Key: SPARK-40225 > URL: https://issues.apache.org/jira/browse/SPARK-40225 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions
[ https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585146#comment-17585146 ] Apache Spark commented on SPARK-40225: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/37669 > PySpark rdd.takeOrdered should check num and numPartitions > -- > > Key: SPARK-40225 > URL: https://issues.apache.org/jira/browse/SPARK-40225 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions
[ https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40225: Assignee: Apache Spark > PySpark rdd.takeOrdered should check num and numPartitions > -- > > Key: SPARK-40225 > URL: https://issues.apache.org/jira/browse/SPARK-40225 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40206) Spark SQL Predict Pushdown for Hive Bucketed Table
[ https://issues.apache.org/jira/browse/SPARK-40206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40206. -- Resolution: Invalid > Spark SQL Predict Pushdown for Hive Bucketed Table > -- > > Key: SPARK-40206 > URL: https://issues.apache.org/jira/browse/SPARK-40206 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Raymond Tang >Priority: Minor > Labels: hive, hive-buckets, spark, spark-sql > > Hi team, > I was testing out Hive bucket table features. One of the benefits, as most > documentation suggests, is that a bucketed Hive table can be used for query > filter/predicate pushdown to improve query performance. > However, through my exploration, that doesn't seem to be true. *Can you please > help to clarify if Spark SQL supports query optimizations when using a Hive > bucketed table?* > > How to reproduce the issue: > Create a Hive 3 table using the following DDL: > {code:java} > create table test_db.bucket_table(user_id int, key string) > comment 'A bucketed table' > partitioned by(country string) > clustered by(user_id) sorted by (key) into 10 buckets > stored as ORC;{code} > And then insert into this table using the following PySpark script: > {code:java} > from pyspark.sql import SparkSession > appName = "PySpark Hive Bucketing Example" > master = "local" > # Create Spark session with Hive supported. > spark = SparkSession.builder \ > .appName(appName) \ > .master(master) \ > .enableHiveSupport() \ > .getOrCreate() > # prepare sample data for inserting into hive table > data = [] > countries = ['CN', 'AU'] > for i in range(0, 1000): > data.append([int(i), 'U'+str(i), countries[i % 2]]) > df = spark.createDataFrame(data, ['user_id', 'key', 'country']) > df.show() > # Save df to Hive table test_db.bucket_table > df.write.mode('append').insertInto('test_db.bucket_table') {code} > Then query the table using the following script: > {code:java} > from pyspark.sql import SparkSession > appName = "PySpark Hive Bucketing Example" > master = "local" > # Create Spark session with Hive supported. > spark = SparkSession.builder \ > .appName(appName) \ > .master(master) \ > .enableHiveSupport() \ > .getOrCreate() > df = spark.sql("""select * from test_db.bucket_table > where country='AU' and user_id=101 > """) > df.show() > df.explain(extended=True) {code} > I am expecting to read from only one bucket file in HDFS, but instead Spark > scanned all bucket files in the partition folder country=AU. 
> {code:java} > == Parsed Logical Plan == > 'Project [*] > - 'Filter (('country = AU) AND ('t1.user_id = 101)) > - 'SubqueryAlias t1 >- 'UnresolvedRelation [test_db, bucket_table], [], false > == Analyzed Logical Plan == > user_id: int, key: string, country: string > Project [user_id#20, key#21, country#22] > - Filter ((country#22 = AU) AND (user_id#20 = 101)) > - SubqueryAlias t1 >- SubqueryAlias spark_catalog.test_db.bucket_table > - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc > == Optimized Logical Plan == > Filter (((isnotnull(country#22) AND isnotnull(user_id#20)) AND (country#22 = > AU)) AND (user_id#20 = 101)) > - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc > == Physical Plan == > *(1) Filter (isnotnull(user_id#20) AND (user_id#20 = 101)) > - *(1) ColumnarToRow > - FileScan orc test_db.bucket_table[user_id#20,key#21,country#22] > Batched: true, DataFilters: [isnotnull(user_id#20), (user_id#20 = 101)], > Format: ORC, Location: InMemoryFileIndex(1 > paths)[hdfs://localhost:9000/user/hive/warehouse/test_db.db/bucket_table/coun..., > PartitionFilters: [isnotnull(country#22), (country#22 = AU)], PushedFilters: > [IsNotNull(user_id), EqualTo(user_id,101)], ReadSchema: > struct {code} > *Am I doing something wrong? or is it because Spark doesn't support it? Your > guidance and help will be appreciated.* > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
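For readers with the same question: as far as we know, Spark's scan-side bucket optimizations target tables bucketed by Spark's own writer (DataFrameWriter.bucketBy) rather than Hive-written bucketed tables, which would explain the full scan above. A hedged sketch of the Spark-native equivalent (the table name is illustrative):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.createDataFrame([(101, "U101", "AU")], ["user_id", "key", "country"])

# Write with Spark's own bucketing metadata; Spark can then use the bucket
# spec for equality filters on user_id (exact behavior depends on the version).
(df.write
    .mode("overwrite")
    .partitionBy("country")
    .bucketBy(10, "user_id")
    .sortBy("key")
    .saveAsTable("test_db.spark_bucket_table"))
{code}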
[jira] [Updated] (SPARK-40204) Whether it is possible to support querying the status of a specific application in a subsequent version
[ https://issues.apache.org/jira/browse/SPARK-40204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40204: - Flags: (was: Important) > Whether it is possible to support querying the status of a specific > application in a subsequent version > --- > > Key: SPARK-40204 > URL: https://issues.apache.org/jira/browse/SPARK-40204 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.4, 2.4.6 > Environment: Standalone Cluster Mode >Reporter: bitao >Priority: Major > Labels: features > > The current SparkAppHandler does not support obtaining the application status > in Standalone Cluster mode. One way is to query the status of a specified > Driver through the StandaloneRestServer, but it cannot query the status of a > specified application. Is it possible to add a method (e.g. > handleAppStatus) to the StandaloneRestServer that asks the Master to send the > RequestMasterState message and return the state of the specified application? The > current MasterWebUI already does this, but the premise is that it needs to use > the same RpcEnv as the Master Endpoint. Often we care about the status > of the application rather than the status of the Driver, so we hope a > subsequent version adds this function to support obtaining the status of a > specified application in Standalone cluster mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
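Until such an API exists, one possible workaround (a sketch under the assumption that the standalone Master web UI is reachable on its default port) is to poll the Master UI's JSON endpoint, which reports per-application state:

{code:python}
import json
from urllib.request import urlopen

# Default standalone Master web UI address; adjust host/port for your cluster.
state = json.load(urlopen("http://localhost:8080/json"))
for app in state.get("activeapps", []) + state.get("completedapps", []):
    print(app.get("id"), app.get("name"), app.get("state"))
{code}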
[jira] [Commented] (SPARK-40206) Spark SQL Predict Pushdown for Hive Bucketed Table
[ https://issues.apache.org/jira/browse/SPARK-40206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585135#comment-17585135 ] Hyukjin Kwon commented on SPARK-40206: -- [~raymond.tang] Let's ask questions like this on the dev mailing list before filing a JIRA here. We encourage asking questions in other channels. > Spark SQL Predict Pushdown for Hive Bucketed Table > -- > > Key: SPARK-40206 > URL: https://issues.apache.org/jira/browse/SPARK-40206 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Raymond Tang >Priority: Minor > Labels: hive, hive-buckets, spark, spark-sql > > Hi team, > I was testing out Hive bucket table features. One of the benefits, as most > documentation suggests, is that a bucketed Hive table can be used for query > filter/predicate pushdown to improve query performance. > However, through my exploration, that doesn't seem to be true. *Can you please > help to clarify if Spark SQL supports query optimizations when using a Hive > bucketed table?* > > How to reproduce the issue: > Create a Hive 3 table using the following DDL: > {code:java} > create table test_db.bucket_table(user_id int, key string) > comment 'A bucketed table' > partitioned by(country string) > clustered by(user_id) sorted by (key) into 10 buckets > stored as ORC;{code} > And then insert into this table using the following PySpark script: > {code:java} > from pyspark.sql import SparkSession > appName = "PySpark Hive Bucketing Example" > master = "local" > # Create Spark session with Hive supported. > spark = SparkSession.builder \ > .appName(appName) \ > .master(master) \ > .enableHiveSupport() \ > .getOrCreate() > # prepare sample data for inserting into hive table > data = [] > countries = ['CN', 'AU'] > for i in range(0, 1000): > data.append([int(i), 'U'+str(i), countries[i % 2]]) > df = spark.createDataFrame(data, ['user_id', 'key', 'country']) > df.show() > # Save df to Hive table test_db.bucket_table > df.write.mode('append').insertInto('test_db.bucket_table') {code} > Then query the table using the following script: > {code:java} > from pyspark.sql import SparkSession > appName = "PySpark Hive Bucketing Example" > master = "local" > # Create Spark session with Hive supported. > spark = SparkSession.builder \ > .appName(appName) \ > .master(master) \ > .enableHiveSupport() \ > .getOrCreate() > df = spark.sql("""select * from test_db.bucket_table > where country='AU' and user_id=101 > """) > df.show() > df.explain(extended=True) {code} > I am expecting to read from only one bucket file in HDFS, but instead Spark > scanned all bucket files in the partition folder country=AU. 
> {code:java} > == Parsed Logical Plan == > 'Project [*] > - 'Filter (('country = AU) AND ('t1.user_id = 101)) > - 'SubqueryAlias t1 >- 'UnresolvedRelation [test_db, bucket_table], [], false > == Analyzed Logical Plan == > user_id: int, key: string, country: string > Project [user_id#20, key#21, country#22] > - Filter ((country#22 = AU) AND (user_id#20 = 101)) > - SubqueryAlias t1 >- SubqueryAlias spark_catalog.test_db.bucket_table > - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc > == Optimized Logical Plan == > Filter (((isnotnull(country#22) AND isnotnull(user_id#20)) AND (country#22 = > AU)) AND (user_id#20 = 101)) > - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc > == Physical Plan == > *(1) Filter (isnotnull(user_id#20) AND (user_id#20 = 101)) > - *(1) ColumnarToRow > - FileScan orc test_db.bucket_table[user_id#20,key#21,country#22] > Batched: true, DataFilters: [isnotnull(user_id#20), (user_id#20 = 101)], > Format: ORC, Location: InMemoryFileIndex(1 > paths)[hdfs://localhost:9000/user/hive/warehouse/test_db.db/bucket_table/coun..., > PartitionFilters: [isnotnull(country#22), (country#22 = AU)], PushedFilters: > [IsNotNull(user_id), EqualTo(user_id,101)], ReadSchema: > struct {code} > *Am I doing something wrong? or is it because Spark doesn't support it? Your > guidance and help will be appreciated.* > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions
Ruifeng Zheng created SPARK-40225: - Summary: PySpark rdd.takeOrdered should check num and numPartitions Key: SPARK-40225 URL: https://issues.apache.org/jira/browse/SPARK-40225 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
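To make the proposed check concrete, here is a minimal pure-Python sketch of takeOrdered with up-front validation of num and an early return when there are no partitions; names and structure are illustrative, not PySpark's actual implementation:

{code:python}
from heapq import nsmallest

def take_ordered(partitions, num, key=None):
    # Validate num before launching any work, as the ticket proposes.
    if num < 0:
        raise ValueError(f"num must be non-negative, got {num}")
    # Nothing to take, or no partitions: return early instead of running a job.
    if num == 0 or len(partitions) == 0:
        return []
    # Take the top num per partition, then merge the surviving candidates.
    candidates = [x for part in partitions for x in nsmallest(num, part, key=key)]
    return nsmallest(num, candidates, key=key)

print(take_ordered([[5, 1, 3], [4, 2]], 3))  # [1, 2, 3]
{code}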
[jira] [Updated] (SPARK-39656) Fix wrong namespace in DescribeNamespaceExec
[ https://issues.apache.org/jira/browse/SPARK-39656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-39656: -- Fix Version/s: 3.1.4 3.4.0 3.3.1 3.2.3 > Fix wrong namespace in DescribeNamespaceExec > > > Key: SPARK-39656 > URL: https://issues.apache.org/jira/browse/SPARK-39656 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Minor > Fix For: 3.1.4, 3.4.0, 3.3.1, 3.2.3 > > > DescribeNamespaceExec should show the whole namespace rather than only its last name part -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
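To make the one-line description concrete: for a nested namespace in a v2 catalog, DESCRIBE NAMESPACE reportedly printed only the last name part before this fix. A hedged illustration (the catalog and namespace names are made up, and a v2 catalog supporting nested namespaces is assumed):

{code:python}
# Assumes `testcat` is a configured v2 catalog that supports nested namespaces.
spark.sql("CREATE NAMESPACE IF NOT EXISTS testcat.ns1.ns2")
spark.sql("DESCRIBE NAMESPACE testcat.ns1.ns2").show(truncate=False)
# Before the fix, the namespace row reportedly showed only `ns2`;
# after the fix, it shows the whole namespace, `ns1.ns2`.
{code}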
[jira] [Commented] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based
[ https://issues.apache.org/jira/browse/SPARK-40224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585109#comment-17585109 ] Apache Spark commented on SPARK-40224: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/37668 > Make ObjectHashAggregateExec release memory eagerly when fallback to > sort-based > --- > > Key: SPARK-40224 > URL: https://issues.apache.org/jira/browse/SPARK-40224 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > Avoid OOM issue as far as possible: > {code:java} > ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches > threshold capacity (128 entries), spilling and falling back to sort based > aggregation. You may change the threshold by adjust option > spark.sql.objectHashAggregate.sortBased.fallbackThreshold > # > # java.lang.OutOfMemoryError: Java heap space > # -XX:OnOutOfMemoryError="kill %p" > # Executing /bin/sh -c "kill 46725"...{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based
[ https://issues.apache.org/jira/browse/SPARK-40224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40224: Assignee: (was: Apache Spark) > Make ObjectHashAggregateExec release memory eagerly when fallback to > sort-based > --- > > Key: SPARK-40224 > URL: https://issues.apache.org/jira/browse/SPARK-40224 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > Avoid OOM issue as far as possible: > {code:java} > ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches > threshold capacity (128 entries), spilling and falling back to sort based > aggregation. You may change the threshold by adjust option > spark.sql.objectHashAggregate.sortBased.fallbackThreshold > # > # java.lang.OutOfMemoryError: Java heap space > # -XX:OnOutOfMemoryError="kill %p" > # Executing /bin/sh -c "kill 46725"...{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based
[ https://issues.apache.org/jira/browse/SPARK-40224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40224: Assignee: Apache Spark > Make ObjectHashAggregateExec release memory eagerly when fallback to > sort-based > --- > > Key: SPARK-40224 > URL: https://issues.apache.org/jira/browse/SPARK-40224 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > Avoid OOM issue as far as possible: > {code:java} > ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches > threshold capacity (128 entries), spilling and falling back to sort based > aggregation. You may change the threshold by adjust option > spark.sql.objectHashAggregate.sortBased.fallbackThreshold > # > # java.lang.OutOfMemoryError: Java heap space > # -XX:OnOutOfMemoryError="kill %p" > # Executing /bin/sh -c "kill 46725"...{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC
[ https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585103#comment-17585103 ] gglinux edited comment on SPARK-26568 at 8/26/22 2:46 AM: -- We encountered the same problem, and it can be reproduced. [~srowen] [~cane] When there are 200 fields and 18 partitions, this problem can be triggered. Please refer to the following comments for the specific logs was (Author: JIRAUSER294757): We encountered the same problem, and it can be reproduced. When there are 200 fields and 18 partitions, this problem can be triggered. [^error.log] > Too many partitions may cause thriftServer frequently Full GC > - > > Key: SPARK-26568 > URL: https://issues.apache.org/jira/browse/SPARK-26568 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: zhoukang >Priority: Major > Attachments: error.log > > > The reason is that: > first, we have a table with many partitions (maybe several hundred); second, we > have some concurrent queries. Then the long-running thriftServer may encounter an > OOM issue. > Here is a case: > call stack of OOM thread: > {code:java} > pool-34-thread-10 > at > org.apache.hadoop.hive.metastore.api.StorageDescriptor.(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V > (StorageDescriptor.java:240) > at > org.apache.hadoop.hive.metastore.api.Partition.(Lorg/apache/hadoop/hive/metastore/api/Partition;)V > (Partition.java:216) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition; > (HiveMetaStoreClient.java:1343) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List; > (HiveMetaStoreClient.java:1409) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List; > (HiveMetaStoreClient.java:1397) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; > (HiveMetaStoreClient.java:914) > at > sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (DelegatingMethodAccessorImpl.java:43) > at > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; > (RetryingMetaStoreClient.java:90) > at > com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; > (Unknown Source) > at > org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List; > (Hive.java:1967) > at > sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (DelegatingMethodAccessorImpl.java:43) > at > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Method.java:606) > at > 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq; > (HiveShim.scala:602) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq; > (HiveClientImpl.scala:608) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object; > (HiveClientImpl.scala:606) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object; > (HiveClientImpl.scala:321) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V > (HiveClientImpl.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object; > (HiveClientImpl.scala:263) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object; > (HiveClientImpl.scala:307) > at >
[jira] [Created] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based
XiDuo You created SPARK-40224: - Summary: Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based Key: SPARK-40224 URL: https://issues.apache.org/jira/browse/SPARK-40224 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: XiDuo You Avoid OOM issue as far as possible: {code:java} ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches threshold capacity (128 entries), spilling and falling back to sort based aggregation. You may change the threshold by adjust option spark.sql.objectHashAggregate.sortBased.fallbackThreshold # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError="kill %p" # Executing /bin/sh -c "kill 46725"...{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
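For completeness, the config key named in the quoted log is the knob that controls when the hash map falls back to sort-based aggregation. A hedged snippet for raising it (the value is illustrative; 128 is the default the log mentions, and raising it trades later spilling against the heap pressure described above):

{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # Key quoted from the log above; value chosen only for illustration.
         .config("spark.sql.objectHashAggregate.sortBased.fallbackThreshold", "1024")
         .getOrCreate())
{code}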
[jira] [Comment Edited] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC
[ https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585104#comment-17585104 ] gglinux edited comment on SPARK-26568 at 8/26/22 2:46 AM: -- {code:java} at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.SortExec.doExecute(SortExec.scala:101) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.window.WindowExec.doExecute(WindowExec.scala:302) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.FilterExec.doExecute(basicPhysicalOperators.scala:213) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571) at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571) at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:296) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:571) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at
[jira] [Assigned] (SPARK-40153) Unify the logic of resolve functions and table-valued functions
[ https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40153: --- Assignee: Allison Wang > Unify the logic of resolve functions and table-valued functions > --- > > Key: SPARK-40153 > URL: https://issues.apache.org/jira/browse/SPARK-40153 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Make ResolveTableValuedFunctions similar to ResolveFunctions: first try > resolving the function as a built-in or temp function, then expand the > identifier and resolve it as a persistent function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40153) Unify the logic of resolve functions and table-valued functions
[ https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40153. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37586 [https://github.com/apache/spark/pull/37586] > Unify the logic of resolve functions and table-valued functions > --- > > Key: SPARK-40153 > URL: https://issues.apache.org/jira/browse/SPARK-40153 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.4.0 > > > Make ResolveTableValuedFunctions similar to ResolveFunctions: first try > resolving the function as a built-in or temp function, then expand the > identifier and resolve it as a persistent function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
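A hedged pseudocode sketch of the resolution order described in this ticket; the registries and the three-part qualification below are stand-ins, not Spark's actual analyzer code:

{code:python}
BUILTINS = {"upper"}         # stand-in for the built-in function registry
TEMP_FUNCS = {"my_temp_fn"}  # stand-in for the temp function registry
PERSISTENT = {("spark_catalog", "default", "my_udf"): "<persistent function>"}

def resolve_function(name_parts, current=("spark_catalog", "default")):
    # Step 1: try a single-part name as a built-in or temporary function first.
    if len(name_parts) == 1 and name_parts[0] in BUILTINS | TEMP_FUNCS:
        return f"<builtin-or-temp {name_parts[0]}>"
    # Step 2: expand the identifier with the current catalog/namespace, then
    # resolve it as a persistent function.
    qualified = current[:3 - len(name_parts)] + tuple(name_parts)
    return PERSISTENT.get(qualified)

print(resolve_function(["upper"]))   # resolved as a built-in
print(resolve_function(["my_udf"]))  # expanded to spark_catalog.default.my_udf
{code}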
[jira] [Commented] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC
[ https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585104#comment-17585104 ] gglinux commented on SPARK-26568: - {code:java} at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.SortExec.doExecute(SortExec.scala:101) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.window.WindowExec.doExecute(WindowExec.scala:302) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.FilterExec.doExecute(basicPhysicalOperators.scala:213) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:70) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571) at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571) at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:296) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:571) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at
[jira] [Commented] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC
[ https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585103#comment-17585103 ] gglinux commented on SPARK-26568: - We encountered the same problem, and it can be reproduced. When there are 200 fields and 18 partitions, this problem can be triggered. [^error.log] > Too many partitions may cause thriftServer frequently Full GC > - > > Key: SPARK-26568 > URL: https://issues.apache.org/jira/browse/SPARK-26568 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: zhoukang >Priority: Major > Attachments: error.log > > > The reason is that: > first, we have a table with many partitions (maybe several hundred); second, we > have some concurrent queries. Then the long-running thriftServer may encounter an > OOM issue. > Here is a case: > call stack of OOM thread: > {code:java} > pool-34-thread-10 > at > org.apache.hadoop.hive.metastore.api.StorageDescriptor.(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V > (StorageDescriptor.java:240) > at > org.apache.hadoop.hive.metastore.api.Partition.(Lorg/apache/hadoop/hive/metastore/api/Partition;)V > (Partition.java:216) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition; > (HiveMetaStoreClient.java:1343) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List; > (HiveMetaStoreClient.java:1409) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List; > (HiveMetaStoreClient.java:1397) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; > (HiveMetaStoreClient.java:914) > at > sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (DelegatingMethodAccessorImpl.java:43) > at > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; > (RetryingMetaStoreClient.java:90) > at > com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; > (Unknown Source) > at > org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List; > (Hive.java:1967) > at > sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (DelegatingMethodAccessorImpl.java:43) > at > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Method.java:606) > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq; > (HiveShim.scala:602) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq; > 
(HiveClientImpl.scala:608) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object; > (HiveClientImpl.scala:606) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object; > (HiveClientImpl.scala:321) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V > (HiveClientImpl.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object; > (HiveClientImpl.scala:263) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object; > (HiveClientImpl.scala:307) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;Lscala/collection/Seq;)Lscala/collection/Seq; > (HiveClientImpl.scala:606) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Lscala/collection/Seq; > (HiveExternalCatalog.scala:1017) > at >
[jira] [Updated] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC
[ https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gglinux updated SPARK-26568: Attachment: error.log > Too many partitions may cause thriftServer frequently Full GC > - > > Key: SPARK-26568 > URL: https://issues.apache.org/jira/browse/SPARK-26568 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: zhoukang >Priority: Major > Attachments: error.log > > > The reason is that: > first, we have a table with many partitions (maybe several hundred); second, we > have some concurrent queries. Then the long-running thriftServer may encounter an > OOM issue. > Here is a case: > call stack of OOM thread: > {code:java} > pool-34-thread-10 > at > org.apache.hadoop.hive.metastore.api.StorageDescriptor.(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V > (StorageDescriptor.java:240) > at > org.apache.hadoop.hive.metastore.api.Partition.(Lorg/apache/hadoop/hive/metastore/api/Partition;)V > (Partition.java:216) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition; > (HiveMetaStoreClient.java:1343) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List; > (HiveMetaStoreClient.java:1409) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List; > (HiveMetaStoreClient.java:1397) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; > (HiveMetaStoreClient.java:914) > at > sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (DelegatingMethodAccessorImpl.java:43) > at > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; > (RetryingMetaStoreClient.java:90) > at > com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List; > (Unknown Source) > at > org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List; > (Hive.java:1967) > at > sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (DelegatingMethodAccessorImpl.java:43) > at > java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; > (Method.java:606) > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq; > (HiveShim.scala:602) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq; > (HiveClientImpl.scala:608) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object; > (HiveClientImpl.scala:606) > at > 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object; > (HiveClientImpl.scala:321) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V > (HiveClientImpl.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object; > (HiveClientImpl.scala:263) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object; > (HiveClientImpl.scala:307) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;Lscala/collection/Seq;)Lscala/collection/Seq; > (HiveClientImpl.scala:606) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Lscala/collection/Seq; > (HiveExternalCatalog.scala:1017) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Ljava/lang/Object; > (HiveExternalCatalog.scala:1000) > at >
[jira] [Updated] (SPARK-40223) Cannot alter table with locale tr
[ https://issues.apache.org/jira/browse/SPARK-40223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40223: Description: How to reproduce this issue: {code:scala} test("Test update stats with locale tr") { withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true", SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") { withLocale("tr") { val tabName = "tAb_I" withTable(tabName) { sql(s"CREATE TABLE $tabName(col_I int)") sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1") } } } } {code} Error: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not a valid table name at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) {noformat} was: How to reproduce this issue: {code:scala} test("Test update stats with locale tr") { withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true", SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") { withLocale("tr") { val tabName = "tAb_I" withTable(tabName) { sql(s"CREATE TABLE $tabName(col_I int) USING PARQUET") sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1") } } } } {code} Error: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not a valid table name at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) {noformat} > Cannot alter table with locale tr > - > > Key: SPARK-40223 > URL: https://issues.apache.org/jira/browse/SPARK-40223 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:scala} > test("Test update stats with locale tr") { > withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true", > SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") { > withLocale("tr") { > val tabName = "tAb_I" > withTable(tabName) { > sql(s"CREATE TABLE $tabName(col_I int)") > sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1") > } > } > } > } > {code} > Error: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not > a valid table name > at > org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192) > at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623) > at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40223) Cannot alter table with locale tr
Yuming Wang created SPARK-40223: --- Summary: Cannot alter table with locale tr Key: SPARK-40223 URL: https://issues.apache.org/jira/browse/SPARK-40223 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Yuming Wang How to reproduce this issue: {code:scala} test("Test update stats with locale tr") { withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true", SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") { withLocale("tr") { val tabName = "tAb_I" withTable(tabName) { sql(s"CREATE TABLE $tabName(col_I int) USING PARQUET") sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1") } } } } {code} Error: {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not a valid table name at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
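The root cause appears to be Java's locale-sensitive String.toLowerCase: under the Turkish locale, 'I' lowercases to the dotless 'ı' (U+0131), which then fails Hive's ASCII table-name check, as the error above shows. A small Python demonstration of the mismatch (Python's str.lower is not locale-sensitive, so the Turkish mapping is spelled out by hand, and the regex only approximates Hive's validation):

{code:python}
import re

# Simulate what Java's "tAb_I".toLowerCase() produces under Locale("tr"):
# 'I' maps to the dotless 'ı' (U+0131) instead of 'i'.
lowered_tr = "tAb_I".lower().replace("i", "\u0131")
print(lowered_tr)  # tab_ı

# Rough stand-in for Hive's valid-name check (ASCII word characters only).
print(bool(re.fullmatch(r"[a-zA-Z0-9_]+", lowered_tr)))  # False -> "not a valid table name"
{code}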
[jira] [Commented] (SPARK-40221) Not able to format using scalafmt
[ https://issues.apache.org/jira/browse/SPARK-40221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585077#comment-17585077 ] Hyukjin Kwon commented on SPARK-40221: -- Does this still happen in the latest master branch? I can't reproduce it > Not able to format using scalafmt > - > > Key: SPARK-40221 > URL: https://issues.apache.org/jira/browse/SPARK-40221 > Project: Spark > Issue Type: Question > Components: Build >Affects Versions: 3.4.0 >Reporter: Ziqi Liu >Priority: Major > > I'm following the guidance in [https://spark.apache.org/developer-tools.html] > using > {code:java} > ./dev/scalafmt{code} > to format the code, but getting this error: > {code:java} > [ERROR] Failed to execute goal > org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) > on project spark-parent_2.12: Error formatting Scala files: missing setting > 'version'. To fix this problem, add the following line to .scalafmt.conf: > 'version=3.2.1'. -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
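For anyone hitting the same error: the message itself names the fix, namely adding the version line it asks for to the scalafmt config (the file lives at dev/.scalafmt.conf in the Spark repo, path assumed):

{noformat}
version=3.2.1
{noformat}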
[jira] [Resolved] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
[ https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40049. -- Resolution: Invalid > Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite > -- > > Key: SPARK-40049 > URL: https://issues.apache.org/jira/browse/SPARK-40049 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that > adaptive query execution is turned off. We should add cases > `spark.sql.adaptive.forceApply=true` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
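In configuration terms, the AQE-on variants these test tickets describe come down to two settings; `spark.sql.adaptive.forceApply` is quoted from the description above, and the snippet below is only a hedged sketch of enabling them in a session:

{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.adaptive.enabled", "true")     # turn AQE on
         .config("spark.sql.adaptive.forceApply", "true")  # key quoted in SPARK-40049
         .getOrCreate())
{code}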
[jira] [Updated] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
[ https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40049: - Fix Version/s: (was: 3.4.0) > Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite > -- > > Key: SPARK-40049 > URL: https://issues.apache.org/jira/browse/SPARK-40049 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that > adaptive query execution is turned off. We should add cases > `spark.sql.adaptive.forceApply=true` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
[ https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-40049: -- Assignee: (was: Kazuyuki Tanimura) > Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite > -- > > Key: SPARK-40049 > URL: https://issues.apache.org/jira/browse/SPARK-40049 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that > adaptive query execution is turned off. We should add cases > `spark.sql.adaptive.forceApply=true` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-40088: -- Assignee: (was: Kazuyuki Tanimura) > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40088. -- Resolution: Invalid > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40088: - Fix Version/s: (was: 3.4.0) > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40110) Add JDBCWithAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40110. -- Resolution: Invalid Marking it as invalid for now since we're not going to go this way. > Add JDBCWithAQESuite > > > Key: SPARK-40110 > URL: https://issues.apache.org/jira/browse/SPARK-40110 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > > Currently `JDBCSuite` assumes that AQE is always turned off. We should also > test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40110) Add JDBCWithAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40110: - Fix Version/s: (was: 3.4.0) > Add JDBCWithAQESuite > > > Key: SPARK-40110 > URL: https://issues.apache.org/jira/browse/SPARK-40110 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > > Currently `JDBCSuite` assumes that AQE is always turned off. We should also > test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-40110) Add JDBCWithAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-40110: -- > Add JDBCWithAQESuite > > > Key: SPARK-40110 > URL: https://issues.apache.org/jira/browse/SPARK-40110 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > > Currently `JDBCSuite` assumes that AQE is always turned off. We should also > test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585070#comment-17585070 ] Apache Spark commented on SPARK-40088: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/37665 > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40110) Add JDBCWithAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585068#comment-17585068 ] Apache Spark commented on SPARK-40110: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/37666 > Add JDBCWithAQESuite > > > Key: SPARK-40110 > URL: https://issues.apache.org/jira/browse/SPARK-40110 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `JDBCSuite` assumes that AQE is always turned off. We should also > test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585067#comment-17585067 ] Apache Spark commented on SPARK-40088: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/37665 > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
[ https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585066#comment-17585066 ] Apache Spark commented on SPARK-40049: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/37664 > Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite > -- > > Key: SPARK-40049 > URL: https://issues.apache.org/jira/browse/SPARK-40049 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that > adaptive query execution is turned off. We should add cases with > `spark.sql.adaptive.forceApply=true` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children
[ https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585047#comment-17585047 ] Apache Spark commented on SPARK-40222: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/37663 > Numeric try_add/try_divide/try_subtract/try_multiply should throw error from > their children > --- > > Key: SPARK-40222 > URL: https://issues.apache.org/jira/browse/SPARK-40222 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should > refactor the > {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} > functions so that the errors from their children will be shown instead of > ignored. > Spark SQL allows arithmetic operations between > Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule > [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501] > for details). Some of these combinations can throw exceptions too: * Date + > CalendarInterval > * Date + AnsiInterval > * Timestamp + AnsiInterval > * Date - CalendarInterval > * Date - AnsiInterval > * Timestamp - AnsiInterval > * Number * CalendarInterval > * Number * AnsiInterval > * CalendarInterval / Number > * AnsiInterval / Number > This Jira is for the cases when both input data types are numbers. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children
[ https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40222: Assignee: Gengliang Wang (was: Apache Spark) > Numeric try_add/try_divide/try_subtract/try_multiply should throw error from > their children > --- > > Key: SPARK-40222 > URL: https://issues.apache.org/jira/browse/SPARK-40222 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should > refactor the > {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} > functions so that the errors from their children will be shown instead of > ignored. > Spark SQL allows arithmetic operations between > Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule > [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501] > for details). Some of these combinations can throw exceptions too: * Date + > CalendarInterval > * Date + AnsiInterval > * Timestamp + AnsiInterval > * Date - CalendarInterval > * Date - AnsiInterval > * Timestamp - AnsiInterval > * Number * CalendarInterval > * Number * AnsiInterval > * CalendarInterval / Number > * AnsiInterval / Number > This Jira is for the cases when both input data types are numbers. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children
[ https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585048#comment-17585048 ] Apache Spark commented on SPARK-40222: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/37663 > Numeric try_add/try_divide/try_subtract/try_multiply should throw error from > their children > --- > > Key: SPARK-40222 > URL: https://issues.apache.org/jira/browse/SPARK-40222 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should > refactor the > {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} > functions so that the errors from their children will be shown instead of > ignored. > Spark SQL allows arithmetic operations between > Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule > [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501] > for details). Some of these combinations can throw exceptions too: * Date + > CalendarInterval > * Date + AnsiInterval > * Timestamp + AnsiInterval > * Date - CalendarInterval > * Date - AnsiInterval > * Timestamp - AnsiInterval > * Number * CalendarInterval > * Number * AnsiInterval > * CalendarInterval / Number > * AnsiInterval / Number > This Jira is for the cases when both input data types are numbers. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children
[ https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40222: Assignee: Apache Spark (was: Gengliang Wang) > Numeric try_add/try_divide/try_subtract/try_multiply should throw error from > their children > --- > > Key: SPARK-40222 > URL: https://issues.apache.org/jira/browse/SPARK-40222 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should > refactor the > {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} > functions so that the errors from their children will be shown instead of > ignored. > Spark SQL allows arithmetic operations between > Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule > [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501] > for details). Some of these combinations can throw exceptions too: * Date + > CalendarInterval > * Date + AnsiInterval > * Timestamp + AnsiInterval > * Date - CalendarInterval > * Date - AnsiInterval > * Timestamp - AnsiInterval > * Number * CalendarInterval > * Number * AnsiInterval > * CalendarInterval / Number > * AnsiInterval / Number > This Jira is for the cases when both input data types are numbers. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children
Gengliang Wang created SPARK-40222: -- Summary: Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children Key: SPARK-40222 URL: https://issues.apache.org/jira/browse/SPARK-40222 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should refactor the {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} functions so that the errors from their children will be shown instead of ignored. Spark SQL allows arithmetic operations between Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501] for details). Some of these combinations can throw exceptions too:
* Date + CalendarInterval
* Date + AnsiInterval
* Timestamp + AnsiInterval
* Date - CalendarInterval
* Date - AnsiInterval
* Timestamp - AnsiInterval
* Number * CalendarInterval
* Number * AnsiInterval
* CalendarInterval / Number
* AnsiInterval / Number
This Jira is for the cases when both input data types are numbers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
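The intended distinction can be illustrated with a hedged PySpark sketch (assuming ANSI mode so the child expression actually fails; how the child error surfaces is exactly what this ticket changes):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

# Overflow of the addition itself is what try_add is designed to tolerate:
spark.sql("SELECT try_add(2147483647, 1) AS r").show()  # r is NULL

# An error thrown by a child expression (here a failing ANSI cast) should,
# per this ticket, propagate instead of being silently turned into NULL:
spark.sql("SELECT try_add(CAST('abc' AS INT), 1) AS r").show()
{code}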
[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585035#comment-17585035 ] Apache Spark commented on SPARK-40142: -- User 'khalidmammadov' has created a pull request for this issue: https://github.com/apache/spark/pull/37662 > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585036#comment-17585036 ] Apache Spark commented on SPARK-40142: -- User 'khalidmammadov' has created a pull request for this issue: https://github.com/apache/spark/pull/37662 > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized
[ https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40211: Assignee: Apache Spark > Allow executeTake() / collectLimit's number of starting partitions to be > customized > --- > > Key: SPARK-40211 > URL: https://issues.apache.org/jira/browse/SPARK-40211 > Project: Spark > Issue Type: Story > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Ziqi Liu >Assignee: Apache Spark >Priority: Major > > Today, Spark’s executeTake() code allows for the limitScaleUpFactor to be > customized but does not allow for the initial number of partitions to be > customized: it’s currently hardcoded to {{{}1{}}}. > We should add a configuration so that the initial partition count can be > customized. By setting this new configuration to a high value we could > effectively mitigate the “run multiple jobs” overhead in {{take}} behavior. > We could also set it to higher-than-1-but-still-small values (like, say, > {{{}10{}}}) to achieve a middle-ground trade-off. > > Essentially, we need to make {{numPartsToTry = 1L}} > ([code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L481]) > customizable. We should do this via a new SQL conf, similar to the > {{limitScaleUpFactor}} conf. > > Spark has several near-duplicate versions of this code ([see code > search|https://github.com/apache/spark/search?q=numPartsToTry+%3D+1]) in: > * SparkPlan > * RDD > * pyspark rdd > Also, in pyspark {{limitScaleUpFactor}} is not supported either. So for > now, I will focus on scala side first, leaving python side untouched and > meanwhile sync with pyspark members. Depending on the progress we can do them > all in one PR or make scala side change first and leave pyspark change as a > follow-up. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized
[ https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584995#comment-17584995 ] Apache Spark commented on SPARK-40211: -- User 'liuzqt' has created a pull request for this issue: https://github.com/apache/spark/pull/37661 > Allow executeTake() / collectLimit's number of starting partitions to be > customized > --- > > Key: SPARK-40211 > URL: https://issues.apache.org/jira/browse/SPARK-40211 > Project: Spark > Issue Type: Story > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Ziqi Liu >Priority: Major > > Today, Spark’s executeTake() code allows for the limitScaleUpFactor to be > customized but does not allow for the initial number of partitions to be > customized: it’s currently hardcoded to {{{}1{}}}. > We should add a configuration so that the initial partition count can be > customized. By setting this new configuration to a high value we could > effectively mitigate the “run multiple jobs” overhead in {{take}} behavior. > We could also set it to higher-than-1-but-still-small values (like, say, > {{{}10{}}}) to achieve a middle-ground trade-off. > > Essentially, we need to make {{numPartsToTry = 1L}} > ([code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L481]) > customizable. We should do this via a new SQL conf, similar to the > {{limitScaleUpFactor}} conf. > > Spark has several near-duplicate versions of this code ([see code > search|https://github.com/apache/spark/search?q=numPartsToTry+%3D+1]) in: > * SparkPlan > * RDD > * pyspark rdd > Also, in pyspark {{limitScaleUpFactor}} is not supported either. So for > now, I will focus on scala side first, leaving python side untouched and > meanwhile sync with pyspark members. Depending on the progress we can do them > all in one PR or make scala side change first and leave pyspark change as a > follow-up. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized
[ https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40211: Assignee: (was: Apache Spark) > Allow executeTake() / collectLimit's number of starting partitions to be > customized > --- > > Key: SPARK-40211 > URL: https://issues.apache.org/jira/browse/SPARK-40211 > Project: Spark > Issue Type: Story > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Ziqi Liu >Priority: Major > > Today, Spark’s executeTake() code allows for the limitScaleUpFactor to be > customized but does not allow for the initial number of partitions to be > customized: it’s currently hardcoded to {{{}1{}}}. > We should add a configuration so that the initial partition count can be > customized. By setting this new configuration to a high value we could > effectively mitigate the “run multiple jobs” overhead in {{take}} behavior. > We could also set it to higher-than-1-but-still-small values (like, say, > {{{}10{}}}) to achieve a middle-ground trade-off. > > Essentially, we need to make {{numPartsToTry = 1L}} > ([code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L481]) > customizable. We should do this via a new SQL conf, similar to the > {{limitScaleUpFactor}} conf. > > Spark has several near-duplicate versions of this code ([see code > search|https://github.com/apache/spark/search?q=numPartsToTry+%3D+1]) in: > * SparkPlan > * RDD > * pyspark rdd > Also, in pyspark {{limitScaleUpFactor}} is not supported either. So for > now, I will focus on scala side first, leaving python side untouched and > meanwhile sync with pyspark members. Depending on the progress we can do them > all in one PR or make scala side change first and leave pyspark change as a > follow-up. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
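The loop being made configurable can be modeled in a few lines. This is a simplified PySpark sketch, not Spark's actual implementation: `initial_num_partitions` stands in for the proposed conf (the value hardcoded to 1 today) and `limit_scale_up_factor` mirrors the existing conf:

{code:python}
def take_with_scale_up(rdd, num, initial_num_partitions=1, limit_scale_up_factor=4):
    """Simplified model of the executeTake()/take() multi-job loop."""
    buf = []
    total_parts = rdd.getNumPartitions()
    parts_scanned = 0
    while len(buf) < num and parts_scanned < total_parts:
        if parts_scanned == 0:
            # Hardcoded to 1 in Spark today; the ticket proposes a conf.
            num_parts_to_try = initial_num_partitions
        else:
            # Scan exponentially more partitions on each extra job.
            num_parts_to_try = parts_scanned * limit_scale_up_factor
        parts = list(range(parts_scanned,
                           min(parts_scanned + num_parts_to_try, total_parts)))
        left = num - len(buf)
        # runJob flattens the per-partition lists into one result list.
        buf.extend(rdd.context.runJob(rdd, lambda it: list(it)[:left], parts))
        parts_scanned += len(parts)
    return buf[:num]
{code}

Raising the initial count trades some wasted work on the first job for fewer job launches when the leading partitions are small or empty, which is the middle-ground trade-off the description outlines.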
[jira] [Updated] (SPARK-40221) Not able to format using scalafmt
[ https://issues.apache.org/jira/browse/SPARK-40221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziqi Liu updated SPARK-40221: - Description: I'm following the guidance in [https://spark.apache.org/developer-tools.html] using {code:java} ./dev/scalafmt{code} to format the code, but getting this error: {code:java} [ERROR] Failed to execute goal org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) on project spark-parent_2.12: Error formatting Scala files: missing setting 'version'. To fix this problem, add the following line to .scalafmt.conf: 'version=3.2.1'. -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code} was: I'm following the guidance in [https://spark.apache.org/developer-tools.html] using ./dev/scalafmt to format the code, but getting this error: {code:java} [ERROR] Failed to execute goal org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) on project spark-parent_2.12: Error formatting Scala files: missing setting 'version'. To fix this problem, add the following line to .scalafmt.conf: 'version=3.2.1'. -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code} > Not able to format using scalafmt > - > > Key: SPARK-40221 > URL: https://issues.apache.org/jira/browse/SPARK-40221 > Project: Spark > Issue Type: Question > Components: Build >Affects Versions: 3.4.0 >Reporter: Ziqi Liu >Priority: Major > > I'm following the guidance in [https://spark.apache.org/developer-tools.html] > using > {code:java} > ./dev/scalafmt{code} > to format the code, but getting this error: > {code:java} > [ERROR] Failed to execute goal > org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) > on project spark-parent_2.12: Error formatting Scala files: missing setting > 'version'. To fix this problem, add the following line to .scalafmt.conf: > 'version=3.2.1'. -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40221) Not able to format using scalafmt
Ziqi Liu created SPARK-40221: Summary: Not able to format using scalafmt Key: SPARK-40221 URL: https://issues.apache.org/jira/browse/SPARK-40221 Project: Spark Issue Type: Question Components: Build Affects Versions: 3.4.0 Reporter: Ziqi Liu I'm following the guidance in [https://spark.apache.org/developer-tools.html] using ./dev/scalafmt to format the code, but getting this error: {code:java} [ERROR] Failed to execute goal org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) on project spark-parent_2.12: Error formatting Scala files: missing setting 'version'. To fix this problem, add the following line to .scalafmt.conf: 'version=3.2.1'. -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
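As the error output itself instructs, the immediate workaround is to pin the scalafmt version in .scalafmt.conf (the version string to use is the one named in the error):

{code}
version = 3.2.1
{code}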
[jira] [Updated] (SPARK-40131) Support NumPy ndarray in built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40131: - Description: Support NumPy ndarray in built-in functions (`pyspark.sql.functions`) by introducing Py4J input converter `NumpyArrayConverter`. The converter converts an ndarray to a Java array. (was: Per [https://github.com/apache/spark/pull/37560#discussion_r948572473] we want to support NumPy ndarray in built-in functions) > Support NumPy ndarray in built-in functions > --- > > Key: SPARK-40131 > URL: https://issues.apache.org/jira/browse/SPARK-40131 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Support NumPy ndarray in built-in functions (`pyspark.sql.functions`) by > introducing Py4J input converter `NumpyArrayConverter`. The converter > converts an ndarray to a Java array. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
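Illustratively, once the converter is in place a NumPy array (and, per the sibling ticket below, a NumPy scalar) can be passed straight to a built-in function such as lit(). A hedged sketch of the intended usage:

{code:python}
import numpy as np
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local").getOrCreate()

# The ndarray is converted to a Java array at the Py4J boundary, so lit()
# can wrap it directly; likewise for a NumPy scalar such as np.int64.
df = spark.range(1).select(
    F.lit(np.array([1, 2, 3])).alias("arr"),
    F.lit(np.int64(7)).alias("n"),
)
df.show()
{code}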
[jira] [Resolved] (SPARK-40130) Support NumPy scalars in built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-40130. -- Assignee: Xinrong Meng Resolution: Fixed > Support NumPy scalars in built-in functions > --- > > Key: SPARK-40130 > URL: https://issues.apache.org/jira/browse/SPARK-40130 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Support NumPy scalars in built-in functions by introducing Py4J input > converter `NumpyScalarConverter`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39483) Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array
[ https://issues.apache.org/jira/browse/SPARK-39483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-39483: Assignee: Xinrong Meng > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array > --- > > Key: SPARK-39483 > URL: https://issues.apache.org/jira/browse/SPARK-39483 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
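A sketch of what deriving the schema from `np.dtype` enables (assuming the behavior this ticket describes: column names are generated and column types come from the array's dtype):

{code:python}
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").getOrCreate()

# A 2-D ndarray becomes a DataFrame whose column types are derived from
# np.dtype (here int64 -> LongType), with no hand-written schema.
arr = np.array([[1, 2], [3, 4]], dtype=np.int64)
df = spark.createDataFrame(arr)
df.printSchema()
{code}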
[jira] [Commented] (SPARK-40130) Support NumPy scalars in built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584975#comment-17584975 ] Xinrong Meng commented on SPARK-40130: -- Resolved by https://github.com/apache/spark/pull/37560 > Support NumPy scalars in built-in functions > --- > > Key: SPARK-40130 > URL: https://issues.apache.org/jira/browse/SPARK-40130 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Support NumPy scalars in built-in functions by introducing Py4J input > converter `NumpyScalarConverter`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40220) Don't output the empty map of error message parameters
[ https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584892#comment-17584892 ] Apache Spark commented on SPARK-40220: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/37660 > Don't output the empty map of error message parameters > -- > > Key: SPARK-40220 > URL: https://issues.apache.org/jira/browse/SPARK-40220 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > In the current implementation, Spark outputs empty message parameters in the > MINIMAL and STANDARD formats: > {code:json} > org.apache.spark.SparkRuntimeException > { > "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO", > "messageParameters" : { } > } > {code} > which contradicts the approach for other JSON fields. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40220) Don't output the empty map of error message parameters
[ https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40220: Assignee: Max Gekk (was: Apache Spark) > Don't output the empty map of error message parameters > -- > > Key: SPARK-40220 > URL: https://issues.apache.org/jira/browse/SPARK-40220 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > In the current implementation, Spark outputs empty message parameters in the > MINIMAL and STANDARD formats: > {code:json} > org.apache.spark.SparkRuntimeException > { > "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO", > "messageParameters" : { } > } > {code} > which contradicts the approach for other JSON fields. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40220) Don't output the empty map of error message parameters
[ https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40220: Assignee: Apache Spark (was: Max Gekk) > Don't output the empty map of error message parameters > -- > > Key: SPARK-40220 > URL: https://issues.apache.org/jira/browse/SPARK-40220 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > In the current implementation, Spark outputs empty message parameters in the > MINIMAL and STANDARD formats: > {code:json} > org.apache.spark.SparkRuntimeException > { > "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO", > "messageParameters" : { } > } > {code} > which contradicts the approach for other JSON fields. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40220) Don't output the empty map of error message parameters
Max Gekk created SPARK-40220: Summary: Don't output the empty map of error message parameters Key: SPARK-40220 URL: https://issues.apache.org/jira/browse/SPARK-40220 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assignee: Max Gekk In the current implementation, Spark outputs empty message parameters in the MINIMAL and STANDARD formats: {code:json} org.apache.spark.SparkRuntimeException { "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO", "messageParameters" : { } } {code} which contradicts the approach for other JSON fields. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
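For contrast, after the change the MINIMAL and STANDARD output would presumably omit the empty field entirely (assumed shape, inferred from the ticket's title):

{code:json}
org.apache.spark.SparkRuntimeException
{
  "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO"
}
{code}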
[jira] [Commented] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short
[ https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584884#comment-17584884 ] Brennan Stein commented on SPARK-40212: --- Is a PR with unit test adequate reproduction? :) Realized it was actually a very simple fix [https://github.com/apache/spark/pull/37659] > SparkSQL castPartValue does not properly handle byte & short > > > Key: SPARK-40212 > URL: https://issues.apache.org/jira/browse/SPARK-40212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Brennan Stein >Priority: Major > > Reading in a parquet file partitioned on disk by a `Byte`-type column fails > with the following exception: > > {code:java} > [info] Cause: java.lang.ClassCastException: java.lang.Integer cannot be > cast to java.lang.Byte > [info] at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195) > [info] at > org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown > Source) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > [info] at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385) > [info] at > org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62) > [info] at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189) > [info] at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > [info] at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > [info] at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > [info] at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > [info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > [info] at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > [info] at org.apache.spark.scheduler.Task.run(Task.scala:136) > [info] at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > [info] at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [info] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [info] at java.lang.Thread.run(Thread.java:748) {code} > I believe the issue to stem from > 
[PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533] > returning an Integer for ByteType and ShortType (which then fails to unbox > to the expected type): > > {code:java} > case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code} > > The issue appears to have been introduced in [this > commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e] > so likely affects Spark 3.2 as well, though I've only tested on 3.3.0. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
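A reproduction sketch in PySpark (assuming the failure is triggered by reading the partitioned data back with an explicit schema that types the partition column as BYTE; the path and column names are illustrative):

{code:python}
import tempfile

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").getOrCreate()
path = tempfile.mkdtemp()

# Write a small table partitioned by a byte column...
df = spark.range(4).select(
    F.col("id").cast("byte").alias("b"),
    F.col("id").alias("v"),
)
df.write.mode("overwrite").partitionBy("b").parquet(path)

# ...then read it back declaring the partition column as BYTE. Before the
# fix, castPartValueToDesiredType returns an Integer for ByteType and the
# unbox to Byte fails with the ClassCastException shown above.
spark.read.schema("v BIGINT, b BYTE").parquet(path).show()
{code}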
[jira] [Commented] (SPARK-33760) Extend Dynamic Partition Pruning Support to DataSources
[ https://issues.apache.org/jira/browse/SPARK-33760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584882#comment-17584882 ] Willi Raschkowski commented on SPARK-33760: --- Is this related to SPARK-35779? > Extend Dynamic Partition Pruning Support to DataSources > --- > > Key: SPARK-33760 > URL: https://issues.apache.org/jira/browse/SPARK-33760 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Anoop Johnson >Priority: Major > > The implementation of Dynamic Partition Pruning (DPP) in Spark is > [specific|https://github.com/apache/spark/blob/fb2e3af4b5d92398d57e61b766466cc7efd9d7cb/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L59-L64] > to HadoopFSRelation. As a result, DPP is not triggered for queries that use > data sources. > The DataSource v2 readers can expose the partition metadata. Can we use this > metadata and extend DPP to work on data sources as well? > Would appreciate thoughts or corner cases we need to handle. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short
[ https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40212: Assignee: Apache Spark > SparkSQL castPartValue does not properly handle byte & short > > > Key: SPARK-40212 > URL: https://issues.apache.org/jira/browse/SPARK-40212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Brennan Stein >Assignee: Apache Spark >Priority: Major > > Reading in a parquet file partitioned on disk by a `Byte`-type column fails > with the following exception: > > {code:java} > [info] Cause: java.lang.ClassCastException: java.lang.Integer cannot be > cast to java.lang.Byte > [info] at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195) > [info] at > org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown > Source) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > [info] at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385) > [info] at > org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62) > [info] at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189) > [info] at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > [info] at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > [info] at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > [info] at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > [info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > [info] at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > [info] at org.apache.spark.scheduler.Task.run(Task.scala:136) > [info] at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > [info] at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [info] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [info] at java.lang.Thread.run(Thread.java:748) {code} > I believe the issue to stem from > 
[PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533] > returning an Integer for ByteType and ShortType (which then fails to unbox > to the expected type): > > {code:java} > case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code} > > The issue appears to have been introduced in [this > commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e] > so likely affects Spark 3.2 as well, though I've only tested on 3.3.0. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short
[ https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40212: Assignee: (was: Apache Spark) > SparkSQL castPartValue does not properly handle byte & short > > > Key: SPARK-40212 > URL: https://issues.apache.org/jira/browse/SPARK-40212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Brennan Stein >Priority: Major > > Reading in a parquet file partitioned on disk by a `Byte`-type column fails > with the following exception: > > {code:java} > [info] Cause: java.lang.ClassCastException: java.lang.Integer cannot be > cast to java.lang.Byte > [info] at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195) > [info] at > org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown > Source) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > [info] at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385) > [info] at > org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62) > [info] at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189) > [info] at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > [info] at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > [info] at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > [info] at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > [info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > [info] at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > [info] at org.apache.spark.scheduler.Task.run(Task.scala:136) > [info] at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > [info] at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [info] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [info] at java.lang.Thread.run(Thread.java:748) {code} > I believe the issue to stem from > 
[PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533] > returning an Integer for ByteType and ShortType (which then fails to unbox > to the expected type): > > {code:java} > case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code} > > The issue appears to have been introduced in [this > commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e] > so likely affects Spark 3.2 as well, though I've only tested on 3.3.0. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short
[ https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584880#comment-17584880 ] Apache Spark commented on SPARK-40212: -- User 'BrennanStein' has created a pull request for this issue: https://github.com/apache/spark/pull/37659 > SparkSQL castPartValue does not properly handle byte & short > > > Key: SPARK-40212 > URL: https://issues.apache.org/jira/browse/SPARK-40212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Brennan Stein >Priority: Major > > Reading in a parquet file partitioned on disk by a `Byte`-type column fails > with the following exception: > > {code:java} > [info] Cause: java.lang.ClassCastException: java.lang.Integer cannot be > cast to java.lang.Byte > [info] at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39) > [info] at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195) > [info] at > org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown > Source) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > [info] at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385) > [info] at > org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62) > [info] at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189) > [info] at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > [info] at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > [info] at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > [info] at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > [info] at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > [info] at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > [info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) > [info] at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) > [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > [info] at org.apache.spark.scheduler.Task.run(Task.scala:136) > [info] at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) > [info] at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) > [info] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [info] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [info] at java.lang.Thread.run(Thread.java:748) {code} > I believe the issue to stem from > 
[PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533] > returning an Integer for ByteType and ShortType (which then fails to unbox > to the expected type): > > {code:java} > case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code} > > The issue appears to have been introduced in [this > commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e] > so likely affects Spark 3.2 as well, though I've only tested on 3.3.0. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40192) Remove redundant groupby
[ https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40192. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37628 [https://github.com/apache/spark/pull/37628] > Remove redundant groupby > > > Key: SPARK-40192 > URL: https://issues.apache.org/jira/browse/SPARK-40192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: deshanxiao >Assignee: deshanxiao >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40192) Remove redundant groupby
[ https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40192: Assignee: deshanxiao > Remove redundant groupby > > > Key: SPARK-40192 > URL: https://issues.apache.org/jira/browse/SPARK-40192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: deshanxiao >Assignee: deshanxiao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40192) Remove redundant groupby
[ https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40192: - Priority: Trivial (was: Minor) > Remove redundant groupby > > > Key: SPARK-40192 > URL: https://issues.apache.org/jira/browse/SPARK-40192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: deshanxiao >Assignee: deshanxiao >Priority: Trivial > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup
[ https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584833#comment-17584833 ] Apache Spark commented on SPARK-40219: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/37658 > resolved view plan should hold the schema to avoid redundant lookup > --- > > Key: SPARK-40219 > URL: https://issues.apache.org/jira/browse/SPARK-40219 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup
[ https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40219: Assignee: (was: Apache Spark) > resolved view plan should hold the schema to avoid redundant lookup > --- > > Key: SPARK-40219 > URL: https://issues.apache.org/jira/browse/SPARK-40219 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup
[ https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40219: Assignee: Apache Spark > resolved view plan should hold the schema to avoid redundant lookup > --- > > Key: SPARK-40219 > URL: https://issues.apache.org/jira/browse/SPARK-40219 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup
[ https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584830#comment-17584830 ] Apache Spark commented on SPARK-40219: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/37658 > resolved view plan should hold the schema to avoid redundant lookup > --- > > Key: SPARK-40219 > URL: https://issues.apache.org/jira/browse/SPARK-40219 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup
Wenchen Fan created SPARK-40219: --- Summary: resolved view plan should hold the schema to avoid redundant lookup Key: SPARK-40219 URL: https://issues.apache.org/jira/browse/SPARK-40219 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40010) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584826#comment-17584826 ] Apache Spark commented on SPARK-40010: -- User 'dcoliversun' has created a pull request for this issue: https://github.com/apache/spark/pull/37657 > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40010 > URL: https://issues.apache.org/jira/browse/SPARK-40010 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
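For context, this is the shape of a self-contained example of the kind the ticket asks for, sketched here in Scala (the ticket itself targets the PySpark docs) and assuming only a live SparkSession named `spark`:

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._

// Self-contained: the snippet builds its own input data instead of referring
// to an undefined DataFrame, so it can be pasted into any Spark shell as-is.
val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
val w = Window.partitionBy("key").orderBy("value")
df.withColumn("rank", row_number().over(w)).show()
{code}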
[jira] [Commented] (SPARK-39562) Make hive-thrift server module pass in IPv6 environment
[ https://issues.apache.org/jira/browse/SPARK-39562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584770#comment-17584770 ] Apache Spark commented on SPARK-39562: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/37656 > Make hive-thrift server module pass in IPv6 environment > - > > Key: SPARK-39562 > URL: https://issues.apache.org/jira/browse/SPARK-39562 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39562) Make hive-thrift server module pass in IPv6 environment
[ https://issues.apache.org/jira/browse/SPARK-39562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584769#comment-17584769 ] Apache Spark commented on SPARK-39562: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/37656 > Make hive-thrift server module pass in IPv6 environment > - > > Key: SPARK-39562 > URL: https://issues.apache.org/jira/browse/SPARK-39562 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40218) GROUPING SETS should preserve the grouping columns
[ https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584756#comment-17584756 ] Apache Spark commented on SPARK-40218: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/37655 > GROUPING SETS should preserve the grouping columns > -- > > Key: SPARK-40218 > URL: https://issues.apache.org/jira/browse/SPARK-40218 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40218) GROUPING SETS should preserve the grouping columns
[ https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584754#comment-17584754 ] Apache Spark commented on SPARK-40218: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/37655 > GROUPING SETS should preserve the grouping columns > -- > > Key: SPARK-40218 > URL: https://issues.apache.org/jira/browse/SPARK-40218 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40218) GROUPING SETS should preserve the grouping columns
[ https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40218: Assignee: Wenchen Fan (was: Apache Spark) > GROUPING SETS should preserve the grouping columns > -- > > Key: SPARK-40218 > URL: https://issues.apache.org/jira/browse/SPARK-40218 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40218) GROUPING SETS should preserve the grouping columns
[ https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40218: Assignee: Apache Spark (was: Wenchen Fan) > GROUPING SETS should preserve the grouping columns > -- > > Key: SPARK-40218 > URL: https://issues.apache.org/jira/browse/SPARK-40218 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40218) GROUPING SETS should preserve the grouping columns
Wenchen Fan created SPARK-40218: --- Summary: GROUPING SETS should preserve the grouping columns Key: SPARK-40218 URL: https://issues.apache.org/jira/browse/SPARK-40218 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
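An assumed illustration of the behaviour at stake (not taken from the ticket): the columns listed before GROUPING SETS should stay visible in the output schema, NULL-filled for the sets that omit them. The `dealer` table and a SparkSession named `spark` are hypothetical:

{code:scala}
// Hypothetical repro: `city` and `car_model` are the grouping columns and
// should be preserved in the result for every grouping set.
spark.sql("""
  SELECT city, car_model, sum(quantity) AS q
  FROM dealer
  GROUP BY city, car_model GROUPING SETS ((city, car_model), (city), ())
""").show()
{code}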
[jira] [Created] (SPARK-40217) Support java.math.BigDecimal as an external type of Decimal128 type
jiaan.geng created SPARK-40217: -- Summary: Support java.math.BigDecimal as an external type of Decimal128 type Key: SPARK-40217 URL: https://issues.apache.org/jira/browse/SPARK-40217 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: jiaan.geng Allow parallelization/collection of java.math.BigDecimal values, and convert the values to int128 values of Decimal128Type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
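A minimal sketch of the round trip being described, written against the current API (Decimal128Type is still a proposal, so the values below land in an ordinary DecimalType column) and assuming a SparkSession named `spark`:

{code:scala}
import java.math.BigDecimal
import spark.implicits._

// Parallelize java.math.BigDecimal values into a Dataset and collect them
// back; today this round-trips through DecimalType (decimal(38, 18) by default).
val ds = Seq(new BigDecimal("123.45"), new BigDecimal("-0.01")).toDS()
ds.printSchema()
val back: Array[BigDecimal] = ds.collect()
{code}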
[jira] [Resolved] (SPARK-40205) Provide a query context of ELEMENT_AT_BY_INDEX_ZERO
[ https://issues.apache.org/jira/browse/SPARK-40205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40205. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37645 [https://github.com/apache/spark/pull/37645] > Provide a query context of ELEMENT_AT_BY_INDEX_ZERO > --- > > Key: SPARK-40205 > URL: https://issues.apache.org/jira/browse/SPARK-40205 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > Pass a query context to elementAtByIndexZeroError() in ElementAt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
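The error in question is the one raised when element_at is called with index 0, since SQL array indices start at 1. Assuming a SparkSession named `spark`, it can be triggered like this:

{code:scala}
// Raises the ELEMENT_AT_BY_INDEX_ZERO error; this ticket attaches the query
// context (the offending SQL fragment and its position) to that error.
spark.sql("SELECT element_at(array(1, 2, 3), 0)").show()
{code}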
[jira] [Resolved] (SPARK-40209) Incorrect value in the error message of NUMERIC_VALUE_OUT_OF_RANGE
[ https://issues.apache.org/jira/browse/SPARK-40209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40209. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37649 [https://github.com/apache/spark/pull/37649] > Incorrect value in the error message of NUMERIC_VALUE_OUT_OF_RANGE > -- > > Key: SPARK-40209 > URL: https://issues.apache.org/jira/browse/SPARK-40209 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.4.0 > > > The example below demonstrates the issue: > {code:sql} > spark-sql> select cast(interval '10.123' second as decimal(1, 0)); > [NUMERIC_VALUE_OUT_OF_RANGE] 0.10 cannot be represented as Decimal(1, 0). > If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. > {code} > The value 0.10 is not related to 10.123. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40053) HiveExternalCatalogVersionsSuite will test all Spark versions and abort when Python 2.7 is used
[ https://issues.apache.org/jira/browse/SPARK-40053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40053. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37487 [https://github.com/apache/spark/pull/37487] > HiveExternalCatalogVersionsSuite will test all Spark versions and abort > when Python 2.7 is used > -- > > Key: SPARK-40053 > URL: https://issues.apache.org/jira/browse/SPARK-40053 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > When the test environment is Java 8 + Python 2.7, > HiveExternalCatalogVersionsSuite will test all Spark 3.x versions and Spark > 2.4.8, and all of the tests will be ABORTED > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40053) HiveExternalCatalogVersionsSuite will test all Spark versions and abort when Python 2.7 is used
[ https://issues.apache.org/jira/browse/SPARK-40053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40053: Assignee: Yang Jie > HiveExternalCatalogVersionsSuite will test all Spark versions and abort > when Python 2.7 is used > -- > > Key: SPARK-40053 > URL: https://issues.apache.org/jira/browse/SPARK-40053 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > When the test environment is Java 8 + Python 2.7, > HiveExternalCatalogVersionsSuite will test all Spark 3.x versions and Spark > 2.4.8, and all of the tests will be ABORTED > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code
[ https://issues.apache.org/jira/browse/SPARK-40216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40216: Assignee: Apache Spark > Extract common `prepareWrite` method for `ParquetFileFormat` and > `ParquetWrite` to eliminate duplicate code > --- > > Key: SPARK-40216 > URL: https://issues.apache.org/jira/browse/SPARK-40216 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and > `ParquetWrite.prepareWrite` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code
[ https://issues.apache.org/jira/browse/SPARK-40216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584660#comment-17584660 ] Apache Spark commented on SPARK-40216: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/37654 > Extract common `prepareWrite` method for `ParquetFileFormat` and > `ParquetWrite` to eliminate duplicate code > --- > > Key: SPARK-40216 > URL: https://issues.apache.org/jira/browse/SPARK-40216 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and > `ParquetWrite.prepareWrite` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code
[ https://issues.apache.org/jira/browse/SPARK-40216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40216: Assignee: (was: Apache Spark) > Extract common `prepareWrite` method for `ParquetFileFormat` and > `ParquetWrite` to eliminate duplicate code > --- > > Key: SPARK-40216 > URL: https://issues.apache.org/jira/browse/SPARK-40216 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and > `ParquetWrite.prepareWrite` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code
Yang Jie created SPARK-40216: Summary: Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code Key: SPARK-40216 URL: https://issues.apache.org/jira/browse/SPARK-40216 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and `ParquetWrite.prepareWrite` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
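The requested change is the standard extract-method refactoring; a generic sketch follows, with every name hypothetical rather than taken from the Spark internals:

{code:scala}
// Hypothetical sketch: both former call sites delegate to a single shared
// helper instead of each keeping its own copy of the preparation logic.
object ParquetWriteUtils {
  def prepareWriteCommon(options: Map[String, String]): Map[String, String] = {
    // ...shared setup such as resolving the compression codec would live here.
    options.updated("compression", options.getOrElse("compression", "snappy"))
  }
}

// Then ParquetFileFormat.prepareWrite and ParquetWrite.prepareWrite would both
// call: ParquetWriteUtils.prepareWriteCommon(options)
{code}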
[jira] [Resolved] (SPARK-40214) Add `get` to dataframe functions
[ https://issues.apache.org/jira/browse/SPARK-40214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-40214. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37652 [https://github.com/apache/spark/pull/37652] > Add `get` to dataframe functions > > > Key: SPARK-40214 > URL: https://issues.apache.org/jira/browse/SPARK-40214 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
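For reference, the semantics of the function being added (as documented for Spark 3.4): unlike element_at, get uses 0-based indexing and returns NULL instead of failing when the index is out of range. Assuming a SparkSession named `spark`:

{code:scala}
spark.sql("SELECT get(array(1, 2, 3), 0)").show()  // 1: get is 0-based
spark.sql("SELECT get(array(1, 2, 3), 5)").show()  // NULL: no out-of-range error
{code}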
[jira] [Commented] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour
[ https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584634#comment-17584634 ] Apache Spark commented on SPARK-40215: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/37653 > Add SQL configs to control CSV/JSON date and timestamp parsing behaviour > > > Key: SPARK-40215 > URL: https://issues.apache.org/jira/browse/SPARK-40215 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour
[ https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584633#comment-17584633 ] Apache Spark commented on SPARK-40215: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/37653 > Add SQL configs to control CSV/JSON date and timestamp parsing behaviour > > > Key: SPARK-40215 > URL: https://issues.apache.org/jira/browse/SPARK-40215 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
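The ticket does not name the new SQL configs, so none are shown here; for context, the same parsing behaviour is currently steered per read through existing datasource options (the file path below is a placeholder), e.g. assuming a SparkSession named `spark`:

{code:scala}
// Existing per-read options for date/timestamp parsing; the ticket proposes
// session-level SQL configs that control the same behaviour globally.
val df = spark.read
  .option("header", "true")
  .option("dateFormat", "yyyy-MM-dd")
  .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss[.SSS]")
  .csv("/path/to/data.csv")
{code}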