[jira] [Assigned] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40225:


Assignee: (was: Apache Spark)

> PySpark rdd.takeOrdered should check num and numPartitions
> --
>
> Key: SPARK-40225
> URL: https://issues.apache.org/jira/browse/SPARK-40225
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585148#comment-17585148
 ] 

Apache Spark commented on SPARK-40225:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37669

> PySpark rdd.takeOrdered should check num and numPartitions
> --
>
> Key: SPARK-40225
> URL: https://issues.apache.org/jira/browse/SPARK-40225
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585146#comment-17585146
 ] 

Apache Spark commented on SPARK-40225:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37669

> PySpark rdd.takeOrdered should check num and numPartitions
> --
>
> Key: SPARK-40225
> URL: https://issues.apache.org/jira/browse/SPARK-40225
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40225:


Assignee: Apache Spark

> PySpark rdd.takeOrdered should check num and numPartitions
> --
>
> Key: SPARK-40225
> URL: https://issues.apache.org/jira/browse/SPARK-40225
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40206) Spark SQL Predict Pushdown for Hive Bucketed Table

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40206.
--
Resolution: Invalid

> Spark SQL Predict Pushdown for Hive Bucketed Table
> --
>
> Key: SPARK-40206
> URL: https://issues.apache.org/jira/browse/SPARK-40206
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Raymond Tang
>Priority: Minor
>  Labels: hive, hive-buckets, spark, spark-sql
>
> Hi team,
> I was testing out Hive bucketed table features. One of the benefits, as most
> documentation suggests, is that a bucketed Hive table can be used for query
> filter/predicate pushdown to improve query performance.
> However, through my exploration, that doesn't seem to be true. *Can you please
> help clarify whether Spark SQL supports query optimizations when using a Hive
> bucketed table?*
>  
> How to reproduce the issue:
> Create a Hive 3 table using the following DDL:
> {code:java}
> create table test_db.bucket_table(user_id int, key string) 
> comment 'A bucketed table' 
> partitioned by(country string) 
> clustered by(user_id) sorted by (key) into 10 buckets
> stored as ORC;{code}
> And then insert into this table using the following PySpark script:
> {code:java}
> from pyspark.sql import SparkSession
> appName = "PySpark Hive Bucketing Example"
> master = "local"
> # Create Spark session with Hive supported.
> spark = SparkSession.builder \
> .appName(appName) \
> .master(master) \
> .enableHiveSupport() \
> .getOrCreate()
> # prepare sample data for inserting into hive table
> data = []
> countries = ['CN', 'AU']
> for i in range(0, 1000):
> data.append([int(i),  'U'+str(i), countries[i % 2]])
> df = spark.createDataFrame(data, ['user_id', 'key', 'country'])
> df.show()
> # Save df to Hive table test_db.bucket_table
> df.write.mode('append').insertInto('test_db.bucket_table') {code}
> Then query the table using the following script:
> {code:java}
> from pyspark.sql import SparkSession
> appName = "PySpark Hive Bucketing Example"
> master = "local"
> # Create Spark session with Hive supported.
> spark = SparkSession.builder \
> .appName(appName) \
> .master(master) \
> .enableHiveSupport() \
> .getOrCreate()
> df = spark.sql("""select * from test_db.bucket_table
> where country='AU' and user_id=101
> """)
> df.show()
> df.explain(extended=True) {code}
> I was expecting Spark to read only one bucket file from HDFS, but instead it
> scanned all bucket files in the partition folder country=AU.
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
>  - 'Filter (('country = AU) AND ('t1.user_id = 101))
> - 'SubqueryAlias t1
>- 'UnresolvedRelation [test_db, bucket_table], [], false
> == Analyzed Logical Plan ==
> user_id: int, key: string, country: string
> Project [user_id#20, key#21, country#22]
>  - Filter ((country#22 = AU) AND (user_id#20 = 101))
> - SubqueryAlias t1
>- SubqueryAlias spark_catalog.test_db.bucket_table
>   - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc
> == Optimized Logical Plan ==
> Filter (((isnotnull(country#22) AND isnotnull(user_id#20)) AND (country#22 = 
> AU)) AND (user_id#20 = 101))
>  - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc
> == Physical Plan ==
> *(1) Filter (isnotnull(user_id#20) AND (user_id#20 = 101))
>  - *(1) ColumnarToRow
> - FileScan orc test_db.bucket_table[user_id#20,key#21,country#22] 
> Batched: true, DataFilters: [isnotnull(user_id#20), (user_id#20 = 101)], 
> Format: ORC, Location: InMemoryFileIndex(1 
> paths)[hdfs://localhost:9000/user/hive/warehouse/test_db.db/bucket_table/coun...,
>  PartitionFilters: [isnotnull(country#22), (country#22 = AU)], PushedFilters: 
> [IsNotNull(user_id), EqualTo(user_id,101)], ReadSchema: 
> struct   {code}
> *Am I doing something wrong, or is it because Spark doesn't support it? Your
> guidance and help will be appreciated.*
>  
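For comparison, one way to probe bucket pruning independently of the Hive table layout is to write the same data as a Spark-managed bucketed table via bucketBy and inspect that plan as well. This is a sketch under assumptions (local warehouse, made-up table name), not part of the original report:

{code:python}
from pyspark.sql import SparkSession

# Sketch only: write the same sample data as a Spark-managed (DataSource)
# bucketed table and compare its scan plan with the Hive-bucketed one above.
spark = (SparkSession.builder
         .appName("Bucket pruning probe")
         .master("local")
         .enableHiveSupport()
         .getOrCreate())

data = [[i, 'U' + str(i), ['CN', 'AU'][i % 2]] for i in range(1000)]
df = spark.createDataFrame(data, ['user_id', 'key', 'country'])

# The table name below is made up for this sketch.
(df.write
   .mode('overwrite')
   .bucketBy(10, 'user_id')
   .sortBy('key')
   .saveAsTable('test_db.spark_bucket_table'))

spark.sql("""
    SELECT * FROM test_db.spark_bucket_table
    WHERE country = 'AU' AND user_id = 101
""").explain(extended=True)
{code}

If bucket pruning applies, recent Spark versions report a SelectedBucketsCount entry on the FileScan node; comparing the two plans helps tell whether the limitation is in Spark's bucketing support in general or specific to Hive-created bucket metadata.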



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40204) Whether it is possible to support querying the status of a specific application in a subsequent version

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40204:
-
Flags:   (was: Important)

> Whether it is possible to support querying the status of a specific 
> application in a subsequent version
> ---
>
> Key: SPARK-40204
> URL: https://issues.apache.org/jira/browse/SPARK-40204
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.4, 2.4.6
> Environment: Standalone Cluster Mode
>Reporter: bitao
>Priority: Major
>  Labels: features
>
> The current SparkAppHandle cannot obtain the application status in Standalone
> cluster mode. One workaround is to query the status of a specific Driver
> through the StandaloneRestServer, but it cannot query the status of a specific
> application. Would it be possible to add a method (e.g. handleAppStatus) to the
> StandaloneRestServer that asks the Master for the state of a specific
> application by sending it a RequestMasterState message? The current MasterWebUI
> already does something similar, but the premise is that it uses the same RpcEnv
> as the Master endpoint. Often we care about the status of the application
> rather than the status of the Driver, so we hope a subsequent version can add
> support for obtaining the status of a specific application in Standalone
> cluster mode.
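In the meantime, the Standalone Master's web UI already exposes its state, including per-application status, as JSON. A minimal sketch of polling it (the host, port, and application id below are assumptions, and this is a workaround rather than the API addition requested above):

{code:python}
import json
import urllib.request

# Sketch only: look up one application's state from the Standalone Master
# web UI JSON endpoint. Host, port and application id are placeholders.
MASTER_UI = "http://master-host:8080/json/"
APP_ID = "app-20220825120000-0001"

with urllib.request.urlopen(MASTER_UI) as resp:
    master_state = json.load(resp)

# Running and finished applications are listed separately.
apps = master_state.get("activeapps", []) + master_state.get("completedapps", [])
for app in apps:
    if app.get("id") == APP_ID:
        print(app.get("name"), app.get("state"))
        break
else:
    print("application not found on this master")
{code}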



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40206) Spark SQL Predict Pushdown for Hive Bucketed Table

2022-08-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585135#comment-17585135
 ] 

Hyukjin Kwon commented on SPARK-40206:
--

[~raymond.tang] Let's probably ask such questions on the dev mailing list before
filing a JIRA here. We encourage questions to be asked in other channels.

> Spark SQL Predict Pushdown for Hive Bucketed Table
> --
>
> Key: SPARK-40206
> URL: https://issues.apache.org/jira/browse/SPARK-40206
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Raymond Tang
>Priority: Minor
>  Labels: hive, hive-buckets, spark, spark-sql
>
> Hi team,
> I was testing out Hive bucketed table features. One of the benefits, as most
> documentation suggests, is that a bucketed Hive table can be used for query
> filter/predicate pushdown to improve query performance.
> However, through my exploration, that doesn't seem to be true. *Can you please
> help clarify whether Spark SQL supports query optimizations when using a Hive
> bucketed table?*
>  
> How to reproduce the issue:
> Create a Hive 3 table using the following DDL:
> {code:java}
> create table test_db.bucket_table(user_id int, key string) 
> comment 'A bucketed table' 
> partitioned by(country string) 
> clustered by(user_id) sorted by (key) into 10 buckets
> stored as ORC;{code}
> And then insert into this table using the following PySpark script:
> {code:java}
> from pyspark.sql import SparkSession
> appName = "PySpark Hive Bucketing Example"
> master = "local"
> # Create Spark session with Hive supported.
> spark = SparkSession.builder \
> .appName(appName) \
> .master(master) \
> .enableHiveSupport() \
> .getOrCreate()
> # prepare sample data for inserting into hive table
> data = []
> countries = ['CN', 'AU']
> for i in range(0, 1000):
> data.append([int(i),  'U'+str(i), countries[i % 2]])
> df = spark.createDataFrame(data, ['user_id', 'key', 'country'])
> df.show()
> # Save df to Hive table test_db.bucket_table
> df.write.mode('append').insertInto('test_db.bucket_table') {code}
> Then query the table using the following script:
> {code:java}
> from pyspark.sql import SparkSession
> appName = "PySpark Hive Bucketing Example"
> master = "local"
> # Create Spark session with Hive supported.
> spark = SparkSession.builder \
> .appName(appName) \
> .master(master) \
> .enableHiveSupport() \
> .getOrCreate()
> df = spark.sql("""select * from test_db.bucket_table
> where country='AU' and user_id=101
> """)
> df.show()
> df.explain(extended=True) {code}
> I was expecting Spark to read only one bucket file from HDFS, but instead it
> scanned all bucket files in the partition folder country=AU.
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
>  - 'Filter (('country = AU) AND ('t1.user_id = 101))
> - 'SubqueryAlias t1
>- 'UnresolvedRelation [test_db, bucket_table], [], false
> == Analyzed Logical Plan ==
> user_id: int, key: string, country: string
> Project [user_id#20, key#21, country#22]
>  - Filter ((country#22 = AU) AND (user_id#20 = 101))
> - SubqueryAlias t1
>- SubqueryAlias spark_catalog.test_db.bucket_table
>   - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc
> == Optimized Logical Plan ==
> Filter (((isnotnull(country#22) AND isnotnull(user_id#20)) AND (country#22 = 
> AU)) AND (user_id#20 = 101))
>  - Relation test_db.bucket_table[user_id#20,key#21,country#22] orc
> == Physical Plan ==
> *(1) Filter (isnotnull(user_id#20) AND (user_id#20 = 101))
>  - *(1) ColumnarToRow
> - FileScan orc test_db.bucket_table[user_id#20,key#21,country#22] 
> Batched: true, DataFilters: [isnotnull(user_id#20), (user_id#20 = 101)], 
> Format: ORC, Location: InMemoryFileIndex(1 
> paths)[hdfs://localhost:9000/user/hive/warehouse/test_db.db/bucket_table/coun...,
>  PartitionFilters: [isnotnull(country#22), (country#22 = AU)], PushedFilters: 
> [IsNotNull(user_id), EqualTo(user_id,101)], ReadSchema: 
> struct   {code}
> *Am I doing something wrong, or is it because Spark doesn't support it? Your
> guidance and help will be appreciated.*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-25 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40225:
-

 Summary: PySpark rdd.takeOrdered should check num and numPartitions
 Key: SPARK-40225
 URL: https://issues.apache.org/jira/browse/SPARK-40225
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
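The ticket carries no description, so as a rough illustration only: the kind of argument validation the summary suggests might look like the sketch below. The helper name and exact behaviour are assumptions, not necessarily what the actual patch does.

{code:python}
from pyspark import SparkContext

# Sketch only: an illustrative pre-check before delegating to takeOrdered.
# The helper name and behaviour are assumptions made for this example.
def take_ordered_checked(rdd, num, key=None):
    if num < 0:
        raise ValueError(f"num must be non-negative, got {num}")
    if num == 0 or rdd.getNumPartitions() == 0:
        return []  # nothing to compute, avoid launching a job
    return rdd.takeOrdered(num, key)

if __name__ == "__main__":
    sc = SparkContext("local", "takeOrdered check sketch")
    print(take_ordered_checked(sc.parallelize([5, 3, 1, 4, 2]), 3))  # [1, 2, 3]
    print(take_ordered_checked(sc.parallelize([], 2), 5))            # []
    sc.stop()
{code}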






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39656) Fix wrong namespace in DescribeNamespaceExec

2022-08-25 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-39656:
--
Fix Version/s: 3.1.4
   3.4.0
   3.3.1
   3.2.3

> Fix wrong namespace in DescribeNamespaceExec
> 
>
> Key: SPARK-39656
> URL: https://issues.apache.org/jira/browse/SPARK-39656
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Minor
> Fix For: 3.1.4, 3.4.0, 3.3.1, 3.2.3
>
>
> DescribeNamespaceExec should show the whole namespace rather than only the last name part.
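For context, the command in question can be exercised as below. With the built-in session catalog a namespace is a single-level database, so this mainly shows the output shape; the reported problem concerns multi-level namespaces (e.g. catalog.ns1.ns2) in v2 catalogs, where only the last name part was shown. The namespace name here is an assumption.

{code:python}
from pyspark.sql import SparkSession

# Sketch only: inspect what DESCRIBE NAMESPACE reports. The namespace name
# is made up for this example.
spark = SparkSession.builder.master("local").appName("describe-ns").getOrCreate()

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo_ns COMMENT 'demo namespace'")
spark.sql("DESCRIBE NAMESPACE demo_ns").show(truncate=False)
{code}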



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585109#comment-17585109
 ] 

Apache Spark commented on SPARK-40224:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/37668

> Make ObjectHashAggregateExec release memory eagerly when fallback to 
> sort-based
> ---
>
> Key: SPARK-40224
> URL: https://issues.apache.org/jira/browse/SPARK-40224
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Avoid the OOM issue as far as possible:
> {code:java}
> ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches 
> threshold capacity (128 entries), spilling and falling back to sort based 
> aggregation. You may change the threshold by adjust option 
> spark.sql.objectHashAggregate.sortBased.fallbackThreshold
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill %p"
> #   Executing /bin/sh -c "kill 46725"...{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40224:


Assignee: (was: Apache Spark)

> Make ObjectHashAggregateExec release memory eagerly when fallback to 
> sort-based
> ---
>
> Key: SPARK-40224
> URL: https://issues.apache.org/jira/browse/SPARK-40224
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Avoid the OOM issue as far as possible:
> {code:java}
> ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches 
> threshold capacity (128 entries), spilling and falling back to sort based 
> aggregation. You may change the threshold by adjust option 
> spark.sql.objectHashAggregate.sortBased.fallbackThreshold
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill %p"
> #   Executing /bin/sh -c "kill 46725"...{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40224:


Assignee: Apache Spark

> Make ObjectHashAggregateExec release memory eagerly when fallback to 
> sort-based
> ---
>
> Key: SPARK-40224
> URL: https://issues.apache.org/jira/browse/SPARK-40224
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> Avoid the OOM issue as far as possible:
> {code:java}
> ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches 
> threshold capacity (128 entries), spilling and falling back to sort based 
> aggregation. You may change the threshold by adjust option 
> spark.sql.objectHashAggregate.sortBased.fallbackThreshold
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill %p"
> #   Executing /bin/sh -c "kill 46725"...{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC

2022-08-25 Thread gglinux (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585103#comment-17585103
 ] 

gglinux edited comment on SPARK-26568 at 8/26/22 2:46 AM:
--

We encountered the same problem, and it can be reproduced. [~srowen] [~cane]

This problem can be triggered when there are 200 fields and 18 partitions.

Please refer to the following comments for the specific logs.


was (Author: JIRAUSER294757):
We encountered the same problem, and it can be reproduced.

This problem can be triggered when there are 200 fields and 18 partitions.

[^error.log]

 

> Too many partitions may cause thriftServer frequently Full GC
> -
>
> Key: SPARK-26568
> URL: https://issues.apache.org/jira/browse/SPARK-26568
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
> Attachments: error.log
>
>
> The reason is that:
> first, we have a table with many partitions (maybe several hundred); second, we
> have some concurrent queries. Then the long-running Thrift Server may encounter
> an OOM issue.
> Here is a case:
> call stack of OOM thread:
> {code:java}
> pool-34-thread-10
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor.(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V
>  (StorageDescriptor.java:240)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.(Lorg/apache/hadoop/hive/metastore/api/Partition;)V
>  (Partition.java:216)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition;
>  (HiveMetaStoreClient.java:1343)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List;
>  (HiveMetaStoreClient.java:1409)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List;
>  (HiveMetaStoreClient.java:1397)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List;
>  (HiveMetaStoreClient.java:914)
>   at 
> sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (DelegatingMethodAccessorImpl.java:43)
>   at 
> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;
>  (RetryingMetaStoreClient.java:90)
>   at 
> com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List;
>  (Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List;
>  (Hive.java:1967)
>   at 
> sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (DelegatingMethodAccessorImpl.java:43)
>   at 
> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Method.java:606)
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq;
>  (HiveShim.scala:602)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq;
>  (HiveClientImpl.scala:608)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object;
>  (HiveClientImpl.scala:606)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object;
>  (HiveClientImpl.scala:321)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V
>  (HiveClientImpl.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object;
>  (HiveClientImpl.scala:263)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object;
>  (HiveClientImpl.scala:307)
>   at 
> 

[jira] [Created] (SPARK-40224) Make ObjectHashAggregateExec release memory eagerly when fallback to sort-based

2022-08-25 Thread XiDuo You (Jira)
XiDuo You created SPARK-40224:
-

 Summary: Make ObjectHashAggregateExec release memory eagerly when 
fallback to sort-based
 Key: SPARK-40224
 URL: https://issues.apache.org/jira/browse/SPARK-40224
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


Avoid the OOM issue as far as possible:
{code:java}
ObjectAggregationIterator INFO - Aggregation hash map size 128 reaches 
threshold capacity (128 entries), spilling and falling back to sort based 
aggregation. You may change the threshold by adjust option 
spark.sql.objectHashAggregate.sortBased.fallbackThreshold

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
#   Executing /bin/sh -c "kill 46725"...{code}
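For reference, the fallback path in that log can be exercised deliberately by lowering the threshold it mentions. A minimal sketch (data sizes and the threshold value are arbitrary choices for illustration):

{code:python}
from pyspark.sql import SparkSession, functions as F

# Sketch only: force ObjectHashAggregateExec to spill and fall back to
# sort-based aggregation by lowering the fallback threshold.
spark = SparkSession.builder.master("local[2]").appName("obj-hash-agg").getOrCreate()
spark.conf.set("spark.sql.objectHashAggregate.sortBased.fallbackThreshold", "2")

df = spark.range(0, 10000).withColumn("grp", F.col("id") % 500)

# collect_list is a TypedImperativeAggregate, so it is planned with
# ObjectHashAggregateExec; the tiny threshold above triggers the fallback.
agg = df.groupBy("grp").agg(F.collect_list("id").alias("ids"))
agg.explain()   # look for ObjectHashAggregate in the physical plan
agg.count()
{code}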



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC

2022-08-25 Thread gglinux (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585104#comment-17585104
 ] 

gglinux edited comment on SPARK-26568 at 8/26/22 2:46 AM:
--

{code:java}
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
        at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SortExec.doExecute(SortExec.scala:101)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.window.WindowExec.doExecute(WindowExec.scala:302)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.FilterExec.doExecute(basicPhysicalOperators.scala:213)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:70)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571)
        at 
org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571)
        at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:296)
        at 
org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:571)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 

[jira] [Assigned] (SPARK-40153) Unify the logic of resolve functions and table-valued functions

2022-08-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40153:
---

Assignee: Allison Wang

> Unify the logic of resolve functions and table-valued functions
> ---
>
> Key: SPARK-40153
> URL: https://issues.apache.org/jira/browse/SPARK-40153
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Make ResolveTableValuedFunctions similar to ResolveFunctions: first try 
> resolving the function as a built-in or temp function, then expand the 
> identifier and resolve it as a persistent function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40153) Unify the logic of resolve functions and table-valued functions

2022-08-25 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40153.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37586
[https://github.com/apache/spark/pull/37586]

> Unify the logic of resolve functions and table-valued functions
> ---
>
> Key: SPARK-40153
> URL: https://issues.apache.org/jira/browse/SPARK-40153
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Make ResolveTableValuedFunctions similar to ResolveFunctions: first try 
> resolving the function as a built-in or temp function, then expand the 
> identifier and resolve it as a persistent function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC

2022-08-25 Thread gglinux (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585104#comment-17585104
 ] 

gglinux commented on SPARK-26568:
-

{code:java}
// code placeholder
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
        at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SortExec.doExecute(SortExec.scala:101)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.window.WindowExec.doExecute(WindowExec.scala:302)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.FilterExec.doExecute(basicPhysicalOperators.scala:213)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:70)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571)
        at 
org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:571)
        at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:296)
        at 
org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:571)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at 

[jira] [Commented] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC

2022-08-25 Thread gglinux (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585103#comment-17585103
 ] 

gglinux commented on SPARK-26568:
-

We encountered the same problem, and it can be reproduced.

This problem can be triggered when there are 200 fields and 18 partitions.

[^error.log]

 

> Too many partitions may cause thriftServer frequently Full GC
> -
>
> Key: SPARK-26568
> URL: https://issues.apache.org/jira/browse/SPARK-26568
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
> Attachments: error.log
>
>
> The reason is that:
> first, we have a table with many partitions (maybe several hundred); second, we
> have some concurrent queries. Then the long-running Thrift Server may encounter
> an OOM issue.
> Here is a case:
> call stack of OOM thread:
> {code:java}
> pool-34-thread-10
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor.(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V
>  (StorageDescriptor.java:240)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.(Lorg/apache/hadoop/hive/metastore/api/Partition;)V
>  (Partition.java:216)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition;
>  (HiveMetaStoreClient.java:1343)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List;
>  (HiveMetaStoreClient.java:1409)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List;
>  (HiveMetaStoreClient.java:1397)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List;
>  (HiveMetaStoreClient.java:914)
>   at 
> sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (DelegatingMethodAccessorImpl.java:43)
>   at 
> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;
>  (RetryingMetaStoreClient.java:90)
>   at 
> com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List;
>  (Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List;
>  (Hive.java:1967)
>   at 
> sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (DelegatingMethodAccessorImpl.java:43)
>   at 
> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Method.java:606)
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq;
>  (HiveShim.scala:602)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq;
>  (HiveClientImpl.scala:608)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object;
>  (HiveClientImpl.scala:606)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object;
>  (HiveClientImpl.scala:321)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V
>  (HiveClientImpl.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object;
>  (HiveClientImpl.scala:263)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object;
>  (HiveClientImpl.scala:307)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;Lscala/collection/Seq;)Lscala/collection/Seq;
>  (HiveClientImpl.scala:606)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Lscala/collection/Seq;
>  (HiveExternalCatalog.scala:1017)
>   at 
> 

[jira] [Updated] (SPARK-26568) Too many partitions may cause thriftServer frequently Full GC

2022-08-25 Thread gglinux (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gglinux updated SPARK-26568:

Attachment: error.log

> Too many partitions may cause thriftServer frequently Full GC
> -
>
> Key: SPARK-26568
> URL: https://issues.apache.org/jira/browse/SPARK-26568
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
> Attachments: error.log
>
>
> The reason is that:
> first, we have a table with many partitions (maybe several hundred); second, we
> have some concurrent queries. Then the long-running Thrift Server may encounter
> an OOM issue.
> Here is a case:
> call stack of OOM thread:
> {code:java}
> pool-34-thread-10
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor.(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V
>  (StorageDescriptor.java:240)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.(Lorg/apache/hadoop/hive/metastore/api/Partition;)V
>  (Partition.java:216)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(Lorg/apache/hadoop/hive/metastore/api/Partition;)Lorg/apache/hadoop/hive/metastore/api/Partition;
>  (HiveMetaStoreClient.java:1343)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/Collection;Ljava/util/List;)Ljava/util/List;
>  (HiveMetaStoreClient.java:1409)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopyPartitions(Ljava/util/List;)Ljava/util/List;
>  (HiveMetaStoreClient.java:1397)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List;
>  (HiveMetaStoreClient.java:914)
>   at 
> sun.reflect.GeneratedMethodAccessor98.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (DelegatingMethodAccessorImpl.java:43)
>   at 
> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;
>  (RetryingMetaStoreClient.java:90)
>   at 
> com.sun.proxy.$Proxy30.listPartitionsByFilter(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;S)Ljava/util/List;
>  (Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Table;Ljava/lang/String;)Ljava/util/List;
>  (Hive.java:1967)
>   at 
> sun.reflect.GeneratedMethodAccessor97.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (DelegatingMethodAccessorImpl.java:43)
>   at 
> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>  (Method.java:606)
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(Lorg/apache/hadoop/hive/ql/metadata/Hive;Lorg/apache/hadoop/hive/ql/metadata/Table;Lscala/collection/Seq;)Lscala/collection/Seq;
>  (HiveShim.scala:602)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Lscala/collection/Seq;
>  (HiveClientImpl.scala:608)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply()Ljava/lang/Object;
>  (HiveClientImpl.scala:606)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply()Ljava/lang/Object;
>  (HiveClientImpl.scala:321)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(Lscala/Function0;Lscala/runtime/IntRef;Lscala/runtime/ObjectRef;Ljava/lang/Object;)V
>  (HiveClientImpl.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(Lscala/Function0;)Ljava/lang/Object;
>  (HiveClientImpl.scala:263)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(Lscala/Function0;)Ljava/lang/Object;
>  (HiveClientImpl.scala:307)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;Lscala/collection/Seq;)Lscala/collection/Seq;
>  (HiveClientImpl.scala:606)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Lscala/collection/Seq;
>  (HiveExternalCatalog.scala:1017)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply()Ljava/lang/Object;
>  (HiveExternalCatalog.scala:1000)
>   at 
> 

[jira] [Updated] (SPARK-40223) Cannot alter table with locale tr

2022-08-25 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-40223:

Description: 
How to reproduce this issue:
{code:scala}
  test("Test update stats with locale tr") {
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true",
  SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") {
  withLocale("tr") {
val tabName = "tAb_I"
withTable(tabName) {
  sql(s"CREATE TABLE $tabName(col_I int)")
  sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
}
  }
}
  }
{code}

Error:
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not a 
valid table name
at 
org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{noformat}



  was:
How to reproduce this issue:
{code:scala}
  test("Test update stats with locale tr") {
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true",
  SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") {
  withLocale("tr") {
val tabName = "tAb_I"
withTable(tabName) {
  sql(s"CREATE TABLE $tabName(col_I int) USING PARQUET")
  sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
}
  }
}
  }
{code}

Error:
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not a 
valid table name
at 
org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{noformat}




> Cannot alter table with locale tr
> -
>
> Key: SPARK-40223
> URL: https://issues.apache.org/jira/browse/SPARK-40223
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
>   test("Test update stats with locale tr") {
> withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true",
>   SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") {
>   withLocale("tr") {
> val tabName = "tAb_I"
> withTable(tabName) {
>   sql(s"CREATE TABLE $tabName(col_I int)")
>   sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
> }
>   }
> }
>   }
> {code}
> Error:
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not 
> a valid table name
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623)
>   at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40223) Cannot alter table with locale tr

2022-08-25 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-40223:
---

 Summary: Cannot alter table with locale tr
 Key: SPARK-40223
 URL: https://issues.apache.org/jira/browse/SPARK-40223
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang


How to reproduce this issue:
{code:scala}
  test("Test update stats with locale tr") {
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true",
  SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") {
  withLocale("tr") {
val tabName = "tAb_I"
withTable(tabName) {
  sql(s"CREATE TABLE $tabName(col_I int) USING PARQUET")
  sql(s"INSERT OVERWRITE TABLE $tabName SELECT 1")
}
  }
}
  }
{code}

Error:
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [tab_ı]: is not a 
valid table name
at 
org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:192)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:623)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:612)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{noformat}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40221) Not able to format using scalafmt

2022-08-25 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585077#comment-17585077
 ] 

Hyukjin Kwon commented on SPARK-40221:
--

Does this still happen in the latest master branch? I can't reproduce it

> Not able to format using scalafmt
> -
>
> Key: SPARK-40221
> URL: https://issues.apache.org/jira/browse/SPARK-40221
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Ziqi Liu
>Priority: Major
>
> I'm following the guidance in [https://spark.apache.org/developer-tools.html] 
> using 
> {code:java}
> ./dev/scalafmt{code}
> to format the code, but getting this error:
> {code:java}
> [ERROR] Failed to execute goal 
> org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) 
> on project spark-parent_2.12: Error formatting Scala files: missing setting 
> 'version'. To fix this problem, add the following line to .scalafmt.conf: 
> 'version=3.2.1'. -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40049.
--
Resolution: Invalid

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.
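For orientation (not the test change itself), forcing adaptive execution onto a query from PySpark looks roughly like the sketch below; `spark.sql.adaptive.forceApply` is an internal flag intended for testing, so its effect outside test suites is an assumption here.

{code:python}
from pyspark.sql import SparkSession

# Sketch only: run a query with adaptive query execution forced on, mirroring
# the configuration the ticket wants the suite to cover.
spark = SparkSession.builder.master("local[2]").appName("aqe-force").getOrCreate()
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.forceApply", "true")  # internal testing flag

df = spark.range(0, 100).where("id IS NOT NULL AND id % 2 = 0")
df.explain()       # the plan should be wrapped in AdaptiveSparkPlan
print(df.count())  # 50
{code}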



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40049:
-
Fix Version/s: (was: 3.4.0)

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-40049:
--
  Assignee: (was: Kazuyuki Tanimura)

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40088) Add SparkPlanWIthAQESuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-40088:
--
  Assignee: (was: Kazuyuki Tanimura)

> Add SparkPlanWIthAQESuite
> -
>
> Key: SPARK-40088
> URL: https://issues.apache.org/jira/browse/SPARK-40088
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `SparkPlanSuite` assumes that AQE is always turned off. We should 
> also test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40088) Add SparkPlanWIthAQESuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40088.
--
Resolution: Invalid

> Add SparkPlanWIthAQESuite
> -
>
> Key: SPARK-40088
> URL: https://issues.apache.org/jira/browse/SPARK-40088
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `SparkPlanSuite` assumes that AQE is always turned off. We should 
> also test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40088) Add SparkPlanWIthAQESuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40088:
-
Fix Version/s: (was: 3.4.0)

> Add SparkPlanWIthAQESuite
> -
>
> Key: SPARK-40088
> URL: https://issues.apache.org/jira/browse/SPARK-40088
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `SparkPlanSuite` assumes that AQE is always turned off. We should 
> also test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40110) Add JDBCWithAQESuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40110.
--
Resolution: Invalid

Marking it as invalid for now since we're not going to go this way.

> Add JDBCWithAQESuite
> 
>
> Key: SPARK-40110
> URL: https://issues.apache.org/jira/browse/SPARK-40110
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `JDBCSuite` assumes that AQE is always turned off. We should also 
> test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40110) Add JDBCWithAQESuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40110:
-
Fix Version/s: (was: 3.4.0)

> Add JDBCWithAQESuite
> 
>
> Key: SPARK-40110
> URL: https://issues.apache.org/jira/browse/SPARK-40110
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `JDBCSuite` assumes that AQE is always turned off. We should also 
> test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40110) Add JDBCWithAQESuite

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-40110:
--

> Add JDBCWithAQESuite
> 
>
> Key: SPARK-40110
> URL: https://issues.apache.org/jira/browse/SPARK-40110
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `JDBCSuite` assumes that AQE is always turned off. We should also 
> test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40088) Add SparkPlanWIthAQESuite

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585070#comment-17585070
 ] 

Apache Spark commented on SPARK-40088:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/37665

> Add SparkPlanWIthAQESuite
> -
>
> Key: SPARK-40088
> URL: https://issues.apache.org/jira/browse/SPARK-40088
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `SparkPlanSuite` assumes that AQE is always turned off. We should 
> also test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40110) Add JDBCWithAQESuite

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585068#comment-17585068
 ] 

Apache Spark commented on SPARK-40110:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/37666

> Add JDBCWithAQESuite
> 
>
> Key: SPARK-40110
> URL: https://issues.apache.org/jira/browse/SPARK-40110
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `JDBCSuite` assumes that AQE is always turned off. We should also 
> test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40088) Add SparkPlanWIthAQESuite

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585067#comment-17585067
 ] 

Apache Spark commented on SPARK-40088:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/37665

> Add SparkPlanWIthAQESuite
> -
>
> Key: SPARK-40088
> URL: https://issues.apache.org/jira/browse/SPARK-40088
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `SparkPlanSuite` assumes that AQE is always turned off. We should 
> also test with AQE turned on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585066#comment-17585066
 ] 

Apache Spark commented on SPARK-40049:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/37664

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585047#comment-17585047
 ] 

Apache Spark commented on SPARK-40222:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37663

> Numeric try_add/try_divide/try_subtract/try_multiply should throw error from 
> their children
> ---
>
> Key: SPARK-40222
> URL: https://issues.apache.org/jira/browse/SPARK-40222
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should 
> refactor the 
> {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} 
> functions so that the errors from their children will be shown instead of 
> ignored.
>  Spark SQL allows arithmetic operations between 
> Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule 
> [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501]
>  for details). Some of these combinations can throw exceptions too:
>  * Date + CalendarInterval
>  * Date + AnsiInterval
>  * Timestamp + AnsiInterval
>  * Date - CalendarInterval
>  * Date - AnsiInterval
>  * Timestamp - AnsiInterval
>  * Number * CalendarInterval
>  * Number * AnsiInterval
>  * CalendarInterval / Number
>  * AnsiInterval / Number
> This Jira is for the cases when both input data types are numbers.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40222:


Assignee: Gengliang Wang  (was: Apache Spark)

> Numeric try_add/try_divide/try_subtract/try_multiply should throw error from 
> their children
> ---
>
> Key: SPARK-40222
> URL: https://issues.apache.org/jira/browse/SPARK-40222
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should 
> refactor the 
> {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} 
> functions so that the errors from their children will be shown instead of 
> ignored.
>  Spark SQL allows arithmetic operations between 
> Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule 
> [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501]
>  for details). Some of these combinations can throw exceptions too:
>  * Date + CalendarInterval
>  * Date + AnsiInterval
>  * Timestamp + AnsiInterval
>  * Date - CalendarInterval
>  * Date - AnsiInterval
>  * Timestamp - AnsiInterval
>  * Number * CalendarInterval
>  * Number * AnsiInterval
>  * CalendarInterval / Number
>  * AnsiInterval / Number
> This Jira is for the cases when both input data types are numbers.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585048#comment-17585048
 ] 

Apache Spark commented on SPARK-40222:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37663

> Numeric try_add/try_divide/try_subtract/try_multiply should throw error from 
> their children
> ---
>
> Key: SPARK-40222
> URL: https://issues.apache.org/jira/browse/SPARK-40222
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should 
> refactor the 
> {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} 
> functions so that the errors from their children will be shown instead of 
> ignored.
>  Spark SQL allows arithmetic operations between 
> Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule 
> [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501]
>  for details). Some of these combinations can throw exceptions too:
>  * Date + CalendarInterval
>  * Date + AnsiInterval
>  * Timestamp + AnsiInterval
>  * Date - CalendarInterval
>  * Date - AnsiInterval
>  * Timestamp - AnsiInterval
>  * Number * CalendarInterval
>  * Number * AnsiInterval
>  * CalendarInterval / Number
>  * AnsiInterval / Number
> This Jira is for the cases when both input data types are numbers.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40222:


Assignee: Apache Spark  (was: Gengliang Wang)

> Numeric try_add/try_divide/try_subtract/try_multiply should throw error from 
> their children
> ---
>
> Key: SPARK-40222
> URL: https://issues.apache.org/jira/browse/SPARK-40222
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should 
> refactor the 
> {{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} 
> functions so that the errors from their children will be shown instead of 
> ignored.
>  Spark SQL allows arithmetic operations between 
> Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule 
> [ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501]
>  for details). Some of these combinations can throw exceptions too:
>  * Date + CalendarInterval
>  * Date + AnsiInterval
>  * Timestamp + AnsiInterval
>  * Date - CalendarInterval
>  * Date - AnsiInterval
>  * Timestamp - AnsiInterval
>  * Number * CalendarInterval
>  * Number * AnsiInterval
>  * CalendarInterval / Number
>  * AnsiInterval / Number
> This Jira is for the cases when both input data types are numbers.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children

2022-08-25 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40222:
--

 Summary: Numeric try_add/try_divide/try_subtract/try_multiply 
should throw error from their children
 Key: SPARK-40222
 URL: https://issues.apache.org/jira/browse/SPARK-40222
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Similar to https://issues.apache.org/jira/browse/SPARK-40054, we should 
refactor the 
{{{}try_add{}}}/{{{}try_subtract{}}}/{{{}try_multiply{}}}/{{{}try_divide{}}} 
functions so that the errors from their children will be shown instead of 
being ignored.
 Spark SQL allows arithmetic operations between 
Number/Date/Timestamp/CalendarInterval/AnsiInterval (see the rule 
[ResolveBinaryArithmetic|https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L501]
 for details). Some of these combinations can throw exceptions too:
 * Date + CalendarInterval
 * Date + AnsiInterval
 * Timestamp + AnsiInterval
 * Date - CalendarInterval
 * Date - AnsiInterval
 * Timestamp - AnsiInterval
 * Number * CalendarInterval
 * Number * AnsiInterval
 * CalendarInterval / Number
 * AnsiInterval / Number

This Jira is for the cases when both input data types are numbers.
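For the numeric case, a hedged PySpark sketch of the intended behaviour (assumes an active `spark` session; the exact error class surfaced depends on the final implementation):

{code:python}
# try_add itself should keep suppressing overflow, but an error raised while
# evaluating one of its children should propagate instead of being swallowed.
spark.conf.set("spark.sql.ansi.enabled", "true")

spark.sql("SELECT try_add(2147483647, 1)").show()  # NULL: the overflow inside try_add is tolerated
spark.sql("SELECT try_add(1, 1 / 0)").show()       # after the change: expected to fail with the divide-by-zero error from the child
{code}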

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585035#comment-17585035
 ] 

Apache Spark commented on SPARK-40142:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37662

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585036#comment-17585036
 ] 

Apache Spark commented on SPARK-40142:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37662

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40211:


Assignee: Apache Spark

> Allow executeTake() / collectLimit's number of starting partitions to be 
> customized
> ---
>
> Key: SPARK-40211
> URL: https://issues.apache.org/jira/browse/SPARK-40211
> Project: Spark
>  Issue Type: Story
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Ziqi Liu
>Assignee: Apache Spark
>Priority: Major
>
> Today, Spark’s executeTake() code allows for the limitScaleUpFactor to be 
> customized but does not allow for the initial number of partitions to be 
> customized: it’s currently hardcoded to {{{}1{}}}.
> We should add a configuration so that the initial partition count can be 
> customized. By setting this new configuration to a high value we could 
> effectively mitigate the “run multiple jobs” overhead in {{take}} behavior. 
> We could also set it to higher-than-1-but-still-small values (like, say, 
> {{{}10{}}}) to achieve a middle-ground trade-off.
>  
> Essentially, we need to make {{numPartsToTry = 1L}} 
> ([code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L481])
>  customizable. We should do this via a new SQL conf, similar to the 
> {{limitScaleUpFactor}} conf.
>  
> Spark has several near-duplicate versions of this code ([see code 
> search|https://github.com/apache/spark/search?q=numPartsToTry+%3D+1]) in:
>  * SparkPlan
>  * RDD
>  * pyspark rdd
> Also, {{limitScaleUpFactor}} is not supported in PySpark either. So for 
> now, I will focus on the Scala side first, leaving the Python side untouched, and 
> meanwhile sync with PySpark members. Depending on the progress, we can do them 
> all in one PR, or make the Scala-side change first and leave the PySpark change as a 
> follow-up.
>  
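As a sketch of the intended knob (assumes an active `spark` session; the new conf name below is a placeholder invented for illustration, only {{spark.sql.limit.scaleUpFactor}} exists today):

{code:python}
# Existing behaviour: take()/collectLimit starts by scanning 1 partition and multiplies
# the partition count by the scale-up factor on every retry that comes up short.
spark.conf.set("spark.sql.limit.scaleUpFactor", "4")

# Proposed (placeholder name): let the first attempt already scan several partitions,
# trading some extra work for fewer follow-up jobs on selective limits.
spark.conf.set("spark.sql.limit.initialNumPartitions", "10")

rows = spark.range(0, 1_000_000, 1, 200).where("id % 97 = 0").take(500)
{code}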



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584995#comment-17584995
 ] 

Apache Spark commented on SPARK-40211:
--

User 'liuzqt' has created a pull request for this issue:
https://github.com/apache/spark/pull/37661

> Allow executeTake() / collectLimit's number of starting partitions to be 
> customized
> ---
>
> Key: SPARK-40211
> URL: https://issues.apache.org/jira/browse/SPARK-40211
> Project: Spark
>  Issue Type: Story
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Ziqi Liu
>Priority: Major
>
> Today, Spark’s executeTake() code allows for the limitScaleUpFactor to be 
> customized but does not allow for the initial number of partitions to be 
> customized: it’s currently hardcoded to {{{}1{}}}.
> We should add a configuration so that the initial partition count can be 
> customized. By setting this new configuration to a high value we could 
> effectively mitigate the “run multiple jobs” overhead in {{take}} behavior. 
> We could also set it to higher-than-1-but-still-small values (like, say, 
> {{{}10{}}}) to achieve a middle-ground trade-off.
>  
> Essentially, we need to make {{numPartsToTry = 1L}} 
> ([code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L481])
>  customizable. We should do this via a new SQL conf, similar to the 
> {{limitScaleUpFactor}} conf.
>  
> Spark has several near-duplicate versions of this code ([see code 
> search|https://github.com/apache/spark/search?q=numPartsToTry+%3D+1]) in:
>  * SparkPlan
>  * RDD
>  * pyspark rdd
> Also, {{limitScaleUpFactor}} is not supported in PySpark either. So for 
> now, I will focus on the Scala side first, leaving the Python side untouched, and 
> meanwhile sync with PySpark members. Depending on the progress, we can do them 
> all in one PR, or make the Scala-side change first and leave the PySpark change as a 
> follow-up.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40211:


Assignee: (was: Apache Spark)

> Allow executeTake() / collectLimit's number of starting partitions to be 
> customized
> ---
>
> Key: SPARK-40211
> URL: https://issues.apache.org/jira/browse/SPARK-40211
> Project: Spark
>  Issue Type: Story
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Ziqi Liu
>Priority: Major
>
> Today, Spark’s executeTake() code allows for the limitScaleUpFactor to be 
> customized but does not allow for the initial number of partitions to be 
> customized: it’s currently hardcoded to {{{}1{}}}.
> We should add a configuration so that the initial partition count can be 
> customized. By setting this new configuration to a high value we could 
> effectively mitigate the “run multiple jobs” overhead in {{take}} behavior. 
> We could also set it to higher-than-1-but-still-small values (like, say, 
> {{{}10{}}}) to achieve a middle-ground trade-off.
>  
> Essentially, we need to make {{numPartsToTry = 1L}} 
> ([code|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L481])
>  customizable. We should do this via a new SQL conf, similar to the 
> {{limitScaleUpFactor}} conf.
>  
> Spark has several near-duplicate versions of this code ([see code 
> search|https://github.com/apache/spark/search?q=numPartsToTry+%3D+1]) in:
>  * SparkPlan
>  * RDD
>  * pyspark rdd
> Also, {{limitScaleUpFactor}} is not supported in PySpark either. So for 
> now, I will focus on the Scala side first, leaving the Python side untouched, and 
> meanwhile sync with PySpark members. Depending on the progress, we can do them 
> all in one PR, or make the Scala-side change first and leave the PySpark change as a 
> follow-up.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40221) Not able to format using scalafmt

2022-08-25 Thread Ziqi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziqi Liu updated SPARK-40221:
-
Description: 
I'm following the guidance in [https://spark.apache.org/developer-tools.html] 
using 
{code:java}
./dev/scalafmt{code}

to format the code, but getting this error:
{code:java}
[ERROR] Failed to execute goal 
org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) on 
project spark-parent_2.12: Error formatting Scala files: missing setting 
'version'. To fix this problem, add the following line to .scalafmt.conf: 
'version=3.2.1'. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code}
 

  was:
I'm following the guidance in [https://spark.apache.org/developer-tools.html] 
using 
./dev/scalafmt
to format the code, but getting this error:
{code:java}
[ERROR] Failed to execute goal 
org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) on 
project spark-parent_2.12: Error formatting Scala files: missing setting 
'version'. To fix this problem, add the following line to .scalafmt.conf: 
'version=3.2.1'. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code}
 


> Not able to format using scalafmt
> -
>
> Key: SPARK-40221
> URL: https://issues.apache.org/jira/browse/SPARK-40221
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Ziqi Liu
>Priority: Major
>
> I'm following the guidance in [https://spark.apache.org/developer-tools.html] 
> using 
> {code:java}
> ./dev/scalafmt{code}
> to format the code, but getting this error:
> {code:java}
> [ERROR] Failed to execute goal 
> org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) 
> on project spark-parent_2.12: Error formatting Scala files: missing setting 
> 'version'. To fix this problem, add the following line to .scalafmt.conf: 
> 'version=3.2.1'. -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40221) Not able to format using scalafmt

2022-08-25 Thread Ziqi Liu (Jira)
Ziqi Liu created SPARK-40221:


 Summary: Not able to format using scalafmt
 Key: SPARK-40221
 URL: https://issues.apache.org/jira/browse/SPARK-40221
 Project: Spark
  Issue Type: Question
  Components: Build
Affects Versions: 3.4.0
Reporter: Ziqi Liu


I'm following the guidance in [https://spark.apache.org/developer-tools.html] 
using 
./dev/scalafmt
to format the code, but getting this error:
{code:java}
[ERROR] Failed to execute goal 
org.antipathy:mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) on 
project spark-parent_2.12: Error formatting Scala files: missing setting 
'version'. To fix this problem, add the following line to .scalafmt.conf: 
'version=3.2.1'. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40131) Support NumPy ndarray in built-in functions

2022-08-25 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40131:
-
Description: Support NumPy ndarray in built-in 
functions (`pyspark.sql.functions`) by introducing the Py4J input converter 
`NumpyArrayConverter`. The converter converts an ndarray to a Java array.  (was: 
Per [https://github.com/apache/spark/pull/37560#discussion_r948572473]
we want to support NumPy ndarray in built-in functions)

> Support NumPy ndarray in built-in functions
> ---
>
> Key: SPARK-40131
> URL: https://issues.apache.org/jira/browse/SPARK-40131
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Support NumPy ndarray in built-in functions (`pyspark.sql.functions`) by 
> introducing the Py4J input converter `NumpyArrayConverter`. The converter 
> converts an ndarray to a Java array.
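A hedged example of the kind of call this enables (assumes an active `spark` session; which APIs are actually covered depends on where the converter is applied):

{code:python}
import numpy as np
from pyspark.sql import functions as F

df = spark.range(5)
arr = np.array([1, 3], dtype=np.int64)

# With NumpyArrayConverter registered on the Py4J gateway, the ndarray is turned into a
# Java array at the boundary, so it can be passed to built-in functions directly ...
df.select(F.lit(arr).alias("picked")).show()
# ... instead of going through the .tolist() workaround needed today.
df.filter(F.col("id").isin(arr.tolist())).show()
{code}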



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40130) Support NumPy scalars in built-in functions

2022-08-25 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-40130.
--
  Assignee: Xinrong Meng
Resolution: Fixed

> Support NumPy scalars in built-in functions
> ---
>
> Key: SPARK-40130
> URL: https://issues.apache.org/jira/browse/SPARK-40130
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Support NumPy scalars in built-in functions by introducing Py4J input 
> converter `NumpyScalarConverter`.
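Roughly, this lets a NumPy scalar stand in wherever a plain Python number is accepted (illustrative sketch; assumes an active `spark` session):

{code:python}
import numpy as np
from pyspark.sql import functions as F

df = spark.range(3)
# np.float64 / np.int64 values are converted to Java numbers by NumpyScalarConverter,
# so they can be passed to lit() or used in column arithmetic directly.
df.select(
    F.lit(np.float64(2.5)).alias("x"),
    (F.col("id") + np.int64(1)).alias("id_plus_one"),
).show()
{code}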



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39483) Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array

2022-08-25 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-39483:


Assignee: Xinrong Meng

>  Construct the schema from `np.dtype` when `createDataFrame` from a NumPy 
> array
> ---
>
> Key: SPARK-39483
> URL: https://issues.apache.org/jira/browse/SPARK-39483
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
>  Construct the schema from `np.dtype` when calling `createDataFrame` on a NumPy 
> array.
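For example (sketch only; assumes ndarray input to `createDataFrame` per the parent task and an active `spark` session):

{code:python}
import numpy as np

arr = np.array([[1, 2], [3, 4]], dtype=np.int64)
df = spark.createDataFrame(arr)
# Expected: column types come straight from arr.dtype (int64 -> LongType)
# instead of being re-inferred element by element.
df.printSchema()
{code}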



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40130) Support NumPy scalars in built-in functions

2022-08-25 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584975#comment-17584975
 ] 

Xinrong Meng commented on SPARK-40130:
--

Resolved by https://github.com/apache/spark/pull/37560

> Support NumPy scalars in built-in functions
> ---
>
> Key: SPARK-40130
> URL: https://issues.apache.org/jira/browse/SPARK-40130
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Support NumPy scalars in built-in functions by introducing Py4J input 
> converter `NumpyScalarConverter`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40220) Don't output the empty map of error message parameters

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584892#comment-17584892
 ] 

Apache Spark commented on SPARK-40220:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37660

> Don't output the empty map of error message parameters
> --
>
> Key: SPARK-40220
> URL: https://issues.apache.org/jira/browse/SPARK-40220
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> In the current implementation, Spark outputs empty message parameters in the 
> MINIMAL and STANDARD formats:
> {code:json}
>  org.apache.spark.SparkRuntimeException
>  {
>   "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO",
>   "messageParameters" : { }
>  }
> {code}
> which contradicts the approach used for other JSON fields.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40220) Don't output the empty map of error message parameters

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40220:


Assignee: Max Gekk  (was: Apache Spark)

> Don't output the empty map of error message parameters
> --
>
> Key: SPARK-40220
> URL: https://issues.apache.org/jira/browse/SPARK-40220
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> In the current implementation, Spark outputs empty message parameters in the 
> MINIMAL and STANDARD formats:
> {code:json}
>  org.apache.spark.SparkRuntimeException
>  {
>   "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO",
>   "messageParameters" : { }
>  }
> {code}
> which contradicts the approach used for other JSON fields.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40220) Don't output the empty map of error message parameters

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40220:


Assignee: Apache Spark  (was: Max Gekk)

> Don't output the empty map of error message parameters
> --
>
> Key: SPARK-40220
> URL: https://issues.apache.org/jira/browse/SPARK-40220
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> In the current implementation, Spark outputs empty message parameters in the 
> MINIMAL and STANDARD formats:
> {code:json}
>  org.apache.spark.SparkRuntimeException
>  {
>   "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO",
>   "messageParameters" : { }
>  }
> {code}
> which contradicts the approach used for other JSON fields.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40220) Don't output the empty map of error message parameters

2022-08-25 Thread Max Gekk (Jira)
Max Gekk created SPARK-40220:


 Summary: Don't output the empty map of error message parameters
 Key: SPARK-40220
 URL: https://issues.apache.org/jira/browse/SPARK-40220
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


In the current implementation, Spark outputs empty message parameters in the 
MINIMAL and STANDARD formats:

{code:json}
 org.apache.spark.SparkRuntimeException
 {
  "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO",
  "messageParameters" : { }
 }
{code}
which contradicts the approach used for other JSON fields.
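Presumably the same error would then be rendered without the empty field, along these lines:

{code:json}
 org.apache.spark.SparkRuntimeException
 {
  "errorClass" : "ELEMENT_AT_BY_INDEX_ZERO"
 }
{code}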



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short

2022-08-25 Thread Brennan Stein (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584884#comment-17584884
 ] 

Brennan Stein commented on SPARK-40212:
---

Is a PR with a unit test an adequate reproduction? :) It turned out to be a 
very simple fix:

[https://github.com/apache/spark/pull/37659]

 

> SparkSQL castPartValue does not properly handle byte & short
> 
>
> Key: SPARK-40212
> URL: https://issues.apache.org/jira/browse/SPARK-40212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Brennan Stein
>Priority: Major
>
> Reading in a parquet file partitioned on disk by a `Byte`-type column fails 
> with the following exception:
>  
> {code:java}
> [info]   Cause: java.lang.ClassCastException: java.lang.Integer cannot be 
> cast to java.lang.Byte
> [info]   at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385)
> [info]   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189)
> [info]   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info]   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> [info]   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
> [info]   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> [info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info]   at org.apache.spark.scheduler.Task.run(Task.scala:136)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
> [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748) {code}
> I believe the issue to stem from 
> [PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533]
>  returning an Integer for ByteType and ShortType (which then fails to unbox 
> to the expected type):
>  
> {code:java}
> case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code}
>  
> The issue appears to have been introduced in [this 
> commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e]
>  so likely affects Spark 3.2 as well, though I've only tested on 3.3.0.
>  
>  
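A minimal PySpark reproduction sketch based on the report (the path is made up; assumes an active `spark` session; fails only on affected versions):

{code:python}
from pyspark.sql import functions as F
from pyspark.sql.types import ByteType

df = spark.range(4).withColumn("b", F.col("id").cast(ByteType()))
df.write.mode("overwrite").partitionBy("b").parquet("/tmp/spark40212")

# Reading back with the original schema makes Spark cast the partition value to ByteType;
# castPartValueToDesiredType hands back a boxed Integer there, and unboxToByte throws.
spark.read.schema(df.schema).parquet("/tmp/spark40212").collect()
{code}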



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33760) Extend Dynamic Partition Pruning Support to DataSources

2022-08-25 Thread Willi Raschkowski (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584882#comment-17584882
 ] 

Willi Raschkowski commented on SPARK-33760:
---

Is this related to SPARK-35779?

> Extend Dynamic Partition Pruning Support to DataSources
> ---
>
> Key: SPARK-33760
> URL: https://issues.apache.org/jira/browse/SPARK-33760
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Anoop Johnson
>Priority: Major
>
> The implementation of Dynamic Partition Pruning  (DPP) in Spark is 
> [specific|https://github.com/apache/spark/blob/fb2e3af4b5d92398d57e61b766466cc7efd9d7cb/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L59-L64]
>  to HadoopFSRelation. As a result, DPP is not triggered for queries that use 
> data sources. 
> The DataSource v2 readers can expose the partition metadata. Can we use this 
> metadata and extend DPP to work on data sources as well?
> Would appreciate thoughts or corner cases we need to handle.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40212:


Assignee: Apache Spark

> SparkSQL castPartValue does not properly handle byte & short
> 
>
> Key: SPARK-40212
> URL: https://issues.apache.org/jira/browse/SPARK-40212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Brennan Stein
>Assignee: Apache Spark
>Priority: Major
>
> Reading in a parquet file partitioned on disk by a `Byte`-type column fails 
> with the following exception:
>  
> {code:java}
> [info]   Cause: java.lang.ClassCastException: java.lang.Integer cannot be 
> cast to java.lang.Byte
> [info]   at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385)
> [info]   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189)
> [info]   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info]   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> [info]   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
> [info]   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> [info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info]   at org.apache.spark.scheduler.Task.run(Task.scala:136)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
> [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748) {code}
> I believe the issue to stem from 
> [PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533]
>  returning an Integer for ByteType and ShortType (which then fails to unbox 
> to the expected type):
>  
> {code:java}
> case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code}
>  
> The issue appears to have been introduced in [this 
> commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e]
>  so likely affects Spark 3.2 as well, though I've only tested on 3.3.0.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40212:


Assignee: (was: Apache Spark)

> SparkSQL castPartValue does not properly handle byte & short
> 
>
> Key: SPARK-40212
> URL: https://issues.apache.org/jira/browse/SPARK-40212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Brennan Stein
>Priority: Major
>
> Reading in a parquet file partitioned on disk by a `Byte`-type column fails 
> with the following exception:
>  
> {code:java}
> [info]   Cause: java.lang.ClassCastException: java.lang.Integer cannot be 
> cast to java.lang.Byte
> [info]   at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385)
> [info]   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189)
> [info]   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info]   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> [info]   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
> [info]   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> [info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info]   at org.apache.spark.scheduler.Task.run(Task.scala:136)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
> [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748) {code}
> I believe the issue to stem from 
> [PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533]
>  returning an Integer for ByteType and ShortType (which then fails to unbox 
> to the expected type):
>  
> {code:java}
> case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code}
>  
> The issue appears to have been introduced in [this 
> commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e]
>  so likely affects Spark 3.2 as well, though I've only tested on 3.3.0.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40212) SparkSQL castPartValue does not properly handle byte & short

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584880#comment-17584880
 ] 

Apache Spark commented on SPARK-40212:
--

User 'BrennanStein' has created a pull request for this issue:
https://github.com/apache/spark/pull/37659

> SparkSQL castPartValue does not properly handle byte & short
> 
>
> Key: SPARK-40212
> URL: https://issues.apache.org/jira/browse/SPARK-40212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Brennan Stein
>Priority: Major
>
> Reading in a parquet file partitioned on disk by a `Byte`-type column fails 
> with the following exception:
>  
> {code:java}
> [info]   Cause: java.lang.ClassCastException: java.lang.Integer cannot be 
> cast to java.lang.Byte
> [info]   at scala.runtime.BoxesRunTime.unboxToByte(BoxesRunTime.java:95)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getByte$(rows.scala:39)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getByte(rows.scala:195)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.JoinedRow.getByte(JoinedRow.scala:86)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_6$(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$8(ParquetFileFormat.scala:385)
> [info]   at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.next(RecordReaderIterator.scala:62)
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:189)
> [info]   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> [info]   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> [info]   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
> [info]   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> [info]   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
> [info]   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> [info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> [info]   at org.apache.spark.scheduler.Task.run(Task.scala:136)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
> [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
> [info]   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748) {code}
> I believe the issue to stem from 
> [PartitioningUtils::castPartValueToDesiredType|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L533]
>  returning an Integer for ByteType and ShortType (which then fails to unbox 
> to the expected type):
>  
> {code:java}
> case ByteType | ShortType | IntegerType => Integer.parseInt(value) {code}
>  
> The issue appears to have been introduced in [this 
> commit|https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e]
>  so it likely affects Spark 3.2 as well, though I've only tested on 3.3.0.
>  
>  
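
For context, a minimal sketch of the kind of fix the report points at: split the
ByteType/ShortType cases so the parsed partition value boxes to the class the row
accessors expect. This is only an illustration under that assumption, not the
actual Spark patch.

{code:scala}
import org.apache.spark.sql.types._

// Hedged sketch: return a value whose boxed class matches the declared
// partition column type, so getByte/getShort no longer see a java.lang.Integer.
def castPartValueToDesiredType(desiredType: DataType, value: String): Any =
  desiredType match {
    case ByteType    => java.lang.Byte.parseByte(value)   // boxes to java.lang.Byte
    case ShortType   => java.lang.Short.parseShort(value) // boxes to java.lang.Short
    case IntegerType => Integer.parseInt(value)           // unchanged
    case _           => value                             // other types elided in this sketch
  }
{code}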



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40192) Remove redundant groupby

2022-08-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40192.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37628
[https://github.com/apache/spark/pull/37628]

> Remove redundant groupby
> 
>
> Key: SPARK-40192
> URL: https://issues.apache.org/jira/browse/SPARK-40192
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Assignee: deshanxiao
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40192) Remove redundant groupby

2022-08-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40192:


Assignee: deshanxiao

> Remove redundant groupby
> 
>
> Key: SPARK-40192
> URL: https://issues.apache.org/jira/browse/SPARK-40192
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Assignee: deshanxiao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40192) Remove redundant groupby

2022-08-25 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-40192:
-
Priority: Trivial  (was: Minor)

> Remove redundant groupby
> 
>
> Key: SPARK-40192
> URL: https://issues.apache.org/jira/browse/SPARK-40192
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Assignee: deshanxiao
>Priority: Trivial
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584833#comment-17584833
 ] 

Apache Spark commented on SPARK-40219:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37658

> resolved view plan should hold the schema to avoid redundant lookup
> ---
>
> Key: SPARK-40219
> URL: https://issues.apache.org/jira/browse/SPARK-40219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40219:


Assignee: (was: Apache Spark)

> resolved view plan should hold the schema to avoid redundant lookup
> ---
>
> Key: SPARK-40219
> URL: https://issues.apache.org/jira/browse/SPARK-40219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40219:


Assignee: Apache Spark

> resolved view plan should hold the schema to avoid redundant lookup
> ---
>
> Key: SPARK-40219
> URL: https://issues.apache.org/jira/browse/SPARK-40219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584830#comment-17584830
 ] 

Apache Spark commented on SPARK-40219:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37658

> resolved view plan should hold the schema to avoid redundant lookup
> ---
>
> Key: SPARK-40219
> URL: https://issues.apache.org/jira/browse/SPARK-40219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40219) resolved view plan should hold the schema to avoid redundant lookup

2022-08-25 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-40219:
---

 Summary: resolved view plan should hold the schema to avoid 
redundant lookup
 Key: SPARK-40219
 URL: https://issues.apache.org/jira/browse/SPARK-40219
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40010) Make pyspark.sql.window examples self-contained

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584826#comment-17584826
 ] 

Apache Spark commented on SPARK-40010:
--

User 'dcoliversun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37657

> Make pyspark.sql.window examples self-contained
> ---
>
> Key: SPARK-40010
> URL: https://issues.apache.org/jira/browse/SPARK-40010
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39562) Make hive-thrift server module passes in IPv6 environment

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584770#comment-17584770
 ] 

Apache Spark commented on SPARK-39562:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37656

> Make hive-thrift server module passes in IPv6 environment
> -
>
> Key: SPARK-39562
> URL: https://issues.apache.org/jira/browse/SPARK-39562
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39562) Make hive-thrift server module passes in IPv6 environment

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584769#comment-17584769
 ] 

Apache Spark commented on SPARK-39562:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37656

> Make hive-thrift server module passes in IPv6 environment
> -
>
> Key: SPARK-39562
> URL: https://issues.apache.org/jira/browse/SPARK-39562
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40218) GROUPING SETS should preserve the grouping columns

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584756#comment-17584756
 ] 

Apache Spark commented on SPARK-40218:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37655

> GROUPING SETS should preserve the grouping columns
> --
>
> Key: SPARK-40218
> URL: https://issues.apache.org/jira/browse/SPARK-40218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40218) GROUPING SETS should preserve the grouping columns

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584754#comment-17584754
 ] 

Apache Spark commented on SPARK-40218:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37655

> GROUPING SETS should preserve the grouping columns
> --
>
> Key: SPARK-40218
> URL: https://issues.apache.org/jira/browse/SPARK-40218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40218) GROUPING SETS should preserve the grouping columns

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40218:


Assignee: Wenchen Fan  (was: Apache Spark)

> GROUPING SETS should preserve the grouping columns
> --
>
> Key: SPARK-40218
> URL: https://issues.apache.org/jira/browse/SPARK-40218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40218) GROUPING SETS should preserve the grouping columns

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40218:


Assignee: Apache Spark  (was: Wenchen Fan)

> GROUPING SETS should preserve the grouping columns
> --
>
> Key: SPARK-40218
> URL: https://issues.apache.org/jira/browse/SPARK-40218
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40218) GROUPING SETS should preserve the grouping columns

2022-08-25 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-40218:
---

 Summary: GROUPING SETS should preserve the grouping columns
 Key: SPARK-40218
 URL: https://issues.apache.org/jira/browse/SPARK-40218
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40217) Support java.math.BigDecimal as an external type of Decimal128 type

2022-08-25 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-40217:
--

 Summary: Support java.math.BigDecimal as an external type of 
Decimal128 type
 Key: SPARK-40217
 URL: https://issues.apache.org/jira/browse/SPARK-40217
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng


Allow parallelization/collection of java.math.BigDecimal values, and convert 
the values to int128 values of Decimal128Type.
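
To make the intended conversion concrete, here is a hedged sketch of turning a
java.math.BigDecimal into a signed 128-bit unscaled integer held as two 64-bit
halves. The (high, low) representation and the helper name are assumptions for
illustration only, not the actual Decimal128 implementation.

{code:scala}
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Rescale the decimal to the target scale, then split its unscaled value
// into the high/low 64-bit words of a signed 128-bit integer.
def toInt128(value: JBigDecimal, targetScale: Int): (Long, Long) = {
  val unscaled = value.setScale(targetScale, RoundingMode.UNNECESSARY).unscaledValue()
  require(unscaled.bitLength() <= 127, s"$value does not fit in a signed int128")
  val low  = unscaled.longValue()                // lower 64 bits (two's complement)
  val high = unscaled.shiftRight(64).longValue() // upper 64 bits (sign-extended)
  (high, low)
}
{code}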



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40205) Provide a query context of ELEMENT_AT_BY_INDEX_ZERO

2022-08-25 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40205.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37645
[https://github.com/apache/spark/pull/37645]

> Provide a query context of ELEMENT_AT_BY_INDEX_ZERO
> ---
>
> Key: SPARK-40205
> URL: https://issues.apache.org/jira/browse/SPARK-40205
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Pass a query context to elementAtByIndexZeroError() in ElementAt



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40209) Incorrect value in the error message of NUMERIC_VALUE_OUT_OF_RANGE

2022-08-25 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40209.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37649
[https://github.com/apache/spark/pull/37649]

> Incorrect value in the error message of NUMERIC_VALUE_OUT_OF_RANGE
> --
>
> Key: SPARK-40209
> URL: https://issues.apache.org/jira/browse/SPARK-40209
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> The example below demonstrates the issue:
> {code:sql}
> spark-sql> select cast(interval '10.123' second as decimal(1, 0));
> [NUMERIC_VALUE_OUT_OF_RANGE] 0.10 cannot be represented as Decimal(1, 0). 
> If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> {code}
> The value 0.10 is not related to 10.123.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40053) HiveExternalCatalogVersionsSuite will test all spark versions and aborted when Python 2.7 is used

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40053.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37487
[https://github.com/apache/spark/pull/37487]

> HiveExternalCatalogVersionsSuite will test all spark versions and aborted 
> when Python 2.7 is used 
> --
>
> Key: SPARK-40053
> URL: https://issues.apache.org/jira/browse/SPARK-40053
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> When the test environment is Java 8 + Python 2.7, 
> HiveExternalCatalogVersionsSuite will test all Spark 3.x versions and Spark 
> 2.4.8, and all tests will be ABORTED.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40053) HiveExternalCatalogVersionsSuite will test all spark versions and aborted when Python 2.7 is used

2022-08-25 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40053:


Assignee: Yang Jie

> HiveExternalCatalogVersionsSuite will test all spark versions and aborted 
> when Python 2.7 is used 
> --
>
> Key: SPARK-40053
> URL: https://issues.apache.org/jira/browse/SPARK-40053
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> When the test environment is Java 8 + Python 2.7, 
> HiveExternalCatalogVersionsSuite will test all Spark 3.x versions and Spark 
> 2.4.8, and all tests will be ABORTED.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40216:


Assignee: Apache Spark

> Extract common `prepareWrite` method for `ParquetFileFormat` and 
> `ParquetWrite` to eliminate duplicate code
> ---
>
> Key: SPARK-40216
> URL: https://issues.apache.org/jira/browse/SPARK-40216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and 
> `ParquetWrite.prepareWrite`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584660#comment-17584660
 ] 

Apache Spark commented on SPARK-40216:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37654

> Extract common `prepareWrite` method for `ParquetFileFormat` and 
> `ParquetWrite` to eliminate duplicate code
> ---
>
> Key: SPARK-40216
> URL: https://issues.apache.org/jira/browse/SPARK-40216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and 
> `ParquetWrite.prepareWrite`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code

2022-08-25 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40216:


Assignee: (was: Apache Spark)

> Extract common `prepareWrite` method for `ParquetFileFormat` and 
> `ParquetWrite` to eliminate duplicate code
> ---
>
> Key: SPARK-40216
> URL: https://issues.apache.org/jira/browse/SPARK-40216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and 
> `ParquetWrite.prepareWrite`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40216) Extract common `prepareWrite` method for `ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code

2022-08-25 Thread Yang Jie (Jira)
Yang Jie created SPARK-40216:


 Summary: Extract common `prepareWrite` method for 
`ParquetFileFormat` and `ParquetWrite` to eliminate duplicate code
 Key: SPARK-40216
 URL: https://issues.apache.org/jira/browse/SPARK-40216
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yang Jie


There is a lot of duplicated code in `ParquetFileFormat.prepareWrite` and 
`ParquetWrite.prepareWrite`.
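
As a purely illustrative sketch of the refactoring shape (the object name, method
name, and config keys below are assumptions, not Spark's actual API), the shared
setup can be hoisted into one helper that both prepareWrite implementations
delegate to:

{code:scala}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.types.StructType

// Hypothetical shared helper; ParquetFileFormat.prepareWrite and
// ParquetWrite.prepareWrite would both delegate their common setup here.
object ParquetSharedWriteSupport {
  def configureJob(job: Job, dataSchema: StructType, compressionCodec: String): Unit = {
    val conf = job.getConfiguration
    conf.set("parquet.compression", compressionCodec)               // real parquet-hadoop key
    conf.set("example.spark.parquet.write.schema", dataSchema.json) // placeholder key for the sketch
  }
}
{code}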



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40214) Add `get` to dataframe functions

2022-08-25 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40214.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37652
[https://github.com/apache/spark/pull/37652]

> Add `get` to dataframe functions
> 
>
> Key: SPARK-40214
> URL: https://issues.apache.org/jira/browse/SPARK-40214
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584634#comment-17584634
 ] 

Apache Spark commented on SPARK-40215:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/37653

> Add SQL configs to control CSV/JSON date and timestamp parsing behaviour
> 
>
> Key: SPARK-40215
> URL: https://issues.apache.org/jira/browse/SPARK-40215
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584633#comment-17584633
 ] 

Apache Spark commented on SPARK-40215:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/37653

> Add SQL configs to control CSV/JSON date and timestamp parsing behaviour
> 
>
> Key: SPARK-40215
> URL: https://issues.apache.org/jira/browse/SPARK-40215
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org