[jira] [Updated] (SPARK-47471) Support order-insensitive lateral column alias

2024-03-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-47471:

Component/s: SQL
 (was: Block Manager)

> Support order-insensitive lateral column alias
> --
>
> Key: SPARK-47471
> URL: https://issues.apache.org/jira/browse/SPARK-47471
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.4
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-47471) Support order-insensitive lateral column alias

2024-03-19 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-47471:
---

 Summary: Support order-insensitive lateral column alias
 Key: SPARK-47471
 URL: https://issues.apache.org/jira/browse/SPARK-47471
 Project: Spark
  Issue Type: Improvement
  Components: Block Manager
Affects Versions: 3.3.4
Reporter: Yuming Wang









[jira] [Created] (SPARK-47459) Cancel running stage if the result is empty relation

2024-03-19 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-47459:
---

 Summary: Cancel running stage if the result is empty relation
 Key: SPARK-47459
 URL: https://issues.apache.org/jira/browse/SPARK-47459
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Yuming Wang
 Attachments: task stack trace.png

How to reproduce:
bin/spark-sql --master yarn --conf spark.driver.host=10.211.174.53
{code:sql}
set spark.sql.adaptive.enabled=true;
select a from (select id as a, id as b, id as z from range(1)) t1
join (select id as c, id as d from range(2)) t2 on t1.a = t2.c
join (select id as e, id as f from range(3)) t3 on t2.d = t3.e
where z % 10 < 0
group by 1;
{code}
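
For context: the filter z % 10 < 0 can never hold for the non-negative ids produced by range(), so the result is provably an empty relation while the join stages for t2 and t3 may still be running. A minimal self-contained sketch of the same query through the Scala API (a local master stands in for YARN; only public APIs are used):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.adaptive.enabled", "true")
  .getOrCreate()

// Same query as the reproduce steps: `z % 10 < 0` is always false for
// ids >= 0, so adaptive execution can replace the plan with an empty relation.
val df = spark.sql(
  """select a from (select id as a, id as b, id as z from range(1)) t1
    |join (select id as c, id as d from range(2)) t2 on t1.a = t2.c
    |join (select id as e, id as f from range(3)) t3 on t2.d = t3.e
    |where z % 10 < 0
    |group by 1""".stripMargin)

df.explain(true)             // the final adaptive plan collapses to empty
assert(df.collect().isEmpty) // no stage output is actually needed
{code}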









[jira] [Updated] (SPARK-47459) Cancel running stage if the result is empty relation

2024-03-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-47459:

Attachment: task stack trace.png

> Cancel running stage if the result is empty relation
> 
>
> Key: SPARK-47459
> URL: https://issues.apache.org/jira/browse/SPARK-47459
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: task stack trace.png
>
>
> How to reproduce:
> bin/spark-sql --master yarn --conf spark.driver.host=10.211.174.53
> {code:sql}
> set spark.sql.adaptive.enabled=true;
> select a from (select id as a, id as b, id as z from range(1)) t1
> join (select id as c, id as d from range(2)) t2 on t1.a = t2.c
> join (select id as e, id as f from range(3)) t3 on t2.d = t3.e
> where z % 10 < 0
> group by 1;
> {code}






[jira] [Created] (SPARK-47441) Do not add log link for unmanaged AM in Spark UI

2024-03-18 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-47441:
---

 Summary: Do not add log link for unmanaged AM in Spark UI
 Key: SPARK-47441
 URL: https://issues.apache.org/jira/browse/SPARK-47441
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 3.5.1, 3.5.0
Reporter: Yuming Wang


{noformat}
24/03/18 04:58:25,022 ERROR [spark-listener-group-appStatus] scheduler.AsyncEventQueue:97 : Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) ~[?:?]
    at java.lang.Integer.parseInt(Integer.java:668) ~[?:?]
    at java.lang.Integer.parseInt(Integer.java:786) ~[?:?]
    at scala.collection.immutable.StringLike.toInt(StringLike.scala:310) ~[scala-library-2.12.18.jar:?]
    at scala.collection.immutable.StringLike.toInt$(StringLike.scala:310) ~[scala-library-2.12.18.jar:?]
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:33) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1105) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:609) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:1045) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1233) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1445) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) ~[scala-library-2.12.18.jar:?]
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) [spark-core_2.12-3.5.1.jar:3.5.1]
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) [spark-core_2.12-3.5.1.jar:3.5.1]
{noformat}
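
In a hedged nutshell, the failure mode is: for an unmanaged AM the process log link carries no real host:port, so the port substring is the literal text "null", and Utils.parseHostPort then calls toInt on it. A simplified sketch of that logic (hypothetical, not Spark's exact implementation):

{code:scala}
// Hypothetical simplification of org.apache.spark.util.Utils.parseHostPort:
def parseHostPort(hostPort: String): (String, Int) = {
  val idx = hostPort.lastIndexOf(':')
  (hostPort.substring(0, idx), hostPort.substring(idx + 1).toInt)
}

// With no real port the string looks like "driver-host:null", and toInt throws:
try parseHostPort("driver-host:null")
catch { case e: NumberFormatException => println(e) }
// java.lang.NumberFormatException: For input string: "null"
{code}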







[jira] [Commented] (SPARK-47222) fileCompressionFactor should be applied to the size of the table

2024-02-28 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821923#comment-17821923
 ] 

Yuming Wang commented on SPARK-47222:
-

https://github.com/apache/spark/pull/45329

> fileCompressionFactor should be applied to the size of the table
> 
>
> Key: SPARK-47222
> URL: https://issues.apache.org/jira/browse/SPARK-47222
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>
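
For background, a hedged sketch of the idea (the config name comes from Spark's documentation; the numbers are invented for illustration): spark.sql.sources.fileCompressionFactor scales on-disk file sizes when estimating scan output size, and this ticket proposes applying the same factor to the table-level size statistic so planning decisions see a consistent estimate.

{code:scala}
val fileSizeInBytes = 1L << 30            // 1 GiB of compressed files on disk
val fileCompressionFactor = 3.0           // spark.sql.sources.fileCompressionFactor
val estimatedScanSize = (fileSizeInBytes * fileCompressionFactor).toLong

// If only the file-based estimate is scaled but the table size statistic is
// not, broadcast-vs-shuffle decisions are made from inconsistent numbers.
{code}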







[jira] [Created] (SPARK-47222) fileCompressionFactor should be applied to the size of the table

2024-02-28 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-47222:
---

 Summary: fileCompressionFactor should be applied to the size of 
the table
 Key: SPARK-47222
 URL: https://issues.apache.org/jira/browse/SPARK-47222
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Created] (SPARK-46885) Push down filters through TypedFilter

2024-01-27 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46885:
---

 Summary: Push down filters through TypedFilter
 Key: SPARK-46885
 URL: https://issues.apache.org/jira/browse/SPARK-46885
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang
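
A hedged sketch of what the optimization would enable (public Dataset APIs only; the optimizer rule itself is not shown here):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

case class Rec(a: Int, b: Int)
val ds = spark.range(100).map(i => Rec(i.toInt, i.toInt * 2))

// `filter(_.a > 0)` becomes a TypedFilter node, and the SQL predicate
// `b < 10` sits above it. Pushing `b < 10` through the TypedFilter lets it
// reach the scan and participate in ordinary filter pushdown.
ds.filter(_.a > 0).where($"b" < 10).explain(true)
{code}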









[jira] [Resolved] (SPARK-40609) Casts types according to bucket info for Equality expression

2023-12-05 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40609.
-
Fix Version/s: 4.0.0
   Resolution: Duplicate

Issue fixed by SPARK-46219.

> Casts types according to bucket info for Equality expression
> 
>
> Key: SPARK-40609
> URL: https://issues.apache.org/jira/browse/SPARK-40609
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46219) Unwrap cast in join predicates

2023-12-03 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-46219:

Summary: Unwrap cast in join predicates  (was: Unwrapp cast in join 
predicates)

> Unwrap cast in join predicates
> --
>
> Key: SPARK-46219
> URL: https://issues.apache.org/jira/browse/SPARK-46219
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
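
A hedged sketch of the pattern this targets (column names are hypothetical; such a rewrite is only safe when the cast round-trips losslessly):

{code:scala}
spark.range(10).selectExpr("cast(id as int) as int_col").createOrReplaceTempView("t1")
spark.range(10).selectExpr("id as bigint_col").createOrReplaceTempView("t2")

// The cast on t1.int_col hides the raw column from bucketing/partitioning
// matching. Unwrapping it when the values convert losslessly exposes the
// raw join key again.
spark.sql(
  "SELECT * FROM t1 JOIN t2 ON CAST(t1.int_col AS BIGINT) = t2.bigint_col"
).explain(true)
{code}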







[jira] [Created] (SPARK-46219) Unwrapp cast in join predicates

2023-12-03 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46219:
---

 Summary: Unwrapp cast in join predicates
 Key: SPARK-46219
 URL: https://issues.apache.org/jira/browse/SPARK-46219
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Resolved] (SPARK-46069) Support unwrap timestamp type to date type

2023-12-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-46069.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43982
[https://github.com/apache/spark/pull/43982]

> Support unwrap timestamp type to date type
> --
>
> Key: SPARK-46069
> URL: https://issues.apache.org/jira/browse/SPARK-46069
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
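
A hedged example of the rewrite (dates chosen arbitrarily): casting a date column to timestamp yields midnight of that day, so a timestamp predicate over the cast can be turned into a cast-free date predicate.

{code:scala}
spark.sql("SELECT explode(sequence(DATE'2022-12-28', DATE'2023-01-05')) AS dt")
  .createOrReplaceTempView("t")

// CAST(dt AS TIMESTAMP) > TIMESTAMP '2023-01-01 12:00:00' holds exactly when
// dt >= DATE '2023-01-02', so the cast can be unwrapped and the plain date
// filter can prune partitions or push down to the source.
spark.sql(
  "SELECT * FROM t WHERE CAST(dt AS TIMESTAMP) > TIMESTAMP '2023-01-01 12:00:00'"
).explain(true)
{code}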







[jira] [Assigned] (SPARK-46069) Support unwrap timestamp type to date type

2023-12-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-46069:
---

Assignee: Wan Kun

> Support unwrap timestamp type to date type
> --
>
> Key: SPARK-46069
> URL: https://issues.apache.org/jira/browse/SPARK-46069
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-43228) Join keys also match PartitioningCollection

2023-12-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-43228.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44128
[https://github.com/apache/spark/pull/44128]

> Join keys also match PartitioningCollection
> ---
>
> Key: SPARK-43228
> URL: https://issues.apache.org/jira/browse/SPARK-43228
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
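
A hedged sketch of the scenario (public APIs only; whether the shuffle is actually avoided depends on the fixed matching logic): the output partitioning of an inner equi-join is a PartitioningCollection covering both sides' keys, and a follow-up join on the right side's key should be able to match it instead of reshuffling.

{code:scala}
val t1 = spark.range(10).toDF("a")
val t2 = spark.range(10).toDF("b")
val t3 = spark.range(10).toDF("c")

// The inner join's output is partitioned by both `a` and `b`. Joining the
// result with t3 on `b` should recognize that partitioning and skip an
// extra exchange.
val joined = t1.join(t2, t1("a") === t2("b"))
joined.join(t3, joined("b") === t3("c")).explain()
{code}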







[jira] [Assigned] (SPARK-43228) Join keys also match PartitioningCollection

2023-12-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-43228:
---

Assignee: Yuming Wang

> Join keys also match PartitioningCollection
> ---
>
> Key: SPARK-43228
> URL: https://issues.apache.org/jira/browse/SPARK-43228
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46122) Disable spark.sql.legacy.createHiveTableByDefault by default

2023-11-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-46122:

Summary: Disable spark.sql.legacy.createHiveTableByDefault by default  
(was: Enable spark.sql.legacy.createHiveTableByDefault by default)

> Disable spark.sql.legacy.createHiveTableByDefault by default
> 
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
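
For context, a hedged illustration of what the flag controls (behavior summarized from the config's documentation):

{code:scala}
// With spark.sql.legacy.createHiveTableByDefault=true, CREATE TABLE without a
// USING clause produces a Hive serde table; with it disabled, Spark creates a
// native data source table using spark.sql.sources.default (parquet by default).
spark.sql("SET spark.sql.legacy.createHiveTableByDefault=false")
spark.sql("CREATE TABLE t_no_using(i INT)")
spark.sql("DESCRIBE TABLE EXTENDED t_no_using").show(truncate = false)  // shows the provider
{code}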







[jira] [Created] (SPARK-46122) Enable spark.sql.legacy.createHiveTableByDefault by default

2023-11-27 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46122:
---

 Summary: Enable spark.sql.legacy.createHiveTableByDefault by 
default
 Key: SPARK-46122
 URL: https://issues.apache.org/jira/browse/SPARK-46122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Created] (SPARK-46119) Override toString method for UnresolvedAlias

2023-11-26 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46119:
---

 Summary: Override toString method for UnresolvedAlias
 Key: SPARK-46119
 URL: https://issues.apache.org/jira/browse/SPARK-46119
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-46102) Prune keys or values from Generate if it is a map type

2023-11-25 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-46102:

Summary: Prune keys or values from Generate if it is a map type  (was: 
Prune keys or values from Generate if it is a map type.)

> Prune keys or values from Generate if it is a map type
> --
>
> Key: SPARK-46102
> URL: https://issues.apache.org/jira/browse/SPARK-46102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
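
A hedged sketch of the pruning opportunity (table and column names hypothetical):

{code:scala}
spark.sql("SELECT map('a', 1, 'b', 2) AS m").createOrReplaceTempView("t")

// explode(m) generates (key, value) pairs via a Generate node. When only the
// keys are consumed, pruning `v` from the Generate lets column pruning drop
// the map values from the scan as well.
spark.sql("SELECT k FROM (SELECT explode(m) AS (k, v) FROM t)").explain(true)
{code}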







[jira] [Created] (SPARK-46102) Prune keys or values from Generate if it is a map type.

2023-11-25 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46102:
---

 Summary: Prune keys or values from Generate if it is a map type.
 Key: SPARK-46102
 URL: https://issues.apache.org/jira/browse/SPARK-46102
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Commented] (SPARK-46097) Push down limit 1 through Union and Aggregate

2023-11-24 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789619#comment-17789619
 ] 

Yuming Wang commented on SPARK-46097:
-

https://github.com/apache/spark/pull/44009

> Push down limit 1 through Union and Aggregate
> 
>
> Key: SPARK-46097
> URL: https://issues.apache.org/jira/browse/SPARK-46097
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>
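
A hedged sketch of both rewrites (views created only for illustration): LIMIT 1 can be duplicated below each branch of a UNION ALL, and below a grouping-only Aggregate (a distinct) a single input row is enough to produce one group.

{code:scala}
spark.range(10).toDF("a").createOrReplaceTempView("t1")
spark.range(10).toDF("a").createOrReplaceTempView("t2")

// A LocalLimit(1) can be placed on each branch, so neither child
// materializes more than one row.
spark.sql("SELECT * FROM (SELECT a FROM t1 UNION ALL SELECT a FROM t2) LIMIT 1").explain(true)

// One input row yields exactly one group, so the limit can go below the Aggregate.
spark.sql("SELECT DISTINCT a FROM t1 LIMIT 1").explain(true)
{code}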







[jira] [Created] (SPARK-46097) Push down limit 1 through Union and Aggregate

2023-11-24 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-46097:
---

 Summary: Push down limit 1 through Union and Aggregate
 Key: SPARK-46097
 URL: https://issues.apache.org/jira/browse/SPARK-46097
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-45954) Avoid generating redundant ShuffleExchangeExec node

2023-11-16 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45954:

Summary: Avoid generating redundant ShuffleExchangeExec node  (was: Remove 
redundant shuffles)

> Avoid generating redundant ShuffleExchangeExec node
> ---
>
> Key: SPARK-45954
> URL: https://issues.apache.org/jira/browse/SPARK-45954
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45954) Remove redundant shuffles

2023-11-16 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45954:
---

 Summary: Remove redundant shuffles
 Key: SPARK-45954
 URL: https://issues.apache.org/jira/browse/SPARK-45954
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-45947) Set a human readable description for Dataset api

2023-11-15 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45947:

Description: 
We should set the view name to 
sparkSession.sparkContext.setJobDescription("xxx")


 !screenshot-1.png! 


  was:
Need to sparkSession.sparkContext.setJobDescription("xxx")
 !screenshot-1.png! 



> Set a human readable description for Dataset api
> 
>
> Key: SPARK-45947
> URL: https://issues.apache.org/jira/browse/SPARK-45947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> We should set the view name to 
> sparkSession.sparkContext.setJobDescription("xxx")
>  !screenshot-1.png! 
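
What the description is asking for, as a small sketch over public APIs (the "xxx" placeholder stays abstract):

{code:scala}
import org.apache.spark.sql.functions.expr

// Jobs triggered while this description is set show it in the Spark UI;
// the ticket wants the Dataset API to set such a description automatically.
spark.sparkContext.setJobDescription("human readable description")
spark.range(100).groupBy(expr("id % 2")).count().collect()
{code}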






[jira] [Updated] (SPARK-45947) Set a human readable description for Dataset api

2023-11-15 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45947:

Description: 
Need to sparkSession.sparkContext.setJobDescription("xxx")
 !screenshot-1.png! 


> Set a human readable description for Dataset api
> 
>
> Key: SPARK-45947
> URL: https://issues.apache.org/jira/browse/SPARK-45947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Need to sparkSession.sparkContext.setJobDescription("xxx")
>  !screenshot-1.png! 






[jira] [Created] (SPARK-45947) Set a human readable description for Dataset api

2023-11-15 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45947:
---

 Summary: Set a human readable description for Dataset api
 Key: SPARK-45947
 URL: https://issues.apache.org/jira/browse/SPARK-45947
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang
 Attachments: screenshot-1.png








[jira] [Updated] (SPARK-45947) Set a human readable description for Dataset api

2023-11-15 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45947:

Attachment: screenshot-1.png

> Set a human readable description for Dataset api
> 
>
> Key: SPARK-45947
> URL: https://issues.apache.org/jira/browse/SPARK-45947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Updated] (SPARK-45915) Treat decimal(x, 0) the same as IntegralType in PromoteStrings

2023-11-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45915:

Summary: Treat decimal(x, 0) the same as IntegralType in PromoteStrings  
(was: Unwrap cast in predicate)

> Treat decimal(x, 0) the same as IntegralType in PromoteStrings
> --
>
> Key: SPARK-45915
> URL: https://issues.apache.org/jira/browse/SPARK-45915
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45915) Unwrap cast in predicate

2023-11-13 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45915:
---

 Summary: Unwrap cast in predicate
 Key: SPARK-45915
 URL: https://issues.apache.org/jira/browse/SPARK-45915
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Created] (SPARK-45909) Remove the cast if it can safely up-cast in IsNotNull

2023-11-13 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45909:
---

 Summary: Remove the cast if it can safely up-cast in IsNotNull
 Key: SPARK-45909
 URL: https://issues.apache.org/jira/browse/SPARK-45909
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang
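
A hedged sketch of the equivalence this relies on (view created only for illustration):

{code:scala}
spark.range(10).selectExpr("cast(id as int) as int_col").createOrReplaceTempView("t")

// An INT always up-casts losslessly to BIGINT, so
//   IsNotNull(CAST(int_col AS BIGINT))  <=>  IsNotNull(int_col)
// and removing the cast lets the null filter be pushed down to the source.
spark.sql("SELECT * FROM t WHERE CAST(int_col AS BIGINT) IS NOT NULL").explain(true)
{code}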









[jira] [Updated] (SPARK-45894) hive table level setting hadoop.mapred.max.split.size

2023-11-11 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45894:

Target Version/s:   (was: 3.5.0)

> hive table level setting hadoop.mapred.max.split.size
> -
>
> Key: SPARK-45894
> URL: https://issues.apache.org/jira/browse/SPARK-45894
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: guihuawen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> In a Hive table scan, configuring the hadoop.mapred.max.split.size parameter 
> increases the parallelism of the scan stage and thereby reduces the running 
> time.
> However, when a large table and a small table appear in the same query, a 
> single global hadoop.mapred.max.split.size value makes some stages run a very 
> large number of tasks while other stages run very few. Allowing 
> hadoop.mapred.max.split.size to be set separately for each Hive table would 
> keep the stages balanced.






[jira] [Created] (SPARK-45895) Combine multiple like to like all

2023-11-11 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45895:
---

 Summary: Combine multiple like to like all
 Key: SPARK-45895
 URL: https://issues.apache.org/jira/browse/SPARK-45895
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang



{code:scala}
spark.sql("create table t(a string, b string, c string) using parquet")
spark.sql(
  """
    |select * from t where
    |substr(a, 1, 5) like '%a%' and
    |substr(a, 1, 5) like '%b%'
    |""".stripMargin).explain(true)
{code}

We can optimize the query to:
{code:scala}
spark.sql(
  """
    |select * from t where
    |substr(a, 1, 5) like all('%a%', '%b%')
    |""".stripMargin).explain(true)
{code}








[jira] [Created] (SPARK-45853) Add Iceberg and Hudi to third party projects

2023-11-09 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45853:
---

 Summary: Add Iceberg and Hudi to third party projects
 Key: SPARK-45853
 URL: https://issues.apache.org/jira/browse/SPARK-45853
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Yuming Wang



{noformat}
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.util.concurrent.ExecutionException: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: iceberg. Please find packages at `https://spark.apache.org/third-party-projects.html`.
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:46)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:262)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:166)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:41)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:166)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:161)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:175)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
{noformat}







[jira] [Updated] (SPARK-45848) spark-build-info.ps1 missing the docroot property

2023-11-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45848:

Description: 
https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44
https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36

  
was:https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44


> spark-build-info.ps1 missing the docroot property
> -
>
> Key: SPARK-45848
> URL: https://issues.apache.org/jira/browse/SPARK-45848
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44
> https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36






[jira] [Created] (SPARK-45848) spark-build-info.ps1 missing the docroot property

2023-11-08 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45848:
---

 Summary: spark-build-info.ps1 missing the docroot property
 Key: SPARK-45848
 URL: https://issues.apache.org/jira/browse/SPARK-45848
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0.0
Reporter: Yuming Wang


https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44






[jira] [Updated] (HIVE-27859) Backport HIVE-27817: Disable ssl hostname verification for 127.0.0.1

2023-11-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27859:
---
Affects Version/s: 2.3.9
   (was: 2.3.8)

> Backport HIVE-27817: Disable ssl hostname verification for 127.0.0.1
> 
>
> Key: HIVE-27859
> URL: https://issues.apache.org/jira/browse/HIVE-27859
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.9
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (HIVE-27859) Backport HIVE-27817: Disable ssl hostname verification for 127.0.0.1

2023-11-08 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27859:
--

 Summary: Backport HIVE-27817: Disable ssl hostname verification 
for 127.0.0.1
 Key: HIVE-27859
 URL: https://issues.apache.org/jira/browse/HIVE-27859
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.3.8
Reporter: Yuming Wang








[jira] [Updated] (SPARK-45755) Push down limit through Dataset.isEmpty()

2023-10-31 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45755:

Description: 
Pushing down LocalLimit cannot optimize the distinct case.

{code:scala}
  def isEmpty: Boolean = withAction("isEmpty",
    withTypedPlan { LocalLimit(Literal(1), select().logicalPlan) }.queryExecution) { plan =>
    plan.executeTake(1).isEmpty
  }
{code}
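
A hedged illustration of the limitation: the LocalLimit(1) added by isEmpty sits above the Aggregate introduced by distinct() and is not pushed below it, so every partition is fully aggregated even though a single surviving row would answer the question.

{code:scala}
// The aggregation over all partitions runs before the limit takes effect.
val nonEmpty = !spark.range(1000000).distinct().isEmpty
{code}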


> Push down limit through Dataset.isEmpty()
> -
>
> Key: SPARK-45755
> URL: https://issues.apache.org/jira/browse/SPARK-45755
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Pushing down LocalLimit cannot optimize the distinct case.
> {code:scala}
>   def isEmpty: Boolean = withAction("isEmpty",
>     withTypedPlan { LocalLimit(Literal(1), select().logicalPlan) }.queryExecution) { plan =>
>     plan.executeTake(1).isEmpty
>   }
> {code}






[jira] [Created] (SPARK-45755) Push down limit through Dataset.isEmpty()

2023-10-31 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45755:
---

 Summary: Push down limit through Dataset.isEmpty()
 Key: SPARK-45755
 URL: https://issues.apache.org/jira/browse/SPARK-45755
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-45658) Canonicalization of DynamicPruningSubquery is broken

2023-10-25 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45658:

Target Version/s:   (was: 3.5.1)

> Canonicalization of DynamicPruningSubquery is broken
> 
>
> Key: SPARK-45658
> URL: https://issues.apache.org/jira/browse/SPARK-45658
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Asif
>Priority: Major
>
> The canonicalization of buildKeys: Seq[Expression] in the class 
> DynamicPruningSubquery is broken: the buildKeys are canonicalized just by 
> calling
> buildKeys.map(_.canonicalized)
> which is incorrect because it does not normalize the exprIds relative to the 
> buildQuery output.
> The fix is to normalize the buildKeys expressions against the output of 
> buildQuery: LogicalPlan, using the standard approach:
> buildKeys.map(QueryPlan.normalizeExpressions(_, buildQuery.output))
> A PR with a bug test will be filed for this.






[jira] [Updated] (SPARK-45658) Canonicalization of DynamicPruningSubquery is broken

2023-10-25 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45658:

Affects Version/s: (was: 3.5.1)

> Canonicalization of DynamicPruningSubquery is broken
> 
>
> Key: SPARK-45658
> URL: https://issues.apache.org/jira/browse/SPARK-45658
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Asif
>Priority: Major
>
> The canonicalization of buildKeys: Seq[Expression] in the class 
> DynamicPruningSubquery is broken: the buildKeys are canonicalized just by 
> calling
> buildKeys.map(_.canonicalized)
> which is incorrect because it does not normalize the exprIds relative to the 
> buildQuery output.
> The fix is to normalize the buildKeys expressions against the output of 
> buildQuery: LogicalPlan, using the standard approach:
> buildKeys.map(QueryPlan.normalizeExpressions(_, buildQuery.output))
> A PR with a bug test will be filed for this.






[jira] [Updated] (HIVE-27818) Fix compilation failure in AccumuloPredicateHandler

2023-10-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27818:
---
Description: 
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-accumulo-handler: Compilation failure
[ERROR] 
/Users/yumwang/opensource/hive/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java:[263,23]
 unreported exception org.apache.hadoop.hive.ql.metadata.HiveException; must be 
caught or declared to be thrown
[ERROR] 
{noformat}


{noformat}
yumwang@G9L07H60PK hive % java -version
openjdk version "1.8.0_382"
OpenJDK Runtime Environment (Zulu 8.72.0.17-CA-macos-aarch64) (build 
1.8.0_382-b05)
OpenJDK 64-Bit Server VM (Zulu 8.72.0.17-CA-macos-aarch64) (build 25.382-b05, 
mixed mode)
yumwang@G9L07H60PK hive % mvn -version
Apache Maven 3.9.4 (dfbb324ad4a7c8fb0bf182e6d91b0ae20e3d2dd9)
Maven home: /Users/yumwang/software/apache-maven-3.8.8
Java version: 1.8.0_382, vendor: Azul Systems, Inc., runtime: 
/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre
Default locale: en_CN, platform encoding: UTF-8
OS name: "mac os x", version: "13.6", arch: "aarch64", family: "mac"
{noformat}




  was:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-accumulo-handler: Compilation failure
[ERROR] 
/Users/yumwang/opensource/hive/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java:[263,23]
 unreported exception org.apache.hadoop.hive.ql.metadata.HiveException; must be 
caught or declared to be thrown
[ERROR] 
{noformat}



> Fix compilation failure in AccumuloPredicateHandler
> ---
>
> Key: HIVE-27818
> URL: https://issues.apache.org/jira/browse/HIVE-27818
> Project: Hive
>  Issue Type: Bug
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-accumulo-handler: Compilation failure
> [ERROR] 
> /Users/yumwang/opensource/hive/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java:[263,23]
>  unreported exception org.apache.hadoop.hive.ql.metadata.HiveException; must 
> be caught or declared to be thrown
> [ERROR] 
> {noformat}
> {noformat}
> yumwang@G9L07H60PK hive % java -version
> openjdk version "1.8.0_382"
> OpenJDK Runtime Environment (Zulu 8.72.0.17-CA-macos-aarch64) (build 
> 1.8.0_382-b05)
> OpenJDK 64-Bit Server VM (Zulu 8.72.0.17-CA-macos-aarch64) (build 25.382-b05, 
> mixed mode)
> yumwang@G9L07H60PK hive % mvn -version
> Apache Maven 3.9.4 (dfbb324ad4a7c8fb0bf182e6d91b0ae20e3d2dd9)
> Maven home: /Users/yumwang/software/apache-maven-3.8.8
> Java version: 1.8.0_382, vendor: Azul Systems, Inc., runtime: 
> /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre
> Default locale: en_CN, platform encoding: UTF-8
> OS name: "mac os x", version: "13.6", arch: "aarch64", family: "mac"
> {noformat}





[jira] [Created] (HIVE-27818) Fix compilation failure in AccumuloPredicateHandler

2023-10-24 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27818:
--

 Summary: Fix compilation failure in AccumuloPredicateHandler
 Key: HIVE-27818
 URL: https://issues.apache.org/jira/browse/HIVE-27818
 Project: Hive
  Issue Type: Bug
Reporter: Yuming Wang


{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-accumulo-handler: Compilation failure
[ERROR] 
/Users/yumwang/opensource/hive/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java:[263,23]
 unreported exception org.apache.hadoop.hive.ql.metadata.HiveException; must be 
caught or declared to be thrown
[ERROR] 
{noformat}






[jira] [Updated] (HIVE-27817) Disable ssl hostname verification for 127.0.0.1

2023-10-23 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27817:
---
Description: 
{code:java}
diff --git 
a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java 
b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
index e12f245871..632980e7cd 100644
--- a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
+++ b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
@@ -71,7 +71,11 @@ public static TTransport getSSLSocket(String host, int port, 
int loginTimeout,
   private static TSocket getSSLSocketWithHttps(TSocket tSSLSocket) throws 
TTransportException {
 SSLSocket sslSocket = (SSLSocket) tSSLSocket.getSocket();
 SSLParameters sslParams = sslSocket.getSSLParameters();
-sslParams.setEndpointIdentificationAlgorithm("HTTPS");
+if (sslSocket.getLocalAddress().getHostAddress().equals("127.0.0.1")) {
+  sslParams.setEndpointIdentificationAlgorithm(null);
+} else {
+  sslParams.setEndpointIdentificationAlgorithm("HTTPS");
+}
 sslSocket.setSSLParameters(sslParams);
 return new TSocket(sslSocket);
   }

{code}

  was:
{code:diff}
diff --git 
a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java 
b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
index e12f245871..632980e7cd 100644
--- a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
+++ b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
@@ -71,7 +71,11 @@ public static TTransport getSSLSocket(String host, int port, 
int loginTimeout,
   private static TSocket getSSLSocketWithHttps(TSocket tSSLSocket) throws 
TTransportException {
 SSLSocket sslSocket = (SSLSocket) tSSLSocket.getSocket();
 SSLParameters sslParams = sslSocket.getSSLParameters();
-sslParams.setEndpointIdentificationAlgorithm("HTTPS");
+if (sslSocket.getLocalAddress().getHostAddress().equals("127.0.0.1")) {
+  sslParams.setEndpointIdentificationAlgorithm(null);
+} else {
+  sslParams.setEndpointIdentificationAlgorithm("HTTPS");
+}
 sslSocket.setSSLParameters(sslParams);
 return new TSocket(sslSocket);
   }

{code}


> Disable ssl hostname verification for 127.0.0.1
> ---
>
> Key: HIVE-27817
> URL: https://issues.apache.org/jira/browse/HIVE-27817
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:java}
> diff --git 
> a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java 
> b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
> index e12f245871..632980e7cd 100644
> --- a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
> +++ b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
> @@ -71,7 +71,11 @@ public static TTransport getSSLSocket(String host, int 
> port, int loginTimeout,
>private static TSocket getSSLSocketWithHttps(TSocket tSSLSocket) throws 
> TTransportException {
>  SSLSocket sslSocket = (SSLSocket) tSSLSocket.getSocket();
>  SSLParameters sslParams = sslSocket.getSSLParameters();
> -sslParams.setEndpointIdentificationAlgorithm("HTTPS");
> +if (sslSocket.getLocalAddress().getHostAddress().equals("127.0.0.1")) {
> +  sslParams.setEndpointIdentificationAlgorithm(null);
> +} else {
> +  sslParams.setEndpointIdentificationAlgorithm("HTTPS");
> +}
>  sslSocket.setSSLParameters(sslParams);
>  return new TSocket(sslSocket);
>}
> {code}





[jira] [Updated] (HIVE-27817) Disable ssl hostname verification for 127.0.0.1

2023-10-23 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27817:
---
Description: 
{code:diff}
diff --git 
a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java 
b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
index e12f245871..632980e7cd 100644
--- a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
+++ b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
@@ -71,7 +71,11 @@ public static TTransport getSSLSocket(String host, int port, 
int loginTimeout,
   private static TSocket getSSLSocketWithHttps(TSocket tSSLSocket) throws 
TTransportException {
 SSLSocket sslSocket = (SSLSocket) tSSLSocket.getSocket();
 SSLParameters sslParams = sslSocket.getSSLParameters();
-sslParams.setEndpointIdentificationAlgorithm("HTTPS");
+if (sslSocket.getLocalAddress().getHostAddress().equals("127.0.0.1")) {
+  sslParams.setEndpointIdentificationAlgorithm(null);
+} else {
+  sslParams.setEndpointIdentificationAlgorithm("HTTPS");
+}
 sslSocket.setSSLParameters(sslParams);
 return new TSocket(sslSocket);
   }

{code}

> Disable ssl hostname verification for 127.0.0.1
> ---
>
> Key: HIVE-27817
> URL: https://issues.apache.org/jira/browse/HIVE-27817
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:diff}
> diff --git 
> a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java 
> b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
> index e12f245871..632980e7cd 100644
> --- a/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
> +++ b/common/src/java/org/apache/hadoop/hive/common/auth/HiveAuthUtils.java
> @@ -71,7 +71,11 @@ public static TTransport getSSLSocket(String host, int 
> port, int loginTimeout,
>private static TSocket getSSLSocketWithHttps(TSocket tSSLSocket) throws 
> TTransportException {
>  SSLSocket sslSocket = (SSLSocket) tSSLSocket.getSocket();
>  SSLParameters sslParams = sslSocket.getSSLParameters();
> -sslParams.setEndpointIdentificationAlgorithm("HTTPS");
> +if (sslSocket.getLocalAddress().getHostAddress().equals("127.0.0.1")) {
> +  sslParams.setEndpointIdentificationAlgorithm(null);
> +} else {
> +  sslParams.setEndpointIdentificationAlgorithm("HTTPS");
> +}
>  sslSocket.setSSLParameters(sslParams);
>  return new TSocket(sslSocket);
>}
> {code}





[jira] [Created] (HIVE-27817) Disable ssl hostname verification for 127.0.0.1

2023-10-23 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27817:
--

 Summary: Disable ssl hostname verification for 127.0.0.1
 Key: HIVE-27817
 URL: https://issues.apache.org/jira/browse/HIVE-27817
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 4.0.0-beta-1
Reporter: Yuming Wang








[jira] [Updated] (HIVE-27817) Disable ssl hostname verification for 127.0.0.1

2023-10-23 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27817:
---
Affects Version/s: 2.3.0
   (was: 4.0.0-beta-1)

> Disable ssl hostname verification for 127.0.0.1
> ---
>
> Key: HIVE-27817
> URL: https://issues.apache.org/jira/browse/HIVE-27817
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
>Priority: Major
>






[jira] [Created] (HIVE-27815) Support collect numModifiedRows

2023-10-20 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27815:
--

 Summary: Support collect numModifiedRows
 Key: HIVE-27815
 URL: https://issues.apache.org/jira/browse/HIVE-27815
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.3.8
Reporter: Yuming Wang


Backport part of HIVE-14388.





[jira] [Commented] (SPARK-43851) Support LCA in grouping expressions

2023-10-20 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1591#comment-1591
 ] 

Yuming Wang commented on SPARK-43851:
-

The resolution should be unresolved.

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}






[jira] [Reopened] (SPARK-43851) Support LCA in grouping expressions

2023-10-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reopened SPARK-43851:
-
  Assignee: (was: Jia Fan)

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}






[jira] [Updated] (SPARK-43851) Support LCA in grouping expressions

2023-10-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43851:

Fix Version/s: (was: 3.5.0)

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}






[jira] [Updated] (SPARK-45454) Set the table's default owner to current_user

2023-10-07 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45454:

Parent: (was: SPARK-30016)
Issue Type: Improvement  (was: Sub-task)

> Set the table's default owner to current_user
> -
>
> Key: SPARK-45454
> URL: https://issues.apache.org/jira/browse/SPARK-45454
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45454) Set the table's default owner to current_user

2023-10-07 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45454:

Summary: Set the table's default owner to current_user  (was: Set owner of 
DS v2 table to CURRENT_USER if it is already set)

> Set the table's default owner to current_user
> -
>
> Key: SPARK-45454
> URL: https://issues.apache.org/jira/browse/SPARK-45454
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45454) Set owner of DS v2 table to CURRENT_USER if it is already set

2023-10-07 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45454:
---

 Summary: Set owner of DS v2 table to CURRENT_USER if it is already 
set
 Key: SPARK-45454
 URL: https://issues.apache.org/jira/browse/SPARK-45454
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45387) Partition key filter cannot be pushed down when using cast

2023-09-30 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45387:

Target Version/s:   (was: 3.1.1, 3.3.0)

> Partition key filter cannot be pushed down when using cast
> --
>
> Key: SPARK-45387
> URL: https://issues.apache.org/jira/browse/SPARK-45387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0
>Reporter: TianyiMa
>Priority: Critical
>
> Suppose we have a partitioned table `table_pt` with a StringType partition
> column `dt` whose metadata is managed by the Hive Metastore. If we filter
> partitions by dt = '123', this filter can be pushed down to the data source,
> but if the filter condition is a number, e.g. dt = 123, it cannot be pushed
> down, causing Spark to pull all of the table's partition metadata to the
> client. This performs poorly when the table has thousands of partitions and
> increases the risk of a Hive Metastore OOM.
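> A sketch of the two filter shapes, assuming `table_pt` is partitioned by the
> string column `dt`:
> {code:sql}
> -- pushed down to the metastore: the literal type matches the partition column
> select * from table_pt where dt = '123';
> -- not pushed down: dt is implicitly cast to compare with the numeric literal
> select * from table_pt where dt = 123;
> {code}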



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45369) Push down limit through generate

2023-09-28 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45369:
---

 Summary: Push down limit through generate
 Key: SPARK-45369
 URL: https://issues.apache.org/jira/browse/SPARK-45369
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45282) Join loses records for cached datasets

2023-09-24 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768399#comment-17768399
 ] 

Yuming Wang commented on SPARK-45282:
-

cc [~ulysses] [~cloud_fan]

> Join loses records for cached datasets
> --
>
> Key: SPARK-45282
> URL: https://issues.apache.org/jira/browse/SPARK-45282
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
> Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or 
> databricks 13.3
>Reporter: koert kuipers
>Priority: Major
>  Labels: CorrectnessBug, correctness
>
> we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is
> not present on spark 3.3.1.
> it only shows up in a distributed environment; i cannot replicate it in a
> unit test. however, i did get it to show up on a hadoop cluster, on
> kubernetes, and on databricks 13.3.
> the issue is that records are dropped when two cached dataframes are joined.
> it seems that in spark 3.4.1 some Exchanges are dropped from the query plan
> as an optimization, while in spark 3.3.1 these Exchanges are still present.
> it seems to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true.
> to reproduce on a distributed cluster, these settings are needed:
> {code:java}
> spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
> spark.sql.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.enabled true
> spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code}
> code using scala to reproduce is:
> {code:java}
> import java.util.UUID
> import org.apache.spark.sql.functions.col
> import spark.implicits._
> val data = (1 to 1000000).toDS().map(i =>
> UUID.randomUUID().toString).persist()
> val left = data.map(k => (k, 1))
> val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!
> println("number of left " + left.count())
> println("number of right " + right.count())
> println("number of (left join right) " +
>   left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count()
> )
> val left1 = left
>   .toDF("key", "value1")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of left1 " + left1.count())
> val right1 = right
>   .toDF("key", "value2")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of right1 " + right1.count())
> println("number of (left1 join right1) " +  left1.join(right1, 
> "key").count()) // this gives incorrect result{code}
> this produces the following output:
> {code:java}
> number of left 1000000
> number of right 1000000
> number of (left join right) 1000000
> number of left1 1000000
> number of right1 1000000
> number of (left1 join right1) 859531 {code}
> note that the last number (the incorrect one) actually varies depending on 
> settings and cluster size etc.
>  
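> assuming the diagnosis above is right, disabling the cached-plan
> repartitioning should restore the 3.3 behavior as a stopgap (a workaround
> sketch, not a fix):
> {code:java}
> spark.conf.set("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "false")
> {code}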



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43406) enable spark sql to drop multiple partitions in one call

2023-09-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-43406.
-
Resolution: Duplicate

> enable spark sql to drop multiple partitions in one call
> 
>
> Key: SPARK-43406
> URL: https://issues.apache.org/jira/browse/SPARK-43406
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.2, 3.4.0
>Reporter: chenruotao
>Priority: Major
>
> Spark SQL currently cannot drop multiple partitions in one call; this patch
> fixes that. With this patch we can drop multiple partitions like this:
> alter table test.table_partition drop partition(dt<='2023-04-02',
> dt>='2023-03-31')
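> For comparison, a sketch of the enumerated form that Hive-style DDL already
> accepts (assuming the table is partitioned by dt):
> {code:sql}
> alter table test.table_partition drop partition (dt='2023-03-31'), partition (dt='2023-04-01');
> {code}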



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43406) enable spark sql to drop multiple partitions in one call

2023-09-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43406:

Target Version/s:   (was: 4.0.0)

> enable spark sql to drop multiple partitions in one call
> 
>
> Key: SPARK-43406
> URL: https://issues.apache.org/jira/browse/SPARK-43406
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.2, 3.4.0
>Reporter: chenruotao
>Priority: Major
>
> Spark SQL currently cannot drop multiple partitions in one call; this patch
> fixes that. With this patch we can drop multiple partitions like this:
> alter table test.table_partition drop partition(dt<='2023-04-02',
> dt>='2023-03-31')



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43406) enable spark sql to drop multiple partitions in one call

2023-09-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43406:

Fix Version/s: (was: 3.5.0)

> enable spark sql to drop multiple partitions in one call
> 
>
> Key: SPARK-43406
> URL: https://issues.apache.org/jira/browse/SPARK-43406
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.2, 3.4.0
>Reporter: chenruotao
>Priority: Major
>
> Spark SQL currently cannot drop multiple partitions in one call; this patch
> fixes that. With this patch we can drop multiple partitions like this:
> alter table test.table_partition drop partition(dt<='2023-04-02',
> dt>='2023-03-31')



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43406) enable spark sql to drop multiple partitions in one call

2023-09-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43406:

Target Version/s: 4.0.0

> enable spark sql to drop multiple partitions in one call
> 
>
> Key: SPARK-43406
> URL: https://issues.apache.org/jira/browse/SPARK-43406
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.2, 3.4.0
>Reporter: chenruotao
>Priority: Major
>
> Spark SQL currently cannot drop multiple partitions in one call; this patch
> fixes that. With this patch we can drop multiple partitions like this:
> alter table test.table_partition drop partition(dt<='2023-04-02',
> dt>='2023-03-31')



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (HIVE-27665) Change Filter Parser on HMS to allow backticks

2023-09-13 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764944#comment-17764944
 ] 

Yuming Wang commented on HIVE-27665:


PR: https://github.com/apache/hive/pull/4667

> Change Filter Parser on HMS to allow backticks
> --
>
> Key: HIVE-27665
> URL: https://issues.apache.org/jira/browse/HIVE-27665
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>
> The PartitionFilter parser on HMS does not allow backticks. This is
> currently causing failures for a customer that has a column named 'date',
> which is a keyword.
> There is more work to be done if we want the HS2 client to support filters
> with backticked columns, but that will be done in a different Jira.
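> A sketch of the shape that trips the parser, assuming a partition column
> named `date`:
> {code:sql}
> -- the partition filter pushed to HMS needs backticks around the keyword column
> select * from t where `date` = '2023-01-01';
> {code}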



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (SPARK-45089) Remove obsolete repo of DB2 JDBC driver

2023-09-07 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-45089.
-
Fix Version/s: 4.0.0
 Assignee: Cheng Pan
   Resolution: Fixed

Issue resolved by pull request 42820
https://github.com/apache/spark/pull/42820

> Remove obsolete repo of DB2 JDBC driver
> ---
>
> Key: SPARK-45089
> URL: https://issues.apache.org/jira/browse/SPARK-45089
> Project: Spark
>  Issue Type: Test
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45071:

Fix Version/s: 3.5.1
   (was: 3.5.0)

> Optimize the processing speed of `BinaryArithmetic#dataType` when processing 
> multi-column data
> --
>
> Key: SPARK-45071
> URL: https://issues.apache.org/jira/browse/SPARK-45071
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: ming95
>Assignee: ming95
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> Since `BinaryArithmetic#dataType` will recursively process the datatype of 
> each node, the driver will be very slow when multiple columns are processed.
> For example, the following code:
> {code:java}
> import spark.implicits._
> import scala.util.Random
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions.{expr, sum}
> import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
>
> val N = 30
> val M = 100
> val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString)
> val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5))
> val schema = StructType(columns.map(StructField(_, IntegerType)))
> val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_)))
> val df = spark.createDataFrame(rdd, schema)
> val colExprs = columns.map(sum(_))
> // generate a new column that adds up the other 30 columns
> df.withColumn("new_col_sum", expr(columns.mkString(" + ")))
> {code}
>
> This code takes a few minutes for the driver to execute on Spark 3.4, but
> only a few seconds on Spark 3.2. Related issue: SPARK-39316
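> A mitigation sketch (an assumption, not the committed fix), reusing df and
> columns from the snippet above: build the sum as a balanced tree instead of
> one deeply left-nested chain, so each dataType call recurses over roughly
> log(N) levels rather than N.
> {code:java}
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.functions.col
>
> // combine columns pairwise until one remains, halving the expression depth each pass
> def balancedSum(cols: Seq[Column]): Column =
>   if (cols.size == 1) cols.head
>   else balancedSum(cols.grouped(2).map(_.reduce(_ + _)).toSeq)
>
> df.withColumn("new_col_sum", balancedSum(columns.map(col)))
> {code}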



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-45071.
-
Fix Version/s: 3.5.0
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 42804
[https://github.com/apache/spark/pull/42804]

> Optimize the processing speed of `BinaryArithmetic#dataType` when processing 
> multi-column data
> --
>
> Key: SPARK-45071
> URL: https://issues.apache.org/jira/browse/SPARK-45071
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: ming95
>Assignee: ming95
>Priority: Major
> Fix For: 3.5.0, 4.0.0, 3.4.2
>
>
> Since `BinaryArithmetic#dataType` will recursively process the datatype of 
> each node, the driver will be very slow when multiple columns are processed.
> For example, the following code:
> {code:java}
> import spark.implicits._
> import scala.util.Random
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions.{expr, sum}
> import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
>
> val N = 30
> val M = 100
> val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString)
> val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5))
> val schema = StructType(columns.map(StructField(_, IntegerType)))
> val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_)))
> val df = spark.createDataFrame(rdd, schema)
> val colExprs = columns.map(sum(_))
> // generate a new column that adds up the other 30 columns
> df.withColumn("new_col_sum", expr(columns.mkString(" + ")))
> {code}
>
> This code takes a few minutes for the driver to execute on Spark 3.4, but
> only a few seconds on Spark 3.2. Related issue: SPARK-39316



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45071) Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-45071:
---

Assignee: ming95

> Optimize the processing speed of `BinaryArithmetic#dataType` when processing 
> multi-column data
> --
>
> Key: SPARK-45071
> URL: https://issues.apache.org/jira/browse/SPARK-45071
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: ming95
>Assignee: ming95
>Priority: Major
>
> Since `BinaryArithmetic#dataType` will recursively process the datatype of 
> each node, the driver will be very slow when multiple columns are processed.
> For example, the following code:
> {code:java}
> import spark.implicits._
> import scala.util.Random
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions.{expr, sum}
> import org.apache.spark.sql.types.{StructType, StructField, IntegerType}
>
> val N = 30
> val M = 100
> val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString)
> val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5))
> val schema = StructType(columns.map(StructField(_, IntegerType)))
> val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_)))
> val df = spark.createDataFrame(rdd, schema)
> val colExprs = columns.map(sum(_))
> // generate a new column that adds up the other 30 columns
> df.withColumn("new_col_sum", expr(columns.mkString(" + ")))
> {code}
>
> This code takes a few minutes for the driver to execute on Spark 3.4, but
> only a few seconds on Spark 3.2. Related issue: SPARK-39316



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45020) org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'default' not found (state=08S01,code=0)

2023-09-04 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45020:

Fix Version/s: (was: 3.1.0)

> org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 
> 'default' not found (state=08S01,code=0)
> -
>
> Key: SPARK-45020
> URL: https://issues.apache.org/jira/browse/SPARK-45020
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Sruthi Mooriyathvariam
>Priority: Minor
>
> An alert fires when a Spark 3.1 cluster is created using a metastore shared
> with Spark 2.4. The alert says the default database does not exist. This is
> misleading, so we need to suppress it.
> In the class SessionCatalog.scala, the method requireDbExists() does not
> handle the case where db = defaultDB. Handling it there would suppress this
> misleading alert.
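> A hypothetical sketch of the suggested guard (names assumed from the
> description above, not from the actual patch):
> {code:java}
> // in SessionCatalog.scala: skip the existence check for the default database
> private def requireDbExists(db: String): Unit = {
>   if (db != SessionCatalog.DEFAULT_DATABASE && !databaseExists(db)) {
>     throw new NoSuchDatabaseException(db)
>   }
> }
> {code}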



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44846) PushFoldableIntoBranches in complex grouping expressions may cause bindReference error

2023-09-04 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-44846:
---

Assignee: zhuml

> PushFoldableIntoBranches in complex grouping expressions may cause 
> bindReference error
> --
>
> Key: SPARK-44846
> URL: https://issues.apache.org/jira/browse/SPARK-44846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: zhuml
>Assignee: zhuml
>Priority: Major
>
> SQL:
> {code:java}
> select c*2 as d from
> (select if(b > 1, 1, b) as c from
> (select if(a < 0, 0, a) as b from t group by b) t1
> group by c) t2 {code}
> ERROR:
> {code:java}
> Couldn't find _groupingexpression#15 in [if ((_groupingexpression#15 > 1)) 1 
> else _groupingexpression#15#16]
> java.lang.IllegalStateException: Couldn't find _groupingexpression#15 in [if 
> ((_groupingexpression#15 > 1)) 1 else _groupingexpression#15#16]
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
>     at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1241)
>     at 
> org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1240)
>     at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:653)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren(TreeNode.scala:1272)
>     at 
> org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren$(TreeNode.scala:1271)
>     at 
> org.apache.spark.sql.catalyst.expressions.If.mapChildren(conditionalExpressions.scala:41)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
>     at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
>     at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:405)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
>     at scala.collection.immutable.List.map(List.scala:293)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:94)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:360)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:538)
>     at 
> org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce(AggregateCodegenSupport.scala:69)
>     at 
> org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce$(AggregateCodegenSupport.scala:65)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:49)
>     at 
> org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:97)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>     at 
> org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:92)
>     at 
> org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:92)
>     at 

[jira] [Resolved] (SPARK-44846) PushFoldableIntoBranches in complex grouping expressions may cause bindReference error

2023-09-04 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44846.
-
Fix Version/s: 3.5.0
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 42633
[https://github.com/apache/spark/pull/42633]

> PushFoldableIntoBranches in complex grouping expressions may cause 
> bindReference error
> --
>
> Key: SPARK-44846
> URL: https://issues.apache.org/jira/browse/SPARK-44846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: zhuml
>Assignee: zhuml
>Priority: Major
> Fix For: 3.5.0, 4.0.0, 3.4.2
>
>
> SQL:
> {code:java}
> select c*2 as d from
> (select if(b > 1, 1, b) as c from
> (select if(a < 0, 0, a) as b from t group by b) t1
> group by c) t2 {code}
> ERROR:
> {code:java}
> Couldn't find _groupingexpression#15 in [if ((_groupingexpression#15 > 1)) 1 
> else _groupingexpression#15#16]
> java.lang.IllegalStateException: Couldn't find _groupingexpression#15 in [if 
> ((_groupingexpression#15 > 1)) 1 else _groupingexpression#15#16]
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
>     at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1241)
>     at 
> org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1240)
>     at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:653)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren(TreeNode.scala:1272)
>     at 
> org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren$(TreeNode.scala:1271)
>     at 
> org.apache.spark.sql.catalyst.expressions.If.mapChildren(conditionalExpressions.scala:41)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
>     at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
>     at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:405)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
>     at scala.collection.immutable.List.map(List.scala:293)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:94)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:360)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:538)
>     at 
> org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce(AggregateCodegenSupport.scala:69)
>     at 
> org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce$(AggregateCodegenSupport.scala:65)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:49)
>     at 
> org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:97)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>  

[jira] [Commented] (HIVE-27667) Fix get partitions with max_parts

2023-09-02 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761512#comment-17761512
 ] 

Yuming Wang commented on HIVE-27667:


https://github.com/apache/hive/pull/4662

> Fix get partitions with max_parts
> -
>
> Key: HIVE-27667
> URL: https://issues.apache.org/jira/browse/HIVE-27667
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27667) Fix get partitions with max_parts

2023-09-02 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27667:
--

 Summary: Fix get partitions with max_parts
 Key: HIVE-27667
 URL: https://issues.apache.org/jira/browse/HIVE-27667
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0-beta-1
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27660) Update some test results for branch-2.3

2023-08-31 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27660:
--

 Summary: Update some test results for branch-2.3
 Key: HIVE-27660
 URL: https://issues.apache.org/jira/browse/HIVE-27660
 Project: Hive
  Issue Type: Test
  Components: Test
Affects Versions: 2.3.10
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27659) Make partition order configurable if we are not returning all partitions

2023-08-31 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27659:
--

 Summary: Make partition order configurable if we are not returning 
all partitions
 Key: HIVE-27659
 URL: https://issues.apache.org/jira/browse/HIVE-27659
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0-beta-1
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (SPARK-44892) Add official image Dockerfile for Spark 3.3.3

2023-08-22 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44892:

Fix Version/s: (was: 4.0.0)

> Add official image Dockerfile for Spark 3.3.3
> -
>
> Key: SPARK-44892
> URL: https://issues.apache.org/jira/browse/SPARK-44892
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 3.3.3
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44892) Add official image Dockerfile for Spark 3.3.3

2023-08-22 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44892.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 54
[https://github.com/apache/spark-docker/pull/54]

> Add official image Dockerfile for Spark 3.3.3
> -
>
> Key: SPARK-44892
> URL: https://issues.apache.org/jira/browse/SPARK-44892
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 3.3.3
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44892) Add official image Dockerfile for Spark 3.3.3

2023-08-22 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-44892:
---

Assignee: Yuming Wang

> Add official image Dockerfile for Spark 3.3.3
> -
>
> Key: SPARK-44892
> URL: https://issues.apache.org/jira/browse/SPARK-44892
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 3.3.3
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44892) Add official image Dockerfile for Spark 3.3.3

2023-08-21 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44892:
---

 Summary: Add official image Dockerfile for Spark 3.3.3
 Key: SPARK-44892
 URL: https://issues.apache.org/jira/browse/SPARK-44892
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Docker
Affects Versions: 3.3.3
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44813) The JIRA Python misses our assignee when it searches user again

2023-08-21 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44813:

Fix Version/s: 3.3.4
   (was: 3.3.3)

> The JIRA Python misses our assignee when it searches user again
> ---
>
> Key: SPARK-44813
> URL: https://issues.apache.org/jira/browse/SPARK-44813
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.4.2, 3.5.0, 4.0.0, 3.3.4
>
>
> {code:java}
> >>> assignee = asf_jira.user("yao")
> >>> "SPARK-44801"'SPARK-44801'
> >>> asf_jira.assign_issue(issue.key, assignee.name)
> response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' 
> cannot be assigned issues."}} {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44857) Fix getBaseURI error in Spark Worker LogPage UI buttons

2023-08-21 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44857:

Fix Version/s: 3.3.4
   (was: 3.3.3)

> Fix getBaseURI error in Spark Worker LogPage UI buttons
> ---
>
> Key: SPARK-44857
> URL: https://issues.apache.org/jira/browse/SPARK-44857
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.2.0, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.2, 3.5.0, 4.0.0, 3.3.4
>
> Attachments: Screenshot 2023-08-17 at 2.38.45 PM.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info

2023-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-44880:
---

Assignee: Kent Yao

> Remove unnecessary curly braces at the end of the thread locks info
> ---
>
> Key: SPARK-44880
> URL: https://issues.apache.org/jira/browse/SPARK-44880
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Remove unnecessary curly braces at the end of the thread locks info



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info

2023-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44880.
-
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42571
[https://github.com/apache/spark/pull/42571]

> Remove unnecessary curly braces at the end of the thread locks info
> ---
>
> Key: SPARK-44880
> URL: https://issues.apache.org/jira/browse/SPARK-44880
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Kent Yao
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Remove unnecessary curly braces at the end of the thread locks info



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44880) Remove unnecessary curly braces at the end of the thread locks info

2023-08-19 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44880:

Fix Version/s: 3.5.1
   (was: 3.5.0)

> Remove unnecessary curly braces at the end of the thread locks info
> ---
>
> Key: SPARK-44880
> URL: https://issues.apache.org/jira/browse/SPARK-44880
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 4.0.0, 3.5.1
>
>
> Remove unnecessary curly braces at the end of the thread locks info



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44792) Upgrade curator to 5.2.0

2023-08-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44792.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42474
[https://github.com/apache/spark/pull/42474]

> Upgrade curator to 5.2.0
> 
>
> Key: SPARK-44792
> URL: https://issues.apache.org/jira/browse/SPARK-44792
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> https://issues.apache.org/jira/browse/HADOOP-17612
> https://issues.apache.org/jira/browse/HADOOP-18515



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44792) Upgrade curator to 5.2.0

2023-08-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-44792:
---

Assignee: Yuming Wang

> Upgrade curator to 5.2.0
> 
>
> Key: SPARK-44792
> URL: https://issues.apache.org/jira/browse/SPARK-44792
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> https://issues.apache.org/jira/browse/HADOOP-17612
> https://issues.apache.org/jira/browse/HADOOP-18515



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44792) Upgrade curator to 5.2.0

2023-08-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44792:

Description: 
https://issues.apache.org/jira/browse/HADOOP-17612
https://issues.apache.org/jira/browse/HADOOP-18515

  was:https://issues.apache.org/jira/browse/HADOOP-17612


> Upgrade curator to 5.2.0
> 
>
> Key: SPARK-44792
> URL: https://issues.apache.org/jira/browse/SPARK-44792
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> https://issues.apache.org/jira/browse/HADOOP-17612
> https://issues.apache.org/jira/browse/HADOOP-18515



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44792) Upgrade curator to 5.2.0

2023-08-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44792:

Description: https://issues.apache.org/jira/browse/HADOOP-17612

> Upgrade curator to 5.2.0
> 
>
> Key: SPARK-44792
> URL: https://issues.apache.org/jira/browse/SPARK-44792
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> https://issues.apache.org/jira/browse/HADOOP-17612



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44792) Upgrade curator to 5.2.0

2023-08-12 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44792:
---

 Summary: Upgrade curator to 5.2.0
 Key: SPARK-44792
 URL: https://issues.apache.org/jira/browse/SPARK-44792
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44700:

Fix Version/s: 3.3.0

> Rule OptimizeCsvJsonExprs should not be applied to expression like 
> from_json(regexp_replace)
> 
>
> Key: SPARK-44700
> URL: https://issues.apache.org/jira/browse/SPARK-44700
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: jiahong.li
>Priority: Minor
> Fix For: 3.3.0
>
>
> SQL like the following:
> select tmp.*
>  from
>  (select
>         device_id, ads_id,
>         from_json(regexp_replace(device_personas, '(?<=(\\{|,))"device_',
> '"user_device_'), ${device_schema}) as tmp
>         from input)
> ${device_schema} includes more than 100 fields.
> If the rule OptimizeCsvJsonExprs is applied, the regexp_replace expression
> will be invoked many times, which costs a lot of time.
>  
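> A possible mitigation sketch (an assumption, keeping the ${device_schema}
> placeholder as-is): materialize the regexp_replace once in an inner query
> before from_json; whether this actually avoids the duplication depends on
> whether the optimizer collapses the projections again.
> {code:sql}
> select tmp.*
> from (
>   select device_id, ads_id, from_json(cleaned, ${device_schema}) as tmp
>   from (
>     select device_id, ads_id,
>            regexp_replace(device_personas, '(?<=(\\{|,))"device_', '"user_device_') as cleaned
>     from input))
> {code}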



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44700.
-
Resolution: Fixed

Please upgrade Spark to the latest version to fix this issue.

> Rule OptimizeCsvJsonExprs should not be applied to expression like 
> from_json(regexp_replace)
> 
>
> Key: SPARK-44700
> URL: https://issues.apache.org/jira/browse/SPARK-44700
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: jiahong.li
>Priority: Minor
>
> SQL like the following:
> select tmp.*
>  from
>  (select
>         device_id, ads_id,
>         from_json(regexp_replace(device_personas, '(?<=(\\{|,))"device_',
> '"user_device_'), ${device_schema}) as tmp
>         from input)
> ${device_schema} includes more than 100 fields.
> If the rule OptimizeCsvJsonExprs is applied, the regexp_replace expression
> will be invoked many times, which costs a lot of time.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44700:

Affects Version/s: 3.1.1
   (was: 3.4.0)
   (was: 3.4.1)

> Rule OptimizeCsvJsonExprs should not be applied to expression like 
> from_json(regexp_replace)
> 
>
> Key: SPARK-44700
> URL: https://issues.apache.org/jira/browse/SPARK-44700
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: jiahong.li
>Priority: Minor
>
> SQL like the following:
> select tmp.*
>  from
>  (select
>         device_id, ads_id,
>         from_json(regexp_replace(device_personas, '(?<=(\\{|,))"device_',
> '"user_device_'), ${device_schema}) as tmp
>         from input)
> ${device_schema} includes more than 100 fields.
> If the rule OptimizeCsvJsonExprs is applied, the regexp_replace expression
> will be invoked many times, which costs a lot of time.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24087) Avoid shuffle when join keys are a super-set of bucket keys

2023-08-08 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752239#comment-17752239
 ] 

Yuming Wang commented on SPARK-24087:
-

Fixed by SPARK-35703.

> Avoid shuffle when join keys are a super-set of bucket keys
> ---
>
> Key: SPARK-24087
> URL: https://issues.apache.org/jira/browse/SPARK-24087
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: yucai
>Priority: Major
>  Labels: bulk-closed
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752023#comment-17752023
 ] 

Yuming Wang commented on SPARK-44719:
-

There are two ways to fix it:
1. Upgrade the built-in Hive to 2.3.10 with the following patches:
2. Revert SPARK-43225.

https://github.com/apache/hive/pull/4562
https://github.com/apache/hive/pull/4563
https://github.com/apache/hive/pull/4564
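
A classpath stopgap for affected deployments (an assumption, not one of the
two fixes above): put the old org.codehaus jackson jars back on the classpath,
since that is the package the failing UDF path still references.
{noformat}
spark-sql --jars jackson-core-asl-1.9.13.jar,jackson-mapper-asl-1.9.13.jar
{noformat}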

> NoClassDefFoundError when using Hive UDF
> 
>
> Key: SPARK-44719
> URL: https://issues.apache.org/jira/browse/SPARK-44719
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: HiveUDFs-1.0-SNAPSHOT.jar
>
>
> How to reproduce:
> {noformat}
> spark-sql (default)> add jar 
> /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
> Time taken: 0.413 seconds
> spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 
> 'net.petrabarus.hiveudfs.LongToIP';
> Time taken: 0.038 seconds
> spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
> 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT 
> long_to_ip(2130706433L) FROM range(10)]
> java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
>   at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44719:

Description: 
How to reproduce:
{noformat}
spark-sql (default)> add jar /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
Time taken: 0.413 seconds
spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 
'net.petrabarus.hiveudfs.LongToIP';
Time taken: 0.038 seconds
spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT 
long_to_ip(2130706433L) FROM range(10)]
java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
...
{noformat}


  was:
How to reproduce:
```
spark-sql (default)> add jar /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
Time taken: 0.413 seconds
spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 
'net.petrabarus.hiveudfs.LongToIP';
Time taken: 0.038 seconds
spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT 
long_to_ip(2130706433L) FROM range(10)]
java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
...
```


> NoClassDefFoundError when using Hive UDF
> 
>
> Key: SPARK-44719
> URL: https://issues.apache.org/jira/browse/SPARK-44719
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: HiveUDFs-1.0-SNAPSHOT.jar
>
>
> How to reproduce:
> {noformat}
> spark-sql (default)> add jar 
> /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
> Time taken: 0.413 seconds
> spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 
> 'net.petrabarus.hiveudfs.LongToIP';
> Time taken: 0.038 seconds
> spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
> 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT 
> long_to_ip(2130706433L) FROM range(10)]
> java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
>   at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44719:

Attachment: HiveUDFs-1.0-SNAPSHOT.jar

> NoClassDefFoundError when using Hive UDF
> 
>
> Key: SPARK-44719
> URL: https://issues.apache.org/jira/browse/SPARK-44719
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: HiveUDFs-1.0-SNAPSHOT.jar
>
>
> How to reproduce:
> ```
> spark-sql (default)> add jar 
> /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
> Time taken: 0.413 seconds
> spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 
> 'net.petrabarus.hiveudfs.LongToIP';
> Time taken: 0.038 seconds
> spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
> 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT 
> long_to_ip(2130706433L) FROM range(10)]
> java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
>   at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
> ...
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44719:
---

 Summary: NoClassDefFoundError when using Hive UDF
 Key: SPARK-44719
 URL: https://issues.apache.org/jira/browse/SPARK-44719
 Project: Spark
  Issue Type: Bug
  Components: Build, SQL
Affects Versions: 3.5.0
Reporter: Yuming Wang
 Attachments: HiveUDFs-1.0-SNAPSHOT.jar

How to reproduce:
```
spark-sql (default)> add jar /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar;
Time taken: 0.413 seconds
spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as 
'net.petrabarus.hiveudfs.LongToIP';
Time taken: 0.038 seconds
spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10);
23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT 
long_to_ip(2130706433L) FROM range(10)]
java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory
at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
...
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (HIVE-27581) Backport jackson upgrade related patch to branch-2.3

2023-08-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27581:
---
Description: 
2.9.4 -> 2.9.5: 
https://github.com/apache/hive/commit/33e208c0709fac5bd6380aacfba49448412d112b
2.9.5 -> 2.9.8: 
https://github.com/apache/hive/commit/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53
2.9.8 -> 2.9.9: 
https://github.com/apache/hive/commit/7fc5a88a149cf0767a5846cbb6ace22d8e99a63c
2.9.9 -> 2.10.0: 
https://github.com/apache/hive/commit/31935896a78f95ae0792ae7f29960d1b604fbe9d
2.10.0 -> 2.10.5: 
https://github.com/apache/hive/commit/aa5b6b7968d90d027c5336bf430719acbff70f68
2.10.5 -> 2.12.0: 
https://github.com/apache/hive/commit/1e8cc12f2d60973b7674813ae82c8f3372423d54
---
2.12.0 -> 2.12.7: 
https://github.com/apache/hive/commit/568ded4b22a020f4d2d3567f15b287b25a3f2b71
2.12.7 -> 2.13.5: 
https://github.com/apache/hive/commit/8236426ed7aa87430e82d47effe946e38fa1f7f2

  was:
2.9.4 -> 2.9.5: 
https://github.com/apache/hive/commit/33e208c0709fac5bd6380aacfba49448412d112b
2.9.5 -> 2.9.8: 
https://github.com/apache/hive/commit/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53
2.9.8 -> 2.9.9: 
https://github.com/apache/hive/commit/7fc5a88a149cf0767a5846cbb6ace22d8e99a63c
2.9.9 -> 2.10.0: 
https://github.com/apache/hive/commit/31935896a78f95ae0792ae7f29960d1b604fbe9d
2.10.0 -> 2.10.5: 
https://github.com/apache/hive/commit/aa5b6b7968d90d027c5336bf430719acbff70f68
2.10.5 -> 2.12.0: 
https://github.com/apache/hive/commit/1e8cc12f2d60973b7674813ae82c8f3372423d54
2.12.0 -> 2.12.7: 
https://github.com/apache/hive/commit/568ded4b22a020f4d2d3567f15b287b25a3f2b71
2.12.7 -> 2.13.5: 
https://github.com/apache/hive/commit/8236426ed7aa87430e82d47effe946e38fa1f7f2


> Backport jackson upgrade related patch to branch-2.3
> 
>
> Key: HIVE-27581
> URL: https://issues.apache.org/jira/browse/HIVE-27581
> Project: Hive
>  Issue Type: Task
>Reporter: Yuming Wang
>Priority: Major
>
> 2.9.4 -> 2.9.5: 
> https://github.com/apache/hive/commit/33e208c0709fac5bd6380aacfba49448412d112b
> 2.9.5 -> 2.9.8: 
> https://github.com/apache/hive/commit/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53
> 2.9.8 -> 2.9.9: 
> https://github.com/apache/hive/commit/7fc5a88a149cf0767a5846cbb6ace22d8e99a63c
> 2.9.9 -> 2.10.0: 
> https://github.com/apache/hive/commit/31935896a78f95ae0792ae7f29960d1b604fbe9d
> 2.10.0 -> 2.10.5: 
> https://github.com/apache/hive/commit/aa5b6b7968d90d027c5336bf430719acbff70f68
> 2.10.5 -> 2.12.0: 
> https://github.com/apache/hive/commit/1e8cc12f2d60973b7674813ae82c8f3372423d54
> ---
> 2.12.0 -> 2.12.7: 
> https://github.com/apache/hive/commit/568ded4b22a020f4d2d3567f15b287b25a3f2b71
> 2.12.7 -> 2.13.5: 
> https://github.com/apache/hive/commit/8236426ed7aa87430e82d47effe946e38fa1f7f2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27581) Backport jackson upgrade related patch to branch-2.3

2023-08-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-27581:
---
Description: 
2.9.4 -> 2.9.5: 
https://github.com/apache/hive/commit/33e208c0709fac5bd6380aacfba49448412d112b
2.9.5 -> 2.9.8: 
https://github.com/apache/hive/commit/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53
2.9.8 -> 2.9.9: 
https://github.com/apache/hive/commit/7fc5a88a149cf0767a5846cbb6ace22d8e99a63c
2.9.9 -> 2.10.0: 
https://github.com/apache/hive/commit/31935896a78f95ae0792ae7f29960d1b604fbe9d
2.10.0 -> 2.10.5: 
https://github.com/apache/hive/commit/aa5b6b7968d90d027c5336bf430719acbff70f68
2.10.5 -> 2.12.0: 
https://github.com/apache/hive/commit/1e8cc12f2d60973b7674813ae82c8f3372423d54
2.12.0 -> 2.12.7: 
https://github.com/apache/hive/commit/568ded4b22a020f4d2d3567f15b287b25a3f2b71
2.12.7 -> 2.13.5: 
https://github.com/apache/hive/commit/8236426ed7aa87430e82d47effe946e38fa1f7f2

> Backport jackson upgrade related patch to branch-2.3
> 
>
> Key: HIVE-27581
> URL: https://issues.apache.org/jira/browse/HIVE-27581
> Project: Hive
>  Issue Type: Task
>Reporter: Yuming Wang
>Priority: Major
>
> 2.9.4 -> 2.9.5: 
> https://github.com/apache/hive/commit/33e208c0709fac5bd6380aacfba49448412d112b
> 2.9.5 -> 2.9.8: 
> https://github.com/apache/hive/commit/2fa22bf360898dc8fd1408bfcc96e1c6aeaf9a53
> 2.9.8 -> 2.9.9: 
> https://github.com/apache/hive/commit/7fc5a88a149cf0767a5846cbb6ace22d8e99a63c
> 2.9.9 -> 2.10.0: 
> https://github.com/apache/hive/commit/31935896a78f95ae0792ae7f29960d1b604fbe9d
> 2.10.0 -> 2.10.5: 
> https://github.com/apache/hive/commit/aa5b6b7968d90d027c5336bf430719acbff70f68
> 2.10.5 -> 2.12.0: 
> https://github.com/apache/hive/commit/1e8cc12f2d60973b7674813ae82c8f3372423d54
> 2.12.0 -> 2.12.7: 
> https://github.com/apache/hive/commit/568ded4b22a020f4d2d3567f15b287b25a3f2b71
> 2.12.7 -> 2.13.5: 
> https://github.com/apache/hive/commit/8236426ed7aa87430e82d47effe946e38fa1f7f2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27581) Backport jackson upgrade related patch to branch-2.3

2023-08-08 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27581:
--

 Summary: Backport jackson upgrade related patch to branch-2.3
 Key: HIVE-27581
 URL: https://issues.apache.org/jira/browse/HIVE-27581
 Project: Hive
  Issue Type: Task
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27580) Backport HIVE-20071: Migrate to jackson 2.x and prevent usage

2023-08-08 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-27580:
--

 Summary: Backport HIVE-20071: Migrate to jackson 2.x and prevent 
usage
 Key: HIVE-27580
 URL: https://issues.apache.org/jira/browse/HIVE-27580
 Project: Hive
  Issue Type: Task
Reporter: Yuming Wang
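
For context, "migrate to jackson 2.x" here means replacing the Jackson 1.x (`org.codehaus.jackson`) packages, including the `TypeFactory` class missing in SPARK-44719 above, with their Jackson 2.x (`com.fasterxml.jackson`) equivalents. A minimal sketch of the package move, illustrative only and not taken from the actual HIVE-20071 diff:
```
// Before (Jackson 1.x, still referenced on branch-2.3):
//   import org.codehaus.jackson.map.ObjectMapper;
//   import org.codehaus.jackson.map.type.TypeFactory;
// After (Jackson 2.x):
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.type.TypeFactory;

public final class JacksonTwoSketch {
  public static void main(String[] args) {
    ObjectMapper mapper = new ObjectMapper();
    // TypeFactory moved with the package rename; the 2.x API is broadly similar.
    TypeFactory typeFactory = mapper.getTypeFactory();
    System.out.println(typeFactory.constructType(String.class));
  }
}
```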






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

