[jira] [Commented] (SPARK-41135) Rename UNSUPPORTED_EMPTY_LOCATION to INVALID_EMPTY_LOCATION

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633573#comment-17633573
 ] 

Apache Spark commented on SPARK-41135:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38650

> Rename UNSUPPORTED_EMPTY_LOCATION to INVALID_EMPTY_LOCATION
> ---
>
> Key: SPARK-41135
> URL: https://issues.apache.org/jira/browse/SPARK-41135
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The name of `UNSUPPORTED_EMPTY_LOCATION` can be improved, along with its
> error message
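
For context, a minimal PySpark sketch of the statement that surfaces this
error class (table name is hypothetical; the exact message text may differ
by version):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()

try:
    # An empty LOCATION clause should be rejected with this error class.
    spark.sql("CREATE TABLE t1 (id INT) USING parquet LOCATION ''")
except AnalysisException as e:
    print(e)  # expected to reference the (renamed) INVALID_EMPTY_LOCATION class
{code}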



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41135) Rename UNSUPPORTED_EMPTY_LOCATION to INVALID_EMPTY_LOCATION

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41135:


Assignee: Apache Spark

> Rename UNSUPPORTED_EMPTY_LOCATION to INVALID_EMPTY_LOCATION
> ---
>
> Key: SPARK-41135
> URL: https://issues.apache.org/jira/browse/SPARK-41135
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> The name of `UNSUPPORTED_EMPTY_LOCATION` can be improved, along with its
> error message



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41135) Rename UNSUPPORTED_EMPTY_LOCATION to INVALID_EMPTY_LOCATION

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41135:


Assignee: (was: Apache Spark)

> Rename UNSUPPORTED_EMPTY_LOCATION to INVALID_EMPTY_LOCATION
> ---
>
> Key: SPARK-41135
> URL: https://issues.apache.org/jira/browse/SPARK-41135
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The name of `UNSUPPORTED_EMPTY_LOCATION` can be improved, along with its
> error message



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41136) Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-14 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-41136:
-

 Summary: Shorten graceful shutdown time of 
ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process
 Key: SPARK-41136
 URL: https://issues.apache.org/jira/browse/SPARK-41136
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.3.1
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41137:
---

 Summary: Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
 Key: SPARK-41137
 URL: https://issues.apache.org/jira/browse/SPARK-41137
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


The current error class name does not make sense for the situation it describes.

We should fix the error class name and its error message to be more accurate.
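
For illustration, a hedged sketch of a query shape that can hit this error
class, assuming it fires for lateral joins of an unsupported join type such
as RIGHT OUTER (view name is hypothetical; the exact triggering query may
vary):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()
spark.range(3).createOrReplaceTempView("t1")

try:
    # Lateral correlation is only supported for INNER/LEFT/CROSS joins.
    spark.sql("""
        SELECT *
        FROM t1
        RIGHT OUTER JOIN LATERAL (SELECT t1.id * 2 AS doubled) ON true
    """).show()
except AnalysisException as e:
    print(e)  # expected to reference the (renamed) INVALID_LATERAL_JOIN_TYPE class
{code}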



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633589#comment-17633589
 ] 

Haejoon Lee commented on SPARK-41137:
-

I'm working on it

> Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
> 
>
> Key: SPARK-41137
> URL: https://issues.apache.org/jira/browse/SPARK-41137
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The current error class name does not make sense for the situation it 
> describes.
> We should fix the error class name and its error message to be more accurate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41136) Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41136:


Assignee: (was: Apache Spark)

> Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent 
> blocking shutdown process
> -
>
> Key: SPARK-41136
> URL: https://issues.apache.org/jira/browse/SPARK-41136
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41136) Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633595#comment-17633595
 ] 

Apache Spark commented on SPARK-41136:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/38651

> Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent 
> blocking shutdown process
> -
>
> Key: SPARK-41136
> URL: https://issues.apache.org/jira/browse/SPARK-41136
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41136) Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41136:


Assignee: Apache Spark

> Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent 
> blocking shutdown process
> -
>
> Key: SPARK-41136
> URL: https://issues.apache.org/jira/browse/SPARK-41136
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.1
>Reporter: Cheng Pan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633598#comment-17633598
 ] 

Apache Spark commented on SPARK-41137:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38652

> Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
> 
>
> Key: SPARK-41137
> URL: https://issues.apache.org/jira/browse/SPARK-41137
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The current error class name does not make sense for the situation it 
> describes.
> We should fix the error class name and its error message to be more accurate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41137:


Assignee: (was: Apache Spark)

> Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
> 
>
> Key: SPARK-41137
> URL: https://issues.apache.org/jira/browse/SPARK-41137
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The current error class name does not make sense for the situation it 
> describes.
> We should fix the error class name and its error message to be more accurate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41137:


Assignee: Apache Spark

> Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
> 
>
> Key: SPARK-41137
> URL: https://issues.apache.org/jira/browse/SPARK-41137
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> The current error class name does not make sense for the situation it 
> describes.
> We should fix the error class name and its error message to be more accurate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41128) Implement `DataFrame.fillna` and `DataFrame.na.fill`

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633605#comment-17633605
 ] 

Apache Spark commented on SPARK-41128:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38653

> Implement `DataFrame.fillna` and `DataFrame.na.fill`
> --
>
> Key: SPARK-41128
> URL: https://issues.apache.org/jira/browse/SPARK-41128
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41128) Implement `DataFrame.fillna` and `DataFrame.na.fill`

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41128:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Implement `DataFrame.fillna` and `DataFrame.na.fill`
> --
>
> Key: SPARK-41128
> URL: https://issues.apache.org/jira/browse/SPARK-41128
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41128) Implement `DataFrame.fillna` and `DataFrame.na.fill`

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41128:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Implement `DataFrame.fillna` and `DataFrame.na.fill`
> --
>
> Key: SPARK-41128
> URL: https://issues.apache.org/jira/browse/SPARK-41128
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41128) Implement `DataFrame.fillna` and `DataFrame.na.fill`

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633606#comment-17633606
 ] 

Apache Spark commented on SPARK-41128:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38653

> Implement `DataFrame.fillna` and `DataFrame.na.fill`
> --
>
> Key: SPARK-41128
> URL: https://issues.apache.org/jira/browse/SPARK-41128
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41013) Submitting a job with spark-3.1.2 in cluster mode fails with Could not initialize class com.github.luben.zstd.ZstdOutputStream

2022-11-14 Thread yutiantian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633613#comment-17633613
 ] 

yutiantian commented on SPARK-41013:


Root cause: -Djava.io.tmpdir=/data01/spark/tmp was configured, but /data01/spark/tmp 
did not have 777 permissions on some nodes. In cluster mode, when the driver ran on 
one of those nodes, it had no permission to unpack the zstd archive into 
/data01/spark/tmp, so the unpacking failed and produced the error above.
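
A hedged sketch of the implied workaround (the writable path is hypothetical;
in practice these JVM options are usually passed via --conf at spark-submit
time rather than in the builder):

{code:python}
from pyspark.sql import SparkSession

# Point java.io.tmpdir at a directory every node's YARN user can write to,
# so libzstd-jni can be unpacked successfully on the driver and executors.
spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions",
            "-Djava.io.tmpdir=/data01/spark/tmp-writable")
    .config("spark.executor.extraJavaOptions",
            "-Djava.io.tmpdir=/data01/spark/tmp-writable")
    .getOrCreate()
)
{code}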

> Submitting a job with spark-3.1.2 in cluster mode fails with Could not 
> initialize class com.github.luben.zstd.ZstdOutputStream
> 
>
> Key: SPARK-41013
> URL: https://issues.apache.org/jira/browse/SPARK-41013
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: yutiantian
>Priority: Major
>  Labels: libzstd-jni, spark.shuffle.mapStatus.compression.codec, 
> zstd
>
> When submitting a job with spark-3.1.2 in cluster mode, it fails with
> Could not initialize class com.github.luben.zstd.ZstdOutputStream. The detailed log is as follows:
> Exception in thread "map-output-dispatcher-0" Exception in thread 
> "map-output-dispatcher-2" java.lang.ExceptionInInitializerError: Cannot 
> unpack libzstd-jni: No such file or directory at 
> java.io.UnixFileSystem.createFileExclusively(Native Method) at 
> java.io.File.createTempFile(File.java:2024) at 
> com.github.luben.zstd.util.Native.load(Native.java:97) at 
> com.github.luben.zstd.util.Native.load(Native.java:55) at 
> com.github.luben.zstd.ZstdOutputStream.(ZstdOutputStream.java:16) at 
> org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>  at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>  at 
> org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
> org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72) at 
> org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>  at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Exception in thread 
> "map-output-dispatcher-7" Exception in thread "map-output-dispatcher-5" 
> java.lang.NoClassDefFoundError: Could not initialize class 
> com.github.luben.zstd.ZstdOutputStream at 
> org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>  at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>  at 
> org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
> org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72) at 
> org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>  at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) Exception in thread 
> "map-output-dispatcher-4" Exception in thread "map-output-dispatcher-3" 
> java.lang.NoClassDefFoundError: Could not initialize class 
> com.github.luben.zstd.ZstdOutputStream at 
> org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>  at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>  at 
> org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
> org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72) at 
> org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>  at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748) java.lang.NoClassDefFoundError: 
> Could not initialize class com.github.luben.zstd.ZstdOutputStream at 
> org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>  at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>  at 
> org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 

[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633627#comment-17633627
 ] 

Apache Spark commented on SPARK-41005:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38654

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41005) Arrow based collect

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633629#comment-17633629
 ] 

Apache Spark commented on SPARK-41005:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38654

> Arrow based collect
> ---
>
> Key: SPARK-41005
> URL: https://issues.apache.org/jira/browse/SPARK-41005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41138) DataFrame.na.fill should have the same argument types as DataFrame.fillna

2022-11-14 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41138:
-

 Summary: DataFrame.na.fill should have the same argument types as 
DataFrame.fillna
 Key: SPARK-41138
 URL: https://issues.apache.org/jira/browse/SPARK-41138
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
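
An illustrative sketch of the expected parity (values and schema are
arbitrary; the accepted argument types should follow DataFrame.fillna):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, None), (None, "b")], ["age", "name"])

# Both spellings should accept the same argument types, e.g. a scalar
# or a dict mapping column name -> replacement value.
df.fillna({"age": 0, "name": "unknown"}).show()
df.na.fill({"age": 0, "name": "unknown"}).show()  # should behave identically
{code}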






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41138) DataFrame.na.fill should have the same argument types as DataFrame.fillna

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633647#comment-17633647
 ] 

Apache Spark commented on SPARK-41138:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38655

> DataFrame.na.fill should have the same argument types as DataFrame.fillna
> 
>
> Key: SPARK-41138
> URL: https://issues.apache.org/jira/browse/SPARK-41138
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41138) DataFrame.na.fill should have the same argument types as DataFrame.fillna

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41138:


Assignee: (was: Apache Spark)

> DataFrame.na.fill should have the same argument types as DataFrame.fillna
> 
>
> Key: SPARK-41138
> URL: https://issues.apache.org/jira/browse/SPARK-41138
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41138) DataFrame.na.fill should have the same argument types as DataFrame.fillna

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41138:


Assignee: Apache Spark

> DataFrame.na.fill should have the same argument types as DataFrame.fillna
> 
>
> Key: SPARK-41138
> URL: https://issues.apache.org/jira/browse/SPARK-41138
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40686) Support data masking and redacting built-in functions

2022-11-14 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633668#comment-17633668
 ] 

Ranga Reddy commented on SPARK-40686:
-

Hi [~dtenedor] 

I am not able to find references/examples for the *phi-related* functions 
except the Okera website. Could you please share some references so that I 
can start work on those functions? 

> Support data masking and redacting built-in functions
> -
>
> Key: SPARK-40686
> URL: https://issues.apache.org/jira/browse/SPARK-40686
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> Support built-in data masking and redacting functions 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41139) Improve error message for PYTHON_UDF_IN_ON_CLAUSE

2022-11-14 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41139:
---

 Summary: Improve error message for PYTHON_UDF_IN_ON_CLAUSE
 Key: SPARK-41139
 URL: https://issues.apache.org/jira/browse/SPARK-41139
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


The error message of `PYTHON_UDF_IN_ON_CLAUSE` is not clear enough to let the 
user understand and solve the problem.
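
A hedged sketch of the situation the error guards against (the lambda and
join type are illustrative):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

spark = SparkSession.builder.getOrCreate()
left = spark.range(3)
right = spark.range(3)

# A Python UDF used directly in the ON clause of an outer join is not
# evaluable there, which is what raises PYTHON_UDF_IN_ON_CLAUSE.
same = udf(lambda a, b: a == b, BooleanType())

left.join(right, on=same(left.id, right.id), how="left_outer").show()
{code}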



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41140) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440

2022-11-14 Thread Max Gekk (Jira)
Max Gekk created SPARK-41140:


 Summary: Assign a name to the legacy error class 
_LEGACY_ERROR_TEMP_2440
 Key: SPARK-41140
 URL: https://issues.apache.org/jira/browse/SPARK-41140
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 3.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41140) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440

2022-11-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-41140:
-
Fix Version/s: (was: 3.4.0)

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440
> ---
>
> Key: SPARK-41140
> URL: https://issues.apache.org/jira/browse/SPARK-41140
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41140) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633691#comment-17633691
 ] 

Apache Spark commented on SPARK-41140:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38656

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440
> ---
>
> Key: SPARK-41140
> URL: https://issues.apache.org/jira/browse/SPARK-41140
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41140) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41140:


Assignee: Max Gekk  (was: Apache Spark)

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440
> ---
>
> Key: SPARK-41140
> URL: https://issues.apache.org/jira/browse/SPARK-41140
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41140) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41140:


Assignee: Apache Spark  (was: Max Gekk)

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_2440
> ---
>
> Key: SPARK-41140
> URL: https://issues.apache.org/jira/browse/SPARK-41140
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41139) Improve error message for PYTHON_UDF_IN_ON_CLAUSE

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633694#comment-17633694
 ] 

Apache Spark commented on SPARK-41139:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38657

> Improve error message for PYTHON_UDF_IN_ON_CLAUSE
> -
>
> Key: SPARK-41139
> URL: https://issues.apache.org/jira/browse/SPARK-41139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The error message of `PYTHON_UDF_IN_ON_CLAUSE` is not clear enough to let the 
> user understand and solve the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41139) Improve error message for PYTHON_UDF_IN_ON_CLAUSE

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41139:


Assignee: (was: Apache Spark)

> Improve error message for PYTHON_UDF_IN_ON_CLAUSE
> -
>
> Key: SPARK-41139
> URL: https://issues.apache.org/jira/browse/SPARK-41139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The error message of `PYTHON_UDF_IN_ON_CLAUSE` is not clear enough to let the 
> user understand and solve the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41139) Improve error message for PYTHON_UDF_IN_ON_CLAUSE

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41139:


Assignee: Apache Spark

> Improve error message for PYTHON_UDF_IN_ON_CLAUSE
> -
>
> Key: SPARK-41139
> URL: https://issues.apache.org/jira/browse/SPARK-41139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> The error message of `PYTHON_UDF_IN_ON_CLAUSE` is not clear enough to let the 
> user understand and solve the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633711#comment-17633711
 ] 

Apache Spark commented on SPARK-41109:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38658

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
> --
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41109) Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633712#comment-17633712
 ] 

Apache Spark commented on SPARK-41109:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38658

> Rename the error class _LEGACY_ERROR_TEMP_1216 to INVALID_LIKE_PATTERN
> --
>
> Key: SPARK-41109
> URL: https://issues.apache.org/jira/browse/SPARK-41109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41114) Support local data for LocalRelation

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633782#comment-17633782
 ] 

Apache Spark commented on SPARK-41114:
--

User 'dengziming' has created a pull request for this issue:
https://github.com/apache/spark/pull/38659

> Support local data for LocalRelation
> 
>
> Key: SPARK-41114
> URL: https://issues.apache.org/jira/browse/SPARK-41114
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41114) Support local data for LocalRelation

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41114:


Assignee: Apache Spark

> Support local data for LocalRelation
> 
>
> Key: SPARK-41114
> URL: https://issues.apache.org/jira/browse/SPARK-41114
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41114) Support local data for LocalRelation

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41114:


Assignee: (was: Apache Spark)

> Support local data for LocalRelation
> 
>
> Key: SPARK-41114
> URL: https://issues.apache.org/jira/browse/SPARK-41114
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39865) Show proper error messages on the overflow errors of table insert

2022-11-14 Thread Catalin Toda (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633915#comment-17633915
 ] 

Catalin Toda commented on SPARK-39865:
--

[~Gengliang.Wang] In our environment, on Spark 3.3, we see the following 
stack trace, caused by https://github.com/apache/spark/pull/37311:


{code:java}
java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.CaseWhen cannot be cast to 
org.apache.spark.sql.catalyst.expressions.AnsiCast
at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2362)
at 
org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2360)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1233)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1232)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:498)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:498)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:635)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:188)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.immutable.List.foreach(List.scala:431)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.immutable.List.map(List.scala:305)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:216)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:221)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:427)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:221)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:188)
at 
org.apache.spark.sql.catalyst.optimizer.PushFoldableIntoBranches$$anonfun$apply$15.applyOrElse(expressions.scala:645)
at 
org.apache.spark.sql.catalyst.optimizer.PushFoldableIntoBranches$$anonfun$apply$15.applyOrElse(expressions.scala:643)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.mapChi
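
A hedged repro sketch of the query shape that appears to produce this cast
failure (table and column names are hypothetical; it assumes the overflow
check wraps an insert projection whose top-level expression is a CASE WHEN
rather than a cast, matching the PushFoldableIntoBranches frames in the
trace above):

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.ansi.enabled", "true")
    .getOrCreate()
)

spark.sql("CREATE TABLE sink (c TINYINT) USING parquet")

# The CASE WHEN in the insert projection is the CaseWhen node that the
# stack trace shows being cast to AnsiCast by CheckOverflowInTableInsert.
spark.sql("""
    INSERT INTO sink
    SELECT CASE WHEN id > 2 THEN 1 ELSE 0 END FROM range(5)
""")
{code}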

[jira] [Commented] (SPARK-41073) Spark ThriftServer generate huge amounts of DelegationToken

2022-11-14 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633936#comment-17633936
 ] 

Erik Krogen commented on SPARK-41073:
-

I'm not very familiar with this area, but it seems you are trying to solve the 
same problem that is already discussed in SPARK-36328.

> Spark ThriftServer generate huge amounts of DelegationToken
> ---
>
> Key: SPARK-41073
> URL: https://issues.apache.org/jira/browse/SPARK-41073
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
> Attachments: SPARK-41073.proposal.A.draft.001.patch
>
>
> In our cluster, ZooKeeper nearly crashed. I found that the znodes under 
> /zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot increased quickly. 
> After some research, I found that some SQL jobs running on the Spark 
> ThriftServer obtain huge amounts of DelegationTokens.
> The reason is that in these Spark SQL jobs, every Hive partition acquires a 
> different delegation token. 
> HadoopRDDs in the ThriftServer can't share credentials from 
> CoarseGrainedSchedulerBackend::delegationTokens; we must share them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39601) AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-11-14 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-39601:
--
Description: 
I observed that some Spark applications successfully completed all jobs but 
failed during the shutdown phase with reason: Max number of executor failures 
(16) reached. The timeline is:

Driver - Job succeeded; Spark started the shutdown procedure.
```
2022-06-23 19:50:55 CST AbstractConnector INFO - Stopped 
Spark@74e9431b\{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
2022-06-23 19:50:55 CST SparkUI INFO - Stopped Spark web UI at 
http://hadoop2627.xxx.org:28446
2022-06-23 19:50:55 CST YarnClusterSchedulerBackend INFO - Shutting down all 
executors
```

Driver - A container allocation succeeded during the shutdown phase.
```
2022-06-23 19:52:21 CST YarnAllocator INFO - Launching container 
container_e94_1649986670278_7743380_02_25 on host hadoop4388.xxx.org for 
executor with ID 24 for ResourceProfile Id 0
```

Executor - The executor cannot connect to the driver endpoint because the 
driver has already stopped the endpoint.
```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
  at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
  at 
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
  at 
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:413)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
  at 
scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
  at scala.collection.immutable.Range.foreach(Range.scala:158)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:411)
  at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
  ... 4 more
Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find 
endpoint: spark://coarsegrainedschedu...@hadoop2627.xxx.org:21956
  at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1(NettyRpcEnv.scala:148)
  at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1$adapted(NettyRpcEnv.scala:144)
  at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
  at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  at org.apache.spark.util.ThreadUtils$$anon$1.execute(ThreadUtils.scala:99)
  at 
scala.concurrent.impl.ExecutionContextImpl$$anon$4.execute(ExecutionContextImpl.scala:138)
  at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
  at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
  at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
  at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
```

Driver - YarnAllocator received the container launch error message and treated 
it as `exitCausedByApp`
```
2022-06-23 19:52:27 CST YarnAllocator INFO - Completed container 
container_e94_1649986670278_7743380_02_25 on host: hadoop4388.xxx.org 
(state: COMPLETE, exit status: 1)
2022-06-23 19:52:27 CST YarnAllocator WARN - Container from a bad node: 
container_e94_1649986670278_7743380_02_25 on host: hadoop4388.xxx.org. Exit 
status: 1. Diagnostics: [2022-06-23 19:52:24.932]Exception from 
container-launch.
Container id: container_e94_1649986670278_7743380_02_25
Exit code: 1
Shell output: main : command provided 1
main : run as user is bdms_pm
main : requested yarn user is bdms_pm
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/mnt/dfs/2/yarn/local/nmPrivate/application_1649986670278_7743380/container_e94_1649986670278_7743380_02_25/container_e94_1649986670278_7743380_02_

[jira] [Updated] (SPARK-39601) AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-11-14 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-39601:
--
Description: 
I observed that some Spark applications successfully completed all jobs but 
failed during the shutdown phase with reason: Max number of executor failures 
(16) reached. The timeline is:

Driver - Job succeeded; Spark started the shutdown procedure.
{code:java}
2022-06-23 19:50:55 CST AbstractConnector INFO - Stopped 
Spark@74e9431b{HTTP/1.1, (http/1.1)}
{0.0.0.0:0}
2022-06-23 19:50:55 CST SparkUI INFO - Stopped Spark web UI at 
http://hadoop2627.xxx.org:28446
2022-06-23 19:50:55 CST YarnClusterSchedulerBackend INFO - Shutting down all 
executors
{code}
Driver - A container allocation succeeded during the shutdown phase.
{code:java}
2022-06-23 19:52:21 CST YarnAllocator INFO - Launching container 
container_e94_1649986670278_7743380_02_25 on host hadoop4388.xxx.org for 
executor with ID 24 for ResourceProfile Id 0{code}
Executor - The executor cannot connect to the driver endpoint because the 
driver has already stopped the endpoint.
{code:java}
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
  at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
  at 
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
  at 
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:413)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
  at 
scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
  at scala.collection.immutable.Range.foreach(Range.scala:158)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:411)
  at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
  ... 4 more
Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find 
endpoint: spark://coarsegrainedschedu...@hadoop2627.xxx.org:21956
  at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1(NettyRpcEnv.scala:148)
  at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1$adapted(NettyRpcEnv.scala:144)
  at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
  at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  at org.apache.spark.util.ThreadUtils$$anon$1.execute(ThreadUtils.scala:99)
  at 
scala.concurrent.impl.ExecutionContextImpl$$anon$4.execute(ExecutionContextImpl.scala:138)
  at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
  at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
  at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
  at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288){code}
Driver - YarnAllocator received the container launch error message and treated 
it as `exitCausedByApp`
{code:java}
2022-06-23 19:52:27 CST YarnAllocator INFO - Completed container 
container_e94_1649986670278_7743380_02_25 on host: hadoop4388.xxx.org 
(state: COMPLETE, exit status: 1)
2022-06-23 19:52:27 CST YarnAllocator WARN - Container from a bad node: 
container_e94_1649986670278_7743380_02_25 on host: hadoop4388.xxx.org. Exit 
status: 1. Diagnostics: [2022-06-23 19:52:24.932]Exception from 
container-launch.
Container id: container_e94_1649986670278_7743380_02_25
Exit code: 1
Shell output: main : command provided 1
main : run as user is bdms_pm
main : requested yarn user is bdms_pm
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/mnt/dfs/2/yarn/local/nmPrivate/application_1649986670278_7743380/container_e94_1649986670278_7743380_02_25/co

[jira] [Updated] (SPARK-39601) AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-11-14 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-39601:
--
Description: 
I observed some Spark Applications successfully completed all jobs but failed 
during the shutting down phase w/ reason: Max number of executor failures (16) 
reached, the timeline is

Driver - Job success, Spark starts shutting down procedure.
{code:java}
2022-06-23 19:50:55 CST AbstractConnector INFO - Stopped 
Spark@74e9431b{HTTP/1.1, (http/1.1)}
{0.0.0.0:0}
2022-06-23 19:50:55 CST SparkUI INFO - Stopped Spark web UI at 
http://hadoop2627.xxx.org:28446
2022-06-23 19:50:55 CST YarnClusterSchedulerBackend INFO - Shutting down all 
executors
{code}
Driver - A container allocate successful during shutting down phase.
{code:java}
2022-06-23 19:52:21 CST YarnAllocator INFO - Launching container 
container_e94_1649986670278_7743380_02_25 on host hadoop4388.xxx.org for 
executor with ID 24 for ResourceProfile Id 0{code}
Executor - The executor can not connect to driver endpoint because driver 
already stopped the endpoint.
{code:java}
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
  at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
  at 
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
  at 
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:413)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
  at 
scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
  at scala.collection.immutable.Range.foreach(Range.scala:158)
  at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
  at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:411)
  at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
  ... 4 more
Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find 
endpoint: spark://coarsegrainedschedu...@hadoop2627.xxx.org:21956
  at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1(NettyRpcEnv.scala:148)
  at 
org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1$adapted(NettyRpcEnv.scala:144)
  at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
  at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  at org.apache.spark.util.ThreadUtils$$anon$1.execute(ThreadUtils.scala:99)
  at 
scala.concurrent.impl.ExecutionContextImpl$$anon$4.execute(ExecutionContextImpl.scala:138)
  at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
  at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
  at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
  at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288){code}
Driver - YarnAllocator receives the container launch error message and treats it 
as `exitCausedByApp`.
{code:java}
2022-06-23 19:52:27 CST YarnAllocator INFO - Completed container 
container_e94_1649986670278_7743380_02_25 on host: hadoop4388.xxx.org 
(state: COMPLETE, exit status: 1)
2022-06-23 19:52:27 CST YarnAllocator WARN - Container from a bad node: 
container_e94_1649986670278_7743380_02_25 on host: hadoop4388.xxx.org. Exit 
status: 1. Diagnostics: [2022-06-23 19:52:24.932]Exception from 
container-launch.
Container id: container_e94_1649986670278_7743380_02_25
Exit code: 1
Shell output: main : command provided 1
main : run as user is bdms_pm
main : requested yarn user is bdms_pm
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/mnt/dfs/2/yarn/local/nmPrivate/application_1649986670278_7743380/container_e94_1649986670278_7743380_02_25/co
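
To make the direction of a fix concrete, here is a minimal, self-contained 
sketch (an illustration only, not the actual patch; the `driverShuttingDown` 
flag is an assumed input) of classifying container exits differently once the 
driver is shutting down:
{code:scala}
// Sketch only: decide whether a container exit should count toward
// spark.yarn.max.executor.failures, given the driver's shutdown state.
object ExitClassification {
  def exitCausedByApp(exitStatus: Int, driverShuttingDown: Boolean): Boolean =
    if (driverShuttingDown) false // launch failures during shutdown are expected churn
    else exitStatus != 0          // otherwise a non-zero exit is app-caused

  def main(args: Array[String]): Unit = {
    assert(!exitCausedByApp(exitStatus = 1, driverShuttingDown = true))
    assert(exitCausedByApp(exitStatus = 1, driverShuttingDown = false))
  }
}
{code}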

[jira] [Commented] (SPARK-40686) Support data masking and redacting built-in functions

2022-11-14 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633978#comment-17633978
 ] 

Daniel commented on SPARK-40686:


Hi [~rangareddy.av...@gmail.com] [~vinodkc], for the PHI-related functions, e.g. 
phi_date: "PHI" stands for "protected health information." The idea would be to 
support redacting dates, etc. according to these HIPAA rules: 

[https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html]

While useful, the more basic functions like "mask_default" and "null" would 
probably be higher priority initially. (Please post if you feel differently.)
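
For the date case specifically, the Safe Harbor method requires removing all 
elements of dates more specific than the year. A self-contained sketch of that 
redaction (the name phi_date and this exact behavior are assumptions, not a 
settled design):
{code:scala}
import java.time.LocalDate

// Sketch only: keep the year, drop month and day.
object PhiDate {
  def phiDate(d: LocalDate): LocalDate = LocalDate.of(d.getYear, 1, 1)

  def main(args: Array[String]): Unit = {
    println(phiDate(LocalDate.of(1987, 6, 15))) // prints 1987-01-01
  }
}
{code}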

> Support data masking and redacting built-in functions
> -
>
> Key: SPARK-40686
> URL: https://issues.apache.org/jira/browse/SPARK-40686
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> Support built-in data masking and redacting functions 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41096) Support reading parquet FIXED_LEN_BYTE_ARRAY type

2022-11-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-41096:


Assignee: Kazuyuki Tanimura

> Support reading parquet FIXED_LEN_BYTE_ARRAY type
> -
>
> Key: SPARK-41096
> URL: https://issues.apache.org/jira/browse/SPARK-41096
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
>
> Parquet has a FIXED_LEN_BYTE_ARRAY (FLBA) data type. However, the Spark Parquet 
> reader currently cannot handle it. Read it as BinaryType in Spark.
> The Iceberg Parquet reader, for example, can handle FLBA. This improvement should 
> reduce the gap between the Spark and Iceberg Parquet readers.
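
For illustration, a Parquet schema exercising the type (a sketch assuming 
parquet-mr is on the classpath; the field name is made up), which the reader 
would map to BinaryType:
{code:scala}
import org.apache.parquet.schema.MessageTypeParser

// A 16-byte fixed-length field, e.g. a UUID stored as FIXED_LEN_BYTE_ARRAY.
val schema = MessageTypeParser.parseMessageType(
  """message example {
    |  required fixed_len_byte_array(16) uuid_col;
    |}""".stripMargin)
println(schema)
{code}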



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41096) Support reading parquet FIXED_LEN_BYTE_ARRAY type

2022-11-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-41096.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38628
[https://github.com/apache/spark/pull/38628]

> Support reading parquet FIXED_LEN_BYTE_ARRAY type
> -
>
> Key: SPARK-41096
> URL: https://issues.apache.org/jira/browse/SPARK-41096
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
> Fix For: 3.4.0
>
>
> Parquet has a FIXED_LEN_BYTE_ARRAY (FLBA) data type. However, the Spark Parquet 
> reader currently cannot handle it. Read it as BinaryType in Spark.
> The Iceberg Parquet reader, for example, can handle FLBA. This improvement should 
> reduce the gap between the Spark and Iceberg Parquet readers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-39865) Show proper error messages on the overflow errors of table insert

2022-11-14 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634062#comment-17634062
 ] 

Gengliang Wang edited comment on SPARK-39865 at 11/14/22 10:05 PM:
---

[~catalinii] Could you provide full reproduce steps? Thanks in advance!


was (Author: gengliang.wang):
[~catalinii] Could you provide full reproduce steps? Thanks in advance!.

> Show proper error messages on the overflow errors of table insert
> -
>
> Key: SPARK-39865
> URL: https://issues.apache.org/jira/browse/SPARK-39865
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.1
>
>
> In Spark 3.3, the error message of ANSI CAST is improved. However, the table 
> insertion is using the same CAST expression:
> {code:java}
> > create table tiny(i tinyint);
> > insert into tiny values (1000);
> org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of 
> the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` 
> to tolerate overflow and return NULL instead. If necessary set 
> "spark.sql.ansi.enabled" to "false" to bypass this error.
> {code}
>  
> Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to 
> bypass this error` doesn't help at all. This PR is to fix the error message. 
> After changes, the error message of this example will become:
> {code:java}
> org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] 
> Fail to insert a value of "INT" type into the "TINYINT" type column `i` due 
> to an overflow. Use `try_cast` on the input value to tolerate overflow and 
> return NULL instead.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39865) Show proper error messages on the overflow errors of table insert

2022-11-14 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634062#comment-17634062
 ] 

Gengliang Wang commented on SPARK-39865:


[~catalinii] Could you provide full reproduce steps? Thanks in advance!.

> Show proper error messages on the overflow errors of table insert
> -
>
> Key: SPARK-39865
> URL: https://issues.apache.org/jira/browse/SPARK-39865
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.1
>
>
> In Spark 3.3, the error message of ANSI CAST is improved. However, the table 
> insertion is using the same CAST expression:
> {code:java}
> > create table tiny(i tinyint);
> > insert into tiny values (1000);
> org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of 
> the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` 
> to tolerate overflow and return NULL instead. If necessary set 
> "spark.sql.ansi.enabled" to "false" to bypass this error.
> {code}
>  
> Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to 
> bypass this error` doesn't help at all. This PR is to fix the error message. 
> After changes, the error message of this example will become:
> {code:java}
> org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] 
> Fail to insert a value of "INT" type into the "TINYINT" type column `i` due 
> to an overflow. Use `try_cast` on the input value to tolerate overflow and 
> return NULL instead.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40199) Spark throws NPE without useful message when NULL value appears in non-null schema

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634084#comment-17634084
 ] 

Apache Spark commented on SPARK-40199:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/38660

> Spark throws NPE without useful message when NULL value appears in non-null 
> schema
> --
>
> Key: SPARK-40199
> URL: https://issues.apache.org/jira/browse/SPARK-40199
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.2
>Reporter: Erik Krogen
>Priority: Major
>
> Currently in some cases, if Spark encounters a NULL value where the schema 
> indicates that the column/field should be non-null, it will throw a 
> {{NullPointerException}} with no message and thus no way to debug further. 
> This can happen, for example, if you have a UDF which is erroneously marked 
> as {{asNonNullable()}}, or if you read input data where the actual values 
> don't match the schema (which could happen e.g. with Avro if the reader 
> provides a schema declaring non-null although the data was written with null 
> values).
> As an example of how to reproduce:
> {code:scala}
> val badUDF = spark.udf.register[String, Int]("bad_udf", in => 
> null).asNonNullable()
> Seq(1, 2).toDF("c1").select(badUDF($"c1")).collect()
> {code}
> This throws an exception like:
> {code}
> Driver stacktrace:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in 
> stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 
> (TID 1) (xx executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
>   at org.apache.spark.scheduler.Task.run(Task.scala:139)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> As a user, it is very confusing -- it looks like there is a bug in Spark. We 
> have had many users report such problems, and though we can guide them to a 
> schema-data mismatch, there is no indication of what field might contain the 
> bad values, so a laborious data exploration process is required to find and 
> remedy it.
> We should provide a better error message in such cases.
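
A minimal sketch of the improvement's spirit (an illustration, not the actual 
patch; the helper name is invented): fail with the offending field's name 
instead of a bare NullPointerException:
{code:scala}
// Sketch only: a null check that names the non-nullable field it guards.
def writeNonNull[T <: AnyRef](fieldName: String, value: T): T = {
  if (value == null) {
    throw new NullPointerException(
      s"Null value appeared in non-nullable field '$fieldName'; " +
        "check the declared schema or the UDF's asNonNullable() marking.")
  }
  value
}
{code}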



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-14 Thread Asif (Jira)
Asif created SPARK-41141:


 Summary: avoid introducing a new aggregate expression in the 
analysis phase when subquery is referencing it
 Key: SPARK-41141
 URL: https://issues.apache.org/jira/browse/SPARK-41141
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.1
Reporter: Asif


Currently, the analyzer-phase rules for a subquery referencing an aggregate 
expression in the outer query avoid introducing a new aggregate expression only 
for a single-level aggregate function; they introduce a new aggregate expression 
for nested aggregate functions.

It is possible to avoid adding this extra aggregate expression, at least if the 
outer projection involving the aggregate function is exactly the same as the one 
used in the subquery, or if the outer query's projection involving the aggregate 
function is a subtree of the subquery's expression.

Thus consider the following 2 cases:

1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
where y = cos(sum(a)))

2) select sum(a), b from t1 group by b having exists (select x from t2 where 
y = cos(sum(a)))

In both of the above cases, there is no need to add an extra aggregate 
expression.

I am also investigating whether it is possible to avoid it in this case:

3) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
where y = sum(a))

This Jira is also needed for another issue where the subquery data source v2 
projects columns which are not needed. (No Jira filed for that yet; will do so.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40199) Spark throws NPE without useful message when NULL value appears in non-null schema

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634085#comment-17634085
 ] 

Apache Spark commented on SPARK-40199:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/38660

> Spark throws NPE without useful message when NULL value appears in non-null 
> schema
> --
>
> Key: SPARK-40199
> URL: https://issues.apache.org/jira/browse/SPARK-40199
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.2
>Reporter: Erik Krogen
>Priority: Major
>
> Currently in some cases, if Spark encounters a NULL value where the schema 
> indicates that the column/field should be non-null, it will throw a 
> {{NullPointerException}} with no message and thus no way to debug further. 
> This can happen, for example, if you have a UDF which is erroneously marked 
> as {{asNonNullable()}}, or if you read input data where the actual values 
> don't match the schema (which could happen e.g. with Avro if the reader 
> provides a schema declaring non-null although the data was written with null 
> values).
> As an example of how to reproduce:
> {code:scala}
> val badUDF = spark.udf.register[String, Int]("bad_udf", in => 
> null).asNonNullable()
> Seq(1, 2).toDF("c1").select(badUDF($"c1")).collect()
> {code}
> This throws an exception like:
> {code}
> Driver stacktrace:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in 
> stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 
> (TID 1) (xx executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
>   at org.apache.spark.scheduler.Task.run(Task.scala:139)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> As a user, it is very confusing -- it looks like there is a bug in Spark. We 
> have had many users report such problems, and though we can guide them to a 
> schema-data mismatch, there is no indication of what field might contain the 
> bad values, so a laborious data exploration process is required to find and 
> remedy it.
> We should provide a better error message in such cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-14 Thread Asif (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asif updated SPARK-41141:
-
Description: 
Currently, the analyzer-phase rules for a subquery referencing an aggregate 
expression in the outer query avoid introducing a new aggregate expression only 
for a single-level aggregate function; they introduce a new aggregate expression 
for nested aggregate functions.

It is possible to avoid adding this extra aggregate expression, at least if the 
outer projection involving the aggregate function is exactly the same as the one 
used in the subquery, or if the outer query's projection involving the aggregate 
function is a subtree of the subquery's expression.

Thus consider the following 2 cases:

1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
where y = cos(sum(a)))

2) select sum(a), b from t1 group by b having exists (select x from t2 where 
y = cos(sum(a)))

In both of the above cases, there is no need to add an extra aggregate 
expression.

I am also investigating whether it is possible to avoid it in this case:

3) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
where y = sum(a))

This Jira is also needed for another issue where the subquery data source v2 
projects columns which are not needed. (No Jira filed for that yet; will do so.)

Will be opening a PR for this soon.

  was:
Currently, the analyzer-phase rules for a subquery referencing an aggregate 
expression in the outer query avoid introducing a new aggregate expression only 
for a single-level aggregate function; they introduce a new aggregate expression 
for nested aggregate functions.

It is possible to avoid adding this extra aggregate expression, at least if the 
outer projection involving the aggregate function is exactly the same as the one 
used in the subquery, or if the outer query's projection involving the aggregate 
function is a subtree of the subquery's expression.

Thus consider the following 2 cases:

1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
where y = cos(sum(a)))

2) select sum(a), b from t1 group by b having exists (select x from t2 where 
y = cos(sum(a)))

In both of the above cases, there is no need to add an extra aggregate 
expression.

I am also investigating whether it is possible to avoid it in this case:

3) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
where y = sum(a))

This Jira is also needed for another issue where the subquery data source v2 
projects columns which are not needed. (No Jira filed for that yet; will do so.)


> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Priority: Major
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery referencing an aggregate 
> expression in the outer query avoid introducing a new aggregate expression only 
> for a single-level aggregate function; they introduce a new aggregate expression 
> for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if the 
> outer projection involving the aggregate function is exactly the same as the one 
> used in the subquery, or if the outer query's projection involving the aggregate 
> function is a subtree of the subquery's expression.
>  
> Thus consider the following 2 cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2 where 
> y = cos(sum(a)))
>  
> In both of the above cases, there is no need to add an extra aggregate 
> expression.
>  
> I am also investigating whether it is possible to avoid it in this case:
>  
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
> where y = sum(a))
>  
> This Jira is also needed for another issue where the subquery data source v2 
> projects columns which are not needed. (No Jira filed for that yet; will do so.)
>  
> Will be opening a PR for this soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36950) Normalize semi-structured data into a flat table.

2022-11-14 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634097#comment-17634097
 ] 

Hyukjin Kwon commented on SPARK-36950:
--

[~bjornjorgensen] are you interested in submitting a PR?

> Normalize semi-structured data into a flat table.
> -
>
> Key: SPARK-36950
> URL: https://issues.apache.org/jira/browse/SPARK-36950
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Hi, in pandas there is json_normalize, which flattens out nested data.
> https://github.com/pandas-dev/pandas/blob/v1.3.3/pandas/io/json/_normalize.py#L112-L353
>  
> I have opened a request for this function in Koalas. Now more people would 
> like to have this function available in PySpark.
> https://github.com/databricks/koalas/issues/2162
> This is also a function that geopandas uses. In the meantime I have found a 
> gist with code that flattens out the whole dataframe.
> https://gist.github.com/nmukerje/e65cde41be85470e4b8dfd9a2d6aed50 
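
For context, the core of such a helper can be quite small. A Scala sketch of 
the flattening idea (an illustration, not a proposed API; assumes a DataFrame 
`df` with nested struct columns):
{code:scala}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Sketch only: recursively expand struct fields into top-level columns,
// naming each result by its dotted path with '.' replaced by '_'.
def flatten(df: DataFrame): DataFrame = {
  def expand(schema: StructType, prefix: String): Seq[Column] =
    schema.fields.toSeq.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      f.dataType match {
        case s: StructType => expand(s, path)
        case _             => Seq(col(path).alias(path.replace('.', '_')))
      }
    }
  df.select(expand(df.schema, ""): _*)
}
{code}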



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41073) Spark ThriftServer generates huge amounts of DelegationTokens

2022-11-14 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu resolved SPARK-41073.
-
Resolution: Duplicate

> Spark ThriftServer generates huge amounts of DelegationTokens
> ---
>
> Key: SPARK-41073
> URL: https://issues.apache.org/jira/browse/SPARK-41073
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
> Attachments: SPARK-41073.proposal.A.draft.001.patch
>
>
> In our cluster, ZooKeeper nearly crashed. I found that the znodes under 
> /zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot increased quickly. 
> After some research, I found that some SQL statements running on the Spark 
> ThriftServer obtain huge numbers of DelegationTokens.
> The reason is that in these Spark SQL statements, every Hive partition 
> acquires a different delegation token. 
> And HadoopRDDs in the ThriftServer can't share the credentials from 
> CoarseGrainedSchedulerBackend::delegationTokens; we must share them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41073) Spark ThriftServer generates huge amounts of DelegationTokens

2022-11-14 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634119#comment-17634119
 ] 

zhengchenyu commented on SPARK-41073:
-

[~xkrogen] Thank you very much for the reply. This issue is indeed a duplicate 
of SPARK-36328. I will close it.

> Spark ThriftServer generates huge amounts of DelegationTokens
> ---
>
> Key: SPARK-41073
> URL: https://issues.apache.org/jira/browse/SPARK-41073
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
> Attachments: SPARK-41073.proposal.A.draft.001.patch
>
>
> In our cluster, ZooKeeper nearly crashed. I found that the znodes under 
> /zkdtsm/ZKDTSMRoot/ZKDTSMTokensRoot increased quickly. 
> After some research, I found that some SQL statements running on the Spark 
> ThriftServer obtain huge numbers of DelegationTokens.
> The reason is that in these Spark SQL statements, every Hive partition 
> acquires a different delegation token. 
> And HadoopRDDs in the ThriftServer can't share the credentials from 
> CoarseGrainedSchedulerBackend::delegationTokens; we must share them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41085) Support Bit manipulation function COUNTSET

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41085:


Assignee: Apache Spark

> Support Bit manipulation function COUNTSET
> --
>
> Key: SPARK-41085
> URL: https://issues.apache.org/jira/browse/SPARK-41085
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Assignee: Apache Spark
>Priority: Minor
>
> Support Bit manipulation function COUNTSET
> The function shall return the number of 1 bits in the specified integer 
> value. If the optional second argument is set to zero, it shall return the 
> number of 0 bits instead.
> COUNTSET(integer_type a [, INT zero_or_one])
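
The intended semantics in a self-contained sketch (an illustration of the spec 
above, not the Spark implementation; a 64-bit input is assumed):
{code:scala}
object CountSet {
  // Count 1 bits by default; count 0 bits when the second argument is 0.
  def countset(a: Long, zeroOrOne: Int = 1): Int =
    if (zeroOrOne == 0) java.lang.Long.SIZE - java.lang.Long.bitCount(a)
    else java.lang.Long.bitCount(a)

  def main(args: Array[String]): Unit = {
    println(countset(11L))    // binary 1011 has three 1 bits -> 3
    println(countset(11L, 0)) // 64 - 3 -> 61 zero bits
  }
}
{code}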



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41085) Support Bit manipulation function COUNTSET

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41085:


Assignee: (was: Apache Spark)

> Support Bit manipulation function COUNTSET
> --
>
> Key: SPARK-41085
> URL: https://issues.apache.org/jira/browse/SPARK-41085
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Priority: Minor
>
> Support Bit manipulation function COUNTSET
> The function shall return the number of 1 bits in the specified integer 
> value. If the optional second argument is set to zero, it shall return the 
> number of 0 bits instead.
> COUNTSET(integer_type a [, INT zero_or_one])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41085) Support Bit manipulation function COUNTSET

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634124#comment-17634124
 ] 

Apache Spark commented on SPARK-41085:
--

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/38661

> Support Bit manipulation function COUNTSET
> --
>
> Key: SPARK-41085
> URL: https://issues.apache.org/jira/browse/SPARK-41085
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.4
>Reporter: Vinod KC
>Priority: Minor
>
> Support Bit manipulation function COUNTSET
> The function shall return the number of 1 bits in the specified integer 
> value. If the optional second argument is set to zero, it shall return the 
> number of 0 bits instead.
> COUNTSET(integer_type a [, INT zero_or_one])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41120) Upgrade joda-time from 2.12.0 to 2.12.1

2022-11-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41120.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38636
[https://github.com/apache/spark/pull/38636]

> Upgrade joda-time from 2.12.0 to 2.12.1
> ---
>
> Key: SPARK-41120
> URL: https://issues.apache.org/jira/browse/SPARK-41120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41120) Upgrade joda-time from 2.12.0 to 2.12.1

2022-11-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41120:


Assignee: BingKun Pan

> Upgrade joda-time from 2.12.0 to 2.12.1
> ---
>
> Key: SPARK-41120
> URL: https://issues.apache.org/jira/browse/SPARK-41120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41123) Upgrade mysql-connector-java from 8.0.30 to 8.0.31

2022-11-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41123.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38639
[https://github.com/apache/spark/pull/38639]

> Upgrade mysql-connector-java from 8.0.30 to 8.0.31
> --
>
> Key: SPARK-41123
> URL: https://issues.apache.org/jira/browse/SPARK-41123
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41123) Upgrade mysql-connector-java from 8.0.30 to 8.0.31

2022-11-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41123:


Assignee: BingKun Pan

> Upgrade mysql-connector-java from 8.0.30 to 8.0.31
> --
>
> Key: SPARK-41123
> URL: https://issues.apache.org/jira/browse/SPARK-41123
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41142) Support named arguments functions

2022-11-14 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-41142:
---

 Summary: Support named arguments functions
 Key: SPARK-41142
 URL: https://issues.apache.org/jira/browse/SPARK-41142
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.3.2
Reporter: Yaohua Zhao


Support named-argument functions in Spark SQL.

General usage: _FUNC_(arg0, arg1, arg2, arg5 => value5, arg8 => value8)
 * Arguments can be passed positionally or by name.
 * Positional arguments cannot come after a named argument.
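
A self-contained sketch of the ordering rule above (all names are invented for 
illustration):
{code:scala}
object NamedArgRule {
  sealed trait Arg
  case class Positional(value: String) extends Arg
  case class Named(key: String, value: String) extends Arg

  // Valid iff no positional argument appears after the first named one.
  def validOrder(args: Seq[Arg]): Boolean = {
    val firstNamed = args.indexWhere(_.isInstanceOf[Named])
    firstNamed == -1 || args.drop(firstNamed).forall(_.isInstanceOf[Named])
  }

  def main(args: Array[String]): Unit = {
    assert(validOrder(Seq(Positional("a"), Named("k", "v"))))  // ok
    assert(!validOrder(Seq(Named("k", "v"), Positional("a")))) // violates the rule
  }
}
{code}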



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-41143:
---

 Summary: Add named arguments function syntax support and trait
 Key: SPARK-41143
 URL: https://issues.apache.org/jira/browse/SPARK-41143
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.2
Reporter: Yaohua Zhao


Parser can parse: _FUNC_ ( key0 => value0 )



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40504) Make yarn appmaster load config from client

2022-11-14 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu resolved SPARK-40504.
-
Resolution: Implemented

> Make yarn appmaster load config from client
> ---
>
> Key: SPARK-40504
> URL: https://issues.apache.org/jira/browse/SPARK-40504
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
>
> In YARN federation mode, the configuration on the client side and on the NM 
> side may differ. The AppMaster should override its configuration with the one 
> from the client side.
> For example: 
> On the client side, yarn.resourcemanager.ha.rm-ids points to the YARN routers.
> On the NM side, yarn.resourcemanager.ha.rm-ids points to the RMs of the 
> subcluster.
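
To illustrate the override direction (a sketch assuming hadoop-client on the 
classpath; the values are made up): overlay the client-side entries onto the 
configuration the AM loads locally, so client settings win:
{code:scala}
import org.apache.hadoop.conf.Configuration

// Sketch only: NM-side defaults first, then client-side entries on top.
val conf = new Configuration() // what the AM would load on the NM side
val clientOverrides = Map("yarn.resourcemanager.ha.rm-ids" -> "router0,router1")
clientOverrides.foreach { case (k, v) => conf.set(k, v) }
println(conf.get("yarn.resourcemanager.ha.rm-ids")) // router0,router1
{code}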



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40958) Support data masking built-in function 'null'

2022-11-14 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634142#comment-17634142
 ] 

Ranga Reddy commented on SPARK-40958:
-

I will work on this Jira.

> Support data masking built-in function 'null'
> -
>
> Key: SPARK-40958
> URL: https://issues.apache.org/jira/browse/SPARK-40958
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
> This can be a simple function that returns a NULL value of the given input 
> type.
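
One way to express the idea with existing DataFrame APIs (a sketch, not the 
proposed built-in; the helper name is invented):
{code:scala}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DataType

// Sketch only: a NULL literal carrying the requested type.
def nullOf(dataType: DataType): Column = lit(null).cast(dataType)
{code}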



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40504) Make yarn appmaster load config from client

2022-11-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-40504:
---

> Make yarn appmaster load config from client
> ---
>
> Key: SPARK-40504
> URL: https://issues.apache.org/jira/browse/SPARK-40504
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
>
> In YARN federation mode, the configuration on the client side and on the NM 
> side may differ. The AppMaster should override its configuration with the one 
> from the client side.
> For example: 
> On the client side, yarn.resourcemanager.ha.rm-ids points to the YARN routers.
> On the NM side, yarn.resourcemanager.ha.rm-ids points to the RMs of the 
> subcluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40504) Make yarn appmaster load config from client

2022-11-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40504.
---
Resolution: Invalid

> Make yarn appmaster load config from client
> ---
>
> Key: SPARK-40504
> URL: https://issues.apache.org/jira/browse/SPARK-40504
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
>
> In YARN federation mode, the configuration on the client side and on the NM 
> side may differ. The AppMaster should override its configuration with the one 
> from the client side.
> For example: 
> On the client side, yarn.resourcemanager.ha.rm-ids points to the YARN routers.
> On the NM side, yarn.resourcemanager.ha.rm-ids points to the RMs of the 
> subcluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-40504) Make yarn appmaster load config from client

2022-11-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-40504.
-

> Make yarn appmaster load config from client
> ---
>
> Key: SPARK-40504
> URL: https://issues.apache.org/jira/browse/SPARK-40504
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: zhengchenyu
>Priority: Major
>
> In YARN federation mode, the configuration on the client side and on the NM 
> side may differ. The AppMaster should override its configuration with the one 
> from the client side.
> For example: 
> On the client side, yarn.resourcemanager.ha.rm-ids points to the YARN routers.
> On the NM side, yarn.resourcemanager.ha.rm-ids points to the RMs of the 
> subcluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaohua Zhao updated SPARK-41143:

Description: Parser can parse: _{_}FUNC_{_} ( key0 => value0 )  (was: 
Parser can parse: _FUNC{_}_{_} ( key0 => value0 ))

> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Priority: Major
>
> Parser can parse: _{_}FUNC_{_} ( key0 => value0 )



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaohua Zhao updated SPARK-41143:

Description: Parser can parse: _FUNC{_}_{_} ( key0 => value0 )  (was: 
Parser can parse: _FUNC_ ( key0 => value0 ))

> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Priority: Major
>
> Parser can parse: _FUNC{_}_{_} ( key0 => value0 )



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaohua Zhao updated SPARK-41143:

Description: 
The parser can parse:
{code:java}
_FUNC_ ( key0 => value0 ){code}

  was:Parser can parse: _{_}FUNC_{_} ( key0 => value0 )


> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Priority: Major
>
> The parser can parse:
> {code:java}
> _FUNC_ ( key0 => value0 ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41138) DataFrame.na.fill should have the same augment types as DataFrame.fillna

2022-11-14 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-41138.
---
Resolution: Not A Problem

> DataFrame.na.fill should have the same augment types as DataFrame.fillna
> 
>
> Key: SPARK-41138
> URL: https://issues.apache.org/jira/browse/SPARK-41138
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41101) Add messageClassName support for pyspark-protobuf

2022-11-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41101.
--
Fix Version/s: 3.4.0
   (was: 2.4.0)
   Resolution: Fixed

Issue resolved by pull request 38603
[https://github.com/apache/spark/pull/38603]

> Add messageClassName support for pyspark-protobuf
> --
>
> Key: SPARK-41101
> URL: https://issues.apache.org/jira/browse/SPARK-41101
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 2.3.0
>Reporter: Sandish Kumar HN
>Assignee: Sandish Kumar HN
>Priority: Major
> Fix For: 3.4.0
>
>
> When compared to the Scala API, the pyspark-protobuf API lacks support for 
> the messageClassName parameter; users can pass a jar containing the message 
> class via the spark-submit option --jars.
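
For reference, the Scala-side shape of the call in question (a sketch; the 
DataFrame `events` and the class name `com.example.Event` are invented, and the 
compiled class must be on the classpath, e.g. via --jars):
{code:scala}
import org.apache.spark.sql.protobuf.functions.from_protobuf

// Sketch only: parse a binary protobuf column using a compiled Java class
// name rather than a descriptor file.
val parsed = events.select(
  from_protobuf(events("value"), "com.example.Event").alias("event"))
{code}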



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41101) Add messageClassName support for pyspark-protobuf

2022-11-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41101:


Assignee: Sandish Kumar HN

> Add messageClassName support for pyspark-protobuf
> --
>
> Key: SPARK-41101
> URL: https://issues.apache.org/jira/browse/SPARK-41101
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 2.3.0
>Reporter: Sandish Kumar HN
>Assignee: Sandish Kumar HN
>Priority: Major
> Fix For: 2.4.0
>
>
> When compared to the Scala API, the pyspark-protobuf API lacks support for 
> the messageClassName parameter; users can pass a jar containing the message 
> class via the spark-submit option --jars.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-14 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-41141:

Target Version/s:   (was: 3.3.1)

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Asif
>Priority: Major
>  Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery referencing an aggregate 
> expression in the outer query avoid introducing a new aggregate expression only 
> for a single-level aggregate function; they introduce a new aggregate expression 
> for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression, at least if the 
> outer projection involving the aggregate function is exactly the same as the one 
> used in the subquery, or if the outer query's projection involving the aggregate 
> function is a subtree of the subquery's expression.
>  
> Thus consider the following 2 cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
> where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2 where 
> y = cos(sum(a)))
>  
> In both of the above cases, there is no need to add an extra aggregate 
> expression.
>  
> I am also investigating whether it is possible to avoid it in this case:
>  
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2 
> where y = sum(a))
>  
> This Jira is also needed for another issue where the subquery data source v2 
> projects columns which are not needed. (No Jira filed for that yet; will do so.)
>  
> Will be opening a PR for this soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41144) UnresolvedHint should not cause query failure

2022-11-14 Thread XiDuo You (Jira)
XiDuo You created SPARK-41144:
-

 Summary: UnresolvedHint should not cause query failure
 Key: SPARK-41144
 URL: https://issues.apache.org/jira/browse/SPARK-41144
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


 
{code:java}
CREATE TABLE t1(c1 bigint) USING PARQUET;
CREATE TABLE t2(c2 bigint) USING PARQUET;
SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;{code}
 

 

fails with:
{code:java}
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
exprId on unresolved object
  at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
  at scala.collection.Iterator.exists(Iterator.scala:969)
  at scala.collection.Iterator.exists$(Iterator.scala:967)
  at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
  at scala.collection.IterableLike.exists(IterableLike.scala:79)
  at scala.collection.IterableLike.exists$(IterableLike.scala:78)
  at scala.collection.AbstractIterable.exists(Iterable.scala:56)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41134) improve error message of internal errors

2022-11-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41134.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38648
[https://github.com/apache/spark/pull/38648]

> improve error message of internal errors
> 
>
> Key: SPARK-41134
> URL: https://issues.apache.org/jira/browse/SPARK-41134
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41134) improve error message of internal errors

2022-11-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41134:
---

Assignee: Wenchen Fan

> improve error message of internal errors
> 
>
> Key: SPARK-41134
> URL: https://issues.apache.org/jira/browse/SPARK-41134
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41144) UnresolvedHint should not cause query failure

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634158#comment-17634158
 ] 

Apache Spark commented on SPARK-41144:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38662

> UnresolvedHint should not cause query failure
> -
>
> Key: SPARK-41144
> URL: https://issues.apache.org/jira/browse/SPARK-41144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
>  
> {code:java}
> CREATE TABLE t1(c1 bigint) USING PARQUET;
> CREATE TABLE t2(c2 bigint) USING PARQUET;
> SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;{code}
>  
>  
> fails with:
> {code:java}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> exprId on unresolved object
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
>   at scala.collection.Iterator.exists(Iterator.scala:969)
>   at scala.collection.Iterator.exists$(Iterator.scala:967)
>   at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
>   at scala.collection.IterableLike.exists(IterableLike.scala:79)
>   at scala.collection.IterableLike.exists$(IterableLike.scala:78)
>   at scala.collection.AbstractIterable.exists(Iterable.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41144) UnresolvedHint should not cause query failure

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41144:


Assignee: Apache Spark

> UnresolvedHint should not cause query failure
> -
>
> Key: SPARK-41144
> URL: https://issues.apache.org/jira/browse/SPARK-41144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
>  
> {code:java}
> CREATE TABLE t1(c1 bigint) USING PARQUET;
> CREATE TABLE t2(c2 bigint) USING PARQUET;
> SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;{code}
>  
>  
> fails with:
> {code:java}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> exprId on unresolved object
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
>   at scala.collection.Iterator.exists(Iterator.scala:969)
>   at scala.collection.Iterator.exists$(Iterator.scala:967)
>   at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
>   at scala.collection.IterableLike.exists(IterableLike.scala:79)
>   at scala.collection.IterableLike.exists$(IterableLike.scala:78)
>   at scala.collection.AbstractIterable.exists(Iterable.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41144) UnresolvedHint should not cause query failure

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41144:


Assignee: (was: Apache Spark)

> UnresolvedHint should not cause query failure
> -
>
> Key: SPARK-41144
> URL: https://issues.apache.org/jira/browse/SPARK-41144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
>  
> {code:java}
> CREATE TABLE t1(c1 bigint) USING PARQUET;
> CREATE TABLE t2(c2 bigint) USING PARQUET;
> SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;{code}
>  
>  
> failed with the message:
> {code:java}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> exprId on unresolved object
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
>   at scala.collection.Iterator.exists(Iterator.scala:969)
>   at scala.collection.Iterator.exists$(Iterator.scala:967)
>   at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
>   at scala.collection.IterableLike.exists(IterableLike.scala:79)
>   at scala.collection.IterableLike.exists$(IterableLike.scala:78)
>   at scala.collection.AbstractIterable.exists(Iterable.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41144) UnresolvedHint should not cause query failure

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634160#comment-17634160
 ] 

Apache Spark commented on SPARK-41144:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38662

> UnresolvedHint should not cause query failure
> -
>
> Key: SPARK-41144
> URL: https://issues.apache.org/jira/browse/SPARK-41144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
>  
> {code:java}
> CREATE TABLE t1(c1 bigint) USING PARQUET;
> CREATE TABLE t2(c2 bigint) USING PARQUET;
> SELECT /*+ hash(t2) */ * FROM t1 join t2 on c1 = c2;{code}
>  
>  
> failed with the message:
> {code:java}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> exprId on unresolved object
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:147)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$4$adapted(Analyzer.scala:1005)
>   at scala.collection.Iterator.exists(Iterator.scala:969)
>   at scala.collection.Iterator.exists$(Iterator.scala:967)
>   at scala.collection.AbstractIterator.exists(Iterator.scala:1431)
>   at scala.collection.IterableLike.exists(IterableLike.scala:79)
>   at scala.collection.IterableLike.exists$(IterableLike.scala:78)
>   at scala.collection.AbstractIterable.exists(Iterable.scala:56)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3(Analyzer.scala:1005)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$hasMetadataCol$3$adapted(Analyzer.scala:1005)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634169#comment-17634169
 ] 

Apache Spark commented on SPARK-41143:
--

User 'Yaohua628' has created a pull request for this issue:
https://github.com/apache/spark/pull/38663

> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Priority: Major
>
> The parser can parse:
> {code:java}
> _FUNC_ ( key0 => value0 ){code}
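
For illustration, a hypothetical call shape this syntax enables; the function 
name read_files below is an assumption for the example, not part of this 
ticket:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Hypothetical: a table-valued function accepting `key => value` arguments
// once the parser and the new trait support named arguments.
spark.sql("SELECT * FROM read_files(path => '/tmp/data', format => 'parquet')").show()
{code}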



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41143:


Assignee: Apache Spark

> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Assignee: Apache Spark
>Priority: Major
>
> The parser can parse:
> {code:java}
> _FUNC_ ( key0 => value0 ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41143:


Assignee: (was: Apache Spark)

> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Priority: Major
>
> The parser can parse:
> {code:java}
> _FUNC_ ( key0 => value0 ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634168#comment-17634168
 ] 

Apache Spark commented on SPARK-41143:
--

User 'Yaohua628' has created a pull request for this issue:
https://github.com/apache/spark/pull/38663

> Add named arguments function syntax support and trait
> -
>
> Key: SPARK-41143
> URL: https://issues.apache.org/jira/browse/SPARK-41143
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Yaohua Zhao
>Priority: Major
>
> The parser can parse:
> {code:java}
> _FUNC_ ( key0 => value0 ){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41072) Convert the internal error about failed stream to user-facing error

2022-11-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41072:


Assignee: Max Gekk

> Convert the internal error about failed stream to user-facing error
> ---
>
> Key: SPARK-41072
> URL: https://issues.apache.org/jira/browse/SPARK-41072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Assign an error class to the following internal error since it is a 
> user-facing error:
> {code}
> java.lang.Exception: org.apache.spark.sql.streaming.StreamingQueryException: 
> Query cloudtrail_pipeline [id = 5a3758c3-3b3a-47ff-843a-23292cde3b4f, runId = 
> c1a90694-daa2-4929-b749-82b8a43fa2b1] terminated with exception: 
> [INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please, 
> fill a bug report in, and provide the full stack trace.
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:403)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$4(StreamExecution.scala:269)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:269)
> Caused by: java.lang.Exception: org.apache.spark.SparkException: 
> [INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please, 
> fill a bug report in, and provide the full stack trace.
>   at 
> {code}
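
The usual shape of such a conversion, sketched below; the error class name 
STREAM_FAILED and the parameter names are assumptions for illustration, not 
taken from the merged change:

{code:scala}
import org.apache.spark.SparkException

// Sketch: assuming a user-facing class STREAM_FAILED has been registered in
// error-classes.json, the stream runner raises it instead of wrapping the
// failure as [INTERNAL_ERROR].
def toUserFacingStreamError(streamName: String, cause: Throwable): SparkException =
  new SparkException(
    errorClass = "STREAM_FAILED",
    messageParameters = Map(
      "streamName" -> streamName,
      "message" -> String.valueOf(cause.getMessage)),
    cause = cause)
{code}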



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41072) Convert the internal error about failed stream to user-facing error

2022-11-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41072.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38629
[https://github.com/apache/spark/pull/38629]

> Convert the internal error about failed stream to user-facing error
> ---
>
> Key: SPARK-41072
> URL: https://issues.apache.org/jira/browse/SPARK-41072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Assign an error class to the following internal error since it is a 
> user-facing error:
> {code}
> java.lang.Exception: org.apache.spark.sql.streaming.StreamingQueryException: 
> Query cloudtrail_pipeline [id = 5a3758c3-3b3a-47ff-843a-23292cde3b4f, runId = 
> c1a90694-daa2-4929-b749-82b8a43fa2b1] terminated with exception: 
> [INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please, 
> fill a bug report in, and provide the full stack trace.
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:403)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$4(StreamExecution.scala:269)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:269)
> Caused by: java.lang.Exception: org.apache.spark.SparkException: 
> [INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please, 
> fill a bug report in, and provide the full stack trace.
>   at 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41025) Introduce ComparableOffset as mixed-in interface for streaming offset

2022-11-14 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-41025.
--
Resolution: Abandoned

> Introduce ComparableOffset as mixed-in interface for streaming offset
> -
>
> Key: SPARK-41025
> URL: https://issues.apache.org/jira/browse/SPARK-41025
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Currently, Spark does not check the offset range when planning a 
> microbatch; it has been each data source implementation's responsibility to 
> validate the boundaries of the offset range.
> While not all offset types can be compared for ordering (some may only 
> support equality comparison), the majority can, and Spark could leverage 
> that to validate offset ranges generically.
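
A minimal sketch of what such a mixed-in interface could look like; the 
method shape is an assumption based on the description above, not a merged 
design:

{code:scala}
import org.apache.spark.sql.connector.read.streaming.Offset

// Sketch only: sources whose offsets can be ordered mix this in, letting the
// engine validate offset ranges generically; offset types that support only
// equality comparison simply would not implement it.
trait ComparableOffset { self: Offset =>
  /** Negative, zero, or positive as this offset is before, equal to, or after `other`. */
  def compareTo(other: Offset): Int
}
{code}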



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41137.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38652
[https://github.com/apache/spark/pull/38652]

> Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
> 
>
> Key: SPARK-41137
> URL: https://issues.apache.org/jira/browse/SPARK-41137
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> The current error class name does not clearly describe the situation.
> We should rename the error class and improve its error message.
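
A hypothetical reproduction, to be run in spark-shell; the assumption here is 
that lateral joins allow only INNER, CROSS, and LEFT OUTER, so a RIGHT OUTER 
lateral join should surface the renamed class (table names are illustrative):

{code:scala}
spark.sql("CREATE TABLE t1(c1 INT) USING PARQUET")
spark.sql("CREATE TABLE t2(c2 INT) USING PARQUET")
// Expected after the rename: AnalysisException [INVALID_LATERAL_JOIN_TYPE]
spark.sql(
  "SELECT * FROM t1 RIGHT OUTER JOIN LATERAL (SELECT c2 FROM t2 WHERE t2.c2 = t1.c1)")
{code}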



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41137) Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE

2022-11-14 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41137:


Assignee: Haejoon Lee

> Rename LATERAL_JOIN_OF_TYPE to INVALID_LATERAL_JOIN_TYPE
> 
>
> Key: SPARK-41137
> URL: https://issues.apache.org/jira/browse/SPARK-41137
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> The current error class name does not clearly describe the situation.
> We should rename the error class and improve its error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41145) Assert the offset range for file stream source in Trigger.AvailableNow

2022-11-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-41145:


 Summary: Assert the offset range for file stream source in 
Trigger.AvailableNow
 Key: SPARK-41145
 URL: https://issues.apache.org/jira/browse/SPARK-41145
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Jungtaek Lim


We encountered an issue where a data source did not properly implement its 
offset handling with Trigger.AvailableNow, and the query kept processing the 
same data continuously without stopping.

We would like to proactively prevent such cases for the most commonly used 
data sources. I'll create a new JIRA ticket for the Kafka data source as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41146) Assert the offset range for Kafka source in Trigger.AvailableNow

2022-11-14 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-41146:


 Summary: Assert the offset range for Kafka source in 
Trigger.AvailableNow
 Key: SPARK-41146
 URL: https://issues.apache.org/jira/browse/SPARK-41146
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Jungtaek Lim


We encountered an issue where a data source did not properly implement its 
offset handling with Trigger.AvailableNow, and the query kept processing the 
same data continuously without stopping.

We would like to proactively prevent such cases for the most commonly used 
data sources. I'll create a new JIRA ticket for the Kafka data source as well.
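
A rough sketch of the intended safety net (assumed shape, not the merged 
change): under Trigger.AvailableNow, the planned end offset of each partition 
must never exceed the end offset latched when the query started.

{code:scala}
// Sketch only; partitions are modeled as plain strings to stay self-contained.
def assertEndOffsetForAvailableNow(
    latched: Map[String, Long],   // partition -> end offset at query start
    planned: Map[String, Long]): Unit = {
  planned.foreach { case (partition, offset) =>
    val limit = latched.getOrElse(partition, Long.MaxValue)
    assert(offset <= limit,
      s"Offset $offset for $partition exceeds latched limit $limit; the " +
        "source is not honoring Trigger.AvailableNow semantics.")
  }
}
{code}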



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41146) Assert the offset range for Kafka source in Trigger.AvailableNow

2022-11-14 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-41146:
-
Description: 
We encountered an issue where a data source did not properly implement its 
offset handling with Trigger.AvailableNow, and the query kept processing the 
same data continuously without stopping.

We would like to proactively prevent such cases for the most commonly used 
data sources.

  was:
We encountered an issue where a data source did not properly implement its 
offset handling with Trigger.AvailableNow, and the query kept processing the 
same data continuously without stopping.

We would like to proactively prevent such cases for the most commonly used 
data sources. I'll create a new JIRA ticket for the Kafka data source as well.


> Assert the offset range for Kafka source in Trigger.AvailableNow
> 
>
> Key: SPARK-41146
> URL: https://issues.apache.org/jira/browse/SPARK-41146
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We encountered an issue where a data source did not properly implement its 
> offset handling with Trigger.AvailableNow, and the query kept processing the 
> same data continuously without stopping.
> We would like to proactively prevent such cases for the most commonly used 
> data sources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41145) Assert the offset range for file stream source in Trigger.AvailableNow

2022-11-14 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-41145.
--
Resolution: Not A Problem

> Assert the offset range for file stream source in Trigger.AvailableNow
> --
>
> Key: SPARK-41145
> URL: https://issues.apache.org/jira/browse/SPARK-41145
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We encountered an issue where a data source did not properly implement its 
> offset handling with Trigger.AvailableNow, and the query kept processing the 
> same data continuously without stopping.
> We would like to proactively prevent such cases for the most commonly used 
> data sources. I'll create a new JIRA ticket for the Kafka data source as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41145) Assert the offset range for file stream source in Trigger.AvailableNow

2022-11-14 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634193#comment-17634193
 ] 

Jungtaek Lim commented on SPARK-41145:
--

Looks like we already have a proper assertion on the offset range in getBatch():

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L235]

I quickly looked into the codebase for the file stream source, and it looks 
like there is a good amount of logging when DEBUG/TRACE logging is turned on. 
Maybe there is nothing left to do for the file stream source.
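
For reference, the linked guard boils down to a simple ordering check on the 
log offsets, roughly (paraphrased, not copied verbatim from the source file):

{code:scala}
// Paraphrase of the existing assertion in FileStreamSource.getBatch: the
// start offset of the requested batch must not be past the end offset.
def getBatchGuard(startOffset: Long, endOffset: Long): Unit =
  assert(startOffset <= endOffset,
    s"startOffset($startOffset) > endOffset($endOffset) in FileStreamSource")
{code}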

> Assert the offset range for file stream source in Trigger.AvailableNow
> --
>
> Key: SPARK-41145
> URL: https://issues.apache.org/jira/browse/SPARK-41145
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We encountered an issue where a data source did not properly implement its 
> offset handling with Trigger.AvailableNow, and the query kept processing the 
> same data continuously without stopping.
> We would like to proactively prevent such cases for the most commonly used 
> data sources. I'll create a new JIRA ticket for the Kafka data source as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41147) Rename _LEGACY_ERROR_TEMP_1042 to INVALID_FUNCTION_ARGUMENT

2022-11-14 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41147:
---

 Summary: Rename _LEGACY_ERROR_TEMP_1042 to 
INVALID_FUNCTION_ARGUMENT
 Key: SPARK-41147
 URL: https://issues.apache.org/jira/browse/SPARK-41147
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


We should rename all LEGACY_ERROR_TEMP error classes to proper names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41147) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1042

2022-11-14 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634195#comment-17634195
 ] 

Haejoon Lee commented on SPARK-41147:
-

I'm working on it

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1042
> ---
>
> Key: SPARK-41147
> URL: https://issues.apache.org/jira/browse/SPARK-41147
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should rename all LEGACY_ERROR_TEMP error classes to proper names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41147) Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1042

2022-11-14 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-41147:

Summary: Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1042  
(was: Rename _LEGACY_ERROR_TEMP_1042 to INVALID_FUNCTION_ARGUMENT)

> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1042
> ---
>
> Key: SPARK-41147
> URL: https://issues.apache.org/jira/browse/SPARK-41147
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should rename all LEGACY_ERROR_TEMP error classes to proper names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


