[jira] [Created] (SPARK-47791) Truncate exceed decimals with scale first instead of precision from JDBC datasource
Kent Yao created SPARK-47791: Summary: Truncate exceed decimals with scale first instead of precision from JDBC datasource Key: SPARK-47791 URL: https://issues.apache.org/jira/browse/SPARK-47791 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
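The idea behind "truncate with scale first instead of precision" can be sketched outside Spark. Below is an illustrative Python sketch, not Spark's actual JDBC dialect code: rounding away excess fractional digits (scale) before enforcing precision preserves the integral part, whereas checking precision on the raw value first can reject numbers that would fit after rounding. The helper name `fit_decimal` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def fit_decimal(value: Decimal, precision: int, scale: int) -> Decimal:
    """Illustrative only: round excess fractional digits away (scale
    first), then enforce precision on the already-rounded value."""
    # Step 1: quantize to the target scale, preserving the integral part.
    quantized = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # Step 2: only now check that the rounded value fits the precision.
    if len(quantized.as_tuple().digits) > precision:
        raise ValueError(f"{value} does not fit DECIMAL({precision},{scale})")
    return quantized

# 123.456789 into DECIMAL(6,3): scale-first rounding keeps 123.457,
# even though the raw value has more than 6 digits.
print(fit_decimal(Decimal("123.456789"), 6, 3))
```

A precision-first check would see nine significant digits in `123.456789` and reject it outright, losing a value that round-trips fine at scale 3.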
[jira] [Updated] (SPARK-47781) Handle negative scale for JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47781: - Summary: Handle negative scale for JDBC data sources (was: Handle negative scale and truncate exceed scale first for JDBC data sources) > Handle negative scale for JDBC data sources > --- > > Key: SPARK-47781 > URL: https://issues.apache.org/jira/browse/SPARK-47781 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-30252 disabled negative scale for decimals. This introduced a regression: > it also disabled reading from data sources that support negative-scale > decimals
[jira] [Updated] (SPARK-47790) Upgrade `commons-io` to 2.16.1
[ https://issues.apache.org/jira/browse/SPARK-47790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47790: --- Labels: pull-request-available (was: ) > Upgrade `commons-io` to 2.16.1 > -- > > Key: SPARK-47790 > URL: https://issues.apache.org/jira/browse/SPARK-47790 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47781) Handle negative scale and truncate exceed scale first for JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47781. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45956 [https://github.com/apache/spark/pull/45956] > Handle negative scale and truncate exceed scale first for JDBC data sources > --- > > Key: SPARK-47781 > URL: https://issues.apache.org/jira/browse/SPARK-47781 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-30252 disabled negative scale for decimals. This introduced a regression: > it also disabled reading from data sources that support negative-scale > decimals
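For readers unfamiliar with negative scale: a DECIMAL(5, -2) holds 5 significant digits times 10^2, e.g. 12300, so its values need up to 5 + 2 integral digits and no fractional digits. One conceivable way a reader could widen such a type into a Spark-representable one is sketched below; `widen_negative_scale` is a hypothetical helper, and the actual fix in PR 45956 may take a different approach.

```python
def widen_negative_scale(precision: int, scale: int) -> tuple[int, int]:
    """Hypothetical mapping: DECIMAL(p, s) with s < 0 from a JDBC
    source becomes DECIMAL(p - s, 0), since the stored values have
    up to p + |s| integral digits and no fractional part."""
    if scale < 0:
        return (precision - scale, 0)  # p - s == p + |s|
    return (precision, scale)

# DECIMAL(5, -2), e.g. values like 12300, widens to DECIMAL(7, 0).
print(widen_negative_scale(5, -2))
```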
[jira] [Updated] (SPARK-45444) Upgrade `commons-io` to 2.14.0
[ https://issues.apache.org/jira/browse/SPARK-45444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45444: -- Summary: Upgrade `commons-io` to 2.14.0 (was: Upgrade `commons-io` to 1.24.0) > Upgrade `commons-io` to 2.14.0 > -- > > Key: SPARK-45444 > URL: https://issues.apache.org/jira/browse/SPARK-45444 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45444) Upgrade `commons-io` to 1.24.0
[ https://issues.apache.org/jira/browse/SPARK-45444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45444: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade `commons-io` to 1.24.0 > -- > > Key: SPARK-45444 > URL: https://issues.apache.org/jira/browse/SPARK-45444 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45734) Upgrade commons-io to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45734: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade commons-io to 2.15.0 > > > Key: SPARK-45734 > URL: https://issues.apache.org/jira/browse/SPARK-45734 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0
[jira] [Resolved] (SPARK-47593) Connector module: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47593. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45879 [https://github.com/apache/spark/pull/45879] > Connector module: Migrate logWarn with variables to structured logging > framework > > > Key: SPARK-47593 > URL: https://issues.apache.org/jira/browse/SPARK-47593 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47776) State store operation cannot work properly with binary inequality collation
[ https://issues.apache.org/jira/browse/SPARK-47776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-47776: Assignee: Jungtaek Lim > State store operation cannot work properly with binary inequality collation > --- > > Key: SPARK-47776 > URL: https://issues.apache.org/jira/browse/SPARK-47776 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > > Arguably this is a correctness issue, though we haven't released the collation > feature yet. > Collation introduces the concept of binary (in)equality, which means that under some > collations we can no longer just compare the binary format of two > UnsafeRows to determine equality. > For example, 'aaa' and 'AAA' can be "semantically" the same under a case-insensitive > collation. > A state store is basically key-value storage, and most provider > implementations rely on the fact that all the columns in the key schema > support binary equality. We need to disallow binary-inequality columns > in the key schema until we can support them in the majority of state store > providers (or at a higher level of the state store). > Why is this a correctness issue? For example, streaming aggregation will > produce aggregation output that ignores semantic > equality. > e.g. df.groupBy(strCol).count() > Although strCol is case insensitive, 'a' and 'A' won't be counted together in > streaming aggregation, while they should be.
[jira] [Resolved] (SPARK-47776) State store operation cannot work properly with binary inequality collation
[ https://issues.apache.org/jira/browse/SPARK-47776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-47776. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45951 [https://github.com/apache/spark/pull/45951] > State store operation cannot work properly with binary inequality collation > --- > > Key: SPARK-47776 > URL: https://issues.apache.org/jira/browse/SPARK-47776 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > Fix For: 4.0.0 > > > Arguably this is a correctness issue, though we haven't released the collation > feature yet. > Collation introduces the concept of binary (in)equality, which means that under some > collations we can no longer just compare the binary format of two > UnsafeRows to determine equality. > For example, 'aaa' and 'AAA' can be "semantically" the same under a case-insensitive > collation. > A state store is basically key-value storage, and most provider > implementations rely on the fact that all the columns in the key schema > support binary equality. We need to disallow binary-inequality columns > in the key schema until we can support them in the majority of state store > providers (or at a higher level of the state store). > Why is this a correctness issue? For example, streaming aggregation will > produce aggregation output that ignores semantic > equality. > e.g. df.groupBy(strCol).count() > Although strCol is case insensitive, 'a' and 'A' won't be counted together in > streaming aggregation, while they should be.
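The failure mode described in the issue can be reproduced outside Spark with a minimal sketch. This is illustrative Python, with the state store's UnsafeRow byte comparison reduced to comparing UTF-8 encodings; the helper names are made up for the example.

```python
from collections import Counter

def binary_group_count(keys: list[str]) -> Counter:
    """Group by the raw binary encoding of each key, as a
    byte-comparing state store effectively does."""
    return Counter(k.encode("utf-8") for k in keys)

def collated_group_count(keys: list[str]) -> Counter:
    """Group by a case-insensitive collation key instead."""
    return Counter(k.lower() for k in keys)

rows = ["a", "A", "a"]
# Binary equality splits what the collation treats as one group:
print(binary_group_count(rows))    # two groups: b'a' -> 2, b'A' -> 1
print(collated_group_count(rows))  # one group:  'a' -> 3
```

This is exactly the `df.groupBy(strCol).count()` discrepancy from the description: the byte-level grouping produces two counts where a case-insensitive collation demands one.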
[jira] [Comment Edited] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835587#comment-17835587 ] Mridul Muralidharan edited comment on SPARK-47759 at 4/10/24 4:08 AM: -- In order to validate, I would suggest two things. Wrap the input str (or lower), within quotes, in the exception message ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. was (Author: mridulm80): In order to validate, I would suggest two things. Wrap the input str, within quote, in the exception message ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). 
> > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. > Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at >
[jira] [Comment Edited] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835587#comment-17835587 ] Mridul Muralidharan edited comment on SPARK-47759 at 4/10/24 4:08 AM: -- In order to validate, I would suggest two things. Wrap the input str, within quote, in the exception message ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. was (Author: mridulm80): In order to validate, I would suggest two things. Wrap the input str, within quote, to the exception ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). 
> > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. > Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at >
[jira] [Commented] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835587#comment-17835587 ] Mridul Muralidharan commented on SPARK-47759: - In order to validate, I would suggest two things. Wrap the input str, within quote, to the exception ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at >
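Mridul's two suggestions, quoting the raw input in the message and chaining the underlying error as the cause, can be sketched as follows. This is illustrative Python rather than the Java in `JavaUtils.timeStringAs`, and the regex is an assumed simplification of the real time-string format.

```python
import re

# Assumed simplification of the format accepted by JavaUtils.timeStringAs.
TIME_RE = re.compile(r"^(\d+)(us|ms|s|min|m|h|d)$")
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "min": 60, "m": 60,
                "h": 3600, "d": 86400}

def parse_time_seconds(text: str) -> float:
    match = TIME_RE.match(text)
    if match is None:
        # Suggestion 1: quote the raw input (text!r) so an invisible
        # character, e.g. a trailing U+00A0 no-break space, shows up.
        message = f"Failed to parse time string: {text!r}"
        try:
            int(text)  # surface the NumberFormat-style failure, if any
        except ValueError as exc:
            # Suggestion 2: chain the low-level error as the cause.
            raise ValueError(message) from exc
        raise ValueError(message)
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]

print(parse_time_seconds("120s"))     # 120
try:
    parse_time_seconds("120s\u00a0")  # visually identical to "120s"
except ValueError as err:
    print(err)
```

With the input quoted, the failing string prints as `'120s\xa0'` instead of an apparently valid `120s`, which is precisely the debugging signal the comment is after.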
[jira] [Updated] (SPARK-47083) Upgrade `commons-codec` to 1.16.1
[ https://issues.apache.org/jira/browse/SPARK-47083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47083: -- Fix Version/s: 3.5.2 > Upgrade `commons-codec` to 1.16.1 > - > > Key: SPARK-47083 > URL: https://issues.apache.org/jira/browse/SPARK-47083 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > >
[jira] [Resolved] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes
[ https://issues.apache.org/jira/browse/SPARK-47761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47761. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45925 [https://github.com/apache/spark/pull/45925] > Oracle: Support reading AnsiIntervalTypes > - > > Key: SPARK-47761 > URL: https://issues.apache.org/jira/browse/SPARK-47761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes
[ https://issues.apache.org/jira/browse/SPARK-47761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47761: Assignee: Kent Yao > Oracle: Support reading AnsiIntervalTypes > - > > Key: SPARK-47761 > URL: https://issues.apache.org/jira/browse/SPARK-47761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47182) Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
[ https://issues.apache.org/jira/browse/SPARK-47182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47182: -- Fix Version/s: 3.5.2 > Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` > and `avro-*` > - > > Key: SPARK-47182 > URL: https://issues.apache.org/jira/browse/SPARK-47182 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > >
[jira] [Comment Edited] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835282#comment-17835282 ] Zhen Wang edited comment on SPARK-47463 at 4/10/24 2:22 AM: new case: filter with *nvl* expression: {code:java} create table t2(c1 boolean) using iceberg; select * from t2 where nvl(c1, false) = true; {code} was (Author: wforget): new similar failure case: {code:java} create table t2(c1 boolean) using iceberg; select * from t2 where nvl(c1, false) = true; {code} > An error occurred while pushing down the filter of if expression for iceberg > datasource. > > > Key: SPARK-47463 > URL: https://issues.apache.org/jira/browse/SPARK-47463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 > Environment: Spark 3.5.0 > Iceberg 1.4.3 >Reporter: Zhen Wang >Priority: Major > Labels: pull-request-available > > Reproduce: > {code:java} > create table t1(c1 int) using iceberg; > select * from > (select if(c1 = 1, c1, null) as c1 from t1) t > where t.c1 > 0; {code} > Error: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase > optimization failed with an internal error. You hit a bug in Spark or the > Spark plugins you use. Please, report this bug to the corresponding > communities or vendors, and provide the full stack trace. 
> at > org.apache.spark.SparkException$.internalError(SparkException.scala:107) > at > org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536) > at > org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548) > at > org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144) > at > org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179) > at > org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238) > at > org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284) > at > org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580) > at > 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >
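Both repros hinge on SQL three-valued logic: the predicate being pushed down contains a sub-expression that can yield NULL, and the translation to an Iceberg filter must preserve those NULL semantics. A standalone sketch of the semantics involved, in illustrative Python rather than the actual Spark/Iceberg translation code:

```python
def sql_gt(a, b):
    """SQL three-valued '>': NULL if either side is NULL."""
    return None if a is None or b is None else a > b

def if_expr(cond, then, otherwise):
    """SQL if(cond, then, otherwise) with an already-evaluated cond."""
    return then if cond else otherwise

def nvl(value, default):
    """SQL nvl: replace NULL with a default."""
    return default if value is None else value

# where if(c1 = 1, c1, null) > 0  -- NULL (row filtered) unless c1 == 1
for c1 in (0, 1, None):
    print(c1, sql_gt(if_expr(c1 == 1, c1, None), 0))

# where nvl(c1, false) = true  -- NULL collapses to false, never NULL
for c1 in (True, False, None):
    print(c1, nvl(c1, False) == True)
```

A pushdown translator that assumes the filter expression is two-valued (true/false) mishandles the NULL branch introduced by `if(...)` and `nvl(...)`, which is consistent with the internal error the optimizer reports above.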
[jira] [Resolved] (SPARK-47586) Hive module: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47586. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45876 [https://github.com/apache/spark/pull/45876] > Hive module: Migrate logError with variables to structured logging framework > > > Key: SPARK-47586 > URL: https://issues.apache.org/jira/browse/SPARK-47586 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47787) Upgrade `commons-compress` to 1.26.1
[ https://issues.apache.org/jira/browse/SPARK-47787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47787. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45966 [https://github.com/apache/spark/pull/45966] > Upgrade `commons-compress` to 1.26.1 > - > > Key: SPARK-47787 > URL: https://issues.apache.org/jira/browse/SPARK-47787 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry went through on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string; the app runs on emr-7.0.0 with the Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
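The failing call chain above ends in suffix-based parsing of a number-plus-unit string. As a point of reference, here is a minimal, self-contained re-implementation of that grammar — illustrative only, not Spark's actual JavaUtils code, and the class name is invented — covering exactly the units the exception message enumerates. It shows that a well-formed input like 120s parses cleanly:

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of number-plus-suffix time parsing, mirroring the accepted units
// listed in the exception message (s, ms, us, m/min, h, d).
public class TimeStringSketch {
    private static final Pattern TIME = Pattern.compile("(-?[0-9]+)([a-z]+)?");

    public static long timeStringAsSec(String input) {
        Matcher m = TIME.matcher(input.trim().toLowerCase());
        if (!m.matches()) {
            throw new NumberFormatException("Failed to parse time string: " + input);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        // A bare number defaults to seconds; otherwise map the suffix to a unit.
        TimeUnit unit = (suffix == null) ? TimeUnit.SECONDS : switch (suffix) {
            case "us" -> TimeUnit.MICROSECONDS;
            case "ms" -> TimeUnit.MILLISECONDS;
            case "s" -> TimeUnit.SECONDS;
            case "m", "min" -> TimeUnit.MINUTES;
            case "h" -> TimeUnit.HOURS;
            case "d" -> TimeUnit.DAYS;
            default -> throw new NumberFormatException("Invalid suffix in: " + input);
        };
        return TimeUnit.SECONDS.convert(value, unit);
    }

    public static void main(String[] args) {
        System.out.println(timeStringAsSec("120s")); // prints 120
    }
}
```

Because 120s satisfies this grammar, the NumberFormatException in the trace is consistent with the reporter's observation that the failure "doesn't make sense" for the given input.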
[jira] [Updated] (SPARK-47788) Ensure the same hash partitioning scheme/hash function is used across batches
[ https://issues.apache.org/jira/browse/SPARK-47788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47788: --- Labels: pull-request-available (was: ) > Ensure the same hash partitioning scheme/hash function is used across batches > - > > Key: SPARK-47788 > URL: https://issues.apache.org/jira/browse/SPARK-47788 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.5.1 >Reporter: Fanyue Xia >Priority: Major > Labels: pull-request-available > > To really make sure that any changes to the hash function / partitioner in Spark > don't cause logical correctness issues in existing running streaming > queries, we should add a new unit test to ensure hash function stability is > maintained. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
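The proposed test can be pictured as a golden-value check: record the partition id of a handful of fixed keys once, then assert they never change across releases. The sketch below is hypothetical — it uses Java's String.hashCode as a stand-in for Spark's hash partitioner, and the class name is invented — but it shows the shape such a stability test could take:

```java
import java.util.Map;

// Hypothetical golden-value stability test: pin the partition ids of fixed
// keys against values recorded once, so any later change to the hash
// function or partitioning scheme fails the assertions.
public class HashStabilitySketch {
    // Stand-in for a hash partitioner: non-negative modulus of the key's hash.
    static int partition(String key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        // Golden values captured from a previous run; a changed hash scheme
        // would silently re-route keys across batches unless this trips first.
        Map<String, Integer> golden = Map.of("a", 1, "ab", 1, "abc", 2);
        golden.forEach((key, expected) -> {
            if (partition(key, 8) != expected) {
                throw new AssertionError("partitioning changed for key: " + key);
            }
        });
        System.out.println("hash partitioning stable");
    }
}
```

For a streaming engine the property matters across batches and restarts, not just within one run, which is why a persisted set of golden values is the natural test design here.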
[jira] [Commented] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel commented on SPARK-47578: spark-2$ grep logWarning sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u Under sql/core/src/main/scala/org/apache/spark/sql/ (excluding streaming): Column.scala SparkSession.scala api/python/PythonSQLUtils.scala api/r/SQLUtils.scala catalyst/analysis/ResolveSessionCatalog.scala execution/CacheManager.scala execution/ExistingRDD.scala execution/OptimizeMetadataOnlyQuery.scala execution/QueryExecution.scala execution/SparkSqlParser.scala execution/SparkStrategies.scala execution/WholeStageCodegenExec.scala execution/adaptive/AdaptiveSparkPlanExec.scala execution/adaptive/InsertAdaptiveSparkPlan.scala execution/adaptive/ShufflePartitionsUtil.scala execution/aggregate/HashAggregateExec.scala execution/command/AnalyzeTablesCommand.scala execution/command/CommandUtils.scala execution/command/SetCommand.scala execution/command/createDataSourceTables.scala execution/command/ddl.scala execution/datasources/BasicWriteStatsTracker.scala execution/datasources/DataSource.scala execution/datasources/DataSourceManager.scala execution/datasources/FilePartition.scala execution/datasources/FileScanRDD.scala execution/datasources/FileStatusCache.scala execution/datasources/csv/CSVDataSource.scala execution/datasources/jdbc/JDBCRDD.scala execution/datasources/jdbc/JDBCRelation.scala execution/datasources/jdbc/JdbcUtils.scala execution/datasources/json/JsonOutputWriter.scala execution/datasources/orc/OrcUtils.scala execution/datasources/parquet/ParquetFileFormat.scala execution/datasources/parquet/ParquetUtils.scala execution/datasources/v2/CacheTableExec.scala execution/datasources/v2/CreateIndexExec.scala execution/datasources/v2/CreateNamespaceExec.scala execution/datasources/v2/CreateTableExec.scala execution/datasources/v2/DataSourceV2Strategy.scala execution/datasources/v2/DropIndexExec.scala 
execution/datasources/v2/FilePartitionReader.scala execution/datasources/v2/FileScan.scala execution/datasources/v2/V2ScanPartitioningAndOrdering.scala execution/datasources/v2/jdbc/JDBCTableCatalog.scala execution/datasources/v2/state/StatePartitionReader.scala execution/datasources/xml/XmlDataSource.scala execution/python/ApplyInPandasWithStatePythonRunner.scala execution/python/AttachDistributedSequenceExec.scala execution/ui/SQLAppStatusListener.scala execution/window/WindowExecBase.scala internal/SharedState.scala jdbc/DB2Dialect.scala jdbc/H2Dialect.scala jdbc/JdbcDialects.scala jdbc/MsSqlServerDialect.scala jdbc/MySQLDialect.scala jdbc/OracleDialect.scala > Spark core: Migrate logWarn with variables to structured logging framework > -- > > Key: SPARK-47578 > URL: https://issues.apache.org/jira/browse/SPARK-47578 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
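For the migrations tracked in these sub-tasks, the target shape is a log call that carries each variable as a named field instead of splicing it into an interpolated string, so downstream pipelines can index on the field names. The sketch below illustrates that idea only — it is not Spark's actual structured logging API, and every name in it (LogEntry, the {placeholder} syntax) is invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of structured logging: a message template plus a map of
// named fields, rather than a pre-rendered interpolated string.
public class StructuredLogSketch {
    record LogEntry(String template, Map<String, Object> fields) {
        // Render the human-readable form; the fields map stays available
        // for machine consumption (e.g. JSON log output).
        String render() {
            String out = template;
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                out = out.replace("{" + e.getKey() + "}", String.valueOf(e.getValue()));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Before: logWarning(s"Task $taskId failed on executor $execId")
        // After: the same message, but each variable is a named field.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("taskId", 42);
        fields.put("executorId", "exec-7");
        LogEntry entry = new LogEntry("Task {taskId} failed on executor {executorId}", fields);
        System.out.println(entry.render()); // prints: Task 42 failed on executor exec-7
    }
}
```

The grep lists in these comments are essentially an inventory of call sites whose interpolated variables need to become named fields of this kind.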
[jira] [Created] (SPARK-47789) Review and improve error message texts
Serge Rielau created SPARK-47789: Summary: Review and improve error message texts Key: SPARK-47789 URL: https://issues.apache.org/jira/browse/SPARK-47789 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau error-classes.json content could use some TLC to fix formatting, improve grammar, and other editing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:26 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala (X) exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Created] (SPARK-47788) Ensure the same hash partitioning scheme/hash function is used across batches
Fanyue Xia created SPARK-47788: -- Summary: Ensure the same hash partitioning scheme/hash function is used across batches Key: SPARK-47788 URL: https://issues.apache.org/jira/browse/SPARK-47788 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 3.5.1 Reporter: Fanyue Xia To really make sure that any changes to the hash function / partitioner in Spark don't cause logical correctness issues in existing running streaming queries, we should add a new unit test to ensure hash function stability is maintained. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:16 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala (X) exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:14 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:14 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:07 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:07 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:59 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:55 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:53 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:47 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution: (X) BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution: (X) BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala streaming/AsyncCommitLog.scala streaming/AsyncLogPurge.scala streaming/AsyncOffsetSeqLog.scala streaming/AsyncProgressTrackingMicroBatchExecution.scala streaming/ErrorNotifier.scala streaming/MicroBatchExecution.scala streaming/StreamExecution.scala streaming/StreamMetadata.scala streaming/continuous/ContinuousExecution.scala streaming/continuous/ContinuousWriteRDD.scala streaming/state/OperatorStateMetadata.scala streaming/state/RocksDB.scala streaming/state/RocksDBFileManager.scala streaming/state/StateSchemaCompatibilityChecker.scala 
streaming/state/StateStoreChangelog.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel commented on SPARK-47583: spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncCommitLog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncLogPurge.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncOffsetSeqLog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ErrorNotifier.scala 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousWriteRDD.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
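The `grep ... | cut ... | sort -u` pipeline used in the comments above lists each file containing `logError` exactly once, even when a file has several matches. A minimal reproduction on a toy directory tree (the paths below are illustrative, not Spark's):

```shell
# Build a small tree: two files that call logError, one that does not.
mkdir -p demo/src/a demo/src/b
printf 'logError("boom")\nlogError("again")\n' > demo/src/a/Foo.scala
printf 'logInfo("fine")\n' > demo/src/a/Bar.scala
printf 'logError("oops")\n' > demo/src/b/Baz.scala

# -R recurses into directories; cut keeps only the filename before the
# first ':' of grep's "file:match" output; sort -u collapses files that
# matched more than once (Foo.scala matched twice but is listed once).
grep logError demo/src/* -R | cut -d ':' -f 1 | sort -u
```

This prints `demo/src/a/Foo.scala` and `demo/src/b/Baz.scala`, one per line; `Bar.scala` never matches.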
[jira] [Updated] (SPARK-47787) Upgrade `commons-compress` to 1.26.1`
[ https://issues.apache.org/jira/browse/SPARK-47787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47787: --- Labels: pull-request-available (was: ) > Upgrade `commons-compress` to 1.26.1` > - > > Key: SPARK-47787 > URL: https://issues.apache.org/jira/browse/SPARK-47787 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47787) Upgrade `commons-compress` to 1.26.1`
[ https://issues.apache.org/jira/browse/SPARK-47787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47787: -- Summary: Upgrade `commons-compress` to 1.26.1` (was: Upgrade `commons-compress` to 1.26.`) > Upgrade `commons-compress` to 1.26.1` > - > > Key: SPARK-47787 > URL: https://issues.apache.org/jira/browse/SPARK-47787 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47787) Upgrade `commons-compress` to 1.26.`
Dongjoon Hyun created SPARK-47787: - Summary: Upgrade `commons-compress` to 1.26.` Key: SPARK-47787 URL: https://issues.apache.org/jira/browse/SPARK-47787 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-47718: Labels: (was: pull-request-available) > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > > I have a data pipeline set up in such a way that it reads data from a Kafka > source, does some transformation on the data using pyspark, then writes the > output into a sink (Kafka, Redis, etc). > > My entire pipeline is written in SQL, so I wish to use the .sql() method to > execute SQL on my streaming source directly. > > However, I'm running into the issue where my watermark is not being > recognized by the downstream query via the .sql() method. > > ``` > Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) > [Clang 16.0.6 ] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import pyspark > >>> print(pyspark.__version__) > 3.5.1 > >>> from pyspark.sql import SparkSession > >>> > >>> session = SparkSession.builder \ > ... .config("spark.jars.packages", > "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ > ... .getOrCreate() > >>> from pyspark.sql.functions import col, from_json > >>> from pyspark.sql.types import StructField, StructType, TimestampType, > >>> LongType, DoubleType, IntegerType > >>> schema = StructType( > ... [ > ... StructField('createTime', TimestampType(), True), > ... StructField('orderId', LongType(), True), > ... StructField('payAmount', DoubleType(), True), > ... StructField('payPlatform', IntegerType(), True), > ... StructField('provinceId', IntegerType(), True), > ... ]) > >>> > >>> streaming_df = session.readStream\ > ... .format("kafka")\ > ... .option("kafka.bootstrap.servers", "localhost:9092")\ > ... 
.option("subscribe", "payment_msg")\ > ... .option("startingOffsets","earliest")\ > ... .load()\ > ... .select(from_json(col("value").cast("string"), > schema).alias("parsed_value"))\ > ... .select("parsed_value.*")\ > ... .withWatermark("createTime", "10 seconds") > >>> > >>> streaming_df.createOrReplaceTempView("streaming_df") > >>> session.sql(""" > ... SELECT > ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount > ... FROM streaming_df > ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') > ... ORDER BY window.start > ... """)\ > ... .writeStream\ > ... .format("kafka") \ > ... .option("checkpointLocation", "checkpoint") \ > ... .option("kafka.bootstrap.servers", "localhost:9092") \ > ... .option("topic", "sink") \ > ... .start() > ``` > > This throws exception > ``` > pyspark.errors.exceptions.captured.AnalysisException: Append output mode not > supported when there are streaming aggregations on streaming > DataFrames/DataSets without watermark; line 6 pos 4; > ``` > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47783: -- Assignee: Serge Rielau > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47783. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45961 [https://github.com/apache/spark/pull/45961] > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry goes thru on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
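The surprising part of the report above is that `120s` matches exactly the format the exception message describes: a number followed by one of the suffixes s, ms, us, m/min, h, or d. A rough Python sketch of that parsing contract (the real `JavaUtils.timeStringAs` is Java; the function and unit table here are illustrative, not Spark's implementation) shows `120s` is a well-formed input:

```python
import re

# Suffix -> multiplier into seconds, mirroring the units the error message
# lists: us, ms, s, m/min, h, d.
_UNITS = {
    "us": 1e-6, "ms": 1e-3, "s": 1.0,
    "m": 60.0, "min": 60.0, "h": 3600.0, "d": 86400.0,
}

def time_string_as_sec(s: str) -> float:
    """Parse strings like '120s', '100ms', '2h' into seconds."""
    m = re.fullmatch(r"(-?\d+)\s*([a-z]+)?", s.strip().lower())
    if not m:
        raise ValueError(f"Failed to parse time string: {s}")
    value, unit = m.groups()
    if unit is None:
        unit = "s"  # assume seconds when no suffix is given
    if unit not in _UNITS:
        raise ValueError(f"Failed to parse time string: {s}")
    return int(value) * _UNITS[unit]

print(time_string_as_sec("120s"))   # 120.0
print(time_string_as_sec("100ms"))  # 0.1
```

Since `120s` parses cleanly under this contract, the NumberFormatException in the trace suggests the failure lies somewhere other than the input string itself, consistent with the reporter's note that the retried app succeeded with no code change.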
[jira] [Updated] (SPARK-47415) Levenshtein (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47415: --- Labels: pull-request-available (was: ) > Levenshtein (all collations) > > > Key: SPARK-47415 > URL: https://issues.apache.org/jira/browse/SPARK-47415 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry goes thru on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
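The failure above is striking because "120s" matches the documented format exactly. As a rough illustration of what a suffix-based time-string parser in the spirit of JavaUtils.timeStringAs does (this sketch is an assumption for illustration, not Spark's actual implementation), a healthy parse of "120s" looks like:

```python
import re

# Hypothetical sketch of suffix-based time-string parsing, loosely modeled on
# JavaUtils.timeStringAs; the regex and the suffix table are assumptions, not
# Spark's actual code.
SUFFIX_TO_SECONDS = {
    "us": 1e-6, "ms": 1e-3, "s": 1.0,
    "m": 60.0, "min": 60.0, "h": 3600.0, "d": 86400.0,
}
TIME_RE = re.compile(r"(-?[0-9]+)([a-z]+)?")

def time_string_as_sec(s: str, default_suffix: str = "s") -> float:
    # A bare number falls back to the caller-supplied default unit.
    m = TIME_RE.fullmatch(s.strip().lower())
    if m is None:
        raise ValueError(f"Failed to parse time string: {s}")
    value, suffix = int(m.group(1)), m.group(2) or default_suffix
    if suffix not in SUFFIX_TO_SECONDS:
        raise ValueError(f"Invalid suffix: {suffix}")
    return value * SUFFIX_TO_SECONDS[suffix]
```

Under a model like this, "120s" can only fail if the parser's internal state is somehow corrupted, which is consistent with the report that the same string parses fine after the stuck app is killed and retried.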
[jira] [Created] (SPARK-47785) Upgrade `bouncycastle` to 1.78
Dongjoon Hyun created SPARK-47785: - Summary: Upgrade `bouncycastle` to 1.78 Key: SPARK-47785 URL: https://issues.apache.org/jira/browse/SPARK-47785 Project: Spark Issue Type: Sub-task Components: Build, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47783: --- Labels: pull-request-available (was: ) > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-47783: - Summary: Refresh error-states.json (was: Refresh error-state.sql) > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47783) Refresh error-state.sql
Serge Rielau created SPARK-47783: Summary: Refresh error-state.sql Key: SPARK-47783 URL: https://issues.apache.org/jira/browse/SPARK-47783 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau We want to add more SQLSTATEs to the menu to prevent collisions and do some general cleanup
[jira] [Updated] (SPARK-47673) [Arbitrary State Support] State TTL support - ListState
[ https://issues.apache.org/jira/browse/SPARK-47673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47673: --- Labels: pull-request-available (was: ) > [Arbitrary State Support] State TTL support - ListState > --- > > Key: SPARK-47673 > URL: https://issues.apache.org/jira/browse/SPARK-47673 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Priority: Major > Labels: pull-request-available > > Add support for expiring state value based on ttl for List State in > transformWithState operator. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47748) Upgrade `zstd-jni` to 1.5.6-2
[ https://issues.apache.org/jira/browse/SPARK-47748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47748: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade `zstd-jni` to 1.5.6-2 > - > > Key: SPARK-47748 > URL: https://issues.apache.org/jira/browse/SPARK-47748 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47782: - Assignee: Cheng Pan (was: Dongjoon Hyun) > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47782: --- Labels: pull-request-available (was: ) > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47782: - Assignee: Dongjoon Hyun > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47782. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45943 [https://github.com/apache/spark/pull/45943] > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47782: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Task) > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
Dongjoon Hyun created SPARK-47782: - Summary: Remove redundant json4s-jackson definition in sql/api POM Key: SPARK-47782 URL: https://issues.apache.org/jira/browse/SPARK-47782 Project: Spark Issue Type: Task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47601) Graphx: Migrate logs with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47601: --- Labels: pull-request-available (was: ) > Graphx: Migrate logs with variables to structured logging framework > > > Key: SPARK-47601 > URL: https://issues.apache.org/jira/browse/SPARK-47601 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47781) Handle negative scale and truncate exceed scale first for JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47781: --- Labels: pull-request-available (was: ) > Handle negative scale and truncate exceed scale first for JDBC data sources > --- > > Key: SPARK-47781 > URL: https://issues.apache.org/jira/browse/SPARK-47781 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > > SPARK-30252 has disabled the negative scale for decimals. It has a regression > that it also disabled reading from data sources that support negative scale > decimals -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47781) Handle negative scale and truncate exceed scale first for JDBC data sources
Kent Yao created SPARK-47781: Summary: Handle negative scale and truncate exceed scale first for JDBC data sources Key: SPARK-47781 URL: https://issues.apache.org/jira/browse/SPARK-47781 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao SPARK-30252 disabled negative scale for decimals. This introduced a regression: it also disabled reading from data sources that support negative-scale decimals.
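For context on what "negative scale" means, here is an illustration using Python's decimal module rather than Spark's own Decimal type (sketch only; how Spark maps such JDBC columns after the fix is not shown here): a decimal with scale -2 stores no digits below the hundreds place, i.e. its unscaled value is multiplied by 10^2.

```python
from decimal import Decimal

# A decimal with unscaled value 12345 and scale -2 represents 12345 * 10**2.
# Some JDBC sources allow DECIMAL(p, s) with s < 0; after SPARK-30252 Spark's
# decimal type rejected such a scale, breaking reads of these columns.
value = Decimal("12345E+2")          # scale -2: exponent is +2
assert value == Decimal(1234500)

# Rounding a number to negative scale keeps only digits at or above 10**2:
assert round(123456, -2) == 123500
```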
[jira] [Commented] (SPARK-47416) SoundEx (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835328#comment-17835328 ] Nikola Mandic commented on SPARK-47416: --- Working on it. > SoundEx (all collations) > > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Updated] (SPARK-47780) Generate stable and uniquely-named classes in the Catalyst code generator
[ https://issues.apache.org/jira/browse/SPARK-47780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47780: --- Labels: pull-request-available (was: ) > Generate stable and uniquely-named classes in the Catalyst code generator > - > > Key: SPARK-47780 > URL: https://issues.apache.org/jira/browse/SPARK-47780 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Vojin Jovanovic >Priority: Minor > Labels: pull-request-available > > Code generated by Catalyst can not be executed on [GraalVM Native > Image|https://www.graalvm.org/] for the following two reasons: > # The generated classes for Scala UDFs are not stable so they can not be > [pre-defined in a native > image|https://www.graalvm.org/latest/reference-manual/native-image/metadata/ExperimentalAgentOptions/#support-for-predefined-classes]. > The instability comes from the address of hidden functions > ({{{}/0x{}}}) that [end up in the generated > code|https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L1173]. > These addresses are completely random and therefore provide little value to > the end user. > # The name of the generated class is always the same > ({{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass{}}}) which > makes it impossible to [pre-define multiple such classes in a native > image|https://github.com/oracle/graal/blob/vm-ce-23.1.2/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/ClassPredefinitionFeature.java#L153]. > We can fix this by generating the class name in the following format: > {{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass${}}}. > It would be useful enable Spark workloads on GraalVM Native Image. With > Native Image, one can generate low-memory Spark binaries for their frequent > workloads. 
We would also like to make all Spark benchmarks in the > [renaissance benchmark suite|https://renaissance.dev/] run as native images.
[jira] [Created] (SPARK-47780) Generate stable and uniquely-named classes in the Catalyst code generator
Vojin Jovanovic created SPARK-47780: --- Summary: Generate stable and uniquely-named classes in the Catalyst code generator Key: SPARK-47780 URL: https://issues.apache.org/jira/browse/SPARK-47780 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Vojin Jovanovic Code generated by Catalyst cannot be executed on [GraalVM Native Image|https://www.graalvm.org/] for the following two reasons: # The generated classes for Scala UDFs are not stable, so they cannot be [pre-defined in a native image|https://www.graalvm.org/latest/reference-manual/native-image/metadata/ExperimentalAgentOptions/#support-for-predefined-classes]. The instability comes from the address of hidden functions ({{{}/0x{}}}) that [end up in the generated code|https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L1173]. These addresses are completely random and therefore provide little value to the end user. # The name of the generated class is always the same ({{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass{}}}), which makes it impossible to [pre-define multiple such classes in a native image|https://github.com/oracle/graal/blob/vm-ce-23.1.2/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/ClassPredefinitionFeature.java#L153]. We can fix this by generating the class name in the following format: {{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass${}}}. It would be useful to enable Spark workloads on GraalVM Native Image. With Native Image, one can generate low-memory Spark binaries for their frequent workloads. We would also like to make all Spark benchmarks in the [renaissance benchmark suite|https://renaissance.dev/] run as native images.
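The naming scheme proposed in the ticket can be sketched as follows. How the suffix is derived is an assumption here (the ticket only specifies the GeneratedClass$<suffix> shape): hashing the generated source yields a name that is stable across JVM runs yet unique per distinct program, unlike a random hidden-lambda address.

```python
import hashlib

def stable_class_name(generated_source: str) -> str:
    # Derive the suffix from the code itself: equal sources always map to the
    # same name (stable), different sources to different names (unique).
    suffix = hashlib.sha256(generated_source.encode("utf-8")).hexdigest()[:12]
    return f"org.apache.spark.sql.catalyst.expressions.GeneratedClass${suffix}"

# Identical generated code -> identical class name, across runs:
assert stable_class_name("return a + b;") == stable_class_name("return a + b;")
# Different generated code -> a distinct, pre-definable class name:
assert stable_class_name("return a + b;") != stable_class_name("return a - b;")
```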
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: (was: Apache Spark) > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
[jira] [Assigned] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-47779: - Assignee: Ruifeng Zheng > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: Apache Spark > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47779. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45952 [https://github.com/apache/spark/pull/45952] > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47779: -- Assignee: (was: Apache Spark) > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: Apache Spark > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: (was: Apache Spark) > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
[jira] [Assigned] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47779: -- Assignee: Apache Spark > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835303#comment-17835303 ] Jungtaek Lim commented on SPARK-47718: -- window('createTime', '1 hour', '30 minutes') 'createTime' is a literal, not the column reference. Please try again with window(createTime, '1 hour', '30 minutes') and reopen if the issue persists. > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available > > I have a data pipeline set up in such a way that it reads data from a Kafka > source, does some transformation on the data using pyspark, then writes the > output into a sink (Kafka, Redis, etc). > > My entire pipeline in written in SQL, so I wish to use the .sql() method to > execute SQL on my streaming source directly. > > However, I'm running into the issue where my watermark is not being > recognized by the downstream query via the .sql() method. > > ``` > Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) > [Clang 16.0.6 ] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import pyspark > >>> print(pyspark.__version__) > 3.5.1 > >>> from pyspark.sql import SparkSession > >>> > >>> session = SparkSession.builder \ > ... .config("spark.jars.packages", > "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ > ... .getOrCreate() > >>> from pyspark.sql.functions import col, from_json > >>> from pyspark.sql.types import StructField, StructType, TimestampType, > >>> LongType, DoubleType, IntegerType > >>> schema = StructType( > ... [ > ... StructField('createTime', TimestampType(), True), > ... StructField('orderId', LongType(), True), > ... StructField('payAmount', DoubleType(), True), > ... 
StructField('payPlatform', IntegerType(), True), > ... StructField('provinceId', IntegerType(), True), > ... ]) > >>> > >>> streaming_df = session.readStream\ > ... .format("kafka")\ > ... .option("kafka.bootstrap.servers", "localhost:9092")\ > ... .option("subscribe", "payment_msg")\ > ... .option("startingOffsets","earliest")\ > ... .load()\ > ... .select(from_json(col("value").cast("string"), > schema).alias("parsed_value"))\ > ... .select("parsed_value.*")\ > ... .withWatermark("createTime", "10 seconds") > >>> > >>> streaming_df.createOrReplaceTempView("streaming_df") > >>> session.sql(""" > ... SELECT > ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount > ... FROM streaming_df > ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') > ... ORDER BY window.start > ... """)\ > ... .writeStream\ > ... .format("kafka") \ > ... .option("checkpointLocation", "checkpoint") \ > ... .option("kafka.bootstrap.servers", "localhost:9092") \ > ... .option("topic", "sink") \ > ... .start() > ``` > > This throws exception > ``` > pyspark.errors.exceptions.captured.AnalysisException: Append output mode not > supported when there are streaming aggregations on streaming > DataFrames/DataSets without watermark; line 6 pos 4; > ``` > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
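The resolution above hinges on a one-character difference: window('createTime', ...) passes a string literal, while window(createTime, ...) passes the watermarked column. A plain-Python sketch of how one might spot the mistake in query text (the helper name and regex are illustrative, not a Spark API):

```python
import re

# Hypothetical lint helper: report window() calls whose first argument is a
# quoted string literal (a constant) rather than a bare column reference.
def quoted_window_columns(sql: str) -> list[str]:
    return re.findall(r"window\(\s*'([^']+)'", sql)

# The grouping clause from the report, and the corrected form from the comment.
bad   = "GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')"
fixed = "GROUP BY provinceId, window(createTime, '1 hour', '30 minutes')"
```

With the quoted form, the grouping expression never references the watermarked createTime column, so Spark treats the aggregation as unwatermarked and rejects append mode.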
[jira] [Resolved] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-47718. -- Resolution: Not A Bug > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47779: --- Labels: pull-request-available (was: ) > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47779) Add a helper function to sort PS Frame/Series
Ruifeng Zheng created SPARK-47779: - Summary: Add a helper function to sort PS Frame/Series Key: SPARK-47779 URL: https://issues.apache.org/jira/browse/SPARK-47779 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-47779: -- Component/s: PS Tests (was: PySpark) > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47778) Promote `--wait` option to all services
Cheng Pan created SPARK-47778: - Summary: Promote `--wait` option to all services Key: SPARK-47778 URL: https://issues.apache.org/jira/browse/SPARK-47778 Project: Spark Issue Type: New Feature Components: Deploy Affects Versions: 4.0.0 Reporter: Cheng Pan SPARK-47040 added `--wait` support to `start-connect-server.sh`: ./sbin/start-connect-server.sh [--wait] [options] In https://github.com/apache/spark/pull/45852 we discussed and reached a consensus to promote `--wait` to all services for consistency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47774: -- Fix Version/s: 3.4.3 > Remove redundant rules from `MimaExcludes` > -- > > Key: SPARK-47774 > URL: https://issues.apache.org/jira/browse/SPARK-47774 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2, 3.4.3 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47776) State store operation cannot work properly with binary inequality collation
[ https://issues.apache.org/jira/browse/SPARK-47776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47776: --- Labels: pull-request-available (was: ) > State store operation cannot work properly with binary inequality collation > --- > > Key: SPARK-47776 > URL: https://issues.apache.org/jira/browse/SPARK-47776 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > > Arguably this is a correctness issue, though we haven't released the collation > feature yet. > Collation introduces the concept of binary (in)equality, which means that under some > collations we are no longer able to just compare the binary format of two > UnsafeRows to determine equality. > For example, 'aaa' and 'AAA' can be "semantically" the same under a case-insensitive > collation. > The state store is basically key-value storage, and most provider > implementations rely on the fact that all the columns in the key schema > support binary equality. We need to disallow using binary inequality columns > in the key schema until we can support this in the majority of state store > providers (or at a higher level of the state store). > Why is this a correctness issue? For example, streaming aggregation will > produce aggregation output that does not respect semantic > equality. > e.g. df.groupBy(strCol).count() > Although strCol is case insensitive, 'a' and 'A' won't be counted together in > streaming aggregation, while they should be. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
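To make the binary vs. semantic equality point concrete, here is a minimal plain-Python sketch (not Spark internals; lower() merely stands in for a real collation key such as an ICU sort key):

```python
from collections import Counter

rows = ["a", "A", "b"]

# Binary-equality grouping: keys compared byte-for-byte, which is how most
# state store providers compare UnsafeRow key bytes today.
binary_counts = Counter(rows)

# Grouping under a case-insensitive collation: 'a' and 'A' belong to the same
# group, which is what df.groupBy(strCol).count() should produce.
semantic_counts = Counter(s.lower() for s in rows)

print(binary_counts)    # 'a' and 'A' land in separate groups
print(semantic_counts)  # 'a' and 'A' are counted together
```

The mismatch between the two groupings is exactly the correctness issue the ticket describes for streaming aggregation state.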
[jira] [Updated] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47774: -- Fix Version/s: 3.5.2 > Remove redundant rules from `MimaExcludes` > -- > > Key: SPARK-47774 > URL: https://issues.apache.org/jira/browse/SPARK-47774 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47774. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45944 [https://github.com/apache/spark/pull/45944] > Remove redundant rules from `MimaExcludes` > -- > > Key: SPARK-47774 > URL: https://issues.apache.org/jira/browse/SPARK-47774 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835182#comment-17835182 ] Gideon P commented on SPARK-47412: -- [~uros-db] I agree -- this one can be expected to be simple and familiar. I will move on to this one (while still shepherding https://github.com/apache/spark/pull/45738/ to the finish line). Thanks! I will let you know if I have questions. > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm the expected behaviour of these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
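For orientation, the padding operation itself is collation-agnostic (it depends only on lengths); collation matters when the padded results are compared. A rough plain-Python sketch of lpad/rpad behaviour, assumed to mirror SQL LPAD/RPAD semantics where inputs at or beyond the target length are truncated:

```python
def lpad(s: str, length: int, pad: str) -> str:
    # Inputs at or beyond the target length are truncated, as in SQL LPAD.
    if len(s) >= length:
        return s[:length]
    needed = length - len(s)
    return (pad * needed)[:needed] + s

def rpad(s: str, length: int, pad: str) -> str:
    if len(s) >= length:
        return s[:length]
    needed = length - len(s)
    return s + (pad * needed)[:needed]

# Under a case-insensitive collation, equality of padded results would be
# case-insensitive; lower() approximates such a collation key here.
def eq_case_insensitive(a: str, b: str) -> bool:
    return a.lower() == b.lower()
```

This suggests the collation work for these expressions is mostly about making comparisons of their results collation-aware, not about changing the padding logic.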
[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47777: --- Labels: pull-request-available (was: ) > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Make the python streaming data source pyspark tests also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47777) Add spark connect test for python streaming data source
Chaoqin Li created SPARK-47777: -- Summary: Add spark connect test for python streaming data source Key: SPARK-47777 URL: https://issues.apache.org/jira/browse/SPARK-47777 Project: Spark Issue Type: Test Components: PySpark, SS Affects Versions: 3.5.1 Reporter: Chaoqin Li Make the python streaming data source pyspark tests also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47773: -- Affects Version/s: 4.0.0 (was: 3.5.1) > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. > The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. 
The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
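The interface hierarchy the SPIP describes can be sketched roughly as follows (illustrative Python only; the real interfaces would be Scala traits in Spark/Gluten, and a plain dict stands in for a Substrait relation):

```python
from abc import ABC, abstractmethod

class TransformSupport(ABC):
    """Operators convertible into a Substrait-style relation."""
    @abstractmethod
    def transform(self) -> dict: ...

class LeafTransformSupport(TransformSupport, ABC):
    """No child operators, e.g. a table scan."""

class UnaryTransformSupport(TransformSupport, ABC):
    def __init__(self, child: TransformSupport):
        self.child = child

class BinaryTransformSupport(TransformSupport, ABC):
    def __init__(self, left: TransformSupport, right: TransformSupport):
        self.left, self.right = left, right

class Scan(LeafTransformSupport):
    def __init__(self, table: str):
        self.table = table
    def transform(self) -> dict:
        return {"rel": "read", "table": self.table}

class Filter(UnaryTransformSupport):
    def __init__(self, child: TransformSupport, predicate: str):
        super().__init__(child)
        self.predicate = predicate
    def transform(self) -> dict:
        # Validation against a native backend would run on this plan; on
        # failure, execution falls back to Spark's own operator.
        return {"rel": "filter", "condition": self.predicate,
                "input": self.child.transform()}

plan = Filter(Scan("orders"), "payAmount > 100").transform()
```

The example names (Scan, Filter, the dict shape) are assumptions for illustration; only the TransformSupport variants come from the SPIP text.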
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835181#comment-17835181 ] Dongjoon Hyun commented on SPARK-47773: --- Since this is a new feature which cannot affect old releases, I updated the Affected Version to 4.0.0. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major
[jira] [Resolved] (SPARK-47772) Fix the doctest of mode function
[ https://issues.apache.org/jira/browse/SPARK-47772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47772. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45940 [https://github.com/apache/spark/pull/45940] > Fix the doctest of mode function > > > Key: SPARK-47772 > URL: https://issues.apache.org/jira/browse/SPARK-47772 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47776) State store operation cannot work properly with binary inequality collation
Jungtaek Lim created SPARK-47776: Summary: State store operation cannot work properly with binary inequality collation Key: SPARK-47776 URL: https://issues.apache.org/jira/browse/SPARK-47776 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Jungtaek Lim
[jira] [Commented] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835175#comment-17835175 ] Jungtaek Lim commented on SPARK-47718: -- I've lowered the priority to Major - this is neither a regression nor a correctness issue. > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-47718: - Priority: Major (was: Blocker) > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835171#comment-17835171 ] Ke Jia commented on SPARK-47773: [~viirya] Great, I have added comment permissions in the SPIP Doc. I'm very much looking forward to your feedback. Thanks. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 3.5.1 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. > The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. 
The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
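The convert/validate/fallback flow the SPIP describes can be sketched in miniature. This is a hypothetical illustration only: none of these class or function names come from Spark or Gluten, and the "Substrait" plan here is just a nested dict standing in for a serialized relation. It shows the control flow the proposal outlines: convert an operator tree, validate it against the native backend, and fall back to Spark's own operator for any subtree that fails validation.

```python
# Hypothetical sketch of the validate-then-fallback flow from the SPIP.
# All names (SparkOperator, to_substrait, plan_with_fallback) are invented
# for illustration and are not Spark or Gluten APIs.

class SparkOperator:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def to_substrait(op):
    """Pretend conversion: a nested dict stands in for a Substrait relation."""
    return {"rel": op.name, "inputs": [to_substrait(c) for c in op.children]}

def backend_validates(rel, supported):
    """Pretend backend validation: every relation in the subtree must be supported."""
    return (rel["rel"] in supported
            and all(backend_validates(i, supported) for i in rel["inputs"]))

def plan_with_fallback(op, supported):
    """Run the subtree natively when it validates; otherwise keep the Spark
    operator and recurse, so only the unsupported part falls back."""
    if backend_validates(to_substrait(op), supported):
        return ("native", op.name)
    return ("spark", op.name,
            [plan_with_fallback(c, supported) for c in op.children])

plan = SparkOperator("Project", [SparkOperator("Filter", [SparkOperator("Scan")])])
print(plan_with_fallback(plan, {"Project", "Filter", "Scan"}))  # whole tree native
print(plan_with_fallback(plan, {"Filter", "Scan"}))  # Project falls back, Filter subtree stays native
```

The second call illustrates the partial-fallback behavior the proposal depends on: the unsupported Project operator stays on Spark (with a row/columnar transition at the boundary), while the validated Filter/Scan subtree still runs on the native backend.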
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835168#comment-17835168 ] L. C. Hsieh commented on SPARK-47773: - Can you open comment on the SPIP doc?
[jira] [Updated] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47736: --- Labels: pull-request-available (was: ) > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47736: -- Summary: Add support for AbstractArrayType (was: Add support for AbstractArrayType(StringTypeCollated)) > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major >
[jira] [Updated] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47240: --- Labels: pull-request-available (was: ) > SPIP: Structured Logging Framework for Apache Spark > --- > > Key: SPARK-47240 > URL: https://issues.apache.org/jira/browse/SPARK-47240 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > > This proposal aims to enhance Apache Spark's logging system by implementing > structured logging. This transition will change the format of the default log > files from plain text to JSON, making them more accessible and analyzable. > The new logs will include crucial identifiers such as worker, executor, > query, job, stage, and task IDs, thereby making the logs more informative and > facilitating easier search and analysis. > h2. Current Logging Format > The current format of Spark logs is plain text, which can be challenging to > parse and analyze efficiently. An example of the current log format is as > follows: > {code:java} > 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor > 289 is alive or not. > org.apache.spark.SparkException: Exception thrown in awaitResult: > > Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: .. > {code} > h2. Proposed Structured Logging Format > The proposed change involves structuring the logs in JSON format, which > organizes the log information into easily identifiable fields. Here is how > the new structured log format would look: > {code:java} > { > "ts":"23/11/29 17:53:44", > "level":"ERROR", > "msg":"Fail to know the executor 289 is alive or not", > "context":{ > "executor_id":"289" > }, > "exception":{ > "class":"org.apache.spark.SparkException", > "msg":"Exception thrown in awaitResult", > "stackTrace":"..." 
> }, > "source":"BlockManagerMasterEndpoint" > } {code} > This format will enable users to upload and directly query > driver/executor/master/worker log files using Spark SQL for more effective > problem-solving and analysis, such as tracking executor losses or identifying > faulty tasks: > {code:java} > spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs") > /* To get all the executor lost logs */ > SELECT * FROM logs WHERE contains(msg, 'Lost executor'); > /* To get all the distributed logs about executor 289 */ > SELECT * FROM logs WHERE context.executor_id = 289; > /* To get all the errors on host 100.116.29.4 */ > SELECT * FROM logs WHERE host = "100.116.29.4" and level = "ERROR"; > {code} > > SPIP doc: > [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]
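The query pattern the proposal describes can be exercised without a cluster. The sketch below uses plain Python rather than Spark SQL; the field names ("level", "msg", "context") follow the example record in the proposal, while the sample log lines themselves are invented for illustration.

```python
# Minimal sketch of filtering structured JSON logs, mirroring the proposal's
# Spark SQL queries with plain Python. The two sample records are invented.
import json

log_lines = [
    '{"ts":"23/11/29 17:53:44","level":"ERROR","msg":"Lost executor 289",'
    '"context":{"executor_id":"289"},"source":"BlockManagerMasterEndpoint"}',
    '{"ts":"23/11/29 17:53:45","level":"INFO","msg":"Starting task 0.0",'
    '"context":{"executor_id":"17"},"source":"TaskSetManager"}',
]

records = [json.loads(line) for line in log_lines]

# Equivalent of: SELECT * FROM logs WHERE contains(msg, 'Lost executor')
lost = [r for r in records if "Lost executor" in r["msg"]]

# Equivalent of: SELECT * FROM logs WHERE context.executor_id = '289'
by_executor = [r for r in records if r["context"].get("executor_id") == "289"]

print(len(lost), len(by_executor))  # 1 1
```

The point of the structured format is exactly this: identifiers such as `executor_id` become queryable fields instead of substrings buried in free-form message text.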
[jira] [Updated] (SPARK-47765) Add SET COLLATION to parser rules
[ https://issues.apache.org/jira/browse/SPARK-47765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47765: --- Labels: pull-request-available (was: ) > Add SET COLLATION to parser rules > - > > Key: SPARK-47765 > URL: https://issues.apache.org/jira/browse/SPARK-47765 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >