[jira] [Created] (SPARK-47791) Truncate exceed decimals with scale first instead of precision from JDBC datasource
Kent Yao created SPARK-47791: Summary: Truncate exceed decimals with scale first instead of precision from JDBC datasource Key: SPARK-47791 URL: https://issues.apache.org/jira/browse/SPARK-47791 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
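The idea behind "truncate with scale first instead of precision" can be sketched outside Spark. Below is an illustrative Python sketch, not Spark's actual JDBC dialect code: rounding away excess fractional digits (scale) before enforcing precision preserves the integral part, whereas checking precision on the raw value first can reject numbers that would fit after rounding. The helper name `fit_decimal` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def fit_decimal(value: Decimal, precision: int, scale: int) -> Decimal:
    """Illustrative only: round excess fractional digits away (scale
    first), then enforce precision on the already-rounded value."""
    # Step 1: quantize to the target scale, preserving the integral part.
    quantized = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # Step 2: only now check that the rounded value fits the precision.
    if len(quantized.as_tuple().digits) > precision:
        raise ValueError(f"{value} does not fit DECIMAL({precision},{scale})")
    return quantized

# 123.456789 into DECIMAL(6,3): scale-first rounding keeps 123.457,
# even though the raw value has more than 6 digits.
print(fit_decimal(Decimal("123.456789"), 6, 3))
```

A precision-first check would see nine significant digits in `123.456789` and reject it outright, losing a value that round-trips fine at scale 3.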
[jira] [Updated] (SPARK-47781) Handle negative scale for JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47781: - Summary: Handle negative scale for JDBC data sources (was: Handle negative scale and truncate exceed scale first for JDBC data sources) > Handle negative scale for JDBC data sources > --- > > Key: SPARK-47781 > URL: https://issues.apache.org/jira/browse/SPARK-47781 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-30252 disabled negative scale for decimals. This introduced a regression: > it also disabled reading from data sources that support negative-scale > decimals
[jira] [Updated] (SPARK-47790) Upgrade `commons-io` to 2.16.1
[ https://issues.apache.org/jira/browse/SPARK-47790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47790: --- Labels: pull-request-available (was: ) > Upgrade `commons-io` to 2.16.1 > -- > > Key: SPARK-47790 > URL: https://issues.apache.org/jira/browse/SPARK-47790 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47781) Handle negative scale and truncate exceed scale first for JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47781. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45956 [https://github.com/apache/spark/pull/45956] > Handle negative scale and truncate exceed scale first for JDBC data sources > --- > > Key: SPARK-47781 > URL: https://issues.apache.org/jira/browse/SPARK-47781 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-30252 disabled negative scale for decimals. This introduced a regression: > it also disabled reading from data sources that support negative-scale > decimals
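For readers unfamiliar with negative scale: a DECIMAL(5, -2) holds 5 significant digits times 10^2, e.g. 12300, so its values need up to 5 + 2 integral digits and no fractional digits. One conceivable way a reader could widen such a type into a Spark-representable one is sketched below; `widen_negative_scale` is a hypothetical helper, and the actual fix in PR 45956 may take a different approach.

```python
def widen_negative_scale(precision: int, scale: int) -> tuple[int, int]:
    """Hypothetical mapping: DECIMAL(p, s) with s < 0 from a JDBC
    source becomes DECIMAL(p - s, 0), since the stored values have
    up to p + |s| integral digits and no fractional part."""
    if scale < 0:
        return (precision - scale, 0)  # p - s == p + |s|
    return (precision, scale)

# DECIMAL(5, -2), e.g. values like 12300, widens to DECIMAL(7, 0).
print(widen_negative_scale(5, -2))
```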
[jira] [Updated] (SPARK-45444) Upgrade `commons-io` to 2.14.0
[ https://issues.apache.org/jira/browse/SPARK-45444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45444: -- Summary: Upgrade `commons-io` to 2.14.0 (was: Upgrade `commons-io` to 1.24.0) > Upgrade `commons-io` to 2.14.0 > -- > > Key: SPARK-45444 > URL: https://issues.apache.org/jira/browse/SPARK-45444 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45444) Upgrade `commons-io` to 1.24.0
[ https://issues.apache.org/jira/browse/SPARK-45444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45444: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade `commons-io` to 1.24.0 > -- > > Key: SPARK-45444 > URL: https://issues.apache.org/jira/browse/SPARK-45444 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45734) Upgrade commons-io to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45734: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade commons-io to 2.15.0 > > > Key: SPARK-45734 > URL: https://issues.apache.org/jira/browse/SPARK-45734 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0
[jira] [Resolved] (SPARK-47593) Connector module: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47593. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45879 [https://github.com/apache/spark/pull/45879] > Connector module: Migrate logWarn with variables to structured logging > framework > > > Key: SPARK-47593 > URL: https://issues.apache.org/jira/browse/SPARK-47593 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47776) State store operation cannot work properly with binary inequality collation
[ https://issues.apache.org/jira/browse/SPARK-47776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-47776: Assignee: Jungtaek Lim > State store operation cannot work properly with binary inequality collation > --- > > Key: SPARK-47776 > URL: https://issues.apache.org/jira/browse/SPARK-47776 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > > Arguably this is a correctness issue, though we haven't released the collation > feature yet. > Collation introduces the concept of binary (in)equality, which means that under some > collations we can no longer just compare the binary format of two > UnsafeRows to determine equality. > For example, 'aaa' and 'AAA' can be "semantically" the same under a case-insensitive > collation. > A state store is basically key-value storage, and most provider > implementations rely on the fact that all the columns in the key schema > support binary equality. We need to disallow binary-inequality columns > in the key schema until we can support them in the majority of state store > providers (or at a higher level of the state store). > Why is this a correctness issue? For example, streaming aggregation will > produce aggregation output that ignores semantic > equality. > e.g. df.groupBy(strCol).count() > Although strCol is case insensitive, 'a' and 'A' won't be counted together in > streaming aggregation, while they should be.
[jira] [Resolved] (SPARK-47776) State store operation cannot work properly with binary inequality collation
[ https://issues.apache.org/jira/browse/SPARK-47776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-47776. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45951 [https://github.com/apache/spark/pull/45951] > State store operation cannot work properly with binary inequality collation > --- > > Key: SPARK-47776 > URL: https://issues.apache.org/jira/browse/SPARK-47776 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > Fix For: 4.0.0 > > > Arguably this is a correctness issue, though we haven't released the collation > feature yet. > Collation introduces the concept of binary (in)equality, which means that under some > collations we can no longer just compare the binary format of two > UnsafeRows to determine equality. > For example, 'aaa' and 'AAA' can be "semantically" the same under a case-insensitive > collation. > A state store is basically key-value storage, and most provider > implementations rely on the fact that all the columns in the key schema > support binary equality. We need to disallow binary-inequality columns > in the key schema until we can support them in the majority of state store > providers (or at a higher level of the state store). > Why is this a correctness issue? For example, streaming aggregation will > produce aggregation output that ignores semantic > equality. > e.g. df.groupBy(strCol).count() > Although strCol is case insensitive, 'a' and 'A' won't be counted together in > streaming aggregation, while they should be.
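The failure mode described in the issue can be reproduced outside Spark with a minimal sketch. This is illustrative Python, with the state store's UnsafeRow byte comparison reduced to comparing UTF-8 encodings; the helper names are made up for the example.

```python
from collections import Counter

def binary_group_count(keys: list[str]) -> Counter:
    """Group by the raw binary encoding of each key, as a
    byte-comparing state store effectively does."""
    return Counter(k.encode("utf-8") for k in keys)

def collated_group_count(keys: list[str]) -> Counter:
    """Group by a case-insensitive collation key instead."""
    return Counter(k.lower() for k in keys)

rows = ["a", "A", "a"]
# Binary equality splits what the collation treats as one group:
print(binary_group_count(rows))    # two groups: b'a' -> 2, b'A' -> 1
print(collated_group_count(rows))  # one group:  'a' -> 3
```

This is exactly the `df.groupBy(strCol).count()` discrepancy from the description: the byte-level grouping produces two counts where a case-insensitive collation demands one.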
[jira] [Comment Edited] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835587#comment-17835587 ] Mridul Muralidharan edited comment on SPARK-47759 at 4/10/24 4:08 AM: -- In order to validate, I would suggest two things. Wrap the input str (or lower), within quotes, in the exception message ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. was (Author: mridulm80): In order to validate, I would suggest two things. Wrap the input str, within quote, in the exception message ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). 
> > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. > Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at >
[jira] [Comment Edited] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835587#comment-17835587 ] Mridul Muralidharan edited comment on SPARK-47759 at 4/10/24 4:08 AM: -- In order to validate, I would suggest two things. Wrap the input str, within quote, in the exception message ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. was (Author: mridulm80): In order to validate, I would suggest two things. Wrap the input str, within quote, to the exception ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). 
> > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. > Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at >
[jira] [Commented] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835587#comment-17835587 ] Mridul Muralidharan commented on SPARK-47759: - In order to validate, I would suggest two things. Wrap the input str, within quote, to the exception ... for example, "120s\u00A0" will look like 120s in the exception message as it is a unicode non breaking space. The other would be to include the NumberFormatException 'e' as the cause in the exception being thrown. Once you are able to get a stack trace with these two change in place, it should help us debug this better. > Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate > time string > - > > Key: SPARK-47759 > URL: https://issues.apache.org/jira/browse/SPARK-47759 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 3.5.1 >Reporter: Bo Xiong >Assignee: Bo Xiong >Priority: Critical > Labels: hang, pull-request-available, stuck, threadsafe > Fix For: 3.5.0, 4.0.0, 3.5.1, 3.5.2 > > Original Estimate: 4h > Remaining Estimate: 4h > > h2. Symptom > It's observed that our Spark apps occasionally got stuck with an unexpected > stack trace when reading/parsing a legitimate time string. Note that we > manually killed the stuck app instances and the retry goes thru on the same > cluster (without requiring any app code change). > > *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a > legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 > runtime. > {code:java} > Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time > must be specified as seconds (s), milliseconds (ms), microseconds (us), > minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. 
> Failed to parse time string: 120s > at > org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) > at > org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) > at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) > at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) > at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) > at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) > at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) > at > org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at >
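Mridul's two suggestions, quoting the raw input in the message and chaining the underlying error as the cause, can be sketched as follows. This is illustrative Python rather than the Java in `JavaUtils.timeStringAs`, and the regex is an assumed simplification of the real time-string format.

```python
import re

# Assumed simplification of the format accepted by JavaUtils.timeStringAs.
TIME_RE = re.compile(r"^(\d+)(us|ms|s|min|m|h|d)$")
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1, "min": 60, "m": 60,
                "h": 3600, "d": 86400}

def parse_time_seconds(text: str) -> float:
    match = TIME_RE.match(text)
    if match is None:
        # Suggestion 1: quote the raw input (text!r) so an invisible
        # character, e.g. a trailing U+00A0 no-break space, shows up.
        message = f"Failed to parse time string: {text!r}"
        try:
            int(text)  # surface the NumberFormat-style failure, if any
        except ValueError as exc:
            # Suggestion 2: chain the low-level error as the cause.
            raise ValueError(message) from exc
        raise ValueError(message)
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]

print(parse_time_seconds("120s"))     # 120
try:
    parse_time_seconds("120s\u00a0")  # visually identical to "120s"
except ValueError as err:
    print(err)
```

With the input quoted, the failing string prints as `'120s\xa0'` instead of an apparently valid `120s`, which is precisely the debugging signal the comment is after.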
[jira] [Updated] (SPARK-47083) Upgrade `commons-codec` to 1.16.1
[ https://issues.apache.org/jira/browse/SPARK-47083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47083: -- Fix Version/s: 3.5.2 > Upgrade `commons-codec` to 1.16.1 > - > > Key: SPARK-47083 > URL: https://issues.apache.org/jira/browse/SPARK-47083 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > >
[jira] [Resolved] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes
[ https://issues.apache.org/jira/browse/SPARK-47761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47761. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45925 [https://github.com/apache/spark/pull/45925] > Oracle: Support reading AnsiIntervalTypes > - > > Key: SPARK-47761 > URL: https://issues.apache.org/jira/browse/SPARK-47761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47761) Oracle: Support reading AnsiIntervalTypes
[ https://issues.apache.org/jira/browse/SPARK-47761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47761: Assignee: Kent Yao > Oracle: Support reading AnsiIntervalTypes > - > > Key: SPARK-47761 > URL: https://issues.apache.org/jira/browse/SPARK-47761 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47182) Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro-*`
[ https://issues.apache.org/jira/browse/SPARK-47182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47182: -- Fix Version/s: 3.5.2 > Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` > and `avro-*` > - > > Key: SPARK-47182 > URL: https://issues.apache.org/jira/browse/SPARK-47182 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > >
[jira] [Comment Edited] (SPARK-47463) An error occurred while pushing down the filter of if expression for iceberg datasource.
[ https://issues.apache.org/jira/browse/SPARK-47463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835282#comment-17835282 ] Zhen Wang edited comment on SPARK-47463 at 4/10/24 2:22 AM: new case: filter with *nvl* expression: {code:java} create table t2(c1 boolean) using iceberg; select * from t2 where nvl(c1, false) = true; {code} was (Author: wforget): new similar failure case: {code:java} create table t2(c1 boolean) using iceberg; select * from t2 where nvl(c1, false) = true; {code} > An error occurred while pushing down the filter of if expression for iceberg > datasource. > > > Key: SPARK-47463 > URL: https://issues.apache.org/jira/browse/SPARK-47463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 > Environment: Spark 3.5.0 > Iceberg 1.4.3 >Reporter: Zhen Wang >Priority: Major > Labels: pull-request-available > > Reproduce: > {code:java} > create table t1(c1 int) using iceberg; > select * from > (select if(c1 = 1, c1, null) as c1 from t1) t > where t.c1 > 0; {code} > Error: > {code:java} > org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase > optimization failed with an internal error. You hit a bug in Spark or the > Spark plugins you use. Please, report this bug to the corresponding > communities or vendors, and provide the full stack trace. 
> at > org.apache.spark.SparkException$.internalError(SparkException.scala:107) > at > org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536) > at > org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548) > at > org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144) > at > org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179) > at > org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238) > at > org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284) > at > org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4327) > at org.apache.spark.sql.Dataset.collect(Dataset.scala:3580) > at > 
org.apache.kyuubi.engine.spark.operation.ExecuteStatement.fullCollectResult(ExecuteStatement.scala:72) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.collectAsIterator(ExecuteStatement.scala:164) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:87) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201) > at > org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81) > at > org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:103) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >
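Both repros hinge on SQL three-valued logic: the predicate being pushed down contains a sub-expression that can yield NULL, and the translation to an Iceberg filter must preserve those NULL semantics. A standalone sketch of the semantics involved, in illustrative Python rather than the actual Spark/Iceberg translation code:

```python
def sql_gt(a, b):
    """SQL three-valued '>': NULL if either side is NULL."""
    return None if a is None or b is None else a > b

def if_expr(cond, then, otherwise):
    """SQL if(cond, then, otherwise) with an already-evaluated cond."""
    return then if cond else otherwise

def nvl(value, default):
    """SQL nvl: replace NULL with a default."""
    return default if value is None else value

# where if(c1 = 1, c1, null) > 0  -- NULL (row filtered) unless c1 == 1
for c1 in (0, 1, None):
    print(c1, sql_gt(if_expr(c1 == 1, c1, None), 0))

# where nvl(c1, false) = true  -- NULL collapses to false, never NULL
for c1 in (True, False, None):
    print(c1, nvl(c1, False) == True)
```

A pushdown translator that assumes the filter expression is two-valued (true/false) mishandles the NULL branch introduced by `if(...)` and `nvl(...)`, which is consistent with the internal error the optimizer reports above.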
[jira] [Resolved] (SPARK-47586) Hive module: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47586. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45876 [https://github.com/apache/spark/pull/45876] > Hive module: Migrate logError with variables to structured logging framework > > > Key: SPARK-47586 > URL: https://issues.apache.org/jira/browse/SPARK-47586 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47787) Upgrade `commons-compress` to 1.26.1
[ https://issues.apache.org/jira/browse/SPARK-47787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47787. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45966 [https://github.com/apache/spark/pull/45966] > Upgrade `commons-compress` to 1.26.1 > - > > Key: SPARK-47787 > URL: https://issues.apache.org/jira/browse/SPARK-47787 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry went through on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string; the app runs on emr-7.0.0 with the Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.<init>(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.<init>(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
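The failing call chain above ends in suffix-based parsing of a number-plus-unit string. As a point of reference, here is a minimal, self-contained re-implementation of that grammar — illustrative only, not Spark's actual JavaUtils code, and the class name is invented — covering exactly the units the exception message enumerates. It shows that a well-formed input like 120s parses cleanly:

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of number-plus-suffix time parsing, mirroring the accepted units
// listed in the exception message (s, ms, us, m/min, h, d).
public class TimeStringSketch {
    private static final Pattern TIME = Pattern.compile("(-?[0-9]+)([a-z]+)?");

    public static long timeStringAsSec(String input) {
        Matcher m = TIME.matcher(input.trim().toLowerCase());
        if (!m.matches()) {
            throw new NumberFormatException("Failed to parse time string: " + input);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        // A bare number defaults to seconds; otherwise map the suffix to a unit.
        TimeUnit unit = (suffix == null) ? TimeUnit.SECONDS : switch (suffix) {
            case "us" -> TimeUnit.MICROSECONDS;
            case "ms" -> TimeUnit.MILLISECONDS;
            case "s" -> TimeUnit.SECONDS;
            case "m", "min" -> TimeUnit.MINUTES;
            case "h" -> TimeUnit.HOURS;
            case "d" -> TimeUnit.DAYS;
            default -> throw new NumberFormatException("Invalid suffix in: " + input);
        };
        return TimeUnit.SECONDS.convert(value, unit);
    }

    public static void main(String[] args) {
        System.out.println(timeStringAsSec("120s")); // prints 120
    }
}
```

Because 120s satisfies this grammar, the NumberFormatException in the trace is consistent with the reporter's observation that the failure "doesn't make sense" for the given input.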
[jira] [Updated] (SPARK-47788) Ensure the same hash partitioning scheme/hash function is used across batches
[ https://issues.apache.org/jira/browse/SPARK-47788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47788: --- Labels: pull-request-available (was: ) > Ensure the same hash partitioning scheme/hash function is used across batches > - > > Key: SPARK-47788 > URL: https://issues.apache.org/jira/browse/SPARK-47788 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.5.1 >Reporter: Fanyue Xia >Priority: Major > Labels: pull-request-available > > To really make sure that any changes to the hash function / partitioner in Spark > don't cause logical correctness issues in existing running streaming > queries, we should add a new unit test to ensure hash function stability is > maintained. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
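The proposed test can be pictured as a golden-value check: record the partition id of a handful of fixed keys once, then assert they never change across releases. The sketch below is hypothetical — it uses Java's String.hashCode as a stand-in for Spark's hash partitioner, and the class name is invented — but it shows the shape such a stability test could take:

```java
import java.util.Map;

// Hypothetical golden-value stability test: pin the partition ids of fixed
// keys against values recorded once, so any later change to the hash
// function or partitioning scheme fails the assertions.
public class HashStabilitySketch {
    // Stand-in for a hash partitioner: non-negative modulus of the key's hash.
    static int partition(String key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        // Golden values captured from a previous run; a changed hash scheme
        // would silently re-route keys across batches unless this trips first.
        Map<String, Integer> golden = Map.of("a", 1, "ab", 1, "abc", 2);
        golden.forEach((key, expected) -> {
            if (partition(key, 8) != expected) {
                throw new AssertionError("partitioning changed for key: " + key);
            }
        });
        System.out.println("hash partitioning stable");
    }
}
```

For a streaming engine the property matters across batches and restarts, not just within one run, which is why a persisted set of golden values is the natural test design here.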
[jira] [Commented] (SPARK-47578) Spark core: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835552#comment-17835552 ] Daniel commented on SPARK-47578: spark-2$ grep logWarning sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u Under sql/core/src/main/scala/org/apache/spark/sql/ (excluding streaming): Column.scala SparkSession.scala api/python/PythonSQLUtils.scala api/r/SQLUtils.scala catalyst/analysis/ResolveSessionCatalog.scala execution/CacheManager.scala execution/ExistingRDD.scala execution/OptimizeMetadataOnlyQuery.scala execution/QueryExecution.scala execution/SparkSqlParser.scala execution/SparkStrategies.scala execution/WholeStageCodegenExec.scala execution/adaptive/AdaptiveSparkPlanExec.scala execution/adaptive/InsertAdaptiveSparkPlan.scala execution/adaptive/ShufflePartitionsUtil.scala execution/aggregate/HashAggregateExec.scala execution/command/AnalyzeTablesCommand.scala execution/command/CommandUtils.scala execution/command/SetCommand.scala execution/command/createDataSourceTables.scala execution/command/ddl.scala execution/datasources/BasicWriteStatsTracker.scala execution/datasources/DataSource.scala execution/datasources/DataSourceManager.scala execution/datasources/FilePartition.scala execution/datasources/FileScanRDD.scala execution/datasources/FileStatusCache.scala execution/datasources/csv/CSVDataSource.scala execution/datasources/jdbc/JDBCRDD.scala execution/datasources/jdbc/JDBCRelation.scala execution/datasources/jdbc/JdbcUtils.scala execution/datasources/json/JsonOutputWriter.scala execution/datasources/orc/OrcUtils.scala execution/datasources/parquet/ParquetFileFormat.scala execution/datasources/parquet/ParquetUtils.scala execution/datasources/v2/CacheTableExec.scala execution/datasources/v2/CreateIndexExec.scala execution/datasources/v2/CreateNamespaceExec.scala execution/datasources/v2/CreateTableExec.scala execution/datasources/v2/DataSourceV2Strategy.scala execution/datasources/v2/DropIndexExec.scala 
execution/datasources/v2/FilePartitionReader.scala execution/datasources/v2/FileScan.scala execution/datasources/v2/V2ScanPartitioningAndOrdering.scala execution/datasources/v2/jdbc/JDBCTableCatalog.scala execution/datasources/v2/state/StatePartitionReader.scala execution/datasources/xml/XmlDataSource.scala execution/python/ApplyInPandasWithStatePythonRunner.scala execution/python/AttachDistributedSequenceExec.scala execution/ui/SQLAppStatusListener.scala execution/window/WindowExecBase.scala internal/SharedState.scala jdbc/DB2Dialect.scala jdbc/H2Dialect.scala jdbc/JdbcDialects.scala jdbc/MsSqlServerDialect.scala jdbc/MySQLDialect.scala jdbc/OracleDialect.scala > Spark core: Migrate logWarn with variables to structured logging framework > -- > > Key: SPARK-47578 > URL: https://issues.apache.org/jira/browse/SPARK-47578 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
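For the migrations tracked in these sub-tasks, the target shape is a log call that carries each variable as a named field instead of splicing it into an interpolated string, so downstream pipelines can index on the field names. The sketch below illustrates that idea only — it is not Spark's actual structured logging API, and every name in it (LogEntry, the {placeholder} syntax) is invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of structured logging: a message template plus a map of
// named fields, rather than a pre-rendered interpolated string.
public class StructuredLogSketch {
    record LogEntry(String template, Map<String, Object> fields) {
        // Render the human-readable form; the fields map stays available
        // for machine consumption (e.g. JSON log output).
        String render() {
            String out = template;
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                out = out.replace("{" + e.getKey() + "}", String.valueOf(e.getValue()));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Before: logWarning(s"Task $taskId failed on executor $execId")
        // After: the same message, but each variable is a named field.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("taskId", 42);
        fields.put("executorId", "exec-7");
        LogEntry entry = new LogEntry("Task {taskId} failed on executor {executorId}", fields);
        System.out.println(entry.render()); // prints: Task 42 failed on executor exec-7
    }
}
```

The grep lists in these comments are essentially an inventory of call sites whose interpolated variables need to become named fields of this kind.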
[jira] [Created] (SPARK-47789) Review and improve error message texts
Serge Rielau created SPARK-47789: Summary: Review and improve error message texts Key: SPARK-47789 URL: https://issues.apache.org/jira/browse/SPARK-47789 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau error-classes.json content could use some TLC to fix formatting, improve grammar, and other editing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:26 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala (X) exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Created] (SPARK-47788) Ensure the same hash partitioning scheme/hash function is used across batches
Fanyue Xia created SPARK-47788: -- Summary: Ensure the same hash partitioning scheme/hash function is used across batches Key: SPARK-47788 URL: https://issues.apache.org/jira/browse/SPARK-47788 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 3.5.1 Reporter: Fanyue Xia To really make sure that any changes to the hash function / partitioner in Spark don't cause logical correctness issues in existing running streaming queries, we should add a new unit test to ensure hash function stability is maintained. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:16 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala (X) exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:14 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala (X) datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:14 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala (X) datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala (X) datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:07 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 11:07 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala (X) datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala (X) datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:59 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala (X) command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala (X) command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:55 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala (X) command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:53 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala (X) adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution (excluding the streaming directory): (X) BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel edited comment on SPARK-47583 at 4/9/24 10:47 PM: - spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution: (X) BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala was (Author: JIRAUSER285772): spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/execution: (X) BaseScriptTransformationExec.scala adaptive/AdaptiveSparkPlanExec.scala command/InsertIntoDataSourceDirCommand.scala command/createDataSourceTables.scala command/ddl.scala datasources/FileFormatWriter.scala datasources/jdbc/connection/ConnectionProvider.scala datasources/v2/WriteToDataSourceV2Exec.scala datasources/v2/jdbc/JDBCScanBuilder.scala datasources/v2/jdbc/JDBCTableCatalog.scala exchange/BroadcastExchangeExec.scala python/PythonStreamingSourceRunner.scala streaming/AsyncCommitLog.scala streaming/AsyncLogPurge.scala streaming/AsyncOffsetSeqLog.scala streaming/AsyncProgressTrackingMicroBatchExecution.scala streaming/ErrorNotifier.scala streaming/MicroBatchExecution.scala streaming/StreamExecution.scala streaming/StreamMetadata.scala streaming/continuous/ContinuousExecution.scala streaming/continuous/ContinuousWriteRDD.scala streaming/state/OperatorStateMetadata.scala streaming/state/RocksDB.scala streaming/state/RocksDBFileManager.scala streaming/state/StateSchemaCompatibilityChecker.scala 
streaming/state/StateStoreChangelog.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835546#comment-17835546 ] Daniel commented on SPARK-47583: spark$ grep logError sql/core/src/main/* -R | cut -d ':' -f 1 | sort -u sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncCommitLog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncLogPurge.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncOffsetSeqLog.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ErrorNotifier.scala 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousWriteRDD.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadata.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala > SQL core: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47583 > URL: https://issues.apache.org/jira/browse/SPARK-47583 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
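The `grep ... | cut ... | sort -u` pipeline used in the comments above lists each file containing `logError` exactly once, even when a file has several matches. A minimal reproduction on a toy directory tree (the paths below are illustrative, not Spark's):

```shell
# Build a small tree: two files that call logError, one that does not.
mkdir -p demo/src/a demo/src/b
printf 'logError("boom")\nlogError("again")\n' > demo/src/a/Foo.scala
printf 'logInfo("fine")\n' > demo/src/a/Bar.scala
printf 'logError("oops")\n' > demo/src/b/Baz.scala

# -R recurses into directories; cut keeps only the filename before the
# first ':' of grep's "file:match" output; sort -u collapses files that
# matched more than once (Foo.scala matched twice but is listed once).
grep logError demo/src/* -R | cut -d ':' -f 1 | sort -u
```

This prints `demo/src/a/Foo.scala` and `demo/src/b/Baz.scala`, one per line; `Bar.scala` never matches.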
[jira] [Updated] (SPARK-47787) Upgrade `commons-compress` to 1.26.1`
[ https://issues.apache.org/jira/browse/SPARK-47787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47787: --- Labels: pull-request-available (was: ) > Upgrade `commons-compress` to 1.26.1` > - > > Key: SPARK-47787 > URL: https://issues.apache.org/jira/browse/SPARK-47787 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47787) Upgrade `commons-compress` to 1.26.1`
[ https://issues.apache.org/jira/browse/SPARK-47787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47787: -- Summary: Upgrade `commons-compress` to 1.26.1` (was: Upgrade `commons-compress` to 1.26.`) > Upgrade `commons-compress` to 1.26.1` > - > > Key: SPARK-47787 > URL: https://issues.apache.org/jira/browse/SPARK-47787 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47787) Upgrade `commons-compress` to 1.26.`
Dongjoon Hyun created SPARK-47787: - Summary: Upgrade `commons-compress` to 1.26.` Key: SPARK-47787 URL: https://issues.apache.org/jira/browse/SPARK-47787 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-47718: Labels: (was: pull-request-available) > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > > I have a data pipeline set up in such a way that it reads data from a Kafka > source, does some transformation on the data using pyspark, then writes the > output into a sink (Kafka, Redis, etc). > > My entire pipeline is written in SQL, so I wish to use the .sql() method to > execute SQL on my streaming source directly. > > However, I'm running into the issue where my watermark is not being > recognized by the downstream query via the .sql() method. > > ``` > Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) > [Clang 16.0.6 ] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import pyspark > >>> print(pyspark.__version__) > 3.5.1 > >>> from pyspark.sql import SparkSession > >>> > >>> session = SparkSession.builder \ > ... .config("spark.jars.packages", > "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ > ... .getOrCreate() > >>> from pyspark.sql.functions import col, from_json > >>> from pyspark.sql.types import StructField, StructType, TimestampType, > >>> LongType, DoubleType, IntegerType > >>> schema = StructType( > ... [ > ... StructField('createTime', TimestampType(), True), > ... StructField('orderId', LongType(), True), > ... StructField('payAmount', DoubleType(), True), > ... StructField('payPlatform', IntegerType(), True), > ... StructField('provinceId', IntegerType(), True), > ... ]) > >>> > >>> streaming_df = session.readStream\ > ... .format("kafka")\ > ... .option("kafka.bootstrap.servers", "localhost:9092")\ > ... 
.option("subscribe", "payment_msg")\ > ... .option("startingOffsets","earliest")\ > ... .load()\ > ... .select(from_json(col("value").cast("string"), > schema).alias("parsed_value"))\ > ... .select("parsed_value.*")\ > ... .withWatermark("createTime", "10 seconds") > >>> > >>> streaming_df.createOrReplaceTempView("streaming_df") > >>> session.sql(""" > ... SELECT > ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount > ... FROM streaming_df > ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') > ... ORDER BY window.start > ... """)\ > ... .writeStream\ > ... .format("kafka") \ > ... .option("checkpointLocation", "checkpoint") \ > ... .option("kafka.bootstrap.servers", "localhost:9092") \ > ... .option("topic", "sink") \ > ... .start() > ``` > > This throws exception > ``` > pyspark.errors.exceptions.captured.AnalysisException: Append output mode not > supported when there are streaming aggregations on streaming > DataFrames/DataSets without watermark; line 6 pos 4; > ``` > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47783: -- Assignee: Serge Rielau > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47783. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45961 [https://github.com/apache/spark/pull/45961] > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry goes thru on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
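The surprising part of the report above is that `120s` matches exactly the format the exception message describes: a number followed by one of the suffixes s, ms, us, m/min, h, or d. A rough Python sketch of that parsing contract (the real `JavaUtils.timeStringAs` is Java; the function and unit table here are illustrative, not Spark's implementation) shows `120s` is a well-formed input:

```python
import re

# Suffix -> multiplier into seconds, mirroring the units the error message
# lists: us, ms, s, m/min, h, d.
_UNITS = {
    "us": 1e-6, "ms": 1e-3, "s": 1.0,
    "m": 60.0, "min": 60.0, "h": 3600.0, "d": 86400.0,
}

def time_string_as_sec(s: str) -> float:
    """Parse strings like '120s', '100ms', '2h' into seconds."""
    m = re.fullmatch(r"(-?\d+)\s*([a-z]+)?", s.strip().lower())
    if not m:
        raise ValueError(f"Failed to parse time string: {s}")
    value, unit = m.groups()
    if unit is None:
        unit = "s"  # assume seconds when no suffix is given
    if unit not in _UNITS:
        raise ValueError(f"Failed to parse time string: {s}")
    return int(value) * _UNITS[unit]

print(time_string_as_sec("120s"))   # 120.0
print(time_string_as_sec("100ms"))  # 0.1
```

Since `120s` parses cleanly under this contract, the NumberFormatException in the trace suggests the failure lies somewhere other than the input string itself, consistent with the reporter's note that the retried app succeeded with no code change.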
[jira] [Updated] (SPARK-47415) Levenshtein (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47415: --- Labels: pull-request-available (was: ) > Levenshtein (all collations) > > > Key: SPARK-47415 > URL: https://issues.apache.org/jira/browse/SPARK-47415 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47759) Apps being stuck after JavaUtils.timeStringAs fails to parse a legitimate time string
[ https://issues.apache.org/jira/browse/SPARK-47759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Xiong updated SPARK-47759: - Description: h2. Symptom It's observed that our Spark apps occasionally got stuck with an unexpected stack trace when reading/parsing a legitimate time string. Note that we manually killed the stuck app instances and the retry goes thru on the same cluster (without requiring any app code change). *[Stack Trace 1]* The stack trace doesn't make sense since *120s* is a legitimate time string, where the app runs on emr-7.0.0 with Spark 3.5.0 runtime. {code:java} Caused by: java.lang.RuntimeException: java.lang.NumberFormatException: Time must be specified as seconds (s), milliseconds (ms), microseconds (us), minutes (m or min), hour (h), or day (d). E.g. 50s, 100ms, or 250us. Failed to parse time string: 120s at org.apache.spark.network.util.JavaUtils.timeStringAs(JavaUtils.java:258) at org.apache.spark.network.util.JavaUtils.timeStringAsSec(JavaUtils.java:275) at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1166) at org.apache.spark.rpc.RpcTimeout$.apply(RpcTimeout.scala:131) at org.apache.spark.util.RpcUtils$.askRpcTimeout(RpcUtils.scala:41) at org.apache.spark.rpc.RpcEndpointRef.(RpcEndpointRef.scala:33) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.(NettyRpcEnv.scala:533) at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:640) at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:697) at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:682) at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163) at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:192) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) at
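The failure above is striking because "120s" matches the documented format exactly. As a rough illustration of what a suffix-based time-string parser in the spirit of JavaUtils.timeStringAs does (this sketch is an assumption for illustration, not Spark's actual implementation), a healthy parse of "120s" looks like:

```python
import re

# Hypothetical sketch of suffix-based time-string parsing, loosely modeled on
# JavaUtils.timeStringAs; the regex and the suffix table are assumptions, not
# Spark's actual code.
SUFFIX_TO_SECONDS = {
    "us": 1e-6, "ms": 1e-3, "s": 1.0,
    "m": 60.0, "min": 60.0, "h": 3600.0, "d": 86400.0,
}
TIME_RE = re.compile(r"(-?[0-9]+)([a-z]+)?")

def time_string_as_sec(s: str, default_suffix: str = "s") -> float:
    # A bare number falls back to the caller-supplied default unit.
    m = TIME_RE.fullmatch(s.strip().lower())
    if m is None:
        raise ValueError(f"Failed to parse time string: {s}")
    value, suffix = int(m.group(1)), m.group(2) or default_suffix
    if suffix not in SUFFIX_TO_SECONDS:
        raise ValueError(f"Invalid suffix: {suffix}")
    return value * SUFFIX_TO_SECONDS[suffix]
```

Under a model like this, "120s" can only fail if the parser's internal state is somehow corrupted, which is consistent with the report that the same string parses fine after the stuck app is killed and retried.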
[jira] [Created] (SPARK-47785) Upgrade `bouncycastle` to 1.78
Dongjoon Hyun created SPARK-47785: - Summary: Upgrade `bouncycastle` to 1.78 Key: SPARK-47785 URL: https://issues.apache.org/jira/browse/SPARK-47785 Project: Spark Issue Type: Sub-task Components: Build, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47783: --- Labels: pull-request-available (was: ) > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47783) Refresh error-states.json
[ https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau updated SPARK-47783: - Summary: Refresh error-states.json (was: Refresh error-state.sql) > Refresh error-states.json > - > > Key: SPARK-47783 > URL: https://issues.apache.org/jira/browse/SPARK-47783 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > > We want to add more SQLSTATEs to the menu to prevent collisions and do some > general cleanup -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47783) Refresh error-state.sql
Serge Rielau created SPARK-47783: Summary: Refresh error-state.sql Key: SPARK-47783 URL: https://issues.apache.org/jira/browse/SPARK-47783 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau We want to add more SQLSTATEs to the menu to prevent collisions and do some general cleanup
[jira] [Updated] (SPARK-47673) [Arbitrary State Support] State TTL support - ListState
[ https://issues.apache.org/jira/browse/SPARK-47673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47673: --- Labels: pull-request-available (was: ) > [Arbitrary State Support] State TTL support - ListState > --- > > Key: SPARK-47673 > URL: https://issues.apache.org/jira/browse/SPARK-47673 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Priority: Major > Labels: pull-request-available > > Add support for expiring state value based on ttl for List State in > transformWithState operator. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47748) Upgrade `zstd-jni` to 1.5.6-2
[ https://issues.apache.org/jira/browse/SPARK-47748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47748: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Upgrade `zstd-jni` to 1.5.6-2 > - > > Key: SPARK-47748 > URL: https://issues.apache.org/jira/browse/SPARK-47748 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47782: - Assignee: Cheng Pan (was: Dongjoon Hyun) > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47782: --- Labels: pull-request-available (was: ) > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47782: - Assignee: Dongjoon Hyun > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47782. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45943 [https://github.com/apache/spark/pull/45943] > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
[ https://issues.apache.org/jira/browse/SPARK-47782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47782: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Task) > Remove redundant json4s-jackson definition in sql/api POM > - > > Key: SPARK-47782 > URL: https://issues.apache.org/jira/browse/SPARK-47782 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47782) Remove redundant json4s-jackson definition in sql/api POM
Dongjoon Hyun created SPARK-47782: - Summary: Remove redundant json4s-jackson definition in sql/api POM Key: SPARK-47782 URL: https://issues.apache.org/jira/browse/SPARK-47782 Project: Spark Issue Type: Task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47601) Graphx: Migrate logs with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47601: --- Labels: pull-request-available (was: ) > Graphx: Migrate logs with variables to structured logging framework > > > Key: SPARK-47601 > URL: https://issues.apache.org/jira/browse/SPARK-47601 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47781) Handle negative scale and truncate exceed scale first for JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47781: --- Labels: pull-request-available (was: ) > Handle negative scale and truncate exceed scale first for JDBC data sources > --- > > Key: SPARK-47781 > URL: https://issues.apache.org/jira/browse/SPARK-47781 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > > SPARK-30252 has disabled the negative scale for decimals. It has a regression > that it also disabled reading from data sources that support negative scale > decimals -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47781) Handle negative scale and truncate exceed scale first for JDBC data sources
Kent Yao created SPARK-47781: Summary: Handle negative scale and truncate exceed scale first for JDBC data sources Key: SPARK-47781 URL: https://issues.apache.org/jira/browse/SPARK-47781 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao SPARK-30252 disabled negative scale for decimals. This introduced a regression: it also disabled reading from data sources that support negative-scale decimals.
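For context on what "negative scale" means, here is an illustration using Python's decimal module rather than Spark's own Decimal type (sketch only; how Spark maps such JDBC columns after the fix is not shown here): a decimal with scale -2 stores no digits below the hundreds place, i.e. its unscaled value is multiplied by 10^2.

```python
from decimal import Decimal

# A decimal with unscaled value 12345 and scale -2 represents 12345 * 10**2.
# Some JDBC sources allow DECIMAL(p, s) with s < 0; after SPARK-30252 Spark's
# decimal type rejected such a scale, breaking reads of these columns.
value = Decimal("12345E+2")          # scale -2: exponent is +2
assert value == Decimal(1234500)

# Rounding a number to negative scale keeps only digits at or above 10**2:
assert round(123456, -2) == 123500
```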
[jira] [Commented] (SPARK-47416) SoundEx (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835328#comment-17835328 ] Nikola Mandic commented on SPARK-47416: --- Working on it. > SoundEx (all collations) > > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Updated] (SPARK-47780) Generate stable and uniquely-named classes in the Catalyst code generator
[ https://issues.apache.org/jira/browse/SPARK-47780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47780: --- Labels: pull-request-available (was: ) > Generate stable and uniquely-named classes in the Catalyst code generator > - > > Key: SPARK-47780 > URL: https://issues.apache.org/jira/browse/SPARK-47780 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Vojin Jovanovic >Priority: Minor > Labels: pull-request-available > > Code generated by Catalyst can not be executed on [GraalVM Native > Image|https://www.graalvm.org/] for the following two reasons: > # The generated classes for Scala UDFs are not stable so they can not be > [pre-defined in a native > image|https://www.graalvm.org/latest/reference-manual/native-image/metadata/ExperimentalAgentOptions/#support-for-predefined-classes]. > The instability comes from the address of hidden functions > ({{{}/0x{}}}) that [end up in the generated > code|https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L1173]. > These addresses are completely random and therefore provide little value to > the end user. > # The name of the generated class is always the same > ({{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass{}}}) which > makes it impossible to [pre-define multiple such classes in a native > image|https://github.com/oracle/graal/blob/vm-ce-23.1.2/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/ClassPredefinitionFeature.java#L153]. > We can fix this by generating the class name in the following format: > {{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass${}}}. > It would be useful enable Spark workloads on GraalVM Native Image. With > Native Image, one can generate low-memory Spark binaries for their frequent > workloads. 
We would also like to make all Spark benchmarks in the > [renaissance benchmark suite|https://renaissance.dev/] run as native images.
[jira] [Created] (SPARK-47780) Generate stable and uniquely-named classes in the Catalyst code generator
Vojin Jovanovic created SPARK-47780: --- Summary: Generate stable and uniquely-named classes in the Catalyst code generator Key: SPARK-47780 URL: https://issues.apache.org/jira/browse/SPARK-47780 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Vojin Jovanovic Code generated by Catalyst cannot be executed on [GraalVM Native Image|https://www.graalvm.org/] for the following two reasons: # The generated classes for Scala UDFs are not stable, so they cannot be [pre-defined in a native image|https://www.graalvm.org/latest/reference-manual/native-image/metadata/ExperimentalAgentOptions/#support-for-predefined-classes]. The instability comes from the address of hidden functions ({{{}/0x{}}}) that [end up in the generated code|https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L1173]. These addresses are completely random and therefore provide little value to the end user. # The name of the generated class is always the same ({{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass{}}}), which makes it impossible to [pre-define multiple such classes in a native image|https://github.com/oracle/graal/blob/vm-ce-23.1.2/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/ClassPredefinitionFeature.java#L153]. We can fix this by generating the class name in the following format: {{{}org.apache.spark.sql.catalyst.expressions.GeneratedClass${}}}. It would be useful to enable Spark workloads on GraalVM Native Image. With Native Image, one can generate low-memory Spark binaries for their frequent workloads. We would also like to make all Spark benchmarks in the [renaissance benchmark suite|https://renaissance.dev/] run as native images.
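The naming scheme proposed in the ticket can be sketched as follows. How the suffix is derived is an assumption here (the ticket only specifies the GeneratedClass$<suffix> shape): hashing the generated source yields a name that is stable across JVM runs yet unique per distinct program, unlike a random hidden-lambda address.

```python
import hashlib

def stable_class_name(generated_source: str) -> str:
    # Derive the suffix from the code itself: equal sources always map to the
    # same name (stable), different sources to different names (unique).
    suffix = hashlib.sha256(generated_source.encode("utf-8")).hexdigest()[:12]
    return f"org.apache.spark.sql.catalyst.expressions.GeneratedClass${suffix}"

# Identical generated code -> identical class name, across runs:
assert stable_class_name("return a + b;") == stable_class_name("return a + b;")
# Different generated code -> a distinct, pre-definable class name:
assert stable_class_name("return a + b;") != stable_class_name("return a - b;")
```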
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: (was: Apache Spark) > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
[jira] [Assigned] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-47779: - Assignee: Ruifeng Zheng > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: Apache Spark > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47779. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45952 [https://github.com/apache/spark/pull/45952] > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47779: -- Assignee: (was: Apache Spark) > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: Apache Spark > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47413: -- Assignee: (was: Apache Spark) > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
[jira] [Assigned] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47779: -- Assignee: Apache Spark > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835303#comment-17835303 ] Jungtaek Lim commented on SPARK-47718: -- window('createTime', '1 hour', '30 minutes') 'createTime' is a literal, not the column reference. Please try again with window(createTime, '1 hour', '30 minutes') and reopen if the issue persists. > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available > > I have a data pipeline set up in such a way that it reads data from a Kafka > source, does some transformation on the data using pyspark, then writes the > output into a sink (Kafka, Redis, etc). > > My entire pipeline in written in SQL, so I wish to use the .sql() method to > execute SQL on my streaming source directly. > > However, I'm running into the issue where my watermark is not being > recognized by the downstream query via the .sql() method. > > ``` > Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) > [Clang 16.0.6 ] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import pyspark > >>> print(pyspark.__version__) > 3.5.1 > >>> from pyspark.sql import SparkSession > >>> > >>> session = SparkSession.builder \ > ... .config("spark.jars.packages", > "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\ > ... .getOrCreate() > >>> from pyspark.sql.functions import col, from_json > >>> from pyspark.sql.types import StructField, StructType, TimestampType, > >>> LongType, DoubleType, IntegerType > >>> schema = StructType( > ... [ > ... StructField('createTime', TimestampType(), True), > ... StructField('orderId', LongType(), True), > ... StructField('payAmount', DoubleType(), True), > ... 
StructField('payPlatform', IntegerType(), True), > ... StructField('provinceId', IntegerType(), True), > ... ]) > >>> > >>> streaming_df = session.readStream\ > ... .format("kafka")\ > ... .option("kafka.bootstrap.servers", "localhost:9092")\ > ... .option("subscribe", "payment_msg")\ > ... .option("startingOffsets","earliest")\ > ... .load()\ > ... .select(from_json(col("value").cast("string"), > schema).alias("parsed_value"))\ > ... .select("parsed_value.*")\ > ... .withWatermark("createTime", "10 seconds") > >>> > >>> streaming_df.createOrReplaceTempView("streaming_df") > >>> session.sql(""" > ... SELECT > ... window.start, window.end, provinceId, sum(payAmount) as totalPayAmount > ... FROM streaming_df > ... GROUP BY provinceId, window('createTime', '1 hour', '30 minutes') > ... ORDER BY window.start > ... """)\ > ... .writeStream\ > ... .format("kafka") \ > ... .option("checkpointLocation", "checkpoint") \ > ... .option("kafka.bootstrap.servers", "localhost:9092") \ > ... .option("topic", "sink") \ > ... .start() > ``` > > This throws exception > ``` > pyspark.errors.exceptions.captured.AnalysisException: Append output mode not > supported when there are streaming aggregations on streaming > DataFrames/DataSets without watermark; line 6 pos 4; > ``` > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
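The resolution above hinges on a one-character difference: window('createTime', ...) passes a string literal, while window(createTime, ...) passes the watermarked column. A plain-Python sketch of how one might spot the mistake in query text (the helper name and regex are illustrative, not a Spark API):

```python
import re

# Hypothetical lint helper: report window() calls whose first argument is a
# quoted string literal (a constant) rather than a bare column reference.
def quoted_window_columns(sql: str) -> list[str]:
    return re.findall(r"window\(\s*'([^']+)'", sql)

# The grouping clause from the report, and the corrected form from the comment.
bad   = "GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')"
fixed = "GROUP BY provinceId, window(createTime, '1 hour', '30 minutes')"
```

With the quoted form, the grouping expression never references the watermarked createTime column, so Spark treats the aggregation as unwatermarked and rejects append mode.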
[jira] [Resolved] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-47718. -- Resolution: Not A Bug > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47779: --- Labels: pull-request-available (was: ) > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47779) Add a helper function to sort PS Frame/Series
Ruifeng Zheng created SPARK-47779: - Summary: Add a helper function to sort PS Frame/Series Key: SPARK-47779 URL: https://issues.apache.org/jira/browse/SPARK-47779 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47779) Add a helper function to sort PS Frame/Series
[ https://issues.apache.org/jira/browse/SPARK-47779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-47779: -- Component/s: PS Tests (was: PySpark) > Add a helper function to sort PS Frame/Series > - > > Key: SPARK-47779 > URL: https://issues.apache.org/jira/browse/SPARK-47779 > Project: Spark > Issue Type: Improvement > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47778) Promote `--wait` option to all services
Cheng Pan created SPARK-47778: - Summary: Promote `--wait` option to all services Key: SPARK-47778 URL: https://issues.apache.org/jira/browse/SPARK-47778 Project: Spark Issue Type: New Feature Components: Deploy Affects Versions: 4.0.0 Reporter: Cheng Pan SPARK-47040 added `--wait` support to `start-connect-server.sh`: ./sbin/start-connect-server.sh [--wait] [options] In https://github.com/apache/spark/pull/45852 we discussed and reached a consensus to promote `--wait` to all services for consistency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47774: -- Fix Version/s: 3.4.3 > Remove redundant rules from `MimaExcludes` > -- > > Key: SPARK-47774 > URL: https://issues.apache.org/jira/browse/SPARK-47774 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2, 3.4.3 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47776) State store operation cannot work properly with binary inequality collation
[ https://issues.apache.org/jira/browse/SPARK-47776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47776: --- Labels: pull-request-available (was: ) > State store operation cannot work properly with binary inequality collation > --- > > Key: SPARK-47776 > URL: https://issues.apache.org/jira/browse/SPARK-47776 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jungtaek Lim >Priority: Blocker > Labels: pull-request-available > > Arguably this is a correctness issue, though we haven't released the collation > feature yet. > Collation introduces the concept of binary (in)equality, which means that under some > collations we are no longer able to just compare the binary format of two > UnsafeRows to determine equality. > For example, 'aaa' and 'AAA' can be "semantically" the same under a case-insensitive > collation. > The state store is basically key-value storage, and most provider > implementations rely on the fact that all the columns in the key schema > support binary equality. We need to disallow using binary inequality columns > in the key schema until we can support this in the majority of state store > providers (or at a higher level of the state store). > Why is this a correctness issue? For example, streaming aggregation will > produce aggregation output that does not respect semantic > equality. > e.g. df.groupBy(strCol).count() > Although strCol is case insensitive, 'a' and 'A' won't be counted together in > streaming aggregation, while they should be. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
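To make the binary vs. semantic equality point concrete, here is a minimal plain-Python sketch (not Spark internals; lower() merely stands in for a real collation key such as an ICU sort key):

```python
from collections import Counter

rows = ["a", "A", "b"]

# Binary-equality grouping: keys compared byte-for-byte, which is how most
# state store providers compare UnsafeRow key bytes today.
binary_counts = Counter(rows)

# Grouping under a case-insensitive collation: 'a' and 'A' belong to the same
# group, which is what df.groupBy(strCol).count() should produce.
semantic_counts = Counter(s.lower() for s in rows)

print(binary_counts)    # 'a' and 'A' land in separate groups
print(semantic_counts)  # 'a' and 'A' are counted together
```

The mismatch between the two groupings is exactly the correctness issue the ticket describes for streaming aggregation state.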
[jira] [Updated] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47774: -- Fix Version/s: 3.5.2 > Remove redundant rules from `MimaExcludes` > -- > > Key: SPARK-47774 > URL: https://issues.apache.org/jira/browse/SPARK-47774 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47774) Remove redundant rules from `MimaExcludes`
[ https://issues.apache.org/jira/browse/SPARK-47774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47774. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45944 [https://github.com/apache/spark/pull/45944] > Remove redundant rules from `MimaExcludes` > -- > > Key: SPARK-47774 > URL: https://issues.apache.org/jira/browse/SPARK-47774 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47412) StringLPad, StringRPad (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835182#comment-17835182 ] Gideon P commented on SPARK-47412: -- [~uros-db] I agree -- this one can be expected to be simple and familiar. I will move on to this one (while still shepherding https://github.com/apache/spark/pull/45738/ to the finish line). Thanks! I will let you know if I have questions. > StringLPad, StringRPad (all collations) > --- > > Key: SPARK-47412 > URL: https://issues.apache.org/jira/browse/SPARK-47412 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *StringLPad* & *StringRPad* built-in string > functions in Spark. First confirm the expected behaviour of these > functions when given collated strings, then move on to the implementation > that would enable handling strings of all collation types. Implement the > corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringLPad* & *StringRPad* > functions so that they support all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
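For orientation, the padding operation itself is collation-agnostic (it depends only on lengths); collation matters when the padded results are compared. A rough plain-Python sketch of lpad/rpad behaviour, assumed to mirror SQL LPAD/RPAD semantics where inputs at or beyond the target length are truncated:

```python
def lpad(s: str, length: int, pad: str) -> str:
    # Inputs at or beyond the target length are truncated, as in SQL LPAD.
    if len(s) >= length:
        return s[:length]
    needed = length - len(s)
    return (pad * needed)[:needed] + s

def rpad(s: str, length: int, pad: str) -> str:
    if len(s) >= length:
        return s[:length]
    needed = length - len(s)
    return s + (pad * needed)[:needed]

# Under a case-insensitive collation, equality of padded results would be
# case-insensitive; lower() approximates such a collation key here.
def eq_case_insensitive(a: str, b: str) -> bool:
    return a.lower() == b.lower()
```

This suggests the collation work for these expressions is mostly about making comparisons of their results collation-aware, not about changing the padding logic.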
[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source
[ https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47777: --- Labels: pull-request-available (was: ) > Add spark connect test for python streaming data source > --- > > Key: SPARK-47777 > URL: https://issues.apache.org/jira/browse/SPARK-47777 > Project: Spark > Issue Type: Test > Components: PySpark, SS >Affects Versions: 3.5.1 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Make the python streaming data source pyspark tests also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47777) Add spark connect test for python streaming data source
Chaoqin Li created SPARK-47777: -- Summary: Add spark connect test for python streaming data source Key: SPARK-47777 URL: https://issues.apache.org/jira/browse/SPARK-47777 Project: Spark Issue Type: Test Components: PySpark, SS Affects Versions: 3.5.1 Reporter: Chaoqin Li Make the python streaming data source pyspark tests also run on spark connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47773: -- Affects Version/s: 4.0.0 (was: 3.5.1) > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. > The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. 
The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
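The interface hierarchy the SPIP describes can be sketched roughly as follows (illustrative Python only; the real interfaces would be Scala traits in Spark/Gluten, and a plain dict stands in for a Substrait relation):

```python
from abc import ABC, abstractmethod

class TransformSupport(ABC):
    """Operators convertible into a Substrait-style relation."""
    @abstractmethod
    def transform(self) -> dict: ...

class LeafTransformSupport(TransformSupport, ABC):
    """No child operators, e.g. a table scan."""

class UnaryTransformSupport(TransformSupport, ABC):
    def __init__(self, child: TransformSupport):
        self.child = child

class BinaryTransformSupport(TransformSupport, ABC):
    def __init__(self, left: TransformSupport, right: TransformSupport):
        self.left, self.right = left, right

class Scan(LeafTransformSupport):
    def __init__(self, table: str):
        self.table = table
    def transform(self) -> dict:
        return {"rel": "read", "table": self.table}

class Filter(UnaryTransformSupport):
    def __init__(self, child: TransformSupport, predicate: str):
        super().__init__(child)
        self.predicate = predicate
    def transform(self) -> dict:
        # Validation against a native backend would run on this plan; on
        # failure, execution falls back to Spark's own operator.
        return {"rel": "filter", "condition": self.predicate,
                "input": self.child.transform()}

plan = Filter(Scan("orders"), "payAmount > 100").transform()
```

The example names (Scan, Filter, the dict shape) are assumptions for illustration; only the TransformSupport variants come from the SPIP text.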
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835181#comment-17835181 ] Dongjoon Hyun commented on SPARK-47773: --- Since this is a new feature which cannot affect old releases, I updated the Affected Version to 4.0.0. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major
[jira] [Resolved] (SPARK-47772) Fix the doctest of mode function
[ https://issues.apache.org/jira/browse/SPARK-47772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47772. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45940 [https://github.com/apache/spark/pull/45940] > Fix the doctest of mode function > > > Key: SPARK-47772 > URL: https://issues.apache.org/jira/browse/SPARK-47772 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47776) State store operation cannot work properly with binary inequality collation
Jungtaek Lim created SPARK-47776: Summary: State store operation cannot work properly with binary inequality collation Key: SPARK-47776 URL: https://issues.apache.org/jira/browse/SPARK-47776 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Jungtaek Lim
[jira] [Commented] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835175#comment-17835175 ] Jungtaek Lim commented on SPARK-47718: -- I've lowered the priority to Major - this is neither a regression nor a correctness issue. > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47718) .sql() does not recognize watermark defined upstream
[ https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-47718: - Priority: Major (was: Blocker) > .sql() does not recognize watermark defined upstream > > > Key: SPARK-47718 > URL: https://issues.apache.org/jira/browse/SPARK-47718 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Chloe He >Priority: Major > Labels: pull-request-available
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835171#comment-17835171 ] Ke Jia commented on SPARK-47773: [~viirya] Great, I have added comment permissions in the SPIP Doc. I'm very much looking forward to your feedback. Thanks. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 3.5.1 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. > The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. 
The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
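The convert/validate/fallback flow the SPIP describes can be sketched in miniature. This is a hypothetical illustration only: none of these class or function names come from Spark or Gluten, and the "Substrait" plan here is just a nested dict standing in for a serialized relation. It shows the control flow the proposal outlines: convert an operator tree, validate it against the native backend, and fall back to Spark's own operator for any subtree that fails validation.

```python
# Hypothetical sketch of the validate-then-fallback flow from the SPIP.
# All names (SparkOperator, to_substrait, plan_with_fallback) are invented
# for illustration and are not Spark or Gluten APIs.

class SparkOperator:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def to_substrait(op):
    """Pretend conversion: a nested dict stands in for a Substrait relation."""
    return {"rel": op.name, "inputs": [to_substrait(c) for c in op.children]}

def backend_validates(rel, supported):
    """Pretend backend validation: every relation in the subtree must be supported."""
    return (rel["rel"] in supported
            and all(backend_validates(i, supported) for i in rel["inputs"]))

def plan_with_fallback(op, supported):
    """Run the subtree natively when it validates; otherwise keep the Spark
    operator and recurse, so only the unsupported part falls back."""
    if backend_validates(to_substrait(op), supported):
        return ("native", op.name)
    return ("spark", op.name,
            [plan_with_fallback(c, supported) for c in op.children])

plan = SparkOperator("Project", [SparkOperator("Filter", [SparkOperator("Scan")])])
print(plan_with_fallback(plan, {"Project", "Filter", "Scan"}))  # whole tree native
print(plan_with_fallback(plan, {"Filter", "Scan"}))  # Project falls back, Filter subtree stays native
```

The second call illustrates the partial-fallback behavior the proposal depends on: the unsupported Project operator stays on Spark (with a row/columnar transition at the boundary), while the validated Filter/Scan subtree still runs on the native backend.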
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835168#comment-17835168 ] L. C. Hsieh commented on SPARK-47773: - Can you open comment on the SPIP doc?
[jira] [Updated] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47736: --- Labels: pull-request-available (was: ) > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47736) Add support for AbstractArrayType
[ https://issues.apache.org/jira/browse/SPARK-47736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47736: -- Summary: Add support for AbstractArrayType (was: Add support for AbstractArrayType(StringTypeCollated)) > Add support for AbstractArrayType > - > > Key: SPARK-47736 > URL: https://issues.apache.org/jira/browse/SPARK-47736 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major >
[jira] [Updated] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47240: --- Labels: pull-request-available (was: ) > SPIP: Structured Logging Framework for Apache Spark > --- > > Key: SPARK-47240 > URL: https://issues.apache.org/jira/browse/SPARK-47240 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > > This proposal aims to enhance Apache Spark's logging system by implementing > structured logging. This transition will change the format of the default log > files from plain text to JSON, making them more accessible and analyzable. > The new logs will include crucial identifiers such as worker, executor, > query, job, stage, and task IDs, thereby making the logs more informative and > facilitating easier search and analysis. > h2. Current Logging Format > The current format of Spark logs is plain text, which can be challenging to > parse and analyze efficiently. An example of the current log format is as > follows: > {code:java} > 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor > 289 is alive or not. > org.apache.spark.SparkException: Exception thrown in awaitResult: > > Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: .. > {code} > h2. Proposed Structured Logging Format > The proposed change involves structuring the logs in JSON format, which > organizes the log information into easily identifiable fields. Here is how > the new structured log format would look: > {code:java} > { > "ts":"23/11/29 17:53:44", > "level":"ERROR", > "msg":"Fail to know the executor 289 is alive or not", > "context":{ > "executor_id":"289" > }, > "exception":{ > "class":"org.apache.spark.SparkException", > "msg":"Exception thrown in awaitResult", > "stackTrace":"..." 
> }, > "source":"BlockManagerMasterEndpoint" > } {code} > This format will enable users to upload and directly query > driver/executor/master/worker log files using Spark SQL for more effective > problem-solving and analysis, such as tracking executor losses or identifying > faulty tasks: > {code:java} > spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs") > /* To get all the executor lost logs */ > SELECT * FROM logs WHERE contains(msg, 'Lost executor'); > /* To get all the distributed logs about executor 289 */ > SELECT * FROM logs WHERE context.executor_id = 289; > /* To get all the errors on host 100.116.29.4 */ > SELECT * FROM logs WHERE host = "100.116.29.4" and level = "ERROR"; > {code} > > SPIP doc: > [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]
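The query pattern the proposal describes can be exercised without a cluster. The sketch below uses plain Python rather than Spark SQL; the field names ("level", "msg", "context") follow the example record in the proposal, while the sample log lines themselves are invented for illustration.

```python
# Minimal sketch of filtering structured JSON logs, mirroring the proposal's
# Spark SQL queries with plain Python. The two sample records are invented.
import json

log_lines = [
    '{"ts":"23/11/29 17:53:44","level":"ERROR","msg":"Lost executor 289",'
    '"context":{"executor_id":"289"},"source":"BlockManagerMasterEndpoint"}',
    '{"ts":"23/11/29 17:53:45","level":"INFO","msg":"Starting task 0.0",'
    '"context":{"executor_id":"17"},"source":"TaskSetManager"}',
]

records = [json.loads(line) for line in log_lines]

# Equivalent of: SELECT * FROM logs WHERE contains(msg, 'Lost executor')
lost = [r for r in records if "Lost executor" in r["msg"]]

# Equivalent of: SELECT * FROM logs WHERE context.executor_id = '289'
by_executor = [r for r in records if r["context"].get("executor_id") == "289"]

print(len(lost), len(by_executor))  # 1 1
```

The point of the structured format is exactly this: identifiers such as `executor_id` become queryable fields instead of substrings buried in free-form message text.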
[jira] [Updated] (SPARK-47765) Add SET COLLATION to parser rules
[ https://issues.apache.org/jira/browse/SPARK-47765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47765: --- Labels: pull-request-available (was: ) > Add SET COLLATION to parser rules > - > > Key: SPARK-47765 > URL: https://issues.apache.org/jira/browse/SPARK-47765 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >