[jira] [Assigned] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
[ https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34399:
-----------------------------------
    Assignee: angerszhu

> Add file commit time to metrics and shown in SQL Tab UI
> --------------------------------------------------------
>
> Key: SPARK-34399
> URL: https://issues.apache.org/jira/browse/SPARK-34399
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
>
> Add file commit time to metrics and shown in SQL Tab UI
[jira] [Resolved] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
[ https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34399.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33542
[https://github.com/apache/spark/pull/33542]

> Add file commit time to metrics and shown in SQL Tab UI
> --------------------------------------------------------
>
> Key: SPARK-34399
> URL: https://issues.apache.org/jira/browse/SPARK-34399
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Fix For: 3.2.0
>
> Add file commit time to metrics and shown in SQL Tab UI
[jira] [Assigned] (SPARK-36312) ParquetWritter should check inner field
[ https://issues.apache.org/jira/browse/SPARK-36312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-36312:
-----------------------------------
    Assignee: angerszhu

> ParquetWritter should check inner field
> ----------------------------------------
>
> Key: SPARK-36312
> URL: https://issues.apache.org/jira/browse/SPARK-36312
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
[jira] [Resolved] (SPARK-36312) ParquetWritter should check inner field
[ https://issues.apache.org/jira/browse/SPARK-36312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-36312.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33531
[https://github.com/apache/spark/pull/33531]

> ParquetWritter should check inner field
> ----------------------------------------
>
> Key: SPARK-36312
> URL: https://issues.apache.org/jira/browse/SPARK-36312
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Fix For: 3.2.0
[jira] [Resolved] (SPARK-35639) Add metrics about coalesced partitions to CustomShuffleReader in AQE
[ https://issues.apache.org/jira/browse/SPARK-35639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-35639.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 32776
[https://github.com/apache/spark/pull/32776]

> Add metrics about coalesced partitions to CustomShuffleReader in AQE
> ---------------------------------------------------------------------
>
> Key: SPARK-35639
> URL: https://issues.apache.org/jira/browse/SPARK-35639
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Major
> Fix For: 3.2.0
>
> {{CustomShuffleReaderExec}} reports "number of skewed partitions" and "number of skewed partition splits".
> It would be useful to also report "number of partitions to coalesce" and "number of coalesced partitions" and include this in the string rendering of the SparkPlan node so that it looks like this:
> {code:java}
> (12) CustomShuffleReader
> Input [2]: [a#23, b#24]
> Arguments: coalesced 3 partitions into 1 and split 2 skewed partitions into 4
> {code}
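[Editor's note] For context on where these metrics surface, the sketch below (not from the ticket; app name and grouping key are made up) is a minimal way to trigger AQE partition coalescing so the shuffle-read node, and with this change its coalescing metrics, shows up in the rendered plan:

{code:python}
# Hedged sketch: trigger AQE partition coalescing so the shuffle-read
# node and its metrics appear in the plan / SQL tab.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("aqe-coalesce-demo")  # made-up name
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .getOrCreate())

df = spark.range(0, 100000).withColumnRenamed("id", "a")
agg = df.groupBy((df.a % 10).alias("key")).count()

agg.collect()                  # run the job so AQE finalizes the plan
agg.explain(mode="formatted")  # the shuffle-read node is rendered here
{code}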
[jira] [Resolved] (SPARK-36275) ResolveAggregateFunctions should work with nested fields
[ https://issues.apache.org/jira/browse/SPARK-36275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-36275.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33498
[https://github.com/apache/spark/pull/33498]

> ResolveAggregateFunctions should work with nested fields
> ---------------------------------------------------------
>
> Key: SPARK-36275
> URL: https://issues.apache.org/jira/browse/SPARK-36275
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Fix For: 3.2.0
>
> A sort after Aggregate can fail to resolve if it contains nested fields. For example:
> {code:java}
> SELECT c.x, SUM(c.y)
> FROM VALUES NAMED_STRUCT('x', 'A', 'y', 1), NAMED_STRUCT('x', 'A', 'y', 2) AS t(c)
> GROUP BY c.x
> ORDER BY c.x
> {code}
> Error:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve 'c.x' given input columns: [sum(c.y), x]; line 5 pos 9;
> 'Sort ['c.x ASC NULLS FIRST], true
> +- Aggregate [c#0.x], [c#0.x AS x#2, sum(c#0.y) AS sum(c.y)#5L]
>    +- SubqueryAlias t
>       +- LocalRelation [c#0]
> {code}
[jira] [Assigned] (SPARK-36275) ResolveAggregateFunctions should work with nested fields
[ https://issues.apache.org/jira/browse/SPARK-36275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-36275:
-----------------------------------
    Assignee: Allison Wang

> ResolveAggregateFunctions should work with nested fields
> ---------------------------------------------------------
>
> Key: SPARK-36275
> URL: https://issues.apache.org/jira/browse/SPARK-36275
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
>
> A sort after Aggregate can fail to resolve if it contains nested fields. For example:
> {code:java}
> SELECT c.x, SUM(c.y)
> FROM VALUES NAMED_STRUCT('x', 'A', 'y', 1), NAMED_STRUCT('x', 'A', 'y', 2) AS t(c)
> GROUP BY c.x
> ORDER BY c.x
> {code}
> Error:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve 'c.x' given input columns: [sum(c.y), x]; line 5 pos 9;
> 'Sort ['c.x ASC NULLS FIRST], true
> +- Aggregate [c#0.x], [c#0.x AS x#2, sum(c#0.y) AS sum(c.y)#5L]
>    +- SubqueryAlias t
>       +- LocalRelation [c#0]
> {code}
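[Editor's note] Until a build with the fix is available, one workaround (my sketch, not from the ticket) is to alias the nested field in the select list and sort by the alias, so the Sort resolves against the Aggregate's output columns rather than re-resolving `c.x`:

{code:python}
# Hedged workaround sketch: ORDER BY the output alias, not the nested field.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    SELECT c.x AS x, SUM(c.y) AS sum_y
    FROM VALUES NAMED_STRUCT('x', 'A', 'y', 1),
                NAMED_STRUCT('x', 'A', 'y', 2) AS t(c)
    GROUP BY c.x
    ORDER BY x   -- the select-list alias, not c.x
""").show()
{code}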
[jira] [Updated] (SPARK-36028) Allow Project to host outer references in scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-36028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-36028:
--------------------------------
    Fix Version/s: (was: 3.3.0)
                   3.2.0

> Allow Project to host outer references in scalar subqueries
> ------------------------------------------------------------
>
> Key: SPARK-36028
> URL: https://issues.apache.org/jira/browse/SPARK-36028
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Fix For: 3.2.0
>
> Support Project to host outer references in subqueries, for example:
> {code:sql}
> SELECT (SELECT c1) FROM t
> {code}
> Currently, it will throw AnalysisException:
> {code}
> org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses
> {code}
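[Editor's note] The failing query from the ticket can be reproduced end to end as below; the view name `t` and column `c1` come from the ticket's example, while the data is made up. With the fix, the scalar subquery's outer reference resolves and the query returns `c1` unchanged:

{code:python}
# Repro sketch built around the ticket's example query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(3).toDF("c1").createOrReplaceTempView("t")

# Raised AnalysisException before the fix; works once Project may host
# the outer reference.
spark.sql("SELECT (SELECT c1) FROM t").show()
{code}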
[jira] [Assigned] (SPARK-36323) Support ANSI interval literals for TimeWindow
[ https://issues.apache.org/jira/browse/SPARK-36323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36323:
------------------------------------
    Assignee: Kousuke Saruta  (was: Apache Spark)

> Support ANSI interval literals for TimeWindow
> ----------------------------------------------
>
> Key: SPARK-36323
> URL: https://issues.apache.org/jira/browse/SPARK-36323
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Major
>
> Like watermarks, it would be great to support ANSI interval literals for TimeWindow.
[jira] [Assigned] (SPARK-36323) Support ANSI interval literals for TimeWindow
[ https://issues.apache.org/jira/browse/SPARK-36323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36323:
------------------------------------
    Assignee: Apache Spark  (was: Kousuke Saruta)

> Support ANSI interval literals for TimeWindow
> ----------------------------------------------
>
> Key: SPARK-36323
> URL: https://issues.apache.org/jira/browse/SPARK-36323
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Kousuke Saruta
> Assignee: Apache Spark
> Priority: Major
>
> Like watermarks, it would be great to support ANSI interval literals for TimeWindow.
[jira] [Resolved] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types
[ https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta resolved SPARK-36318.
------------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved in https://github.com/apache/spark/pull/33543.

> Update docs about mapping of ANSI interval types to Java/Scala/SQL types
> -------------------------------------------------------------------------
>
> Key: SPARK-36318
> URL: https://issues.apache.org/jira/browse/SPARK-36318
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Fix For: 3.2.0
>
> Update the tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html regarding the mapping of types to language API types.
[jira] [Commented] (SPARK-36323) Support ANSI interval literals for TimeWindow
[ https://issues.apache.org/jira/browse/SPARK-36323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388433#comment-17388433 ]

Apache Spark commented on SPARK-36323:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33551

> Support ANSI interval literals for TimeWindow
> ----------------------------------------------
>
> Key: SPARK-36323
> URL: https://issues.apache.org/jira/browse/SPARK-36323
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Major
>
> Like watermarks, it would be great to support ANSI interval literals for TimeWindow.
[jira] [Created] (SPARK-36323) Support ANSI interval literals for TimeWindow
Kousuke Saruta created SPARK-36323:
-----------------------------------

Summary: Support ANSI interval literals for TimeWindow
Key: SPARK-36323
URL: https://issues.apache.org/jira/browse/SPARK-36323
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta

Like watermarks, it would be great to support ANSI interval literals for TimeWindow.
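[Editor's note] A sketch of what this would enable, assuming the behavior the linked PR adds: an ANSI interval literal, rather than a duration string, as the window length. The data and column name are made up:

{code:python}
# Hedged sketch: ANSI interval literal as the TimeWindow duration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    SELECT window(ts, INTERVAL '10' MINUTE) AS w, count(*) AS cnt
    FROM VALUES (timestamp'2021-07-28 10:00:00'),
                (timestamp'2021-07-28 10:05:00') AS t(ts)
    GROUP BY window(ts, INTERVAL '10' MINUTE)
""").show(truncate=False)
{code}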
[jira] [Updated] (SPARK-36322) Client cannot authenticate via:[TOKEN, KERBEROS]
[ https://issues.apache.org/jira/browse/SPARK-36322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MengYao updated SPARK-36322:
----------------------------
    Description:

When I run the Spark Thrift Server with Spark on k8s, the --principal and --keytab Kerberos parameters are specified in the script that starts the driver. In fact, they work well, but there is a problem in the subsequent token distribution: the driver cannot send the token to the executor after the executor registers successfully, so the {color:red}client cannot authenticate via: [TOKEN, KERBEROS]{color}. The detailed stack information is as follows:

java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:692)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1722)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:655)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:742)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1533)
    at org.apache.hadoop.ipc.Client.call(Client.java:1456)
    at org.apache.hadoop.ipc.Client.call(Client.java:1417)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at
[jira] [Created] (SPARK-36322) Client cannot authenticate via:[TOKEN, KERBEROS]
MengYao created SPARK-36322:
----------------------------

Summary: Client cannot authenticate via:[TOKEN, KERBEROS]
Key: SPARK-36322
URL: https://issues.apache.org/jira/browse/SPARK-36322
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.6
Reporter: MengYao
Fix For: 2.4.9

When I run the Spark Thrift Server with Spark on k8s, the --principal and --keytab Kerberos parameters are specified in the script that starts the driver. In fact, they work well, but there is a problem in the subsequent token distribution: the driver cannot send the token to the executor after the executor registers successfully, so the {color:red}client cannot authenticate via: [TOKEN, KERBEROS]{color}. The detailed stack information is as follows:

java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:692)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1722)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:655)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:742)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1533)
    at org.apache.hadoop.ipc.Client.call(Client.java:1456)
    at org.apache.hadoop.ipc.Client.call(Client.java:1417)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at
[jira] [Assigned] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36321:
------------------------------------
    Assignee: (was: Apache Spark)

> Do not fail application in kubernetes if name is too long
> ----------------------------------------------------------
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: XiDuo You
> Priority: Major
>
> If we start an application with a long Spark app name against a k8s master, we will get the exception below.
> {code:java}
> java.lang.IllegalArgumentException: 'a-89fe2f7ae71c3570' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
>     at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>     at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>     at scala.Option.map(Option.scala:230)
>     at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>     at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>     at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>     at org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>     at org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
>     at org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name is Spark-internal behavior, and it should not fail the application.
[jira] [Assigned] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36321:
------------------------------------
    Assignee: Apache Spark

> Do not fail application in kubernetes if name is too long
> ----------------------------------------------------------
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: XiDuo You
> Assignee: Apache Spark
> Priority: Major
>
> If we start an application with a long Spark app name against a k8s master, we will get the exception below.
> {code:java}
> java.lang.IllegalArgumentException: 'a-89fe2f7ae71c3570' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
>     at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>     at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>     at scala.Option.map(Option.scala:230)
>     at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>     at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>     at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>     at org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>     at org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
>     at org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name is Spark-internal behavior, and it should not fail the application.
[jira] [Commented] (SPARK-36321) Do not fail application in kubernetes if name is too long
[ https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388420#comment-17388420 ]

Apache Spark commented on SPARK-36321:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/33550

> Do not fail application in kubernetes if name is too long
> ----------------------------------------------------------
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: XiDuo You
> Priority: Major
>
> If we start an application with a long Spark app name against a k8s master, we will get the exception below.
> {code:java}
> java.lang.IllegalArgumentException: 'a-89fe2f7ae71c3570' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
>     at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>     at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>     at scala.Option.map(Option.scala:230)
>     at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>     at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>     at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>     at org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>     at org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
>     at org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name is Spark-internal behavior, and it should not fail the application.
[jira] [Created] (SPARK-36321) Do not fail application in kubernetes if name is too long
XiDuo You created SPARK-36321:
------------------------------

Summary: Do not fail application in kubernetes if name is too long
Key: SPARK-36321
URL: https://issues.apache.org/jira/browse/SPARK-36321
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.3.0
Reporter: XiDuo You

If we start an application with a long Spark app name against a k8s master, we will get the exception below.

{code:java}
java.lang.IllegalArgumentException: 'a-89fe2f7ae71c3570' in spark.kubernetes.executor.podNamePrefix is invalid. must conform https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names and the value length <= 47
    at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
    at org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
    at org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
    at org.apache.spark.SparkConf.get(SparkConf.scala:261)
    at org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
    at org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
    at org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
{code}

Using the app name as the executor pod name is Spark-internal behavior, and it should not fail the application.
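[Editor's note] Until the fix lands, a workaround (my sketch, not from the ticket) is to set the pod name prefix explicitly so the long app name never reaches the DNS-label length check quoted above; the prefix value here is made up:

{code:python}
# Hedged workaround sketch: pin the executor pod name prefix so a long
# app name cannot trip the <= 47 character validation.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("a-very-long-application-name-that-would-otherwise-fail")
         .config("spark.kubernetes.executor.podNamePrefix", "myapp-exec")
         .getOrCreate())
{code}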
[jira] [Assigned] (SPARK-36314) Update Sessionization example to use native support of session window
[ https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh reassigned SPARK-36314:
-----------------------------------
    Assignee: Jungtaek Lim

> Update Sessionization example to use native support of session window
> ----------------------------------------------------------------------
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can be replaced with the native support for session windows. We can probably also provide another, more complicated sessionization example that requires flatMapGroupsWithState.
[jira] [Resolved] (SPARK-36314) Update Sessionization example to use native support of session window
[ https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh resolved SPARK-36314.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33548
[https://github.com/apache/spark/pull/33548]

> Update Sessionization example to use native support of session window
> ----------------------------------------------------------------------
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Fix For: 3.2.0
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can be replaced with the native support for session windows. We can probably also provide another, more complicated sessionization example that requires flatMapGroupsWithState.
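[Editor's note] For reference, this is roughly the native API the updated example moves to; a batch sketch with made-up data (the real example is a streaming query). Events whose timestamps fall within the gap duration of each other collapse into one session window:

{code:python}
# Hedged sketch of the native session window support (new in 3.2).
from pyspark.sql import SparkSession
from pyspark.sql.functions import session_window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("2021-07-28 10:00:00", "user1"), ("2021-07-28 10:03:00", "user1")],
    ["ts", "user"],
).selectExpr("cast(ts as timestamp) AS ts", "user")

# Events within a 5-minute gap of each other form one session.
df.groupBy(session_window(df.ts, "5 minutes"), df.user).count().show(truncate=False)
{code}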
[jira] [Assigned] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.
[ https://issues.apache.org/jira/browse/SPARK-36320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36320:
------------------------------------
    Assignee: Apache Spark

> Fix Series/Index.copy() to drop extra columns.
> -----------------------------------------------
>
> Key: SPARK-36320
> URL: https://issues.apache.org/jira/browse/SPARK-36320
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Takuya Ueshin
> Assignee: Apache Spark
> Priority: Major
>
> Currently {{Series}}/{{Index.copy()}} keeps the copy of the anchor DataFrame, which holds unnecessary columns.
> We can drop those columns in {{Series}}/{{Index.copy()}}.
[jira] [Assigned] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.
[ https://issues.apache.org/jira/browse/SPARK-36320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36320:
------------------------------------
    Assignee: (was: Apache Spark)

> Fix Series/Index.copy() to drop extra columns.
> -----------------------------------------------
>
> Key: SPARK-36320
> URL: https://issues.apache.org/jira/browse/SPARK-36320
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Takuya Ueshin
> Priority: Major
>
> Currently {{Series}}/{{Index.copy()}} keeps the copy of the anchor DataFrame, which holds unnecessary columns.
> We can drop those columns in {{Series}}/{{Index.copy()}}.
[jira] [Commented] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.
[ https://issues.apache.org/jira/browse/SPARK-36320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388392#comment-17388392 ]

Apache Spark commented on SPARK-36320:
--------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33549

> Fix Series/Index.copy() to drop extra columns.
> -----------------------------------------------
>
> Key: SPARK-36320
> URL: https://issues.apache.org/jira/browse/SPARK-36320
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Takuya Ueshin
> Priority: Major
>
> Currently {{Series}}/{{Index.copy()}} keeps the copy of the anchor DataFrame, which holds unnecessary columns.
> We can drop those columns in {{Series}}/{{Index.copy()}}.
[jira] [Created] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.
Takuya Ueshin created SPARK-36320:
----------------------------------

Summary: Fix Series/Index.copy() to drop extra columns.
Key: SPARK-36320
URL: https://issues.apache.org/jira/browse/SPARK-36320
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin

Currently {{Series}}/{{Index.copy()}} keeps the copy of the anchor DataFrame, which holds unnecessary columns.
We can drop those columns in {{Series}}/{{Index.copy()}}.
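[Editor's note] A minimal sketch of the code path the ticket describes, with made-up data: a Series copied off a multi-column DataFrame. The point of the fix is that the copy's backing frame no longer retains the unrelated column:

{code:python}
# Hedged sketch: Series.copy() on a column of a wider DataFrame.
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
s = psdf["a"].copy()   # before the fix, the anchor frame still held "b"
print(s)
{code}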
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388387#comment-17388387 ]

dgd_contributor commented on SPARK-36099:
-----------------------------------------

Sorry, I haven't been checking the comments recently. I've done the work for Spark core but didn't create a pull request because I've been waiting for approval on SPARK-36095. Again, truly sorry for your wasted time. [~Shockang]

> Group exception messages in core/util
> --------------------------------------
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Allison Wang
> Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename || Count ||
> | AccumulatorV2.scala | 4 |
> | ClosureCleaner.scala | 1 |
> | DependencyUtils.scala | 1 |
> | KeyLock.scala | 1 |
> | ListenerBus.scala | 1 |
> | NextIterator.scala | 1 |
> | SerializableBuffer.scala | 2 |
> | ThreadUtils.scala | 4 |
> | Utils.scala | 16 |
>
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename || Count ||
> | AppendOnlyMap.scala | 1 |
> | CompactBuffer.scala | 1 |
> | ImmutableBitSet.scala | 6 |
> | MedianHeap.scala | 1 |
> | OpenHashSet.scala | 2 |
>
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename || Count ||
> | ChunkedByteBuffer.scala | 1 |
>
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename || Count ||
> | DriverLogger.scala | 1 |
>
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename || Count ||
> | RandomSampler.scala | 1 |
[jira] [Assigned] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-36310:
------------------------------------
    Assignee: Xinrong Meng

> Fix hasnan() window function in IndexOpsMixin
> ----------------------------------------------
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
>
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank
> Failed example:
>     df.groupby("a").rank().sort_index()
> Exception raised:
>     ...
>     pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a window function inside an aggregate function" exception.
> We shall adjust that.
[jira] [Resolved] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-36310.
----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33547
[https://github.com/apache/spark/pull/33547]

> Fix hasnan() window function in IndexOpsMixin
> ----------------------------------------------
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Assignee: Xinrong Meng
> Priority: Major
> Fix For: 3.2.0
>
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank
> Failed example:
>     df.groupby("a").rank().sort_index()
> Exception raised:
>     ...
>     pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a window function inside an aggregate function" exception.
> We shall adjust that.
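[Editor's note] The failing doctest quoted in the description reproduces with any grouped frame; a self-contained version with made-up data:

{code:python}
# Repro sketch for the hasnans-in-window failure described above.
import pyspark.pandas as ps

df = ps.DataFrame({"a": [1, 1, 2], "b": [1, 2, 3]})
# Raised "It is not allowed to use a window function inside an aggregate
# function" before the fix.
print(df.groupby("a").rank().sort_index())
{code}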
[jira] [Updated] (SPARK-36094) Group SQL component error messages in Spark error class JSON file
[ https://issues.apache.org/jira/browse/SPARK-36094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-36094:
-------------------------------
    Description:

To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the [mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html] and introduced in [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]). In this file, the error messages should be labeled according to a consistent error class and with a SQLSTATE.

We will start with the SQL component first. As a starting point, we can build off the exception grouping done in SPARK-33539. In total, there are ~1000 error messages to group, split across three files (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this ticket, each of these files is split into chunks of ~20 errors for refactoring.

Here is an example PR that groups a few error messages in the QueryCompilationErrors class: [PR 33309|https://github.com/apache/spark/pull/33309].

[Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]:
- Error classes should be unique and sorted in alphabetical order.
- Error classes should be unified as much as possible to improve auditing. If error messages are similar, group them into a single error class and add parameters to the error message.
- SQLSTATE should match the ANSI/ISO standard, without introducing new classes or subclasses.
- The Throwable should extend [SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java]; see [SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75] as an example of how to mix SparkThrowable into a base Exception type.

We will improve error message quality as a follow-up.

    was:

To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the [mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html] and introduced in [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]). In this file, the error messages should be labeled according to a consistent error class and with a SQLSTATE.

We will start with the SQL component first. As a starting point, we can build off the exception grouping done in [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, there are ~1000 error messages to group, split across three files (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this ticket, each of these files is split into chunks of ~20 errors for refactoring.

Here is an example PR that groups a few error messages in the QueryCompilationErrors class: [PR 33309|https://github.com/apache/spark/pull/33309].

[Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]:
- Error classes should be de-duplicated as much as possible to improve auditing. If error messages are similar, group them into a single error class and add parameters to the error message.
- SQLSTATE should match the ANSI/ISO standard, without introducing new classes or subclasses.
- The Throwable should extend [SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java]; see [SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75] as an example of how to mix SparkThrowable into a base Exception type.

We will improve error message quality as a follow-up.

> Group SQL component error messages in Spark error class JSON file
> ------------------------------------------------------------------
>
> Key: SPARK-36094
> URL: https://issues.apache.org/jira/browse/SPARK-36094
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.2.0
> Reporter: Karen Feng
> Priority: Major
>
> To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the [mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html] and introduced in [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]). In this file, the error messages should be labeled according to a consistent error class and with a SQLSTATE.
> We will start with
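[Editor's note] For a picture of the target format, a hypothetical entry in the error-class JSON file might look like the sketch below. The class name and message are made up; only the shape (an error class keyed to a message template plus a standard SQLSTATE) follows the description above:

{code:json}
{
  "EXAMPLE_UNSUPPORTED_OPERATION": {
    "message": ["The operation %s is not supported."],
    "sqlState": "0A000"
  }
}
{code}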
[jira] [Resolved] (SPARK-35997) Implement comparison operators for CategoricalDtype in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-35997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng resolved SPARK-35997.
----------------------------------
    Resolution: Done

> Implement comparison operators for CategoricalDtype in pandas API on Spark
> ---------------------------------------------------------------------------
>
> Key: SPARK-35997
> URL: https://issues.apache.org/jira/browse/SPARK-35997
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Priority: Major
>
> In pandas API on Spark, "<, <=, >, >=" have not been implemented for CategoricalDtype.
> We ought to match pandas' behavior.
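[Editor's note] The pandas behavior being matched, with made-up data: ordered categoricals compare by category order, not by string value:

{code:python}
# Sketch of the pandas semantics that pandas-on-Spark adopts here.
import pandas as pd

s = pd.Series(["a", "b", "c"],
              dtype=pd.CategoricalDtype(["a", "b", "c"], ordered=True))
print(s < "c")   # True, True, False -- comparison follows category order
{code}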
[jira] [Commented] (SPARK-36314) Update Sessionization example to use native support of session window
[ https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388345#comment-17388345 ]

Apache Spark commented on SPARK-36314:
--------------------------------------

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/33548

> Update Sessionization example to use native support of session window
> ----------------------------------------------------------------------
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: Jungtaek Lim
> Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can be replaced with the native support for session windows. We can probably also provide another, more complicated sessionization example that requires flatMapGroupsWithState.
[jira] [Assigned] (SPARK-36314) Update Sessionization example to use native support of session window
[ https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36314:
------------------------------------
    Assignee: (was: Apache Spark)

> Update Sessionization example to use native support of session window
> ----------------------------------------------------------------------
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: Jungtaek Lim
> Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can be replaced with the native support for session windows. We can probably also provide another, more complicated sessionization example that requires flatMapGroupsWithState.
[jira] [Assigned] (SPARK-36314) Update Sessionization example to use native support of session window
[ https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36314:
------------------------------------
    Assignee: Apache Spark

> Update Sessionization example to use native support of session window
> ----------------------------------------------------------------------
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.2.0
> Reporter: Jungtaek Lim
> Assignee: Apache Spark
> Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can be replaced with the native support for session windows. We can probably also provide another, more complicated sessionization example that requires flatMapGroupsWithState.
[jira] [Assigned] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins
[ https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36190:
------------------------------------
    Assignee: (was: Apache Spark)

> Improve the rest of DataTypeOps tests by avoiding joins
> --------------------------------------------------------
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.
[jira] [Assigned] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36310:
------------------------------------
    Assignee: (was: Apache Spark)

> Fix hasnan() window function in IndexOpsMixin
> ----------------------------------------------
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Priority: Major
>
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank
> Failed example:
>     df.groupby("a").rank().sort_index()
> Exception raised:
>     ...
>     pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a window function inside an aggregate function" exception.
> We shall adjust that.
[jira] [Assigned] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36310:
------------------------------------
    Assignee: Apache Spark

> Fix hasnan() window function in IndexOpsMixin
> ----------------------------------------------
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Assignee: Apache Spark
> Priority: Major
>
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank
> Failed example:
>     df.groupby("a").rank().sort_index()
> Exception raised:
>     ...
>     pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a window function inside an aggregate function" exception.
> We shall adjust that.
[jira] [Commented] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388312#comment-17388312 ]

Apache Spark commented on SPARK-36310:
--------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33547

> Fix hasnan() window function in IndexOpsMixin
> ----------------------------------------------
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Priority: Major
>
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank
> Failed example:
>     df.groupby("a").rank().sort_index()
> Exception raised:
>     ...
>     pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a window function inside an aggregate function" exception.
> We shall adjust that.
[jira] [Commented] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins
[ https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388309#comment-17388309 ]

Apache Spark commented on SPARK-36190:
--------------------------------------

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33546

> Improve the rest of DataTypeOps tests by avoiding joins
> --------------------------------------------------------
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.
[jira] [Assigned] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins
[ https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36190:
------------------------------------
    Assignee: Apache Spark

> Improve the rest of DataTypeOps tests by avoiding joins
> --------------------------------------------------------
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Xinrong Meng
> Assignee: Apache Spark
> Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.
[jira] [Updated] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins
[ https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-36190: - Summary: Improve the rest of DataTypeOps tests by avoiding joins (was: Improve all DataTypeOps tests by avoiding joins) > Improve the rest of DataTypeOps tests by avoiding joins > --- > > Key: SPARK-36190 > URL: https://issues.apache.org/jira/browse/SPARK-36190 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > bool, string, numeric DataTypeOps tests have been improved by avoiding joins. > Improve the rest of DataTypeOps tests in the same way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36190) Improve all DataTypeOps tests by avoiding joins
[ https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-36190: - Summary: Improve all DataTypeOps tests by avoiding joins (was: Improve the rest of DataTypeOps tests by avoiding joins) > Improve all DataTypeOps tests by avoiding joins > --- > > Key: SPARK-36190 > URL: https://issues.apache.org/jira/browse/SPARK-36190 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > bool, string, numeric DataTypeOps tests have been improved by avoiding joins. > Improve the rest of DataTypeOps tests in the same way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-36310: - Description: {code:java} File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank Failed example: df.groupby("a").rank().sort_index() Exception raised: ... pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query. {code} As shown above, hasnans() used in "rank" causes "It is not allowed to use a window function inside an aggregate function" exception. We shall adjust that. was: {code:java} File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in pyspark.pandas.groupby.GroupBy.rank Failed example: df.groupby("a").rank().sort_index() Exception raised: ... pyspark.sql.utils.AnalysisException: It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query. {code} As shown above, hasnans() used in "rank" causes "It is not allowed to use a window function inside an aggregate function" exception. any() and all() have the same issue. We shall adjust that. > Fix hasnan() window function in IndexOpsMixin > - > > Key: SPARK-36310 > URL: https://issues.apache.org/jira/browse/SPARK-36310 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > > {code:java} > File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in > pyspark.pandas.groupby.GroupBy.rank > Failed example: > df.groupby("a").rank().sort_index() > Exception raised: > ... > pyspark.sql.utils.AnalysisException: It is not allowed to use a window > function inside an aggregate function. Please use the inner window function > in a sub-query. > {code} > As shown above, hasnans() used in "rank" causes "It is not allowed to use a > window function inside an aggregate function" exception. > We shall adjust that. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin
[ https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-36310: - Summary: Fix hasnan() window function in IndexOpsMixin (was: Fix hasnan(), any(), and all() window function in IndexOpsMixin) > Fix hasnan() window function in IndexOpsMixin > - > > Key: SPARK-36310 > URL: https://issues.apache.org/jira/browse/SPARK-36310 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > > {code:java} > File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in > pyspark.pandas.groupby.GroupBy.rank > Failed example: > df.groupby("a").rank().sort_index() > Exception raised: > ... > pyspark.sql.utils.AnalysisException: It is not allowed to use a window > function inside an aggregate function. Please use the inner window function > in a sub-query. > {code} > As shown above, hasnans() used in "rank" causes "It is not allowed to use a > window function inside an aggregate function" exception. > any() and all() have the same issue. > We shall adjust that. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36319) Have Observation return Map instead of Row
[ https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388269#comment-17388269 ] Apache Spark commented on SPARK-36319: -- User 'EnricoMi' has created a pull request for this issue: https://github.com/apache/spark/pull/33545 > Have Observation return Map instead of Row > -- > > Key: SPARK-36319 > URL: https://issues.apache.org/jira/browse/SPARK-36319 > Project: Spark > Issue Type: Improvement > Components: Java API, PySpark, SQL >Affects Versions: 3.3.0 >Reporter: Enrico Minack >Priority: Major > > As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) > could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply > because the metrics are (internal to {{Observation}}) retrieved from the > listener as rows. Since that is hidden from the user by the {{Observation}} > API, there is no need to return {{Row}}. > If there is some value in the original {{Row}}, both could be provided via > {{getAsRow}} and {{getAsMap}}. > The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it > should not be a blocker to remove the {{Row}} return type in 3.3.0 again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36319) Have Observation return Map instead of Row
[ https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36319: Assignee: (was: Apache Spark) > Have Observation return Map instead of Row > -- > > Key: SPARK-36319 > URL: https://issues.apache.org/jira/browse/SPARK-36319 > Project: Spark > Issue Type: Improvement > Components: Java API, PySpark, SQL >Affects Versions: 3.3.0 >Reporter: Enrico Minack >Priority: Major > > As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) > could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply > because the metrics are (internal to {{Observation}}) retrieved from the > listener as rows. Since that is hidden from the user by the {{Observation}} > API, there is no need to return {{Row}}. > If there is some value in the original {{Row}}, both could be provided via > {{getAsRow}} and {{getAsMap}}. > The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it > should not be a blocker to remove the {{Row}} return type in 3.3.0 again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36319) Have Observation return Map instead of Row
[ https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36319: Assignee: Apache Spark > Have Observation return Map instead of Row > -- > > Key: SPARK-36319 > URL: https://issues.apache.org/jira/browse/SPARK-36319 > Project: Spark > Issue Type: Improvement > Components: Java API, PySpark, SQL >Affects Versions: 3.3.0 >Reporter: Enrico Minack >Assignee: Apache Spark >Priority: Major > > As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) > could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply > because the metrics are (internal to {{Observation}}) retrieved from the > listener as rows. Since that is hidden from the user by the {{Observation}} > API, there is no need to return {{Row}}. > If there is some value in the original {{Row}}, both could be provided via > {{getAsRow}} and {{getAsMap}}. > The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it > should not be a blocker to remove the {{Row}} return type in 3.3.0 again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388265#comment-17388265 ] Apache Spark commented on SPARK-34927: -- User 'MyeongKim' has created a pull request for this issue: https://github.com/apache/spark/pull/33544 > Support TPCDSQueryBenchmark in Benchmarks > - > > Key: SPARK-34927 > URL: https://issues.apache.org/jira/browse/SPARK-34927 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should > make it supported. See also > https://github.com/apache/spark/pull/32015#issuecomment-89046 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34927: Assignee: (was: Apache Spark) > Support TPCDSQueryBenchmark in Benchmarks > - > > Key: SPARK-34927 > URL: https://issues.apache.org/jira/browse/SPARK-34927 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should > make it supported. See also > https://github.com/apache/spark/pull/32015#issuecomment-89046 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34927: Assignee: Apache Spark > Support TPCDSQueryBenchmark in Benchmarks > - > > Key: SPARK-34927 > URL: https://issues.apache.org/jira/browse/SPARK-34927 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should > make it supported. See also > https://github.com/apache/spark/pull/32015#issuecomment-89046 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36094) Group SQL component error messages in Spark error class JSON file
[ https://issues.apache.org/jira/browse/SPARK-36094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Feng updated SPARK-36094: --- Description: To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the [mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html] and introduced in [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]). In this file, the error messages should be labeled according to a consistent error class and with a SQLSTATE. We will start with the SQL component first. As a starting point, we can build off the exception grouping done in [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, there are ~1000 error messages to group split across three files (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this ticket, each of these files is split into chunks of ~20 errors for refactoring. Here is an example PR that groups a few error messages in the QueryCompilationErrors class: [PR 33309|https://github.com/apache/spark/pull/33309]. [Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]: - Error classes should be de-duplicated as much as possible to improve auditing. If error messages are similar, group them into a single error class and add parameters to the error message. - SQLSTATE should match the ANSI/ISO standard, without introducing new classes or subclasses. - The Throwable should extend [SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java]; see [SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75] as an example of how to mix SparkThrowable into a base Exception type. We will improve error message quality as a follow-up. was: To improve auditing, reduce duplication, and improve quality of error messages thrown from Spark, we should group them in a single JSON file (as discussed in the [mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html] and introduced in [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]). In this file, the error messages should be labeled according to a consistent error class and with a SQLSTATE. We will start with the SQL component first. As a starting point, we can build off the exception grouping done in [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, there are ~1000 error messages to group split across three files (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this ticket, each of these files is split into chunks of ~20 errors for refactoring. As a guideline, the error classes should be de-duplicated as much as possible to improve auditing. We will improve error message quality as a follow-up. Here is an example PR that groups a few error messages in the QueryCompilationErrors class: [PR 33309|https://github.com/apache/spark/pull/33309]. 
> Group SQL component error messages in Spark error class JSON file > - > > Key: SPARK-36094 > URL: https://issues.apache.org/jira/browse/SPARK-36094 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > To improve auditing, reduce duplication, and improve quality of error > messages thrown from Spark, we should group them in a single JSON file (as > discussed in the [mailing > list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html] > and introduced in > [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]). > In this file, the error messages should be labeled according to a consistent > error class and with a SQLSTATE. > We will start with the SQL component first. > As a starting point, we can build off the exception grouping done in > [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, > there are ~1000 error messages to group split across three files > (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In > this ticket, each of these files is split into chunks of ~20 errors for > refactoring. > Here is an example PR that groups a few error messages in the > QueryCompilationErrors class: [PR > 33309|https://github.com/apache/spark/pull/33309]. > [Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]: > - Error classes should be de-duplicated as much as possible to improve auditing. If error messages > are similar, group them into a single error class and add parameters to the error message. > - SQLSTATE should match the ANSI/ISO standard, without introducing new classes or subclasses. > - The Throwable should extend [SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java]; see > [SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75] > as an example of how to mix SparkThrowable into a base Exception type. > We will improve error message quality as a follow-up. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
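The SparkThrowable mix-in pattern referenced in those guidelines can be sketched as follows, modelled on the linked SparkArithmeticException example. The class name below is hypothetical, and "DIVIDE_BY_ZERO" / "22012" stand in for an entry in the error-class JSON file rather than quoting actual entries.

{code:java}
// Hedged sketch of mixing SparkThrowable into a base Exception type.
// The class name is hypothetical; the error class and SQLSTATE are examples
// of the kind of entries the JSON file is meant to hold.
import org.apache.spark.SparkThrowable

class IllustrativeDivideByZeroException(context: String)
  extends ArithmeticException(s"Division by zero. $context")
  with SparkThrowable {

  // Stable identifier pointing at an entry in the error-class JSON file
  override def getErrorClass: String = "DIVIDE_BY_ZERO"

  // ANSI/ISO SQLSTATE; 22012 is the standard code for division by zero
  override def getSqlState: String = "22012"
}
{code}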
[jira] [Created] (SPARK-36319) Have Observation return Map instead of Row
Enrico Minack created SPARK-36319: - Summary: Have Observation return Map instead of Row Key: SPARK-36319 URL: https://issues.apache.org/jira/browse/SPARK-36319 Project: Spark Issue Type: Improvement Components: Java API, PySpark, SQL Affects Versions: 3.3.0 Reporter: Enrico Minack As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply because the metrics are (internal to {{Observation}}) retrieved from the listener as rows. Since that is hidden from the user by the {{Observation}} API, there is no need to return {{Row}}. If there is some value in the original {{Row}}, both could be provided via {{getAsRow}} and {{getAsMap}}. The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it should not be a blocker to remove the {{Row}} return type in 3.3.0 again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
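A rough sketch of the API under discussion, as it stands in the unreleased 3.3.0 snapshot the ticket describes; {{getAsMap}} is the ticket's proposal, not an existing method, and the metric names are illustrative.

{code:java}
// Sketch of Observation as described above (unreleased 3.3.0 snapshot):
// metrics are declared with observe(), collected by an action, then read back.
import org.apache.spark.sql.{Observation, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val observation = Observation("stats")
val df = spark.range(100).observe(
  observation,
  count(lit(1)).as("cnt"),
  sum($"id").as("total"))
df.collect()  // metrics become available only after an action runs

// Current behaviour per the ticket: a Row, accessed by position or schema.
val row = observation.get

// The proposal: a name-keyed Map instead (or alongside, as getAsRow/getAsMap),
// e.g. Map("cnt" -> 100L, "total" -> 4950L). getAsMap does not exist yet.
// val metrics: Map[String, Any] = observation.getAsMap
{code}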
[jira] [Assigned] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types
[ https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36318: Assignee: Max Gekk (was: Apache Spark) > Update docs about mapping of ANSI interval types to Java/Scala/SQL types > > > Key: SPARK-36318 > URL: https://issues.apache.org/jira/browse/SPARK-36318 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html > regarding the mapping of types to language API types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types
[ https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388236#comment-17388236 ] Apache Spark commented on SPARK-36318: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/33543 > Update docs about mapping of ANSI interval types to Java/Scala/SQL types > > > Key: SPARK-36318 > URL: https://issues.apache.org/jira/browse/SPARK-36318 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html > regarding the mapping of types to language API types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types
[ https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36318: Assignee: Apache Spark (was: Max Gekk) > Update docs about mapping of ANSI interval types to Java/Scala/SQL types > > > Key: SPARK-36318 > URL: https://issues.apache.org/jira/browse/SPARK-36318 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html > regarding the mapping of types to language API types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types
Max Gekk created SPARK-36318: Summary: Update docs about mapping of ANSI interval types to Java/Scala/SQL types Key: SPARK-36318 URL: https://issues.apache.org/jira/browse/SPARK-36318 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Max Gekk Assignee: Max Gekk Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html regarding the mapping of types to language API types. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
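For reference, a short sketch of the mapping those tables document; the behaviour shown is the assumed Spark 3.2 ANSI-interval support (YearMonthIntervalType surfacing as java.time.Period, DayTimeIntervalType as java.time.Duration), not text from the ticket.

{code:java}
// Sketch of the type mapping being documented (assumed Spark 3.2 behaviour).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// ANSI year-month interval -> java.time.Period
val ym = spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH").first().get(0)
// ym: java.time.Period = P1Y2M

// ANSI day-time interval -> java.time.Duration
val dt = spark.sql("SELECT INTERVAL '1 10:30:45' DAY TO SECOND").first().get(0)
// dt: java.time.Duration = PT34H30M45S
{code}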
[jira] [Resolved] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-36263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36263. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33484 [https://github.com/apache/spark/pull/33484] > Add Dataset.observe(Observation, Column, Column*) to PySpark > > > Key: SPARK-36263 > URL: https://issues.apache.org/jira/browse/SPARK-36263 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Major > Fix For: 3.3.0 > > > With SPARK-34806 we now have a way to use the `Dataset.observe` method > without the need to interact with > `org.apache.spark.sql.util.QueryExecutionListener`. This allows us to easily > retrieve observations in PySpark. > Adding a `Dataset.observe(Observation, Column, Column*)` equivalent to > PySpark's `DataFrame` is straightforward and allows observations to be > utilised from Python. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-36263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36263: --- Assignee: Enrico Minack > Add Dataset.observe(Observation, Column, Column*) to PySpark > > > Key: SPARK-36263 > URL: https://issues.apache.org/jira/browse/SPARK-36263 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Major > > With SPARK-34806 we now have a way to use the `Dataset.observe` method > without the need to interact with > `org.apache.spark.sql.util.QueryExecutionListener`. This allows us to easily > retrieve observations in PySpark. > Adding a `Dataset.observe(Observation, Column, Column*)` equivalent to > PySpark's `DataFrame` is straightforward and allows observations to be > utilised from Python. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36185) Implement functions in CategoricalAccessor/CategoricalIndex
[ https://issues.apache.org/jira/browse/SPARK-36185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36185. --- Fix Version/s: 3.2.0 Resolution: Done I'd close this since we've done the tasks we planned under this umbrella ticket. > Implement functions in CategoricalAccessor/CategoricalIndex > --- > > Key: SPARK-36185 > URL: https://issues.apache.org/jira/browse/SPARK-36185 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > Fix For: 3.2.0 > > > There are functions we haven't implemented in {{CategoricalAccessor}} and > {{CategoricalIndex}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36317) PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136
[ https://issues.apache.org/jira/browse/SPARK-36317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388204#comment-17388204 ] Chao Sun commented on SPARK-36317: -- [~vsowrirajan]: the change is already reverted - are you still seeing the test failures? > PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136 > - > > Key: SPARK-36317 > URL: https://issues.apache.org/jira/browse/SPARK-36317 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > > After the fix to [SPARK-36136][SQL][TESTS] Refactor > PruneFileSourcePartitionsSuite etc. to a different package, a couple of tests in > PruneFileSourcePartitionsSuite are failing now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method
[ https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36242: -- Affects Version/s: 3.2.0 3.0.3 3.1.2 > Ensure spill file closed before set success to true in > ExternalSorter.spillMemoryIteratorToDisk method > -- > > Key: SPARK-36242 > URL: https://issues.apache.org/jira/browse/SPARK-36242 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0 > > > The processes of ExternalSorter.spillMemoryIteratorToDisk and > ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are > some differences in setting `success = true` > > Code of ExternalSorter.spillMemoryIteratorToDisk as follows: > > {code:java} > if (objectsWritten > 0) { > flush() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (success) { > writer.close() > } else { > ... > } > }{code} > Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows: > {code:java} > if (objectsWritten > 0) { > flush() > writer.close() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (!success) { > ... > } > }{code} > It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` > method is more reasonable. We should make sure `success = true` is set only after > the spill file is closed. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method
[ https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36242: -- Fix Version/s: 3.0.4 > Ensure spill file closed before set success to true in > ExternalSorter.spillMemoryIteratorToDisk method > -- > > Key: SPARK-36242 > URL: https://issues.apache.org/jira/browse/SPARK-36242 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0 > > > The processes of ExternalSorter.spillMemoryIteratorToDisk and > ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are > some differences in setting `success = true` > > Code of ExternalSorter.spillMemoryIteratorToDisk as follows: > > {code:java} > if (objectsWritten > 0) { > flush() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (success) { > writer.close() > } else { > ... > } > }{code} > Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows: > {code:java} > if (objectsWritten > 0) { > flush() > writer.close() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (!success) { > ... > } > }{code} > It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` > method is more reasonable. We should make sure `success = true` is set only after > the spill file is closed. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method
[ https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36242: -- Issue Type: Bug (was: Improvement) > Ensure spill file closed before set success to true in > ExternalSorter.spillMemoryIteratorToDisk method > -- > > Key: SPARK-36242 > URL: https://issues.apache.org/jira/browse/SPARK-36242 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0 > > > The processes of ExternalSorter.spillMemoryIteratorToDisk and > ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are > some differences in setting `success = true` > > Code of ExternalSorter.spillMemoryIteratorToDisk as follows: > > {code:java} > if (objectsWritten > 0) { > flush() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (success) { > writer.close() > } else { > ... > } > }{code} > Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows: > {code:java} > if (objectsWritten > 0) { > flush() > writer.close() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (!success) { > ... > } > }{code} > It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` > method is more reasonable. We should make sure `success = true` is set only after > the spill file is closed. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method
[ https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36242: -- Fix Version/s: 3.1.3 > Ensure spill file closed before set success to true in > ExternalSorter.spillMemoryIteratorToDisk method > -- > > Key: SPARK-36242 > URL: https://issues.apache.org/jira/browse/SPARK-36242 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0, 3.1.3, 3.3.0 > > > The processes of ExternalSorter.spillMemoryIteratorToDisk and > ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are > some differences in setting `success = true` > > Code of ExternalSorter.spillMemoryIteratorToDisk as follows: > > {code:java} > if (objectsWritten > 0) { > flush() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (success) { > writer.close() > } else { > ... > } > }{code} > Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows: > {code:java} > if (objectsWritten > 0) { > flush() > writer.close() > } else { > writer.revertPartialWritesAndClose() > } > success = true > } finally { > if (!success) { > ... > } > }{code} > It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` > method is more reasonable. We should make sure `success = true` is set only after > the spill file is closed. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
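The ordering the ticket argues for reduces to a small resource-handling pattern: flag success only once the file is fully closed, so the cleanup branch in finally still fires if close() itself throws. A schematic sketch, not the actual ExternalSorter code; the names and signature are illustrative.

{code:java}
// Schematic sketch of the safer ordering; names and signature are illustrative.
def writeSpill(writer: java.io.Closeable, objectsWritten: Int, flush: () => Unit): Unit = {
  var success = false
  try {
    if (objectsWritten > 0) {
      flush()
      writer.close() // close before declaring success
    }
    success = true
  } finally {
    if (!success) {
      // An exception was thrown before success was set, possibly by close()
      // itself: revert/delete the partially written spill file here.
    }
  }
}
{code}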
[jira] [Commented] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
[ https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388195#comment-17388195 ] Apache Spark commented on SPARK-34399: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33542 > Add file commit time to metrics and shown in SQL Tab UI > --- > > Key: SPARK-34399 > URL: https://issues.apache.org/jira/browse/SPARK-34399 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Add file commit time to metrics and shown in SQL Tab UI -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36317) PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136
Venkata krishnan Sowrirajan created SPARK-36317: --- Summary: PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136 Key: SPARK-36317 URL: https://issues.apache.org/jira/browse/SPARK-36317 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.2.0 Reporter: Venkata krishnan Sowrirajan After the fix to [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package, couple of tests in PruneFileSourcePartitionsSuite are failing now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36316) NoClassDefFoundError for org.slf4j.impl.StaticLoggerBinder in org.apache.spark.Logging#isLog4j12 when using SLF4J/Logback 2.x
Ian Springer created SPARK-36316: Summary: NoClassDefFoundError for org.slf4j.impl.StaticLoggerBinder in org.apache.spark.Logging#isLog4j12 when using SLF4J/Logback 2.x Key: SPARK-36316 URL: https://issues.apache.org/jira/browse/SPARK-36316 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2 Reporter: Ian Springer When using SLF4J 2.x, I hit the following exception: java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder Caused by: java.lang.ClassNotFoundException: org.slf4j.impl.StaticLoggerBinder This is because org.slf4j.impl.StaticLoggerBinder no longer exists in SLF4J 2.x (see [http://www.slf4j.org/codes.html#StaticLoggerBinder]). Ideally, Spark should not have a hard dependency on SLF4J 1.x impl classes. Perhaps reflection or NoClassDefFoundError try-catch blocks could be used in the logger detection code, so both SLF4J 1.x and 2.x could be supported at runtime. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
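A minimal sketch of the workaround the report suggests, probing for the binder reflectively; this is illustrative only, not Spark's actual Logging code.

{code:java}
// Probe for the SLF4J 1.x binder reflectively so that no direct class
// reference can trigger NoClassDefFoundError under SLF4J 2.x.
import scala.util.Try

def isLog4j12: Boolean = Try {
  val cls = Class.forName("org.slf4j.impl.StaticLoggerBinder")
  val binder = cls.getMethod("getSingleton").invoke(null)
  val factory = cls.getMethod("getLoggerFactoryClassStr").invoke(binder).toString
  "org.slf4j.impl.Log4jLoggerFactory" == factory
}.getOrElse(false) // class absent under SLF4J 2.x => not log4j 1.2
{code}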
[jira] [Commented] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement
[ https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388171#comment-17388171 ] Apache Spark commented on SPARK-36315: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33541 > Only skip AQEShuffleReadRule in the final stage if it breaks the distribution > requirement > - > > Key: SPARK-36315 > URL: https://issues.apache.org/jira/browse/SPARK-36315 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement
[ https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36315: Assignee: Apache Spark > Only skip AQEShuffleReadRule in the final stage if it breaks the distribution > requirement > - > > Key: SPARK-36315 > URL: https://issues.apache.org/jira/browse/SPARK-36315 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement
[ https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36315: Assignee: (was: Apache Spark) > Only skip AQEShuffleReadRule in the final stage if it breaks the distribution > requirement > - > > Key: SPARK-36315 > URL: https://issues.apache.org/jira/browse/SPARK-36315 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement
Wenchen Fan created SPARK-36315: --- Summary: Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement Key: SPARK-36315 URL: https://issues.apache.org/jira/browse/SPARK-36315 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388156#comment-17388156 ] Ruslan Krivoshein edited comment on SPARK-36086 at 7/27/21, 4:03 PM: - Let me handle it, please was (Author: krivosheinruslan): Let me get on with it, please > The case of the delta table is inconsistent with parquet > > > Key: SPARK-36086 > URL: https://issues.apache.org/jira/browse/SPARK-36086 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {noformat} > 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars. > 2. bin/spark-shell --conf > spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf > spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog > {noformat} > {code:scala} > spark.sql("create table t1 using parquet as select id, id as lower_id from > range(5)") > spark.sql("CREATE VIEW v1 as SELECT * FROM t1") > spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("desc extended t2").show(false) > spark.sql("desc extended t3").show(false) > {code} > {noformat} > scala> spark.sql("desc extended t2").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |lower_id|bigint > | | > |id |bigint > | | > || > | | > |# Partitioning | > | | > |Part 0 |lower_id > | | > || > | | > |# Detailed Table Information| > | | > |Name|default.t2 > | | > |Location > |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2| > | > |Provider|delta > | | > |Table Properties > |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2] | > | > ++--+---+ > scala> spark.sql("desc extended t3").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |ID |bigint > |null | > |LOWER_ID|bigint > |null | > |# Partition Information | > | | > |# col_name |data_type > |comment| > |LOWER_ID|bigint > |null | > || > | | > |# Detailed Table Information| > | | > |Database|default > | | > |Table |t3 > | | > |Owner |yumwang > | | >
[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388156#comment-17388156 ] Ruslan Krivoshein commented on SPARK-36086: --- Let me get on with it, please > The case of the delta table is inconsistent with parquet > > > Key: SPARK-36086 > URL: https://issues.apache.org/jira/browse/SPARK-36086 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {noformat} > 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars. > 2. bin/spark-shell --conf > spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf > spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog > {noformat} > {code:scala} > spark.sql("create table t1 using parquet as select id, id as lower_id from > range(5)") > spark.sql("CREATE VIEW v1 as SELECT * FROM t1") > spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("desc extended t2").show(false) > spark.sql("desc extended t3").show(false) > {code} > {noformat} > scala> spark.sql("desc extended t2").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |lower_id|bigint > | | > |id |bigint > | | > || > | | > |# Partitioning | > | | > |Part 0 |lower_id > | | > || > | | > |# Detailed Table Information| > | | > |Name|default.t2 > | | > |Location > |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2| > | > |Provider|delta > | | > |Table Properties > |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2] | > | > ++--+---+ > scala> spark.sql("desc extended t3").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |ID |bigint > |null | > |LOWER_ID|bigint > |null | > |# Partition Information | > | | > |# col_name |data_type > |comment| > |LOWER_ID|bigint > |null | > || > | | > |# Detailed Table Information| > | | > |Database|default > | | > |Table |t3 > | | > |Owner |yumwang > | | > |Created Time|Mon Jul 12 14:07:16 CST 2021 > |
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388107#comment-17388107 ] Shockang commented on SPARK-36099: -- It's good for you to wait a day. I was going to submit the PR tonight. Why is it so coincidental![~dc-heros] > Group exception messages in core/util > - > > Key: SPARK-36099 > URL: https://issues.apache.org/jira/browse/SPARK-36099 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/util' > || Filename || Count || > | AccumulatorV2.scala | 4 | > | ClosureCleaner.scala | 1 | > | DependencyUtils.scala| 1 | > | KeyLock.scala| 1 | > | ListenerBus.scala| 1 | > | NextIterator.scala | 1 | > | SerializableBuffer.scala | 2 | > | ThreadUtils.scala| 4 | > | Utils.scala | 16 | > 'core/src/main/scala/org/apache/spark/util/collection' > || Filename || Count || > | AppendOnlyMap.scala | 1 | > | CompactBuffer.scala | 1 | > | ImmutableBitSet.scala | 6 | > | MedianHeap.scala | 1 | > | OpenHashSet.scala | 2 | > 'core/src/main/scala/org/apache/spark/util/io' > || Filename|| Count || > | ChunkedByteBuffer.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/logging' > || Filename || Count || > | DriverLogger.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/random' > || Filename|| Count || > | RandomSampler.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388101#comment-17388101 ] Shockang commented on SPARK-36099: -- I’m sorry for my gaffe.[~dc-heros] > Group exception messages in core/util > - > > Key: SPARK-36099 > URL: https://issues.apache.org/jira/browse/SPARK-36099 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/util' > || Filename || Count || > | AccumulatorV2.scala | 4 | > | ClosureCleaner.scala | 1 | > | DependencyUtils.scala| 1 | > | KeyLock.scala| 1 | > | ListenerBus.scala| 1 | > | NextIterator.scala | 1 | > | SerializableBuffer.scala | 2 | > | ThreadUtils.scala| 4 | > | Utils.scala | 16 | > 'core/src/main/scala/org/apache/spark/util/collection' > || Filename || Count || > | AppendOnlyMap.scala | 1 | > | CompactBuffer.scala | 1 | > | ImmutableBitSet.scala | 6 | > | MedianHeap.scala | 1 | > | OpenHashSet.scala | 2 | > 'core/src/main/scala/org/apache/spark/util/io' > || Filename|| Count || > | ChunkedByteBuffer.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/logging' > || Filename || Count || > | DriverLogger.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/random' > || Filename|| Count || > | RandomSampler.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388095#comment-17388095 ] Shockang commented on SPARK-36099: -- I think you should ask my permission, or my time will be wasted.[~dc-heros] > Group exception messages in core/util > - > > Key: SPARK-36099 > URL: https://issues.apache.org/jira/browse/SPARK-36099 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/util' > || Filename || Count || > | AccumulatorV2.scala | 4 | > | ClosureCleaner.scala | 1 | > | DependencyUtils.scala| 1 | > | KeyLock.scala| 1 | > | ListenerBus.scala| 1 | > | NextIterator.scala | 1 | > | SerializableBuffer.scala | 2 | > | ThreadUtils.scala| 4 | > | Utils.scala | 16 | > 'core/src/main/scala/org/apache/spark/util/collection' > || Filename || Count || > | AppendOnlyMap.scala | 1 | > | CompactBuffer.scala | 1 | > | ImmutableBitSet.scala | 6 | > | MedianHeap.scala | 1 | > | OpenHashSet.scala | 2 | > 'core/src/main/scala/org/apache/spark/util/io' > || Filename|| Count || > | ChunkedByteBuffer.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/logging' > || Filename || Count || > | DriverLogger.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/random' > || Filename|| Count || > | RandomSampler.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36099) Group exception messages in core/util
[ https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388090#comment-17388090 ] Shockang commented on SPARK-36099: -- You should tell me in advance.My code has written more than 200 lines and is preparing to submit pr….[~dc-heros] > Group exception messages in core/util > - > > Key: SPARK-36099 > URL: https://issues.apache.org/jira/browse/SPARK-36099 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/util' > || Filename || Count || > | AccumulatorV2.scala | 4 | > | ClosureCleaner.scala | 1 | > | DependencyUtils.scala| 1 | > | KeyLock.scala| 1 | > | ListenerBus.scala| 1 | > | NextIterator.scala | 1 | > | SerializableBuffer.scala | 2 | > | ThreadUtils.scala| 4 | > | Utils.scala | 16 | > 'core/src/main/scala/org/apache/spark/util/collection' > || Filename || Count || > | AppendOnlyMap.scala | 1 | > | CompactBuffer.scala | 1 | > | ImmutableBitSet.scala | 6 | > | MedianHeap.scala | 1 | > | OpenHashSet.scala | 2 | > 'core/src/main/scala/org/apache/spark/util/io' > || Filename|| Count || > | ChunkedByteBuffer.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/logging' > || Filename || Count || > | DriverLogger.scala | 1 | > 'core/src/main/scala/org/apache/spark/util/random' > || Filename|| Count || > | RandomSampler.scala | 1 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36102) Group exception messages in core/deploy
[ https://issues.apache.org/jira/browse/SPARK-36102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36102: Assignee: Apache Spark > Group exception messages in core/deploy > --- > > Key: SPARK-36102 > URL: https://issues.apache.org/jira/browse/SPARK-36102 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > 'core/src/main/scala/org/apache/spark/deploy' > || Filename || Count || > | FaultToleranceTest.scala | 1 | > | PythonRunner.scala| 1 | > | RRunner.scala | 2 | > | SparkHadoopUtil.scala | 2 | > | SparkSubmit.scala | 7 | > | SparkSubmitArguments.scala| 3 | > | StandaloneResourceUtils.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/history' > || Filename || Count || > | ApplicationCache.scala | 2 | > | EventLogFileWriters.scala| 2 | > | FsHistoryProvider.scala | 5 | > | HistoryServer.scala | 2 | > | HistoryServerMemoryManager.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/master' > || Filename || Count || > | Master.scala | 2 | > 'core/src/main/scala/org/apache/spark/deploy/rest' > || Filename|| Count || > | RestSubmissionClient.scala | 11 | > | StandaloneRestServer.scala | 2 | > | SubmitRestProtocolMessage.scala | 5 | > | SubmitRestProtocolRequest.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/security' > || Filename || Count || > | HadoopFSDelegationTokenProvider.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/worker' > || Filename || Count || > | DriverRunner.scala | 2 | > | Worker.scala | 3 | > 'core/src/main/scala/org/apache/spark/deploy/worker/ui' > || Filename || Count || > | LogPage.scala | 2 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36102) Group exception messages in core/deploy
[ https://issues.apache.org/jira/browse/SPARK-36102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388089#comment-17388089 ] Apache Spark commented on SPARK-36102: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/33540 > Group exception messages in core/deploy > --- > > Key: SPARK-36102 > URL: https://issues.apache.org/jira/browse/SPARK-36102 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/deploy' > || Filename || Count || > | FaultToleranceTest.scala | 1 | > | PythonRunner.scala| 1 | > | RRunner.scala | 2 | > | SparkHadoopUtil.scala | 2 | > | SparkSubmit.scala | 7 | > | SparkSubmitArguments.scala| 3 | > | StandaloneResourceUtils.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/history' > || Filename || Count || > | ApplicationCache.scala | 2 | > | EventLogFileWriters.scala| 2 | > | FsHistoryProvider.scala | 5 | > | HistoryServer.scala | 2 | > | HistoryServerMemoryManager.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/master' > || Filename || Count || > | Master.scala | 2 | > 'core/src/main/scala/org/apache/spark/deploy/rest' > || Filename|| Count || > | RestSubmissionClient.scala | 11 | > | StandaloneRestServer.scala | 2 | > | SubmitRestProtocolMessage.scala | 5 | > | SubmitRestProtocolRequest.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/security' > || Filename || Count || > | HadoopFSDelegationTokenProvider.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/worker' > || Filename || Count || > | DriverRunner.scala | 2 | > | Worker.scala | 3 | > 'core/src/main/scala/org/apache/spark/deploy/worker/ui' > || Filename || Count || > | LogPage.scala | 2 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36102) Group exception messages in core/deploy
[ https://issues.apache.org/jira/browse/SPARK-36102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36102: Assignee: (was: Apache Spark) > Group exception messages in core/deploy > --- > > Key: SPARK-36102 > URL: https://issues.apache.org/jira/browse/SPARK-36102 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/deploy' > || Filename || Count || > | FaultToleranceTest.scala | 1 | > | PythonRunner.scala| 1 | > | RRunner.scala | 2 | > | SparkHadoopUtil.scala | 2 | > | SparkSubmit.scala | 7 | > | SparkSubmitArguments.scala| 3 | > | StandaloneResourceUtils.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/history' > || Filename || Count || > | ApplicationCache.scala | 2 | > | EventLogFileWriters.scala| 2 | > | FsHistoryProvider.scala | 5 | > | HistoryServer.scala | 2 | > | HistoryServerMemoryManager.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/master' > || Filename || Count || > | Master.scala | 2 | > 'core/src/main/scala/org/apache/spark/deploy/rest' > || Filename|| Count || > | RestSubmissionClient.scala | 11 | > | StandaloneRestServer.scala | 2 | > | SubmitRestProtocolMessage.scala | 5 | > | SubmitRestProtocolRequest.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/security' > || Filename || Count || > | HadoopFSDelegationTokenProvider.scala | 1 | > 'core/src/main/scala/org/apache/spark/deploy/worker' > || Filename || Count || > | DriverRunner.scala | 2 | > | Worker.scala | 3 | > 'core/src/main/scala/org/apache/spark/deploy/worker/ui' > || Filename || Count || > | LogPage.scala | 2 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
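Context for this family of subtasks: the goal is to pull ad-hoc {{throw new ...Exception(...)}} calls out of the listed files and route them through one centralized errors object, so the exception messages in {{core/deploy}} can be grouped, audited, and eventually mapped to error classes in one place. A minimal sketch of the pattern, assuming a hypothetical {{DeployErrors}} object (the method names below are illustrative, not the helpers any particular PR adds):
{code:scala}
package org.apache.spark.errors

// Hypothetical centralized errors object, in the style the subtask
// describes: each call site stops formatting its own message and instead
// calls a named constructor, so all deploy-related messages live together.
object DeployErrors {

  def invalidMasterUrlError(url: String): Throwable =
    new IllegalArgumentException(s"Invalid master URL: $url")

  def workDirCreationError(dir: String, cause: Throwable): Throwable =
    new java.io.IOException(s"Failed to create work directory $dir", cause)
}

// Call site, before:
//   throw new IllegalArgumentException(s"Invalid master URL: $url")
// Call site, after:
//   throw DeployErrors.invalidMasterUrlError(url)
{code}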
[jira] [Assigned] (SPARK-36101) Group exception messages in core/api
[ https://issues.apache.org/jira/browse/SPARK-36101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36101: Assignee: Apache Spark > Group exception messages in core/api > > > Key: SPARK-36101 > URL: https://issues.apache.org/jira/browse/SPARK-36101 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > 'core/src/main/scala/org/apache/spark/api/java' > || Filename|| Count || > | JavaUtils.scala | 1 | > 'core/src/main/scala/org/apache/spark/api/python' > || Filename|| Count || > | Py4JServer.scala| 3 | > | PythonHadoopUtil.scala | 1 | > | PythonRDD.scala | 3 | > | PythonRunner.scala | 4 | > | PythonWorkerFactory.scala | 4 | > | SerDeUtil.scala | 1 | > | WriteInputFormatTestDataGenerator.scala | 2 | > 'core/src/main/scala/org/apache/spark/api/r' > || Filename || Count || > | BaseRRunner.scala | 1 | > | JVMObjectTracker.scala | 1 | > | RBackendHandler.scala | 2 | > | RUtils.scala | 1 | > | SerDe.scala| 4 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36101) Group exception messages in core/api
[ https://issues.apache.org/jira/browse/SPARK-36101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388070#comment-17388070 ] Apache Spark commented on SPARK-36101: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/33536 > Group exception messages in core/api > > > Key: SPARK-36101 > URL: https://issues.apache.org/jira/browse/SPARK-36101 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/api/java' > || Filename|| Count || > | JavaUtils.scala | 1 | > 'core/src/main/scala/org/apache/spark/api/python' > || Filename|| Count || > | Py4JServer.scala| 3 | > | PythonHadoopUtil.scala | 1 | > | PythonRDD.scala | 3 | > | PythonRunner.scala | 4 | > | PythonWorkerFactory.scala | 4 | > | SerDeUtil.scala | 1 | > | WriteInputFormatTestDataGenerator.scala | 2 | > 'core/src/main/scala/org/apache/spark/api/r' > || Filename || Count || > | BaseRRunner.scala | 1 | > | JVMObjectTracker.scala | 1 | > | RBackendHandler.scala | 2 | > | RUtils.scala | 1 | > | SerDe.scala| 4 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36101) Group exception messages in core/api
[ https://issues.apache.org/jira/browse/SPARK-36101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36101: Assignee: (was: Apache Spark) > Group exception messages in core/api > > > Key: SPARK-36101 > URL: https://issues.apache.org/jira/browse/SPARK-36101 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > 'core/src/main/scala/org/apache/spark/api/java' > || Filename|| Count || > | JavaUtils.scala | 1 | > 'core/src/main/scala/org/apache/spark/api/python' > || Filename|| Count || > | Py4JServer.scala| 3 | > | PythonHadoopUtil.scala | 1 | > | PythonRDD.scala | 3 | > | PythonRunner.scala | 4 | > | PythonWorkerFactory.scala | 4 | > | SerDeUtil.scala | 1 | > | WriteInputFormatTestDataGenerator.scala | 2 | > 'core/src/main/scala/org/apache/spark/api/r' > || Filename || Count || > | BaseRRunner.scala | 1 | > | JVMObjectTracker.scala | 1 | > | RBackendHandler.scala | 2 | > | RUtils.scala | 1 | > | SerDe.scala| 4 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34249) Add documentation for ANSI implicit cast rules
[ https://issues.apache.org/jira/browse/SPARK-34249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34249. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33516 [https://github.com/apache/spark/pull/33516] > Add documentation for ANSI implicit cast rules > -- > > Key: SPARK-34249 > URL: https://issues.apache.org/jira/browse/SPARK-34249 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34249) Add documentation for ANSI implicit cast rules
[ https://issues.apache.org/jira/browse/SPARK-34249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34249: --- Assignee: Gengliang Wang > Add documentation for ANSI implicit cast rules > -- > > Key: SPARK-34249 > URL: https://issues.apache.org/jira/browse/SPARK-34249 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34619: --- Affects Version/s: 3.2.0 > Update the Spark SQL guide about day-time and year-month interval types > --- > > Key: SPARK-34619 > URL: https://issues.apache.org/jira/browse/SPARK-34619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Describe new types at > http://spark.apache.org/docs/latest/sql-ref-datatypes.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34619: --- Affects Version/s: (was: 3.3.0) > Update the Spark SQL guide about day-time and year-month interval types > --- > > Key: SPARK-34619 > URL: https://issues.apache.org/jira/browse/SPARK-34619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Describe new types at > http://spark.apache.org/docs/latest/sql-ref-datatypes.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
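For readers following this ticket, the guide update documents the ANSI interval types added in 3.2: {{YearMonthIntervalType}} and {{DayTimeIntervalType}}, which, unlike the legacy {{CalendarIntervalType}}, are comparable and surface externally as {{java.time.Period}} / {{java.time.Duration}}. A quick {{spark-shell}} sketch (the schema output in the comments is indicative, written from the documented behavior rather than pasted from a session):
{code:scala}
// Spark 3.2+: ANSI interval literals produce the new interval types.
val df = spark.sql(
  """SELECT
    |  INTERVAL '1-2' YEAR TO MONTH AS ym,       -- 1 year 2 months
    |  INTERVAL '3 04:05:06' DAY TO SECOND AS dt -- 3 days, 4h 5m 6s
    |""".stripMargin)

df.printSchema()
// root
//  |-- ym: interval year to month (nullable = false)
//  |-- dt: interval day to second (nullable = false)

// Externally the new types map to java.time values:
val row = df.head()
val ym: java.time.Period   = row.getAs[java.time.Period]("ym")
val dt: java.time.Duration = row.getAs[java.time.Duration]("dt")
{code}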
[jira] [Assigned] (SPARK-35398) Simplify the way to get classes from ClassBodyEvaluator in CodeGenerator.updateAndGetCompilationStats method
[ https://issues.apache.org/jira/browse/SPARK-35398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35398: Assignee: Apache Spark > Simplify the way to get classes from ClassBodyEvaluator in > CodeGenerator.updateAndGetCompilationStats method > > > Key: SPARK-35398 > URL: https://issues.apache.org/jira/browse/SPARK-35398 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Trivial > > SPARK-35253 upgraded Janino from 3.0.16 to 3.1.4. In this version, {{ClassBodyEvaluator}} > provides the {{getBytecodes}} method to get > the mapping from {{ClassFile.getThisClassName}} to {{ClassFile.toByteArray}} > directly, so we no longer need to retrieve this mapping via the reflection > API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35398) Simplify the way to get classes from ClassBodyEvaluator in CodeGenerator.updateAndGetCompilationStats method
[ https://issues.apache.org/jira/browse/SPARK-35398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35398: Assignee: (was: Apache Spark) > Simplify the way to get classes from ClassBodyEvaluator in > CodeGenerator.updateAndGetCompilationStats method > > > Key: SPARK-35398 > URL: https://issues.apache.org/jira/browse/SPARK-35398 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Trivial > > SPARK-35253 upgraded Janino from 3.0.16 to 3.1.4. In this version, {{ClassBodyEvaluator}} > provides the {{getBytecodes}} method to get > the mapping from {{ClassFile.getThisClassName}} to {{ClassFile.toByteArray}} > directly, so we no longer need to retrieve this mapping via the reflection > API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
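Sketched, the simplification looks like the following ({{getBytecodes}} is the 3.1.4 accessor the ticket names, assumed here to return a {{java.util.Map}} from class name to bytecode; the old reflective field walk is summarized in a comment rather than reproduced):
{code:scala}
import scala.collection.JavaConverters._
import org.codehaus.janino.ClassBodyEvaluator

// Before (Janino 3.0.16): the generated class files were only reachable by
// reflectively opening private fields of the compiler and its class loader.
// After (Janino 3.1.4): read them straight off the evaluator.
def compiledClassStats(evaluator: ClassBodyEvaluator): Map[String, Int] =
  evaluator.getBytecodes.asScala.toMap.map {
    case (className, bytecode) => className -> bytecode.length
  }
{code}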
[jira] [Assigned] (SPARK-35253) Upgrade Janino from 3.0.16 to 3.1.4
[ https://issues.apache.org/jira/browse/SPARK-35253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35253: Assignee: Apache Spark > Upgrade Janino from 3.0.16 to 3.1.4 > --- > > Key: SPARK-35253 > URL: https://issues.apache.org/jira/browse/SPARK-35253 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Per the [change log|http://janino-compiler.github.io/janino/changelog.html], > the Janino 3.0.x line has been deprecated, so we can use the 3.1.x line > instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35253) Upgrade Janino from 3.0.16 to 3.1.4
[ https://issues.apache.org/jira/browse/SPARK-35253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35253: Assignee: (was: Apache Spark) > Upgrade Janino from 3.0.16 to 3.1.4 > --- > > Key: SPARK-35253 > URL: https://issues.apache.org/jira/browse/SPARK-35253 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > > Per the [change log|http://janino-compiler.github.io/janino/changelog.html], > the Janino 3.0.x line has been deprecated, so we can use the 3.1.x line > instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36295) Refactor sixth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387979#comment-17387979 ] PengLei commented on SPARK-36295: - working on this > Refactor sixth set of 20 query execution errors to use error classes > > > Key: SPARK-36295 > URL: https://issues.apache.org/jira/browse/SPARK-36295 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file, so this PR only focuses on > the sixth set of 20. > {code:java} > noRecordsFromEmptyDataReaderError > fileNotFoundError > unsupportedSchemaColumnConvertError > cannotReadParquetFilesError > cannotCreateColumnarReaderError > invalidNamespaceNameError > unsupportedPartitionTransformError > missingDatabaseLocationError > cannotRemoveReservedPropertyError > namespaceNotEmptyError > writingJobFailedError > writingJobAbortedError > commitDeniedError > unsupportedTableWritesError > cannotCreateJDBCTableWithPartitionsError > unsupportedUserSpecifiedSchemaError > writeUnsupportedForBinaryFileDataSourceError > fileLengthExceedsMaxLengthError > unsupportedFieldNameError > cannotSpecifyBothJdbcTableNameAndQueryError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
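Background for this batch of subtasks: an "error class" gives each exception a stable identifier whose message template lives in a central registry (Spark keeps these in {{error-classes.json}}), so throw sites only supply parameters. A self-contained toy version of the pattern follows; the registry entries and method shapes here are invented for illustration and are not Spark's actual API:
{code:scala}
// Toy version of the error-class pattern: one registry of message
// templates keyed by a stable error class, plus a helper that turns
// (errorClass, parameters) into a formatted exception message.
object ErrorClasses {
  private val templates: Map[String, String] = Map(
    "FILE_NOT_FOUND" -> "File %s does not exist; it may have been deleted mid-query.",
    "WRITING_JOB_ABORTED" -> "Writing job aborted."
  )

  def getMessage(errorClass: String, params: Seq[String]): String =
    String.format(templates(errorClass), params: _*)
}

object QueryExecutionErrors {
  // Before: callers built message strings inline at every throw site.
  // After: one named method per error, resolving through the registry.
  def fileNotFoundError(path: String): Throwable =
    new java.io.FileNotFoundException(
      ErrorClasses.getMessage("FILE_NOT_FOUND", Seq(path)))
}
{code}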
[jira] [Commented] (SPARK-36294) Refactor fifth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387978#comment-17387978 ] PengLei commented on SPARK-36294: - working on this > Refactor fifth set of 20 query execution errors to use error classes > > > Key: SPARK-36294 > URL: https://issues.apache.org/jira/browse/SPARK-36294 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file, so this PR only focuses on > the fifth set of 20. > {code:java} > createStreamingSourceNotSpecifySchemaError > streamedOperatorUnsupportedByDataSourceError > multiplePathsSpecifiedError > failedToFindDataSourceError > removedClassInSpark2Error > incompatibleDataSourceRegisterError > unrecognizedFileFormatError > sparkUpgradeInReadingDatesError > sparkUpgradeInWritingDatesError > buildReaderUnsupportedForFileFormatError > jobAbortedError > taskFailedWhileWritingRowsError > readCurrentFileNotFoundError > unsupportedSaveModeError > cannotClearOutputDirectoryError > cannotClearPartitionDirectoryError > failedToCastValueToDataTypeForPartitionColumnError > endOfStreamError > fallbackV1RelationReportsInconsistentSchemaError > cannotDropNonemptyNamespaceError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-36291) Refactor second set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PengLei updated SPARK-36291: Comment: was deleted (was: working on this) > Refactor second set of 20 query execution errors to use error classes > - > > Key: SPARK-36291 > URL: https://issues.apache.org/jira/browse/SPARK-36291 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file, so this PR only focuses on > the second set of 20. > {code:java} > inputTypeUnsupportedError > invalidFractionOfSecondError > overflowInSumOfDecimalError > overflowInIntegralDivideError > mapSizeExceedArraySizeWhenZipMapError > copyNullFieldNotAllowedError > literalTypeUnsupportedError > noDefaultForDataTypeError > doGenCodeOfAliasShouldNotBeCalledError > orderedOperationUnsupportedByDataTypeError > regexGroupIndexLessThanZeroError > regexGroupIndexExceedGroupCountError > invalidUrlError > dataTypeOperationUnsupportedError > mergeUnsupportedByWindowFunctionError > dataTypeUnexpectedError > typeUnsupportedError > negativeValueUnexpectedError > addNewFunctionMismatchedWithFunctionError > cannotGenerateCodeForUncomparableTypeError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34619: --- Affects Version/s: (was: 3.2.0) 3.3.0 > Update the Spark SQL guide about day-time and year-month interval types > --- > > Key: SPARK-34619 > URL: https://issues.apache.org/jira/browse/SPARK-34619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Describe new types at > http://spark.apache.org/docs/latest/sql-ref-datatypes.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387969#comment-17387969 ] Apache Spark commented on SPARK-34619: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33539 > Update the Spark SQL guide about day-time and year-month interval types > --- > > Key: SPARK-34619 > URL: https://issues.apache.org/jira/browse/SPARK-34619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > Describe new types at > http://spark.apache.org/docs/latest/sql-ref-datatypes.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org