[jira] [Assigned] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34399:
---

Assignee: angerszhu

> Add file commit time to metrics and shown in SQL Tab UI
> ---
>
> Key: SPARK-34399
> URL: https://issues.apache.org/jira/browse/SPARK-34399
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Add file commit time to metrics and show it in the SQL Tab UI






[jira] [Resolved] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34399.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33542
[https://github.com/apache/spark/pull/33542]

> Add file commit time to metrics and shown in SQL Tab UI
> ---
>
> Key: SPARK-34399
> URL: https://issues.apache.org/jira/browse/SPARK-34399
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Add file commit time to metrics and show it in the SQL Tab UI






[jira] [Assigned] (SPARK-36312) ParquetWritter should check inner field

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36312:
---

Assignee: angerszhu

> ParquetWritter should check inner field
> ---
>
> Key: SPARK-36312
> URL: https://issues.apache.org/jira/browse/SPARK-36312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>







[jira] [Resolved] (SPARK-36312) ParquetWritter should check inner field

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36312.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33531
[https://github.com/apache/spark/pull/33531]

> ParquetWritter should check inner field
> ---
>
> Key: SPARK-36312
> URL: https://issues.apache.org/jira/browse/SPARK-36312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>







[jira] [Resolved] (SPARK-35639) Add metrics about coalesced partitions to CustomShuffleReader in AQE

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35639.
-
Resolution: Fixed

Issue resolved by pull request 32776
[https://github.com/apache/spark/pull/32776]

> Add metrics about coalesced partitions to CustomShuffleReader in AQE
> 
>
> Key: SPARK-35639
> URL: https://issues.apache.org/jira/browse/SPARK-35639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.2.0
>
>
> {{CustomShuffleReaderExec}} reports "number of skewed partitions" and "number 
> of skewed partition splits".
>  It would be useful to also report "number of partitions to coalesce" and 
> "number of coalesced partitions", and include these in the string rendering of 
> the SparkPlan node so that it looks like this:
> {code:java}
> (12) CustomShuffleReader
> Input [2]: [a#23, b#24]
> Arguments: coalesced 3 partitions into 1 and split 2 skewed partitions into 4
> {code}
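
For context, these metrics come from AQE's partition coalescing. A minimal sketch 
of the configuration that triggers it (both config keys are real AQE options; a 
live {{spark}} session is assumed):

{code:python}
# Enable adaptive query execution and its partition coalescing;
# CustomShuffleReader nodes then appear in plans with the proposed metrics.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
{code}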






[jira] [Resolved] (SPARK-36275) ResolveAggregateFunctions should work with nested fields

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36275.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33498
[https://github.com/apache/spark/pull/33498]

> ResolveAggregateFunctions should work with nested fields
> 
>
> Key: SPARK-36275
> URL: https://issues.apache.org/jira/browse/SPARK-36275
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> A sort after Aggregate can fail to resolve if it contains nested fields. For 
> example
> {code:java}
> SELECT c.x, SUM(c.y)
> FROM VALUES NAMED_STRUCT('x', 'A', 'y', 1), NAMED_STRUCT('x', 'A', 'y', 2) AS 
> t(c)
> GROUP BY c.x
> ORDER BY c.x
> {code}
> Error:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve 'c.x' given input 
> columns: [sum(c.y), x]; line 5 pos 9;
> 'Sort ['c.x ASC NULLS FIRST], true
> +- Aggregate [c#0.x], [c#0.x AS x#2, sum(c#0.y) AS sum(c.y)#5L]
>+- SubqueryAlias t
>   +- LocalRelation [c#0]
> {code}
>  
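
For versions without the fix, a hedged workaround sketch: alias the nested field 
inside a subquery and sort by the alias (assumes a live {{spark}} session; the 
query mirrors the failing example above):

{code:python}
# Workaround sketch: ORDER BY the top-level alias instead of the nested field.
spark.sql("""
    SELECT x, sum_y
    FROM (
        SELECT c.x AS x, SUM(c.y) AS sum_y
        FROM VALUES NAMED_STRUCT('x', 'A', 'y', 1), NAMED_STRUCT('x', 'A', 'y', 2) AS t(c)
        GROUP BY c.x
    ) s
    ORDER BY x
""").show()
{code}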






[jira] [Assigned] (SPARK-36275) ResolveAggregateFunctions should work with nested fields

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36275:
---

Assignee: Allison Wang

> ResolveAggregateFunctions should work with nested fields
> 
>
> Key: SPARK-36275
> URL: https://issues.apache.org/jira/browse/SPARK-36275
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> A sort after Aggregate can fail to resolve if it contains nested fields. For 
> example
> {code:java}
> SELECT c.x, SUM(c.y)
> FROM VALUES NAMED_STRUCT('x', 'A', 'y', 1), NAMED_STRUCT('x', 'A', 'y', 2) AS 
> t(c)
> GROUP BY c.x
> ORDER BY c.x
> {code}
> Error:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve 'c.x' given input 
> columns: [sum(c.y), x]; line 5 pos 9;
> 'Sort ['c.x ASC NULLS FIRST], true
> +- Aggregate [c#0.x], [c#0.x AS x#2, sum(c#0.y) AS sum(c.y)#5L]
>+- SubqueryAlias t
>   +- LocalRelation [c#0]
> {code}
>  






[jira] [Updated] (SPARK-36028) Allow Project to host outer references in scalar subqueries

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-36028:

Fix Version/s: (was: 3.3.0)
   3.2.0

> Allow Project to host outer references in scalar subqueries
> ---
>
> Key: SPARK-36028
> URL: https://issues.apache.org/jira/browse/SPARK-36028
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Allow Project to host outer references in scalar subqueries, for example:
> {code:sql}
> SELECT (SELECT c1) FROM t
> {code}
> Currently, it will throw AnalysisException:
> {code}
> org.apache.spark.sql.AnalysisException: Expressions referencing the outer 
> query are not supported outside of WHERE/HAVING clauses
> {code}
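
A minimal sketch of what this change is meant to allow (assumes a live {{spark}} 
session; the temp view is illustrative):

{code:python}
# With the fix, a scalar subquery that merely returns an outer column resolves;
# previously this raised the AnalysisException quoted above.
spark.sql("CREATE OR REPLACE TEMP VIEW t AS SELECT 1 AS c1")
spark.sql("SELECT (SELECT c1) FROM t").show()
{code}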






[jira] [Assigned] (SPARK-36323) Support ANSI interval literals for TimeWindow

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36323:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Support ANSI interval literals for TimeWindow
> -
>
> Key: SPARK-36323
> URL: https://issues.apache.org/jira/browse/SPARK-36323
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> As with watermarks, it would be great to support ANSI interval literals for TimeWindow.






[jira] [Assigned] (SPARK-36323) Support ANSI interval literals for TimeWindow

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36323:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Support ANSI interval literals for TimeWindow
> -
>
> Key: SPARK-36323
> URL: https://issues.apache.org/jira/browse/SPARK-36323
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> As with watermarks, it would be great to support ANSI interval literals for TimeWindow.






[jira] [Resolved] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types

2021-07-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-36318.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved in https://github.com/apache/spark/pull/33543.

> Update docs about mapping of ANSI interval types to Java/Scala/SQL types
> 
>
> Key: SPARK-36318
> URL: https://issues.apache.org/jira/browse/SPARK-36318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Update the tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html 
> regarding the mapping of types to language API types.






[jira] [Commented] (SPARK-36323) Support ANSI interval literals for TimeWindow

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388433#comment-17388433
 ] 

Apache Spark commented on SPARK-36323:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33551

> Support ANSI interval literals for TimeWindow
> -
>
> Key: SPARK-36323
> URL: https://issues.apache.org/jira/browse/SPARK-36323
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> As with watermarks, it would be great to support ANSI interval literals for TimeWindow.






[jira] [Created] (SPARK-36323) Support ANSI interval literals for TimeWindow

2021-07-27 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-36323:
--

 Summary: Support ANSI interval literals for TimeWindow
 Key: SPARK-36323
 URL: https://issues.apache.org/jira/browse/SPARK-36323
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


As with watermarks, it would be great to support ANSI interval literals for TimeWindow.
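
A sketch of the intended usage (assumes a live {{spark}} session; the inline 
table and column name are illustrative):

{code:python}
# With this change, an ANSI interval literal should be accepted as the
# tumbling-window duration, in addition to the existing string form.
spark.sql("""
    SELECT window(ts, INTERVAL '10' MINUTE) AS w, count(*) AS cnt
    FROM VALUES (TIMESTAMP '2021-07-27 10:01:00') AS events(ts)
    GROUP BY window(ts, INTERVAL '10' MINUTE)
""").show(truncate=False)
{code}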






[jira] [Updated] (SPARK-36322) Client cannot authenticate via:[TOKEN, KERBEROS]

2021-07-27 Thread MengYao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MengYao updated SPARK-36322:

Description: 
When I run the Spark Thrift Server with Spark on K8s, the Kerberos --principal 
and --keytab parameters are specified in the script that starts the driver. They 
work well at first, but the subsequent token distribution fails: the driver 
cannot send the token to an executor after the executor registers successfully, 
so the {color:red}client cannot authenticate via: [TOKEN, KERBEROS]{color}. The 
detailed stack trace is as follows:
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client 
cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:692)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1722)
at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:655)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:742)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1533)
at org.apache.hadoop.ipc.Client.call(Client.java:1456)
at org.apache.hadoop.ipc.Client.call(Client.java:1417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at 
org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at 
org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at 

[jira] [Created] (SPARK-36322) Client cannot authenticate via:[TOKEN, KERBEROS]

2021-07-27 Thread MengYao (Jira)
MengYao created SPARK-36322:
---

 Summary: Client cannot authenticate via:[TOKEN, KERBEROS]
 Key: SPARK-36322
 URL: https://issues.apache.org/jira/browse/SPARK-36322
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.6
Reporter: MengYao
 Fix For: 2.4.9


When I run the Spark Thrift Server with Spark on K8s, the Kerberos --principal 
and --keytab parameters are specified in the script that starts the driver. They 
work well at first, but the subsequent token distribution fails: the driver 
cannot send the token to an executor after the executor registers successfully, 
so the {color:red}client cannot authenticate via: [TOKEN, KERBEROS]{color}. The 
detailed stack trace is as follows:
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client 
cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:692)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1722)
at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:655)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:742)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1533)
at org.apache.hadoop.ipc.Client.call(Client.java:1456)
at org.apache.hadoop.ipc.Client.call(Client.java:1417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy20.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
at 
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at 
org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at 
org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at 
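
For reference, a sketch of how the Kerberos identity is typically supplied when 
building the session. This is a sketch only: {{spark.kerberos.principal}} and 
{{spark.kerberos.keytab}} are the Spark 3.x config keys (Spark 2.4 used 
{{spark.yarn.principal}} and {{spark.yarn.keytab}}), and every value below is a 
placeholder.

{code:python}
from pyspark.sql import SparkSession

# Placeholder master URL, principal, and keytab path.
spark = (SparkSession.builder
         .master("k8s://https://k8s-apiserver.example.com:6443")
         .config("spark.kerberos.principal", "user@EXAMPLE.COM")
         .config("spark.kerberos.keytab", "/etc/security/keytabs/user.keytab")
         .getOrCreate())
{code}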

[jira] [Assigned] (SPARK-36321) Do not fail application in kubernetes if name is too long

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36321:


Assignee: (was: Apache Spark)

> Do not fail application in kubernetes if name is too long
> -
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> If we have a long Spark app name and start with the k8s master, we will get the 
> following exception.
> {code:java}
> java.lang.IllegalArgumentException: 
> 'a-89fe2f7ae71c3570' in 
> spark.kubernetes.executor.podNamePrefix is invalid. must conform 
> https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
>  and the value length <= 47
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>   at 
> org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name prefix is internal Spark behavior, 
> so it should not cause the application to fail.






[jira] [Assigned] (SPARK-36321) Do not fail application in kubernetes if name is too long

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36321:


Assignee: Apache Spark

> Do not fail application in kubernetes if name is too long
> -
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> If we have a long Spark app name and start with the k8s master, we will get the 
> following exception.
> {code:java}
> java.lang.IllegalArgumentException: 
> 'a-89fe2f7ae71c3570' in 
> spark.kubernetes.executor.podNamePrefix is invalid. must conform 
> https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
>  and the value length <= 47
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>   at 
> org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name prefix is internal Spark behavior, 
> so it should not cause the application to fail.






[jira] [Commented] (SPARK-36321) Do not fail application in kubernetes if name is too long

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388420#comment-17388420
 ] 

Apache Spark commented on SPARK-36321:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/33550

> Do not fail application in kubernetes if name is too long
> -
>
> Key: SPARK-36321
> URL: https://issues.apache.org/jira/browse/SPARK-36321
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> If we have a long Spark app name and start with the k8s master, we will get the 
> following exception.
> {code:java}
> java.lang.IllegalArgumentException: 
> 'a-89fe2f7ae71c3570' in 
> spark.kubernetes.executor.podNamePrefix is invalid. must conform 
> https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
>  and the value length <= 47
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
>   at 
> org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
>   at 
> org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
>   at org.apache.spark.SparkConf.get(SparkConf.scala:261)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
>   at 
> org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
>   at 
> org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
> {code}
> Using the app name as the executor pod name prefix is internal Spark behavior, 
> so it should not cause the application to fail.






[jira] [Created] (SPARK-36321) Do not fail application in kubernetes if name is too long

2021-07-27 Thread XiDuo You (Jira)
XiDuo You created SPARK-36321:
-

 Summary: Do not fail application in kubernetes if name is too long
 Key: SPARK-36321
 URL: https://issues.apache.org/jira/browse/SPARK-36321
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: XiDuo You


If we have a long Spark app name and start with the k8s master, we will get the 
following exception.

{code:java}
java.lang.IllegalArgumentException: 
'a-89fe2f7ae71c3570' in 
spark.kubernetes.executor.podNamePrefix is invalid. must conform 
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names
 and the value length <= 47
at 
org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$checkValue$1(ConfigBuilder.scala:108)
at 
org.apache.spark.internal.config.TypedConfigBuilder.$anonfun$transform$1(ConfigBuilder.scala:101)
at scala.Option.map(Option.scala:230)
at 
org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:239)
at 
org.apache.spark.internal.config.OptionalConfigEntry.readFrom(ConfigEntry.scala:214)
at org.apache.spark.SparkConf.get(SparkConf.scala:261)
at 
org.apache.spark.deploy.k8s.KubernetesConf.get(KubernetesConf.scala:67)
at 
org.apache.spark.deploy.k8s.KubernetesExecutorConf.<init>(KubernetesConf.scala:147)
at 
org.apache.spark.deploy.k8s.KubernetesConf$.createExecutorConf(KubernetesConf.scala:231)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$2(ExecutorPodsAllocator.scala:367)
{code}

Using the app name as the executor pod name prefix is internal Spark behavior, 
so it should not cause the application to fail.
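
As a possible workaround sketch until this is fixed, the prefix can be pinned to 
a short, DNS-compliant value ({{spark.kubernetes.executor.podNamePrefix}} is a 
real config key; the app name and prefix value here are illustrative):

{code:python}
from pyspark import SparkConf

# Pin the executor pod name prefix so the derived pod names stay within the
# 47-character limit regardless of the app name's length.
conf = (SparkConf()
        .setAppName("a-very-long-application-name-that-would-otherwise-overflow")
        .set("spark.kubernetes.executor.podNamePrefix", "myapp-exec"))
{code}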






[jira] [Assigned] (SPARK-36314) Update Sessionization example to use native support of session window

2021-07-27 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-36314:
---

Assignee: Jungtaek Lim

> Update Sessionization example to use native support of session window
> -
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can 
> be replaced with the native support for session windows. We could also provide 
> another, more complicated sessionization example that still requires 
> flatMapGroupsWithState.
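
For reference, a sketch of the native API the examples would move to 
({{session_window}} is the function added for 3.2; the {{events}} DataFrame and 
column names are illustrative):

{code:python}
from pyspark.sql import functions as F

# Native session windows: group events into sessions separated by a 5-minute
# gap, instead of hand-rolling the session logic in flatMapGroupsWithState.
sessions = (events
            .groupBy(F.session_window(F.col("eventTime"), "5 minutes"),
                     F.col("userId"))
            .count())
{code}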






[jira] [Resolved] (SPARK-36314) Update Sessionization example to use native support of session window

2021-07-27 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-36314.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33548
[https://github.com/apache/spark/pull/33548]

> Update Sessionization example to use native support of session window
> -
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can 
> be replaced with the native support for session windows. We could also provide 
> another, more complicated sessionization example that still requires 
> flatMapGroupsWithState.






[jira] [Assigned] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36320:


Assignee: Apache Spark

> Fix Series/Index.copy() to drop extra columns.
> --
>
> Key: SPARK-36320
> URL: https://issues.apache.org/jira/browse/SPARK-36320
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Currently {{Series}}/{{Index.copy()}} keeps a copy of the anchor DataFrame, 
> which holds unnecessary columns.
> We can drop those columns when {{Series}}/{{Index.copy()}} is called.






[jira] [Assigned] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36320:


Assignee: (was: Apache Spark)

> Fix Series/Index.copy() to drop extra columns.
> --
>
> Key: SPARK-36320
> URL: https://issues.apache.org/jira/browse/SPARK-36320
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{Series}}/{{Index.copy()}} keeps a copy of the anchor DataFrame, 
> which holds unnecessary columns.
> We can drop those columns when {{Series}}/{{Index.copy()}} is called.






[jira] [Commented] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388392#comment-17388392
 ] 

Apache Spark commented on SPARK-36320:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33549

> Fix Series/Index.copy() to drop extra columns.
> --
>
> Key: SPARK-36320
> URL: https://issues.apache.org/jira/browse/SPARK-36320
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{Series}}/{{Index.copy()}} keeps a copy of the anchor DataFrame, 
> which holds unnecessary columns.
> We can drop those columns when {{Series}}/{{Index.copy()}} is called.






[jira] [Created] (SPARK-36320) Fix Series/Index.copy() to drop extra columns.

2021-07-27 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-36320:
-

 Summary: Fix Series/Index.copy() to drop extra columns.
 Key: SPARK-36320
 URL: https://issues.apache.org/jira/browse/SPARK-36320
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin


Currently {{Series}}/{{Index.copy()}} keeps a copy of the anchor DataFrame, 
which holds unnecessary columns.
We can drop those columns when {{Series}}/{{Index.copy()}} is called.
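
A minimal sketch of the intended behavior (pandas-on-Spark API; the data is 
illustrative, and the comment states the goal rather than verified internals):

{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
ser = psdf.a.copy()  # with the fix, the copy should keep only what column "a"
                     # needs instead of anchoring all of psdf's columns
{code}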






[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388387#comment-17388387
 ] 

dgd_contributor commented on SPARK-36099:
-

Sorry, I haven't been checking the comments recently. I've done the work for 
Spark core but didn't create a pull request because I've been waiting for 
approval in SPARK-36095.

Again, truly sorry for your wasted time. [~Shockang]

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |






[jira] [Assigned] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36310:


Assignee: Xinrong Meng

> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, the hasnans() check used inside "rank" raises the "It is not 
> allowed to use a window function inside an aggregate function" exception.
> We should adjust that.
>  
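
For reference, a minimal sketch of the property in isolation (pandas-on-Spark 
API; the comment reflects the issue description above, not verified internals):

{code:python}
import pyspark.pandas as ps

# hasnans itself works; the window expression behind it is what breaks
# when it is evaluated inside an aggregate such as groupby(...).rank().
ps.Series([1.0, 2.0, None]).hasnans  # -> True
{code}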






[jira] [Resolved] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36310.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33547
[https://github.com/apache/spark/pull/33547]

> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, the hasnans() check used inside "rank" raises the "It is not 
> allowed to use a window function inside an aggregate function" exception.
> We should adjust that.
>  






[jira] [Updated] (SPARK-36094) Group SQL component error messages in Spark error class JSON file

2021-07-27 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-36094:
---
Description: 
To improve auditing, reduce duplication, and improve quality of error messages 
thrown from Spark, we should group them in a single JSON file (as discussed in 
the [mailing 
list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
 and introduced in 
[SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
 In this file, the error messages should be labeled according to a consistent 
error class and with a SQLSTATE.

We will start with the SQL component first.
 As a starting point, we can build off the exception grouping done in 
SPARK-33539. In total, there are ~1000 error messages to group, split across 
three files (QueryCompilationErrors, QueryExecutionErrors, and 
QueryParsingErrors). In this ticket, each of these files is split into chunks 
of ~20 errors for refactoring.

Here is an example PR that groups a few error messages in the 
QueryCompilationErrors class: [PR 
33309|https://github.com/apache/spark/pull/33309].

[Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]:
 - Error classes should be unique and sorted in alphabetical order.
 - Error classes should be unified as much as possible to improve auditing. If 
error messages are similar, group them into a single error class and add 
parameters to the error message.
 - SQLSTATE should match the ANSI/ISO standard, without introducing new classes 
or subclasses.
 - The Throwable should extend 
[SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java];
 see 
[SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75]
 as an example of how to mix SparkThrowable into a base Exception type.

We will improve error message quality as a follow-up.

  was:
To improve auditing, reduce duplication, and improve quality of error messages 
thrown from Spark, we should group them in a single JSON file (as discussed in 
the [mailing 
list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
 and introduced in 
[SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
 In this file, the error messages should be labeled according to a consistent 
error class and with a SQLSTATE.

We will start with the SQL component first.
As a starting point, we can build off the exception grouping done in 
[SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, 
there are ~1000 error messages to group, split across three files 
(QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this 
ticket, each of these files is split into chunks of ~20 errors for refactoring.

Here is an example PR that groups a few error messages in the 
QueryCompilationErrors class: [PR 
33309|https://github.com/apache/spark/pull/33309].

[Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]:

- Error classes should be de-duplicated as much as possible to improve 
auditing. If error messages are similar, group them into a single error class 
and add parameters to the error message.
- SQLSTATE should match the ANSI/ISO standard, without introducing new classes 
or subclasses.
- The Throwable should extend 
[SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java];
 see 
[SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75]
 as an example of how to mix SparkThrowable into a base Exception type.

We will improve error message quality as a follow-up.
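
For orientation, a sketch of the shape of one entry in the error-class JSON 
file, written here as a Python dict; the error class, message, and SQLSTATE are 
illustrative rather than taken from the actual file:

{code:python}
# One error class maps to a message template (with parameters) and a SQLSTATE.
entry = {
    "DIVIDE_BY_ZERO": {
        "message": ["divide by zero"],
        "sqlState": "22012",
    }
}
{code}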


> Group SQL component error messages in Spark error class JSON file
> -
>
> Key: SPARK-36094
> URL: https://issues.apache.org/jira/browse/SPARK-36094
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> To improve auditing, reduce duplication, and improve quality of error 
> messages thrown from Spark, we should group them in a single JSON file (as 
> discussed in the [mailing 
> list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
>  and introduced in 
> [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
>  In this file, the error messages should be labeled according to a consistent 
> error class and with a SQLSTATE.
> We will start with 

[jira] [Resolved] (SPARK-35997) Implement comparison operators for CategoricalDtype in pandas API on Spark

2021-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-35997.
--
Resolution: Done

> Implement comparison operators for CategoricalDtype in pandas API on Spark
> --
>
> Key: SPARK-35997
> URL: https://issues.apache.org/jira/browse/SPARK-35997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> In pandas API on Spark, "<, <=, >, >=" have not been implemented for 
> CategoricalDtype.
> We ought to match pandas' behavior.
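
A sketch of the target behavior (pandas-on-Spark API; the data is illustrative, 
and the expected semantics are pandas', as the description says):

{code:python}
import pandas as pd
import pyspark.pandas as ps

# With this change, comparisons on an ordered categorical should follow the
# category order (c < b < a below), matching pandas.
psser = ps.Series(pd.Categorical(["a", "b", "c"],
                                 categories=["c", "b", "a"], ordered=True))
psser < "b"  # element-wise result per the category order
{code}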






[jira] [Commented] (SPARK-36314) Update Sessionization example to use native support of session window

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388345#comment-17388345
 ] 

Apache Spark commented on SPARK-36314:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/33548

> Update Sessionization example to use native support of session window
> -
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can 
> be replaced with the native support for session windows. We could also provide 
> another, more complicated sessionization example that still requires 
> flatMapGroupsWithState.






[jira] [Commented] (SPARK-36314) Update Sessionization example to use native support of session window

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388344#comment-17388344
 ] 

Apache Spark commented on SPARK-36314:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/33548

> Update Sessionization example to use native support of session window
> -
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can 
> be replaced with the native support for session windows. We could also provide 
> another, more complicated sessionization example that still requires 
> flatMapGroupsWithState.






[jira] [Assigned] (SPARK-36314) Update Sessionization example to use native support of session window

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36314:


Assignee: (was: Apache Spark)

> Update Sessionization example to use native support of session window
> -
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can 
> be replaced with the native support for session windows. We could also provide 
> another, more complicated sessionization example that still requires 
> flatMapGroupsWithState.






[jira] [Assigned] (SPARK-36314) Update Sessionization example to use native support of session window

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36314:


Assignee: Apache Spark

> Update Sessionization example to use native support of session window
> -
>
> Key: SPARK-36314
> URL: https://issues.apache.org/jira/browse/SPARK-36314
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the sessionization examples use flatMapGroupsWithState, which can 
> be replaced with the native support for session windows. We could also provide 
> another, more complicated sessionization example that still requires 
> flatMapGroupsWithState.






[jira] [Assigned] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36190:


Assignee: (was: Apache Spark)

> Improve the rest of DataTypeOps tests by avoiding joins
> ---
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> The bool, string, and numeric DataTypeOps tests have been improved by avoiding 
> joins. Improve the rest of the DataTypeOps tests in the same way.






[jira] [Assigned] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36310:


Assignee: (was: Apache Spark)

> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, the hasnans() check used inside "rank" raises the "It is not 
> allowed to use a window function inside an aggregate function" exception.
> We should adjust that.
>  






[jira] [Assigned] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36310:


Assignee: Apache Spark

> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, the hasnans() check used inside "rank" raises the "It is not 
> allowed to use a window function inside an aggregate function" exception.
> We should adjust that.
>  






[jira] [Commented] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388312#comment-17388312
 ] 

Apache Spark commented on SPARK-36310:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33547

> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a 
> window function inside an aggregate function" exception.
> We shall adjust that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388309#comment-17388309
 ] 

Apache Spark commented on SPARK-36190:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33546

> Improve the rest of DataTypeOps tests by avoiding joins
> ---
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388308#comment-17388308
 ] 

Apache Spark commented on SPARK-36190:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/33546

> Improve the rest of DataTypeOps tests by avoiding joins
> ---
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36190:


Assignee: Apache Spark

> Improve the rest of DataTypeOps tests by avoiding joins
> ---
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36190) Improve the rest of DataTypeOps tests by avoiding joins

2021-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36190:
-
Summary: Improve the rest of DataTypeOps tests by avoiding joins  (was: 
Improve all DataTypeOps tests by avoiding joins)

> Improve the rest of DataTypeOps tests by avoiding joins
> ---
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36190) Improve all DataTypeOps tests by avoiding joins

2021-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36190:
-
Summary: Improve all DataTypeOps tests by avoiding joins  (was: Improve the 
rest of DataTypeOps tests by avoiding joins)

> Improve all DataTypeOps tests by avoiding joins
> ---
>
> Key: SPARK-36190
> URL: https://issues.apache.org/jira/browse/SPARK-36190
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> bool, string, numeric DataTypeOps tests have been improved by avoiding joins.
> Improve the rest of DataTypeOps tests in the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36310:
-
Description: 
 
{code:java}
File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
pyspark.pandas.groupby.GroupBy.rank
Failed example:
df.groupby("a").rank().sort_index()
Exception raised:
...
pyspark.sql.utils.AnalysisException: It is not allowed to use a window function 
inside an aggregate function. Please use the inner window function in a 
sub-query.
{code}
As shown above, hasnans() used in "rank" causes an "It is not allowed to use a 
window function inside an aggregate function" exception.

We shall adjust that.

 

  was:
 
{code:java}
File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
pyspark.pandas.groupby.GroupBy.rank
Failed example:
df.groupby("a").rank().sort_index()
Exception raised:
...
pyspark.sql.utils.AnalysisException: It is not allowed to use a window function 
inside an aggregate function. Please use the inner window function in a 
sub-query.
{code}
As shown above, hasnans() used in "rank" causes an "It is not allowed to use a 
window function inside an aggregate function" exception.
any() and all() have the same issue.

We shall adjust that.

 


> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a 
> window function inside an aggregate function" exception.
> We shall adjust that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36310) Fix hasnan() window function in IndexOpsMixin

2021-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-36310:
-
Summary: Fix hasnan() window function in IndexOpsMixin  (was: Fix hasnan(), 
any(), and all() window function in IndexOpsMixin)

> Fix hasnan() window function in IndexOpsMixin
> -
>
> Key: SPARK-36310
> URL: https://issues.apache.org/jira/browse/SPARK-36310
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
>  
> {code:java}
> File "/__w/spark/spark/python/pyspark/pandas/groupby.py", line 1497, in 
> pyspark.pandas.groupby.GroupBy.rank
> Failed example:
> df.groupby("a").rank().sort_index()
> Exception raised:
> ...
> pyspark.sql.utils.AnalysisException: It is not allowed to use a window 
> function inside an aggregate function. Please use the inner window function 
> in a sub-query.
> {code}
> As shown above, hasnans() used in "rank" causes an "It is not allowed to use a 
> window function inside an aggregate function" exception.
> any() and all() have the same issue.
> We shall adjust that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36319) Have Observation return Map instead of Row

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388269#comment-17388269
 ] 

Apache Spark commented on SPARK-36319:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/33545

> Have Observation return Map instead of Row
> --
>
> Key: SPARK-36319
> URL: https://issues.apache.org/jira/browse/SPARK-36319
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Priority: Major
>
> As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) 
> could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply 
> because the metrics are (internal to {{Observation}}) retrieved from the 
> listener as rows. Since that is hidden from the user by the {{Observation}} 
> API, there is no need to return {{Row}}.
> If there is some value in the original {{Row}}, both could be provided via 
> {{getAsRow}} and {{getAsMap}}.
> The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it 
> should not be a blocker to remove the {{Row}} return type in 3.3.0 again.
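
For reference, a minimal Scala sketch of how the call site reads, assuming the {{Map}}-returning {{get}} this ticket proposes (metric names and values here are illustrative):

{code:scala}
import org.apache.spark.sql.Observation
import org.apache.spark.sql.functions._
import spark.implicits._

val observation = Observation("stats")
val observed = spark.range(100)
  .observe(observation, count(lit(1)).as("cnt"), sum($"id").as("total"))
observed.collect()  // an action must complete before the metrics are available

// Today observation.get is a Row, read via observation.get.getAs[Long]("cnt").
// Under this proposal it would be a Map, e.g.
// observation.get == Map("cnt" -> 100L, "total" -> 4950L)
{code}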



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36319) Have Observation return Map instead of Row

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36319:


Assignee: (was: Apache Spark)

> Have Observation return Map instead of Row
> --
>
> Key: SPARK-36319
> URL: https://issues.apache.org/jira/browse/SPARK-36319
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Priority: Major
>
> As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) 
> could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply 
> because the metrics are (internal to {{Observation}}) retrieved from the 
> listener as rows. Since that is hidden from the user by the {{Observation}} 
> API, there is no need to return {{Row}}.
> If there is some value in the original {{Row}}, both could be provided via 
> {{getAsRow}} and {{getAsMap}}.
> The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it 
> should not be a blocker to remove the {{Row}} return type in 3.3.0 again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36319) Have Observation return Map instead of Row

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36319:


Assignee: Apache Spark

> Have Observation return Map instead of Row
> --
>
> Key: SPARK-36319
> URL: https://issues.apache.org/jira/browse/SPARK-36319
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Assignee: Apache Spark
>Priority: Major
>
> As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) 
> could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply 
> because the metrics are (internal to {{Observation}}) retrieved from the 
> listener as rows. Since that is hidden from the user by the {{Observation}} 
> API, there is no need to return {{Row}}.
> If there is some value in the original {{Row}}, both could be provided via 
> {{getAsRow}} and {{getAsMap}}.
> The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it 
> should not be a blocker to remove the {{Row}} return type in 3.3.0 again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36319) Have Observation return Map instead of Row

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388268#comment-17388268
 ] 

Apache Spark commented on SPARK-36319:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/33545

> Have Observation return Map instead of Row
> --
>
> Key: SPARK-36319
> URL: https://issues.apache.org/jira/browse/SPARK-36319
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Priority: Major
>
> As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) 
> could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply 
> because the metrics are (internal to {{Observation}}) retrieved from the 
> listener as rows. Since that is hidden from the user by the {{Observation}} 
> API, there is no need to return {{Row}}.
> If there is some value in the original {{Row}}, both could be provided via 
> {{getAsRow}} and {{getAsMap}}.
> The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it 
> should not be a blocker to remove the {{Row}} return type in 3.3.0 again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388265#comment-17388265
 ] 

Apache Spark commented on SPARK-34927:
--

User 'MyeongKim' has created a pull request for this issue:
https://github.com/apache/spark/pull/33544

> Support TPCDSQueryBenchmark in Benchmarks
> -
>
> Key: SPARK-34927
> URL: https://issues.apache.org/jira/browse/SPARK-34927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should 
> add support for it. See also 
> https://github.com/apache/spark/pull/32015#issuecomment-89046



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34927:


Assignee: (was: Apache Spark)

> Support TPCDSQueryBenchmark in Benchmarks
> -
>
> Key: SPARK-34927
> URL: https://issues.apache.org/jira/browse/SPARK-34927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should 
> add support for it. See also 
> https://github.com/apache/spark/pull/32015#issuecomment-89046



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34927:


Assignee: Apache Spark

> Support TPCDSQueryBenchmark in Benchmarks
> -
>
> Key: SPARK-34927
> URL: https://issues.apache.org/jira/browse/SPARK-34927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should 
> add support for it. See also 
> https://github.com/apache/spark/pull/32015#issuecomment-89046



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34927) Support TPCDSQueryBenchmark in Benchmarks

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388266#comment-17388266
 ] 

Apache Spark commented on SPARK-34927:
--

User 'MyeongKim' has created a pull request for this issue:
https://github.com/apache/spark/pull/33544

> Support TPCDSQueryBenchmark in Benchmarks
> -
>
> Key: SPARK-34927
> URL: https://issues.apache.org/jira/browse/SPARK-34927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Benchmarks.scala currently does not support TPCDSQueryBenchmark. We should 
> add support for it. See also 
> https://github.com/apache/spark/pull/32015#issuecomment-89046



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36094) Group SQL component error messages in Spark error class JSON file

2021-07-27 Thread Karen Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Feng updated SPARK-36094:
---
Description: 
To improve auditing, reduce duplication, and improve quality of error messages 
thrown from Spark, we should group them in a single JSON file (as discussed in 
the [mailing 
list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
 and introduced in 
[SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
 In this file, the error messages should be labeled according to a consistent 
error class and with a SQLSTATE.

We will start with the SQL component first.
As a starting point, we can build off the exception grouping done in 
[SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, 
there are ~1000 error messages to group, split across three files 
(QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this 
ticket, each of these files is split into chunks of ~20 errors for refactoring.

Here is an example PR that groups a few error messages in the 
QueryCompilationErrors class: [PR 
33309|https://github.com/apache/spark/pull/33309].

[Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]:

- Error classes should be de-duplicated as much as possible to improve 
auditing. If error messages are similar, group them into a single error class 
and add parameters to the error message.
- SQLSTATE should match the ANSI/ISO standard, without introducing new classes 
or subclasses.
- The Throwable should extend 
[SparkThrowable|https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkThrowable.java];
 see 
[SparkArithmeticException|https://github.com/apache/spark/blob/f90eb6a5db0778fd18b0b544f93eac3103bbf03b/core/src/main/scala/org/apache/spark/SparkException.scala#L75]
 as an example of how to mix SparkThrowable into a base Exception type.

We will improve error message quality as a follow-up.

  was:
To improve auditing, reduce duplication, and improve quality of error messages 
thrown from Spark, we should group them in a single JSON file (as discussed in 
the [mailing 
list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
 and introduced in 
[SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
 In this file, the error messages should be labeled according to a consistent 
error class and with a SQLSTATE.

We will start with the SQL component first.
As a starting point, we can build off the exception grouping done in 
[SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, 
there are ~1000 error messages to group, split across three files 
(QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In this 
ticket, each of these files is split into chunks of ~20 errors for refactoring.

As a guideline, the error classes should be de-duplicated as much as possible 
to improve auditing.
We will improve error message quality as a follow-up.

Here is an example PR that groups a few error messages in the 
QueryCompilationErrors class: [PR 
33309|https://github.com/apache/spark/pull/33309].


> Group SQL component error messages in Spark error class JSON file
> -
>
> Key: SPARK-36094
> URL: https://issues.apache.org/jira/browse/SPARK-36094
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> To improve auditing, reduce duplication, and improve quality of error 
> messages thrown from Spark, we should group them in a single JSON file (as 
> discussed in the [mailing 
> list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
>  and introduced in 
> [SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
>  In this file, the error messages should be labeled according to a consistent 
> error class and with a SQLSTATE.
> We will start with the SQL component first.
> As a starting point, we can build off the exception grouping done in 
> [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total, 
> there are ~1000 error messages to group, split across three files 
> (QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). In 
> this ticket, each of these files is split into chunks of ~20 errors for 
> refactoring.
> Here is an example PR that groups a few error messages in the 
> QueryCompilationErrors class: [PR 
> 33309|https://github.com/apache/spark/pull/33309].
> [Guidelines|https://github.com/apache/spark/blob/master/core/src/main/resources/error/README.md]:
> - Error classes should be de-duplicated as much as possible to improve 
> auditing. If error messages are similar, group them into a single error class 
> and add parameters to the error message.
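
To make the target shape concrete, here is a self-contained Scala sketch of the pattern, with a trait and a Map standing in for {{SparkThrowable}} and the error class JSON file (all names below are illustrative, not the actual Spark definitions):

{code:scala}
// Stand-in for the SparkThrowable mixin.
trait ErrorInfo {
  def getErrorClass: String
  def getSqlState: String
}

// Stand-in for the error class JSON file: error class -> (message, SQLSTATE).
val errorClasses: Map[String, (String, String)] = Map(
  "DIVIDE_BY_ZERO" -> ("divide by zero", "22012")
)

// A base exception type with the mixin, modeled loosely on SparkArithmeticException.
class IllustrativeArithmeticException(errorClass: String)
  extends ArithmeticException(errorClasses(errorClass)._1)
  with ErrorInfo {
  override def getErrorClass: String = errorClass
  override def getSqlState: String = errorClasses(errorClass)._2
}

val e = new IllustrativeArithmeticException("DIVIDE_BY_ZERO")
assert(e.getErrorClass == "DIVIDE_BY_ZERO" && e.getSqlState == "22012")
{code}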

[jira] [Created] (SPARK-36319) Have Observation return Map instead of Row

2021-07-27 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-36319:
-

 Summary: Have Observation return Map instead of Row
 Key: SPARK-36319
 URL: https://issues.apache.org/jira/browse/SPARK-36319
 Project: Spark
  Issue Type: Improvement
  Components: Java API, PySpark, SQL
Affects Versions: 3.3.0
Reporter: Enrico Minack


As [~gurwls223] pointed out, the {{Observation}} API (Scala, Java, PySpark) 
could return a {{Map}} / {{Dict}}. It currently returns {{Row}} simply because 
the metrics are (internal to {{Observation}}) retrieved from the listener as 
rows. Since that is hidden from the user by the {{Observation}} API, there is 
no need to return {{Row}}.

If there is some value in the original {{Row}}, both could be provided via 
{{getAsRow}} and {{getAsMap}}.

The {{Observation}} API has been added to Spark in unreleased 3.3.0, so it 
should not be a blocker to remove the {{Row}} return type in 3.3.0 again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36318:


Assignee: Max Gekk  (was: Apache Spark)

> Update docs about mapping of ANSI interval types to Java/Scala/SQL types
> 
>
> Key: SPARK-36318
> URL: https://issues.apache.org/jira/browse/SPARK-36318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html 
> regarding the mapping of types to language API types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388236#comment-17388236
 ] 

Apache Spark commented on SPARK-36318:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/33543

> Update docs about mapping of ANSI interval types to Java/Scala/SQL types
> 
>
> Key: SPARK-36318
> URL: https://issues.apache.org/jira/browse/SPARK-36318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html 
> regarding the mapping of types to language API types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36318:


Assignee: Apache Spark  (was: Max Gekk)

> Update docs about mapping of ANSI interval types to Java/Scala/SQL types
> 
>
> Key: SPARK-36318
> URL: https://issues.apache.org/jira/browse/SPARK-36318
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html 
> regarding the mapping of types to language API types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36318) Update docs about mapping of ANSI interval types to Java/Scala/SQL types

2021-07-27 Thread Max Gekk (Jira)
Max Gekk created SPARK-36318:


 Summary: Update docs about mapping of ANSI interval types to 
Java/Scala/SQL types
 Key: SPARK-36318
 URL: https://issues.apache.org/jira/browse/SPARK-36318
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Max Gekk
Assignee: Max Gekk


Update tables in https://spark.apache.org/docs/latest/sql-ref-datatypes.html 
regarding the mapping of types to language API types.
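
As a concrete illustration of the mapping being documented, ANSI interval values surface as java.time types in the Scala API. A minimal sketch, assuming a spark-shell session on a build with ANSI interval support (the literal values are illustrative):

{code:scala}
val row = spark.sql(
  "SELECT INTERVAL '1-2' YEAR TO MONTH AS ym, " +
  "INTERVAL '1 2:03:04' DAY TO SECOND AS dt").head()

// YearMonthIntervalType maps to java.time.Period,
// DayTimeIntervalType maps to java.time.Duration.
val ym: java.time.Period = row.getAs[java.time.Period]("ym")
val dt: java.time.Duration = row.getAs[java.time.Duration]("dt")
{code}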



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36263.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33484
[https://github.com/apache/spark/pull/33484]

> Add Dataset.observe(Observation, Column, Column*) to PySpark
> 
>
> Key: SPARK-36263
> URL: https://issues.apache.org/jira/browse/SPARK-36263
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Assignee: Enrico Minack
>Priority: Major
> Fix For: 3.3.0
>
>
> With SPARK-34806 we now have a way to use the `Dataset.observe` method 
> without the need to interact with 
> `org.apache.spark.sql.util.QueryExecutionListener`. This allows us to easily 
> retrieve observations in PySpark.
> Adding a `Dataset.observe(Observation, Column, Column*)` equivalent to 
> PySpark's `DataFrame` is straightforward and makes observations easy to use 
> from Python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36263:
---

Assignee: Enrico Minack

> Add Dataset.observe(Observation, Column, Column*) to PySpark
> 
>
> Key: SPARK-36263
> URL: https://issues.apache.org/jira/browse/SPARK-36263
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Enrico Minack
>Assignee: Enrico Minack
>Priority: Major
>
> With SPARK-34806 we now have a way to use the `Dataset.observe` method 
> without the need to interact with 
> `org.apache.spark.sql.util.QueryExecutionListener`. This allows us to easily 
> retrieve observations in PySpark.
> Adding a `Dataset.observe(Observation, Column, Column*)` equivalent to 
> PySpark's `DataFrame` is straightforward and makes observations easy to use 
> from Python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36185) Implement functions in CategoricalAccessor/CategoricalIndex

2021-07-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36185.
---
Fix Version/s: 3.2.0
   Resolution: Done

I'd close this since we've done the tasks we planned under this umbrella ticket.

> Implement functions in CategoricalAccessor/CategoricalIndex
> ---
>
> Key: SPARK-36185
> URL: https://issues.apache.org/jira/browse/SPARK-36185
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>
> There are functions we haven't implemented in {{CategoricalAccessor}} and 
> {{CategoricalIndex}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36317) PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136

2021-07-27 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388204#comment-17388204
 ] 

Chao Sun commented on SPARK-36317:
--

[~vsowrirajan]: the change has already been reverted - are you still seeing 
the test failures?

> PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136
> -
>
> Key: SPARK-36317
> URL: https://issues.apache.org/jira/browse/SPARK-36317
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> After the fix for [SPARK-36136][SQL][TESTS], which refactored 
> PruneFileSourcePartitionsSuite etc. to a different package, a couple of tests 
> in PruneFileSourcePartitionsSuite are now failing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method

2021-07-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36242:
--
Affects Version/s: 3.2.0
   3.0.3
   3.1.2

> Ensure spill file closed before set success to true in 
> ExternalSorter.spillMemoryIteratorToDisk method
> --
>
> Key: SPARK-36242
> URL: https://issues.apache.org/jira/browse/SPARK-36242
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0
>
>
> The processes of ExternalSorter.spillMemoryIteratorToDisk and 
> ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are 
> some differences in setting `success = true`
>  
> Code of ExternalSorter.spillMemoryIteratorToDisk as follows:
>  
> {code:java}
>   if (objectsWritten > 0) {
> flush()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (success) {
> writer.close()
>   } else {
> ...
>   }
> }{code}
> Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows:
> {code:java}
>   if (objectsWritten > 0) {
> flush()
> writer.close()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (!success) {
> ...
>   }
> }{code}
> It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` 
> method is more reasonable. We should make sure `success = true` is set only 
> after the spill file is closed.
>  
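
A self-contained Scala sketch of the ordering argued for above (illustrative only, not Spark's actual code): the file is closed inside the try block, and the success flag is set only afterwards, so the cleanup path in the finally block never has to deal with a writer whose state is ambiguous.

{code:scala}
import java.io.{BufferedWriter, File, FileWriter, IOException}

def writeSpillFile(path: String, lines: Seq[String]): Unit = {
  var success = false
  val writer = new BufferedWriter(new FileWriter(path))
  try {
    lines.foreach { l => writer.write(l); writer.newLine() }
    writer.close()  // close the spill file first ...
    success = true  // ... and only then declare success
  } finally {
    if (!success) {
      // Only reached if an exception was thrown before success was set:
      // best-effort close, then discard the partial file.
      try writer.close() catch { case _: IOException => () }
      new File(path).delete()
    }
  }
}
{code}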



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method

2021-07-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36242:
--
Fix Version/s: 3.0.4

> Ensure spill file closed before set success to true in 
> ExternalSorter.spillMemoryIteratorToDisk method
> --
>
> Key: SPARK-36242
> URL: https://issues.apache.org/jira/browse/SPARK-36242
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0
>
>
> The processes of ExternalSorter.spillMemoryIteratorToDisk and 
> ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are 
> some differences in setting `success = true`
>  
> Code of ExternalSorter.spillMemoryIteratorToDisk as follows:
>  
> {code:java}
>   if (objectsWritten > 0) {
> flush()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (success) {
> writer.close()
>   } else {
> ...
>   }
> }{code}
> Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows:
> {code:java}
>   if (objectsWritten > 0) {
> flush()
> writer.close()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (!success) {
> ...
>   }
> }{code}
> It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` 
> method is more reasonable. We should make sure `success = true` is set only 
> after the spill file is closed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method

2021-07-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36242:
--
Issue Type: Bug  (was: Improvement)

> Ensure spill file closed before set success to true in 
> ExternalSorter.spillMemoryIteratorToDisk method
> --
>
> Key: SPARK-36242
> URL: https://issues.apache.org/jira/browse/SPARK-36242
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.0, 3.1.3, 3.0.4, 3.3.0
>
>
> The processes of ExternalSorter.spillMemoryIteratorToDisk and 
> ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are 
> some differences in setting `success = true`
>  
> Code of ExternalSorter.spillMemoryIteratorToDisk as follows:
>  
> {code:java}
>   if (objectsWritten > 0) {
> flush()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (success) {
> writer.close()
>   } else {
> ...
>   }
> }{code}
> Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows:
> {code:java}
>   if (objectsWritten > 0) {
> flush()
> writer.close()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (!success) {
> ...
>   }
> }{code}
> It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` 
> method is more reasonable. We should make sure `success = true` is set only 
> after the spill file is closed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36242) Ensure spill file closed before set success to true in ExternalSorter.spillMemoryIteratorToDisk method

2021-07-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36242:
--
Fix Version/s: 3.1.3

> Ensure spill file closed before set success to true in 
> ExternalSorter.spillMemoryIteratorToDisk method
> --
>
> Key: SPARK-36242
> URL: https://issues.apache.org/jira/browse/SPARK-36242
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.0, 3.1.3, 3.3.0
>
>
> The processes of ExternalSorter.spillMemoryIteratorToDisk and 
> ExternalAppendOnlyMap.spillMemoryIteratorToDisk are similar, but there are 
> some differences in setting `success = true`
>  
> Code of ExternalSorter.spillMemoryIteratorToDisk as follows:
>  
> {code:java}
>   if (objectsWritten > 0) {
> flush()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (success) {
> writer.close()
>   } else {
> ...
>   }
> }{code}
> Code of ExternalAppendOnlyMap.spillMemoryIteratorToDisk as follows:
> {code:java}
>   if (objectsWritten > 0) {
> flush()
> writer.close()
>   } else {
> writer.revertPartialWritesAndClose()
>   }
>   success = true
> } finally {
>   if (!success) {
> ...
>   }
> }{code}
> It seems that the processing of the `ExternalAppendOnlyMap.spillMemoryIteratorToDisk` 
> method is more reasonable. We should make sure `success = true` is set only 
> after the spill file is closed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388195#comment-17388195
 ] 

Apache Spark commented on SPARK-34399:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33542

> Add file commit time to metrics and shown in SQL Tab UI
> ---
>
> Key: SPARK-34399
> URL: https://issues.apache.org/jira/browse/SPARK-34399
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Add file commit time to metrics and shown in SQL Tab UI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36317) PruneFileSourcePartitionsSuite tests are failing after the fix to SPARK-36136

2021-07-27 Thread Venkata krishnan Sowrirajan (Jira)
Venkata krishnan Sowrirajan created SPARK-36317:
---

 Summary: PruneFileSourcePartitionsSuite tests are failing after 
the fix to SPARK-36136
 Key: SPARK-36317
 URL: https://issues.apache.org/jira/browse/SPARK-36317
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.2.0
Reporter: Venkata krishnan Sowrirajan


After the fix for [SPARK-36136][SQL][TESTS], which refactored 
PruneFileSourcePartitionsSuite etc. to a different package, a couple of tests 
in PruneFileSourcePartitionsSuite are now failing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36316) NoClassDefFoundError for org.slf4j.impl.StaticLoggerBinder in org.apache.spark.Logging#isLog4j12 when using SLF4J/Logback 2.x

2021-07-27 Thread Ian Springer (Jira)
Ian Springer created SPARK-36316:


 Summary: NoClassDefFoundError for 
org.slf4j.impl.StaticLoggerBinder in org.apache.spark.Logging#isLog4j12 when 
using SLF4J/Logback 2.x
 Key: SPARK-36316
 URL: https://issues.apache.org/jira/browse/SPARK-36316
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Ian Springer


When using SLF4J 2.x, I hit the following exception:

 
java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
Caused by: java.lang.ClassNotFoundException: org.slf4j.impl.StaticLoggerBinder
 

This is because org.slf4j.impl.StaticLoggerBinder no longer exists in SLF4J 2.x 
(see [http://www.slf4j.org/codes.html#StaticLoggerBinder]). Ideally, Spark 
should not have a hard dependency on SLF4J 1.x implementation classes.

 

Perhaps reflection or NoClassDefFoundError try-catch blocks could be used in 
the logger detection code, so both SLF4J 1.x and 2.x could be supported at 
runtime.
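
A hedged Scala sketch of what such reflective detection could look like (illustrative only, not Spark's actual code; it relies on the SLF4J 1.x binder API, looked up by name so that its absence under SLF4J 2.x is handled instead of fatal):

{code:scala}
// Returns true only when an SLF4J 1.x binding backed by log4j 1.2 is present.
// Under SLF4J 2.x, org.slf4j.impl.StaticLoggerBinder does not exist, so the
// lookup fails and we return false instead of crashing.
def isLog4j12: Boolean = {
  try {
    val binderClass = Class.forName("org.slf4j.impl.StaticLoggerBinder")
    val binder = binderClass.getMethod("getSingleton").invoke(null)
    val factory = binderClass.getMethod("getLoggerFactoryClassStr").invoke(binder)
    "org.slf4j.impl.Log4jLoggerFactory" == factory
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
    case _: ReflectiveOperationException => false
  }
}
{code}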

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388171#comment-17388171
 ] 

Apache Spark commented on SPARK-36315:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33541

> Only skip AQEShuffleReadRule in the final stage if it breaks the distribution 
> requirement
> -
>
> Key: SPARK-36315
> URL: https://issues.apache.org/jira/browse/SPARK-36315
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36315:


Assignee: Apache Spark

> Only skip AQEShuffleReadRule in the final stage if it breaks the distribution 
> requirement
> -
>
> Key: SPARK-36315
> URL: https://issues.apache.org/jira/browse/SPARK-36315
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36315:


Assignee: (was: Apache Spark)

> Only skip AQEShuffleReadRule in the final stage if it breaks the distribution 
> requirement
> -
>
> Key: SPARK-36315
> URL: https://issues.apache.org/jira/browse/SPARK-36315
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388169#comment-17388169
 ] 

Apache Spark commented on SPARK-36315:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33541

> Only skip AQEShuffleReadRule in the final stage if it breaks the distribution 
> requirement
> -
>
> Key: SPARK-36315
> URL: https://issues.apache.org/jira/browse/SPARK-36315
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36315) Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-27 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-36315:
---

 Summary: Only skip AQEShuffleReadRule in the final stage if it 
breaks the distribution requirement
 Key: SPARK-36315
 URL: https://issues.apache.org/jira/browse/SPARK-36315
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-07-27 Thread Ruslan Krivoshein (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388156#comment-17388156
 ] 

Ruslan Krivoshein edited comment on SPARK-36086 at 7/27/21, 4:03 PM:
-

Let me handle it, please


was (Author: krivosheinruslan):
Let me get on with it, please

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |lower_id|bigint  
>   |   |
> |id  |bigint  
>   |   |
> ||
>   |   |
> |# Partitioning  |
>   |   |
> |Part 0  |lower_id
>   |   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Name|default.t2  
>   |   |
> |Location
> |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|  
>  |
> |Provider|delta   
>   |   |
> |Table Properties
> |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]  |  
>  |
> ++--+---+
> scala> spark.sql("desc extended t3").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |ID  |bigint  
>   |null   |
> |LOWER_ID|bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |LOWER_ID|bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database|default 
>   |   |
> |Table   |t3  
>   |   |
> |Owner   |yumwang 
>   |   |
> 

[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-07-27 Thread Ruslan Krivoshein (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388156#comment-17388156
 ] 

Ruslan Krivoshein commented on SPARK-36086:
---

Let me get on with it, please

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |lower_id|bigint  
>   |   |
> |id  |bigint  
>   |   |
> ||
>   |   |
> |# Partitioning  |
>   |   |
> |Part 0  |lower_id
>   |   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Name|default.t2  
>   |   |
> |Location
> |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|  
>  |
> |Provider|delta   
>   |   |
> |Table Properties
> |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]  |  
>  |
> ++--+---+
> scala> spark.sql("desc extended t3").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |ID  |bigint  
>   |null   |
> |LOWER_ID|bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |LOWER_ID|bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database|default 
>   |   |
> |Table   |t3  
>   |   |
> |Owner   |yumwang 
>   |   |
> |Created Time|Mon Jul 12 14:07:16 CST 2021
>   |

[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388107#comment-17388107
 ] 

Shockang commented on SPARK-36099:
--

It would have been good if you had waited a day. I was going to submit the PR 
tonight. What a coincidence! [~dc-heros]

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388101#comment-17388101
 ] 

Shockang commented on SPARK-36099:
--

I’m sorry for my gaffe. [~dc-heros]

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388095#comment-17388095
 ] 

Shockang commented on SPARK-36099:
--

I think you should have asked for my permission; otherwise my time is wasted. 
[~dc-heros]

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36099) Group exception messages in core/util

2021-07-27 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388090#comment-17388090
 ] 

Shockang commented on SPARK-36099:
--

You should have told me in advance. I have already written more than 200 lines 
of code and was preparing to submit a PR… [~dc-heros]

> Group exception messages in core/util
> -
>
> Key: SPARK-36099
> URL: https://issues.apache.org/jira/browse/SPARK-36099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/util'
> || Filename ||   Count ||
> | AccumulatorV2.scala  |   4 |
> | ClosureCleaner.scala |   1 |
> | DependencyUtils.scala|   1 |
> | KeyLock.scala|   1 |
> | ListenerBus.scala|   1 |
> | NextIterator.scala   |   1 |
> | SerializableBuffer.scala |   2 |
> | ThreadUtils.scala|   4 |
> | Utils.scala  |  16 |
> 'core/src/main/scala/org/apache/spark/util/collection'
> || Filename  ||   Count ||
> | AppendOnlyMap.scala   |   1 |
> | CompactBuffer.scala   |   1 |
> | ImmutableBitSet.scala |   6 |
> | MedianHeap.scala  |   1 |
> | OpenHashSet.scala |   2 |
> 'core/src/main/scala/org/apache/spark/util/io'
> || Filename||   Count ||
> | ChunkedByteBuffer.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/logging'
> || Filename   ||   Count ||
> | DriverLogger.scala |   1 |
> 'core/src/main/scala/org/apache/spark/util/random'
> || Filename||   Count ||
> | RandomSampler.scala |   1 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36102) Group exception messages in core/deploy

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36102:


Assignee: Apache Spark

> Group exception messages in core/deploy
> ---
>
> Key: SPARK-36102
> URL: https://issues.apache.org/jira/browse/SPARK-36102
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/deploy'
> || Filename  ||   Count ||
> | FaultToleranceTest.scala  |   1 |
> | PythonRunner.scala|   1 |
> | RRunner.scala |   2 |
> | SparkHadoopUtil.scala |   2 |
> | SparkSubmit.scala |   7 |
> | SparkSubmitArguments.scala|   3 |
> | StandaloneResourceUtils.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/history'
> || Filename ||   Count ||
> | ApplicationCache.scala   |   2 |
> | EventLogFileWriters.scala|   2 |
> | FsHistoryProvider.scala  |   5 |
> | HistoryServer.scala  |   2 |
> | HistoryServerMemoryManager.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/master'
> || Filename ||   Count ||
> | Master.scala |   2 |
> 'core/src/main/scala/org/apache/spark/deploy/rest'
> || Filename||   Count ||
> | RestSubmissionClient.scala  |  11 |
> | StandaloneRestServer.scala  |   2 |
> | SubmitRestProtocolMessage.scala |   5 |
> | SubmitRestProtocolRequest.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/security'
> || Filename  ||   Count ||
> | HadoopFSDelegationTokenProvider.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/worker'
> || Filename   ||   Count ||
> | DriverRunner.scala |   2 |
> | Worker.scala   |   3 |
> 'core/src/main/scala/org/apache/spark/deploy/worker/ui'
> || Filename  ||   Count ||
> | LogPage.scala |   2 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36102) Group exception messages in core/deploy

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388089#comment-17388089
 ] 

Apache Spark commented on SPARK-36102:
--

User 'dgd-contributor' has created a pull request for this issue:
https://github.com/apache/spark/pull/33540

> Group exception messages in core/deploy
> ---
>
> Key: SPARK-36102
> URL: https://issues.apache.org/jira/browse/SPARK-36102
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/deploy'
> || Filename  ||   Count ||
> | FaultToleranceTest.scala  |   1 |
> | PythonRunner.scala|   1 |
> | RRunner.scala |   2 |
> | SparkHadoopUtil.scala |   2 |
> | SparkSubmit.scala |   7 |
> | SparkSubmitArguments.scala|   3 |
> | StandaloneResourceUtils.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/history'
> || Filename ||   Count ||
> | ApplicationCache.scala   |   2 |
> | EventLogFileWriters.scala|   2 |
> | FsHistoryProvider.scala  |   5 |
> | HistoryServer.scala  |   2 |
> | HistoryServerMemoryManager.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/master'
> || Filename ||   Count ||
> | Master.scala |   2 |
> 'core/src/main/scala/org/apache/spark/deploy/rest'
> || Filename||   Count ||
> | RestSubmissionClient.scala  |  11 |
> | StandaloneRestServer.scala  |   2 |
> | SubmitRestProtocolMessage.scala |   5 |
> | SubmitRestProtocolRequest.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/security'
> || Filename  ||   Count ||
> | HadoopFSDelegationTokenProvider.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/worker'
> || Filename   ||   Count ||
> | DriverRunner.scala |   2 |
> | Worker.scala   |   3 |
> 'core/src/main/scala/org/apache/spark/deploy/worker/ui'
> || Filename  ||   Count ||
> | LogPage.scala |   2 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36102) Group exception messages in core/deploy

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36102:


Assignee: (was: Apache Spark)

> Group exception messages in core/deploy
> ---
>
> Key: SPARK-36102
> URL: https://issues.apache.org/jira/browse/SPARK-36102
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/deploy'
> || Filename  ||   Count ||
> | FaultToleranceTest.scala  |   1 |
> | PythonRunner.scala|   1 |
> | RRunner.scala |   2 |
> | SparkHadoopUtil.scala |   2 |
> | SparkSubmit.scala |   7 |
> | SparkSubmitArguments.scala|   3 |
> | StandaloneResourceUtils.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/history'
> || Filename ||   Count ||
> | ApplicationCache.scala   |   2 |
> | EventLogFileWriters.scala|   2 |
> | FsHistoryProvider.scala  |   5 |
> | HistoryServer.scala  |   2 |
> | HistoryServerMemoryManager.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/master'
> || Filename ||   Count ||
> | Master.scala |   2 |
> 'core/src/main/scala/org/apache/spark/deploy/rest'
> || Filename||   Count ||
> | RestSubmissionClient.scala  |  11 |
> | StandaloneRestServer.scala  |   2 |
> | SubmitRestProtocolMessage.scala |   5 |
> | SubmitRestProtocolRequest.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/security'
> || Filename  ||   Count ||
> | HadoopFSDelegationTokenProvider.scala |   1 |
> 'core/src/main/scala/org/apache/spark/deploy/worker'
> || Filename   ||   Count ||
> | DriverRunner.scala |   2 |
> | Worker.scala   |   3 |
> 'core/src/main/scala/org/apache/spark/deploy/worker/ui'
> || Filename  ||   Count ||
> | LogPage.scala |   2 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36101) Group exception messages in core/api

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36101:


Assignee: Apache Spark

> Group exception messages in core/api
> 
>
> Key: SPARK-36101
> URL: https://issues.apache.org/jira/browse/SPARK-36101
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/api/java'
> || Filename||   Count ||
> | JavaUtils.scala |   1 |
> 'core/src/main/scala/org/apache/spark/api/python'
> || Filename||   Count ||
> | Py4JServer.scala|   3 |
> | PythonHadoopUtil.scala  |   1 |
> | PythonRDD.scala |   3 |
> | PythonRunner.scala  |   4 |
> | PythonWorkerFactory.scala   |   4 |
> | SerDeUtil.scala |   1 |
> | WriteInputFormatTestDataGenerator.scala |   2 |
> 'core/src/main/scala/org/apache/spark/api/r'
> || Filename   ||   Count ||
> | BaseRRunner.scala  |   1 |
> | JVMObjectTracker.scala |   1 |
> | RBackendHandler.scala  |   2 |
> | RUtils.scala   |   1 |
> | SerDe.scala|   4 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36101) Group exception messages in core/api

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388070#comment-17388070
 ] 

Apache Spark commented on SPARK-36101:
--

User 'dgd-contributor' has created a pull request for this issue:
https://github.com/apache/spark/pull/33536

> Group exception messages in core/api
> 
>
> Key: SPARK-36101
> URL: https://issues.apache.org/jira/browse/SPARK-36101
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/api/java'
> || Filename||   Count ||
> | JavaUtils.scala |   1 |
> 'core/src/main/scala/org/apache/spark/api/python'
> || Filename||   Count ||
> | Py4JServer.scala|   3 |
> | PythonHadoopUtil.scala  |   1 |
> | PythonRDD.scala |   3 |
> | PythonRunner.scala  |   4 |
> | PythonWorkerFactory.scala   |   4 |
> | SerDeUtil.scala |   1 |
> | WriteInputFormatTestDataGenerator.scala |   2 |
> 'core/src/main/scala/org/apache/spark/api/r'
> || Filename   ||   Count ||
> | BaseRRunner.scala  |   1 |
> | JVMObjectTracker.scala |   1 |
> | RBackendHandler.scala  |   2 |
> | RUtils.scala   |   1 |
> | SerDe.scala|   4 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36101) Group exception messages in core/api

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36101:


Assignee: (was: Apache Spark)

> Group exception messages in core/api
> 
>
> Key: SPARK-36101
> URL: https://issues.apache.org/jira/browse/SPARK-36101
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Priority: Major
>
> 'core/src/main/scala/org/apache/spark/api/java'
> || Filename||   Count ||
> | JavaUtils.scala |   1 |
> 'core/src/main/scala/org/apache/spark/api/python'
> || Filename||   Count ||
> | Py4JServer.scala|   3 |
> | PythonHadoopUtil.scala  |   1 |
> | PythonRDD.scala |   3 |
> | PythonRunner.scala  |   4 |
> | PythonWorkerFactory.scala   |   4 |
> | SerDeUtil.scala |   1 |
> | WriteInputFormatTestDataGenerator.scala |   2 |
> 'core/src/main/scala/org/apache/spark/api/r'
> || Filename   ||   Count ||
> | BaseRRunner.scala  |   1 |
> | JVMObjectTracker.scala |   1 |
> | RBackendHandler.scala  |   2 |
> | RUtils.scala   |   1 |
> | SerDe.scala|   4 |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34249) Add documentation for ANSI implicit cast rules

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34249.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33516
[https://github.com/apache/spark/pull/33516]

> Add documentation for ANSI implicit cast rules
> --
>
> Key: SPARK-34249
> URL: https://issues.apache.org/jira/browse/SPARK-34249
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
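
For context on what the new documentation covers: with
{{spark.sql.ansi.enabled=true}}, Spark applies stricter, ANSI-style type
coercion, and implicit casts follow a type-precedence list described in the new
docs. A small, easily checked sketch of the mode switch (assumes a local
{{SparkSession}} named {{spark}}; behavior as of Spark 3.2):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Default (non-ANSI) mode: an invalid cast silently yields NULL.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('abc' AS INT)").show()   // prints a single null row

// ANSI mode: the same cast raises a runtime error, and implicit casts
// are restricted to the precedence rules the new guide documents.
spark.conf.set("spark.sql.ansi.enabled", "true")
// spark.sql("SELECT CAST('abc' AS INT)").show()  // would throw
{code}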




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34249) Add documentation for ANSI implicit cast rules

2021-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34249:
---

Assignee: Gengliang Wang

> Add documentation for ANSI implicit cast rules
> --
>
> Key: SPARK-34249
> URL: https://issues.apache.org/jira/browse/SPARK-34249
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types

2021-07-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34619:
---
Affects Version/s: 3.2.0

> Update the Spark SQL guide about day-time and year-month interval types
> ---
>
> Key: SPARK-34619
> URL: https://issues.apache.org/jira/browse/SPARK-34619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Describe new types at 
> http://spark.apache.org/docs/latest/sql-ref-datatypes.html
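
The two types in question are {{YearMonthIntervalType}} and
{{DayTimeIntervalType}}, surfaced in SQL through ANSI interval literals. A
minimal sketch of how they appear (assumes Spark 3.2 and an existing
{{SparkSession}} named {{spark}}; the schema strings in the comments are
best-effort, check against your build):

{code:scala}
// ANSI year-month interval literal: 1 year and 2 months.
val ym = spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH AS ym")
ym.printSchema()   // ym: interval year to month

// ANSI day-time interval literal: 1 day, 2 hours, 3 minutes, 4 seconds.
val dt = spark.sql("SELECT INTERVAL '1 02:03:04' DAY TO SECOND AS dt")
dt.printSchema()   // dt: interval day to second
{code}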



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types

2021-07-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34619:
---
Affects Version/s: (was: 3.3.0)

> Update the Spark SQL guide about day-time and year-month interval types
> ---
>
> Key: SPARK-34619
> URL: https://issues.apache.org/jira/browse/SPARK-34619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Describe new types at 
> http://spark.apache.org/docs/latest/sql-ref-datatypes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35398) Simplify the way to get classes from ClassBodyEvaluator in CodeGenerator.updateAndGetCompilationStats method

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35398:


Assignee: Apache Spark

> Simplify the way to get classes from ClassBodyEvaluator in 
> CodeGenerator.updateAndGetCompilationStats method
> 
>
> Key: SPARK-35398
> URL: https://issues.apache.org/jira/browse/SPARK-35398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Trivial
>
> SPARK-35253 upgraded Janino from 3.0.16 to 3.1.4. In this version, 
> {{ClassBodyEvaluator}} provides a {{getBytecodes}} method that returns the 
> mapping from {{ClassFile.getThisClassName}} to {{ClassFile.toByteArray}} 
> directly, so we no longer need to obtain it via the reflection API.
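
A sketch of what the simplification enables, assuming Janino 3.1.4 on the
classpath (the snippet compiles a trivial class body and is illustrative of the
API only, not a copy of the Spark change):

{code:scala}
import org.codehaus.janino.ClassBodyEvaluator
import scala.collection.JavaConverters._

val evaluator = new ClassBodyEvaluator()
evaluator.cook("public int answer() { return 42; }")

// Janino 3.1.x exposes the generated bytecode directly as a
// className -> byte[] map; no reflection into private fields needed.
val bytecodes = evaluator.getBytecodes.asScala
bytecodes.foreach { case (className, bytes) =>
  println(s"$className: ${bytes.length} bytes of bytecode")
}
{code}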



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35398) Simplify the way to get classes from ClassBodyEvaluator in CodeGenerator.updateAndGetCompilationStats method

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35398:


Assignee: (was: Apache Spark)

> Simplify the way to get classes from ClassBodyEvaluator in 
> CodeGenerator.updateAndGetCompilationStats method
> 
>
> Key: SPARK-35398
> URL: https://issues.apache.org/jira/browse/SPARK-35398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Trivial
>
> SPARK-35253 upgraded Janino from 3.0.16 to 3.1.4. In this version, 
> {{ClassBodyEvaluator}} provides a {{getBytecodes}} method that returns the 
> mapping from {{ClassFile.getThisClassName}} to {{ClassFile.toByteArray}} 
> directly, so we no longer need to obtain it via the reflection API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35253) Upgrade Janino from 3.0.16 to 3.1.4

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35253:


Assignee: Apache Spark

> Upgrade Janino from 3.0.16 to 3.1.4
> ---
>
> Key: SPARK-35253
> URL: https://issues.apache.org/jira/browse/SPARK-35253
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> According to the [change log|http://janino-compiler.github.io/janino/changelog.html], 
> the Janino 3.0.x line has been deprecated; we can use the 3.1.x line instead.
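
For anyone tracking the dependency manually, the coordinate change looks like
this (sbt syntax; {{commons-compiler}} is Janino's companion artifact and is
upgraded in lockstep):

{code:scala}
libraryDependencies ++= Seq(
  "org.codehaus.janino" % "janino"           % "3.1.4",  // was 3.0.16
  "org.codehaus.janino" % "commons-compiler" % "3.1.4"   // was 3.0.16
)
{code}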



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35253) Upgrade Janino from 3.0.16 to 3.1.4

2021-07-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35253:


Assignee: (was: Apache Spark)

> Upgrade Janino from 3.0.16 to 3.1.4
> ---
>
> Key: SPARK-35253
> URL: https://issues.apache.org/jira/browse/SPARK-35253
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> According to the [change log|http://janino-compiler.github.io/janino/changelog.html], 
> the Janino 3.0.x line has been deprecated; we can use the 3.1.x line instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36295) Refactor sixth set of 20 query execution errors to use error classes

2021-07-27 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387979#comment-17387979
 ] 

PengLei commented on SPARK-36295:
-

Working on this.

> Refactor sixth set of 20 query execution errors to use error classes
> 
>
> Key: SPARK-36295
> URL: https://issues.apache.org/jira/browse/SPARK-36295
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file, so this PR focuses only on 
> the sixth set of 20.
> {code:java}
> noRecordsFromEmptyDataReaderError
> fileNotFoundError
> unsupportedSchemaColumnConvertError
> cannotReadParquetFilesError
> cannotCreateColumnarReaderError
> invalidNamespaceNameError
> unsupportedPartitionTransformError
> missingDatabaseLocationError
> cannotRemoveReservedPropertyError
> namespaceNotEmptyError
> writingJobFailedError
> writingJobAbortedError
> commitDeniedError
> unsupportedTableWritesError
> cannotCreateJDBCTableWithPartitionsError
> unsupportedUserSpecifiedSchemaError
> writeUnsupportedForBinaryFileDataSourceError
> fileLengthExceedsMaxLengthError
> unsupportedFieldNameError
> cannotSpecifyBothJdbcTableNameAndQueryError
> {code}
> For more detail, see the parent ticket SPARK-36094.
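
The refactor shape is the same for every method in the list above: replace a
hard-coded message with an error class whose template lives in
{{core/src/main/resources/error/error-classes.json}}. A hedged before/after
sketch for one method; the error-class name and JSON entry are illustrative,
not what the eventual PR will add:

{code:scala}
import java.io.FileNotFoundException
import org.apache.spark.SparkException

object ErrorClassSketch {
  // Before: message text inlined at the construction site.
  def fileNotFoundErrorBefore(e: FileNotFoundException): Throwable =
    new FileNotFoundException(
      s"${e.getMessage}. It is possible the underlying files have been updated.")

  // After (sketch): the exception carries an error class; the message
  // template would live in error-classes.json, e.g.
  //   "FILE_NOT_FOUND" : { "message" : [ "%s. It is possible ..." ] }
  def fileNotFoundErrorAfter(e: FileNotFoundException): Throwable =
    new SparkException(
      errorClass = "FILE_NOT_FOUND",           // illustrative name
      messageParameters = Array(e.getMessage),
      cause = e)
}
{code}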



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36294) Refactor fifth set of 20 query execution errors to use error classes

2021-07-27 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387978#comment-17387978
 ] 

PengLei commented on SPARK-36294:
-

Working on this.

> Refactor fifth set of 20 query execution errors to use error classes
> 
>
> Key: SPARK-36294
> URL: https://issues.apache.org/jira/browse/SPARK-36294
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file, so this PR focuses only on 
> the fifth set of 20.
> {code:java}
> createStreamingSourceNotSpecifySchemaError
> streamedOperatorUnsupportedByDataSourceError
> multiplePathsSpecifiedError
> failedToFindDataSourceError
> removedClassInSpark2Error
> incompatibleDataSourceRegisterError
> unrecognizedFileFormatError
> sparkUpgradeInReadingDatesError
> sparkUpgradeInWritingDatesError
> buildReaderUnsupportedForFileFormatError
> jobAbortedError
> taskFailedWhileWritingRowsError
> readCurrentFileNotFoundError
> unsupportedSaveModeError
> cannotClearOutputDirectoryError
> cannotClearPartitionDirectoryError
> failedToCastValueToDataTypeForPartitionColumnError
> endOfStreamError
> fallbackV1RelationReportsInconsistentSchemaError
> cannotDropNonemptyNamespaceError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-36291) Refactor second set of 20 query execution errors to use error classes

2021-07-27 Thread PengLei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengLei updated SPARK-36291:

Comment: was deleted

(was: working on this)

> Refactor second set of 20 query execution errors to use error classes
> -
>
> Key: SPARK-36291
> URL: https://issues.apache.org/jira/browse/SPARK-36291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file, so this PR focuses only on 
> the second set of 20.
> {code:java}
> inputTypeUnsupportedError
> invalidFractionOfSecondError
> overflowInSumOfDecimalError
> overflowInIntegralDivideError
> mapSizeExceedArraySizeWhenZipMapError
> copyNullFieldNotAllowedError
> literalTypeUnsupportedError
> noDefaultForDataTypeError
> doGenCodeOfAliasShouldNotBeCalledError
> orderedOperationUnsupportedByDataTypeError
> regexGroupIndexLessThanZeroError
> regexGroupIndexExceedGroupCountError
> invalidUrlError
> dataTypeOperationUnsupportedError
> mergeUnsupportedByWindowFunctionError
> dataTypeUnexpectedError
> typeUnsupportedError
> negativeValueUnexpectedError
> addNewFunctionMismatchedWithFunctionError
> cannotGenerateCodeForUncomparableTypeError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types

2021-07-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34619:
---
Affects Version/s: (was: 3.2.0)
   3.3.0

> Update the Spark SQL guide about day-time and year-month interval types
> ---
>
> Key: SPARK-34619
> URL: https://issues.apache.org/jira/browse/SPARK-34619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Describe new types at 
> http://spark.apache.org/docs/latest/sql-ref-datatypes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types

2021-07-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387969#comment-17387969
 ] 

Apache Spark commented on SPARK-34619:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33539

> Update the Spark SQL guide about day-time and year-month interval types
> ---
>
> Key: SPARK-34619
> URL: https://issues.apache.org/jira/browse/SPARK-34619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Describe new types at 
> http://spark.apache.org/docs/latest/sql-ref-datatypes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


