[jira] [Commented] (SPARK-48307) InlineCTE should keep not-inlined relations in the original WithCTE node

2024-06-28 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860739#comment-17860739
 ] 

ci-cassandra.apache.org commented on SPARK-48307:
-

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/47141

> InlineCTE should keep not-inlined relations in the original WithCTE node
> 
>
> Key: SPARK-48307
> URL: https://issues.apache.org/jira/browse/SPARK-48307
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45096) Optimize apt-get install in Dockerfile

2023-09-06 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762527#comment-17762527
 ] 

ci-cassandra.apache.org commented on SPARK-45096:
-

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/42842

> Optimize apt-get install in Dockerfile
> --
>
> Key: SPARK-45096
> URL: https://issues.apache.org/jira/browse/SPARK-45096
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-44540) Remove unused stylesheet and javascript files of jsonFormatter

2023-07-24 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746781#comment-17746781
 ] 

ci-cassandra.apache.org commented on SPARK-44540:
-

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/42145

> Remove unused stylesheet and javascript files of jsonFormatter
> --
>
> Key: SPARK-44540
> URL: https://issues.apache.org/jira/browse/SPARK-44540
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Priority: Major
>
> jsonFormatter.min.css and jsonFormatter.min.js are unreachable (never referenced)






[jira] [Commented] (SPARK-44454) HiveShim getTablesByType support fallback

2023-07-24 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746751#comment-17746751
 ] 

ci-cassandra.apache.org commented on SPARK-44454:
-

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/42033

> HiveShim getTablesByType support fallback
> -
>
> Key: SPARK-44454
> URL: https://issues.apache.org/jira/browse/SPARK-44454
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: dzcxzl
>Priority: Minor
>
> When we use a high version of Hive Client to communicate with a low version 
> of Hive meta store, we may encounter Invalid method name: 
> 'get_tables_by_type'.
>  
> {code:java}
> 23/07/17 12:45:24,391 [main] DEBUG SparkSqlParser: Parsing command: show views
> 23/07/17 12:45:24,489 [main] ERROR log: Got exception: 
> org.apache.thrift.TApplicationException Invalid method name: 
> 'get_tables_by_type'
> org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_tables_by_type'
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables_by_type(ThriftHiveMetastore.java:1433)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables_by_type(ThriftHiveMetastore.java:1418)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:1411)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>     at com.sun.proxy.$Proxy23.getTables(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2344)
>     at com.sun.proxy.$Proxy23.getTables(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByType(Hive.java:1427)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.spark.sql.hive.client.Shim_v2_3.getTablesByType(HiveShim.scala:1408)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$listTablesByType$1(HiveClientImpl.scala:789)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.listTablesByType(HiveClientImpl.scala:785)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listViews$1(HiveExternalCatalog.scala:895)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listViews(HiveExternalCatalog.scala:893)
>     at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listViews(ExternalCatalogWithListener.scala:158)
>     at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listViews(SessionCatalog.scala:1040)
>     at 
> org.apache.spark.sql.execution.command.ShowViewsCommand.$anonfun$run$5(views.scala:407)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.execution.command.ShowViewsCommand.run(views.scala:407) 
> {code}
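A likely shape for the requested fallback, following the usual shim pattern: try the newer `get_tables_by_type` RPC first and, when an old metastore rejects it with the error above, fall back to listing all tables and filtering client-side. This is a hypothetical sketch; the `MetastoreClient` trait and method names below are illustrative, not Spark's actual internals.

```scala
// Hypothetical sketch; MetastoreClient and its methods are illustrative only.
trait MetastoreClient {
  def getTablesByType(db: String, pattern: String, tableType: String): Seq[String]
  def getAllTables(db: String): Seq[String]
  def getTableType(db: String, table: String): String
}

def listViews(client: MetastoreClient, db: String): Seq[String] =
  try {
    // Fast path: metastores that implement get_tables_by_type (Hive >= 2.3).
    client.getTablesByType(db, "*", "VIRTUAL_VIEW")
  } catch {
    case e: Exception if Option(e.getMessage)
        .exists(_.contains("Invalid method name: 'get_tables_by_type'")) =>
      // Old metastore: list everything and filter by table type client-side.
      client.getAllTables(db)
        .filter(t => client.getTableType(db, t) == "VIRTUAL_VIEW")
  }
```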






[jira] [Commented] (SPARK-44059) Add named argument support for SQL functions

2023-07-12 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17742499#comment-17742499
 ] 

ci-cassandra.apache.org commented on SPARK-44059:
-

User 'learningchess2003' has created a pull request for this issue:
https://github.com/apache/spark/pull/41864

> Add named argument support for SQL functions
> 
>
> Key: SPARK-44059
> URL: https://issues.apache.org/jira/browse/SPARK-44059
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 3.5.0
>Reporter: Richard Yu
>Priority: Major
>
> Today, there is increasing demand for named-argument functions, especially as 
> we continue to introduce longer and longer parameter lists in our SQL 
> functions. Many of these arguments have default values, so forcing callers to 
> spell out every one of them is redundant. This is an umbrella ticket to track 
> the smaller subtasks needed to implement this feature.
>  
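As a concrete illustration of the target call style (assuming an active SparkSession `spark`; the `name => value` named-argument syntax is what the subtasks introduce):

```scala
// Positional style: every argument before the one you want to change must be given.
spark.sql("SELECT mask('AbCD123-@$#', 'Q', 'q', 'd')")

// Named style: only the non-default arguments appear, and in any order.
spark.sql("SELECT mask('AbCD123-@$#', digitChar => 'd', upperChar => 'Q')")
```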






[jira] [Commented] (SPARK-44217) Allow custom precision for fp approx equality

2023-07-12 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17742495#comment-17742495
 ] 

ci-cassandra.apache.org commented on SPARK-44217:
-

User 'asl3' has created a pull request for this issue:
https://github.com/apache/spark/pull/41947

> Allow custom precision for fp approx equality
> -
>
> Key: SPARK-44217
> URL: https://issues.apache.org/jira/browse/SPARK-44217
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
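The SPIP boils down to a numpy-style tolerance check. A minimal sketch of that rule follows; the precise formula PySpark adopts may combine the tolerances differently.

```scala
// Numpy-style approximate equality: |a - b| <= atol + rtol * |b|.
def approxEqual(a: Double, b: Double,
                rtol: Double = 1e-5, atol: Double = 1e-8): Boolean =
  math.abs(a - b) <= atol + rtol * math.abs(b)

assert(approxEqual(1.0000001, 1.0))        // inside the default tolerance
assert(!approxEqual(1.1, 1.0))             // outside it
assert(approxEqual(1.1, 1.0, rtol = 0.2))  // custom, looser precision
```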






[jira] [Commented] (SPARK-44395) Update table function arguments to require parentheses around identifier after the TABLE keyword

2023-07-12 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17742496#comment-17742496
 ] 

ci-cassandra.apache.org commented on SPARK-44395:
-

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/41965

> Update table function arguments to require parentheses around identifier 
> after the TABLE keyword
> 
>
> Key: SPARK-44395
> URL: https://issues.apache.org/jira/browse/SPARK-44395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>
> Per the SQL standard, `TABLE identifier` should actually be passed as 
> `TABLE(identifier)`. 
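In Spark SQL terms (the table-valued function and table names below are hypothetical; only the TABLE argument syntax is the point):

```scala
// Standard-conforming form required after this change:
spark.sql("SELECT * FROM my_tvf(TABLE(my_table))")

// Previously accepted, non-parenthesized form:
// spark.sql("SELECT * FROM my_tvf(TABLE my_table)")
```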






[jira] [Commented] (SPARK-43995) Implement UDFRegistration

2023-07-12 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17742490#comment-17742490
 ] 

ci-cassandra.apache.org commented on SPARK-43995:
-

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/41953

> Implement UDFRegistration
> -
>
> Key: SPARK-43995
> URL: https://issues.apache.org/jira/browse/SPARK-43995
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Reference file - 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala]
> API to be implemented:
>  * 
> {noformat}
> def register(name: String, udf: UserDefinedFunction): 
> UserDefinedFunction{noformat}
>  * 
>  ** 
> [Reference|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L112-L123]
>  * 
> {noformat}
> def register[RT: TypeTag](name: String, func: Function0[RT]): 
> UserDefinedFunction{noformat}
>  * 
>  ** From [0 to 22 
> arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L125-L642]
>  * 
> {noformat}
> def register(name: String, f: UDF0[_], returnType: DataType): Unit{noformat}
>  * 
>  ** From [0 to 22 
> arguments|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L735-L1076]
>  
> We currently do not support UDAFs so the relevant UDAF APIs may be skipped as 
> well as the python/pyspark (in the context of the scala client) related APIs.
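For reference, the API surface being ported looks like this from the caller's side (mirroring the reference file linked above; assumes an active SparkSession `spark`):

```scala
import org.apache.spark.sql.functions.udf

val plusOne = udf((x: Int) => x + 1)
spark.udf.register("plus_one", plusOne)            // register(name, udf: UserDefinedFunction)
spark.udf.register("times_two", (x: Int) => x * 2) // register[RT: TypeTag](name, func)
spark.sql("SELECT plus_one(41) AS a, times_two(21) AS b").show()
```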






[jira] [Commented] (SPARK-44325) Define the computing logic through PartitionEvaluator API and use it in SortMergeJoinExec

2023-07-06 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740836#comment-17740836
 ] 

ci-cassandra.apache.org commented on SPARK-44325:
-

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/41884

> Define the computing logic through PartitionEvaluator API and use it in 
> SortMergeJoinExec
> -
>
> Key: SPARK-44325
> URL: https://issues.apache.org/jira/browse/SPARK-44325
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> SortMergeJoinExec
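The rough shape of the pattern, simplified from `org.apache.spark.PartitionEvaluator`: a serializable factory is built on the driver, shipped to executors, and asked for one evaluator per partition, so the operator's compute logic moves into `eval()` instead of a closure captured by `mapPartitions`.

```scala
// Simplified sketch of the PartitionEvaluator API (not the exact Spark traits).
trait PartitionEvaluator[T, U] {
  // Invoked once per partition with that partition's input iterator(s).
  def eval(partitionIndex: Int, inputs: Iterator[T]*): Iterator[U]
}

trait PartitionEvaluatorFactory[T, U] extends Serializable {
  // Created on the driver, serialized to executors.
  def createEvaluator(): PartitionEvaluator[T, U]
}
```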






[jira] [Commented] (SPARK-44278) Implement a GRPC server interceptor that cleans up thread local properties

2023-07-03 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739681#comment-17739681
 ] 

ci-cassandra.apache.org commented on SPARK-44278:
-

User 'heyihong' has created a pull request for this issue:
https://github.com/apache/spark/pull/41831

> Implement a GRPC server interceptor that cleans up thread local properties
> --
>
> Key: SPARK-44278
> URL: https://issues.apache.org/jira/browse/SPARK-44278
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yihong He
>Priority: Major
>







[jira] [Commented] (SPARK-44210) Strengthen type checking and better comply with Connect specifications for `levenshtein` function

2023-06-28 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738211#comment-17738211
 ] 

ci-cassandra.apache.org commented on SPARK-44210:
-

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41724

> Strengthen type checking and better comply with Connect specifications for 
> `levenshtein` function
> -
>
> Key: SPARK-44210
> URL: https://issues.apache.org/jira/browse/SPARK-44210
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-41599) Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher

2023-06-28 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738209#comment-17738209
 ] 

ci-cassandra.apache.org commented on SPARK-41599:
-

User 'risyomei' has created a pull request for this issue:
https://github.com/apache/spark/pull/41692

> Memory leak in FileSystem.CACHE when submitting apps to secure cluster using 
> InProcessLauncher
> --
>
> Key: SPARK-41599
> URL: https://issues.apache.org/jira/browse/SPARK-41599
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, YARN
>Affects Versions: 3.1.2
>Reporter: Maciej Smolenski
>Priority: Major
> Attachments: InProcLaunchFsIssue.scala, 
> SPARK-41599-fixes-to-limit-FileSystem-CACHE-size-when-using-InProcessLauncher.diff
>
>
> When submitting spark application in kerberos environment the credentials of 
> 'current user' (UserGroupInformation.getCurrentUser()) are being modified.
> Filesystem.CACHE entries contain 'current user' (with user credentials) as a 
> key.
> Submitting many Spark applications using InProcessLauncher causes 
> FileSystem.CACHE to grow without bound.
> Eventually the process exits with an OutOfMemoryError.
> Code for reproduction attached.
>  
> Output from running 'jmap -histo' on reproduction jvm shows that the number 
> of FileSystem$Cache$Key increases in time:
> time: #instances class
> 1671533274: 2 org.apache.hadoop.fs.FileSystem$Cache$Key
> 167155: 11 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533395: 21 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533455: 30 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533515: 39 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533576: 48 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533636: 57 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533696: 66 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533757: 75 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533817: 84 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533877: 93 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533937: 102 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533998: 111 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534058: 120 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534118: 135 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534178: 140 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534239: 150 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534299: 159 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534359: 168 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534419: 177 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534480: 186 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534540: 195 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534600: 204 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534661: 213 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534721: 222 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534781: 231 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534841: 240 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534902: 249 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534962: 257 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535022: 264 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535083: 273 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535143: 282 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535203: 291 org.apache.hadoop.fs.FileSystem$Cache$Key
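The mechanism behind the growth: Hadoop's `FileSystem.get()` caches instances keyed by (scheme, authority, UserGroupInformation), so each kerberized submission through InProcessLauncher that logs in afresh yields a distinct UGI and hence a distinct, never-evicted cache entry. One way to release a finished submission's entries is shown below; whether this is the right hook is exactly what the attached diff explores.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

val fs = FileSystem.get(new Configuration())  // cached under the current UGI

// Evict and close all cached FileSystem instances belonging to one UGI.
FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser)
```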






[jira] [Commented] (SPARK-44145) Callback prior to execution

2023-06-28 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738210#comment-17738210
 ] 

ci-cassandra.apache.org commented on SPARK-44145:
-

User 'jdesjean' has created a pull request for this issue:
https://github.com/apache/spark/pull/41748

> Callback prior to execution
> ---
>
> Key: SPARK-44145
> URL: https://issues.apache.org/jira/browse/SPARK-44145
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jean-Francois Desjeans Gauthier
>Priority: Major
>
> Commands are eagerly executed after the analysis phase, while other queries are 
> executed after planning. Users of Spark need to measure the time spent prior to 
> execution, which currently requires understanding the difference between these 
> two modes. Add a callback, invoked once query planning is completed, that can 
> be used for this purpose.






[jira] [Commented] (SPARK-44128) Upgrade netty from 4.1.92 to 4.1.93

2023-06-28 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738065#comment-17738065
 ] 

ci-cassandra.apache.org commented on SPARK-44128:
-

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41681

> Upgrade netty from 4.1.92 to 4.1.93
> ---
>
> Key: SPARK-44128
> URL: https://issues.apache.org/jira/browse/SPARK-44128
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-43470) Add operating system, Java, Python version information to application log

2023-06-26 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737312#comment-17737312
 ] 

ci-cassandra.apache.org commented on SPARK-43470:
-

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/41144

> Add operating system, Java, Python version information to application log
> -
>
> Key: SPARK-43470
> URL: https://issues.apache.org/jira/browse/SPARK-43470
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> Include the operating system, Java, and Python version information in the 
> application log. This will provide useful context and aid in troubleshooting 
> and debugging any issues that may arise, particularly when Spark runs across 
> heterogeneous environments (systems with varying operating systems and Java 
> versions).
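A minimal sketch of the proposed log line, using only standard JVM system properties (the actual patch may gather and format these differently):

```scala
val osInfo   = s"${System.getProperty("os.name")} " +
               s"${System.getProperty("os.version")} (${System.getProperty("os.arch")})"
val javaInfo = s"Java ${System.getProperty("java.version")} " +
               s"(${System.getProperty("java.vendor")})"

// logInfo as in Spark's internal Logging trait; println would do for a demo.
println(s"OS: $osInfo; $javaInfo")
```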






[jira] [Commented] (SPARK-43523) Memory leak in Spark UI

2023-06-05 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729480#comment-17729480
 ] 

ci-cassandra.apache.org commented on SPARK-43523:
-

User 'aminebag' has created a pull request for this issue:
https://github.com/apache/spark/pull/41423

> Memory leak in Spark UI
> ---
>
> Key: SPARK-43523
> URL: https://issues.apache.org/jira/browse/SPARK-43523
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.4.0
>Reporter: Amine Bagdouri
>Priority: Major
> Attachments: spark_shell_oom.log, spark_ui_memory_leak.zip
>
>
> We have a distributed Spark application running on Azure HDInsight using 
> Spark version 2.4.4.
> After a few days of active processing on our application, we have noticed 
> that the GC CPU time ratio of the driver is close to 100%. We suspected a 
> memory leak. Thus, we have produced a heap dump and analyzed it using Eclipse 
> Memory Analyzer.
> Here is some interesting data from the driver's heap dump (heap size is 8 GB):
>  * The estimated retained heap size of String objects (~5M instances) is 3.3 
> GB. It seems that most of these instances correspond to spark events.
>  * Spark UI's AppStatusListener instance estimated retained size is 1.1 GB.
>  * The number of LiveJob objects with status "RUNNING" is 18K, knowing that 
> there shouldn't be more than 16 live running jobs since we use a fixed size 
> thread pool of 16 threads to run spark queries.
>  * The number of LiveTask objects is 485K.
>  * The AsyncEventQueue instance associated with the AppStatusListener has a 
> dropped-events count of 854 and a total events count of 10001, knowing that 
> the dropped-events counter is reset every minute and that the queue's default 
> capacity is 10,000.
> We think that there is a memory leak in Spark UI. Here is our analysis of the 
> root cause of this leak:
>  * AppStatusListener is notified of Spark events using a bounded queue in 
> AsyncEventQueue.
>  * AppStatusListener updates its state (kvstore, liveTasks, liveStages, 
> liveJobs, ...) based on the received events. For example, onTaskStart adds a 
> task to liveTasks map and onTaskEnd removes the task from liveTasks map.
>  * When the rate of events is very high, the bounded queue in AsyncEventQueue 
> is full, some events are dropped and don't make it to AppStatusListener.
>  * Dropped events that signal the end of a processing unit prevent the state 
> of AppStatusListener from being cleaned. For example, a dropped onTaskEnd 
> event, will prevent the task from being removed from liveTasks map, and the 
> task will remain in the heap until the driver's JVM is stopped.
> We were able to confirm our analysis by reducing the capacity of the 
> AsyncEventQueue (spark.scheduler.listenerbus.eventqueue.capacity=10). After 
> having launched many spark queries using this config, we observed that the 
> number of active jobs in Spark UI increased rapidly and remained high even 
> though all submitted queries have completed. We have also noticed that some 
> executor task counters in Spark UI were negative, which confirms that 
> AppStatusListener state does not accurately reflect the reality and that it 
> can be a victim of event drops.
> Suggested fix:
> There are some limits today on the number of "dead" objects in 
> AppStatusListener's maps (for example: spark.ui.retainedJobs). We suggest 
> enforcing another configurable limit on the number of total objects in 
> AppStatusListener's maps and kvstore. This should limit the leak in the case 
> of high events rate, but AppStatusListener stats will remain inaccurate.
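The reproduction knob from the analysis above, expressed as configuration: shrinking the listener-bus queue makes event drops (and therefore the leak) easy to trigger on demand.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Tiny queue => frequent drops => leaked LiveJob/LiveTask entries.
  .set("spark.scheduler.listenerbus.eventqueue.capacity", "10") // default: 10000
  // Existing limits only cap *dead* objects, not the leaked live ones.
  .set("spark.ui.retainedJobs", "1000")
```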






[jira] [Commented] (SPARK-43376) Improve reuse subquery with table cache

2023-06-05 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729336#comment-17729336
 ] 

ci-cassandra.apache.org commented on SPARK-43376:
-

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/41454

> Improve reuse subquery with table cache
> ---
>
> Key: SPARK-43376
> URL: https://issues.apache.org/jira/browse/SPARK-43376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> AQE cannot reuse a subquery if it is pushed into InMemoryTableScan.
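A schematic reproduction (table names hypothetical): the same scalar subquery appears twice and should be planned once via subquery reuse, but once its scan is pushed into an InMemoryTableScan the two instances stop matching under AQE.

```scala
spark.sql("CACHE TABLE t")
spark.sql("""
  SELECT * FROM a
  WHERE a.k > (SELECT max(k) FROM t)
     OR a.v > (SELECT max(k) FROM t)
""")
```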






[jira] [Commented] (SPARK-43783) Enable FeatureTests.test_standard_scaler for pandas 2.0.0.

2023-06-05 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729335#comment-17729335
 ] 

ci-cassandra.apache.org commented on SPARK-43783:
-

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/41456

> Enable FeatureTests.test_standard_scaler for pandas 2.0.0.
> --
>
> Key: SPARK-43783
> URL: https://issues.apache.org/jira/browse/SPARK-43783
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Fix `FeatureTests.test_standard_scaler` In 
> `python/pyspark/mlv2/tests/test_feature.py`






[jira] [Commented] (SPARK-43413) IN subquery ListQuery has wrong nullability

2023-05-08 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720780#comment-17720780
 ] 

ci-cassandra.apache.org commented on SPARK-43413:
-

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/41094

> IN subquery ListQuery has wrong nullability
> ---
>
> Key: SPARK-43413
> URL: https://issues.apache.org/jira/browse/SPARK-43413
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jack Chen
>Priority: Major
>
> IN subquery expressions are incorrectly marked as non-nullable, even when 
> they are actually nullable. They correctly check the nullability of the 
> left-hand-side, but the right-hand-side of a IN subquery, the ListQuery, is 
> currently defined with nullability = false always. This is incorrect and can 
> lead to incorrect query transformations.
> Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN 
> expression returns NULL when the nullable_col is null, but our code marks it 
> as non-nullable, and therefore SimplifyBinaryComparison transforms away the 
> <=> TRUE, transforming the expression to non_nullable_col IN (select 
> nullable_col) , which is an incorrect transformation because NULL values of 
> nullable_col now cause the expression to yield NULL instead of FALSE.
> This bug can potentially lead to wrong results, but in most cases this 
> doesn't directly cause wrong results end-to-end, because IN subqueries are 
> almost always transformed to semi/anti/existence joins in 
> RewritePredicateSubquery, and this rewrite can also incorrectly discard 
> NULLs, which is another bug. But we can observe it causing wrong behavior in 
> unit tests, and it could easily lead to incorrect query results if there are 
> changes to the surrounding context, so it should be fixed regardless.
> This is a long-standing bug that has existed at least since 2016, as long as 
> the ListQuery class has existed.
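A schematic instance of the bug (names hypothetical; assumes `x` is non-nullable and `nullable_col` is nullable):

```scala
spark.sql("""
  SELECT (x IN (SELECT nullable_col FROM t)) <=> TRUE AS r FROM s
""")
// Correct semantics: r is FALSE for rows where the IN evaluates to NULL.
// With ListQuery wrongly marked non-nullable, SimplifyBinaryComparison strips
// the `<=> TRUE` guard, and those rows then yield NULL instead of FALSE.
```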






[jira] [Commented] (SPARK-43284) _metadata.file_path regression

2023-05-04 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719609#comment-17719609
 ] 

ci-cassandra.apache.org commented on SPARK-43284:
-

User 'databricks-david-lewis' has created a pull request for this issue:
https://github.com/apache/spark/pull/40947

> _metadata.file_path regression
> --
>
> Key: SPARK-43284
> URL: https://issues.apache.org/jira/browse/SPARK-43284
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: David Lewis
>Assignee: David Lewis
>Priority: Major
> Fix For: 3.4.1, 3.5.0
>
>
> As part of the [SparkPath 
> refactor|https://issues.apache.org/jira/browse/SPARK-41970] the behavior of 
> `_metadata.file_path` was inadvertently changed. In Spark 3.4+ it now returns 
> a non-encoded path string, as opposed to a url-encoded path string.
> This ticket is to fix that regression.
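The encoding difference can be illustrated with Python's standard urllib rather than Spark's SparkPath internals; the example path is a hypothetical stand-in chosen for illustration:

```python
from urllib.parse import quote, unquote

# A hypothetical file path with characters that are escaped in URL form.
raw_path = "/data/year=2023/file 1.parquet"

# URL-encoded form ('=' and ' ' are percent-escaped):
encoded = quote(raw_path)
print(encoded)  # /data/year%3D2023/file%201.parquet

# Non-encoded form: decoding the URL form recovers the raw path string.
print(unquote(encoded) == raw_path)  # True
```

Consumers that compared _metadata.file_path against URL-encoded strings would see the two forms differ whenever the path contains such characters.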






[jira] [Commented] (SPARK-42999) Impl Dataset#foreach, foreachPartitions

2023-04-06 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709519#comment-17709519
 ] 

ci-cassandra.apache.org commented on SPARK-42999:
-

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40628

> Impl Dataset#foreach, foreachPartitions
> ---
>
> Key: SPARK-42999
> URL: https://issues.apache.org/jira/browse/SPARK-42999
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> Implement the missing methods in the Scala Client Dataset API.






[jira] [Commented] (SPARK-43019) Move Ordering to PhysicalDataType

2023-04-05 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709209#comment-17709209
 ] 

ci-cassandra.apache.org commented on SPARK-43019:
-

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40651

> Move Ordering to PhysicalDataType
> -
>
> Key: SPARK-43019
> URL: https://issues.apache.org/jira/browse/SPARK-43019
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-43041) Restore constructors of exceptions for compatibility in connector API

2023-04-05 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709208#comment-17709208
 ] 

ci-cassandra.apache.org commented on SPARK-43041:
-

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40679

> Restore constructors of exceptions for compatibility in connector API
> -
>
> Key: SPARK-43041
> URL: https://issues.apache.org/jira/browse/SPARK-43041
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Anton Okolnychyi
>Priority: Blocker
> Fix For: 3.4.0
>
>
> Thanks [~aokolnychyi] for raising the issue as shown below:
> {quote}
> I have a question about changes to exceptions used in the public connector 
> API, such as NoSuchTableException and TableAlreadyExistsException.
> I consider those as part of the public Catalog API (TableCatalog uses them in 
> method definitions). However, it looks like PR #37887 has changed them in an 
> incompatible way. The old constructors accepting Identifier objects were 
> removed; the only way to construct such exceptions now is by passing database 
> and table strings or a Scala Seq. Shall we add back the old constructors to 
> avoid breaking connectors?
> {quote}
> We should restore constructors of those exceptions to preserve the 
> compatibility in connector API.






[jira] [Commented] (SPARK-41628) Support async query execution

2023-04-03 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17708024#comment-17708024
 ] 

ci-cassandra.apache.org commented on SPARK-41628:
-

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/40649

> Support async query execution
> -
>
> Key: SPARK-41628
> URL: https://issues.apache.org/jira/browse/SPARK-41628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Today, query execution is completely synchronous; add an asynchronous API 
> that allows submitting a query and polling for the result.
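A submit-and-poll pattern of this kind can be sketched with a plain thread pool; the names below (run_query, the SQL string) are hypothetical and unrelated to the actual Spark Connect API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def run_query(sql):
    """Stand-in for a query execution that takes some time."""
    time.sleep(0.1)
    return f"result of {sql}"

# Submitting returns immediately with a handle; the caller polls it.
handle = executor.submit(run_query, "SELECT 1")
while not handle.done():
    time.sleep(0.01)  # poll until the query finishes
print(handle.result())  # result of SELECT 1
executor.shutdown()
```

The handle decouples submission from result retrieval, which is the essence of the async API the ticket asks for.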






[jira] [Commented] (SPARK-43011) array_insert should fail with 0 index

2023-04-03 Thread ci-cassandra.apache.org (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17707932#comment-17707932
 ] 

ci-cassandra.apache.org commented on SPARK-43011:
-

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40641

> array_insert should fail with 0 index
> -
>
> Key: SPARK-43011
> URL: https://issues.apache.org/jira/browse/SPARK-43011
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
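As a rough model of the behavior the title asks for (array_insert uses 1-based positions, so position 0 has no meaning), here is a hypothetical Python stand-in rather than Spark's implementation; negative positions are elided because their semantics are not described in this issue:

```python
def array_insert(arr, pos, item):
    """Hypothetical model: 1-based positions; position 0 raises an error."""
    if pos == 0:
        raise ValueError("array_insert position must not be 0 (positions are 1-based)")
    if pos < 0:
        raise NotImplementedError("negative positions elided in this sketch")
    out = list(arr)
    out.insert(pos - 1, item)  # translate 1-based position to 0-based index
    return out

print(array_insert([1, 2, 3], 1, 0))  # [0, 1, 2, 3]
```

Failing fast on 0 avoids silently picking one of two plausible interpretations (before the first element, or after it).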



