[jira] [Assigned] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-44072: --- Assignee: Yang Zhang > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Assignee: Yang Zhang >Priority: Major > Fix For: 3.5.0 > > > Latest docs of insert table has an incorrect sql example about 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Doc link: > https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44072: Fix Version/s: (was: 3.4.1) (was: 3.3.3) > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Priority: Major > Fix For: 3.5.0 > > > Latest docs of insert table has an incorrect sql example about 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Doc link: > https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44072: Fix Version/s: 3.3.3 3.4.1 > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Priority: Major > Fix For: 3.3.3, 3.4.1, 3.5.0 > > > Latest docs of insert table has an incorrect sql example about 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Doc link: > https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44077) Session Configs were not getting honored in RDDs
[ https://issues.apache.org/jira/browse/SPARK-44077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Singh updated SPARK-44077: Description: When calling SQLConf.get on executors, the configs are read from the local properties on the TaskContext. The local properties are populated driver-side when scheduling the job, using the properties found in sparkContext.localProperties. For RDD actions, local properties were not getting populated. > Session Configs were not getting honored in RDDs > > > Key: SPARK-44077 > URL: https://issues.apache.org/jira/browse/SPARK-44077 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Kapil Singh >Priority: Major > > When calling SQLConf.get on executors, the configs are read from the local > properties on the TaskContext. The local properties are populated driver-side > when scheduling the job, using the properties found in > sparkContext.localProperties. For RDD actions, local properties were not > getting populated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
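A minimal sketch of the symptom described above, assuming the spark-shell's predefined {{spark}} session; the time-zone config is just an arbitrary example of a SQL session config:

{code:java}
import org.apache.spark.sql.internal.SQLConf

spark.conf.set("spark.sql.session.timeZone", "UTC")

// Dataset actions ship session configs to executors through the TaskContext
// local properties, so SQLConf.get on the executor sees the session value.
spark.range(1).foreach { _ =>
  println(SQLConf.get.sessionLocalTimeZone) // "UTC"
}

// A bare RDD action did not populate those local properties, so SQLConf.get
// on the executor silently fell back to the defaults -- the bug above.
spark.sparkContext.range(0, 1).foreach { _ =>
  println(SQLConf.get.sessionLocalTimeZone) // JVM default, not "UTC"
}
{code}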
[jira] [Created] (SPARK-44077) Session Configs were not getting honored in RDDs
Kapil Singh created SPARK-44077: --- Summary: Session Configs were not getting honored in RDDs Key: SPARK-44077 URL: https://issues.apache.org/jira/browse/SPARK-44077 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Kapil Singh -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-44040: --- Assignee: Yuming Wang > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Aleksandr Aleksandrov >Assignee: Yuming Wang >Priority: Critical > > When I try to call count after the distinct function for a Decimal null field, > Spark returns an incorrect result starting from Spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // return correct result > unionDF.select("money").distinct.count // return 2 instead of 1 > unionDF.select("money").distinct.count == 1 // return false > This block of code returns an assertion error and after that an incorrect > count (in Spark 3.2.1 everything works fine and I get the correct result = 1): > *scala> unionDF.select("money").distinct.show // return correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4 object $eval { > 5 lazy val $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6 lazy val $print: _root_.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9 "" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > 
scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at > scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at > scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at > scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at > scala.reflect.internal.pickling.UnPickler$Scan.$anonfun$readSymbol$10(UnPickler.scala:357) > at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:188) > at >
[jira] [Commented] (SPARK-44075) Make 'transformStatCorr' lazy
[ https://issues.apache.org/jira/browse/SPARK-44075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733308#comment-17733308 ] Snoot.io commented on SPARK-44075: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41621 > Make 'transformStatCorr' lazy > - > > Key: SPARK-44075 > URL: https://issues.apache.org/jira/browse/SPARK-44075 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43928) Add bit operations to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733307#comment-17733307 ] Snoot.io commented on SPARK-43928: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41608 > Add bit operations to Scala and Python > -- > > Key: SPARK-43928 > URL: https://issues.apache.org/jira/browse/SPARK-43928 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * bit_and > * bit_count > * bit_get > * bit_or > * bit_xor > * getbit > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
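The SQL built-ins behind these wrappers already exist, so the intended semantics can be sanity-checked from the spark-shell (the literals are arbitrary); a quick sketch:

{code:java}
// bit_count(5) = 2 (binary 101); bit_get/getbit read the bit at a position.
spark.sql("SELECT bit_count(5), bit_get(5, 0), getbit(5, 1)").show()

// Aggregates over column c: 3 AND 5 = 1, 3 OR 5 = 7, 3 XOR 5 = 6.
spark.sql("SELECT bit_and(c), bit_or(c), bit_xor(c) FROM VALUES (3), (5) AS t(c)").show()
{code}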
[jira] [Resolved] (SPARK-44040) Incorrect result after count distinct
[ https://issues.apache.org/jira/browse/SPARK-44040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-44040. - Fix Version/s: 3.3.3 3.5.0 3.4.1 Resolution: Fixed Issue resolved by pull request 41576 [https://github.com/apache/spark/pull/41576] > Incorrect result after count distinct > - > > Key: SPARK-44040 > URL: https://issues.apache.org/jira/browse/SPARK-44040 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Aleksandr Aleksandrov >Assignee: Yuming Wang >Priority: Critical > Fix For: 3.3.3, 3.5.0, 3.4.1 > > > When I try to call count after the distinct function for a Decimal null field, > Spark returns an incorrect result starting from Spark 3.4.0. > A minimal example to reproduce: > import org.apache.spark.sql.types._ > import org.apache.spark.sql.{Column, DataFrame, Dataset, Row, SparkSession} > import org.apache.spark.sql.types.{StringType, StructField, StructType} > val schema = StructType( Array( > StructField("money", DecimalType(38,6), true), > StructField("reference_id", StringType, true) > )) > val payDf = spark.createDataFrame(sc.emptyRDD[Row], schema) > val aggDf = payDf.agg(sum("money").as("money")).withColumn("name", lit("df1")) > val aggDf1 = payDf.agg(sum("money").as("money")).withColumn("name", > lit("df2")) > val unionDF: DataFrame = aggDf.union(aggDf1) > unionDF.select("money").distinct.show // return correct result > unionDF.select("money").distinct.count // return 2 instead of 1 > unionDF.select("money").distinct.count == 1 // return false > This block of code returns an assertion error and after that an incorrect > count (in Spark 3.2.1 everything works fine and I get the correct result = 1): > *scala> unionDF.select("money").distinct.show // return correct result* > java.lang.AssertionError: assertion failed: > Decimal$DecimalIsFractional > while compiling: > during phase: globalPhase=terminal, enteringPhase=jvm > library version: version 2.12.17 > compiler version: version 2.12.17 > reconstructed args: -classpath > /Users/aleksandrov/.ivy2/jars/org.apache.spark_spark-connect_2.12-3.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-core_2.12-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/io.delta_delta-storage-2.4.0.jar:/Users/aleksandrov/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar:/Users/aleksandrov/.ivy2/jars/org.antlr_antlr4-runtime-4.9.3.jar > -Yrepl-class-based -Yrepl-outdir > /private/var/folders/qj/_dn4xbp14jn37qmdk7ylyfwcgr/T/spark-f37bb154-75f3-4db7-aea8-3c4363377bd8/repl-350f37a1-1df1-4816-bd62-97929c60a6c1 > last tree to typer: TypeTree(class Byte) > tree position: line 6 of > tree tpe: Byte > symbol: (final abstract) class Byte in package scala > symbol definition: final abstract class Byte extends (a ClassSymbol) > symbol package: scala > symbol owners: class Byte > call site: constructor $eval in object $eval in package $line19 > == Source file context for tree position == > 3 > 4 object $eval { > 5 lazy val $result = > $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.res0 > 6 lazy val $print: _root_.java.lang.String = { > 7 $line19.$read.INSTANCE.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw.$iw > 8 > 9 "" > at > scala.reflect.internal.SymbolTable.throwAssertionError(SymbolTable.scala:185) > at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1525) > at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) > at scala.reflect.internal.Symbols$Symbol.flatOwnerInfo(Symbols.scala:2353) > at > 
scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:3346) > at > scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:3348) > at > scala.reflect.internal.Symbols$ModuleClassSymbol.sourceModule(Symbols.scala:3487) > at > scala.reflect.internal.Symbols.$anonfun$forEachRelevantSymbols$1$adapted(Symbols.scala:3802) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.reflect.internal.Symbols.markFlagsCompleted(Symbols.scala:3799) > at scala.reflect.internal.Symbols.markFlagsCompleted$(Symbols.scala:3805) > at scala.reflect.internal.SymbolTable.markFlagsCompleted(SymbolTable.scala:28) > at > scala.reflect.internal.pickling.UnPickler$Scan.finishSym$1(UnPickler.scala:324) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:342) > at > scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:645) > at > scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:413) > at >
[jira] [Created] (SPARK-44076) SPIP: Python Data Source API
Allison Wang created SPARK-44076: Summary: SPIP: Python Data Source API Key: SPARK-44076 URL: https://issues.apache.org/jira/browse/SPARK-44076 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.5.0 Reporter: Allison Wang This proposal aims to introduce a simple API in Python for Data Sources. The idea is to enable Python developers to create data sources without having to learn Scala or deal with the complexities of the current data source APIs. The goal is to make a Python-based API that is simple and easy to use, thus making Spark more accessible to the wider Python developer community. This proposed approach is based on the recently introduced Python user-defined table functions (SPARK-43797) with extensions to support data sources. {*}SPIP{*}: [https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44075) Make 'transformStatCorr' lazy
Ruifeng Zheng created SPARK-44075: - Summary: Make 'transformStatCorr' lazy Key: SPARK-44075 URL: https://issues.apache.org/jira/browse/SPARK-44075 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43474) Add support to create DataFrame Reference in Spark connect
[ https://issues.apache.org/jira/browse/SPARK-43474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733305#comment-17733305 ] Snoot.io commented on SPARK-43474: -- User 'rangadi' has created a pull request for this issue: https://github.com/apache/spark/pull/41618 > Add support to create DataFrame Reference in Spark connect > -- > > Key: SPARK-43474 > URL: https://issues.apache.org/jira/browse/SPARK-43474 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Peng Zhong >Priority: Major > > Add support in Spark Connect to cache a DataFrame on server side. From client > side, it can create a reference to that DataFrame given the cache key. > > This function will be used in streaming foreachBatch(). Server needs to call > user function for every batch which takes a DataFrame as argument. With the > new function, we can just cache the DataFrame on the server. Pass the id back > to client which can creates the DataFrame reference. The server will replace > the reference when transforming. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44025) CSV Table Read Error with CharType(length) column
[ https://issues.apache.org/jira/browse/SPARK-44025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733304#comment-17733304 ] Snoot.io commented on SPARK-44025: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41564 > CSV Table Read Error with CharType(length) column > - > > Key: SPARK-44025 > URL: https://issues.apache.org/jira/browse/SPARK-44025 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 > Environment: {{apache/spark:v3.4.0 image}} >Reporter: Fengyu Cao >Priority: Major > > Problem: > # read a CSV format table > # table has a `CharType(length)` column > # read table failed with Exception: `org.apache.spark.SparkException: Job > aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most > recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor > 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct).` > > reproduce with official image: > # {{docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql}} > # {{CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV > OPTIONS ('header' = 'true', 'sep' = ';') LOCATION > "/opt/spark/examples/src/main/resources/people.csv";}} > # SELECT * FROM csv_bug; > # ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
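A possible workaround sketch until the bug is fixed, assuming the goal is simply to read the table: declare the column as STRING instead of CHAR(4) (Spark stores CHAR/VARCHAR columns as strings internally), so that requiredSchema and dataSchema agree; csv_ok is a hypothetical table name:

{code:java}
spark.sql("""
  CREATE TABLE csv_ok (name STRING, age INT, job STRING)
  USING CSV OPTIONS ('header' = 'true', 'sep' = ';')
  LOCATION '/opt/spark/examples/src/main/resources/people.csv'
""")
spark.sql("SELECT * FROM csv_ok").show()
{code}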
[jira] [Commented] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733303#comment-17733303 ] Snoot.io commented on SPARK-44072: -- User 'Yohahaha' has created a pull request for this issue: https://github.com/apache/spark/pull/41619 > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Priority: Major > Fix For: 3.5.0 > > > Latest docs of insert table has an incorrect sql example about 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Doc link: > https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44060) Code-gen for build side outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733302#comment-17733302 ] Snoot.io commented on SPARK-44060: -- User 'szehon-ho' has created a pull request for this issue: https://github.com/apache/spark/pull/41614 > Code-gen for build side outer shuffled hash join > > > Key: SPARK-44060 > URL: https://issues.apache.org/jira/browse/SPARK-44060 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Szehon Ho >Priority: Major > > Here, build side outer join means LEFT OUTER join with build left, or RIGHT > OUTER join with build right. > As a followup for https://github.com/apache/spark/pull/41398/ SPARK-36612 > (non-codegen build-side outer shuffled hash join), this task is to add > code-gen for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44060) Code-gen for build side outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733301#comment-17733301 ] Snoot.io commented on SPARK-44060: -- User 'szehon-ho' has created a pull request for this issue: https://github.com/apache/spark/pull/41614 > Code-gen for build side outer shuffled hash join > > > Key: SPARK-44060 > URL: https://issues.apache.org/jira/browse/SPARK-44060 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Szehon Ho >Priority: Major > > Here, build side outer join means LEFT OUTER join with build left, or RIGHT > OUTER join with build right. > As a followup for https://github.com/apache/spark/pull/41398/ SPARK-36612 > (non-codegen build-side outer shuffled hash join), this task is to add > code-gen for it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
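For orientation, a sketch of the join shape this ticket targets, using the standard SHUFFLE_HASH hint in the spark-shell; t1 and t2 are throwaway views. Hinting the preserved (left) side of a LEFT OUTER join makes it the build side, which SPARK-36612 enabled without codegen and this ticket would compile:

{code:java}
spark.range(10).toDF("id").createOrReplaceTempView("t1")
spark.range(5).selectExpr("id", "id * 2 AS v").createOrReplaceTempView("t2")

val q = spark.sql("""
  SELECT /*+ SHUFFLE_HASH(t1) */ t1.id, t2.v
  FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
""")
q.explain() // expect ShuffledHashJoin ... BuildLeft, LeftOuter
{code}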
[jira] [Commented] (SPARK-44065) Optimize BroadcastHashJoin skew when localShuffleReader is disabled
[ https://issues.apache.org/jira/browse/SPARK-44065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733300#comment-17733300 ] GridGain Integration commented on SPARK-44065: -- User 'wForget' has created a pull request for this issue: https://github.com/apache/spark/pull/41609 > Optimize BroadcastHashJoin skew when localShuffleReader is disabled > --- > > Key: SPARK-44065 > URL: https://issues.apache.org/jira/browse/SPARK-44065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Zhen Wang >Priority: Major > > In RemoteShuffleService services such as uniffle and celeborn, it is > recommended to disable localShuffleReader by default for better performance. > But it may make BroadcastHashJoin skewed, so I want to optimize > BroadcastHashJoin skew in OptimizeSkewedJoin when localShuffleReader is > disabled. > > Refer to: > https://github.com/apache/incubator-celeborn#spark-configuration > https://github.com/apache/incubator-uniffle/blob/master/docs/client_guide.md#support-spark-aqe -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
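The setup the ticket describes maps to two real Spark SQL configs; a spark-shell sketch:

{code:java}
// AQE on (the default in recent releases), but the local shuffle reader off,
// as remote shuffle services such as Celeborn and Uniffle recommend.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.localShuffleReader.enabled", "false")
{code}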
[jira] [Created] (SPARK-44074) `Logging plan changes for execution` test failed
Yang Jie created SPARK-44074: Summary: `Logging plan changes for execution` test failed Key: SPARK-44074 URL: https://issues.apache.org/jira/browse/SPARK-44074 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.5.0 Reporter: Yang Jie run {{build/sbt clean "sql/test" -Dtest.exclude.tags=org.apache.spark.tags.ExtendedSQLTest,org.apache.spark.tags.SlowSQLTest}} {code:java} 2023-06-15T19:58:34.4105460Z [info] QueryExecutionSuite: 2023-06-15T19:58:34.5395268Z [info] - dumping query execution info to a file (77 milliseconds) 2023-06-15T19:58:34.5856902Z [info] - dumping query execution info to an existing file (49 milliseconds) 2023-06-15T19:58:34.6099849Z [info] - dumping query execution info to non-existing folder (25 milliseconds) 2023-06-15T19:58:34.6136467Z [info] - dumping query execution info by invalid path (4 milliseconds) 2023-06-15T19:58:34.6425071Z [info] - dumping query execution info to a file - explainMode=formatted (28 milliseconds) 2023-06-15T19:58:34.7084916Z [info] - limit number of fields by sql config (66 milliseconds) 2023-06-15T19:58:34.7432299Z [info] - check maximum fields restriction (34 milliseconds) 2023-06-15T19:58:34.7554546Z [info] - toString() exception/error handling (11 milliseconds) 2023-06-15T19:58:34.7621424Z [info] - SPARK-28346: clone the query plan between different stages (6 milliseconds) 2023-06-15T19:58:34.8001412Z [info] - Logging plan changes for execution *** FAILED *** (12 milliseconds) 2023-06-15T19:58:34.8007977Z [info] testAppender.loggingEvents.exists(((x$10: org.apache.logging.log4j.core.LogEvent) => x$10.getMessage().getFormattedMessage().contains(expectedMsg))) was false (QueryExecutionSuite.scala:232) {code} but running {{build/sbt "sql/testOnly *QueryExecutionSuite"}} does not reproduce the issue; this needs investigation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43929) Add date time functions to Scala and Python - part 1
[ https://issues.apache.org/jira/browse/SPARK-43929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43929: -- Description: Add following functions: * date_diff * date_from_unix_date * date_part * dateadd * datepart * day to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client was: Add following functions: * date_diff * date_from_unix_date * date_part * dateadd * datepart * day * weekday * convert_timezone * extract * now * timestamp_micros * timestamp_millis to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client > Add date time functions to Scala and Python - part 1 > > > Key: SPARK-43929 > URL: https://issues.apache.org/jira/browse/SPARK-43929 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * date_diff > * date_from_unix_date > * date_part > * dateadd > * datepart > * day > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
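Several of the part-1 names already exist as SQL built-ins, which pins down the semantics the new wrappers should expose; a spark-shell sketch:

{code:java}
// day = 2, date_part = 2019 for the sample date.
spark.sql("SELECT day(date'2019-01-02'), date_part('YEAR', date'2019-01-02')").show()

// Days since the Unix epoch: date_from_unix_date(1) = 1970-01-02.
spark.sql("SELECT date_from_unix_date(1)").show()
{code}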
[jira] [Created] (SPARK-44073) Add date time functions to Scala and Python - part 2
Ruifeng Zheng created SPARK-44073: - Summary: Add date time functions to Scala and Python - part 2 Key: SPARK-44073 URL: https://issues.apache.org/jira/browse/SPARK-44073 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, SQL Affects Versions: 3.5.0 Reporter: Ruifeng Zheng Add following functions: * weekday * convert_timezone * extract * now * timestamp_micros * timestamp_millis to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
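As with part 1, the part-2 names map to existing SQL built-ins; a spark-shell sketch (the timestamp_millis output is rendered in the session time zone):

{code:java}
// weekday counts from Monday = 0, so 2023-06-16 (a Friday) gives 4.
spark.sql("SELECT weekday(date'2023-06-16'), extract(YEAR FROM date'2023-06-16')").show()

// Milliseconds since the Unix epoch, plus the current timestamp.
spark.sql("SELECT timestamp_millis(0), now()").show()
{code}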
[jira] [Updated] (SPARK-43929) Add date time functions to Scala and Python - part 1
[ https://issues.apache.org/jira/browse/SPARK-43929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43929: -- Summary: Add date time functions to Scala and Python - part 1 (was: Add date time functions to Scala and Python) > Add date time functions to Scala and Python - part 1 > > > Key: SPARK-43929 > URL: https://issues.apache.org/jira/browse/SPARK-43929 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * date_diff > * date_from_unix_date > * date_part > * dateadd > * datepart > * day > * weekday > * convert_timezone > * extract > * now > * timestamp_micros > * timestamp_millis > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44072. -- Fix Version/s: (was: 3.4.1) (was: 3.3.3) Resolution: Fixed Issue resolved by pull request 41619 [https://github.com/apache/spark/pull/41619] > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Priority: Major > Fix For: 3.5.0 > > > Latest docs of insert table has an incorrect sql example about 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Doc link: > https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41599) Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher
[ https://issues.apache.org/jira/browse/SPARK-41599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733286#comment-17733286 ] Xieming Li edited comment on SPARK-41599 at 6/16/23 3:01 AM: - [~ste...@apache.org] [~maciejsmolenski] I am having this issue as well. Could you please guide me on how to "explicitly disable the cache for that filesystem schema"? I am trying to add the following configurations in my core-site.xml, but am not sure if this is the right way. {code:java} <property> <name>fs.hdfs.impl.disable.cache</name> <value>true</value> </property> <property> <name>fs.viewfs.impl.disable.cache</name> <value>true</value> </property> {code} was (Author: risyomei): [~ste...@apache.org] [~maciejsmolenski] I am having this issue as well. Could you please guide me on how to "explicitly disable the cache for that filesystem schema"? > Memory leak in FileSystem.CACHE when submitting apps to secure cluster using > InProcessLauncher > -- > > Key: SPARK-41599 > URL: https://issues.apache.org/jira/browse/SPARK-41599 > Project: Spark > Issue Type: Bug > Components: Deploy, YARN >Affects Versions: 3.1.2 >Reporter: Maciej Smolenski >Priority: Major > Attachments: InProcLaunchFsIssue.scala, > SPARK-41599-fixes-to-limit-FileSystem-CACHE-size-when-using-InProcessLauncher.diff > > > When submitting spark application in kerberos environment the credentials of > 'current user' (UserGroupInformation.getCurrentUser()) are being modified. > Filesystem.CACHE entries contain 'current user' (with user credentials) as a > key. > Submitting many spark applications using InProcessLauncher cause that > FileSystem.CACHE becomes bigger and bigger. > Finally process exits because of OutOfMemory error. > Code for reproduction attached. > > Output from running 'jmap -histo' on reproduction jvm shows that the number > of FileSystem$Cache$Key increases in time: > time: #instances class > 1671533274: 2 org.apache.hadoop.fs.FileSystem$Cache$Key > 167155: 11 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533395: 21 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533455: 30 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533515: 39 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533576: 48 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533636: 57 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533696: 66 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533757: 75 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533817: 84 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533877: 93 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533937: 102 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533998: 111 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534058: 120 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534118: 135 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534178: 140 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534239: 150 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534299: 159 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534359: 168 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534419: 177 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534480: 186 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534540: 195 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534600: 204 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534661: 213 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534721: 222 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534781: 231 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534841: 240 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534902: 249 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534962: 
257 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535022: 264 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535083: 273 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535143: 282 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535203: 291 org.apache.hadoop.fs.FileSystem$Cache$Key -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
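For completeness, the same switch the comment above asks about can also be set programmatically; fs.<scheme>.impl.disable.cache is the standard Hadoop FileSystem knob, applied here through the Spark context's Hadoop configuration in a spark-shell sketch:

{code:java}
// Disable the FileSystem cache for the hdfs:// and viewfs:// schemes.
spark.sparkContext.hadoopConfiguration.setBoolean("fs.hdfs.impl.disable.cache", true)
spark.sparkContext.hadoopConfiguration.setBoolean("fs.viewfs.impl.disable.cache", true)
{code}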
[jira] [Commented] (SPARK-41599) Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher
[ https://issues.apache.org/jira/browse/SPARK-41599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733286#comment-17733286 ] Xieming Li commented on SPARK-41599: [~ste...@apache.org] [~maciejsmolenski] I am having this issue as well. I'm encountering the same problem. Could you please guide me on how to "explicitly disable the cache for that filesystem schema"? > Memory leak in FileSystem.CACHE when submitting apps to secure cluster using > InProcessLauncher > -- > > Key: SPARK-41599 > URL: https://issues.apache.org/jira/browse/SPARK-41599 > Project: Spark > Issue Type: Bug > Components: Deploy, YARN >Affects Versions: 3.1.2 >Reporter: Maciej Smolenski >Priority: Major > Attachments: InProcLaunchFsIssue.scala, > SPARK-41599-fixes-to-limit-FileSystem-CACHE-size-when-using-InProcessLauncher.diff > > > When submitting spark application in kerberos environment the credentials of > 'current user' (UserGroupInformation.getCurrentUser()) are being modified. > Filesystem.CACHE entries contain 'current user' (with user credentials) as a > key. > Submitting many spark applications using InProcessLauncher cause that > FileSystem.CACHE becomes bigger and bigger. > Finally process exits because of OutOfMemory error. > Code for reproduction attached. > > Output from running 'jmap -histo' on reproduction jvm shows that the number > of FileSystem$Cache$Key increases in time: > time: #instances class > 1671533274: 2 org.apache.hadoop.fs.FileSystem$Cache$Key > 167155: 11 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533395: 21 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533455: 30 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533515: 39 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533576: 48 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533636: 57 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533696: 66 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533757: 75 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533817: 84 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533877: 93 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533937: 102 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671533998: 111 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534058: 120 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534118: 135 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534178: 140 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534239: 150 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534299: 159 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534359: 168 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534419: 177 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534480: 186 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534540: 195 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534600: 204 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534661: 213 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534721: 222 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534781: 231 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534841: 240 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534902: 249 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671534962: 257 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535022: 264 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535083: 273 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535143: 282 org.apache.hadoop.fs.FileSystem$Cache$Key > 1671535203: 291 org.apache.hadoop.fs.FileSystem$Cache$Key -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Zhang updated SPARK-44072: --- Description: The latest insert table docs have an incorrect SQL example for 'Insert Using a Typed Date Literal for a Partition Column Value'. It should be {code:java} INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} Doc link: https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 was: The latest insert table docs have an incorrect SQL example for 'Insert Using a Typed Date Literal for a Partition Column Value'. It should be {code:java} INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Priority: Major > Fix For: 3.3.3, 3.4.1, 3.5.0 > > > The latest insert table docs have an incorrect SQL example for 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Doc link: > https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-table.html#insert-using-a-typed-date-literal-for-a-partition-column-value-1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44072) Update the incorrect sql example of insert table documentation
[ https://issues.apache.org/jira/browse/SPARK-44072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Zhang updated SPARK-44072: --- Description: The latest insert table docs have an incorrect SQL example for 'Insert Using a Typed Date Literal for a Partition Column Value'. It should be {code:java} INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} > Update the incorrect sql example of insert table documentation > -- > > Key: SPARK-44072 > URL: https://issues.apache.org/jira/browse/SPARK-44072 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Yang Zhang >Priority: Major > Fix For: 3.3.3, 3.4.1, 3.5.0 > > > The latest insert table docs have an incorrect SQL example for 'Insert Using > a Typed Date Literal for a Partition Column Value'. > It should be > {code:java} > INSERT OVERWRITE students PARTITION (birthday = date'2019-01-02') > VALUES('Jason Wang', '908 Bird St, Saratoga'); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43201) Inconsistency between from_avro and from_json function
[ https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733281#comment-17733281 ] Jia Fan commented on SPARK-43201: - If avroSchema1 does not equal avroSchema2, the DataFrame's schema would differ from row to row, which is a problem. > Inconsistency between from_avro and from_json function > -- > > Key: SPARK-43201 > URL: https://issues.apache.org/jira/browse/SPARK-43201 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Philip Adetiloye >Priority: Major > > Spark from_avro function does not allow schema parameter to use dataframe > column but takes only a String schema: > {code:java} > def from_avro(col: Column, jsonFormatSchema: String): Column {code} > This makes it impossible to deserialize rows of Avro records with different > schema since only one schema string could be pass externally. > > Here is what I would expect like from_json function: > {code:java} > def from_avro(col: Column, jsonFormatSchema: Column): Column {code} > code example: > {code:java} > import org.apache.spark.sql.functions.from_avro > val avroSchema1 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > > val avroSchema2 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > val df = Seq( > (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), > (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) > ).toDF("binaryData", "schema") > val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) > parsed.show() > // Output: > // ++ > // | parsedData| > // ++ > // |[apple1, 1.0]| > // |[apple2, 2.0]| > // ++ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
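For contrast with the request above: from_json already has a Column-typed schema overload today, but the schema expression must still be foldable (effectively one schema per query), which is the constraint Jia Fan's comment points at; a spark-shell sketch (spark.implicits._ is pre-imported there):

{code:java}
import org.apache.spark.sql.functions.{col, from_json, lit}

val df = Seq("""{"a": 1}""", """{"a": 2}""").toDF("json")
// The schema argument is a Column, but it must be foldable, e.g. a literal.
df.select(from_json(col("json"), lit("a INT"))).show()
{code}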
[jira] [Created] (SPARK-44072) Update the incorrect sql example of insert table documentation
Yang Zhang created SPARK-44072: -- Summary: Update the incorrect sql example of insert table documentation Key: SPARK-44072 URL: https://issues.apache.org/jira/browse/SPARK-44072 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.3.3, 3.4.1, 3.5.0 Reporter: Yang Zhang Fix For: 3.3.3, 3.4.1, 3.5.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44065) Optimize BroadcastHashJoin skew when localShuffleReader is disabled
[ https://issues.apache.org/jira/browse/SPARK-44065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733271#comment-17733271 ] Zhen Wang commented on SPARK-44065: --- https://github.com/apache/spark/pull/41609 > Optimize BroadcastHashJoin skew when localShuffleReader is disabled > --- > > Key: SPARK-44065 > URL: https://issues.apache.org/jira/browse/SPARK-44065 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Zhen Wang >Priority: Major > > In RemoteShuffleService services such as uniffle and celeborn, it is > recommended to disable localShuffleReader by default for better performance. > But it may make BroadcastHashJoin skewed, so I want to optimize > BroadcastHashJoin skew in OptimizeSkewedJoin when localShuffleReader is > disabled. > > Refer to: > https://github.com/apache/incubator-celeborn#spark-configuration > https://github.com/apache/incubator-uniffle/blob/master/docs/client_guide.md#support-spark-aqe -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43937) Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43937: - Assignee: BingKun Pan > Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python > --- > > Key: SPARK-43937 > URL: https://issues.apache.org/jira/browse/SPARK-43937 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: BingKun Pan >Priority: Major > > Add following functions: > * -not- > * -if- > * ifnull > * isnotnull > * equal_null > * nullif > * nvl > * nvl2 > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43937) Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43937. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41534 [https://github.com/apache/spark/pull/41534] > Add ifnull,isnotnull,equal_null,nullif,nvl,nvl2 to Scala and Python > --- > > Key: SPARK-43937 > URL: https://issues.apache.org/jira/browse/SPARK-43937 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: BingKun Pan >Priority: Major > Fix For: 3.5.0 > > > Add following functions: > * -not- > * -if- > * ifnull > * isnotnull > * equal_null > * nullif > * nvl > * nvl2 > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
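Most of these names are long-standing SQL built-ins, so the wrappers' semantics can be cross-checked from SQL (equal_null is newer and omitted here); a spark-shell sketch:

{code:java}
// ifnull -> 1, nullif -> NULL, nvl -> 3, nvl2 -> 5, isnotnull -> true.
spark.sql("SELECT ifnull(NULL, 1), nullif(2, 2), nvl(NULL, 3), nvl2(NULL, 4, 5), isnotnull(6)").show()
{code}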
[jira] [Updated] (SPARK-43925) Add some, bool_or,bool_and,every to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43925: -- Description: Add following functions: * -any- * some * bool_or * bool_and * every to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client was: Add following functions: * any * some * bool_or * bool_and * every to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client > Add some, bool_or,bool_and,every to Scala and Python > > > Key: SPARK-43925 > URL: https://issues.apache.org/jira/browse/SPARK-43925 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * -any- > * some > * bool_or > * bool_and > * every > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
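These aggregates already exist as SQL built-ins, which fixes the semantics the Scala/Python wrappers should expose; a spark-shell sketch:

{code:java}
// Over (true), (false): bool_or/some -> true, bool_and/every -> false.
spark.sql("SELECT bool_or(c), bool_and(c), every(c), some(c) FROM VALUES (true), (false) AS t(c)").show()
{code}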
[jira] [Updated] (SPARK-43925) Add some, bool_or,bool_and,every to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43925: -- Summary: Add some, bool_or,bool_and,every to Scala and Python (was: Add any, some, bool_or,bool_and,every to Scala and Python) > Add some, bool_or,bool_and,every to Scala and Python > > > Key: SPARK-43925 > URL: https://issues.apache.org/jira/browse/SPARK-43925 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * any > * some > * bool_or > * bool_and > * every > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44071) Define UnresolvedNode trait to reduce redundancy
Ryan Johnson created SPARK-44071: Summary: Define UnresolvedNode trait to reduce redundancy Key: SPARK-44071 URL: https://issues.apache.org/jira/browse/SPARK-44071 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Ryan Johnson Looking at [unresolved.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala], Spark would benefit from an {{UnresolvedNode}} trait that various {{UnresolvedFoo}} classes could inherit from: {code:java} trait UnresolvedNode extends LogicalPlan { override def output: Seq[Attribute] = Nil override lazy val resolved = false }{code} Today, the code is duplicated in ~20 locations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
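A sketch of the payoff, assuming the proposed trait lands as written above; UnresolvedFoo is a hypothetical node, not an existing class:

{code:java}
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan}

trait UnresolvedNode extends LogicalPlan {
  override def output: Seq[Attribute] = Nil
  override lazy val resolved = false
}

// The boilerplate `output`/`resolved` overrides disappear from each node.
case class UnresolvedFoo(name: String) extends LeafNode with UnresolvedNode
{code}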
[jira] [Commented] (SPARK-43511) Implemented State APIs for Spark Connect Scala
[ https://issues.apache.org/jira/browse/SPARK-43511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733195#comment-17733195 ] GridGain Integration commented on SPARK-43511: -- User 'bogao007' has created a pull request for this issue: https://github.com/apache/spark/pull/41558 > Implemented State APIs for Spark Connect Scala > -- > > Key: SPARK-43511 > URL: https://issues.apache.org/jira/browse/SPARK-43511 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Bo Gao >Priority: Major > > Implemented MapGroupsWithState and FlatMapGroupsWithState APIs for Spark > Connect Scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44070) Bump snappy-java 1.1.10.1
Cheng Pan created SPARK-44070: - Summary: Bump snappy-java 1.1.10.1 Key: SPARK-44070 URL: https://issues.apache.org/jira/browse/SPARK-44070 Project: Spark Issue Type: Dependency upgrade Components: Build Affects Versions: 3.5.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44055) Remove redundant `override` from `CheckpointRDD`
[ https://issues.apache.org/jira/browse/SPARK-44055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-44055: Assignee: Yang Jie > Remove redundant `override` from `CheckpointRDD` > > > Key: SPARK-44055 > URL: https://issues.apache.org/jira/browse/SPARK-44055 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44055) Remove redundant `override` from `CheckpointRDD`
[ https://issues.apache.org/jira/browse/SPARK-44055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-44055. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41597 [https://github.com/apache/spark/pull/41597] > Remove redundant `override` from `CheckpointRDD` > > > Key: SPARK-44055 > URL: https://issues.apache.org/jira/browse/SPARK-44055 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44069) maven test ReplSuite failed
[ https://issues.apache.org/jira/browse/SPARK-44069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44069: - Description: https://github.com/LuciferYang/spark/actions/runs/5274544416/jobs/9541917589 (was:
{code:java}
./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl clean install
build/mvn test -pl repl{code}
{code:java}
ReplSuite:
Spark context available as 'sc' (master = local, app id = local-1686829049116).
Spark session available as 'spark'.
- SPARK-15236: use Hive catalog *** FAILED ***
  isContain was true Interpreter output contained 'Exception':
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 3.5.0-SNAPSHOT
        /_/

  Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 1.8.0_372)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala>
  scala> java.lang.NoClassDefFoundError: org/sparkproject/guava/cache/CacheBuilder
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:197)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:153)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:152)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog$lzycompute(BaseSessionStateBuilder.scala:166)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog(BaseSessionStateBuilder.scala:166)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager$lzycompute(BaseSessionStateBuilder.scala:168)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager(BaseSessionStateBuilder.scala:168)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:185)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:185)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:373)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:92)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:529)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:671)
    ... 94 elided
  Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.cache.CacheBuilder
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 123 more

  scala> |
  scala> :quit (ReplSuite.scala:83)
Spark context available as 'sc' (master = local, app id = local-1686829054261).
Spark session available as 'spark'.
- SPARK-15236: use in-memory catalog
Spark context available as 'sc' (master = local, app id = local-1686829056083).
Spark session available as 'spark'.
- broadcast vars
Spark context available as 'sc' (master =
[jira] [Created] (SPARK-44069) maven test ReplSuite failed
Yang Jie created SPARK-44069: Summary: maven test ReplSuite failed Key: SPARK-44069 URL: https://issues.apache.org/jira/browse/SPARK-44069 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: Yang Jie
{code:java}
./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl clean install
build/mvn test -pl repl{code}
{code:java}
ReplSuite:
Spark context available as 'sc' (master = local, app id = local-1686829049116).
Spark session available as 'spark'.
- SPARK-15236: use Hive catalog *** FAILED ***
  isContain was true Interpreter output contained 'Exception':
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 3.5.0-SNAPSHOT
        /_/

  Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 1.8.0_372)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala>
  scala> java.lang.NoClassDefFoundError: org/sparkproject/guava/cache/CacheBuilder
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:197)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:153)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:152)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog$lzycompute(BaseSessionStateBuilder.scala:166)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog(BaseSessionStateBuilder.scala:166)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager$lzycompute(BaseSessionStateBuilder.scala:168)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager(BaseSessionStateBuilder.scala:168)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:185)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:185)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:373)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:92)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:529)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:671)
    ... 94 elided
  Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.cache.CacheBuilder
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 123 more

  scala> |
  scala> :quit (ReplSuite.scala:83)
Spark context available as 'sc' (master = local, app id = local-1686829054261).
Spark session available as 'spark'.
- SPARK-15236: use in-memory catalog
Spark context available as 'sc' (master = local, app id = local-1686829056083).
Spark session available as
[jira] [Created] (SPARK-44068) Support positional parameters in Scala connect client
Max Gekk created SPARK-44068: Summary: Support positional parameters in Scala connect client Key: SPARK-44068 URL: https://issues.apache.org/jira/browse/SPARK-44068 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 3.5.0 Reporter: Max Gekk Implement positional parameters of parametrized queries in the Scala connect client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
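A hedged sketch of the target usage; it assumes the two-argument sql(query, args) overload proposed in SPARK-44066 gets mirrored in the Connect Scala client, and the sc:// address is a placeholder:
{code:java}
import org.apache.spark.sql.SparkSession

object PositionalParamsConnectSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()
    // Each ? is bound to the array element at the same position.
    val df = spark.sql("SELECT * FROM range(10) WHERE id > ? AND id < ?", Array(2, 8))
    df.show()
  }
}
{code}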
[jira] [Commented] (SPARK-43942) Add string functions to Scala and Python - part 1
[ https://issues.apache.org/jira/browse/SPARK-43942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733028#comment-17733028 ] Hudson commented on SPARK-43942: User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41561 > Add string functions to Scala and Python - part 1 > - > > Key: SPARK-43942 > URL: https://issues.apache.org/jira/browse/SPARK-43942 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * char > * btrim > * char_length > * character_length > * chr > * contains > * elt > * find_in_set > * like > * ilike > * lcase > * ucase > * len > * left > * right > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
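A hedged sketch of how the new Scala wrappers would be used once they land; it assumes the functions are added to org.apache.spark.sql.functions under the names listed above:
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{btrim, contains, lit, ucase}

object StringFunctionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val df = Seq("  Spark  ").toDF("s")
    // btrim trims both ends, ucase upper-cases, contains tests for a substring.
    df.select(btrim($"s"), ucase($"s"), contains($"s", lit("par"))).show()
    spark.stop()
  }
}
{code}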
[jira] [Created] (SPARK-44067) Warning for the pandas-related behavior changes in next major release
Haejoon Lee created SPARK-44067: --- Summary: Warning for the pandas-related behavior changes in next major release Key: SPARK-44067 URL: https://issues.apache.org/jira/browse/SPARK-44067 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark, PySpark Affects Versions: 3.5.0 Reporter: Haejoon Lee There will be many breaking changes in Spark 4.0.0, so we should warn users in advance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44066) Support positional parameters in parameterized query
[ https://issues.apache.org/jira/browse/SPARK-44066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732955#comment-17732955 ] ASF GitHub Bot commented on SPARK-44066: User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/41568 > Support positional parameters in parameterized query > > > Key: SPARK-44066 > URL: https://issues.apache.org/jira/browse/SPARK-44066 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > As a follow-up to the parameterized query we added recently, we’d like to > support positional parameters. This is part of the SQL standard and JDBC/ODBC > protocol. > Example: update COFFEES set TOTAL = TOTAL + ? where COF_NAME = ? > Note that positional and named parameter markers cannot be used in the same query. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"
[ https://issues.apache.org/jira/browse/SPARK-43952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732947#comment-17732947 ] ASF GitHub Bot commented on SPARK-43952: User 'juliuszsompolski' has created a pull request for this issue: https://github.com/apache/spark/pull/41440 > Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job > tags" > > > Key: SPARK-43952 > URL: https://issues.apache.org/jira/browse/SPARK-43952 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, the only way to cancel running Spark jobs is by using > SparkContext.cancelJobGroup, with a job group name that was previously set > using SparkContext.setJobGroup. This is problematic if multiple different > parts of the system want to do cancellation and set their own ids. > For example, > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133] > sets its own job group, which may override the job group set by the user. This way, > if the user cancels the job group they set, it will not cancel the broadcast > jobs launched from within their jobs... > As a solution, consider adding SparkContext.addJobTag / > SparkContext.removeJobTag, which would allow multiple "tags" on the > jobs, and introduce SparkContext.cancelJobsByTag to allow more flexible > cancelling of jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
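A sketch using the method names as proposed in the ticket (the merged API may end up with slightly different names):
{code:java}
import org.apache.spark.{SparkConf, SparkContext}

object JobTagSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("tags"))

    // Tag all jobs started from this thread; unlike a single job group,
    // several tags can be attached at once.
    sc.addJobTag("broadcast-exchange")
    sc.addJobTag("user-query-42")
    try {
      sc.parallelize(1 to 1000).map(_ * 2).count()
    } finally {
      sc.removeJobTag("broadcast-exchange")
      sc.removeJobTag("user-query-42")
    }

    // From any thread: cancel everything carrying one tag, without touching
    // jobs that only carry other tags (method name as proposed above).
    sc.cancelJobsByTag("user-query-42")
  }
}
{code}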
[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732945#comment-17732945 ] Enrico Minack commented on SPARK-38200: --- Created pull request for this: https://github.com/apache/spark/pull/41611 > [SQL] Spark JDBC Savemode Supports Upsert > - > > Key: SPARK-38200 > URL: https://issues.apache.org/jira/browse/SPARK-38200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > Upsert SQL differs across databases; most databases support MERGE SQL: > sqlserver merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java] > mysql: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java] > oracle merge into sql : > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java] > postgres: > [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java] > postgres merge into sql : > [https://www.postgresql.org/docs/current/sql-merge.html] > db2 merge into sql : > [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge] > derby merge into sql: > [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html] > h2 merge into sql : > [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm] > > [~yao] > > https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
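To make the dialect differences concrete, here is a hypothetical example of the kind of statement an upsert save mode would have to emit for a MERGE-capable dialect. None of this is an existing Spark API; the table and column names follow the COFFEES example used elsewhere in this digest:
{code:java}
object UpsertSqlSketch {
  // Hypothetical MERGE text an upsert save mode could generate for, e.g.,
  // PostgreSQL 15+ or DB2; each dialect above needs its own variant.
  val upsertSql: String =
    """MERGE INTO coffees AS t
      |USING (VALUES (?, ?)) AS s (cof_name, total)
      |ON t.cof_name = s.cof_name
      |WHEN MATCHED THEN UPDATE SET total = s.total
      |WHEN NOT MATCHED THEN INSERT (cof_name, total) VALUES (s.cof_name, s.total)""".stripMargin
}
{code}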
[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC
[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732943#comment-17732943 ] Enrico Minack commented on SPARK-19335: --- Created pull request for this: https://github.com/apache/spark/pull/41518 > Spark should support doing an efficient DataFrame Upsert via JDBC > - > > Key: SPARK-19335 > URL: https://issues.apache.org/jira/browse/SPARK-19335 > Project: Spark > Issue Type: Improvement >Reporter: Ilya Ganelin >Priority: Minor > > Doing a database update, as opposed to an insert, is useful, particularly when > working with streaming applications which may require revisions to previously > stored data. > Spark DataFrames/DataSets do not currently support an Update feature via the > JDBC Writer, allowing only Overwrite or Append. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44052) Add util to get proper Column or DataFrame class for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-44052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732941#comment-17732941 ] Ignite TC Bot commented on SPARK-44052: --- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/41570 > Add util to get proper Column or DataFrame class for Spark Connect. > --- > > Key: SPARK-44052 > URL: https://issues.apache.org/jira/browse/SPARK-44052 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > A lot of code is duplicated to get the proper PySparkColumn or > PySparkDataFrame, so it would be great to have a util function to > deduplicate it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43291) Match behavior for DataFrame.cov on string DataFrame
[ https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732932#comment-17732932 ] Haejoon Lee commented on SPARK-43291: - With the major release of pandas 2.0.0 on April 3, 2023, numerous breaking changes have been introduced. So, we have made the decision to postpone addressing these breaking changes until the next major release of Spark, version 4.0.0, to minimize disruptions for our users and provide a more seamless upgrade experience. The pandas 2.0.0 release includes a significant number of updates, such as API removals, changes in API behavior, parameter removals, parameter behavior changes, and bug fixes. We have planned the following approach for each item:
- {*}API Removals{*}: Removed APIs will remain deprecated in Spark 3.5.0, provide appropriate warnings, and will be removed in Spark 4.0.0.
- {*}API Behavior Changes{*}: APIs with changed behavior will retain the behavior in Spark 3.5.0, provide appropriate warnings, and will align the behavior with pandas in Spark 4.0.0.
- {*}Parameter Removals{*}: Removed parameters will remain deprecated in Spark 3.5.0, provide appropriate warnings, and will be removed in Spark 4.0.0.
- {*}Parameter Behavior Changes{*}: Parameters with changed behavior will retain the behavior in Spark 3.5.0, provide appropriate warnings, and will align the behavior with pandas in Spark 4.0.0.
- {*}Bug Fixes{*}: Bug fixes, mainly related to correctness issues, will be fixed in Spark 3.5.0.
*To recap, all breaking changes related to pandas 2.0.0 will be supported in Spark 4.0.0, and will remain deprecated with appropriate warnings in Spark 3.5.0.* Will submit a PR that deprecates all APIs and adds warnings very soon. > Match behavior for DataFrame.cov on string DataFrame > > > Key: SPARK-43291 > URL: https://issues.apache.org/jira/browse/SPARK-43291 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Should enable test below: > {code:java} > pdf = pd.DataFrame([("1", "2"), ("0", "3"), ("2", "0"), ("1", "1")], > columns=["a", "b"]) > psdf = ps.from_pandas(pdf) > self.assert_eq(pdf.cov(), psdf.cov()) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44066) Support positional parameters in parameterized query
Max Gekk created SPARK-44066: Summary: Support positional parameters in parameterized query Key: SPARK-44066 URL: https://issues.apache.org/jira/browse/SPARK-44066 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Assignee: Max Gekk As a follow-up to the parameterized query support we added recently, we’d like to support positional parameters. This is part of the SQL standard and the JDBC/ODBC protocol. Example: update COFFEES set TOTAL = TOTAL + ? where COF_NAME = ? Note that positional and named parameter markers cannot be used in the same query. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
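A hedged sketch of the intended usage, mirroring the ticket's own example; the sql(query, args: Array[_]) overload where the i-th ? binds to args(i) is an assumption of this sketch, not a shipped API at the time of writing:
{code:java}
import org.apache.spark.sql.SparkSession

object PositionalParamsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    // The ticket's JDBC-style example, bound positionally. Note that UPDATE
    // also needs a table format that supports it; the binding is the point here.
    spark.sql("UPDATE COFFEES SET TOTAL = TOTAL + ? WHERE COF_NAME = ?",
      Array(10, "Espresso"))
    spark.stop()
  }
}
{code}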
[jira] [Created] (SPARK-44065) Optimize BroadcastHashJoin skew when localShuffleReader is disabled
Zhen Wang created SPARK-44065: - Summary: Optimize BroadcastHashJoin skew when localShuffleReader is disabled Key: SPARK-44065 URL: https://issues.apache.org/jira/browse/SPARK-44065 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Zhen Wang With remote shuffle services such as Uniffle and Celeborn, it is recommended to disable localShuffleReader by default for better performance. However, this can leave BroadcastHashJoin stages skewed, so I want OptimizeSkewedJoin to also handle BroadcastHashJoin skew when localShuffleReader is disabled. Refer to: https://github.com/apache/incubator-celeborn#spark-configuration https://github.com/apache/incubator-uniffle/blob/master/docs/client_guide.md#support-spark-aqe -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
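For reference, the configuration combination this ticket targets. These AQE conf keys exist in current Spark; the values shown are what the linked remote-shuffle-service guides recommend:
{code:java}
import org.apache.spark.sql.SparkSession

// AQE enabled and skew-join handling on, but the local shuffle reader turned
// off, as the Celeborn/Uniffle client guides recommend.
val spark = SparkSession.builder()
  .master("local[2]")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.skewJoin.enabled", "true")
  .config("spark.sql.adaptive.localShuffleReader.enabled", "false")
  .getOrCreate()
{code}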
[jira] [Resolved] (SPARK-44031) Upgrade silencer to 1.7.13
[ https://issues.apache.org/jira/browse/SPARK-44031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-44031. - Fix Version/s: 3.5.0 Assignee: Dongjoon Hyun Resolution: Fixed > Upgrade silencer to 1.7.13 > -- > > Key: SPARK-44031 > URL: https://issues.apache.org/jira/browse/SPARK-44031 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43627) Enable pyspark.pandas.spark.functions.skew in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43627. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41604 [https://github.com/apache/spark/pull/41604] > Enable pyspark.pandas.spark.functions.skew in Spark Connect. > > > Key: SPARK-43627 > URL: https://issues.apache.org/jira/browse/SPARK-43627 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Enable pyspark.pandas.spark.functions.skew in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43626) Enable pyspark.pandas.spark.functions.kurt in Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-43626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43626. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41604 [https://github.com/apache/spark/pull/41604] > Enable pyspark.pandas.spark.functions.kurt in Spark Connect. > > > Key: SPARK-43626 > URL: https://issues.apache.org/jira/browse/SPARK-43626 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > > Enable pyspark.pandas.spark.functions.kurt in Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org