(spark) branch branch-3.5 updated: [SPARK-46953][TEST] Wrap withTable for a test in ResolveDefaultColumnsSuite

2024-02-01 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 4b33d2874fb3 [SPARK-46953][TEST] Wrap withTable for a test in 
ResolveDefaultColumnsSuite
4b33d2874fb3 is described below

commit 4b33d2874fb3d73a4a35155b9d8b515518153321
Author: Kent Yao 
AuthorDate: Fri Feb 2 15:17:43 2024 +0800

[SPARK-46953][TEST] Wrap withTable for a test in ResolveDefaultColumnsSuite

### What changes were proposed in this pull request?

The table is not cleaned up after this test; test retries or upcoming new tests that reuse 't' as the table name will fail with a TableAlreadyExistsException (TAEE).
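
For context, a minimal sketch of the cleanup pattern this change adopts (assuming a suite that mixes in Spark's SQL test helpers, e.g. `QueryTest with SharedSparkSession`, where `withTable` drops the listed tables after the body runs):

```scala
// Sketch only: `withTable` guarantees the table is dropped even if an assertion
// inside the block fails, so retries and later tests can safely reuse the name 't'.
test("INSERT into partitioned tables") {
  withTable("t") {
    sql("create table t(c1 int, c2 int, c3 int, c4 int) using parquet partitioned by (c3, c4)")
    sql("insert into t (c2, c1, c4) values (1, 2, 3)")
    checkAnswer(spark.table("t"), Row(2, 1, null, 3))
  } // 't' is dropped here regardless of the test outcome
}
```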

### Why are the changes needed?

Fix tests as a follow-up of SPARK-43742.

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

this test itself
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #44993 from yaooqinn/SPARK-43742.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit a3432428e760fc16610cfe3380d3bdea7654f75d)
Signed-off-by: Kent Yao 
---
 .../spark/sql/ResolveDefaultColumnsSuite.scala | 104 +++--
 1 file changed, 53 insertions(+), 51 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
index 29b2796d25aa..00529559a485 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
@@ -76,57 +76,59 @@ class ResolveDefaultColumnsSuite extends QueryTest with 
SharedSparkSession {
   }
 
   test("INSERT into partitioned tables") {
-sql("create table t(c1 int, c2 int, c3 int, c4 int) using parquet 
partitioned by (c3, c4)")
-
-// INSERT without static partitions
-checkError(
-  exception = intercept[AnalysisException] {
-sql("insert into t values (1, 2, 3)")
-  },
-  errorClass = "INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS",
-  parameters = Map(
-"tableName" -> "`spark_catalog`.`default`.`t`",
-"tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
-"dataColumns" -> "`col1`, `col2`, `col3`"))
-
-// INSERT without static partitions but with column list
-sql("truncate table t")
-sql("insert into t (c2, c1, c4) values (1, 2, 3)")
-checkAnswer(spark.table("t"), Row(2, 1, null, 3))
-
-// INSERT with static partitions
-sql("truncate table t")
-checkError(
-  exception = intercept[AnalysisException] {
-sql("insert into t partition(c3=3, c4=4) values (1)")
-  },
-  errorClass = "INSERT_PARTITION_COLUMN_ARITY_MISMATCH",
-  parameters = Map(
-"tableName" -> "`spark_catalog`.`default`.`t`",
-"tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
-"dataColumns" -> "`col1`",
-"staticPartCols" -> "`c3`, `c4`"))
-
-// INSERT with static partitions and with column list
-sql("truncate table t")
-sql("insert into t partition(c3=3, c4=4) (c2) values (1)")
-checkAnswer(spark.table("t"), Row(null, 1, 3, 4))
-
-// INSERT with partial static partitions
-sql("truncate table t")
-checkError(
-  exception = intercept[AnalysisException] {
-sql("insert into t partition(c3=3, c4) values (1, 2)")
-  },
-  errorClass = "INSERT_PARTITION_COLUMN_ARITY_MISMATCH",
-  parameters = Map(
-"tableName" -> "`spark_catalog`.`default`.`t`",
-"tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
-"dataColumns" -> "`col1`, `col2`",
-"staticPartCols" -> "`c3`"))
-
-// INSERT with partial static partitions and with column list is not 
allowed
-intercept[AnalysisException](sql("insert into t partition(c3=3, c4) (c1) 
values (1, 4)"))
+withTable("t") {
+  sql("create table t(c1 int, c2 int, c3 int, c4 int) using parquet 
partitioned by (c3, c4)")
+
+  // INSERT without static partitions
+  checkError(
+exception = intercept[AnalysisException] {
+  sql("insert into t values (1, 2, 3)")
+},
+errorClass = "INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS",
+parameters = Map(
+  "tableName" -> "`spark_catalog`.`default`.`t`",
+  "tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
+  "dataColumns" -> "`col1`, `col2`, `col3`"))
+
+  // INSERT without static partitions but with column list
+  sql("truncate table t")
+  sql("insert into t (c2, c1, c4) values (1, 2, 3)")
+  checkAnswer(spark.table("t"), Row(2, 1, null, 3))
+
+  // INSERT with static partitions
+  sql("truncate table t")
+  checkError(
+exception = intercept[AnalysisException] {
+   

(spark) branch master updated: [SPARK-46953][TEST] Wrap withTable for a test in ResolveDefaultColumnsSuite

2024-02-01 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a3432428e760 [SPARK-46953][TEST] Wrap withTable for a test in 
ResolveDefaultColumnsSuite
a3432428e760 is described below

commit a3432428e760fc16610cfe3380d3bdea7654f75d
Author: Kent Yao 
AuthorDate: Fri Feb 2 15:17:43 2024 +0800

[SPARK-46953][TEST] Wrap withTable for a test in ResolveDefaultColumnsSuite

### What changes were proposed in this pull request?

The table is not cleaned up after this test; test retries or upcoming new tests that reuse 't' as the table name will fail with a TableAlreadyExistsException (TAEE).

### Why are the changes needed?

Fix tests as a follow-up of SPARK-43742.

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

this test itself
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #44993 from yaooqinn/SPARK-43742.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 .../spark/sql/ResolveDefaultColumnsSuite.scala | 104 +++--
 1 file changed, 53 insertions(+), 51 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
index 29b2796d25aa..00529559a485 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ResolveDefaultColumnsSuite.scala
@@ -76,57 +76,59 @@ class ResolveDefaultColumnsSuite extends QueryTest with 
SharedSparkSession {
   }
 
   test("INSERT into partitioned tables") {
-sql("create table t(c1 int, c2 int, c3 int, c4 int) using parquet 
partitioned by (c3, c4)")
-
-// INSERT without static partitions
-checkError(
-  exception = intercept[AnalysisException] {
-sql("insert into t values (1, 2, 3)")
-  },
-  errorClass = "INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS",
-  parameters = Map(
-"tableName" -> "`spark_catalog`.`default`.`t`",
-"tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
-"dataColumns" -> "`col1`, `col2`, `col3`"))
-
-// INSERT without static partitions but with column list
-sql("truncate table t")
-sql("insert into t (c2, c1, c4) values (1, 2, 3)")
-checkAnswer(spark.table("t"), Row(2, 1, null, 3))
-
-// INSERT with static partitions
-sql("truncate table t")
-checkError(
-  exception = intercept[AnalysisException] {
-sql("insert into t partition(c3=3, c4=4) values (1)")
-  },
-  errorClass = "INSERT_PARTITION_COLUMN_ARITY_MISMATCH",
-  parameters = Map(
-"tableName" -> "`spark_catalog`.`default`.`t`",
-"tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
-"dataColumns" -> "`col1`",
-"staticPartCols" -> "`c3`, `c4`"))
-
-// INSERT with static partitions and with column list
-sql("truncate table t")
-sql("insert into t partition(c3=3, c4=4) (c2) values (1)")
-checkAnswer(spark.table("t"), Row(null, 1, 3, 4))
-
-// INSERT with partial static partitions
-sql("truncate table t")
-checkError(
-  exception = intercept[AnalysisException] {
-sql("insert into t partition(c3=3, c4) values (1, 2)")
-  },
-  errorClass = "INSERT_PARTITION_COLUMN_ARITY_MISMATCH",
-  parameters = Map(
-"tableName" -> "`spark_catalog`.`default`.`t`",
-"tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
-"dataColumns" -> "`col1`, `col2`",
-"staticPartCols" -> "`c3`"))
-
-// INSERT with partial static partitions and with column list is not 
allowed
-intercept[AnalysisException](sql("insert into t partition(c3=3, c4) (c1) 
values (1, 4)"))
+withTable("t") {
+  sql("create table t(c1 int, c2 int, c3 int, c4 int) using parquet 
partitioned by (c3, c4)")
+
+  // INSERT without static partitions
+  checkError(
+exception = intercept[AnalysisException] {
+  sql("insert into t values (1, 2, 3)")
+},
+errorClass = "INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS",
+parameters = Map(
+  "tableName" -> "`spark_catalog`.`default`.`t`",
+  "tableColumns" -> "`c1`, `c2`, `c3`, `c4`",
+  "dataColumns" -> "`col1`, `col2`, `col3`"))
+
+  // INSERT without static partitions but with column list
+  sql("truncate table t")
+  sql("insert into t (c2, c1, c4) values (1, 2, 3)")
+  checkAnswer(spark.table("t"), Row(2, 1, null, 3))
+
+  // INSERT with static partitions
+  sql("truncate table t")
+  checkError(
+exception = intercept[AnalysisException] {
+  sql("insert into t partition(c3=3, c4=4) values (1)")
+},
+errorClass = 

(spark) branch master updated: [SPARK-46952][SQL] XML: Limit size of corrupt record

2024-02-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e07662e24e2 [SPARK-46952][SQL] XML: Limit size of corrupt record
2e07662e24e2 is described below

commit 2e07662e24e243e7d1760ea063c9e88417bc873f
Author: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
AuthorDate: Fri Feb 2 15:45:09 2024 +0900

[SPARK-46952][SQL] XML: Limit size of corrupt record

### What changes were proposed in this pull request?
Limit the size of the malformed XML string that gets stored in the corrupt column.
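
The diff below replaces a strict `val` with a `lazy val` for the corrupt-record string. A minimal sketch of the effect (the parse function here is hypothetical; only `lazy val` and `UTF8String.fromString` come from the patch): the conversion of the raw XML into a `UTF8String` is deferred until a failure path actually references it, so well-formed records never pay for the extra copy.

```scala
import org.apache.spark.unsafe.types.UTF8String

object LazyCorruptRecordSketch {
  // Sketch only: with `lazy val`, the UTF8String copy of the raw input is built
  // only if the failure branch actually references it.
  def parseOrKeepCorrupt(xml: String): Either[UTF8String, Int] = {
    lazy val xmlRecord = UTF8String.fromString(xml) // deferred until first use
    try {
      Right(xml.trim.toInt) // stand-in for the real XML parse
    } catch {
      case _: NumberFormatException => Left(xmlRecord) // materialized only on failure
    }
  }
}
```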

### Why are the changes needed?
A corrupt XML record can be arbitrarily large and may cause an OOM.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Unit test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44994 from sandip-db/xml_limit_corrupt_record_size.

Authored-by: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon 
---
 .../main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
index 2458d1772dab..674d5f63b039 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
@@ -145,7 +145,7 @@ class StaxXmlParser(
   def doParseColumn(xml: String,
   parseMode: ParseMode,
   xsdSchema: Option[Schema]): Option[InternalRow] = {
-val xmlRecord = UTF8String.fromString(xml)
+lazy val xmlRecord = UTF8String.fromString(xml)
 try {
   xsdSchema.foreach { schema =>
 schema.newValidator().validate(new StreamSource(new StringReader(xml)))





(spark) branch master updated: [SPARK-46683][SQL][TESTS][FOLLOW-UP] Fix typo, use queries in partition set

2024-02-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 58c9b5ac8ff4 [SPARK-46683][SQL][TESTS][FOLLOW-UP] Fix typo, use 
queries in partition set
58c9b5ac8ff4 is described below

commit 58c9b5ac8ff4aec1ee5b2ae7e0d5702df2ad273c
Author: Andy Lam 
AuthorDate: Fri Feb 2 12:23:19 2024 +0900

[SPARK-46683][SQL][TESTS][FOLLOW-UP] Fix typo, use queries in partition set

### What changes were proposed in this pull request?

Fix a typo in GeneratedSubquerySuite, where it is using the set of ALL 
queries instead of the partitioned set.

### Why are the changes needed?

The partitioned set will correspond to the test name.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

NA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44956 from andylam-db/generated-subqueries-fix.

Authored-by: Andy Lam 
Signed-off-by: Hyukjin Kwon 
---
 .../org/apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala
index 23d61c532899..ff1afbd16865 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala
@@ -418,7 +418,7 @@ class GeneratedSubquerySuite extends 
DockerJDBCIntegrationSuite with QueryGenera
 // Enable ANSI so that { NULL IN {  } } behavior is correct in 
Spark.
 localSparkSession.conf.set(SQLConf.ANSI_ENABLED.key, true)
 
-val generatedQueries = generatedQuerySpecs.map(_.query).toSeq
+val generatedQueries = querySpec.map(_.query).toSeq
 // Randomize query order because we are taking a subset of queries.
 val shuffledQueries = scala.util.Random.shuffle(generatedQueries)
 





(spark) branch master updated: [SPARK-46941][SQL] Can't insert window group limit node for top-k computation if contains SizeBasedWindowFunction

2024-02-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c426b7285b58 [SPARK-46941][SQL] Can't insert window group limit node 
for top-k computation if contains SizeBasedWindowFunction
c426b7285b58 is described below

commit c426b7285b588924eaa8325cb83c868389e94bc3
Author: zml1206 
AuthorDate: Fri Feb 2 12:18:49 2024 +0900

[SPARK-46941][SQL] Can't insert window group limit node for top-k 
computation if contains SizeBasedWindowFunction

### What changes were proposed in this pull request?
Don't insert a window group limit node for top-k computation if it contains a `SizeBasedWindowFunction`.

### Why are the changes needed?
Bug fix: inserting a window group limit node for top-k computation that contains a `SizeBasedWindowFunction` will produce wrong results for the `SizeBasedWindowFunction`.
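
As an illustration of why the early stop is unsafe (a hypothetical query, not taken from the patch): `percent_rank()` is a `SizeBasedWindowFunction`, so its value depends on the total number of rows in the partition; if a window group limit dropped rows before the window is evaluated, the percentage would be computed over a truncated partition.

```scala
// Hypothetical top-k query: rank() alone would allow a window group limit for
// "rnk <= 2", but percent_rank() needs the full partition size (3 rows for a = 1),
// so inserting the limit early would skew the pct column.
val df = spark.sql("""
  SELECT * FROM (
    SELECT a, b,
           rank()         OVER (PARTITION BY a ORDER BY b) AS rnk,
           percent_rank() OVER (PARTITION BY a ORDER BY b) AS pct
    FROM VALUES (1, 1), (1, 2), (1, 3), (2, 1) AS t(a, b)
  )
  WHERE rnk <= 2
""")
df.show()
```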

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New UT. Before this PR, the UT would not pass.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44980 from zml1206/SPARK-46941.

Authored-by: zml1206 
Signed-off-by: Hyukjin Kwon 
---
 .../catalyst/optimizer/InferWindowGroupLimit.scala | 11 +
 .../optimizer/InferWindowGroupLimitSuite.scala | 18 ++-
 .../spark/sql/DataFrameWindowFunctionsSuite.scala  | 27 ++
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala
index 04204c6a2e10..f2e99721e926 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala
@@ -17,7 +17,7 @@
 
 package org.apache.spark.sql.catalyst.optimizer
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
CurrentRow, DenseRank, EqualTo, Expression, GreaterThan, GreaterThanOrEqual, 
IntegerLiteral, LessThan, LessThanOrEqual, Literal, NamedExpression, 
PredicateHelper, Rank, RowFrame, RowNumber, SpecifiedWindowFrame, 
UnboundedPreceding, WindowExpression, WindowSpecDefinition}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
CurrentRow, DenseRank, EqualTo, Expression, GreaterThan, GreaterThanOrEqual, 
IntegerLiteral, LessThan, LessThanOrEqual, Literal, NamedExpression, 
PredicateHelper, Rank, RowFrame, RowNumber, SizeBasedWindowFunction, 
SpecifiedWindowFrame, UnboundedPreceding, WindowExpression, 
WindowSpecDefinition}
 import org.apache.spark.sql.catalyst.plans.logical.{Filter, Limit, 
LocalRelation, LogicalPlan, Window, WindowGroupLimit}
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.trees.TreePattern.{FILTER, WINDOW}
@@ -53,13 +53,14 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with 
PredicateHelper {
   }
 
   /**
-   * All window expressions should use the same expanding window, so that
-   * we can safely do the early stop.
+   * All window expressions should use the same expanding window and do not 
contains
+   * `SizeBasedWindowFunction`, so that we can safely do the early stop.
*/
   private def isExpandingWindow(
   windowExpression: NamedExpression): Boolean = windowExpression match {
-case Alias(WindowExpression(_, WindowSpecDefinition(_, _,
-SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))), _) => 
true
+case Alias(WindowExpression(windowFunction, WindowSpecDefinition(_, _,
+SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))), _)
+  if !windowFunction.isInstanceOf[SizeBasedWindowFunction] => true
 case _ => false
   }
 
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimitSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimitSuite.scala
index 3b185adabc3f..5aa7a27f65fb 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimitSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimitSuite.scala
@@ -20,7 +20,7 @@ package org.apache.spark.sql.catalyst.optimizer
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
-import org.apache.spark.sql.catalyst.expressions.{CurrentRow, DenseRank, 
Literal, NthValue, NTile, Rank, RowFrame, RowNumber, SpecifiedWindowFrame, 
UnboundedPreceding}
+import org.apache.spark.sql.catalyst.expressions.{CurrentRow, DenseRank, 
Literal, NthValue, NTile, PercentRank, 

(spark) branch master updated: [MINOR][DOCS] Fix outgoing links from SELECT SQL reference page

2024-02-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 47fb5041b014 [MINOR][DOCS] Fix outgoing links from SELECT SQL 
reference page
47fb5041b014 is described below

commit 47fb5041b0140eab57a9398419872dc8f3b57166
Author: Nicholas Chammas 
AuthorDate: Fri Feb 2 12:13:10 2024 +0900

[MINOR][DOCS] Fix outgoing links from SELECT SQL reference page

### What changes were proposed in this pull request?

Fix a couple of links that were pointing to the raw Markdown files (which 
doesn't work) rather than the compiled HTML files.

### Why are the changes needed?

Documentation links should work, especially internal links.

### Does this PR introduce _any_ user-facing change?

Yes, it changes the documentation.

### How was this patch tested?

I built the docs and confirmed the links work.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44984 from nchammas/sql-query-ref.

Authored-by: Nicholas Chammas 
Signed-off-by: Hyukjin Kwon 
---
 docs/sql-ref-syntax-qry-select.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index 6bcf3ab65b99..1d5532898c65 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -87,8 +87,8 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ 
named_expression | regex_column_
  Specifies a source of input for the query. It can be one of the following:
  * Table relation
  * [Join relation](sql-ref-syntax-qry-select-join.html)
- * [Pivot relation](sql-ref-syntax-qry-select-pivot.md)
- * [Unpivot relation](sql-ref-syntax-qry-select-unpivot.md)
+ * [Pivot relation](sql-ref-syntax-qry-select-pivot.html)
+ * [Unpivot relation](sql-ref-syntax-qry-select-unpivot.html)
  * [Table-value function](sql-ref-syntax-qry-select-tvf.html)
  * [Inline table](sql-ref-syntax-qry-select-inline-table.html)
  * [ [LATERAL](sql-ref-syntax-qry-select-lateral-subquery.html) ] ( 
Subquery )





(spark) branch branch-3.4 updated (64115d9829ec -> 9c36cfaffb1b)

2024-02-01 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 64115d9829ec [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for 
JDBC Dialects
 add 9c36cfaffb1b [SPARK-46945][K8S][3.4] Add 
`spark.kubernetes.legacy.useReadWriteOnceAccessMode` for old K8s clusters

No new revisions were added by this update.

Summary of changes:
 .../core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala | 9 +
 .../spark/deploy/k8s/features/MountVolumesFeatureStep.scala  | 8 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)





(spark) branch branch-3.5 updated (d3b4537f8bd5 -> 547edb2b4ac7)

2024-02-01 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from d3b4537f8bd5 [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for 
JDBC Dialects
 add 547edb2b4ac7 [SPARK-46945][K8S][3.5] Add 
`spark.kubernetes.legacy.useReadWriteOnceAccessMode` for old K8s clusters

No new revisions were added by this update.

Summary of changes:
 .../core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala | 9 +
 .../spark/deploy/k8s/features/MountVolumesFeatureStep.scala  | 8 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)





(spark) branch master updated (cedef63faf14 -> 457f6f59b1b7)

2024-02-01 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cedef63faf14 [SPARK-45807][SQL] Return View after calling 
replaceView(..)
 add 457f6f59b1b7 [SPARK-46945][K8S] Add 
`spark.kubernetes.legacy.useReadWriteOnceAccessMode` for old K8s clusters

No new revisions were added by this update.

Summary of changes:
 docs/core-migration-guide.md | 2 ++
 .../core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala | 9 +
 .../spark/deploy/k8s/features/MountVolumesFeatureStep.scala  | 8 +++-
 3 files changed, 18 insertions(+), 1 deletion(-)





(spark) branch master updated: [SPARK-45807][SQL] Return View after calling replaceView(..)

2024-02-01 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cedef63faf14 [SPARK-45807][SQL] Return View after calling 
replaceView(..)
cedef63faf14 is described below

commit cedef63faf14ce41ea9f4540faa4a1c18cf7cea8
Author: Eduard Tudenhoefner 
AuthorDate: Fri Feb 2 10:35:56 2024 +0800

[SPARK-45807][SQL] Return View after calling replaceView(..)

### What changes were proposed in this pull request?

Follow-up API improvements based on https://github.com/apache/spark/pull/43677

### Why are the changes needed?

Required for DataSourceV2 view support.
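
A minimal usage sketch of the new return type (the catalog and `ViewInfo` values are hypothetical; the `replaceView(ViewInfo, boolean): View` signature is the one added in the diff below, and the classes are assumed to live in `org.apache.spark.sql.connector.catalog` as the file path suggests):

```scala
import org.apache.spark.sql.connector.catalog.{View, ViewCatalog, ViewInfo}

// Sketch only: callers can now use the created/replaced view directly instead of
// issuing a separate catalog lookup afterwards.
def replaceAndDescribe(catalog: ViewCatalog, info: ViewInfo): Unit = {
  val view: View = catalog.replaceView(info, true /* orCreate */)
  // The returned view may be null if producing its metadata eagerly is expensive.
  if (view != null) println(view)
}
```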

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

N/A

Closes #44970 from nastra/SPARK-45807-return-type.

Authored-by: Eduard Tudenhoefner 
Signed-off-by: Wenchen Fan 
---
 .../java/org/apache/spark/sql/connector/catalog/ViewCatalog.java  | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java
index 933289cab40b..abe5fb3148d0 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java
@@ -115,7 +115,7 @@ public interface ViewCatalog extends CatalogPlugin {
* Create a view in the catalog.
*
* @param viewInfo the info class holding all view information
-   * @return the view created
+   * @return the created view. This can be null if getting the metadata for 
the view is expensive
* @throws ViewAlreadyExistsException If a view or table already exists for 
the identifier
* @throws NoSuchNamespaceException If the identifier namespace does not 
exist (optional)
*/
@@ -129,10 +129,12 @@ public interface ViewCatalog extends CatalogPlugin {
*
* @param viewInfo the info class holding all view information
* @param orCreate create the view if it doesn't exist
+   * @return the created/replaced view. This can be null if getting the 
metadata
+   * for the view is expensive
* @throws NoSuchViewException If the view doesn't exist or is a table
* @throws NoSuchNamespaceException If the identifier namespace does not 
exist (optional)
*/
-  default void replaceView(
+  default View replaceView(
   ViewInfo viewInfo,
   boolean orCreate)
   throws NoSuchViewException, NoSuchNamespaceException {
@@ -143,7 +145,7 @@ public interface ViewCatalog extends CatalogPlugin {
 }
 
 try {
-  createView(viewInfo);
+  return createView(viewInfo);
 } catch (ViewAlreadyExistsException e) {
   throw new RuntimeException("Race condition when creating/replacing 
view", e);
 }





(spark) branch master updated (53cbaeb20293 -> 49e3f5e3bad4)

2024-02-01 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 53cbaeb20293 [SPARK-46936][PS] Implement `Frame.to_feather`
 add 49e3f5e3bad4 [SPARK-46908][SQL] Support star clause in WHERE clause

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|   1 +
 docs/sql-ref-function-invocation.md| 112 +++
 docs/sql-ref-syntax-qry-select.md  |   7 +-
 docs/sql-ref-syntax-qry-star.md|  97 +
 docs/sql-ref-syntax.md |   1 +
 docs/sql-ref.md|   1 +
 .../spark/sql/catalyst/analysis/Analyzer.scala |  10 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |   7 +-
 .../org/apache/spark/sql/internal/SQLConf.scala|  13 ++
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |   4 +-
 .../analyzer-results/selectExcept.sql.out  | 218 +
 .../subquery/in-subquery/in-group-by.sql.out   |   2 +-
 .../resources/sql-tests/inputs/selectExcept.sql|  29 +++
 .../inputs/subquery/in-subquery/in-group-by.sql|   2 +-
 .../sql-tests/results/selectExcept.sql.out | 168 
 .../subquery/in-subquery/in-group-by.sql.out   |   2 +-
 16 files changed, 666 insertions(+), 8 deletions(-)
 create mode 100644 docs/sql-ref-function-invocation.md
 create mode 100644 docs/sql-ref-syntax-qry-star.md





(spark) branch master updated: [SPARK-46936][PS] Implement `Frame.to_feather`

2024-02-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 53cbaeb20293 [SPARK-46936][PS] Implement `Frame.to_feather`
53cbaeb20293 is described below

commit 53cbaeb202931a918864ca4aecd9826144ad8307
Author: Ruifeng Zheng 
AuthorDate: Fri Feb 2 09:04:26 2024 +0800

[SPARK-46936][PS] Implement `Frame.to_feather`

### What changes were proposed in this pull request?
Implement `Frame.to_feather`

### Why are the changes needed?
for pandas parity

### Does this PR introduce _any_ user-facing change?
yes
```
In [3]: pdf = pd.DataFrame(
   ...: [[1, 1.0, "a"]],
   ...: columns=["x", "y", "z"],
   ...: )

In [4]: pdf
Out[4]:
   xy  z
0  1  1.0  a

In [5]: psdf = ps.from_pandas(pdf)

In [6]: psdf

   xy  z
0  1  1.0  a

In [7]: pdf.to_feather("/tmp/file1.feather")

In [8]: psdf.to_feather("/tmp/file2.feather")

In [9]: f1 = pd.read_feather("/tmp/file1.feather")

In [10]: f1
Out[10]:
   xy  z
0  1  1.0  a

In [11]: f2 = pd.read_feather("/tmp/file2.feather")

In [12]: f2
Out[12]:
   xy  z
0  1  1.0  a

```

### How was this patch tested?
added ut

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44972 from zhengruifeng/ps_to_feather.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 dev/sparktestsupport/modules.py|  2 +
 .../docs/source/reference/pyspark.pandas/frame.rst |  1 +
 python/pyspark/pandas/frame.py | 35 +++
 python/pyspark/pandas/missing/frame.py |  1 -
 .../pandas/tests/connect/io/test_parity_feather.py | 42 +
 python/pyspark/pandas/tests/io/test_feather.py | 68 ++
 6 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 508cf56b9c87..233dcf4e54b6 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -816,6 +816,7 @@ pyspark_pandas = Module(
 "pyspark.pandas.tests.series.test_stat",
 "pyspark.pandas.tests.io.test_io",
 "pyspark.pandas.tests.io.test_csv",
+"pyspark.pandas.tests.io.test_feather",
 "pyspark.pandas.tests.io.test_dataframe_conversion",
 "pyspark.pandas.tests.io.test_dataframe_spark_io",
 "pyspark.pandas.tests.io.test_series_conversion",
@@ -1297,6 +1298,7 @@ pyspark_pandas_connect_part3 = Module(
 # pandas-on-Spark unittests
 "pyspark.pandas.tests.connect.io.test_parity_io",
 "pyspark.pandas.tests.connect.io.test_parity_csv",
+"pyspark.pandas.tests.connect.io.test_parity_feather",
 "pyspark.pandas.tests.connect.io.test_parity_dataframe_conversion",
 "pyspark.pandas.tests.connect.io.test_parity_dataframe_spark_io",
 "pyspark.pandas.tests.connect.io.test_parity_series_conversion",
diff --git a/python/docs/source/reference/pyspark.pandas/frame.rst 
b/python/docs/source/reference/pyspark.pandas/frame.rst
index 77b60468b8fb..564ddb607a19 100644
--- a/python/docs/source/reference/pyspark.pandas/frame.rst
+++ b/python/docs/source/reference/pyspark.pandas/frame.rst
@@ -283,6 +283,7 @@ Serialization / IO / Conversion
DataFrame.to_numpy
DataFrame.to_spark
DataFrame.to_string
+   DataFrame.to_feather
DataFrame.to_json
DataFrame.to_dict
DataFrame.to_excel
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 7222b877bba1..3b3565f7ea9f 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -2648,6 +2648,41 @@ defaultdict(, {'col..., 'col...})]
 psdf._to_internal_pandas(), self.to_latex, pd.DataFrame.to_latex, 
args
 )
 
+def to_feather(
+self,
+path: Union[str, IO[str]],
+**kwargs: Any,
+) -> None:
+"""
+Write a DataFrame to the binary Feather format.
+
+.. note:: This method should only be used if the resulting DataFrame 
is expected
+  to be small, as all the data is loaded into the driver's 
memory.
+
+.. versionadded:: 4.0.0
+
+Parameters
+--
+path : str, path object, file-like object
+String, path object (implementing ``os.PathLike[str]``), or 
file-like
+object implementing a binary ``write()`` function.
+**kwargs :
+Additional keywords passed to 
:func:`pyarrow.feather.write_feather`.
+This includes the `compression`, `compression_level`, `chunksize`
+and `version` keywords.
+
+Examples

(spark) branch master updated: [SPARK-46864][SS] Onboard Arbitrary StateV2 onto New Error Class Framework

2024-02-01 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 00e63d63f9af [SPARK-46864][SS] Onboard Arbitrary StateV2 onto New 
Error Class Framework
00e63d63f9af is described below

commit 00e63d63f9af6ef186e14159ddbe8bb8d1c8690b
Author: Eric Marnadi 
AuthorDate: Fri Feb 2 05:38:15 2024 +0900

[SPARK-46864][SS] Onboard Arbitrary StateV2 onto New Error Class Framework

### What changes were proposed in this pull request?

This PR proposes to apply the error class framework to the new data source, State API V2.

### Why are the changes needed?

The error class framework is the standard way to represent all exceptions in Spark.
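
For context, a minimal sketch of how an error-class-backed exception is typically raised (the helper, the chosen exception type, and the parameter names here are assumptions for illustration, not the exact code added by this patch):

```scala
import org.apache.spark.SparkUnsupportedOperationException

// Sketch only: assumes the (errorClass, messageParameters) constructor that the
// error class framework provides; the error class is looked up in
// error-classes.json and the parameters fill its placeholders.
def unsupportedStateStoreOperation(operation: String, entity: String): Throwable = {
  new SparkUnsupportedOperationException(
    "STATE_STORE_UNSUPPORTED_OPERATION",
    Map("operationType" -> operation, "entity" -> entity)) // parameter names are illustrative
}
```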

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Refactored unit tests to check that the right error class was being thrown 
in certain situations

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44883 from ericm-db/state-v2-error-class.

Lead-authored-by: Eric Marnadi 
Co-authored-by: ericm-db <132308037+ericm...@users.noreply.github.com>
Signed-off-by: Jungtaek Lim 
---
 .../src/main/resources/error/error-classes.json| 29 +++
 ...r-conditions-unsupported-feature-error-class.md |  4 ++
 docs/sql-error-conditions.md   | 24 ++
 .../sql/execution/streaming/ValueStateImpl.scala   |  5 +-
 .../state/HDFSBackedStateStoreProvider.scala   |  3 +-
 .../streaming/state/StateStoreChangelog.scala  | 16 +++
 .../streaming/state/StateStoreErrors.scala | 56 ++
 .../streaming/state/MemoryStateStore.scala |  2 +-
 .../execution/streaming/state/RocksDBSuite.scala   | 37 +-
 .../streaming/state/ValueStateSuite.scala  | 43 +++--
 10 files changed, 199 insertions(+), 20 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 8e47490f5a61..baefb05a7070 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -1656,6 +1656,12 @@
 ],
 "sqlState" : "XX000"
   },
+  "INTERNAL_ERROR_TWS" : {
+"message" : [
+  ""
+],
+"sqlState" : "XX000"
+  },
   "INTERVAL_ARITHMETIC_OVERFLOW" : {
 "message" : [
   "."
@@ -3235,6 +3241,18 @@
 ],
 "sqlState" : "0A000"
   },
+  "STATE_STORE_MULTIPLE_VALUES_PER_KEY" : {
+"message" : [
+  "Store does not support multiple values per key"
+],
+"sqlState" : "42802"
+  },
+  "STATE_STORE_UNSUPPORTED_OPERATION" : {
+"message" : [
+  " operation not supported with "
+],
+"sqlState" : "XXKST"
+  },
   "STATIC_PARTITION_COLUMN_IN_INSERT_COLUMN_LIST" : {
 "message" : [
   "Static partition column  is also specified in the column 
list."
@@ -3388,6 +3406,12 @@
 ],
 "sqlState" : "428EK"
   },
+  "TWS_VALUE_SHOULD_NOT_BE_NULL" : {
+"message" : [
+  "New value should be non-null for "
+],
+"sqlState" : "22004"
+  },
   "UDTF_ALIAS_NUMBER_MISMATCH" : {
 "message" : [
   "The number of aliases supplied in the AS clause does not match the 
number of columns output by the UDTF.",
@@ -3921,6 +3945,11 @@
   " is a VARIABLE and cannot be updated using the SET 
statement. Use SET VARIABLE  = ... instead."
 ]
   },
+  "STATE_STORE_MULTIPLE_COLUMN_FAMILIES" : {
+"message" : [
+  "Creating multiple column families with  is not 
supported."
+]
+  },
   "TABLE_OPERATION" : {
 "message" : [
   "Table  does not support . Please check the 
current catalog and namespace to make sure the qualified table name is 
expected, and also check the catalog implementation which is configured by 
\"spark.sql.catalog\"."
diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md 
b/docs/sql-error-conditions-unsupported-feature-error-class.md
index d90d2b2a109f..1b12c4bfc1b3 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -190,6 +190,10 @@ set PROPERTIES and DBPROPERTIES at the same time.
 
 `` is a VARIABLE and cannot be updated using the SET statement. 
Use SET VARIABLE `` = ... instead.
 
+## STATE_STORE_MULTIPLE_COLUMN_FAMILIES
+
+Creating multiple column families with `` is not supported.
+
 ## TABLE_OPERATION
 
 Table `` does not support ``. Please check the current 
catalog and namespace to make sure the qualified table name is expected, and 
also check the catalog implementation which is configured by 
"spark.sql.catalog".
diff --git a/docs/sql-error-conditions.md 

(spark) branch master updated: [MINOR][SQL] Clean up outdated comments from `hash` function in `Metadata`

2024-02-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 704e9a0785f4 [MINOR][SQL] Clean up outdated comments from `hash` 
function in `Metadata`
704e9a0785f4 is described below

commit 704e9a0785f4fc4dd86b950a649114e807a826a1
Author: yangjie01 
AuthorDate: Thu Feb 1 09:31:24 2024 -0800

[MINOR][SQL] Clean up outdated comments from `hash` function in `Metadata`

### What changes were proposed in this pull request?
This PR just cleans up outdated comments from the `hash` function in `Metadata`.

### Why are the changes needed?
Clean up outdated comments

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44978 from LuciferYang/minior-remove-comments.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala | 2 --
 1 file changed, 2 deletions(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala
index 17be8cfa12b5..2ffd0f13ca10 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/Metadata.scala
@@ -208,8 +208,6 @@ object Metadata {
   /** Computes the hash code for the types we support. */
   private def hash(obj: Any): Int = {
 obj match {
-  // `map.mapValues` return `Map` in Scala 2.12 and return `MapView` in 
Scala 2.13, call
-  // `toMap` for Scala version compatibility.
   case map: Map[_, _] =>
 map.transform((_, v) => hash(v)).##
   case arr: Array[_] =>





(spark) branch master updated: [SPARK-46940][CORE] Remove unused `updateSparkConfigFromProperties` and `isAbsoluteURI` in `o.a.s.u.Utils`

2024-02-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d5ca61692c34 [SPARK-46940][CORE] Remove unused 
`updateSparkConfigFromProperties` and `isAbsoluteURI` in `o.a.s.u.Utils`
d5ca61692c34 is described below

commit d5ca61692c34449bc602db6cf0919010ec5a50a3
Author: panbingkun 
AuthorDate: Thu Feb 1 09:30:07 2024 -0800

[SPARK-46940][CORE] Remove unused `updateSparkConfigFromProperties` and 
`isAbsoluteURI` in `o.a.s.u.Utils`

### What changes were proposed in this pull request?
The PR aims to remove the unused `updateSparkConfigFromProperties` and `isAbsoluteURI` methods in `o.a.s.u.Utils`.

### Why are the changes needed?
Keep the code clean.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44979 from panbingkun/SPARK-46940.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/util/Utils.scala   | 25 --
 1 file changed, 25 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index a55539c0a235..b49f97aed05e 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1884,17 +1884,6 @@ private[spark] object Utils
 }
   }
 
-  /** Check whether a path is an absolute URI. */
-  def isAbsoluteURI(path: String): Boolean = {
-try {
-  val uri = new URI(path: String)
-  uri.isAbsolute
-} catch {
-  case _: URISyntaxException =>
-false
-}
-  }
-
   /** Return all non-local paths from a comma-separated list of paths. */
   def nonLocalPaths(paths: String, testWindows: Boolean = false): 
Array[String] = {
 val windows = isWindows || testWindows
@@ -1931,20 +1920,6 @@ private[spark] object Utils
 path
   }
 
-  /**
-   * Updates Spark config with properties from a set of Properties.
-   * Provided properties have the highest priority.
-   */
-  def updateSparkConfigFromProperties(
-  conf: SparkConf,
-  properties: Map[String, String]) : Unit = {
-properties.filter { case (k, v) =>
-  k.startsWith("spark.")
-}.foreach { case (k, v) =>
-  conf.set(k, v)
-}
-  }
-
   /**
* Implements the same logic as JDK `java.lang.String#trim` by removing 
leading and trailing
* non-printable characters less or equal to '\u0020' (SPACE) but preserves 
natural line





(spark) branch master updated: [SPARK-46933][SQL] Add query execution time metric to connectors which use JDBCRDD

2024-02-01 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 22586efb3cd4 [SPARK-46933][SQL] Add query execution time metric to 
connectors which use JDBCRDD
22586efb3cd4 is described below

commit 22586efb3cd4a19969d92b249a14326fba3244d2
Author: Uros Stankovic 
AuthorDate: Thu Feb 1 22:48:50 2024 +0800

[SPARK-46933][SQL] Add query execution time metric to connectors which use 
JDBCRDD

### What changes were proposed in this pull request?
This pull request adds measurement of query execution time for external JDBC data sources.
Another change is relaxing the access modifier of the JDBCRDD class, which is needed for adding another metric (SQL text) in a follow-up PR.
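
A minimal sketch of the mechanism (the metric name, key, and RDD are illustrative, not the patch's exact code): an RDD that mixes in the `DataSourceMetricsMixin` trait added below can expose extra SQL metrics, and `RowDataSourceScanExec` merges them into its own metrics map so they show up on the plan node.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.DataSourceMetricsMixin
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Sketch only: a JDBC-like RDD that reports how long its remote query took.
class IllustrativeJdbcRDD(sc: SparkContext) extends RDD[InternalRow](sc, Nil)
    with DataSourceMetricsMixin {

  val queryExecutionTime: SQLMetric =
    SQLMetrics.createTimingMetric(sc, "JDBC query execution time")

  override def getMetrics: Seq[(String, SQLMetric)] =
    Seq("queryExecutionTime" -> queryExecutionTime)

  override protected def getPartitions: Array[Partition] = Array.empty

  override def compute(split: Partition, context: TaskContext): Iterator[InternalRow] = {
    val start = System.currentTimeMillis()
    val rows = Iterator.empty[InternalRow] // a real implementation runs the remote query here
    queryExecutionTime.add(System.currentTimeMillis() - start)
    rows
  }
}
```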

### Why are the changes needed?
Query execution time is a very important metric to have.

### Does this PR introduce _any_ user-facing change?
Users can see the query execution time in the SparkPlan graph under the node metrics tab.

### How was this patch tested?
Tested using a custom image.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44969 from 
urosstan-db/SPARK-46933-Add-scan-metrics-to-jdbc-connector.

Lead-authored-by: Uros Stankovic 
Co-authored-by: Uros Stankovic 
<155642965+urosstan...@users.noreply.github.com>
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/execution/DataSourceScanExec.scala   | 17 --
 .../datasources/DataSourceMetricsMixin.scala   | 24 
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   | 26 --
 3 files changed, 63 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index 2622eadaefb3..ec265f4eaea4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -99,6 +99,10 @@ trait DataSourceScanExec extends LeafExecNode {
   def inputRDDs(): Seq[RDD[InternalRow]]
 }
 
+object DataSourceScanExec {
+  val numOutputRowsKey = "numOutputRows"
+}
+
 /** Physical plan node for scanning data from a relation. */
 case class RowDataSourceScanExec(
 output: Seq[Attribute],
@@ -111,8 +115,17 @@ case class RowDataSourceScanExec(
 tableIdentifier: Option[TableIdentifier])
   extends DataSourceScanExec with InputRDDCodegen {
 
-  override lazy val metrics =
-Map("numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of 
output rows"))
+  override lazy val metrics: Map[String, SQLMetric] = {
+val metrics = Map(
+  DataSourceScanExec.numOutputRowsKey ->
+SQLMetrics.createMetric(sparkContext, "number of output rows")
+)
+
+rdd match {
+  case rddWithDSMetrics: DataSourceMetricsMixin => metrics ++ 
rddWithDSMetrics.getMetrics
+  case _ => metrics
+}
+  }
 
   protected override def doExecute(): RDD[InternalRow] = {
 val numOutputRows = longMetric("numOutputRows")
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceMetricsMixin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceMetricsMixin.scala
new file mode 100644
index ..6c1e5e876e99
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceMetricsMixin.scala
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.metric.SQLMetric
+
+trait DataSourceMetricsMixin {
+  def getMetrics: Seq[(String, SQLMetric)]
+}
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
index 934ed9ac2a1b..a436627fd117 100644
--- 

(spark) branch master updated: [SPARK-46852][SS] Remove use of explicit key encoder and pass it implicitly to the operator for transformWithState operator

2024-02-01 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e610d1d8f79b [SPARK-46852][SS] Remove use of explicit key encoder and 
pass it implicitly to the operator for transformWithState operator
e610d1d8f79b is described below

commit e610d1d8f79b913cb9ee9236a6325202c58d8397
Author: Anish Shrigondekar 
AuthorDate: Thu Feb 1 22:31:07 2024 +0900

[SPARK-46852][SS] Remove use of explicit key encoder and pass it implicitly 
to the operator for transformWithState operator

### What changes were proposed in this pull request?
Remove use of explicit key encoder and pass it implicitly to the operator 
for transformWithState operator

### Why are the changes needed?
Changes needed to avoid asking users to provide an explicit key encoder; we also might need it for subsequent timer-related changes.
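
A minimal before/after usage sketch (the processor body and state name are hypothetical; the signatures match the `StatefulProcessorHandle` diff below):

```scala
import org.apache.spark.sql.streaming.{StatefulProcessorHandle, ValueState}

// Sketch only: called from a StatefulProcessor's init() with the handle it receives.
def setUpState(handle: StatefulProcessorHandle): ValueState[Long] = {
  // Before: handle.getValueState[String, Long]("countState", Encoders.STRING)
  // After this change the key encoder is threaded through the operator implicitly:
  handle.getValueState[Long]("countState")
}
```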

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Existing unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44974 from anishshri-db/task/SPARK-46852.

Authored-by: Anish Shrigondekar 
Signed-off-by: Jungtaek Lim 
---
 .../sql/streaming/StatefulProcessorHandle.scala|  5 +
 .../spark/sql/catalyst/plans/logical/object.scala  |  3 +++
 .../spark/sql/execution/SparkStrategies.scala  |  3 ++-
 .../streaming/StatefulProcessorHandleImpl.scala| 13 +
 .../streaming/TransformWithStateExec.scala |  6 +-
 .../sql/execution/streaming/ValueStateImpl.scala   | 12 +---
 .../streaming/state/ValueStateSuite.scala  | 22 +++---
 .../sql/streaming/TransformWithStateSuite.scala|  8 +++-
 8 files changed, 39 insertions(+), 33 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
 
b/sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
index 302de4a3c947..5eaccceb947c 100644
--- 
a/sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
+++ 
b/sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
@@ -19,7 +19,6 @@ package org.apache.spark.sql.streaming
 import java.io.Serializable
 
 import org.apache.spark.annotation.{Evolving, Experimental}
-import org.apache.spark.sql.Encoder
 
 /**
  * Represents the operation handle provided to the stateful processor used in 
the
@@ -34,12 +33,10 @@ private[sql] trait StatefulProcessorHandle extends 
Serializable {
* The user must ensure to call this function only within the `init()` 
method of the
* StatefulProcessor.
* @param stateName - name of the state variable
-   * @param keyEncoder - Spark SQL Encoder for key
-   * @tparam K - type of key
* @tparam T - type of state variable
* @return - instance of ValueState of type T that can be used to store 
state persistently
*/
-  def getValueState[K, T](stateName: String, keyEncoder: Encoder[K]): 
ValueState[T]
+  def getValueState[T](stateName: String): ValueState[T]
 
   /** Function to return queryInfo for currently running task */
   def getQueryInfo(): QueryInfo
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
index 8f937dd5a777..cb8673d20ed3 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
@@ -577,6 +577,7 @@ object TransformWithState {
   timeoutMode: TimeoutMode,
   outputMode: OutputMode,
   child: LogicalPlan): LogicalPlan = {
+val keyEncoder = encoderFor[K]
 val mapped = new TransformWithState(
   UnresolvedDeserializer(encoderFor[K].deserializer, groupingAttributes),
   UnresolvedDeserializer(encoderFor[V].deserializer, dataAttributes),
@@ -585,6 +586,7 @@ object TransformWithState {
   statefulProcessor.asInstanceOf[StatefulProcessor[Any, Any, Any]],
   timeoutMode,
   outputMode,
+  keyEncoder.asInstanceOf[ExpressionEncoder[Any]],
   CatalystSerde.generateObjAttr[U],
   child
 )
@@ -600,6 +602,7 @@ case class TransformWithState(
 statefulProcessor: StatefulProcessor[Any, Any, Any],
 timeoutMode: TimeoutMode,
 outputMode: OutputMode,
+keyEncoder: ExpressionEncoder[Any],
 outputObjAttr: Attribute,
 child: LogicalPlan) extends UnaryNode with ObjectProducer {
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
index 5d4063d125c8..f5c2f17f8826 100644
--- 

(spark) branch master updated: [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1

2024-02-01 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1870de0b329a [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1
1870de0b329a is described below

commit 1870de0b329ac5ef35a331a653b4debd85eaa684
Author: panbingkun 
AuthorDate: Thu Feb 1 06:37:00 2024 -0600

[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1

### What changes were proposed in this pull request?
The PR aims to upgrade rocksdbjni from `8.3.2` to `8.8.1`.

Why version `8.8.1`? Because so far, `32` tests have been conducted on version `8.6.7` or `8.8.1`, and no core issues have been found. Later versions have not been rigorously validated.

### Why are the changes needed?
1.The full release notes:
- https://github.com/facebook/rocksdb/releases/tag/v8.8.1
- https://github.com/facebook/rocksdb/releases/tag/v8.7.3
- https://github.com/facebook/rocksdb/releases/tag/v8.6.7
- https://github.com/facebook/rocksdb/releases/tag/v8.5.4
- https://github.com/facebook/rocksdb/releases/tag/v8.5.3
- https://github.com/facebook/rocksdb/releases/tag/v8.4.4
- https://github.com/facebook/rocksdb/releases/tag/v8.3.3

2. Bug fixes, e.g.:
- Fixed a bug where compaction read under non direct IO still falls back to 
RocksDB internal prefetching after file system's prefetching returns non-OK 
status other than Status::NotSupported()
- Fix a bug with atomic_flush=true that can cause DB to stuck after a flush 
fails
- Fix a bug where if there is an error reading from offset 0 of a file from 
L1+ and that the file is not the first file in the sorted run, data can be lost 
in compaction and read/scan can return incorrect results.
- Fix a bug where iterator may return incorrect result for DeleteRange() 
users if there was an error reading from a file.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually test:
```
./build/mvn clean install -pl core -am -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -fn

```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43924 from panbingkun/upgrade_rocksdbjni.

Lead-authored-by: panbingkun 
Co-authored-by: panbingkun 
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3  |  2 +-
 pom.xml|  2 +-
 ...StoreBasicOperationsBenchmark-jdk21-results.txt | 70 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 72 +++---
 4 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index fcb3350e5de2..e02733883642 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -239,7 +239,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
 pickle/1.3//pickle-1.3.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
-rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
+rocksdbjni/8.8.1//rocksdbjni-8.8.1.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
 scala-compiler/2.13.12//scala-compiler-2.13.12.jar
 scala-library/2.13.12//scala-library-2.13.12.jar
diff --git a/pom.xml b/pom.xml
index 6e118bb27f5a..2fc14a4cdede 100644
--- a/pom.xml
+++ b/pom.xml
@@ -677,7 +677,7 @@
   
 org.rocksdb
 rocksdbjni
-8.3.2
+8.8.1
   
   
 ${leveldbjni.group}
diff --git 
a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt 
b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
index f92ae8668e16..c0d710873aed 100644
--- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
@@ -6,33 +6,33 @@ OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 
5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 putting 1 rows (1 rows to overwrite - rate 100):  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 
---
-In-memory5 
 6   0  1.8 541.4   1.0X
-RocksDB (trackTotalNumberOfRows: true)  40 
41   2  0.24023.4   0.1X
-RocksDB (trackTotalNumberOfRows: false) 15 
15   1  0.71452.5   0.4X
+In-memory