[GitHub] [spark] pan3793 commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


pan3793 commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891948046


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit, spark-shell, pyspark, run-example, sparkR

Review Comment:
   Updated to use `./bin/...`






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891946395


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit, spark-shell, pyspark, run-example, sparkR

Review Comment:
   Does `pyspark` mean PySpark shell? Let's either explicitly use the script 
name like `bin/...` or use the full name as PySpark shell.



##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit, spark-shell, pyspark, run-example, sparkR

Review Comment:
   Does `pyspark` mean PySpark shell? Let's either explicitly use the script 
name like `./bin/...` or use the full name as PySpark shell.






[GitHub] [spark] HyukjinKwon commented on pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36683:
URL: https://github.com/apache/spark/pull/36683#issuecomment-1149492294

   cc @mengxr and @WeichenXu123 in case you guys have some comments.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36683:
URL: https://github.com/apache/spark/pull/36683#discussion_r891944792


##
python/pyspark/sql/pandas/conversion.py:
##
@@ -596,7 +596,7 @@ def _create_from_pandas_with_arrow(
 ]
 
 # Slice the DataFrame to be batched
-step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round 
int up
+step = self._jconf.arrowMaxRecordsPerBatch()

Review Comment:
   Yeah, that's true... the perf diff seems trivial in any event, and it seems to work around the `spark.rpc.message.maxSize` issue.
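   As a side note, a plain-Scala illustration of the two slicing strategies in the diff above (the real code is the Python in `conversion.py`; the numbers here are made up): slicing by `defaultParallelism` can yield one very large Arrow batch per slice, whose serialized size can blow past `spark.rpc.message.maxSize`, while slicing by the Arrow max-records-per-batch config bounds every batch.
   ```
   // Sketch only: contrasts the old and new step sizes from the diff above.
   // Real values come from len(pdf), sparkContext.defaultParallelism and
   // spark.sql.execution.arrow.maxRecordsPerBatch; these numbers are invented.
   val numRows = 10000000
   val defaultParallelism = 8
   val maxRecordsPerBatch = 10000

   val stepByParallelism = -(-numRows / defaultParallelism) // ceiling division: 1250000 rows per batch
   val stepByBatchSize   = maxRecordsPerBatch               // bounded batches: 10000 rows per batch
   ```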






[GitHub] [spark] LuciferYang commented on pull request #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


LuciferYang commented on PR #36800:
URL: https://github.com/apache/spark/pull/36800#issuecomment-1149491231

   thanks @HyukjinKwon 





[GitHub] [spark] HyukjinKwon closed pull request #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36800: [SPARK-39409][BUILD] Upgrade 
scala-maven-plugin to 4.6.2
URL: https://github.com/apache/spark/pull/36800





[GitHub] [spark] HyukjinKwon commented on pull request #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36800:
URL: https://github.com/apache/spark/pull/36800#issuecomment-1149490024

   Merged to master.





[GitHub] [spark] HyukjinKwon commented on pull request #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36802:
URL: https://github.com/apache/spark/pull/36802#issuecomment-1149489864

   cc @cloud-fan 





[GitHub] [spark] HyukjinKwon opened a new pull request, #36802: [SPARK-39321][SQL][TESTS][FOLLOW-UP] Respect CastWithAnsiOffSuite.ansiEnabled in 'cast string to date #2'

2022-06-07 Thread GitBox


HyukjinKwon opened a new pull request, #36802:
URL: https://github.com/apache/spark/pull/36802

   ### What changes were proposed in this pull request?
   
   This PR fixes the test so that `CastWithAnsiOffSuite` properly respects `ansiEnabled` in the `cast string to date #2` test, by using `CastWithAnsiOffSuite.cast` instead of constructing a `Cast` expression directly.
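   A hedged sketch of the underlying difference (the `Cast` constructor arguments are assumed from memory, and the suite's `cast(...)` helper itself is not reproduced here): pinning `ansiEnabled` on the expression makes the test independent of the session's ANSI setting, which is what the suite helper achieves.
   ```
   import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
   import org.apache.spark.sql.types.DateType

   // Sketch only: an expression with the ANSI flag pinned off, like the suite helper builds.
   val ansiOffCast = Cast(Literal("2015-03-18"), DateType, timeZoneId = None, ansiEnabled = false)
   // Whereas Cast(Literal("2015-03-18"), DateType) defaults to the session's ANSI flag,
   // which is why the original test broke once the build ran with ANSI mode on.
   val sessionCast = Cast(Literal("2015-03-18"), DateType)
   ```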
   
   ### Why are the changes needed?
   
   To make the tests pass. Currently, the test fails when ANSI mode is on:
   
   https://github.com/apache/spark/runs/6786744647
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   Manually tested in my IDE.





[GitHub] [spark] HyukjinKwon commented on pull request #36797: [SPARK-39394][DOCS][SS][3.3] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36797:
URL: https://github.com/apache/spark/pull/36797#issuecomment-1149483125

   Merged to branch-3.3.





[GitHub] [spark] HyukjinKwon closed pull request #36797: [SPARK-39394][DOCS][SS][3.3] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36797: [SPARK-39394][DOCS][SS][3.3] Improve 
PySpark Structured Streaming page more readable
URL: https://github.com/apache/spark/pull/36797





[GitHub] [spark] LuciferYang commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


LuciferYang commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149455635

   I think this PR should be backported to previous Spark versions, because when running `SPARK-39393: Do not push down predicate filters for repeated primitive fields` without this PR, I found the following error logs:
   
   ```
   Last failure message: There are 1 possibly leaked file streams..
at 
org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:185)
at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:192)
at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:402)
at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:401)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:312)
at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:311)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.eventually(ParquetFilterSuite.scala:77)
at 
org.apache.spark.sql.test.SharedSparkSessionBase.afterEach(SharedSparkSession.scala:164)
at 
org.apache.spark.sql.test.SharedSparkSessionBase.afterEach$(SharedSparkSession.scala:158)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite.afterEach(ParquetFilterSuite.scala:100)
at 
org.scalatest.BeforeAndAfterEach.$anonfun$runTest$1(BeforeAndAfterEach.scala:247)
at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
at 
org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
at org.scalatest.FailedStatus$.whenCompleted(Status.scala:505)
at org.scalatest.Status.withAfterEffect(Status.scala:373)
at org.scalatest.Status.withAfterEffect$(Status.scala:371)
at org.scalatest.FailedStatus$.withAfterEffect(Status.scala:477)
at 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:246)
at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64)
at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at 
org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
at 
org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at 
org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
at 
org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:64)
at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:64)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at 
org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
at 
org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
at 
org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
at 
org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
at 
org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
at 
org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
at org.scalatest.tools.Runner$.run(Runner.scala:798)
at org.scalatest.tools.Runner.run(Runner.scala)
at 
org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
at 

[GitHub] [spark] Yaohua628 opened a new pull request, #36801: [SPARK-39404][SQL][Streaming] Minor fix for querying `_metadata` in streaming

2022-06-07 Thread GitBox


Yaohua628 opened a new pull request, #36801:
URL: https://github.com/apache/spark/pull/36801

   
   
   ### What changes were proposed in this pull request?
   We added support for querying the `_metadata` column with a file-based streaming source: https://github.com/apache/spark/pull/35676.
   
   In this PR we propose to use `transformUp` instead of `match` when pattern matching the `dataPlan` in `MicroBatchExecution`'s `runBatch` method. The current code is fine for `FileStreamSource`, because `FileStreamSource` always returns a single `LogicalRelation` node (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala#L247).
   
   But the proposed change makes the logic more robust: we really should not rely on the upstream source returning a plan of a particular shape. In addition, the proposed change also makes `_metadata` work if someone customizes `FileStreamSource`'s `getBatch`.
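   A minimal Scala sketch of the difference (not the actual `MicroBatchExecution` code; the `rewrite` function stands in for whatever attaches the metadata columns): `transformUp` rewrites every `LogicalRelation` wherever it appears in the plan returned by the source, instead of assuming the whole plan is exactly one node.
   ```
   import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
   import org.apache.spark.sql.execution.datasources.LogicalRelation

   // Sketch only: visit every LogicalRelation in the plan returned by the source, rather
   // than `dataPlan match { case rel: LogicalRelation => ... }`, which only handles the
   // case where the whole plan is a single relation node.
   def rewriteRelations(dataPlan: LogicalPlan)(rewrite: LogicalRelation => LogicalPlan): LogicalPlan =
     dataPlan transformUp {
       case rel: LogicalRelation => rewrite(rel)
     }
   ```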
   
   
   ### Why are the changes needed?
   Robustness; see the explanation above.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing tests
   





[GitHub] [spark] AmplabJenkins commented on pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36787:
URL: https://github.com/apache/spark/pull/36787#issuecomment-1149442173

   Can one of the admins verify this patch?





[GitHub] [spark] AngersZhuuuu commented on pull request #36786: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

2022-06-07 Thread GitBox


AngersZh commented on PR #36786:
URL: https://github.com/apache/spark/pull/36786#issuecomment-1149434288

   > Could we add a test?
   
   This needs to be tested against a built distribution; it seems hard to cover in a unit test...





[GitHub] [spark] LuciferYang opened a new pull request, #36800: [SPARK-39409][BUILD] Upgrade scala-maven-plugin to 4.6.2

2022-06-07 Thread GitBox


LuciferYang opened a new pull request, #36800:
URL: https://github.com/apache/spark/pull/36800

   ### What changes were proposed in this pull request?
   This PR aims to upgrade scala-maven-plugin to 4.6.2.
   
   
   ### Why are the changes needed?
   This version brings some bug fixes related to incremental recompilation:
   
   - [Incremental recompileMode - java.lang.NoClassDefFoundError: 
java/sql/ResultSetMetaData](https://github.com/davidB/scala-maven-plugin/issues/502)
   - [Fix incremental compilation on Java 11+, close 
#600](https://github.com/davidB/scala-maven-plugin/pull/609)
   
   The full list of changes is as follows:
   
   - https://github.com/davidB/scala-maven-plugin/compare/4.6.1...4.6.2
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Pass GitHub Actions.





[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891887309


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +571,103 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog(catalogName).asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+  assert(table.properties().get("external").equals("true"))
+  assert(table.properties().get("location").equals("file:" + 
dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test managed table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map.empty[String, String])
+
+  val tableName2 = "my_table2"
+  val description2 = "this is a test external table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName2).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description2,
+options = Map("path" -> dir.getAbsolutePath))
+
+  val tables = spark.catalog.listTables("testcat.my_db").collect()
+  assert(tables.size == 2)
+
+  val expectedTable1 =
+new Table(tableName, catalogName, Array(dbName), description,
+  CatalogTableType.MANAGED.name, false)
+  assert(tables.exists(t =>
+expectedTable1.name.equals(t.name) && 
expectedTable1.database.equals(t.database) &&
+expectedTable1.description.equals(t.description) &&
+expectedTable1.tableType.equals(t.tableType) &&
+expectedTable1.isTemporary == t.isTemporary))
+
+  val expectedTable2 =
+new Table(tableName2, catalogName, Array(dbName), description2,
+  CatalogTableType.EXTERNAL.name, false)
+  assert(tables.exists(t =>
+expectedTable2.name.equals(t.name) && 
expectedTable2.database.equals(t.database) &&
+expectedTable2.description.equals(t.description) &&
+expectedTable2.tableType.equals(t.tableType) &&
+expectedTable2.isTemporary == t.isTemporary))
+}

Review Comment:
   Sure, I will add this test case.
   
   I wouldn't call this a test of `backward compatibility`, though. `listTable` should always accept a DB name: either a `database` or a `catalog_name.database`.
   
   This is more about testing correctness.




[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891881553


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +571,103 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog(catalogName).asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+  assert(table.properties().get("external").equals("true"))
+  assert(table.properties().get("location").equals("file:" + 
dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test managed table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map.empty[String, String])
+
+  val tableName2 = "my_table2"
+  val description2 = "this is a test external table"
+
+  spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName2).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description2,
+options = Map("path" -> dir.getAbsolutePath))
+
+  val tables = spark.catalog.listTables("testcat.my_db").collect()
+  assert(tables.size == 2)
+
+  val expectedTable1 =
+new Table(tableName, catalogName, Array(dbName), description,
+  CatalogTableType.MANAGED.name, false)
+  assert(tables.exists(t =>
+expectedTable1.name.equals(t.name) && 
expectedTable1.database.equals(t.database) &&
+expectedTable1.description.equals(t.description) &&
+expectedTable1.tableType.equals(t.tableType) &&
+expectedTable1.isTemporary == t.isTemporary))
+
+  val expectedTable2 =
+new Table(tableName2, catalogName, Array(dbName), description2,
+  CatalogTableType.EXTERNAL.name, false)
+  assert(tables.exists(t =>
+expectedTable2.name.equals(t.name) && 
expectedTable2.database.equals(t.database) &&
+expectedTable2.description.equals(t.description) &&
+expectedTable2.tableType.equals(t.tableType) &&
+expectedTable2.isTemporary == t.isTemporary))
+}

Review Comment:
   Can we test the backward compatibility case? E.g. register a new catalog named "default" in this test case, and `listTable("default")` should still list the tables under the "default" schema in HMS.




[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891880524


##
sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala:
##
@@ -165,7 +165,8 @@ class GlobalTempViewSuite extends QueryTest with 
SharedSparkSession {
   assert(spark.catalog.tableExists(globalTempDB, "src"))
   assert(spark.catalog.getTable(globalTempDB, "src").toString == new Table(
 name = "src",
-database = globalTempDB,
+catalog = "spark_catalog",

Review Comment:
   ```suggestion
   catalog = CatalogManager.SESSION_CATALOG_NAME,
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879819


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +131,45 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)

Review Comment:
   
`metadata.map(_.identifier.database).getOrElse(tableIdent.database).map(Array(_)).orNull`.
 In the abnormal case we should return null, not Array(null).






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879819


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +131,45 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)

Review Comment:
   
`metadata.map(_.identifier.database).getOrElse(tableIdent.database).toArray`. 
In the abnormal case we should return null, not Array(null).






[GitHub] [spark] itholic commented on pull request #36782: [SPARK-39394][DOCS][SS] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


itholic commented on PR #36782:
URL: https://github.com/apache/spark/pull/36782#issuecomment-1149410531

   @HyukjinKwon Sure, I created a PR: https://github.com/apache/spark/pull/36797





[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879436


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +131,45 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)
 new Table(
   name = tableIdent.table,
-  database = 
metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull,
+  catalog = "spark_catalog",

Review Comment:
   ```suggestion
 catalog = CatalogManager.SESSION_CATALOG_NAME,
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891879053


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -97,8 +98,21 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
*/
   @throws[AnalysisException]("database does not exist")
   override def listTables(dbName: String): Dataset[Table] = {
-val tables = sessionCatalog.listTables(dbName).map(makeTable)
-CatalogImpl.makeDataset(tables, sparkSession)
+// dbName could be either 2-part name or 3-part name. We assume it is 
2-part name

Review Comment:
   ```suggestion
   // `dbName` could be either a single database name (behavior in Spark 
3.3 and prior) or
   // a qualified namespace with catalog name. We assume it's a single 
database name and ...
   ```






[GitHub] [spark] Borjianamin98 commented on a diff in pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


Borjianamin98 commented on code in PR #36781:
URL: https://github.com/apache/spark/pull/36781#discussion_r891878580


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala:
##
@@ -1316,6 +1317,34 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
 }
   }
 
+  test("SPARK-39393 Do not push down predicate filters for repeated primitive 
fields") {

Review Comment:
   Thanks. Done.






[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891878020


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##
@@ -965,6 +965,10 @@ class SessionCatalog(
 isTempView(nameParts.asTableIdentifier)
   }
 
+  def isGlobalTempView(dbName: String): Boolean = {

Review Comment:
   ```suggestion
 def isGlobalTempViewDB(dbName: String): Boolean = {
   ```






[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891876268


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   The error is simply wrapped and the original error is kept as the cause, so this is not a big problem. The bigger problem is how to define internal errors. We picked some common Java exceptions that can be treated as internal errors: when end users see these errors from Spark, it indicates a bug.
   
   For third-party Spark plugins, I think all errors from them are internal errors? End users can do nothing to work around these errors. cc @srielau
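   A minimal Scala sketch of what the wrapping means for callers, based on the diff in this thread (the helper below is illustrative, not part of Spark): the original `IllegalStateException` is no longer thrown directly, but survives as the cause of a `SparkException` whose error class is `INTERNAL_ERROR`.
   ```
   import org.apache.spark.{SparkException, SparkThrowable}

   // Sketch only: run a block and, if it fails with a wrapped internal error, return the
   // message of the original exception that was kept as the cause.
   def originalMessage(body: => Unit): Option[String] =
     try { body; None } catch {
       case e: SparkException
           if e.asInstanceOf[SparkThrowable].getErrorClass == "INTERNAL_ERROR" =>
         Option(e.getCause).map(_.getMessage)
     }
   ```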






[GitHub] [spark] dtenedor opened a new pull request, #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-07 Thread GitBox


dtenedor opened a new pull request, #36799:
URL: https://github.com/apache/spark/pull/36799

   ### What changes were proposed in this pull request?
   
   Add a flag to control breaking change process for: DESC NAMESPACE EXTENDED 
should redact properties.
   
   ### Why are the changes needed?
   
   This lets Spark users control how the new behavior rolls out to users.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   This PR extends unit test coverage.





[GitHub] [spark] dtenedor commented on pull request #36799: [SPARK-39350][SQL] Add flag to control breaking change process for: DESC NAMESPACE EXTENDED should redact properties

2022-06-07 Thread GitBox


dtenedor commented on PR #36799:
URL: https://github.com/apache/spark/pull/36799#issuecomment-1149401878

   @HyukjinKwon would you be able to help with this PR for the breaking change 
process?





[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891867452


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   And we may need to make this clear in the interface contract for data sources - now we are applying the error class framework even when the exception comes from the data source. So far, the convention only applies within the Spark project itself.






[GitHub] [spark] AmplabJenkins commented on pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36789:
URL: https://github.com/apache/spark/pull/36789#issuecomment-1149394119

   Can one of the admins verify this patch?





[GitHub] [spark] wangyum commented on pull request #36798: [SPARK-39408][SQL] Update the buildKeys for DynamicPruningSubquery.withNewPlan

2022-06-07 Thread GitBox


wangyum commented on PR #36798:
URL: https://github.com/apache/spark/pull/36798#issuecomment-1149393100

   cc @maryannxue @cloud-fan





[GitHub] [spark] wangyum opened a new pull request, #36798: [SPARK-39408][SQL] Update the buildKeys for DynamicPruningSubquery.withNewPlan

2022-06-07 Thread GitBox


wangyum opened a new pull request, #36798:
URL: https://github.com/apache/spark/pull/36798

   ### What changes were proposed in this pull request?
   
   This PR updates the buildKeys for `DynamicPruningSubquery.withNewPlan`.
   
   ### Why are the changes needed?
   
   This fixes a bug. Otherwise, the structural integrity of the plan is broken after this step:
   
https://github.com/apache/spark/blob/d9fd36eb76fcfec95763cc4dc594eb7856b0fad2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala#L129-L134
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.





[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891861831


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   I just realized this is included in Spark 3.3.0 instead of 3.4.0... I thought we had sufficient time to look into it, but unfortunately it doesn't seem so...
   
   Can we check how the user-facing error message has changed for this test, and decide whether it is a breaking change or not? If the error class framework hides the details because the error is considered internal, this is a huge breaking change, since we give guidance in that error message.






[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891847061


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   Thank you for reviewing.
   
   For any event timestamp, the start time of the last window (whether tumbling or sliding) should always be less than or equal to the current timestamp. When `(timestamp - window.startTime) % window.slideDuration < 0`, the computed lastStart is greater than the timestamp. In that case the window has to be shifted back by one slide duration, which yields the correct lastStart value.
   
   For example
   
   code
   ```
   val timestamp = -13
   val offset = 0
   val windowSize = 7
   
   // old code
   val lastStartOld = timestamp - (timestamp - offset - windowSize) % 
windowSize
   
  // new code
   val remainder =  (timestamp - offset) % windowSize
   
   val lastStartNew =
 if (remainder < 0) {
   timestamp - remainder - windowSize
 } else {
   timestamp - remainder
 }
   
   println(s"lastStartOld = $lastStartOld   lastStartNew = $lastStartNew")
   ```
   result
   ```
   lastStartOld = -7 lastStartNew = -14
   ```
   






[GitHub] [spark] pan3793 commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


pan3793 commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891852053


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit

Review Comment:
   You are right. Because `spark-shell` invokes `spark-submit`, I changed the description to "for spark-submit and the command that depends on spark-submit".






[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does "in Scala code" refer to the following code?
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, `remainder` is a Catalyst expression, so `remainder < 0` cannot produce a Scala `boolean` value directly.
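   A concrete Scala sketch of the expression-level version (the `else` branch is an assumption on my side, since the diff quoted above is truncated after the first branch; the literal values mirror the -13 / 7 example earlier in this thread):
   ```
   import org.apache.spark.sql.catalyst.expressions.{CaseWhen, LessThan, Literal, Remainder, Subtract}

   // Sketch only: build lastStart as a CASE WHEN expression, since `remainder` is a
   // Catalyst Expression and a plain Scala if/else cannot branch on it.
   val timestamp = Literal(-13L) // illustrative event time
   val startTime = Literal(0L)   // illustrative window.startTime
   val slide     = Literal(7L)   // illustrative window.slideDuration

   val remainder = Remainder(Subtract(timestamp, startTime), slide)
   val lastStart = CaseWhen(
     Seq((LessThan(remainder, Literal(0L)), Subtract(Subtract(timestamp, remainder), slide))),
     Subtract(timestamp, remainder)) // assumed else branch

   // lastStart.eval() evaluates to -14L, matching lastStartNew in the numeric example above.
   ```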
   






[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891848795


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)

Review Comment:
   `before epoch` means B.C. or before '1970-01-01 00:00:00'? My understanding is that both result in negative values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


srowen commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891850083


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   I think the question is how this arises in the first place. The code is self 
explanatory, not asking you to explain it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891848795


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)

Review Comment:
   `before epoch` means B.C. or before '1970-01-01 00:00:00'; my understanding is that both result in negative values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xiuzhu9527 commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-07 Thread GitBox


xiuzhu9527 commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1149366600

   @pan3793 Thank you very much for your reply. This problem has been affecting users. Can we fix it first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891847061


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   Thank you for reviewing.
   
   For any event timestamp, the start time of its last window (whether tumbling or sliding) should be no greater than the timestamp itself. When `(timestamp - window.startTime) % window.slideDuration < 0`, the computed `lastStart` ends up greater than the timestamp, so the window has to be shifted back by one slide duration to obtain the correct `lastStart` value.
   
   For example
   
   code
   ```
   val timestamp = -13
   val offset = 0
   val windowSize = 7
   
   // old code
   val lastStartOld = timestamp - (timestamp - offset + windowSize) % windowSize
   
   // new code
   val remainder = (timestamp - offset) % windowSize
   
   val lastStartNew =
     if (remainder < 0) {
       timestamp - remainder - windowSize
     } else {
       timestamp - remainder
     }
   
   println(s"lastStartOld = $lastStartOld   lastStartNew = $lastStartNew")
   ```
   result
   ```
   lastStartOld = -7 lastStartNew = -14
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891846346


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   The question is what makes the time "illegal" or "inappropriate". It is not tied only to a bug in the application. The definition is not strict enough - if we call readTable while a concurrent, non-atomic drop-and-recreate of the table is happening, the time is "conditionally" "inappropriate".
   
   For sure, we can be strict in the project's policy and follow the convention you mentioned (probably defining new, more specific exception(s) if needed). For the Kafka data source, we'll need some time to sort this out.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #36703: [SPARK-39321][SQL] Refactor TryCast to use RuntimeReplaceable

2022-06-07 Thread GitBox


cloud-fan closed pull request #36703: [SPARK-39321][SQL] Refactor TryCast to 
use RuntimeReplaceable
URL: https://github.com/apache/spark/pull/36703


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #36703: [SPARK-39321][SQL] Refactor TryCast to use RuntimeReplaceable

2022-06-07 Thread GitBox


cloud-fan commented on PR #36703:
URL: https://github.com/apache/spark/pull/36703#issuecomment-1149360596

   thanks for the review, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dcoliversun commented on a diff in pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


dcoliversun commented on code in PR #36781:
URL: https://github.com/apache/spark/pull/36781#discussion_r891843252


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala:
##
@@ -1316,6 +1317,34 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
 }
   }
 
+  test("SPARK-39393 Do not push down predicate filters for repeated primitive 
fields") {

Review Comment:
   Nice work. Little suggestion: 
   ```suggestion
 test("SPARK-39393: Do not push down predicate filters for repeated 
primitive fields") {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does `in scala code` refer to the following code:
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, the `remainder` here is a Catalyst expression, so `remainder < 0` builds an `expressions.Predicate` and cannot be evaluated to a `boolean` value directly.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891842533


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration

Review Comment:
   Thank you for reviewing.
   My understanding is that whether the event time is negative or not is 
determined by the upstream data source, not the Spark platform itself. Even if 
the event time is negative, the platform should give the correct value.
   
   Does `in scala code` refer to the following code:
   ```
   val lastStart = if (remainder < 0) {
     timestamp - remainder - window.slideDuration
   } else {
     timestamp - remainder
   }
   ```
   In that case, the `remainder` here is a Catalyst expression, so `remainder < 0` builds an `expressions.Predicate` and cannot be evaluated to a `boolean` value directly.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891840233


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   If Spark were a simple Java library, this would be fine. The caller should handle these errors properly when building a product.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891839935


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   > Signals that a method has been invoked at an illegal or inappropriate time.
   
   This means a bug in any product, right? We should not expose these low-level errors to end-users, the same as NPE, index-out-of-bounds errors, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic opened a new pull request, #36797: Spark 39394 3.3

2022-06-07 Thread GitBox


itholic opened a new pull request, #36797:
URL: https://github.com/apache/spark/pull/36797

   ### What changes were proposed in this pull request?
   
   Hotfix https://github.com/apache/spark/pull/36782 for branch-3.3.
   
   
   ### Why are the changes needed?
   
   The improvement of document readability will also improve the usability for 
PySpark Structured Streaming.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, now the documentation is categorized by its class or their own purpose 
more clearly as below:
   
   ![Screen Shot 2022-06-07 at 12 30 01 
PM](https://user-images.githubusercontent.com/44108233/172289737-bd6ebf0e-601c-4a80-a16a-cf885302e7b6.png)
   
   
   ### How was this patch tested?
   
   The existing doc build in CI should cover.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36792:
URL: https://github.com/apache/spark/pull/36792#issuecomment-1149336001

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-07 Thread GitBox


HeartSaVioR commented on code in PR #36704:
URL: https://github.com/apache/spark/pull/36704#discussion_r891827819


##
connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala:
##
@@ -666,9 +667,10 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 testUtils.sendMessages(topic2, Array("6"))
   },
   StartStream(),
-  ExpectFailure[IllegalStateException](e => {
+  ExpectFailure[SparkException](e => {
+assert(e.asInstanceOf[SparkThrowable].getErrorClass === 
"INTERNAL_ERROR")
 // The offset of `topic2` should be changed from 2 to 1
-assert(e.getMessage.contains("was changed from 2 to 1"))
+assert(e.getCause.getMessage.contains("was changed from 2 to 1"))

Review Comment:
   
https://docs.oracle.com/javase/8/docs/api/java/lang/IllegalStateException.html
   
   > Signals that a method has been invoked at an illegal or inappropriate 
time. In other words, the Java environment or Java application is not in an 
appropriate state for the requested operation.
   
   It's arguable what the "state" of the application is. If the state of the application depends on an external system (as it does here), this statement sounds valid to me.
   
   If the Spark project wants to restrict the usage of IllegalStateException to denote only possible internal errors, that is fair, but it would be great if we did not rely on "convention" (a magic word with which everyone can claim different things) and stated it explicitly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on pull request #36790: [SPARK-39402][SQL] Optimize ReplaceCTERefWithRepartition to support coalesce partitions

2022-06-07 Thread GitBox


wangyum commented on PR #36790:
URL: https://github.com/apache/spark/pull/36790#issuecomment-1149320236

   cc @maryannxue @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #36788: [SPARK-39401][SQL][TESTS] Replace withView with withTempView in CTEInlineSuite

2022-06-07 Thread GitBox


HyukjinKwon closed pull request #36788: [SPARK-39401][SQL][TESTS] Replace 
withView with withTempView in CTEInlineSuite
URL: https://github.com/apache/spark/pull/36788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #36788: [SPARK-39401][SQL][TESTS] Replace withView with withTempView in CTEInlineSuite

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36788:
URL: https://github.com/apache/spark/pull/36788#issuecomment-1149311965

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36789:
URL: https://github.com/apache/spark/pull/36789#discussion_r891812728


##
conf/spark-env.sh.template:
##
@@ -79,3 +80,6 @@
 # Options for beeline
 # - SPARK_BEELINE_OPTS, to set config properties only for the beeline cli 
(e.g. "-Dx=y")
 # - SPARK_BEELINE_MEMORY, Memory for beeline (e.g. 1000M, 2G) (Default: 1G)
+
+# Options for spark-submit

Review Comment:
   Is this only for spark-submit? Seems like it's also used in Spark shell



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #36782: [SPARK-39394][DOCS][SS] Improve PySpark Structured Streaming page more readable

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36782:
URL: https://github.com/apache/spark/pull/36782#issuecomment-1149309255

   @itholic mind creating a backporting PR to 3.3? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


HyukjinKwon commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r891811193


##
python/pyspark/sql/tests/test_session.py:
##
@@ -379,6 +381,54 @@ def test_use_custom_class_for_extensions(self):
 )
 
 
+class CreateDataFrame(unittest.TestCase):

Review Comment:
   Can we add this test into ArrowTests so it tests Arrow 
optimization/nonoptimization?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on `nonexistent`

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #36795:
URL: https://github.com/apache/spark/pull/36795#issuecomment-1149307716

   Thank you, @HyukjinKwon !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on `nonexistent`

2022-06-07 Thread GitBox


dongjoon-hyun closed pull request #36795: [SPARK-39407][SQL][TESTS] Fix 
ParquetIOSuite to handle the case where DNS responses on `nonexistent`
URL: https://github.com/apache/spark/pull/36795


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on `nonexistent`

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #36795:
URL: https://github.com/apache/spark/pull/36795#issuecomment-1149307127

   Thank you, @huaxingao . Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] JoshRosen closed pull request #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


JoshRosen closed pull request #36794: Add Serializable Trait to 
DirectBinaryAvroDecoder
URL: https://github.com/apache/spark/pull/36794


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast commented on pull request #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


eswardhinakaran-toast commented on PR #36794:
URL: https://github.com/apache/spark/pull/36794#issuecomment-1149281389

   I meant to close this PR. Accidental pull request.
   
   On Tue, Jun 7, 2022 at 7:40 PM UCB AMPLab ***@***.***> wrote:
   
   > Can one of the admins verify this patch?
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


AmplabJenkins commented on PR #36794:
URL: https://github.com/apache/spark/pull/36794#issuecomment-1149277712

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast closed pull request #36796: Add serializable trait to direct binary avro decoder

2022-06-07 Thread GitBox


eswardhinakaran-toast closed pull request #36796: Add serializable trait to 
direct binary avro decoder
URL: https://github.com/apache/spark/pull/36796


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast opened a new pull request, #36796: Add serializable trait to direct binary avro decoder

2022-06-07 Thread GitBox


eswardhinakaran-toast opened a new pull request, #36796:
URL: https://github.com/apache/spark/pull/36796

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request, #36795: [SPARK-39407][SQL][TESTS] Fix ParquetIOSuite to handle the case where DNS responses on

2022-06-07 Thread GitBox


dongjoon-hyun opened a new pull request, #36795:
URL: https://github.com/apache/spark/pull/36795

   … 
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eswardhinakaran-toast opened a new pull request, #36794: Add Serializable Trait to DirectBinaryAvroDecoder

2022-06-07 Thread GitBox


eswardhinakaran-toast opened a new pull request, #36794:
URL: https://github.com/apache/spark/pull/36794

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


xinrong-databricks commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r891581163


##
python/pyspark/sql/session.py:
##
@@ -952,12 +953,30 @@ def createDataFrame(  # type: ignore[misc]
 schema = [x.encode("utf-8") if not isinstance(x, str) else x for x 
in schema]
 
 try:
-import pandas
+import pandas as pd
 
 has_pandas = True
 except Exception:
 has_pandas = False
-if has_pandas and isinstance(data, pandas.DataFrame):
+
+try:
+import numpy as np

Review Comment:
   All NumPy versions should be fine as long as pandas can convert the ndarray to a pandas DataFrame. So we only check the pandas minimum version.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1149066369

   Thank YOU for your contribution. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

2022-06-07 Thread GitBox


dongjoon-hyun commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891607243


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with 
SharedSparkSession {
   }
 }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   I'd expect something like `BytesColumnVector should not throw 
RuntimeException due to overflow`.
   ```
   Caused by: java.lang.RuntimeException: Overflow of newLength. 
smallBuffer.length=1073741824, nextElemLength=1048576
at 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(BytesColumnVector.java:311)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zr-msft commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


zr-msft commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1149052032

   thank you @dongjoon-hyun, I made the assumption that the live site was 
updated after a PR was merged. I see my updates on the rc5 docs and thank you 
for the quick response


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srielau commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-07 Thread GitBox


srielau commented on code in PR #36693:
URL: https://github.com/apache/spark/pull/36693#discussion_r891606582


##
sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala:
##
@@ -36,13 +36,31 @@ class AnalysisException protected[sql] (
 @transient val plan: Option[LogicalPlan] = None,
 val cause: Option[Throwable] = None,
 val errorClass: Option[String] = None,
+val errorSubClass: Option[String] = None,
 val messageParameters: Array[String] = Array.empty)
   extends Exception(message, cause.orNull) with SparkThrowable with 
Serializable {
 
+protected[sql] def this(message: String,

Review Comment:
   It's needed for binary compatibility :-(



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

2022-06-07 Thread GitBox


dongjoon-hyun commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891605317


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with 
SharedSparkSession {
   }
 }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   Please revise the test case name too. A test case name should be self-describing about what the test body does.
   
   For example, we can add the test case as an `ignored` test before upgrading `hive-storage-api`. In addition, we can make the test pass with `hive-storage-api` 2.8.1 too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Eugene-Mark commented on a diff in pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-07 Thread GitBox


Eugene-Mark commented on code in PR #36499:
URL: https://github.com/apache/spark/pull/36499#discussion_r891588142


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala:
##
@@ -96,4 +97,29 @@ private case object TeradataDialect extends JdbcDialect {
   override def getLimitClause(limit: Integer): String = {
 ""
   }
+
+  override def getCatalystType(
+sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): 
Option[DataType] = {
+if (sqlType == Types.NUMERIC) {
+  if (md == null) {
+Option(DecimalType.SYSTEM_DEFAULT)
+  } else {
+val scale = md.build().getLong("scale")
+// In Teradata, Number or Number(*) means precision and scale is 
flexible.
+// However, the scale returned from JDBC is 0, which leads to 
fractional part loss.
+// Handle this special case by adding explicit conversion to system 
default decimal type.
+// Note, even if the NUMBER is defined with explicit precision and 
scale like NUMBER(20, 0),
+// Spark will treat it as DecimalType.SYSTEM_DEFAULT, which is 
NUMBER(38,18)
+if (scale == 0) {
+  Option(DecimalType.SYSTEM_DEFAULT)
+} else {
+  // In Teradata, Number(*, scale) will return size, namely precision, 
as 40,
+  // which conflicts to DecimalType.MAX_PRECISION
+  Option(DecimalType(Math.min(size, DecimalType.MAX_PRECISION), 
scale.toInt))

Review Comment:
   Thanks for this comment! It points me to something more reasonable than my current approach. In Teradata, only Number(\*)/Number, Number(*, scale) and Number(precision, scale) are valid forms, which means that when the scale is flexible, the precision returned must be 40. So we don't need to convert every scale = 0 case to the default decimal type; we only need to do it when precision = 40 is detected. That way we respect the user's explicit scale = 0 settings, e.g. Number(20, 0) will be converted to DecimalType(20, 0).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Eugene-Mark commented on a diff in pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-07 Thread GitBox


Eugene-Mark commented on code in PR #36499:
URL: https://github.com/apache/spark/pull/36499#discussion_r891588142


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala:
##
@@ -96,4 +97,29 @@ private case object TeradataDialect extends JdbcDialect {
   override def getLimitClause(limit: Integer): String = {
 ""
   }
+
+  override def getCatalystType(
+sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): 
Option[DataType] = {
+if (sqlType == Types.NUMERIC) {
+  if (md == null) {
+Option(DecimalType.SYSTEM_DEFAULT)
+  } else {
+val scale = md.build().getLong("scale")
+// In Teradata, Number or Number(*) means precision and scale is 
flexible.
+// However, the scale returned from JDBC is 0, which leads to 
fractional part loss.
+// Handle this special case by adding explicit conversion to system 
default decimal type.
+// Note, even if the NUMBER is defined with explicit precision and 
scale like NUMBER(20, 0),
+// Spark will treat it as DecimalType.SYSTEM_DEFAULT, which is 
NUMBER(38,18)
+if (scale == 0) {
+  Option(DecimalType.SYSTEM_DEFAULT)
+} else {
+  // In Teradata, Number(*, scale) will return size, namely precision, 
as 40,
+  // which conflicts to DecimalType.MAX_PRECISION
+  Option(DecimalType(Math.min(size, DecimalType.MAX_PRECISION), 
scale.toInt))

Review Comment:
   Thanks for this comment! It points me to something more reasonable than my current approach. In Teradata, only Number(*)/Number, Number(*, scale) and Number(precision, scale) are valid forms, which means that when the scale is flexible, the precision returned must be 40. So we don't need to convert every scale = 0 case to the default decimal type; we only need to do it when precision = 40 is detected. That way we respect the user's explicit scale = 0 settings, e.g. Number(20, 0) will be converted to DecimalType(20, 0).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


xinrong-databricks commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r891581163


##
python/pyspark/sql/session.py:
##
@@ -952,12 +953,30 @@ def createDataFrame(  # type: ignore[misc]
 schema = [x.encode("utf-8") if not isinstance(x, str) else x for x 
in schema]
 
 try:
-import pandas
+import pandas as pd
 
 has_pandas = True
 except Exception:
 has_pandas = False
-if has_pandas and isinstance(data, pandas.DataFrame):
+
+try:
+import numpy as np

Review Comment:
   All NumPy versions should be fine as long as pandas can convert the ndarray to a pandas DataFrame.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-databricks opened a new pull request, #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-07 Thread GitBox


xinrong-databricks opened a new pull request, #36793:
URL: https://github.com/apache/spark/pull/36793

   ### What changes were proposed in this pull request?
   Accept numpy array in createDataFrame, with existing dtypes support.
   
   
   ### Why are the changes needed?
   As part of [SPARK-39405], 
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, NumPy array is accepted in createDataFrame now:
   ```py
   ```
   
   
   ### How was this patch tested?
   Unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891563533


##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +128,44 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =
+  
Array(metadata.map(_.identifier.database).getOrElse(tableIdent.database).orNull)
 new Table(
   name = tableIdent.table,

Review Comment:
   done



##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -117,14 +128,44 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
   case NonFatal(_) => None
 }
 val isTemp = sessionCatalog.isTempView(tableIdent)
+val qualifier =

Review Comment:
   hm updated.



##
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala:
##
@@ -97,8 +98,18 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
*/
   @throws[AnalysisException]("database does not exist")
   override def listTables(dbName: String): Dataset[Table] = {
-val tables = sessionCatalog.listTables(dbName).map(makeTable)
-CatalogImpl.makeDataset(tables, sparkSession)
+if (sessionCatalog.databaseExists(dbName)) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang closed pull request #36694: [MINOR][BUILD] Remove redundant maven `` definition

2022-06-07 Thread GitBox


LuciferYang closed pull request #36694: [MINOR][BUILD] Remove redundant maven 
`` definition
URL: https://github.com/apache/spark/pull/36694


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks commented on pull request #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


vli-databricks commented on PR #36792:
URL: https://github.com/apache/spark/pull/36792#issuecomment-1149003141

   @MaxGekk


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks opened a new pull request, #36792: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function hints

2022-06-07 Thread GitBox


vli-databricks opened a new pull request, #36792:
URL: https://github.com/apache/spark/pull/36792

   
   
   Refine ANSI error messages and remove 'To return NULL instead'
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   Improve error messaging for ANSI mode since the user may not even be aware that the query was returning NULLs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jerrypeng commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


jerrypeng commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891557370


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)

Review Comment:
   If the timeColumn is before the epoch, what would the timestamp long value be? Negative?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891537461


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   Actually we still need the change in `listTables`: 
   
   existing users do not use 3-part names; they just use 2-part names. In the 
case of `listTables`, existing users pass the `dbname` directly (so just `b`, 
not `a.b`). In this case, a catalog has to be chosen, and the default catalog is 
the Hive metastore.
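
   For reference, a minimal sketch of the two call shapes being discussed (the 
catalog and database names are hypothetical, and the 3-part form is the behavior 
this PR is adding):

   // plain db name: resolved against the default catalog (Hive metastore)
   spark.catalog.listTables("my_db")
   // 3-part name: would target the configured catalog "testcat" under this PR
   spark.catalog.listTables("testcat.my_db")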



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891534128


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -64,15 +65,34 @@ class Database(
 @Stable
 class Table(
 val name: String,
-@Nullable val database: String,
+@Nullable val catalog: String,
+@Nullable val namespace: Array[String],
 @Nullable val description: String,
 val tableType: String,
 val isTemporary: Boolean)
   extends DefinedByConstructorParams {
 
+  def this(name: String, database: String, description: String, tableType: 
String,
+isTemporary: Boolean) = {
+this(name, null, Array(database), description, tableType, isTemporary)
+  }
+
+  def database: String = {
+if (namespace == null) {
+  null
+} else if (namespace.length == 2) {

Review Comment:
   Right, I think we only care about array.length == 1, which is the database case.



##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -64,15 +65,34 @@ class Database(
 @Stable
 class Table(
 val name: String,
-@Nullable val database: String,
+@Nullable val catalog: String,
+@Nullable val namespace: Array[String],
 @Nullable val description: String,
 val tableType: String,
 val isTemporary: Boolean)
   extends DefinedByConstructorParams {
 
+  def this(name: String, database: String, description: String, tableType: 
String,
+isTemporary: Boolean) = {
+this(name, null, Array(database), description, tableType, isTemporary)
+  }
+
+  def database: String = {
+if (namespace == null) {
+  null
+} else if (namespace.length == 2) {

Review Comment:
   Right, I think we only care about array.length == 1, which is the database case. 
Will remove it.
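
   A minimal sketch of the accessor as discussed (illustrative only, not 
necessarily the PR's final code), returning a database only when the namespace 
has a single element:

   def database: String = {
     if (namespace != null && namespace.length == 1) namespace(0) else null
   }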



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jerrypeng commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-07 Thread GitBox


jerrypeng commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r891532480


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   Can you explain why this is correct now?  Please give an example.
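
   For reference, a small Scala sketch with hypothetical values showing how the 
two formulas differ for a pre-epoch timestamp (Scala's `%` keeps the sign of the 
dividend):

   val timestamp = -15L; val startTime = 0L; val slideDuration = 10L
   val remainder = (timestamp - startTime) % slideDuration   // -5
   // old formula: -15 - ((-15 - 0 + 10) % 10) == -10, but -15 falls in [-20, -10)
   val oldLastStart = timestamp - (timestamp - startTime + slideDuration) % slideDuration
   // new formula: the negative-remainder branch gives -15 - (-5) - 10 == -20, as expected
   val newLastStart =
     if (remainder < 0) timestamp - remainder - slideDuration else timestamp - remainder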



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Eugene-Mark commented on a diff in pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-07 Thread GitBox


Eugene-Mark commented on code in PR #36499:
URL: https://github.com/apache/spark/pull/36499#discussion_r891528661


##
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala:
##
@@ -96,4 +97,29 @@ private case object TeradataDialect extends JdbcDialect {
   override def getLimitClause(limit: Integer): String = {
 ""
   }
+
+  override def getCatalystType(
+sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): 
Option[DataType] = {
+if (sqlType == Types.NUMERIC) {

Review Comment:
   Good point! Will modify accordingly.
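
   A rough, hypothetical sketch of the direction being discussed (the `size == 0` 
condition and the precision/scale values are assumptions, not the PR's final 
mapping; assumes `java.sql.Types` and `org.apache.spark.sql.types._` are in 
scope inside `TeradataDialect`):

   override def getCatalystType(
       sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
     if (sqlType == Types.NUMERIC && size == 0) {
       // NUMBER with unspecified precision/scale: map to a wide decimal so
       // fractional digits are not silently truncated
       Some(DecimalType(DecimalType.MAX_PRECISION, 18))
     } else {
       None
     }
   }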



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-07 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r891497676


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   Yeah, if you agree I will remove the unnecessary change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks closed pull request #36791: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function…

2022-06-07 Thread GitBox


vli-databricks closed pull request #36791: [SPARK-39392][SQL][3.3] Refine ANSI 
error messages for try_* function…
URL: https://github.com/apache/spark/pull/36791


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vli-databricks opened a new pull request, #36791: [SPARK-39392][SQL][3.3] Refine ANSI error messages for try_* function…

2022-06-07 Thread GitBox


vli-databricks opened a new pull request, #36791:
URL: https://github.com/apache/spark/pull/36791

   … hints
   
   
   
   ### What changes were proposed in this pull request?
   
   Refine ANSI error messages and remove 'To return NULL instead'
   
   ### Why are the changes needed?
   
   Improve error messaging for ANSI mode, since the user may not even be aware that 
the query was returning NULLs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


dongjoon-hyun commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1148929814

   Hi, @zr-msft. Did you check the Apache Spark 3.3 RC5 documentation? It should be 
there.
   - 
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-docs/_site/index.html
   
   For branch-3.2, your contribution arrived after 3.2.1, so it will land in Apache 
Spark 3.2.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zr-msft commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


zr-msft commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1148923071

   @dongjoon-hyun I've periodically checked the docs site and I'm not seeing 
any changes show up based on commits I've added from this PR:
   * 
https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration
   * 
https://spark.apache.org/docs/3.2.1/running-on-kubernetes.html#configuration
   
   I'm also not seeing earlier commits show up:
   * 
https://github.com/apache/spark/commit/302cb2257b66642cd3de0f61a700293b8ac7b000
   * 
https://github.com/apache/spark/commit/476214bc1cc813f0a2332bee53dfc7248ebd2a66
   
   The most recent commit that shows up on this page is from Jul 18, 2021:  
   * 
https://github.com/apache/spark/commit/eea69c122f20577956c4a87a6d8eb59943c1c6f0 
-- https://spark.apache.org/docs/latest/running-on-kubernetes.html#prerequisites
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pan3793 commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-07 Thread GitBox


pan3793 commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1148900904

   Thanks for pinging me. I think the current `LdapAuthenticationProviderImpl` 
comes from a very old version of Hive without UTs, so the existing UTs cannot 
cover your change.
   Spark's `LdapAuthenticationProviderImpl` lacks a lot of functionality 
(including what this PR proposes to add) compared to Hive's master code, and I'm 
not sure whether the Spark community has a plan to sync the code and add more 
UTs for STS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-07 Thread GitBox


dtenedor commented on code in PR #36745:
URL: https://github.com/apache/spark/pull/36745#discussion_r891452022


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -2881,6 +2881,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val DEFAULT_COLUMN_ALLOWED_PROVIDERS =
+buildConf("spark.sql.defaultColumn.allowedProviders")
+  .internal()
+  .doc("List of table providers wherein SQL commands are permitted to 
assign DEFAULT column " +
+"values. Comma-separated list, whitespace ignored, case-insensitive.")
+  .version("3.4.0")
+  .stringConf
+  .createWithDefault("csv,json,orc,parquet")

Review Comment:
   @cloud-fan @gengliangwang this is a good point. I was thinking the default 
value of this conf contains the four data sources for which we actually support 
scanning default values. The primary purpose is to serve as a mechanism for 
banning default values with the other data sources; users are not expected to 
have to change this.
   
   As Gengliang mentions, it could be a possible escape hatch if we discover a 
bug in one data source later, and it's also useful for testing. Maybe we can 
extend the description of this conf to mention that in the normal/expected 
case, users don't have to change anything.
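
   For illustration, narrowing the allowlist would look something like this 
(hypothetical usage; typical users should not need to touch it):

   // allow DEFAULT column values only for Parquet tables
   spark.conf.set("spark.sql.defaultColumn.allowedProviders", "parquet")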



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum opened a new pull request, #36790: [SPARK-39402][SQL] Optimize ReplaceCTERefWithRepartition to support coalesce partitions

2022-06-07 Thread GitBox


wangyum opened a new pull request, #36790:
URL: https://github.com/apache/spark/pull/36790

   ### What changes were proposed in this pull request?
   
   Optimize `ReplaceCTERefWithRepartition` to support coalescing partitions. For 
example:
   
   Before this PR | After this PR
   -- | --
   
![image](https://user-images.githubusercontent.com/5399861/172430973-feb5de0c-373b-4a66-be49-d25648dcd24a.png)
 | 
![image](https://user-images.githubusercontent.com/5399861/172431034-5c5aa7c4-1cc5-4922-8fb0-a703cd6e9fec.png)
   
   ### Why are the changes needed?
   
   Reduce the overhead of shuffle as much as possible.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pan3793 opened a new pull request, #36789: [SPARK-39403] Add SPARK_SUBMIT_OPTS in spark-env.sh.template

2022-06-07 Thread GitBox


pan3793 opened a new pull request, #36789:
URL: https://github.com/apache/spark/pull/36789

   
   
   ### What changes were proposed in this pull request?
   
   Add SPARK_SUBMIT_OPTS in spark-env.sh.template
   
   ### Why are the changes needed?
   
   Spark supports using SPARK_SUBMIT_OPTS to pass properties to `spark-submit`, 
but does not mention it in `spark-env.sh.template`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Comment change only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srielau commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-07 Thread GitBox


srielau commented on code in PR #36693:
URL: https://github.com/apache/spark/pull/36693#discussion_r891425678


##
core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java:
##
@@ -39,11 +39,17 @@ public SparkOutOfMemoryError(OutOfMemoryError e) {
 }
 
 public SparkOutOfMemoryError(String errorClass, String[] 
messageParameters) {
-super(SparkThrowableHelper.getMessage(errorClass, messageParameters, 
""));
+super(SparkThrowableHelper.getMessage(errorClass, null,

Review Comment:
   Declined, it's Java.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srielau commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-07 Thread GitBox


srielau commented on code in PR #36693:
URL: https://github.com/apache/spark/pull/36693#discussion_r891425369


##
core/src/test/scala/org/apache/spark/SparkFunSuite.scala:
##
@@ -264,6 +264,87 @@ abstract class SparkFunSuite
 }
   }
 
+  /**
+   * Checks an exception with an error class against expected results.
+   * @param exception The exception to check
+   * @param errorClass    The expected error class identifying the error
+   * @param errorSubClass Optional the expected subclass, None if not given
+   * @param sqlState  Optional the expected SQLSTATE, not verified if not 
supplied
+   * @param parameters    A map of parameter names and values. The names are 
as defined
+   *  in the error-classes file.
+   */
+  protected def checkError(
+  exception: Exception with SparkThrowable,
+  errorClass: String,
+  errorSubClass: Option[String],
+  sqlState: Option[String],
+  parameters: Map[String, String],
+  matchPVals: Boolean = false): Unit = {
+assert(exception.getErrorClass === errorClass)
+if (exception.getErrorSubClass != null) { assert(errorSubClass.isDefined) }
+errorSubClass.foreach(subClass => assert(exception.getErrorSubClass === 
subClass))
+sqlState.foreach(state => assert(exception.getSqlState === state))
+val expectedParameters = (exception.getParameterNames zip 
exception.getMessageParameters).toMap
+if (matchPVals == true) {
+  assert(expectedParameters.size === parameters.size)
+  expectedParameters.foreach(
+exp => {
+  val parm = parameters.getOrElse(exp._1,
+throw new IllegalArgumentException("Missing parameter" + exp._1))
+  if (!exp._2.matches(parm)) {
+throw new IllegalArgumentException("(" + exp._1 + ", " + exp._2 +
+  ") does not match: " + parm)
+  }
+}
+  )
+} else {
+  assert(expectedParameters === parameters)
+}
+  }
+
+  protected def checkError(
+  exception: Exception with SparkThrowable,
+  errorClass: String,
+  errorSubClass: String,
+  sqlState: String,
+  parameters: Map[String, String]): Unit =
+checkError(exception, errorClass, Some(errorSubClass), Some(sqlState), 
parameters)
+
+  protected def checkError(
+  exception: Exception with SparkThrowable,
+  errorClass: String,
+  sqlState: String,
+  parameters: Map[String, String]): Unit =
+checkError(exception, errorClass, None, Some(sqlState), parameters)
+
+  protected def checkError(
+  exception: Exception with SparkThrowable,
+  errorClass: String,
+  parameters: Map[String, String]): Unit =
+checkError(exception, errorClass, None, None, parameters)
+
+  /**
+   * Checks an exception with an error class against expected results.
+   * @param exception The exception to check
+   * @param errorClass    The expected error class identifying the error
+   * @param sqlState  Optional the expected SQLSTATE, not verified if not 
supplied
+   * @param parameters    An array of values. This does not verify the right 
name association.
+   */
+  protected def checkError(
+  exception: Exception with SparkThrowable,
+  errorClass: String,
+  sqlState: String,
+  parameters: Array[String]): Unit =

Review Comment:
   Removed
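
   For orientation, a hypothetical usage of the Map-based overload shown above 
(the error class, SQLSTATE, and parameter values are illustrative and assume 
ANSI mode is enabled; they are not taken verbatim from the error-classes file):

   val e = intercept[SparkArithmeticException] {
     sql("SELECT 1/0").collect()
   }
   checkError(
     exception = e,
     errorClass = "DIVIDE_BY_ZERO",
     sqlState = "22012",
     parameters = Map("config" -> "spark.sql.ansi.enabled"))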



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


