[GitHub] [spark] sadikovi commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

2022-06-03 Thread GitBox


sadikovi commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1145652717

   @gengliangwang Can you review?
   
   Thanks to the comments from the reviewers, I noticed that there could be 
inconsistencies when writing a timestamp with the local time zone and reading it 
back as timestamp_ntz. For example, a value of 2020-01-01 00:00:00 written in the 
UTC+1 local time zone will be read back as 2019-12-31 23:00:00. This is because we 
always store timestamps in UTC and we do not retain information about which time 
zone was used when writing the timestamp.
   
   Could you advise on how to proceed? I am not sure we can do much about it 
because we don't store time zone information.
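   
   For illustration, here is a minimal PySpark sketch of the round trip described 
above. It is only a sketch: the JDBC URL, table name and credentials are 
placeholders, and the exact option/config for reading the column back as 
timestamp_ntz is whatever this PR ends up exposing.
   ```python
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()

   url = "jdbc:postgresql://localhost:5432/test"      # placeholder endpoint
   props = {"user": "user", "password": "password"}   # placeholder credentials

   # The writer's session time zone is UTC+1, so the local wall clock
   # 2020-01-01 00:00:00 is stored as the instant 2019-12-31 23:00:00 UTC.
   spark.conf.set("spark.sql.session.timeZone", "Etc/GMT-1")  # UTC+1
   spark.sql("SELECT TIMESTAMP'2020-01-01 00:00:00' AS ts") \
       .write.jdbc(url, "ts_table", mode="overwrite", properties=props)

   # Reading the column back as timestamp_ntz can only surface the stored UTC
   # wall clock, 2019-12-31 23:00:00, because the writer's zone was not kept.
   spark.read.jdbc(url, "ts_table", properties=props).show()
   ```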





[GitHub] [spark] olaky commented on pull request #36752: [SPARK-39259][SQL][3.3] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


olaky commented on PR #36752:
URL: https://github.com/apache/spark/pull/36752#issuecomment-1145665119

   Thanks for picking this up so quickly. I wanted to first see whether the build 
passes and then update the comment.





[GitHub] [spark] gengliangwang commented on pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

2022-06-03 Thread GitBox


gengliangwang commented on PR #36726:
URL: https://github.com/apache/spark/pull/36726#issuecomment-1145676931

   @sadikovi yes will do. I just moved home. Sorry for the late reply.





[GitHub] [spark] olaky commented on pull request #36753: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


olaky commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1145711345

   There are some Python errors in the build, which look to me like a problem with 
the pandas or Python versions in the build pipeline. In any case, I really doubt 
that I broke those tests.





[GitHub] [spark] HyukjinKwon commented on pull request #36756: [SPARK-39369][INFRA] Use JAVA_OPTS for AppVeyer build to increase the memory properly

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36756:
URL: https://github.com/apache/spark/pull/36756#issuecomment-1145722312

   Let me merge this into master to recover the build.
   
   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #36756: [SPARK-39369][INFRA] Use JAVA_OPTS for AppVeyer build to increase the memory properly

2022-06-03 Thread GitBox


HyukjinKwon closed pull request #36756: [SPARK-39369][INFRA] Use JAVA_OPTS for 
AppVeyer build to increase the memory properly
URL: https://github.com/apache/spark/pull/36756





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36701:
URL: https://github.com/apache/spark/pull/36701#discussion_r888738083


##
python/pyspark/tests/test_shuffle.py:
##
@@ -54,6 +63,59 @@ def test_medium_dataset(self):
         self.assertTrue(m.spills >= 1)
         self.assertEqual(sum(sum(v) for k, v in m.items()), sum(range(self.N)) * 3)
 
+    def test_shuffle_data_with_multiple_locations(self):
+        # SPARK-39179: Test shuffle of data with multiple location also check
+        # shuffle locations get randomized
+
+        with tempfile.TemporaryDirectory() as tempdir1, tempfile.TemporaryDirectory() as tempdir2:
+            original = None
+            if "SPARK_LOCAL_DIRS" in os.environ:
+                original = os.environ["SPARK_LOCAL_DIRS"]

Review Comment:
   you can use `os.environ.get("SPARK_LOCAL_DIRS", None)`
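   
   A minimal sketch of the save-and-restore pattern this suggestion points at, 
assuming the test body from the diff above (`tempdir1`/`tempdir2` and the env-var 
name are taken from it; everything else is illustrative):
   ```python
   import os
   import tempfile

   with tempfile.TemporaryDirectory() as tempdir1, tempfile.TemporaryDirectory() as tempdir2:
       # One call instead of the "in os.environ" check plus indexing.
       original = os.environ.get("SPARK_LOCAL_DIRS", None)
       try:
           os.environ["SPARK_LOCAL_DIRS"] = tempdir1 + "," + tempdir2
           # ... run the shuffle code under the overridden local dirs ...
           assert os.environ["SPARK_LOCAL_DIRS"].count(",") == 1
       finally:
           # Restore the environment so other tests are unaffected.
           if original is None:
               os.environ.pop("SPARK_LOCAL_DIRS", None)
           else:
               os.environ["SPARK_LOCAL_DIRS"] = original
   ```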






[GitHub] [spark] HyukjinKwon commented on pull request #36756: [SPARK-39369][INFRA] Use JAVA_OPTS for AppVeyer build to increase the memory properly

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36756:
URL: https://github.com/apache/spark/pull/36756#issuecomment-1145741362

   Seems to be working fine. Let me get this in to recover the build.
   
   Merged to master.





[GitHub] [spark] HyukjinKwon opened a new pull request, #36758: [SPARK-39372][INFRA][R] Update R version to 4.2.0 in AppVeyor

2022-06-03 Thread GitBox


HyukjinKwon opened a new pull request, #36758:
URL: https://github.com/apache/spark/pull/36758

   ### What changes were proposed in this pull request?
   
   This PR updates AppVeyor to use the latest R version 4.2.0.
   
   ### Why are the changes needed?
   
   To test the latest R. The R community tends to use the latest versions 
aggressively.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only
   
   ### How was this patch tested?
   
   CI in this PR should test it out.





[GitHub] [spark] HyukjinKwon closed pull request #36756: [SPARK-39369][INFRA] Use JAVA_OPTS for AppVeyer build to increase the memory properly

2022-06-03 Thread GitBox


HyukjinKwon closed pull request #36756: [SPARK-39369][INFRA] Use JAVA_OPTS for 
AppVeyer build to increase the memory properly
URL: https://github.com/apache/spark/pull/36756





[GitHub] [spark] MaxGekk commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


MaxGekk commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1145741825

   > There are some Python errors in the build, which seem to me like there is 
a problem with pandas or python versions in the build pipeline.
   
   Looking at the GitHub Actions runs for commits in 
https://github.com/apache/spark/commits/branch-3.2, they have been broken for some 
time already. cc @HyukjinKwon @ueshin @itholic





[GitHub] [spark] HyukjinKwon commented on pull request #36757: [SPARK-39371][DOCS][Core] Review and fix issues in Scala/Java API docs of Core module

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36757:
URL: https://github.com/apache/spark/pull/36757#issuecomment-1145743251

   Merged to master and branch-3.3.





[GitHub] [spark] HyukjinKwon closed pull request #36757: [SPARK-39371][DOCS][Core] Review and fix issues in Scala/Java API docs of Core module

2022-06-03 Thread GitBox


HyukjinKwon closed pull request #36757: [SPARK-39371][DOCS][Core] Review and 
fix issues in Scala/Java API docs of Core module
URL: https://github.com/apache/spark/pull/36757





[GitHub] [spark] HyukjinKwon commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1145749316

   Let me fix it





[GitHub] [spark] HyukjinKwon commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1145754894

   https://github.com/apache/spark/pull/36759





[GitHub] [spark] HyukjinKwon opened a new pull request, #36759: [SPARK-39373][PYTHON][3.2] Recover branch-3.2 build broken by SPARK-39273 and SPARK-39252

2022-06-03 Thread GitBox


HyukjinKwon opened a new pull request, #36759:
URL: https://github.com/apache/spark/pull/36759

   ### What changes were proposed in this pull request?
   
   Backporting SPARK-39273 and SPARK-39252 introduced a couple of mistakes into 
branch-3.2. This PR fixes them: one is to avoid an exact match in a test (the 
pandas version differs), and the other is an unused import.
   
   ### Why are the changes needed?
   
   To recover the build
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   CI in this PR should test it out.
   
   





[GitHub] [spark] MaxGekk commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


MaxGekk commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1145768022

   Let's wait for https://github.com/apache/spark/pull/36759 and then 
re-trigger the build via an empty commit:
   ```
   $ git commit --allow-empty -m "Trigger build"
   ``` 





[GitHub] [spark] pralabhkumar commented on a diff in pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-03 Thread GitBox


pralabhkumar commented on code in PR #36701:
URL: https://github.com/apache/spark/pull/36701#discussion_r888778272


##
python/pyspark/tests/test_shuffle.py:
##
@@ -54,6 +63,59 @@ def test_medium_dataset(self):
         self.assertTrue(m.spills >= 1)
         self.assertEqual(sum(sum(v) for k, v in m.items()), sum(range(self.N)) * 3)
 
+    def test_shuffle_data_with_multiple_locations(self):
+        # SPARK-39179: Test shuffle of data with multiple location also check
+        # shuffle locations get randomized
+
+        with tempfile.TemporaryDirectory() as tempdir1, tempfile.TemporaryDirectory() as tempdir2:
+            original = None
+            if "SPARK_LOCAL_DIRS" in os.environ:
+                original = os.environ["SPARK_LOCAL_DIRS"]

Review Comment:
   done






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36758: [SPARK-39372][INFRA][R] Update R version to 4.2.0 in AppVeyor

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36758:
URL: https://github.com/apache/spark/pull/36758#discussion_r10253


##
dev/appveyor-install-dependencies.ps1:
##
@@ -129,7 +129,7 @@ $env:PATH = "$env:HADOOP_HOME\bin;" + $env:PATH
 Pop-Location
 
 # == R
-$rVer = "4.0.2"
+$rVer = "4.2.0"
 $rToolsVer = "4.0.2"

Review Comment:
   While the tests passed, it failed to download RTools 4.2.0. I reverted the 
RTools upgrade here for now.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36758: [SPARK-39372][INFRA][R] Update R version to 4.2.0 in AppVeyor

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36758:
URL: https://github.com/apache/spark/pull/36758#discussion_r10253


##
dev/appveyor-install-dependencies.ps1:
##
@@ -129,7 +129,7 @@ $env:PATH = "$env:HADOOP_HOME\bin;" + $env:PATH
 Pop-Location
 
 # == R
-$rVer = "4.0.2"
+$rVer = "4.2.0"
 $rToolsVer = "4.0.2"

Review Comment:
   While the tests passed (one test failed but seems flaky), it failed to 
download RTools 4.2.0. I reverted the RTools upgrade here for now.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36759: [SPARK-39373][PYTHON][3.2] Recover branch-3.2 build broken by SPARK-39273 and SPARK-39252

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36759:
URL: https://github.com/apache/spark/pull/36759#discussion_r21336


##
python/pyspark/pandas/tests/test_series.py:
##
@@ -2201,7 +2201,7 @@ def test_mad(self):
 pser.index = pmidx
 psser = ps.from_pandas(pser)
 
-self.assert_eq(pser.mad(), psser.mad())
+self.assert_eq(pser.mad(), psser.mad(), check_exact=False)

Review Comment:
   ```suggestion
   self.assert_eq(pser.mad(), psser.mad(), almost=True)
   ```






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36758:
URL: https://github.com/apache/spark/pull/36758#discussion_r10253


##
dev/appveyor-install-dependencies.ps1:
##
@@ -129,7 +129,7 @@ $env:PATH = "$env:HADOOP_HOME\bin;" + $env:PATH
 Pop-Location
 
 # == R
-$rVer = "4.0.2"
+$rVer = "4.2.0"
 $rToolsVer = "4.0.2"

Review Comment:
   While the tests passed, it failed to download RTools 4.2.0. I reverted the 
RTools upgrade here for now.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36758:
URL: https://github.com/apache/spark/pull/36758#discussion_r58660


##
R/pkg/R/mllib_classification.R:
##
@@ -331,7 +331,7 @@ setMethod("spark.logit", signature(data = "SparkDataFrame", 
formula = "formula")
 }
 
 if (!is.null(upperBoundsOnCoefficients)) {
-  if (class(upperBoundsOnCoefficients) != "matrix") {
+  if (is.matrix(upperBoundsOnCoefficients)) {

Review Comment:
   ```suggestion
 if (!is.matrix(upperBoundsOnCoefficients)) {
   ```



##
R/pkg/R/mllib_classification.R:
##
@@ -322,7 +322,7 @@ setMethod("spark.logit", signature(data = "SparkDataFrame", 
formula = "formula")
 }
 
 if (!is.null(lowerBoundsOnCoefficients)) {
-  if (class(lowerBoundsOnCoefficients) != "matrix") {
+  if (is.matrix(lowerBoundsOnCoefficients)) {

Review Comment:
   ```suggestion
 if (!is.matrix(lowerBoundsOnCoefficients)) {
   ```






[GitHub] [spark] wangyum opened a new pull request, #36760: [SPARK-39374][SQL] Improve error message for user specified column list

2022-06-03 Thread GitBox


wangyum opened a new pull request, #36760:
URL: https://github.com/apache/spark/pull/36760

   ### What changes were proposed in this pull request?
   
   This PR improves the error message for a user-specified column list. For example:
   ```sql
   create table t1(c1 int, c2 bigint, c3 string) using parquet;
   insert into t1(c1, c2, c4) values(1, 2, 3);
   ```
   Before this PR:
   ```
   Cannot resolve column name c4; line 1 pos 0
   org.apache.spark.sql.AnalysisException: Cannot resolve column name c4; line 
1 pos 0
   ```
   After this PR:
   ```
   [MISSING_COLUMN] Column 'c4' does not exist. Did you mean one of the 
following? [c1, c2, c3]; line 1 pos 0
   org.apache.spark.sql.AnalysisException: [MISSING_COLUMN] Column 'c4' does 
not exist. Did you mean one of the following? [c1, c2, c3]; line 1 pos 0
   ```
   
   ### Why are the changes needed?
   
   Improve the error message.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing test.





[GitHub] [spark] gengliangwang opened a new pull request, #36761: [SPARK-39367][DOCS][SQL][3.2] Make SQL error objects private

2022-06-03 Thread GitBox


gengliangwang opened a new pull request, #36761:
URL: https://github.com/apache/spark/pull/36761

   
   
   ### What changes were proposed in this pull request?
   
   
   Partially backport https://github.com/apache/spark/pull/36754 to branch-3.2, 
making the following objects private so that they won't show up in the API docs:
   * QueryCompilationErrors
   * QueryExecutionErrors
   * QueryParsingErrors
   
   ### Why are the changes needed?
   
   Fix a bug in the docs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   ### How was this patch tested?
   
   GA tests





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36758:
URL: https://github.com/apache/spark/pull/36758#discussion_r888911462


##
R/pkg/R/serialize.R:
##
@@ -58,7 +58,12 @@ writeObject <- function(con, object, writeType = TRUE) {
   # Checking types is needed here, since 'is.na' only handles atomic vectors,
   # lists and pairlists
   if (type %in% c("integer", "character", "logical", "double", "numeric")) {
-if (is.na(object)) {
+if (is.na(object[[1]])) {

Review Comment:
   **R 4.1 and below:**
   
   ```
   Warning in if (is.na(c(1, 2))) print("abc") :
 the condition has length > 1 and only the first element will be used
   ```
   
   **R 4.2+:**
   
   ```
   Error in if (is.na(c(1, 2))) print("abc") : the condition has length > 1
   ```






[GitHub] [spark] HyukjinKwon commented on pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36758:
URL: https://github.com/apache/spark/pull/36758#issuecomment-1145930588

   Tests should pass now.





[GitHub] [spark] HyukjinKwon commented on pull request #36759: [SPARK-39373][PYTHON][3.2] Recover branch-3.2 build broken by SPARK-39273 and SPARK-39252

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36759:
URL: https://github.com/apache/spark/pull/36759#issuecomment-1145932875

   Merged to branch-3.2.





[GitHub] [spark] HyukjinKwon closed pull request #36759: [SPARK-39373][PYTHON][3.2] Recover branch-3.2 build broken by SPARK-39273 and SPARK-39252

2022-06-03 Thread GitBox


HyukjinKwon closed pull request #36759: [SPARK-39373][PYTHON][3.2] Recover 
branch-3.2 build broken by SPARK-39273 and SPARK-39252
URL: https://github.com/apache/spark/pull/36759





[GitHub] [spark] HyukjinKwon commented on pull request #36761: [SPARK-39367][DOCS][SQL][3.2] Make SQL error objects private

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36761:
URL: https://github.com/apache/spark/pull/36761#issuecomment-1145933920

   Merged to branch-3.2.





[GitHub] [spark] HyukjinKwon closed pull request #36761: [SPARK-39367][DOCS][SQL][3.2] Make SQL error objects private

2022-06-03 Thread GitBox


HyukjinKwon closed pull request #36761: [SPARK-39367][DOCS][SQL][3.2] Make SQL 
error objects private
URL: https://github.com/apache/spark/pull/36761





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36758:
URL: https://github.com/apache/spark/pull/36758#discussion_r888920686


##
R/pkg/R/serialize.R:
##
@@ -58,7 +58,12 @@ writeObject <- function(con, object, writeType = TRUE) {
   # Checking types is needed here, since 'is.na' only handles atomic vectors,
   # lists and pairlists
   if (type %in% c("integer", "character", "logical", "double", "numeric")) {
-if (is.na(object)) {
+if (is.na(object[[1]])) {
+  # Uses the first element for now to keep the behavior the same as R before
+  # 4.2.0. This is wrong because we should differentiate c(NA) from a
+  # single NA as the former means array(null) and the latter means null
+  # in Spark SQL. However, it requires non-trivial comparison to distinguish

Review Comment:
   e.g.) we should check if the input is a vector, list, array, etc., which is 
exactly what is being done in `getSerdeType`. However, this comparison here (to the 
best of my knowledge) is a shortcut to avoid the overhead of `getSerdeType`, so I 
just decided to leave it as is for now.






[GitHub] [spark] MaxGekk commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


MaxGekk commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1145942007

   @olaky Could you re-trigger the builds, please?





[GitHub] [spark] HyukjinKwon commented on pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36758:
URL: https://github.com/apache/spark/pull/36758#issuecomment-1145940475

   cc @felixcheung @shivaram @viirya too FYI 
   





[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-03 Thread GitBox


EnricoMi commented on code in PR #36150:
URL: https://github.com/apache/spark/pull/36150#discussion_r889066692


##
sql/core/src/main/scala/org/apache/spark/sql/Melt.scala:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.analysis.{AnsiTypeCoercion, TypeCoercion}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Cast, 
Expression, Literal}
+import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
+import org.apache.spark.sql.catalyst.plans.logical.Expand
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, StringType}
+
+private[sql] object Melt {
+  def of[_](ds: Dataset[_],
+ids: Seq[String],
+values: Seq[String] = Seq.empty,
+dropNulls: Boolean = false,
+variableColumnName: String = "variable",
+valueColumnName: String = "value"): DataFrame = {
+// values should be disjoint to ids
+if (values.intersect(ids).nonEmpty) {
+  throw new IllegalArgumentException(s"A column cannot be both an id and a 
value column: " +

Review Comment:
   We are down to 2 melt-specific errors, and `MISSING_COLUMNS` is gone, so 
never mind.






[GitHub] [spark] akpatnam25 commented on a diff in pull request #36734: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is se

2022-06-03 Thread GitBox


akpatnam25 commented on code in PR #36734:
URL: https://github.com/apache/spark/pull/36734#discussion_r889067758


##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -1885,6 +1885,16 @@ private[spark] class DAGScheduler(
   mapOutputTracker.
 unregisterMergeResult(shuffleId, reduceId, bmAddress, 
Option(mapIndex))
 }
+  } else {
+// Unregister the merge result of  if there 
is a FetchFailed event
+// and is not a  MetaDataFetchException which is signified by 
bmAddress being null
+if (bmAddress != null
+  && 
bmAddress.executorId.equals(BlockManagerId.SHUFFLE_MERGER_IDENTIFIER)) {

Review Comment:
   done



##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -1885,6 +1885,16 @@ private[spark] class DAGScheduler(
   mapOutputTracker.
 unregisterMergeResult(shuffleId, reduceId, bmAddress, 
Option(mapIndex))
 }
+  } else {
+// Unregister the merge result of  if there 
is a FetchFailed event
+// and is not a  MetaDataFetchException which is signified by 
bmAddress being null
+if (bmAddress != null
+  && 
bmAddress.executorId.equals(BlockManagerId.SHUFFLE_MERGER_IDENTIFIER)) {
+  assert(pushBasedShuffleEnabled, "Pushed based shuffle needs to " 
+
+"be enabled so that merge results are present.")

Review Comment:
   done






[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-03 Thread GitBox


EnricoMi commented on code in PR #36150:
URL: https://github.com/apache/spark/pull/36150#discussion_r889068185


##
sql/core/src/main/scala/org/apache/spark/sql/Melt.scala:
##
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.analysis.{AnsiTypeCoercion, TypeCoercion}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Cast, 
Expression, Literal}
+import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
+import org.apache.spark.sql.catalyst.plans.logical.Expand
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, StringType}
+
+private[sql] object Melt {

Review Comment:
   Alright, `Melt` is now a `LogicalPlan`, which already fixes / implements 
SPARK-39292.






[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-03 Thread GitBox


EnricoMi commented on code in PR #36150:
URL: https://github.com/apache/spark/pull/36150#discussion_r889072078


##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -2012,6 +2012,152 @@ class Dataset[T] private[sql](
   @scala.annotation.varargs
   def agg(expr: Column, exprs: Column*): DataFrame = groupBy().agg(expr, exprs 
: _*)
 
+  /**
+   * (Scala-specific)
+   * Unpivot a DataFrame from wide format to long format, optionally
+   * leaving identifier variables set.
+   *
+   * This function is useful to massage a DataFrame into a format where some
+   * columns are identifier variables (`ids`), while all other columns,
+   * considered measured variables (`values`), are "unpivoted" to the rows,
+   * leaving just two non-identifier columns, 'variable' and 'value'.
+   *
+   * {{{
+   *   val df = Seq((1, 11, 12L), (2, 21, 22L)).toDF("id", "int", "long")
+   *   df.show()
+   *   // output:
+   *   // +---+---+----+
+   *   // | id|int|long|
+   *   // +---+---+----+
+   *   // |  1| 11|  12|
+   *   // |  2| 21|  22|
+   *   // +---+---+----+
+   *
+   *   df.melt(Seq("id")).show()
+   *   // output:
+   *   // +---+--------+-----+
+   *   // | id|variable|value|

Review Comment:
   done






[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-03 Thread GitBox


EnricoMi commented on code in PR #36150:
URL: https://github.com/apache/spark/pull/36150#discussion_r889074533


##
sql/core/src/main/scala/org/apache/spark/sql/Melt.scala:
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.analysis.{AnsiTypeCoercion, TypeCoercion, 
UnresolvedAttribute}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Cast, 
Expression, Literal, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
+import org.apache.spark.sql.catalyst.plans.logical.Expand
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, StringType}
+
+private[sql] object Melt {
+  def of[_](
+  ds: Dataset[_],
+  ids: Seq[String],
+  values: Seq[String],
+  dropNulls: Boolean,
+  variableColumnName: String,
+  valueColumnName: String): DataFrame = {
+// values should be disjoint to ids
+if (values.intersect(ids).nonEmpty) {
+  throw new AnalysisException("MELT_ID_AND_VALUE_COLUMNS_NOT_DISJOINT", 
Array(
+ids.mkString(", "), values.mkString(", "), 
values.intersect(ids).mkString(", ")
+  ))
+}
+
+// id columns must not contain variable or value column name
+if (ids.contains(variableColumnName)) {
+  throw new AnalysisException("MELT_VARIABLE_COLUMN_IS_ID_COLUMN", Array(
+variableColumnName, ids.mkString(", ")
+  ))
+}
+if (ids.contains(valueColumnName)) {
+  throw new AnalysisException("MELT_VALUE_COLUMN_IS_ID_COLUMN", Array(
+valueColumnName, ids.mkString(", ")
+  ))
+}
+
+// if no values given, all non-id columns are melted
+val valueNameCandidates = ds.columns.diff(ids).toSeq
+val valueNames = 
Some(values).filter(_.nonEmpty).getOrElse(valueNameCandidates)
+
+// if there are no values given and no non-id columns exist, we cannot melt
+if (valueNames.isEmpty) {
+  throw new AnalysisException("MELT_REQUIRES_VALUE_COLUMNS", 
Array(ids.mkString(", ")))
+}
+
+// resolve given id and value column names
+val idExprs = ids.map(resolve(ds, _))
+val valueExprs = valueNames.map(resolve(ds, _))
+
+// all given ids and values should exist in ds
+val unresolvedIds = idExprs.filter(_.isLeft).map(_.swap.toOption.get)
+if (unresolvedIds.nonEmpty) {
+  val idNameCandidates = ds.columns.diff(valueNames).toSeq
+  throw new AnalysisException("MISSING_COLUMNS",
+Array(unresolvedIds.mkString(", "), idNameCandidates.mkString(", ")))
+}
+val unresolvedValues = valueExprs.filter(_.isLeft).map(_.swap.toOption.get)
+if (unresolvedValues.nonEmpty) {
+  throw new AnalysisException("MISSING_COLUMNS",
+Array(unresolvedValues.mkString(", "), valueNameCandidates.mkString(", 
")))
+}
+
+// all melted values have to have the same type
+val resolvedValues = valueExprs.filter(_.isRight).map(_.toOption.get)
+val valueType = resolvedValues.map(_.dataType).reduce(tightestCommonType)

Review Comment:
   The best I could do is use `findWiderType` / 
`findWiderTypeWithoutStringPromotion`, applied to `Melt` value columns in a new 
type coercion rule `MeltCoercion`.






[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-03 Thread GitBox


EnricoMi commented on code in PR #36150:
URL: https://github.com/apache/spark/pull/36150#discussion_r889077442


##
sql/core/src/main/scala/org/apache/spark/sql/Melt.scala:
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.analysis.{AnsiTypeCoercion, TypeCoercion, 
UnresolvedAttribute}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Cast, 
Expression, Literal, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
+import org.apache.spark.sql.catalyst.plans.logical.Expand
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{DataType, StringType}
+
+private[sql] object Melt {
+  def of[_](
+  ds: Dataset[_],
+  ids: Seq[String],
+  values: Seq[String],
+  dropNulls: Boolean,
+  variableColumnName: String,
+  valueColumnName: String): DataFrame = {

Review Comment:
   `Melt` is now a `LogicalPlan` with some rules in the analyser to melt all 
value columns except for the id columns if no value columns are given. The 
`Melt` logical plan is replaced by the equivalent `Expand` operation during 
analysis, once the id columns, value columns and output value type are resolved.
   
   This is much cleaner now, and allows for expressions, structs and `*` for id 
and value columns. Thanks for the pointers!






[GitHub] [spark] EnricoMi commented on pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-06-03 Thread GitBox


EnricoMi commented on PR #36150:
URL: https://github.com/apache/spark/pull/36150#issuecomment-1146105405

   @HyukjinKwon @cloud-fan which value do you prefer for the variable column?
   
   - `NamedExpression.name`
   - `NamedExpression.prettyName`





[GitHub] [spark] dtenedor commented on pull request #36672: [SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values

2022-06-03 Thread GitBox


dtenedor commented on PR #36672:
URL: https://github.com/apache/spark/pull/36672#issuecomment-1146161162

   @gengliangwang @HyukjinKwon @cloud-fan can someone please merge this in (or 
leave more review comment(s) if desired for another pass)?





[GitHub] [spark] singhpk234 commented on a diff in pull request #36760: [SPARK-39374][SQL] Improve error message for user specified column list

2022-06-03 Thread GitBox


singhpk234 commented on code in PR #36760:
URL: https://github.com/apache/spark/pull/36760#discussion_r889140462


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3424,9 +3424,10 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 i.userSpecifiedCols, "in the column list", resolver)
 
   i.userSpecifiedCols.map { col =>
-  i.table.resolve(Seq(col), resolver)
-.getOrElse(throw 
QueryCompilationErrors.cannotResolveUserSpecifiedColumnsError(
-  col, i.table))
+i.table.resolve(Seq(col), resolver)
+  .getOrElse(i.failAnalysis(
+errorClass = "MISSING_COLUMN",
+messageParameters = Array(col, 
i.table.output.map(_.name).mkString(", "

Review Comment:
   [question] A table can have more than 100 columns as well; showing all of the 
columns in the error message would make it unusually large. Are we OK with that?






[GitHub] [spark] dongjoon-hyun closed pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


dongjoon-hyun closed pull request #36758: [SPARK-39372][R] Support R 4.2.0
URL: https://github.com/apache/spark/pull/36758





[GitHub] [spark] dongjoon-hyun opened a new pull request, #36762: [SPARK-39259][SQL][TEST] Fix Scala 2.13 ClassCastException

2022-06-03 Thread GitBox


dongjoon-hyun opened a new pull request, #36762:
URL: https://github.com/apache/spark/pull/36762

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #36753: [SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36753:
URL: https://github.com/apache/spark/pull/36753#issuecomment-1146210977

   Here is the PR to fix the Scala 2.13 issue.
   - https://github.com/apache/spark/pull/36762





[GitHub] [spark] dongjoon-hyun commented on pull request #36752: [SPARK-39259][SQL][3.3] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36752:
URL: https://github.com/apache/spark/pull/36752#issuecomment-1146211548

   Hi @olaky and @MaxGekk. Unfortunately, this broke the Scala 2.13 build in 
master/branch-3.3 and RC4. I made a PR to fix it.
   - https://github.com/apache/spark/pull/36762





[GitHub] [spark] dongjoon-hyun commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 ClassCastException

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146213732

   cc @olaky, @MaxGekk, @HyukjinKwon





[GitHub] [spark] dongjoon-hyun commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146215037

   cc @viirya





[GitHub] [spark] dongjoon-hyun commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146234918

   cc @huaxingao , too





[GitHub] [spark] karenfeng opened a new pull request, #36763: [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN

2022-06-03 Thread GitBox


karenfeng opened a new pull request, #36763:
URL: https://github.com/apache/spark/pull/36763

   ### What changes were proposed in this pull request?
   
   Follows up from https://github.com/apache/spark/pull/31666. That PR introduced a bug where the qualified star expansion of a subquery alias containing a NATURAL/USING JOIN outputs duplicated columns.
   
   ### Why are the changes needed?
   
   Duplicated, hidden columns should not be output from a star expansion.
   
   ### Does this PR introduce _any_ user-facing change?
   
   The query
   
   ```
   val df1 = Seq((3, 8)).toDF("a", "b") 
   val df2 = Seq((8, 7)).toDF("b", "d") 
   val joinDF = df1.join(df2, "b")
   joinDF.alias("r").select("r.*")
   ```
   
   It now outputs a single column `b`, instead of two (duplicate) columns for `b`.
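   
   As a quick sanity check of the fixed behavior (continuing the snippet above in
   a spark-shell session; the printed schema is my expectation of the fix, not
   output copied from the PR):
   
   ```
   joinDF.alias("r").select("r.*").printSchema()
   // root
   //  |-- b: integer (nullable = false)
   //  |-- a: integer (nullable = false)
   //  |-- d: integer (nullable = false)
   ```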
   
   ### How was this patch tested?
   
   UTs





[GitHub] [spark] dongjoon-hyun commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146248944

   Thank you for your review and approval, @huaxingao !





[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-03 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r889237990


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   It turns out our code does not support `a.b` as table name:
   
   ```
   `also_table_name.my_table1` is not a valid name for tables/databases. Valid 
names only contain alphabet characters, numbers and _.
   org.apache.spark.sql.AnalysisException: `also_table_name.my_table1` is not a 
valid name for tables/databases. Valid names only contain alphabet characters, 
numbers and _.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.invalidNameForTableOrDatabaseError(QueryCompilationErrors.scala:572)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateName(SessionCatalog.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:350)
at 
org.apache.spark.sql.internal.CatalogSuite.createTable(CatalogSuite.scala:63)
at 
org.apache.spark.sql.internal.CatalogSuite.$anonfun$new$101(CatalogSuite.scala:677)
   ```






[GitHub] [spark] gengliangwang closed pull request #36672: [SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values

2022-06-03 Thread GitBox


gengliangwang closed pull request #36672: [SPARK-39265][SQL] Support vectorized 
Parquet scans with DEFAULT values
URL: https://github.com/apache/spark/pull/36672





[GitHub] [spark] gengliangwang commented on pull request #36675: [SPARK-39294][SQL] Support vectorized Orc scans with DEFAULT values

2022-06-03 Thread GitBox


gengliangwang commented on PR #36675:
URL: https://github.com/apache/spark/pull/36675#issuecomment-1146266746

   @dtenedor The PR https://github.com/apache/spark/pull/36672/ is merged. Can 
you remove the Parquet-related code in this one? I will review this one after 
that today.





[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-03 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r889237990


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   It turns out our code does not support `a.b` as table/database name:
   
   ```
   `also_table_name.my_table1` is not a valid name for tables/databases. Valid 
names only contain alphabet characters, numbers and _.
   org.apache.spark.sql.AnalysisException: `also_table_name.my_table1` is not a 
valid name for tables/databases. Valid names only contain alphabet characters, 
numbers and _.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.invalidNameForTableOrDatabaseError(QueryCompilationErrors.scala:572)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateName(SessionCatalog.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:350)
at 
org.apache.spark.sql.internal.CatalogSuite.createTable(CatalogSuite.scala:63)
at 
org.apache.spark.sql.internal.CatalogSuite.$anonfun$new$101(CatalogSuite.scala:677)
   ```
   
   In this case, we won't hit the ambiguity where `a.b` could mean either a database named `a.b` or a database `b` inside a catalog `a` for the `ListTables` API?
   






[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-03 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r889237990


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   It turns out our code does not support `a.b` as table/database name:
   
   ```
   `also_table_name.my_table1` is not a valid name for tables/databases. Valid 
names only contain alphabet characters, numbers and _.
   org.apache.spark.sql.AnalysisException: `also_table_name.my_table1` is not a 
valid name for tables/databases. Valid names only contain alphabet characters, 
numbers and _.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.invalidNameForTableOrDatabaseError(QueryCompilationErrors.scala:572)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateName(SessionCatalog.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:350)
at 
org.apache.spark.sql.internal.CatalogSuite.createTable(CatalogSuite.scala:63)
at 
org.apache.spark.sql.internal.CatalogSuite.$anonfun$new$101(CatalogSuite.scala:677)
   ```
   
   In this case, we won't hit the ambiguity where `a.b` could mean either a database named `a.b` or a database `b` inside a catalog `a` for the `ListTables` API?
   






[GitHub] [spark] dtenedor commented on pull request #36675: [SPARK-39294][SQL] Support vectorized Orc scans with DEFAULT values

2022-06-03 Thread GitBox


dtenedor commented on PR #36675:
URL: https://github.com/apache/spark/pull/36675#issuecomment-1146271997

   @gengliangwang sure, I just synced my Apache Spark repository fork to 
include the PR https://github.com/apache/spark/pull/36672 and then merged the 
changes into this branch. This PR now just shows changes to two files. Thanks 
for your help.





[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-03 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r889237990


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   It turns out our code does not support `a.b` as table/database name:
   
   ```
   `also_name.my_db` is not a valid name for tables/databases. Valid names only 
contain alphabet characters, numbers and _.
   org.apache.spark.sql.AnalysisException: `also_name.my_db` is not a valid 
name for tables/databases. Valid names only contain alphabet characters, 
numbers and _.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.invalidNameForTableOrDatabaseError(QueryCompilationErrors.scala:572)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateName(SessionCatalog.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:350)
at 
org.apache.spark.sql.internal.CatalogSuite.createTable(CatalogSuite.scala:63)
at 
org.apache.spark.sql.internal.CatalogSuite.$anonfun$new$101(CatalogSuite.scala:677)
   ```
   
   In this case, we won't hit the ambiguity where `a.b` could mean either a database named `a.b` or a database `b` inside a catalog `a` for the `ListTables` API?
   






[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-03 Thread GitBox


amaliujia commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r889237990


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -553,4 +570,100 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest {
 }.getMessage
 assert(errMsg.contains("my_temp_table is a temp view. 
'recoverPartitions()' expects a table"))
   }
+
+  test("three layer namespace compatibility - create managed table") {
+spark.conf.set("spark.sql.catalog.testcat", 
classOf[InMemoryCatalog].getName)
+val catalogName = "testcat"
+val dbName = "my_db"
+val tableName = "my_table"
+val tableSchema = new StructType().add("i", "int")
+val description = "this is a test table"
+
+val df = spark.catalog.createTable(
+  tableName = Array(catalogName, dbName, tableName).mkString("."),
+  source = classOf[FakeV2Provider].getName,
+  schema = tableSchema,
+  description = description,
+  options = Map.empty[String, String])
+assert(df.schema.equals(tableSchema))
+
+val testCatalog =
+  spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+val table = testCatalog.loadTable(Identifier.of(Array(dbName), tableName))
+assert(table.schema().equals(tableSchema))
+
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+assert(table.properties().get("comment").equals(description))
+  }
+
+  test("three layer namespace compatibility - create external table") {
+withTempDir { dir =>
+  val catalogName = "testcat"
+  val dbName = "my_db"
+  val tableName = "my_table"
+  val tableSchema = new StructType().add("i", "int")
+  val description = "this is a test table"
+
+  val df = spark.catalog.createTable(
+tableName = Array(catalogName, dbName, tableName).mkString("."),
+source = classOf[FakeV2Provider].getName,
+schema = tableSchema,
+description = description,
+options = Map("path" -> dir.getAbsolutePath))
+  assert(df.schema.equals(tableSchema))
+
+  val testCatalog =
+spark.sessionState.catalogManager.catalog("testcat").asTableCatalog
+  val table = testCatalog.loadTable(Identifier.of(Array(dbName), 
tableName))
+  assert(table.schema().equals(tableSchema))
+  
assert(table.properties().get("provider").equals(classOf[FakeV2Provider].getName))
+  assert(table.properties().get("comment").equals(description))
+  assert(table.properties().get("path").equals(dir.getAbsolutePath))
+}
+  }
+
+  test("three layer namespace compatibility - list tables") {

Review Comment:
   It turns out our code does not support `a.b` as table/database name:
   
   ```
   `also_table_name.my_db` is not a valid name for tables/databases. Valid 
names only contain alphabet characters, numbers and _.
   org.apache.spark.sql.AnalysisException: `also_table_name.my_db` is not a 
valid name for tables/databases. Valid names only contain alphabet characters, 
numbers and _.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.invalidNameForTableOrDatabaseError(QueryCompilationErrors.scala:572)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateName(SessionCatalog.scala:146)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:350)
at 
org.apache.spark.sql.internal.CatalogSuite.createTable(CatalogSuite.scala:63)
at 
org.apache.spark.sql.internal.CatalogSuite.$anonfun$new$101(CatalogSuite.scala:677)
   ```
   
   In this case, we won't hit the ambiguity where `a.b` could mean either a database named `a.b` or a database `b` inside a catalog `a` for the `ListTables` API?
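   
   To spell out the ambiguity with a hypothetical call (names reused from the
   test above, not taken from the PR itself):
   
   ```
   // If `.` were allowed in database names, this could mean either the database
   // `testcat.my_db` in the session catalog, or the database `my_db` inside the
   // catalog `testcat`. The name validation above rules out the first reading.
   spark.catalog.listTables("testcat.my_db")
   ```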
   






[GitHub] [spark] dongjoon-hyun commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146288710

   Thank you, @MaxGekk !





[GitHub] [spark] viirya commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


viirya commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146303346

   Nice catch!





[GitHub] [spark] dongjoon-hyun commented on pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36762:
URL: https://github.com/apache/spark/pull/36762#issuecomment-1146309982

   Thank you, @viirya and all.
   
   All tests passed in CI and I verified manually with Scala 2.13.
   
   Merged to master/3.3.





[GitHub] [spark] dongjoon-hyun closed pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`

2022-06-03 Thread GitBox


dongjoon-hyun closed pull request #36762: [SPARK-39259][SQL][TEST][FOLLOWUP] 
Fix Scala 2.13 `ClassCastException` in `ComputeCurrentTimeSuite`
URL: https://github.com/apache/spark/pull/36762





[GitHub] [spark] gengliangwang commented on pull request #36675: [SPARK-39294][SQL] Support vectorized Orc scans with DEFAULT values

2022-06-03 Thread GitBox


gengliangwang commented on PR #36675:
URL: https://github.com/apache/spark/pull/36675#issuecomment-1146349945

   Merging to master





[GitHub] [spark] gengliangwang closed pull request #36675: [SPARK-39294][SQL] Support vectorized Orc scans with DEFAULT values

2022-06-03 Thread GitBox


gengliangwang closed pull request #36675: [SPARK-39294][SQL] Support vectorized 
Orc scans with DEFAULT values
URL: https://github.com/apache/spark/pull/36675





[GitHub] [spark] dtenedor commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-03 Thread GitBox


dtenedor commented on code in PR #36745:
URL: https://github.com/apache/spark/pull/36745#discussion_r889272134


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala:
##
@@ -122,7 +122,7 @@ abstract class SessionCatalogSuite extends AnalysisTest 
with Eventually {
   }
 
   test("create table with default columns") {
-withBasicCatalog { catalog =>
+if (!isHiveExternalCatalog) withBasicCatalog { catalog =>

Review Comment:
   Good question. This test is part of the base class `SessionCatalogSuite`. It 
fails when running with the `HiveExternalSessionCatalogSuite` subclass, since 
Hive tables are not included in the allowlist of providers that allow DEFAULT 
columns. To simplify this, I edited the config to add Hive tables to the list 
of allowed providers and this test passes now in all cases.
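   
   Roughly the effect, as a hypothetical sketch (the exact provider list below is
   an assumption, not the value used in the PR):
   
   ```
   // Run the shared test body with hive included in the allowlist so the
   // HiveExternalSessionCatalogSuite subclass also accepts DEFAULT columns.
   withSQLConf(SQLConf.DEFAULT_COLUMN_ALLOWED_PROVIDERS.key ->
       "csv,json,orc,parquet,hive") {
     // ... existing "create table with default columns" test body ...
   }
   ```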



##
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala:
##
@@ -41,21 +41,28 @@ import org.apache.spark.sql.types.{MetadataBuilder, 
StructField, StructType}
  *
  * We can remove this rule once we implement all the catalog functionality in 
`V2SessionCatalog`.
  */
-class ResolveSessionCatalog(val catalogManager: CatalogManager)
+class ResolveSessionCatalog(val analyzer: Analyzer)
   extends Rule[LogicalPlan] with LookupCatalog {
+  val catalogManager = analyzer.catalogManager

Review Comment:
   SG, done.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -83,16 +83,26 @@ object ResolveDefaultColumns {
*
* @param analyzer  used for analyzing the result of parsing the 
expression stored as text.
* @param tableSchema   represents the names and types of the columns of the 
statement to process.
+   * @param tableProvider provider of the target table to store default values 
for, if any.
* @param statementType name of the statement being processed, such as 
INSERT; useful for errors.
* @return a copy of `tableSchema` with field metadata updated with the 
constant-folded values.
*/
   def constantFoldCurrentDefaultsToExistDefaults(
   analyzer: Analyzer,
   tableSchema: StructType,
+  tableProvider: Option[String],
   statementType: String): StructType = {
 if (SQLConf.get.enableDefaultColumns) {
   val newFields: Seq[StructField] = tableSchema.fields.map { field =>
 if (field.metadata.contains(CURRENT_DEFAULT_COLUMN_METADATA_KEY)) {
+  // Make sure that the target table has a provider that supports 
default column values.
+  val allowedProviders: Array[String] =
+SQLConf.get.getConf(SQLConf.DEFAULT_COLUMN_ALLOWED_PROVIDERS)

Review Comment:
   Good idea, done!






[GitHub] [spark] otterc commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-03 Thread GitBox


otterc commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r888604583


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -209,9 +243,13 @@ private AppShufflePartitionInfo 
getOrCreateAppShufflePartitionInfo(
 appShuffleInfo.getMergedShuffleIndexFilePath(shuffleId, 
shuffleMergeId, reduceId));
   File metaFile =
 appShuffleInfo.getMergedShuffleMetaFile(shuffleId, shuffleMergeId, 
reduceId);
+  // Make sure unuseful non-finalized merged data/index/meta files get 
cleaned up
+  // during service restart

Review Comment:
   Is this comment misplaced?



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -316,23 +356,25 @@ public String[] getMergedBlockDirs(String appId) {
   public void applicationRemoved(String appId, boolean cleanupLocalDirs) {
 logger.info("Application {} removed, cleanupLocalDirs = {}", appId, 
cleanupLocalDirs);
 AppShuffleInfo appShuffleInfo = appsShuffleInfo.remove(appId);
+cleanupAppAttemptPathInfoInDB(
+new AppAttemptId(appShuffleInfo.appId, appShuffleInfo.attemptId));
 if (null != appShuffleInfo) {
   mergedShuffleCleaner.execute(
-() -> closeAndDeletePartitionFilesIfNeeded(appShuffleInfo, 
cleanupLocalDirs));
+() -> closeAndDeletePartitionFilesOrDBIfNeeded(appShuffleInfo, 
cleanupLocalDirs, true));
 }
   }
 
-
   /**
* Clean up the AppShufflePartitionInfo for a specific AppShuffleInfo.
* If cleanupLocalDirs is true, the merged shuffle files will also be 
deleted.
* The cleanup will be executed in a separate thread.
*/
   @SuppressWarnings("SynchronizationOnLocalVariableOrMethodParameter")
   @VisibleForTesting
-  void closeAndDeletePartitionFilesIfNeeded(
+  void closeAndDeletePartitionFilesOrDBIfNeeded(
   AppShuffleInfo appShuffleInfo,
-  boolean cleanupLocalDirs) {
+  boolean cleanupLocalDirs,
+  boolean cleanupDB) {

Review Comment:
   `cleanupDB` -> `removeFromDb`?



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -286,7 +325,8 @@ public ManagedBuffer getMergedBlockData(
 shuffleId, shuffleMergeId, reduceId,
 ErrorHandler.BlockFetchErrorHandler.STALE_SHUFFLE_BLOCK_FETCH));
 }
-File dataFile = appShuffleInfo.getMergedShuffleDataFile(shuffleId, 
shuffleMergeId, reduceId);
+File dataFile =
+  appShuffleInfo.getMergedShuffleDataFile(shuffleId, shuffleMergeId, 
reduceId);

Review Comment:
   revert?



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -655,6 +747,194 @@ public void registerExecutor(String appId, 
ExecutorShuffleInfo executorInfo) {
 }
   }
 
+
+  @Override
+  public void close() {
+if (db != null) {
+  try {
+db.close();
+  } catch (IOException e) {
+logger.error("Exception closing leveldb with registered app paths info 
and "
++ "shuffle partition info", e);
+  }
+}
+  }
+
+  private void writeAppPathsInfoToDb(String appId, int attemptId, AppPathsInfo 
appPathsInfo) {
+if (db != null) {
+  try {
+byte[] key = getDbAppAttemptPathsKey(new AppAttemptId(appId, 
attemptId));
+String valueStr = mapper.writeValueAsString(appPathsInfo);
+byte[] value = valueStr.getBytes(StandardCharsets.UTF_8);
+db.put(key, value);
+  } catch (Exception e) {
+logger.error("Error saving registered app paths info", e);
+  }
+}
+  }
+
+  private void writeAppAttemptShuffleMergeInfo(
+  String appId,
+  int appAttemptId,
+  int shuffleId,
+  int shuffleMergeId) {
+if (db != null) {
+  // Write AppAttemptShuffleMergeId into LevelDB for finalized shuffles
+  try{
+byte[] dbKey = getDbAppAttemptShufflePartitionKey(
+new AppAttemptShuffleMergeId(appId, appAttemptId, shuffleId, 
shuffleMergeId));
+db.put(dbKey, new byte[0]);
+  } catch (Exception e) {
+logger.error("Error saving active app shuffle partition", e);
+  }
+}
+
+  }
+
+  private  T parseDbKey(String key, String prefix, Class valueType) {
+try {
+  String json = key.substring(prefix.length() + 1);
+  return mapper.readValue(json, valueType);
+} catch (Exception exception) {
+  logger.error("Exception while parsing the DB key {}", key);
+  return null;
+}
+  }
+
+  private AppPathsInfo parseDBAppAttemptPathsValue(byte[] value, AppAttemptId 
appAttemptId) {
+try {
+  return mapper.readValue(value, AppPathsInfo.class);
+} catch (Exception exception) {
+  logger.error("Exception while parsing the DB value for {}", 
appAttemptId);
+  return null;
+}
+  }
+
+  private AppAttemptId parseDbAppAttemptPathsKey(String key) {
+return p

[GitHub] [spark] HyukjinKwon commented on pull request #36758: [SPARK-39372][R] Support R 4.2.0

2022-06-03 Thread GitBox


HyukjinKwon commented on PR #36758:
URL: https://github.com/apache/spark/pull/36758#issuecomment-1146426846

   Thanks!!!





[GitHub] [spark] BryanCutler commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-03 Thread GitBox


BryanCutler commented on code in PR #36683:
URL: https://github.com/apache/spark/pull/36683#discussion_r889416243


##
python/pyspark/sql/pandas/conversion.py:
##
@@ -596,7 +596,7 @@ def _create_from_pandas_with_arrow(
 ]
 
 # Slice the DataFrame to be batched
-step = -(-len(pdf) // self.sparkContext.defaultParallelism)  # round 
int up
+step = self._jconf.arrowMaxRecordsPerBatch()

Review Comment:
   I thought this was to control how many partitions were in the rdd?  Each 
partition could have multiple batches, and probably should be capped at 
`arrowMaxRecordsPerBatch`, but since it was coming from a local Pandas 
DataFrame already in memory, that didn't seem to be a big deal.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -2575,6 +2575,18 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val ARROW_LOCAL_RELATION_THRESHOLD =
+buildConf("spark.sql.execution.arrow.localRelationThreshold")
+  .doc(
+"When converting Arrow batches to Spark DataFrame, local collections 
are used in the " +
+  "driver side if the byte size of Arrow batches is smaller than this 
threshold. " +
+  "Otherwise, the Arrow batches are sent and deserialized to Spark 
internal rows " +
+  "in the executors.")
+  .version("3.4.0")
+  .bytesConf(ByteUnit.BYTE)
+  .checkValue(_ >= 0, "This value must be equal to or greater than 0.")
+  .createWithDefaultString("48MB")

Review Comment:
   Is this the max size of each batch or all batches together?






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-03 Thread GitBox


HyukjinKwon commented on code in PR #36683:
URL: https://github.com/apache/spark/pull/36683#discussion_r889421112


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -2575,6 +2575,18 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val ARROW_LOCAL_RELATION_THRESHOLD =
+buildConf("spark.sql.execution.arrow.localRelationThreshold")
+  .doc(
+"When converting Arrow batches to Spark DataFrame, local collections 
are used in the " +
+  "driver side if the byte size of Arrow batches is smaller than this 
threshold. " +
+  "Otherwise, the Arrow batches are sent and deserialized to Spark 
internal rows " +
+  "in the executors.")
+  .version("3.4.0")
+  .bytesConf(ByteUnit.BYTE)
+  .checkValue(_ >= 0, "This value must be equal to or greater than 0.")
+  .createWithDefaultString("48MB")

Review Comment:
   It's all batches together, so it's pretty small.






[GitHub] [spark] github-actions[bot] commented on pull request #33934: [SPARK-36691][PYTHON] PythonRunner failed should pass error message to ApplicationMaster too

2022-06-03 Thread GitBox


github-actions[bot] commented on PR #33934:
URL: https://github.com/apache/spark/pull/33934#issuecomment-1146474433

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] wangyum closed pull request #36755: [SPARK-39368][SQL] Move `RewritePredicateSubquery` into `InjectRuntimeFilter`

2022-06-03 Thread GitBox


wangyum closed pull request #36755: [SPARK-39368][SQL] Move 
`RewritePredicateSubquery` into `InjectRuntimeFilter`
URL: https://github.com/apache/spark/pull/36755





[GitHub] [spark] wangyum commented on pull request #36755: [SPARK-39368][SQL] Move `RewritePredicateSubquery` into `InjectRuntimeFilter`

2022-06-03 Thread GitBox


wangyum commented on PR #36755:
URL: https://github.com/apache/spark/pull/36755#issuecomment-1146477751

   Merged to master.





[GitHub] [spark] tiagovrtr commented on pull request #33675: [SPARK-27997][K8S] Add support for kubernetes OAuth Token refresh

2022-06-03 Thread GitBox


tiagovrtr commented on PR #33675:
URL: https://github.com/apache/spark/pull/33675#issuecomment-1146480382

   Looping in @renato-farias as I'm leaving the BBC and he'll very much welcome 
this fix in a future release 🙂 





[GitHub] [spark] JoshRosen commented on pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


JoshRosen commented on PR #36654:
URL: https://github.com/apache/spark/pull/36654#issuecomment-1146483609

   This looks like it might be a fix for a correctness issue? If so, we should 
probably backport this change to maintenance branches for the other 
currently-supported Spark versions.





[GitHub] [spark] dongjoon-hyun commented on pull request #36675: [SPARK-39294][SQL] Support vectorized Orc scans with DEFAULT values

2022-06-03 Thread GitBox


dongjoon-hyun commented on PR #36675:
URL: https://github.com/apache/spark/pull/36675#issuecomment-1146485190

   Thank you all!





[GitHub] [spark] zhouyejoe commented on pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-03 Thread GitBox


zhouyejoe commented on PR #35906:
URL: https://github.com/apache/spark/pull/35906#issuecomment-1146485325

   Updated the PR. Addressed comments and fixed the broken build. Will also add more comments to the code change.





[GitHub] [spark] wangyum opened a new pull request, #36764: [SPARK-39377][SQL][TESTS] Normalize expr ids in ListQuery and Exists expressions

2022-06-03 Thread GitBox


wangyum opened a new pull request, #36764:
URL: https://github.com/apache/spark/pull/36764

   ### What changes were proposed in this pull request?
   
   This PR normalizes expr ids in `ListQuery` and `Exists` expressions. For example:
   ```scala
   val testRelation = LocalRelation($"a".int, $"b".int, $"c".int)
   val x = testRelation.as("x")
   val y = testRelation.as("y")
   
    val originalQuery = x.where($"x.a".in(ListQuery(y.select($"b")))).analyze
   
   println(Optimize.execute(normalizeExprIds(originalQuery)))
   ```
   Before this PR:
   ```
    Filter a#0 IN (list#0 [])
    :  +- Project [b#15]
    :     +- SubqueryAlias y
    :        +- LocalRelation <empty>, [a#14, b#15, c#16]
    +- LocalRelation <empty>, [a#0, b#0, c#0]
   ```
   After this PR:
   ```
    Filter a#0 IN (list#0 [])
    :  +- Project [b#0]
    :     +- SubqueryAlias y
    :        +- LocalRelation <empty>, [a#0, b#0, c#0]
    +- LocalRelation <empty>, [a#0, b#0, c#0]
   ```
   
   ### Why are the changes needed?
   
   `PlanTestBase.comparePlans` fails in some cases because the expr ids are not equal.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   manual test.





[GitHub] [spark] wangyum commented on a diff in pull request #36760: [SPARK-39374][SQL] Improve error message for user specified column list

2022-06-03 Thread GitBox


wangyum commented on code in PR #36760:
URL: https://github.com/apache/spark/pull/36760#discussion_r889430728


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3424,9 +3424,10 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 i.userSpecifiedCols, "in the column list", resolver)
 
   i.userSpecifiedCols.map { col =>
-  i.table.resolve(Seq(col), resolver)
-.getOrElse(throw 
QueryCompilationErrors.cannotResolveUserSpecifiedColumnsError(
-  col, i.table))
+i.table.resolve(Seq(col), resolver)
+  .getOrElse(i.failAnalysis(
+errorClass = "MISSING_COLUMN",
+messageParameters = Array(col, 
i.table.output.map(_.name).mkString(", "

Review Comment:
   The following query also shows all columns:
   ```sql
   select not_exist_col from tbl;
   ```






[GitHub] [spark] JoshRosen commented on a diff in pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


JoshRosen commented on code in PR #36654:
URL: https://github.com/apache/spark/pull/36654#discussion_r889431243


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala:
##
@@ -479,21 +479,24 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
* first to this node, then this node's subqueries and finally this node's 
children.
* When the partial function does not apply to a given node, it is left 
unchanged.
*/
-  def transformDownWithSubqueries(f: PartialFunction[PlanType, PlanType]): 
PlanType = {
+  def transformDownWithSubqueries(

Review Comment:
   Instead of modifying this method's signature, I slightly prefer adding an 
overload named `transformDownWithSubqueriesAndPruning` in order to remain 
consistent with the naming conventions for other transform methods. 
   
   Adding a new method also avoids introducing source or binary compatibility 
issues for third-party code that calls Catalyst APIs. Technically speaking, 
Catalyst APIs are considered internal to Spark and are subject to change 
between minor releases (see 
[source](https://github.com/apache/spark/blob/bb51add5c79558df863d37965603387d40cc4387/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala#L20-L24)),
 but I think it's nice to try to avoid API breakage when feasible.
   
   I also ran into problems when trying to call `transformDownWithSubqueries` 
without supplying any arguments in the first argument group: calling 
`transformDownWithSubqueries() { f }` resulted in confusing compilation errors.
   
   As a result, I'm going to submit a followup to this PR to split this into 
two methods as I've suggested above.






[GitHub] [spark] JoshRosen opened a new pull request, #36765: [SPARK-39259][FOLLOWUP] Fix source and binary incompatibilities in transformDownWithSubqueries

2022-06-03 Thread GitBox


JoshRosen opened a new pull request, #36765:
URL: https://github.com/apache/spark/pull/36765

   ### What changes were proposed in this pull request?
   
   This is a followup to #36654. That PR modified 
`QueryPlan.transformDownWithSubqueries` to add tree pattern pruning.
   
   In this PR, I roll back the change to that method's signature and instead 
add a new `transformDownWithSubqueriesAndPruning` method.
   
   ### Why are the changes needed?
   
   The original change breaks binary and source compatibility in Catalyst. 
Technically speaking, Catalyst APIs are considered internal to Spark and are 
subject to change between minor releases (see 
[source](https://github.com/apache/spark/blob/bb51add5c79558df863d37965603387d40cc4387/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala#L20-L24)),
 but I think it's nice to try to avoid API breakage when possible.
   
   While trying to compile some custom Catalyst code, I ran into issues when 
trying to call the `transformDownWithSubqueries` method without supplying 
pattern bits. If I do `transformDownWithSubqueries() { f }` then I get a 
compilation error. I think this is due to the first parameter group containing 
all default parameters.
   
   This PR's solution of adding a new `transformDownWithSubqueriesAndPruning` 
method solves this problem. It's also more consistent with the naming 
convention of other pruning-enabled tree transformation methods.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.
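   
   As a rough self-contained analogue of the approach (a stand-in `Plan` type and placeholder bodies, not the real `QueryPlan` code), the split keeps the original single-argument shape and routes it through the new pruning-enabled variant:
   
   ```scala
   // Standalone analogue, not the merged Catalyst code: `Plan` is a stand-in
   // type and the bodies are placeholders; only the two-method shape matters.
   object OverloadSplitSketch {
     type Plan = String
   
     // Keeps the pre-existing arity and call shape for current callers.
     def transformDownWithSubqueries(f: PartialFunction[Plan, Plan]): Plan =
       transformDownWithSubqueriesAndPruning(_ => true)(f)
   
     // The pruning-enabled variant carries the extra condition in its own group.
     def transformDownWithSubqueriesAndPruning(
         cond: Plan => Boolean)(f: PartialFunction[Plan, Plan]): Plan = {
       val plan: Plan = "Project(Filter(Scan))"  // placeholder "tree"
       if (cond(plan)) f.applyOrElse(plan, identity[Plan]) else plan
     }
   
     def main(args: Array[String]): Unit = {
       // Existing call sites keep compiling unchanged:
       println(transformDownWithSubqueries { case p => p.toUpperCase })
       // Pruning-aware callers opt in explicitly:
       println(transformDownWithSubqueriesAndPruning(_.contains("Filter")) {
         case p => p + " [visited]"
       })
     }
   }
   ```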


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [spark] JoshRosen commented on a diff in pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-03 Thread GitBox


JoshRosen commented on code in PR #36654:
URL: https://github.com/apache/spark/pull/36654#discussion_r889432632


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala:
##
@@ -479,21 +479,24 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
 * first to this node, then this node's subqueries and finally this node's children.
 * When the partial function does not apply to a given node, it is left unchanged.
 */
-  def transformDownWithSubqueries(f: PartialFunction[PlanType, PlanType]): PlanType = {
+  def transformDownWithSubqueries(

Review Comment:
   Here's my PR: https://github.com/apache/spark/pull/36765



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [spark] wangyum opened a new pull request, #36766: [SPARK-32184][SQL] Removes inferred predicate if it has InOrCorrelatedExistsSubquery

2022-06-03 Thread GitBox


wangyum opened a new pull request, #36766:
URL: https://github.com/apache/spark/pull/36766

   ### What changes were proposed in this pull request?
   
   This PR removes an inferred predicate if it contains an In or correlated Exists subquery.
   
   ### Why are the changes needed?
   
   An inferred predicate does not always have a benefit. For example:

   TPCH q18 before this PR | TPCH q18 after this PR
   -- | --
   ![image](https://user-images.githubusercontent.com/5399861/171972270-14091986-cee5-411c-9d19-f335c4eb2b75.jpg) | ![image](https://user-images.githubusercontent.com/5399861/171972406-a6b8b188-ae76-4036-abcb-47932f3bc83f.jpg)
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.
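   
   For reference, a small runnable sketch (toy data and a hypothetical object name, not the TPCH tables; whether the extra predicate is actually inferred depends on the Spark version and optimizer settings) of the query shape where an IN-subquery filter on one join key can be inferred onto the other join side:
   
   ```scala
   // Toy illustration: simplified tables stand in for TPCH; inspect the
   // optimized plan to see whether the IN-subquery predicate appears on
   // both sides of the join.
   import org.apache.spark.sql.SparkSession
   
   object InferredSubqueryPredicateDemo {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]").appName("inferred-subquery-demo").getOrCreate()
       import spark.implicits._
   
       Seq((1L, 10.0), (1L, 20.0), (2L, 5.0)).toDF("l_orderkey", "l_quantity")
         .createOrReplaceTempView("lineitem")
       Seq((1L, 100L), (2L, 200L)).toDF("o_orderkey", "o_custkey")
         .createOrReplaceTempView("orders")
   
       // The join condition o_orderkey = l_orderkey plus the IN-subquery filter
       // on o_orderkey is the shape where constraint inference may add the same
       // subquery predicate on l_orderkey, doubling the subquery's cost.
       spark.sql(
         """
           |SELECT o.o_orderkey, SUM(l.l_quantity) AS qty
           |FROM orders o JOIN lineitem l ON o.o_orderkey = l.l_orderkey
           |WHERE o.o_orderkey IN (
           |  SELECT l_orderkey FROM lineitem
           |  GROUP BY l_orderkey HAVING SUM(l_quantity) > 25
           |)
           |GROUP BY o.o_orderkey
           |""".stripMargin).explain(true)
   
       spark.stop()
     }
   }
   ```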
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [spark] MaxGekk commented on pull request #36765: [SPARK-39259][SQL][FOLLOWUP] Fix source and binary incompatibilities in transformDownWithSubqueries

2022-06-03 Thread GitBox


MaxGekk commented on PR #36765:
URL: https://github.com/apache/spark/pull/36765#issuecomment-1146546802

   +1, LGTM. Merging to master/3.3.
   Thank you, @JoshRosen.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [spark] MaxGekk closed pull request #36765: [SPARK-39259][SQL][FOLLOWUP] Fix source and binary incompatibilities in transformDownWithSubqueries

2022-06-03 Thread GitBox


MaxGekk closed pull request #36765: [SPARK-39259][SQL][FOLLOWUP] Fix source and 
binary incompatibilities in transformDownWithSubqueries
URL: https://github.com/apache/spark/pull/36765


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

