[GitHub] [spark] SparkQA removed a comment on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #3:
URL: https://github.com/apache/spark/pull/3#issuecomment-879558541


   **[Test build #140993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140993/testReport)** for PR 3 at commit [`ad3da13`](https://github.com/apache/spark/commit/ad3da133d369c9d3bcf587c5810f3d526b87ea61).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


SparkQA commented on pull request #3:
URL: https://github.com/apache/spark/pull/3#issuecomment-879614152


   **[Test build #140993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140993/testReport)** for PR 3 at commit [`ad3da13`](https://github.com/apache/spark/commit/ad3da133d369c9d3bcf587c5810f3d526b87ea61).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] sarutak edited a comment on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


sarutak edited a comment on pull request #3:
URL: https://github.com/apache/spark/pull/3#issuecomment-879613028


   Yeah, I'll do it. Thanks!





[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


sunchao commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669311658



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;

Review comment:
   hmm somehow it survived - let me remove it for real.







[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


viirya commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669311526



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala
##
@@ -31,24 +31,25 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
*  |---|-|-|---|---|---|---|---|
* col_2   400   300   200 200 200 200 200 200

Review comment:
   And also the PR description. :)







[GitHub] [spark] sarutak commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


sarutak commented on pull request #3:
URL: https://github.com/apache/spark/pull/3#issuecomment-879613028


   Yeah, I'll do it.





[GitHub] [spark] dongjoon-hyun commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


dongjoon-hyun commented on pull request #3:
URL: https://github.com/apache/spark/pull/3#issuecomment-879612808


   BTW, for the other old branches, we might need to revisit.





[GitHub] [spark] dongjoon-hyun commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


dongjoon-hyun commented on pull request #3:
URL: https://github.com/apache/spark/pull/3#issuecomment-879612523


   @gengliangwang I landed this to branch-3.2 since this is related to some CVEs.





[GitHub] [spark] dongjoon-hyun closed pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


dongjoon-hyun closed pull request #3:
URL: https://github.com/apache/spark/pull/3


   





[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


viirya commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669310575



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;

Review comment:
   Did you remove the `advance()` method? I don't see it being removed here.







[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


SparkQA commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879611464


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45509/
   





[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


sunchao commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669309831



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;

Review comment:
   No, it used to call `advance`, which does the same thing, but since we removed that method I just replaced the call with the old method body.
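
   To illustrate the inlining described in this comment, here is a minimal, self-contained Java sketch. It is not Spark's `VectorizedRleValuesReader`: the class `RunSkipSketch`, its `Group` type, and the behaviour of `readNextGroup()` (load the next group and reset the buffer index) are assumptions made only so the loop from the diff can run on its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the reader's run-tracking state; not Spark's class.
class RunSkipSketch {
  enum Mode { RLE, PACKED }

  // One decoded group: its mode and how many values it holds (assumed shape).
  static final class Group {
    final Mode mode;
    final int count;
    Group(Mode mode, int count) { this.mode = mode; this.count = count; }
  }

  private final Deque<Group> groups = new ArrayDeque<>();
  private Mode mode;
  int currentCount = 0;
  int currentBufferIdx = 0;

  RunSkipSketch(Group... input) {
    for (Group g : input) groups.add(g);
  }

  // Assumed behaviour: load the next group and reset the packed-buffer index.
  private void readNextGroup() {
    Group g = groups.remove();
    mode = g.mode;
    currentCount = g.count;
    currentBufferIdx = 0;
  }

  // Skips `total` values with the former advance(n) body written inline.
  void skipIntegers(int total) {
    int left = total;
    while (left > 0) {
      if (currentCount == 0) readNextGroup();
      int n = Math.min(left, currentCount);
      switch (mode) {
        case RLE:
          break;                 // a run holds one repeated value; no index to move
        case PACKED:
          currentBufferIdx += n; // step over n unpacked values
          break;
      }
      currentCount -= n;
      left -= n;
    }
  }
}
```

   In this sketch, skipping five values across an RLE group of three and a PACKED group of eight consumes the whole run and then moves the buffer index forward by two.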







[GitHub] [spark] dongjoon-hyun commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


dongjoon-hyun commented on pull request #0:
URL: https://github.com/apache/spark/pull/0#issuecomment-879611200


   I reviewed and merged https://github.com/apache/spark/pull/4 .
   Please rebase this PR to the master.





[GitHub] [spark] dongjoon-hyun closed pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite

2021-07-13 Thread GitBox


dongjoon-hyun closed pull request #4:
URL: https://github.com/apache/spark/pull/4


   





[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


sunchao commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669309298



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -255,6 +253,36 @@ private void readBatchInternal(
     state.advanceOffsetAndRowId(offset, rowId);
   }
 
+  /**
+   * Skip the next `n` values (either null or non-null) from this definition level reader and
+   * `valueReader`.
+   */
+  private void skipValues(
+      int n,
+      ParquetReadState state,
+      VectorizedValuesReader valuesReader,
+      ParquetVectorUpdater updater) {
+    while (n > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int num = Math.min(n, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == state.maxDefinitionLevel) {
+            updater.skipValues(num, valuesReader);
+          }

Review comment:
   Yes, the else case is when the value is null. Let me add some comments.
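
   The point about the else case can be sketched in isolation. The Java below is a simplified illustration, not the PR's code: `DefinitionLevelSkipSketch` and both helper methods are hypothetical names. It assumes, as the thread indicates, that a definition level equal to the maximum marks a non-null value, and that null entries have nothing stored in the value reader, so only non-null entries need to be skipped there.

```java
// Hypothetical sketch; names echo the diff, but this is not Spark's implementation.
class DefinitionLevelSkipSketch {
  // For `n` definition levels taken from one RLE run, returns how many stored
  // values would need to be skipped in the value reader.
  static int storedValuesToSkipRle(int n, int runValue, int maxDefinitionLevel) {
    // A run at the max definition level is entirely non-null.
    // The else case is a run of nulls, which has no stored values to skip.
    return runValue == maxDefinitionLevel ? n : 0;
  }

  // Same question for a PACKED group, where every level is checked one by one.
  static int storedValuesToSkipPacked(
      int[] levels, int start, int n, int maxDefinitionLevel) {
    int stored = 0;
    for (int i = start; i < start + n; i++) {
      if (levels[i] == maxDefinitionLevel) {
        stored++; // non-null entry: one stored value to skip
      }
      // else: null entry, nothing stored for it
    }
    return stored;
  }
}
```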







[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


viirya commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669309094



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;

Review comment:
   Hmm, is this an old bug?







[GitHub] [spark] dongjoon-hyun commented on pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite

2021-07-13 Thread GitBox


dongjoon-hyun commented on pull request #4:
URL: https://github.com/apache/spark/pull/4#issuecomment-879610496


   Merged to master/3.2.
   
   cc @gengliangwang .





[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


sunchao commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669308991



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala
##
@@ -31,24 +31,25 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
*  |---|-|-|---|---|---|---|---|
* col_2   400   300   200 200 200 200 200 200

Review comment:
   Oops, you are right: it should be 400, 300, 300, 200, 200, 200, 200, 200.







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite

2021-07-13 Thread GitBox


dongjoon-hyun commented on a change in pull request #4:
URL: https://github.com/apache/spark/pull/4#discussion_r669308630



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala
##
@@ -31,96 +40,56 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
*  |---|-|-|---|---|---|---|---|
* col_2   400   300   200 200 200 200 200 200
*/
-  def checkUnalignedPages(actions: (DataFrame => DataFrame)*): Unit = {
-    withTempPath(file => {
-      val ds = spark.range(0, 2000).map(i => (i, i + ":" + "o" * (i / 100).toInt))
-      ds.coalesce(1)
-          .write
-          .option("parquet.page.size", "4096")
-          .parquet(file.getCanonicalPath)
+  def checkUnalignedPages(df: DataFrame)(actions: (DataFrame => DataFrame)*): Unit = {
+    Seq(true, false).foreach { enableDictionary =>

Review comment:
   I added this to the PR description.







[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


viirya commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669308507



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -255,6 +253,36 @@ private void readBatchInternal(
     state.advanceOffsetAndRowId(offset, rowId);
   }
 
+  /**
+   * Skip the next `n` values (either null or non-null) from this definition level reader and
+   * `valueReader`.
+   */
+  private void skipValues(
+      int n,
+      ParquetReadState state,
+      VectorizedValuesReader valuesReader,
+      ParquetVectorUpdater updater) {
+    while (n > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int num = Math.min(n, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == state.maxDefinitionLevel) {
+            updater.skipValues(num, valuesReader);
+          }
+          break;
+        case PACKED:
+          for (int i = 0; i < num; ++i) {
+            if (currentBuffer[currentBufferIdx++] == state.maxDefinitionLevel) {
+              updater.skipValues(1, valuesReader);
+            }

Review comment:
   ditto.
   
   If so, maybe it is good to add a comment.







[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


viirya commented on a change in pull request #0:
URL: https://github.com/apache/spark/pull/0#discussion_r669308360



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##
@@ -255,6 +253,36 @@ private void readBatchInternal(
     state.advanceOffsetAndRowId(offset, rowId);
   }
 
+  /**
+   * Skip the next `n` values (either null or non-null) from this definition level reader and
+   * `valueReader`.
+   */
+  private void skipValues(
+      int n,
+      ParquetReadState state,
+      VectorizedValuesReader valuesReader,
+      ParquetVectorUpdater updater) {
+    while (n > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int num = Math.min(n, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == state.maxDefinitionLevel) {
+            updater.skipValues(num, valuesReader);
+          }

Review comment:
   Is the else case for null values?







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879608970


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45508/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879608822


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140989/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879608970


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45508/
   





[GitHub] [spark] SparkQA commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


SparkQA commented on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879608954


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45508/
   





[GitHub] [spark] zhouyejoe commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta

2021-07-13 Thread GitBox


zhouyejoe commented on a change in pull request #33078:
URL: https://github.com/apache/spark/pull/33078#discussion_r669306705



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -567,7 +598,8 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
       // memory, while still providing the necessary guarantee.
       synchronized (partitionInfo) {
         Map shufflePartitions =
-          mergeManager.partitions.get(partitionInfo.appShuffleId);
+          mergeManager.appsShuffleInfo.get(partitionInfo.appId).partitions

Review comment:
   Updated a unit test to cover this case, where a NullPointerException would be thrown.
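
   As a rough illustration of why the chained lookup in the hunk above can throw, here is a small Java sketch. Everything in it is hypothetical: `ChainedLookupSketch`, `AppInfo`, and the `String` partition values stand in for the real classes, and the guarded variant is only one possible way to handle a missing application, not necessarily what the PR does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of per-application shuffle state.
class ChainedLookupSketch {
  static final class AppInfo {
    final Map<Integer, String> partitions = new ConcurrentHashMap<>();
  }

  static final Map<String, AppInfo> appsShuffleInfo = new ConcurrentHashMap<>();

  // Mirrors the chained access in the diff: if appId has no entry (for
  // example, the application was already cleaned up), get() returns null
  // and the field access throws NullPointerException.
  static Map<Integer, String> partitionsUnchecked(String appId) {
    return appsShuffleInfo.get(appId).partitions;
  }

  // One possible guard (an assumption for illustration): report absence
  // with null instead of throwing.
  static Map<Integer, String> partitionsOrNull(String appId) {
    AppInfo info = appsShuffleInfo.get(appId);
    return info == null ? null : info.partitions;
  }
}
```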







[GitHub] [spark] AmplabJenkins commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879608822


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140989/
   





[GitHub] [spark] cfmcgrady commented on pull request #33335: [SPARK-36130][SQL] UnwrapCastInBinaryComparison should skip In expression when in.list contains an expression that is not literal

2021-07-13 Thread GitBox


cfmcgrady commented on pull request #5:
URL: https://github.com/apache/spark/pull/5#issuecomment-879608835


   cc @allisonwang-db @cloud-fan 





[GitHub] [spark] SparkQA removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879512560


   **[Test build #140989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140989/testReport)** for PR 33077 at commit [`e4a74a3`](https://github.com/apache/spark/commit/e4a74a3375ff0de4ac72aa951d36aa36f2492e01).





[GitHub] [spark] zhouyejoe commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta

2021-07-13 Thread GitBox


zhouyejoe commented on a change in pull request #33078:
URL: https://github.com/apache/spark/pull/33078#discussion_r669306864



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -403,38 +394,78 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) throws IOExc
 reduceIds.add(partition.reduceId);
 sizes.add(partition.getLastChunkOffset());
   } catch (IOException ioe) {
-logger.warn("Exception while finalizing shuffle partition {} {} {}", msg.appId,
-  msg.shuffleId, partition.reduceId, ioe);
+logger.warn("Exception while finalizing shuffle partition {}_{} {} {}", msg.appId,
+  msg.attemptId, msg.shuffleId, partition.reduceId, ioe);
   } finally {
 partition.closeAllFiles();
-// The partition should be removed after the files are written so that any new stream
-// for the same reduce partition will see that the data file exists.
-partitionsIter.remove();
   }
 }
   }
   mergeStatuses = new MergeStatuses(msg.shuffleId,
 bitmaps.toArray(new RoaringBitmap[bitmaps.size()]), Ints.toArray(reduceIds),
 Longs.toArray(sizes));
 }
-partitions.remove(appShuffleId);
-logger.info("Finalized shuffle {} from Application {}.", msg.shuffleId, msg.appId);
+logger.info("Finalized shuffle {} from Application {}_{}.",
+  msg.shuffleId, msg.appId, msg.attemptId);
 return mergeStatuses;
   }
 
   @Override
   public void registerExecutor(String appId, ExecutorShuffleInfo executorInfo) {
 if (logger.isDebugEnabled()) {
   logger.debug("register executor with RemoteBlockPushResolver {} local-dirs {} "
-+ "num sub-dirs {}", appId, Arrays.toString(executorInfo.localDirs),
-  executorInfo.subDirsPerLocalDir);
++ "num sub-dirs {} shuffleManager {}", appId, Arrays.toString(executorInfo.localDirs),
+executorInfo.subDirsPerLocalDir, executorInfo.shuffleManager);
+}
+String shuffleManagerMeta = executorInfo.shuffleManager;
+if (shuffleManagerMeta.contains(":")) {
+  String mergeDirInfo = shuffleManagerMeta.substring(shuffleManagerMeta.indexOf(":") + 1);
+  try {
+ObjectMapper mapper = new ObjectMapper();
+MergeDirectoryMeta mergeDirectoryMeta =
+  mapper.readValue(mergeDirInfo, MergeDirectoryMeta.class);
+if (mergeDirectoryMeta.attemptId == ATTEMPT_ID_UNDEFINED) {
+  // When attemptId is -1, there is no attemptId stored in the ExecutorShuffleInfo.
+  // Only the first ExecutorRegister message can register the merge dirs
+  appsShuffleInfo.computeIfAbsent(appId, id ->
+new AppShuffleInfo(
+  appId, mergeDirectoryMeta.attemptId,
+  new AppPathsInfo(appId, executorInfo.localDirs,
+mergeDirectoryMeta.mergeDir, executorInfo.subDirsPerLocalDir)
+));
+} else {
+  // If attemptId is not -1, there is attemptId stored in the ExecutorShuffleInfo.
+  // The first ExecutorRegister message from the same application attempt will register
+  // the merge dirs in External Shuffle Service. Any later ExecutorRegister message
+  // from the same application attempt will not override the merge dirs. But it can
+  // be overridden by ExecutorRegister message from newer application attempt,
+  // and former attempts' shuffle partitions information will also be cleaned up.
+  ConcurrentMap appShuffleInfoToBeCleanedUp =
+Maps.newConcurrentMap();
+  appsShuffleInfo.compute(appId, (id, appShuffleInfo) -> {
+if (appShuffleInfo == null || (appShuffleInfo != null
+  && mergeDirectoryMeta.attemptId > appShuffleInfo.attemptId)) {
+  appShuffleInfoToBeCleanedUp.putIfAbsent(appShuffleInfo.attemptId, appShuffleInfo);
+  appShuffleInfo =
+new AppShuffleInfo(
+  appId, mergeDirectoryMeta.attemptId,
+  new AppPathsInfo(appId, executorInfo.localDirs,
+mergeDirectoryMeta.mergeDir, executorInfo.subDirsPerLocalDir));
+}
+return appShuffleInfo;
+  });
+  for (AppShuffleInfo appShuffleInfo: appShuffleInfoToBeCleanedUp.values()) {
+logger.info("Remove shuffle info for {}_{} as new application attempt registered",
+  appId, appShuffleInfo.attemptId);
+appShuffleInfo.cleanupShufflePartitionInfo();

Review comment:
   Added a UT to check that the channels are closed and set to null.
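
   The registration rule stated in the diff's comments (the first registration for an attempt wins, and only a newer attempt replaces it) can be sketched in isolation. This is a minimal illustration, not the PR's code: `AttemptRegistrySketch`, `register`, and `latest` are hypothetical names, and the map stores only an attempt id instead of an `AppShuffleInfo`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AttemptRegistrySketch {
    // Hypothetical simplification: appId -> highest attempt id registered so far.
    private final ConcurrentMap<String, Integer> latestAttempt = new ConcurrentHashMap<>();

    /** Returns true if this call created or replaced the entry for appId. */
    public boolean register(String appId, int attemptId) {
        boolean[] changed = {false};
        // ConcurrentHashMap.compute() applies the remapping function atomically per key.
        latestAttempt.compute(appId, (id, current) -> {
            if (current == null || attemptId > current) {
                changed[0] = true;
                return attemptId;   // first registration, or a newer attempt
            }
            return current;         // same or older attempt: keep what is there
        });
        return changed[0];
    }

    public Integer latest(String appId) {
        return latestAttempt.get(appId);
    }

    public static void main(String[] args) {
        AttemptRegistrySketch registry = new AttemptRegistrySketch();
        System.out.println(registry.register("app-1", 1)); // true
        System.out.println(registry.register("app-1", 1)); // false
        System.out.println(registry.register("app-1", 2)); // true
        System.out.println(registry.register("app-1", 1)); // false
    }
}
```

   Doing the comparison inside `compute` keeps the check and the replacement in one atomic step, which is presumably why the diff also uses `compute` rather than a separate `get` and `put`.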





[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


SparkQA commented on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879607967


   **[Test build #140989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140989/testReport)** for PR 33077 at commit [`e4a74a3`](https://github.com/apache/spark/commit/e4a74a3375ff0de4ac72aa951d36aa36f2492e01).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class MergingSortWithSessionWindowStateIterator(`





[GitHub] [spark] zhouyejoe commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta

2021-07-13 Thread GitBox


zhouyejoe commented on a change in pull request #33078:
URL: https://github.com/apache/spark/pull/33078#discussion_r669306705



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -567,7 +598,8 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
   // memory, while still providing the necessary guarantee.
   synchronized (partitionInfo) {
 Map shufflePartitions =
-  mergeManager.partitions.get(partitionInfo.appShuffleId);
+  mergeManager.appsShuffleInfo.get(partitionInfo.appId).partitions

Review comment:
   Added a UT to test this case, where a NullPointerException will be thrown.







[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


viirya commented on a change in pull request #33330:
URL: https://github.com/apache/spark/pull/33330#discussion_r669306577



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala
##
@@ -31,24 +31,25 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
*  |---|-|-|---|---|---|---|---|
* col_2   400   300   200 200 200 200 200 200

Review comment:
   BTW, about the layout: shouldn't the totals of col_1 and col_2 both be 2000?
   It looks like col_2 adds up to only 1900.
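
   The arithmetic behind this question can be checked directly. A minimal sketch, using only the col_2 page sizes quoted in the layout comment above; the class and method names are illustrative and not part of the PR:

```java
public class Col2PageTotal {
    // Page row counts for col_2, copied from the layout comment under review.
    public static final int[] COL_2_PAGES = {400, 300, 200, 200, 200, 200, 200, 200};

    public static int total(int[] pages) {
        int sum = 0;
        for (int rows : pages) {
            sum += rows;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 400 + 300 + 6 * 200
        System.out.println(total(COL_2_PAGES)); // prints 1900
    }
}
```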







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite

2021-07-13 Thread GitBox


dongjoon-hyun commented on a change in pull request #33334:
URL: https://github.com/apache/spark/pull/33334#discussion_r669305829



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala
##
@@ -31,96 +40,56 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
*  |---|-|-|---|---|---|---|---|
* col_2   400   300   200 200 200 200 200 200
*/
-  def checkUnalignedPages(actions: (DataFrame => DataFrame)*): Unit = {
-withTempPath(file => {
-  val ds = spark.range(0, 2000).map(i => (i, i + ":" + "o" * (i / 100).toInt))
-  ds.coalesce(1)
-  .write
-  .option("parquet.page.size", "4096")
-  .parquet(file.getCanonicalPath)
+  def checkUnalignedPages(df: DataFrame)(actions: (DataFrame => DataFrame)*): Unit = {
+Seq(true, false).foreach { enableDictionary =>

Review comment:
   Thank you for adding this!







[GitHub] [spark] AmplabJenkins commented on pull request #33335: [SPARK-36130][SQL] Fix UnwrapCastInBinaryComparison bug.

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33335:
URL: https://github.com/apache/spark/pull/33335#issuecomment-879606997


   Can one of the admins verify this patch?





[GitHub] [spark] cfmcgrady opened a new pull request #33335: [SPARK-36130][SQL] Fix UnwrapCastInBinaryComparison bug.

2021-07-13 Thread GitBox


cfmcgrady opened a new pull request #33335:
URL: https://github.com/apache/spark/pull/33335


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR fixes a bug in the rule `UnwrapCastInBinaryComparison`. The rule
   should skip an `In` expression when `in.list` contains an expression that
   is not a literal.
   
   - In
   
   Before this PR, the following example throws an exception.
   ```scala
 withTable("tbl") {
   sql("CREATE TABLE tbl (d decimal(33, 27)) USING PARQUET")
   sql("SELECT d FROM tbl WHERE d NOT IN (d + 1)")
 }
   ```
   - InSet
   
   The analyzer guarantees that all elements of `inSet.hset` are literals,
   so this is not an issue for `InSet`.
   
   
https://github.com/apache/spark/blob/fbf53dee37129a493a4e5d5a007625b35f44fbda/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L264-L279
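
   The guard described above (only rewrite an `In` whose list is made up entirely of literals) can be sketched outside Catalyst. This is a minimal illustration under stated assumptions: `Expr`, `Literal`, `Column`, `Add`, and `allLiterals` are hypothetical stand-ins, not Spark classes or the actual patch.

```java
import java.util.Arrays;
import java.util.List;

public class InListGuardSketch {
    // Hypothetical stand-ins for expression nodes; these are not Catalyst classes.
    public interface Expr {}

    public static final class Literal implements Expr {
        final Object value;
        public Literal(Object value) { this.value = value; }
    }

    public static final class Column implements Expr {
        final String name;
        public Column(String name) { this.name = name; }
    }

    public static final class Add implements Expr {
        final Expr left;
        final Expr right;
        public Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // The guard: the rewrite is only considered when every IN-list element is a literal.
    public static boolean allLiterals(List<Expr> inList) {
        for (Expr e : inList) {
            if (!(e instanceof Literal)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // d NOT IN (d + 1): the list holds a non-literal, so the rule should skip it.
        List<Expr> nonLiteral = Arrays.<Expr>asList(new Add(new Column("d"), new Literal(1)));
        List<Expr> literals = Arrays.<Expr>asList(new Literal(1), new Literal(2));
        System.out.println(allLiterals(nonLiteral)); // false
        System.out.println(allLiterals(literals));   // true
    }
}
```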
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No, only bug fix.
   
   ### How was this patch tested?
   
   
   New test.
   





[GitHub] [spark] dongjoon-hyun commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


dongjoon-hyun commented on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879605733


   Thank you, @sunchao .





[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox


SparkQA commented on pull request #32401:
URL: https://github.com/apache/spark/pull/32401#issuecomment-879603860


   **[Test build #141000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141000/testReport)** for PR 32401 at commit [`230648c`](https://github.com/apache/spark/commit/230648ce0dcc1762d9690aecfd63568721f82e4e).





[GitHub] [spark] SparkQA commented on pull request #33081: [SPARK-34893][SS] Support session window natively

2021-07-13 Thread GitBox


SparkQA commented on pull request #33081:
URL: https://github.com/apache/spark/pull/33081#issuecomment-879603643


   **[Test build #140999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140999/testReport)** for PR 33081 at commit [`0af2fd1`](https://github.com/apache/spark/commit/0af2fd12f2bcd09ea3b788b9c6ec99c022199398).





[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879603580


   **[Test build #140998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140998/testReport)** for PR 33258 at commit [`a843bc3`](https://github.com/apache/spark/commit/a843bc3efe1f52dd72f9f5bf5028f1e3e5aaa0a1).





[GitHub] [spark] SparkQA commented on pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite

2021-07-13 Thread GitBox


SparkQA commented on pull request #33334:
URL: https://github.com/apache/spark/pull/33334#issuecomment-879603443


   **[Test build #140996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140996/testReport)** for PR 33334 at commit [`9502871`](https://github.com/apache/spark/commit/95028716f1a69fc48ec5d1f6e1228f458574b65f).





[GitHub] [spark] SparkQA commented on pull request #33324: [SPARK-36093][SQL] RemoveRedundantAliases should not change Command's parameter's expression's name

2021-07-13 Thread GitBox


SparkQA commented on pull request #33324:
URL: https://github.com/apache/spark/pull/33324#issuecomment-879603497


   **[Test build #140997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140997/testReport)** for PR 33324 at commit [`60401a1`](https://github.com/apache/spark/commit/60401a1062c9af68a3d2d55ecbd72f60fc1cf142).





[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


SparkQA commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879598200


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45509/
   





[GitHub] [spark] SparkQA commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


SparkQA commented on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879596706


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45508/
   





[GitHub] [spark] HeartSaVioR commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox


HeartSaVioR commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r669288375



##
File path: 
core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
##
@@ -0,0 +1,83 @@
+package org.apache.spark.shuffle.checksum;

Review comment:
   RAT is a license checker.
   
   > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140992/console
   
   ```
   
   Running Apache RAT checks
   
   Attempting to fetch rat
   Could not find Apache license headers in the following files:
   !? /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
   [error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/check-license ; received return code 1
   ```







[GitHub] [spark] beliefer commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


beliefer commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879587025


   retest this please





[GitHub] [spark] sarutak commented on a change in pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


sarutak commented on a change in pull request #33253:
URL: https://github.com/apache/spark/pull/33253#discussion_r669284266



##
File path: 
core/src/test/resources/HistoryServerExpectations/running_app_list_json_expectation.json
##
@@ -1 +1 @@
-[ ]
+[ ]

Review comment:
   Unnecessary change here.

##
File path: core/src/test/resources/spark-events/application_1625351839633_843287
##
@@ -0,0 +1,37 @@
+{"Event":"SparkListenerLogStart","Spark Version":"3.3.0-SNAPSHOT"}

Review comment:
   This file needs to be added to `.rat-exclude`.







[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox


Ngone51 commented on a change in pull request #32401:
URL: https://github.com/apache/spark/pull/32401#discussion_r669282673



##
File path: 
core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
##
@@ -0,0 +1,83 @@
+package org.apache.spark.shuffle.checksum;

Review comment:
   Oh yeah, this is definitely a mistake. Though it doesn't seem to be
   related to the `RAT` failure.







[GitHub] [spark] sunchao commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


sunchao commented on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879583404


   thanks @gengliangwang - I opened #33334 for this.





[GitHub] [spark] sunchao opened a new pull request #33334: [SPARK-35743][SQL][TEST] Refactor ParquetColumnIndexSuite

2021-07-13 Thread GitBox


sunchao opened a new pull request #33334:
URL: https://github.com/apache/spark/pull/33334


   
   
   ### What changes were proposed in this pull request?
   
   
   Refactor `ParquetColumnIndexSuite` and allow better code reuse.
   
   ### Why are the changes needed?
   
   
   A few methods in the test suite can share the same utility method 
`checkUnalignedPages` so it's better to do that and remove code duplication.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.





[GitHub] [spark] cloud-fan commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced

2021-07-13 Thread GitBox


cloud-fan commented on a change in pull request #32872:
URL: https://github.com/apache/spark/pull/32872#discussion_r669277698



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -87,8 +87,15 @@ case class CustomShuffleReaderExec private(
 Iterator(desc)
   }
 
-  def hasCoalescedPartition: Boolean =
-partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec])
+  /**
+   * Returns true iff some non-empty partitions were combined
+   */
+  def hasCoalescedPartition: Boolean = {
+partitionSpecs.exists {
+  case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1

Review comment:
   ```
   // If all input RDDs have 0 partition, we create an empty partition for every shuffle reader.
   if (validMetrics.isEmpty) {
 return Seq.fill(numShuffles)(Seq(CoalescedPartitionSpec(0, 0, 0)))
   }
   ```
   
   Shall we use `CoalescedPartitionSpec(0, numReducers, 0)`?
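
   For reference, this is how the predicate from the diff evaluates for the two specs mentioned in this comment. It is a minimal sketch: `CoalescedSpecSketch` and `spansMultipleReducers` are hypothetical names, only the two reducer indices are modeled, and `numReducers = 5` is an assumed value chosen for illustration.

```java
public class CoalescedSpecSketch {
    // Mirrors the check in the diff: endReducerIndex - startReducerIndex > 1.
    public static boolean spansMultipleReducers(int startReducerIndex, int endReducerIndex) {
        return endReducerIndex - startReducerIndex > 1;
    }

    public static void main(String[] args) {
        int numReducers = 5; // assumed value, for illustration only
        System.out.println(spansMultipleReducers(0, 0));           // false
        System.out.println(spansMultipleReducers(0, numReducers)); // true
        System.out.println(spansMultipleReducers(3, 4));           // false
    }
}
```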







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879580591


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140995/
   





[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


SparkQA commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879580580


   **[Test build #140995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140995/testReport)** for PR 33253 at commit [`77920b8`](https://github.com/apache/spark/commit/77920b86a9acdff45d9d35e3179b9daafe3bb84f).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879580409


   **[Test build #140995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140995/testReport)** for PR 33253 at commit [`77920b8`](https://github.com/apache/spark/commit/77920b86a9acdff45d9d35e3179b9daafe3bb84f).





[GitHub] [spark] AmplabJenkins commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879580591


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140995/
   





[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


SparkQA commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879580409


   **[Test build #140995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140995/testReport)** for PR 33253 at commit [`77920b8`](https://github.com/apache/spark/commit/77920b86a9acdff45d9d35e3179b9daafe3bb84f).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33174:
URL: https://github.com/apache/spark/pull/33174#issuecomment-879578133


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140991/
   





[GitHub] [spark] venkata91 commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


venkata91 commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879579744


   > @venkata91 Could you fix the style issue first?
   > 
   > ```
   >  [error] /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:195: File line length exceeds 100 characters
   > ```
   
   Yeah, fixed it. Also, GitHub Actions now seem to run, which is good.
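
As an illustration only (this sketch is not from the PR and is not Scalastyle's implementation), the kind of check behind the error above can be approximated in a few lines. The object name `LineLengthCheck` and the method `longLines` are hypothetical; the limit of 100 is taken from the error message.

```scala
// Hypothetical helper: report the 1-based numbers of lines longer than a limit.
object LineLengthCheck {
  def longLines(source: String, maxLength: Int = 100): Seq[Int] =
    // The -1 limit keeps trailing empty lines so line numbers stay aligned.
    source.split("\n", -1).toSeq.zipWithIndex.collect {
      case (line, idx) if line.length > maxLength => idx + 1
    }
}
```

Running such a helper over a file before pushing would flag the offending line early, assuming the project limit really is 100 characters as the message suggests.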





[GitHub] [spark] AmplabJenkins commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33174:
URL: https://github.com/apache/spark/pull/33174#issuecomment-879578133


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140991/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879136851


   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA removed a comment on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #33174:
URL: https://github.com/apache/spark/pull/33174#issuecomment-879537921


   **[Test build #140991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140991/testReport)** for PR 33174 at commit [`c6d4f21`](https://github.com/apache/spark/commit/c6d4f21ca368bbc7ba4236dcd9d09904e7b82e5b).





[GitHub] [spark] SparkQA commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


SparkQA commented on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879577688


   **[Test build #140994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140994/testReport)** for PR 33323 at commit [`7beee40`](https://github.com/apache/spark/commit/7beee40113ccf7317143c51b123bca0d65d9a0c1).





[GitHub] [spark] SparkQA commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox


SparkQA commented on pull request #33174:
URL: https://github.com/apache/spark/pull/33174#issuecomment-879577612


   **[Test build #140991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140991/testReport)** for PR 33174 at commit [`c6d4f21`](https://github.com/apache/spark/commit/c6d4f21ca368bbc7ba4236dcd9d09904e7b82e5b).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879577172


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140987/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879577174


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140986/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32401:
URL: https://github.com/apache/spark/pull/32401#issuecomment-879577176


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45506/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879577173


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140990/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33333:
URL: https://github.com/apache/spark/pull/33333#issuecomment-879577171


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45507/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33333:
URL: https://github.com/apache/spark/pull/33333#issuecomment-879577171


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45507/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879577173


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140990/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879577172


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140987/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879577174


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140986/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #32401:
URL: https://github.com/apache/spark/pull/32401#issuecomment-879577176


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45506/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33332: [SQL] Warn if less files visible after stats write

2021-07-13 Thread GitBox


HyukjinKwon commented on a change in pull request #33332:
URL: https://github.com/apache/spark/pull/33332#discussion_r669267871



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
##
@@ -166,7 +166,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
 }
 
 if (numSubmittedFiles != numFiles) {
-  logInfo(s"Expected $numSubmittedFiles files, but only saw $numFiles. " +
+  logWarning(s"Expected $numSubmittedFiles files, but only saw $numFiles. " +

Review comment:
   WDYT @steveloughran ?







[GitHub] [spark] SparkQA commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


SparkQA commented on pull request #33333:
URL: https://github.com/apache/spark/pull/33333#issuecomment-879572572


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45507/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879537873


   **[Test build #140990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140990/testReport)** for PR 33258 at commit [`a843bc3`](https://github.com/apache/spark/commit/a843bc3efe1f52dd72f9f5bf5028f1e3e5aaa0a1).





[GitHub] [spark] HyukjinKwon commented on pull request #33332: [SQL] Warn if less files visible after stats write

2021-07-13 Thread GitBox


HyukjinKwon commented on pull request #33332:
URL: https://github.com/apache/spark/pull/33332#issuecomment-879571129


   @tooptoop4 Please refer to https://spark.apache.org/contributing.html and write a proper PR title and description, including a JIRA.





[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879571046


   **[Test build #140990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140990/testReport)** for PR 33258 at commit [`a843bc3`](https://github.com/apache/spark/commit/a843bc3efe1f52dd72f9f5bf5028f1e3e5aaa0a1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class ParseToTimestampLTZ(`
 * `case class DomainJoin(`





[GitHub] [spark] HyukjinKwon edited a comment on pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox


HyukjinKwon edited a comment on pull request #33329:
URL: https://github.com/apache/spark/pull/33329#issuecomment-879569468


   Yeah, I don't think we necessarily have to make it fail when it's enabled. I believe it's fine to explicitly document that this feature is unstable and that neither correctness nor backward compatibility is guaranteed at this moment. To be clear, is it not usable at all? In fact, I guess we haven't even properly documented this yet (?).





[GitHub] [spark] HyukjinKwon commented on pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox


HyukjinKwon commented on pull request #33329:
URL: https://github.com/apache/spark/pull/33329#issuecomment-879569468


   Yeah, I don't think we necessarily have to make it fail when it's enabled. I believe it's fine to explicitly document that this feature is unstable and that neither correctness nor backward compatibility is guaranteed. In fact, I guess we haven't even properly documented this yet (?).





[GitHub] [spark] gengliangwang commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


gengliangwang commented on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879568535


   @sunchao Thanks for the work. I think it's OK to have a PR for test refactoring.





[GitHub] [spark] HyukjinKwon commented on pull request #33325: [SPARK-36076][SQL][3.0] ArrayIndexOutOfBounds in Cast string to timestamp

2021-07-13 Thread GitBox


HyukjinKwon commented on pull request #33325:
URL: https://github.com/apache/spark/pull/33325#issuecomment-879567089


   The SparkR test failure should be ignorable.





[GitHub] [spark] ekoifman commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced

2021-07-13 Thread GitBox


ekoifman commented on a change in pull request #32872:
URL: https://github.com/apache/spark/pull/32872#discussion_r669262650



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
##
@@ -87,8 +87,15 @@ case class CustomShuffleReaderExec private(
 Iterator(desc)
   }
 
-  def hasCoalescedPartition: Boolean =
-partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec])
+  /**
+   * Returns true iff some non-empty partitions were combined
+   */
+  def hasCoalescedPartition: Boolean = {
+partitionSpecs.exists {
+  case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1

Review comment:
   Interesting.  I didn't realize Spark could produce `spark.sql.shuffle.partitions` empty partitions.
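
As an illustration only (not part of the PR), the difference between the old and new predicates in the diff above can be sketched with stand-in types. The types below are simplified placeholders rather than the real Spark classes, `OtherPartitionSpec` and the `CoalesceCheck` object are hypothetical, and the `case _ => false` fallback is an assumption because the quoted hunk is cut off after the first case.

```scala
// Stand-in types for illustration only; the real Spark classes differ.
sealed trait ShufflePartitionSpec
final case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec
final case class OtherPartitionSpec(reducerIndex: Int) extends ShufflePartitionSpec

object CoalesceCheck {
  // Old check: any CoalescedPartitionSpec counts, even one covering a single reducer.
  def hasCoalescedSpec(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists(_.isInstanceOf[CoalescedPartitionSpec])

  // Check from the diff: the spec must span more than one reducer index.
  def spansMultipleReducers(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists {
      case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1
      case _ => false // assumed fallback; not visible in the quoted hunk
    }
}
```

With these stand-ins, `Seq(CoalescedPartitionSpec(0, 1))` satisfies the old check but not the new one, while `Seq(CoalescedPartitionSpec(0, 3))` satisfies both.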
   







[GitHub] [spark] HyukjinKwon commented on a change in pull request #33324: [SPARK-36093][SQL] RemoveRedundantAliases should not change Command's parameter's expression's name

2021-07-13 Thread GitBox


HyukjinKwon commented on a change in pull request #33324:
URL: https://github.com/apache/spark/pull/33324#discussion_r669262443



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -4058,6 +4058,44 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
 Row(1, 2, 1, 2) :: Nil)
 }
   }
+
+  test("SPARK-36093") {

Review comment:
   Can you add a test title?







[GitHub] [spark] HyukjinKwon commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


HyukjinKwon commented on pull request #33323:
URL: https://github.com/apache/spark/pull/33323#issuecomment-879566146









[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r669261877



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -981,6 +1006,58 @@ class Dataset[T] private[sql](
 join(right, usingColumns, "inner")
   }
 
+  /**
+   * Equi-join with another `DataFrame` using the given column. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join column will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumn Name of the column to join on. This column must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   * `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   * `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   * `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3
+   */
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
+join(right, Seq(usingColumn), joinType)
+  }
+
+  /**
+   * (Java-specific) Equi-join with another `DataFrame` using the given columns. A cross join with
+   * a predicate is specified as an inner join. If you would explicitly like to perform a cross
+   * join use the `crossJoin` method.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   * `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   * `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   * `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3
+   */
+  def join(right: Dataset[_], usingColumns: Array[String], joinType: String): DataFrame = {
+join(right, usingColumns.toSeq, joinType)
+  }
+
   /**
* Equi-join with another `DataFrame` using the given columns. A cross join with a predicate

Review comment:
   Please add "(Scala-specific)" on other methods
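
As an illustration only (not Spark code), the overload pattern in the diff above, where an `Array[String]` variant for Java callers delegates to the `Seq[String]` variant, can be sketched with a toy class. `ToyDataset` is hypothetical, `joinType` is accepted but ignored, and the output column order (join columns first) is an assumption made for the sketch.

```scala
// Toy stand-in for Dataset, only to show the overload shape.
final class ToyDataset(val columns: Seq[String]) {
  // Scala-facing overload taking a Seq of column names.
  def join(right: ToyDataset, usingColumns: Seq[String], joinType: String): ToyDataset = {
    require(
      usingColumns.forall(c => columns.contains(c) && right.columns.contains(c)),
      "join columns must exist on both sides")
    // Join columns appear once, followed by the remaining columns of each side.
    new ToyDataset(
      usingColumns ++
        columns.filterNot(c => usingColumns.contains(c)) ++
        right.columns.filterNot(c => usingColumns.contains(c)))
  }

  // Java-friendly overload: accepts a plain array and delegates to the Seq version.
  def join(right: ToyDataset, usingColumns: Array[String], joinType: String): ToyDataset =
    join(right, usingColumns.toSeq, joinType)
}
```

A Java caller can pass `new String[] {"user_id", "user_name"}` to the array overload without building a Scala `Seq`, which is presumably the motivation for labelling the two variants "(Java-specific)" and "(Scala-specific)".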







[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r669261712



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -956,6 +956,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+   * (Java-specific) Inner equi-join with another `DataFrame` using the given columns.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * {{{
+   *   // Joining df1 and df2 using the columns "user_id" and "user_name"
+   *   df1.join(df2, new String[] {"user_id", "user_name"});
+   * }}}
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3
+   */
+  def join(right: Dataset[_], usingColumns: Array[String]): DataFrame = {
+join(right, usingColumns.toSeq)
+  }
+
   /**
* Inner equi-join with another `DataFrame` using the given columns.

Review comment:
   Can you add "(Scala-specific)" here?







[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-07-13 Thread GitBox


HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r669261605



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -956,6 +956,31 @@ class Dataset[T] private[sql](
     join(right, Seq(usingColumn))
   }
 
+  /**
+   * (Java-specific) Inner equi-join with another `DataFrame` using the given columns.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * {{{
+   *   // Joining df1 and df2 using the columns "user_id" and "user_name"
+   *   df1.join(df2, new String[] {"user_id", "user_name"});
+   * }}}
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3

Review comment:
   Let's target 3.3.0







[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox


SparkQA commented on pull request #32401:
URL: https://github.com/apache/spark/pull/32401#issuecomment-879565484


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45506/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879470999


   **[Test build #140986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140986/testReport)** for PR 33330 at commit [`41a7ca8`](https://github.com/apache/spark/commit/41a7ca8b7b2464a7363ba6d89dfdbeb8ed7c96aa).





[GitHub] [spark] SparkQA commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox


SparkQA commented on pull request #33330:
URL: https://github.com/apache/spark/pull/33330#issuecomment-879564742


   **[Test build #140986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140986/testReport)** for PR 33330 at commit [`41a7ca8`](https://github.com/apache/spark/commit/41a7ca8b7b2464a7363ba6d89dfdbeb8ed7c96aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


SparkQA removed a comment on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879471127


   **[Test build #140987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140987/testReport)** for PR 33077 at commit [`b540632`](https://github.com/apache/spark/commit/b540632e1180dcb1ce9f49626a9fb925fd503742).





[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox


SparkQA commented on pull request #33077:
URL: https://github.com/apache/spark/pull/33077#issuecomment-879564143


   **[Test build #140987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140987/testReport)** for PR 33077 at commit [`b540632`](https://github.com/apache/spark/commit/b540632e1180dcb1ce9f49626a9fb925fd503742).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] sarutak commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level

2021-07-13 Thread GitBox


sarutak commented on pull request #33253:
URL: https://github.com/apache/spark/pull/33253#issuecomment-879563645


   @venkata91 Could you fix the style issue first?
   ```
   [error] /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:195: File line length exceeds 100 characters
   ```





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879562230


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45504/
   





[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


SparkQA commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879562217


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45504/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33258:
URL: https://github.com/apache/spark/pull/33258#issuecomment-879562230


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45504/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox


AmplabJenkins removed a comment on pull request #33286:
URL: https://github.com/apache/spark/pull/33286#issuecomment-879558620


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140985/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox


AmplabJenkins commented on pull request #33286:
URL: https://github.com/apache/spark/pull/33286#issuecomment-879558620


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140985/
   





[GitHub] [spark] SparkQA commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox


SparkQA commented on pull request #33333:
URL: https://github.com/apache/spark/pull/33333#issuecomment-879558541


   **[Test build #140993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140993/testReport)** for PR 33333 at commit [`ad3da13`](https://github.com/apache/spark/commit/ad3da133d369c9d3bcf587c5810f3d526b87ea61).




