date:20210625

[GitHub] [spark] AngersZhuuuu commented on pull request #33092: [SPARK-35905][SQL][FOLLOWUP][TESTS] Fix UT mistake in SQLQuerySuite

2021-06-25 Thread GitBox



AngersZhuuuu commented on pull request #33092:
URL: https://github.com/apache/spark/pull/33092#issuecomment-868952001


   @dongjoon-hyun Should I add SPARK-35905 to UT title?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on pull request #33063: [SPARK-35879][Core][Shuffle] Fix performance regression caused by collectFetchRequests

2021-06-25 Thread GitBox



yaooqinn commented on pull request #33063:
URL: https://github.com/apache/spark/pull/33063#issuecomment-868947497


   thanks all! merged to master/3.1



[GitHub] [spark] yaooqinn closed pull request #33063: [SPARK-35879][Core][Shuffle] Fix performance regression caused by collectFetchRequests

2021-06-25 Thread GitBox



yaooqinn closed pull request #33063:
URL: https://github.com/apache/spark/pull/33063


   



[GitHub] [spark] yaooqinn commented on pull request #33063: [SPARK-35879][Core][Shuffle] Fix performance regression caused by collectFetchRequests

2021-06-25 Thread GitBox



yaooqinn commented on pull request #33063:
URL: https://github.com/apache/spark/pull/33063#issuecomment-868943286


   @mridulm @dongjoon-hyun, I re-ran the benchmark manually on the final commit. The debug log below shows that the performance regression is gone.
   ```log
   21/06/26 04:04:01 INFO MapOutputTrackerWorker: Got the map output locations
   21/06/26 04:04:01 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: 2147483647
   21/06/26 04:04:01 DEBUG ShuffleBlockFetcherIterator: Creating fetch request of 2606046 at BlockManagerId(2, 10.1.5.72, 37767, None) with 88 blocks
   21/06/26 04:04:01 DEBUG ShuffleBlockFetcherIterator: Collected remote fetch requests for BlockManagerId(2, 10.1.5.72, 37767, None) in 82 ms
   ```
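
The sizes in the log above are consistent with a target request size of one fifth of `maxBytesInFlight` (48 MiB here), rounded down. A minimal Java sketch of that arithmetic, assuming that one-fifth rule; the class and method names are hypothetical and are not Spark's code:

```java
final class FetchRequestSizing {
    // Assumption: the target size of one remote request is maxBytesInFlight / 5
    // (integer division), with a floor of 1 byte.
    static long targetRemoteRequestSize(long maxBytesInFlight) {
        return Math.max(maxBytesInFlight / 5, 1L);
    }

    public static void main(String[] args) {
        long maxBytesInFlight = 48L * 1024 * 1024; // 50331648, as in the log
        System.out.println(targetRemoteRequestSize(maxBytesInFlight));
    }
}
```

Under that assumption, the single 2606046-byte request in the log stays well below the target size, which would explain why only one request was created for the 88 blocks.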
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32140: [SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #32140:
URL: https://github.com/apache/spark/pull/32140#issuecomment-818410050


   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868940480


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44884/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868940480


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44884/
   



[GitHub] [spark] mridulm commented on pull request #32140: [SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-25 Thread GitBox



mridulm commented on pull request #32140:
URL: https://github.com/apache/spark/pull/32140#issuecomment-868939032


   The GitHub Actions test failure looks unrelated; let me try Jenkins anyway.



[GitHub] [spark] mridulm commented on pull request #32140: [SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-25 Thread GitBox



mridulm commented on pull request #32140:
URL: https://github.com/apache/spark/pull/32140#issuecomment-868938908


   Jenkins, test this please



[GitHub] [spark] wangyum commented on pull request #33099: [SPARK-35904][SQL] Collapse above RebalancePartitions

2021-06-25 Thread GitBox



wangyum commented on pull request #33099:
URL: https://github.com/apache/spark/pull/33099#issuecomment-868937773


   cc @ulysses-you  @cloud-fan 



[GitHub] [spark] wangyum opened a new pull request #33099: [SPARK-35904][SQL] Collapse above RebalancePartitions

2021-06-25 Thread GitBox



wangyum opened a new pull request #33099:
URL: https://github.com/apache/spark/pull/33099


   ### What changes were proposed in this pull request?
   
   Make `RebalancePartitions` extend `RepartitionOperation`.
   
   ### Why are the changes needed?
   
   `CollapseRepartition` can optimize `RebalancePartitions` if possible.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.
   



[GitHub] [spark] SparkQA removed a comment on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868868259


   **[Test build #140348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140348/testReport)** for PR 32921 at commit [`b138ec4`](https://github.com/apache/spark/commit/b138ec45fa21aa7d8926f9746029103cfdee9d65).



[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868935984


   **[Test build #140348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140348/testReport)** for PR 32921 at commit [`b138ec4`](https://github.com/apache/spark/commit/b138ec45fa21aa7d8926f9746029103cfdee9d65).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868935705


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44883/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868935705


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44883/
   



[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868934750


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44884/
   



[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



SparkQA commented on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868934724


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44883/
   



[GitHub] [spark] Yikun commented on pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-06-25 Thread GitBox



Yikun commented on pull request #32867:
URL: https://github.com/apache/spark/pull/32867#issuecomment-868934485


   @HyukjinKwon Ready for review; it would be good if you could take another look. :)



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



dongjoon-hyun commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r659108805



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##
@@ -33,31 +51,107 @@
   /** The remaining number of values to read in the current batch */
   int valuesToReadInBatch;
 
-  ParquetReadState(int maxDefinitionLevel) {
+  ParquetReadState(int maxDefinitionLevel, PrimitiveIterator.OfLong rowIndexes) {
 this.maxDefinitionLevel = maxDefinitionLevel;
+this.rowRanges = rowIndexes == null ? null : constructRanges(rowIndexes);
+nextRange();
   }
 
   /**
-   * Called at the beginning of reading a new batch.
+   * Construct a list of row ranges from the given `rowIndexes`. For example, suppose the
+   * `rowIndexes` are `[0, 1, 2, 4, 5, 7, 8, 9]`, it will be converted into 3 row ranges:
+   * `[0-2], [4-5], [7-9]`.
*/
-  void resetForBatch(int batchSize) {
+  private Iterator<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
+List<RowRange> rowRanges = new ArrayList<>();
+long currentStart = Long.MIN_VALUE;
+long previous = Long.MIN_VALUE;
+
+while (rowIndexes.hasNext()) {
+  long idx = rowIndexes.nextLong();
+  if (previous == Long.MIN_VALUE) {
+currentStart = previous = idx;
+  } else if (previous + 1 != idx) {
+RowRange range = new RowRange(currentStart, previous);
+rowRanges.add(range);
+currentStart = previous = idx;
+  } else {
+previous = idx;
+  }
+}
+
+if (previous != Long.MIN_VALUE) {
+  rowRanges.add(new RowRange(currentStart, previous));
+}
+
+return rowRanges.iterator();
+  }
+
+  /**
+   * Must be called at the beginning of reading a new batch.
+   */
+  void resetForNewBatch(int batchSize) {
 this.offset = 0;
 this.valuesToReadInBatch = batchSize;
   }
 
   /**
-   * Called at the beginning of reading a new page.
+   * Must be called at the beginning of reading a new page.
*/
-  void resetForPage(int totalValuesInPage) {
+  void resetForNewPage(int totalValuesInPage, long pageFirstRowIndex) {
 this.valuesToReadInPage = totalValuesInPage;
+this.rowId = pageFirstRowIndex;
   }
 
   /**
-   * Advance the current offset to the new values.
+   * Returns the start index of the current row range.
*/
-  void advanceOffset(int newOffset) {
+  long currentRangeStart() {
+return currentRange.start;
+  }
+
+  /**
+   * Returns the end index of the current row range.
+   */
+  long currentRangeEnd() {
+return currentRange.end;
+  }
+
+  /**
+   * Advance the current offset and rowId to the new values.
+   */
+  void advanceOffsetAndRowId(int newOffset, long newRowId) {
 valuesToReadInBatch -= (newOffset - offset);
-valuesToReadInPage -= (newOffset - offset);
+valuesToReadInPage -= (newRowId - rowId);
 offset = newOffset;
+rowId = newRowId;
+  }
+
+  /**
+   * Advance to the next range.
+   */
+  void nextRange() {
+if (rowRanges == null) {
+  currentRange = MAX_ROW_RANGE;
+} else {
+  if (!rowRanges.hasNext()) {
+currentRange = MIN_ROW_RANGE;
+  } else {
+currentRange = rowRanges.next();
+  }
+}
+  }
+
+  /**
+   * Helper struct to represent a range of row indexes `[start, end]`.
+   */
+  private static class RowRange {
+long start;

Review comment:
   If this should be immutable by definition of this struct, maybe `final`?
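
A minimal sketch of what that could look like, assuming the struct only needs its two bounds set once in the constructor. `RowRangeSketch` is a hypothetical wrapper class used only to keep the example self-contained; it is not part of the patch:

```java
final class RowRangeSketch {
    /** Helper struct to represent an inclusive range of row indexes [start, end]. */
    static final class RowRange {
        // final fields: assigned exactly once, so the range cannot be changed later
        final long start;
        final long end;

        RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    public static void main(String[] args) {
        RowRange range = new RowRange(4, 5);
        // range.start = 0;  // would not compile: cannot assign a value to a final field
        System.out.println(range.start + "-" + range.end);
    }
}
```

With `final` fields, shared constants such as `MAX_ROW_RANGE` and `MIN_ROW_RANGE` cannot be modified by accident through a reference handed out by `nextRange()`.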





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



dongjoon-hyun commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r659108572



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##
@@ -33,31 +51,107 @@
   /** The remaining number of values to read in the current batch */
   int valuesToReadInBatch;
 
-  ParquetReadState(int maxDefinitionLevel) {
+  ParquetReadState(int maxDefinitionLevel, PrimitiveIterator.OfLong rowIndexes) {
 this.maxDefinitionLevel = maxDefinitionLevel;
+this.rowRanges = rowIndexes == null ? null : constructRanges(rowIndexes);
+nextRange();
   }
 
   /**
-   * Called at the beginning of reading a new batch.
+   * Construct a list of row ranges from the given `rowIndexes`. For example, suppose the
+   * `rowIndexes` are `[0, 1, 2, 4, 5, 7, 8, 9]`, it will be converted into 3 row ranges:
+   * `[0-2], [4-5], [7-9]`.
*/
-  void resetForBatch(int batchSize) {
+  private Iterator<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
+List<RowRange> rowRanges = new ArrayList<>();
+long currentStart = Long.MIN_VALUE;
+long previous = Long.MIN_VALUE;
+
+while (rowIndexes.hasNext()) {
+  long idx = rowIndexes.nextLong();
+  if (previous == Long.MIN_VALUE) {
+currentStart = previous = idx;
+  } else if (previous + 1 != idx) {
+RowRange range = new RowRange(currentStart, previous);
+rowRanges.add(range);
+currentStart = previous = idx;
+  } else {
+previous = idx;
+  }
+}
+
+if (previous != Long.MIN_VALUE) {
+  rowRanges.add(new RowRange(currentStart, previous));
+}
+
+return rowRanges.iterator();
+  }
+
+  /**
+   * Must be called at the beginning of reading a new batch.
+   */
+  void resetForNewBatch(int batchSize) {
 this.offset = 0;
 this.valuesToReadInBatch = batchSize;
   }
 
   /**
-   * Called at the beginning of reading a new page.
+   * Must be called at the beginning of reading a new page.
*/
-  void resetForPage(int totalValuesInPage) {
+  void resetForNewPage(int totalValuesInPage, long pageFirstRowIndex) {
 this.valuesToReadInPage = totalValuesInPage;
+this.rowId = pageFirstRowIndex;
   }
 
   /**
-   * Advance the current offset to the new values.
+   * Returns the start index of the current row range.
*/
-  void advanceOffset(int newOffset) {
+  long currentRangeStart() {
+return currentRange.start;
+  }
+
+  /**
+   * Returns the end index of the current row range.
+   */
+  long currentRangeEnd() {
+return currentRange.end;
+  }
+
+  /**
+   * Advance the current offset and rowId to the new values.
+   */
+  void advanceOffsetAndRowId(int newOffset, long newRowId) {
 valuesToReadInBatch -= (newOffset - offset);
-valuesToReadInPage -= (newOffset - offset);
+valuesToReadInPage -= (newRowId - rowId);
 offset = newOffset;
+rowId = newRowId;
+  }
+
+  /**
+   * Advance to the next range.
+   */
+  void nextRange() {
+if (rowRanges == null) {
+  currentRange = MAX_ROW_RANGE;
+} else {
+  if (!rowRanges.hasNext()) {
+currentRange = MIN_ROW_RANGE;
+  } else {
+currentRange = rowRanges.next();
+  }
+}

Review comment:
   Shall we flatten more?
   ```java
   if (rowRanges == null) {
 currentRange = MAX_ROW_RANGE;
   } else if (!rowRanges.hasNext()) {
 currentRange = MIN_ROW_RANGE;
   } else {
 currentRange = rowRanges.next();
   }
   ```





[GitHub] [spark] SparkQA removed a comment on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868928627


   **[Test build #140352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140352/testReport)** for PR 33097 at commit [`53febc2`](https://github.com/apache/spark/commit/53febc20416506da2877a4e988ddad606859e732).



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



dongjoon-hyun commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r659108216



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##
@@ -33,31 +51,107 @@
   /** The remaining number of values to read in the current batch */
   int valuesToReadInBatch;
 
-  ParquetReadState(int maxDefinitionLevel) {
+  ParquetReadState(int maxDefinitionLevel, PrimitiveIterator.OfLong rowIndexes) {
 this.maxDefinitionLevel = maxDefinitionLevel;
+this.rowRanges = rowIndexes == null ? null : constructRanges(rowIndexes);
+nextRange();
   }
 
   /**
-   * Called at the beginning of reading a new batch.
+   * Construct a list of row ranges from the given `rowIndexes`. For example, suppose the
+   * `rowIndexes` are `[0, 1, 2, 4, 5, 7, 8, 9]`, it will be converted into 3 row ranges:
+   * `[0-2], [4-5], [7-9]`.
*/
-  void resetForBatch(int batchSize) {
+  private Iterator<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
+List<RowRange> rowRanges = new ArrayList<>();
+long currentStart = Long.MIN_VALUE;
+long previous = Long.MIN_VALUE;
+
+while (rowIndexes.hasNext()) {
+  long idx = rowIndexes.nextLong();
+  if (previous == Long.MIN_VALUE) {
+currentStart = previous = idx;
+  } else if (previous + 1 != idx) {
+RowRange range = new RowRange(currentStart, previous);
+rowRanges.add(range);
+currentStart = previous = idx;
+  } else {
+previous = idx;
+  }

Review comment:
   If we always do `previous = idx` in all three cases, shall we simplify the logic by moving `previous = idx` out of the `if`/`else` branches?
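
One way the suggestion could look, as a self-contained sketch rather than the actual patch. `ConstructRangesSketch` is a hypothetical wrapper class, and the method returns a `List` instead of an `Iterator` only so the result is easy to print:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

final class ConstructRangesSketch {
    /** Inclusive range of row indexes [start, end]. */
    static final class RowRange {
        final long start;
        final long end;

        RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "-" + end + "]";
        }
    }

    static List<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
        List<RowRange> rowRanges = new ArrayList<>();
        long currentStart = Long.MIN_VALUE;
        long previous = Long.MIN_VALUE;

        while (rowIndexes.hasNext()) {
            long idx = rowIndexes.nextLong();
            if (previous == Long.MIN_VALUE) {
                // First index seen: open the first range.
                currentStart = idx;
            } else if (previous + 1 != idx) {
                // Gap found: close the current range and open a new one.
                rowRanges.add(new RowRange(currentStart, previous));
                currentStart = idx;
            }
            // Common to all three cases, so it is done once after the branches.
            previous = idx;
        }

        if (previous != Long.MIN_VALUE) {
            rowRanges.add(new RowRange(currentStart, previous));
        }
        return rowRanges;
    }

    public static void main(String[] args) {
        PrimitiveIterator.OfLong indexes = LongStream.of(0, 1, 2, 4, 5, 7, 8, 9).iterator();
        System.out.println(constructRanges(indexes)); // prints [[0-2], [4-5], [7-9]]
    }
}
```

The contiguous case no longer needs its own `else` branch, since the only thing it did was update `previous`.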





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



dongjoon-hyun commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r659107932



##
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##
@@ -17,13 +17,31 @@
 
 package org.apache.spark.sql.execution.datasources.parquet;
 
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.PrimitiveIterator;
+
 /**
  * Helper class to store intermediate state while reading a Parquet column chunk.
  */
 final class ParquetReadState {
-  /** Maximum definition level */
+  private static final RowRange MAX_ROW_RANGE = new RowRange(Long.MIN_VALUE, Long.MAX_VALUE);
+  private static final RowRange MIN_ROW_RANGE = new RowRange(Long.MAX_VALUE, Long.MIN_VALUE);
+
+  /** Iterator over all row ranges, only not-null if column index is present */
+  private final Iterator<RowRange> rowRanges;

Review comment:
   Thank you for the illustration.





[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



SparkQA commented on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868932228


   **[Test build #140352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140352/testReport)** for PR 33097 at commit [`53febc2`](https://github.com/apache/spark/commit/53febc20416506da2877a4e988ddad606859e732).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class RollingAndExpanding(Generic[T_Frame], metaclass=ABCMeta):`
 * `class RollingLike(RollingAndExpanding[T_Frame]):`
 * `class Rolling(RollingLike[T_Frame]):`
 * `class RollingGroupby(RollingLike[T_Frame]):`
 * `class ExpandingLike(RollingAndExpanding[T_Frame]):`
 * `class Expanding(ExpandingLike[T_Frame]):`
 * `class ExpandingGroupby(ExpandingLike[T_Frame]):`



[GitHub] [spark] dongjoon-hyun commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



dongjoon-hyun commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868932131


   Thank you for rebasing, @aokolnychyi .



[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868930686


   **[Test build #140353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140353/testReport)** for PR 32921 at commit [`d5781be`](https://github.com/apache/spark/commit/d5781be3d3f771610a779cf085486a9d8646a0b8).



[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868930518


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44884/
   



[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



SparkQA commented on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868930350


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44883/
   



[GitHub] [spark] dongjoon-hyun opened a new pull request #33098: [SPARK-35903][TESTS] Parameterize 'master' in TPCDSQueryBenchmark

2021-06-25 Thread GitBox



dongjoon-hyun opened a new pull request #33098:
URL: https://github.com/apache/spark/pull/33098


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



SparkQA commented on pull request #33097:
URL: https://github.com/apache/spark/pull/33097#issuecomment-868928627


   **[Test build #140352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140352/testReport)** for PR 33097 at commit [`53febc2`](https://github.com/apache/spark/commit/53febc20416506da2877a4e988ddad606859e732).



[GitHub] [spark] SparkQA removed a comment on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #33065:
URL: https://github.com/apache/spark/pull/33065#issuecomment-868853841


   **[Test build #140346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140346/testReport)** for PR 33065 at commit [`4d05f7a`](https://github.com/apache/spark/commit/4d05f7af67486addfb02d2d62998b662c5976006).



[GitHub] [spark] SparkQA commented on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator

2021-06-25 Thread GitBox



SparkQA commented on pull request #33065:
URL: https://github.com/apache/spark/pull/33065#issuecomment-868927200


   **[Test build #140346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140346/testReport)** for PR 33065 at commit [`4d05f7a`](https://github.com/apache/spark/commit/4d05f7af67486addfb02d2d62998b662c5976006).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] mridulm commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

2021-06-25 Thread GitBox



mridulm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r659103667



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
 handler.addRpcRequest(requestId, callback);
 RpcChannelListener listener = new RpcChannelListener(requestId, callback);
 channel.writeAndFlush(
-  new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+  new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);

Review comment:
   When an indeterminate stage is retried due to a fetch failure, Spark fails all child stages that depend on that stage, across all jobs, so that they are retried. See [here](https://github.com/apache/spark/blob/b5a15035851bfba12ef1c68d10103cec42cbac0c/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1807).
   
   If a task of a failed stage does not honor task cancellation and runs to completion, its output will still not be consumed (and will not be a candidate for finalization either).
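
One way to picture why a straggler's output is harmless is the following minimal sketch. It assumes the merger side remembers the latest `shuffleSequenceId` per shuffle and drops blocks that carry an older one. `MergedShuffleRegistry`, `registerAttempt`, and `acceptBlock` are hypothetical names used only for illustration; they are not Spark APIs and may not match how the PR implements this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical merger-side bookkeeping; the class and method names are
// illustrative only and are not Spark APIs.
final class MergedShuffleRegistry {
    // Latest shuffleSequenceId registered for each shuffleId.
    private final Map<Integer, Integer> latestSequenceId = new HashMap<>();

    // Record that a stage attempt with this sequence id has started.
    // A retry registers a larger id, which supersedes the earlier one.
    void registerAttempt(int shuffleId, int shuffleSequenceId) {
        latestSequenceId.merge(shuffleId, shuffleSequenceId, Integer::max);
    }

    // Accept a pushed block only if it comes from the latest known attempt.
    boolean acceptBlock(int shuffleId, int shuffleSequenceId) {
        Integer latest = latestSequenceId.get(shuffleId);
        return latest != null && latest == shuffleSequenceId;
    }

    public static void main(String[] args) {
        MergedShuffleRegistry registry = new MergedShuffleRegistry();
        registry.registerAttempt(7, 0);   // first attempt of the stage
        registry.registerAttempt(7, 1);   // retry after a fetch failure
        // A straggler task from attempt 0 that ignored cancellation.
        System.out.println(registry.acceptBlock(7, 0));
        // A task from the retried attempt.
        System.out.println(registry.acceptBlock(7, 1));
    }
}
```

In this sketch the stale attempt is filtered purely by comparing ids, so it does not matter whether the old task was actually cancelled.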





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #32753:
URL: https://github.com/apache/spark/pull/32753#issuecomment-868924203


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140345/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868924198


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44881/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868924197


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44882/
   



[GitHub] [spark] wangyum commented on a change in pull request #32932: [SPARK-35786][SQL] Add a new operator to rebalance the query output if AQE is enabled

2021-06-25 Thread GitBox



wangyum commented on a change in pull request #32932:
URL: https://github.com/apache/spark/pull/32932#discussion_r659102206



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##
@@ -1351,6 +1351,31 @@ object RepartitionByExpression {
   }
 }
 
+/**
+ * This operator is used to rebalance the output partitions of the given `child`, so that every
+ * partition is of a reasonable size (not too small and not too big). It also try its best to
+ * partition the child output by `partitionExpressions`. If there are skews, Spark will split the
+ * skewed partitions, to make these partitions not too big. This operator is useful when you need
+ * to write the result of `child` to a table, to avoid too small/big files.
+ *
+ * Note that, this operator only makes sense when AQE is enabled.
+ */
+case class RebalancePartitions(
+partitionExpressions: Seq[Expression],
+child: LogicalPlan) extends UnaryNode {

Review comment:
   Make `RebalancePartitions` extend `RepartitionOperation`?





[GitHub] [spark] AmplabJenkins commented on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #32753:
URL: https://github.com/apache/spark/pull/32753#issuecomment-868924203


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140345/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868924198


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44881/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868924197


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44882/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33078:
URL: https://github.com/apache/spark/pull/33078#issuecomment-868924160


   Can one of the admins verify this patch?



[GitHub] [spark] aokolnychyi commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



aokolnychyi commented on a change in pull request #33096:
URL: https://github.com/apache/spark/pull/33096#discussion_r659099103



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue}
+import org.apache.spark.sql.errors.QueryCompilationErrors
+
+/**
+ * A utility class that converts public connector expressions into Catalyst expressions.
+ */
+private[sql] object V2ExpressionUtils extends SQLConfHelper {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper
+
+  def resolveRef[T <: NamedExpression](

Review comment:
   Yeah, totally!





[GitHub] [spark] viirya commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



viirya commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868902634


   lgtm too



[GitHub] [spark] viirya commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



viirya commented on a change in pull request #33096:
URL: https://github.com/apache/spark/pull/33096#discussion_r659099034



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue}
+import org.apache.spark.sql.errors.QueryCompilationErrors
+
+/**
+ * A utility class that converts public connector expressions into Catalyst expressions.
+ */
+private[sql] object V2ExpressionUtils extends SQLConfHelper {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper
+
+  def resolveRef[T <: NamedExpression](

Review comment:
   Oh, I see. I couldn't tell that from this change. Thanks.





[GitHub] [spark] aokolnychyi commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



aokolnychyi commented on a change in pull request #33096:
URL: https://github.com/apache/spark/pull/33096#discussion_r659098919



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue}
+import org.apache.spark.sql.errors.QueryCompilationErrors
+
+/**
+ * A utility class that converts public connector expressions into Catalyst expressions.
+ */
+private[sql] object V2ExpressionUtils extends SQLConfHelper {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper
+
+  def resolveRef[T <: NamedExpression](

Review comment:
   There are certain places where we need slightly more specific types. I could cast after calling, but that does not seem better.
   
   For example, I need `Attribute` [here](https://github.com/apache/spark/pull/32921/files#diff-9dc37f97148227618575e1c56f6177260412561e7b44ef93eb5d7acf7a0fee52R76). I also need `AttributeReference` [here](https://github.com/apache/spark/pull/33008/files#diff-c5574d47ec4d5764008276aab9acc836e4526d3a95c3fcbbf9c53c67b05538f8R110).
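
The trade-off between casting at each call site and a type-parameterized `resolveRef` can be sketched roughly as below. This is a simplified illustration, written in Java rather than Scala. `NamedExpression`, `Attribute`, and `AttributeReference` here are hypothetical stand-ins that only mirror the names of the Catalyst types, and `ResolveRefSketch` with its two methods is assumed for the example, not taken from the PR.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: simplified stand-ins, not the Catalyst classes.
final class ResolveRefSketch {
    // Non-generic variant: a caller that needs a subtype must cast the result.
    static NamedExpression resolve(String name, List<? extends NamedExpression> output) {
        for (NamedExpression e : output) {
            if (e.name().equals(name)) {
                return e;
            }
        }
        throw new IllegalArgumentException("Cannot resolve: " + name);
    }

    // Generic variant: the caller chooses T and the single cast lives here.
    // The cast is unchecked, so a wrong T only fails where the result is used.
    @SuppressWarnings("unchecked")
    static <T extends NamedExpression> T resolveRef(
            String name, List<? extends NamedExpression> output) {
        return (T) resolve(name, output);
    }

    public static void main(String[] args) {
        List<AttributeReference> output =
            Arrays.asList(new AttributeReference("id"), new AttributeReference("ts"));
        // Cast at the call site.
        Attribute a = (Attribute) resolve("id", output);
        // T is inferred from the assignment target, so no cast is needed here.
        AttributeReference r = resolveRef("ts", output);
        System.out.println(a.name() + " " + r.name());
    }
}

// Hypothetical minimal hierarchy mirroring the names used in the discussion.
interface NamedExpression {
    String name();
}

class Attribute implements NamedExpression {
    private final String name;

    Attribute(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}

final class AttributeReference extends Attribute {
    AttributeReference(String name) {
        super(name);
    }
}
```

The generic variant does not make the conversion any safer. It only moves the cast into one place, which keeps call sites that need `Attribute` or `AttributeReference` shorter.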





[GitHub] [spark] ueshin closed pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*

2021-06-25 Thread GitBox



ueshin closed pull request #33094:
URL: https://github.com/apache/spark/pull/33094


   



[GitHub] [spark] ueshin commented on pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*

2021-06-25 Thread GitBox



ueshin commented on pull request #33094:
URL: https://github.com/apache/spark/pull/33094#issuecomment-868902241


   Thanks! merging to master.



[GitHub] [spark] SparkQA removed a comment on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #32753:
URL: https://github.com/apache/spark/pull/32753#issuecomment-868829589


   **[Test build #140345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140345/testReport)** for PR 32753 at commit [`4e16bbd`](https://github.com/apache/spark/commit/4e16bbd0948db19696eb296cf4189319e3adc05a).



[GitHub] [spark] SparkQA commented on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-25 Thread GitBox



SparkQA commented on pull request #32753:
URL: https://github.com/apache/spark/pull/32753#issuecomment-868901657


   **[Test build #140345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140345/testReport)** for PR 32753 at commit [`4e16bbd`](https://github.com/apache/spark/commit/4e16bbd0948db19696eb296cf4189319e3adc05a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] viirya commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



viirya commented on a change in pull request #33096:
URL: https://github.com/apache/spark/pull/33096#discussion_r659097582



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue}
+import org.apache.spark.sql.errors.QueryCompilationErrors
+
+/**
+ * A utility class that converts public connector expressions into Catalyst expressions.
+ */
+private[sql] object V2ExpressionUtils extends SQLConfHelper {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper
+
+  def resolveRef[T <: NamedExpression](

Review comment:
   Do we need a generic type here? I think `resolve` just returns `NamedExpression`.





[GitHub] [spark] aokolnychyi commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



aokolnychyi commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868900677


   Thank you, @dongjoon-hyun!



[GitHub] [spark] SparkQA commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



SparkQA commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868900609


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44881/
   



[GitHub] [spark] ueshin opened a new pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window

2021-06-25 Thread GitBox



ueshin opened a new pull request #33097:
URL: https://github.com/apache/spark/pull/33097


   ### What changes were proposed in this pull request?
   
   Refines type hints in `pyspark.pandas.window`.
   
   Also, some refactoring is included to clean up the type hierarchy of 
`Rolling` and `Expanding`.
   
   ### Why are the changes needed?
   
   We can use stricter type hints for functions in `pyspark.pandas.window` by 
using generics.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.
   



[GitHub] [spark] dongjoon-hyun closed pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



dongjoon-hyun closed pull request #33096:
URL: https://github.com/apache/spark/pull/33096


   



[GitHub] [spark] dongjoon-hyun commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



dongjoon-hyun commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868900348


   I checked the GitHub Actions run. There is one unrelated failure, and the others passed.
   ```
   - SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path *** FAILED *** (3 minutes, 19 seconds)
   ```
   
   Merged to master for Apache Spark 3.2.0.



[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



SparkQA commented on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868900344


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44882/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33080: [SPARK-35728][SPARK-35778][FOLLOWUP][TESTS] Add test case to check multiply/divide of day-time interval and year-month interval of any

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33080:
URL: https://github.com/apache/spark/pull/33080#issuecomment-868899123


   Can one of the admins verify this patch?



[GitHub] [spark] aokolnychyi commented on a change in pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



aokolnychyi commented on a change in pull request #32921:
URL: https://github.com/apache/spark/pull/32921#discussion_r659095549



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
##
@@ -227,3 +228,14 @@ object ReuseSubquery extends Rule[SparkPlan] {
     }
   }
 }
+
+object PrepareScans extends Rule[SparkPlan] {
+  def apply(plan: SparkPlan): SparkPlan = {
+    val scans = plan.collect {
+      case scan: BatchScanExec => scan

Review comment:
   Resolving this one too as it no longer applies.





[GitHub] [spark] aokolnychyi commented on a change in pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



aokolnychyi commented on a change in pull request #32921:
URL: https://github.com/apache/spark/pull/32921#discussion_r659095329



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##
@@ -96,6 +96,7 @@ case class AdaptiveSparkPlanExec(
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     PlanAdaptiveDynamicPruningFilters(this),
     ReuseAdaptiveSubquery(context.subqueryCache),
+    PrepareScans,

Review comment:
   I am resolving this thread so that it does not mislead other reviewers. It 
no longer applies.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868896849


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44879/
   



[GitHub] [spark] aokolnychyi commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



aokolnychyi commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868897551


   Good call, @dongjoon-hyun. Added to the PR description. Could you check, 
please?



[GitHub] [spark] HeartSaVioR closed pull request #33085: [SPARK-35894][BUILD] Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread GitBox



HeartSaVioR closed pull request #33085:
URL: https://github.com/apache/spark/pull/33085


   



[GitHub] [spark] AmplabJenkins commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868896849


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44879/
   



[GitHub] [spark] HeartSaVioR commented on pull request #33085: [SPARK-35894][BUILD] Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread GitBox



HeartSaVioR commented on pull request #33085:
URL: https://github.com/apache/spark/pull/33085#issuecomment-868896779


   Thanks, merging to master!



[GitHub] [spark] SparkQA commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



SparkQA commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868895421


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44881/
   



[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



SparkQA commented on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868894980


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44882/
   



[GitHub] [spark] ueshin commented on pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*

2021-06-25 Thread GitBox



ueshin commented on pull request #33094:
URL: https://github.com/apache/spark/pull/33094#issuecomment-868893445


   cc @HyukjinKwon @itholic 



[GitHub] [spark] github-actions[bot] commented on pull request #29113: [SPARK-32314][SHS] Add config to control whether log old format of stacktrace

2021-06-25 Thread GitBox



github-actions[bot] commented on pull request #29113:
URL: https://github.com/apache/spark/pull/29113#issuecomment-868891093


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!



[GitHub] [spark] github-actions[bot] closed pull request #31840: [SPARK-34745][SQL] Unify overflow exception error message of integral types

2021-06-25 Thread GitBox



github-actions[bot] closed pull request #31840:
URL: https://github.com/apache/spark/pull/31840


   



[GitHub] [spark] github-actions[bot] commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-06-25 Thread GitBox



github-actions[bot] commented on pull request #30763:
URL: https://github.com/apache/spark/pull/30763#issuecomment-868891088


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!



[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868890661


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44879/
   



[GitHub] [spark] AmplabJenkins commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868889338


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44880/
   



[GitHub] [spark] SparkQA commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



SparkQA commented on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868889333


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44880/
   



[GitHub] [spark] AmplabJenkins commented on pull request #33083: Allow sequences (tuples and lists) as pivot values argument in PySpark.

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33083:
URL: https://github.com/apache/spark/pull/33083#issuecomment-86298


   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA removed a comment on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868886086


   **[Test build #140351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140351/testReport)** for PR 33095 at commit [`48a364f`](https://github.com/apache/spark/commit/48a364f8228817cc3cef0946fd8184fbad70e829).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868885474


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140349/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-868885472


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140339/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



AmplabJenkins removed a comment on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868886338


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140351/
   



[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



SparkQA commented on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868886329


   **[Test build #140351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140351/testReport)** for PR 33095 at commit [`48a364f`](https://github.com/apache/spark/commit/48a364f8228817cc3cef0946fd8184fbad70e829).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868886338


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140351/
   



[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

2021-06-25 Thread GitBox



SparkQA commented on pull request #33095:
URL: https://github.com/apache/spark/pull/33095#issuecomment-868886086


   **[Test build #140351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140351/testReport)** for PR 33095 at commit [`48a364f`](https://github.com/apache/spark/commit/48a364f8228817cc3cef0946fd8184fbad70e829).



[GitHub] [spark] SparkQA commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



SparkQA commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868886081


   **[Test build #140350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140350/testReport)** for PR 33096 at commit [`3ac51da`](https://github.com/apache/spark/commit/3ac51da333f0400264081a3425e332646282c3a9).



[GitHub] [spark] AmplabJenkins commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-868885472


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140339/
   



[GitHub] [spark] AmplabJenkins commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



AmplabJenkins commented on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868885474


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140349/
   



[GitHub] [spark] HeartSaVioR commented on pull request #33085: [SPARK-35894][BUILD] Introduce new style enforce to not import scala.collection.Seq/IndexedSeq

2021-06-25 Thread GitBox



HeartSaVioR commented on pull request #33085:
URL: https://github.com/apache/spark/pull/33085#issuecomment-868883808


   The GA build passed for Scala 2.13, and the style check with the new rule 
now passes.
   
   @srowen Is this good to go?



[GitHub] [spark] dongjoon-hyun commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



dongjoon-hyun commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868883368


   cc @viirya 



[GitHub] [spark] dongjoon-hyun commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



dongjoon-hyun commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868883315


   Thank you for pinging me, @aokolnychyi !



[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



SparkQA commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868882933


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44879/
   



[GitHub] [spark] aokolnychyi commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



aokolnychyi commented on pull request #33096:
URL: https://github.com/apache/spark/pull/33096#issuecomment-868882776


   This PR contains a utility class I need for dynamic filtering.
   
   cc @sunchao @huaxingao @viirya @dongjoon-hyun @cloud-fan @HyukjinKwon @rdblue @holdenk



[GitHub] [spark] aokolnychyi commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2

2021-06-25 Thread GitBox



aokolnychyi commented on pull request #32921:
URL: https://github.com/apache/spark/pull/32921#issuecomment-868882181


   Submitted #33096 for the utility class.



[GitHub] [spark] aokolnychyi opened a new pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst

2021-06-25 Thread GitBox



aokolnychyi opened a new pull request #33096:
URL: https://github.com/apache/spark/pull/33096


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR adds a utility to convert public connector expressions to Catalyst expressions.
   
   ### Why are the changes needed?
   
   
   These changes are needed because more and more places require this logic, and it is better to implement it in a single place.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   Existing tests.



[GitHub] [spark] SparkQA commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



SparkQA commented on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868882011


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44880/
   



[GitHub] [spark] ueshin commented on a change in pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*

2021-06-25 Thread GitBox



ueshin commented on a change in pull request #33094:
URL: https://github.com/apache/spark/pull/33094#discussion_r659078486



##
File path: python/pyspark/pandas/data_type_ops/base.py
##
@@ -65,6 +65,7 @@
 
 
 T_IndexOps = TypeVar("T_IndexOps", bound="IndexOpsMixin")
+IndexOpsLike = Union["Series", "Index"]

Review comment:
   A good reference is here: https://stackoverflow.com/questions/58903906/whats-the-difference-between-a-constrained-typevar-and-a-union
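
The distinction between the two names in the diff can be illustrated with a minimal sketch. It is not taken from the PR: the `IndexOpsMixin`, `Series`, and `Index` classes below are hypothetical stand-ins for the pyspark.pandas classes, and `rename` and `describe` are illustrative function names. A bound `TypeVar` ties the return type to the argument type, while a `Union` only says "either of these".

```python
from typing import TypeVar, Union


class IndexOpsMixin:
    """Hypothetical stand-in for the pandas-on-Spark mixin."""

    def __init__(self, name: str) -> None:
        self.name = name


class Series(IndexOpsMixin):
    """Hypothetical stand-in for pyspark.pandas.Series."""


class Index(IndexOpsMixin):
    """Hypothetical stand-in for pyspark.pandas.Index."""


# Same definitions as in the diff under review.
T_IndexOps = TypeVar("T_IndexOps", bound="IndexOpsMixin")
IndexOpsLike = Union["Series", "Index"]


def rename(obj: T_IndexOps, name: str) -> T_IndexOps:
    # With a TypeVar, a type checker infers that passing a Series
    # returns a Series, and passing an Index returns an Index.
    obj.name = name
    return obj


def describe(obj: IndexOpsLike) -> str:
    # With a Union, the parameter accepts either class, but no link
    # is expressed between this type and any other type in the signature.
    return f"{type(obj).__name__}({obj.name})"
```

Under these assumptions, `rename(Series("a"), "b")` is typed as `Series` by a checker such as mypy, whereas a function returning `IndexOpsLike` would force callers to narrow the result themselves.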





[GitHub] [spark] SparkQA removed a comment on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868868541


   **[Test build #140349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140349/testReport)** for PR 31490 at commit [`a0b4ecd`](https://github.com/apache/spark/commit/a0b4ecd336b63b2bbad3ff80a30249cba590f053).



[GitHub] [spark] SparkQA commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

2021-06-25 Thread GitBox



SparkQA commented on pull request #31490:
URL: https://github.com/apache/spark/pull/31490#issuecomment-868877249


   **[Test build #140349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140349/testReport)** for PR 31490 at commit [`a0b4ecd`](https://github.com/apache/spark/commit/a0b4ecd336b63b2bbad3ff80a30249cba590f053).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] Victsm commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

2021-06-25 Thread GitBox



Victsm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r659076673



##
File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
 handler.addRpcRequest(requestId, callback);
 RpcChannelListener listener = new RpcChannelListener(requestId, callback);
 channel.writeAndFlush(
-  new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+  new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);

Review comment:
   What about the following scenario:
   1. An indeterminate stage generates the shuffle data for a given shuffle.
   2. A downstream reduce stage experiences a shuffle fetch failure, leading to a retry of the indeterminate stage.
   3. Tasks from the retry of the indeterminate stage start pushing blocks, which invalidates the shuffle data from the 1st attempt.
   4. In the meantime, dangling tasks from the first, failed reduce stage might still be trying to fetch shuffle blocks corresponding to the 1st attempt of the indeterminate stage.
   
   Is the above scenario possible with indeterminate stage retry, and would we run into issues if the seq ID is only used on the push side but not the fetch side?
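
The concern can be made concrete with a toy model. This is only a sketch under stated assumptions: `MergedShuffleStore`, `push`, `fetch`, and `StaleShuffleError` are hypothetical names and do not correspond to Spark's shuffle service APIs. It shows that when the fetch request carries no sequence ID, a dangling reducer from the first attempt silently reads data pushed by the retry, whereas a fetch that does carry the ID can be rejected.

```python
from typing import Dict, Optional, Tuple


class StaleShuffleError(Exception):
    """Raised when a request refers to an outdated stage attempt."""


class MergedShuffleStore:
    """Toy model of a server holding merged shuffle data (hypothetical)."""

    def __init__(self) -> None:
        # shuffle_id -> (latest sequence ID, {reduce_id: merged bytes})
        self._shuffles: Dict[int, Tuple[int, Dict[int, bytes]]] = {}

    def push(self, shuffle_id: int, seq_id: int, reduce_id: int, block: bytes) -> None:
        current = self._shuffles.get(shuffle_id)
        if current is None or seq_id > current[0]:
            # A newer attempt invalidates everything merged so far.
            current = (seq_id, {})
            self._shuffles[shuffle_id] = current
        elif seq_id < current[0]:
            raise StaleShuffleError("push from an outdated attempt")
        blocks = current[1]
        blocks[reduce_id] = blocks.get(reduce_id, b"") + block

    def fetch(self, shuffle_id: int, reduce_id: int, seq_id: Optional[int] = None) -> bytes:
        latest_seq, blocks = self._shuffles[shuffle_id]
        # Without a sequence ID the server cannot tell which attempt the
        # reducer expects, so it returns whatever is current.
        if seq_id is not None and seq_id != latest_seq:
            raise StaleShuffleError("fetch for an outdated attempt")
        return blocks.get(reduce_id, b"")
```

In this model, after a push with sequence ID 0 followed by a push with sequence ID 1, `fetch(shuffle_id, reduce_id)` returns the retry's data even to a reducer that planned against attempt 0, while `fetch(shuffle_id, reduce_id, seq_id=0)` fails fast.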





[GitHub] [spark] xinrong-databricks commented on pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*

2021-06-25 Thread GitBox



xinrong-databricks commented on pull request #33094:
URL: https://github.com/apache/spark/pull/33094#issuecomment-868875080


   Thanks for working on that!



[GitHub] [spark] xinrong-databricks commented on a change in pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*

2021-06-25 Thread GitBox



xinrong-databricks commented on a change in pull request #33094:
URL: https://github.com/apache/spark/pull/33094#discussion_r659076301



##
File path: python/pyspark/pandas/data_type_ops/base.py
##
@@ -65,6 +65,7 @@
 
 
 T_IndexOps = TypeVar("T_IndexOps", bound="IndexOpsMixin")
+IndexOpsLike = Union["Series", "Index"]

Review comment:
   Why do we still need `IndexOpsLike` since we have `T_IndexOps`?





[GitHub] [spark] SparkQA removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-25 Thread GitBox



SparkQA removed a comment on pull request #32787:
URL: https://github.com/apache/spark/pull/32787#issuecomment-868762786


   **[Test build #140339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140339/testReport)** for PR 32787 at commit [`b093b97`](https://github.com/apache/spark/commit/b093b97e98668cecd7f0cf52ccc830c158e0b22c).


