[GitHub] [spark] AngersZhuuuu commented on pull request #33092: [SPARK-35905][SQL][FOLLOWUP][TESTS] Fix UT mistake in SQLQuerySuite
AngersZhuuuu commented on pull request #33092: URL: https://github.com/apache/spark/pull/33092#issuecomment-868952001 @dongjoon-hyun Should I add SPARK-35905 to the UT title? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on pull request #33063: [SPARK-35879][Core][Shuffle] Fix performance regression caused by collectFetchRequests
yaooqinn commented on pull request #33063: URL: https://github.com/apache/spark/pull/33063#issuecomment-868947497 thanks all! merged to master/3.1
[GitHub] [spark] yaooqinn closed pull request #33063: [SPARK-35879][Core][Shuffle] Fix performance regression caused by collectFetchRequests
yaooqinn closed pull request #33063: URL: https://github.com/apache/spark/pull/33063
[GitHub] [spark] yaooqinn commented on pull request #33063: [SPARK-35879][Core][Shuffle] Fix performance regression caused by collectFetchRequests
yaooqinn commented on pull request #33063: URL: https://github.com/apache/spark/pull/33063#issuecomment-868943286

@mridulm @dongjoon-hyun, I re-ran the benchmark manually on the final commit. The debug log below shows that the performance regression is gone.

```log
21/06/26 04:04:01 INFO MapOutputTrackerWorker: Got the map output locations
21/06/26 04:04:01 DEBUG ShuffleBlockFetcherIterator: maxBytesInFlight: 50331648, targetRemoteRequestSize: 10066329, maxBlocksInFlightPerAddress: 2147483647
21/06/26 04:04:01 DEBUG ShuffleBlockFetcherIterator: Creating fetch request of 2606046 at BlockManagerId(2, 10.1.5.72, 37767, None) with 88 blocks
21/06/26 04:04:01 DEBUG ShuffleBlockFetcherIterator: Collected remote fetch requests for BlockManagerId(2, 10.1.5.72, 37767, None) in 82 ms
```
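The numbers in the debug log can be sanity-checked with a little arithmetic. The sketch below assumes that the per-request target is one fifth of `maxBytesInFlight` (integer division, floored at 1 byte); that rule is inferred from the logged values rather than stated in the comment, and the class and method names are hypothetical.

```java
// Sketch: reproduce the sizes printed in the debug log.
// Assumption: target request size = max(maxBytesInFlight / 5, 1).
final class FetchSizeCheck {
  /** Hypothetical helper: one fifth of the in-flight budget, never below 1 byte. */
  static long targetRequestSize(long maxBytesInFlight) {
    return Math.max(maxBytesInFlight / 5, 1L);
  }

  public static void main(String[] args) {
    long maxBytesInFlight = 48L * 1024 * 1024;  // 48 MiB
    System.out.println(maxBytesInFlight);                     // 50331648
    System.out.println(targetRequestSize(maxBytesInFlight));  // 10066329
    System.out.println(Integer.MAX_VALUE);                    // 2147483647
  }
}
```

Under that assumption, the logged request of 2606046 bytes is well below the target, and `maxBlocksInFlightPerAddress` equals `Integer.MAX_VALUE`, which suggests no per-address block limit was in effect for this run.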
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32140: [SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
AmplabJenkins removed a comment on pull request #32140: URL: https://github.com/apache/spark/pull/32140#issuecomment-818410050 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
AmplabJenkins removed a comment on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868940480 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44884/
[GitHub] [spark] AmplabJenkins commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
AmplabJenkins commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868940480 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44884/
[GitHub] [spark] mridulm commented on pull request #32140: [SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
mridulm commented on pull request #32140: URL: https://github.com/apache/spark/pull/32140#issuecomment-868939032 The GitHub Actions test failure looks unrelated; let me try Jenkins anyway.
[GitHub] [spark] mridulm commented on pull request #32140: [SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data
mridulm commented on pull request #32140: URL: https://github.com/apache/spark/pull/32140#issuecomment-868938908 Jenkins, test this please
[GitHub] [spark] wangyum commented on pull request #33099: [SPARK-35904][SQL] Collapse above RebalancePartitions
wangyum commented on pull request #33099: URL: https://github.com/apache/spark/pull/33099#issuecomment-868937773 cc @ulysses-you @cloud-fan
[GitHub] [spark] wangyum opened a new pull request #33099: [SPARK-35904][SQL] Collapse above RebalancePartitions
wangyum opened a new pull request #33099: URL: https://github.com/apache/spark/pull/33099

### What changes were proposed in this pull request?
Make `RebalancePartitions` extend `RepartitionOperation`.

### Why are the changes needed?
So that `CollapseRepartition` can optimize `RebalancePartitions` where possible.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test.
[GitHub] [spark] SparkQA removed a comment on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA removed a comment on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868868259 **[Test build #140348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140348/testReport)** for PR 32921 at commit [`b138ec4`](https://github.com/apache/spark/commit/b138ec45fa21aa7d8926f9746029103cfdee9d65).
[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868935984

**[Test build #140348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140348/testReport)** for PR 32921 at commit [`b138ec4`](https://github.com/apache/spark/commit/b138ec45fa21aa7d8926f9746029103cfdee9d65).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
AmplabJenkins removed a comment on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868935705 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44883/
[GitHub] [spark] AmplabJenkins commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
AmplabJenkins commented on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868935705 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44883/
[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868934750 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44884/
[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
SparkQA commented on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868934724 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44883/
[GitHub] [spark] Yikun commented on pull request #32867: [SPARK-35721][PYTHON] Path level discover for python unittests
Yikun commented on pull request #32867: URL: https://github.com/apache/spark/pull/32867#issuecomment-868934485 @HyukjinKwon Ready for review, it would be good if you could take a look again. : )
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
dongjoon-hyun commented on a change in pull request #32753: URL: https://github.com/apache/spark/pull/32753#discussion_r659108805

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##########
@@ -33,31 +51,107 @@
   /** The remaining number of values to read in the current batch */
   int valuesToReadInBatch;
 
-  ParquetReadState(int maxDefinitionLevel) {
+  ParquetReadState(int maxDefinitionLevel, PrimitiveIterator.OfLong rowIndexes) {
     this.maxDefinitionLevel = maxDefinitionLevel;
+    this.rowRanges = rowIndexes == null ? null : constructRanges(rowIndexes);
+    nextRange();
   }
 
   /**
-   * Called at the beginning of reading a new batch.
+   * Construct a list of row ranges from the given `rowIndexes`. For example, suppose the
+   * `rowIndexes` are `[0, 1, 2, 4, 5, 7, 8, 9]`, it will be converted into 3 row ranges:
+   * `[0-2], [4-5], [7-9]`.
    */
-  void resetForBatch(int batchSize) {
+  private Iterator<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
+    List<RowRange> rowRanges = new ArrayList<>();
+    long currentStart = Long.MIN_VALUE;
+    long previous = Long.MIN_VALUE;
+
+    while (rowIndexes.hasNext()) {
+      long idx = rowIndexes.nextLong();
+      if (previous == Long.MIN_VALUE) {
+        currentStart = previous = idx;
+      } else if (previous + 1 != idx) {
+        RowRange range = new RowRange(currentStart, previous);
+        rowRanges.add(range);
+        currentStart = previous = idx;
+      } else {
+        previous = idx;
+      }
+    }
+
+    if (previous != Long.MIN_VALUE) {
+      rowRanges.add(new RowRange(currentStart, previous));
+    }
+
+    return rowRanges.iterator();
+  }
+
+  /**
+   * Must be called at the beginning of reading a new batch.
+   */
+  void resetForNewBatch(int batchSize) {
     this.offset = 0;
     this.valuesToReadInBatch = batchSize;
   }
 
   /**
-   * Called at the beginning of reading a new page.
+   * Must be called at the beginning of reading a new page.
    */
-  void resetForPage(int totalValuesInPage) {
+  void resetForNewPage(int totalValuesInPage, long pageFirstRowIndex) {
     this.valuesToReadInPage = totalValuesInPage;
+    this.rowId = pageFirstRowIndex;
   }
 
   /**
-   * Advance the current offset to the new values.
+   * Returns the start index of the current row range.
    */
-  void advanceOffset(int newOffset) {
+  long currentRangeStart() {
+    return currentRange.start;
+  }
+
+  /**
+   * Returns the end index of the current row range.
+   */
+  long currentRangeEnd() {
+    return currentRange.end;
+  }
+
+  /**
+   * Advance the current offset and rowId to the new values.
+   */
+  void advanceOffsetAndRowId(int newOffset, long newRowId) {
     valuesToReadInBatch -= (newOffset - offset);
-    valuesToReadInPage -= (newOffset - offset);
+    valuesToReadInPage -= (newRowId - rowId);
     offset = newOffset;
+    rowId = newRowId;
+  }
+
+  /**
+   * Advance to the next range.
+   */
+  void nextRange() {
+    if (rowRanges == null) {
+      currentRange = MAX_ROW_RANGE;
+    } else {
+      if (!rowRanges.hasNext()) {
+        currentRange = MIN_ROW_RANGE;
+      } else {
+        currentRange = rowRanges.next();
+      }
+    }
+  }
+
+  /**
+   * Helper struct to represent a range of row indexes `[start, end]`.
+   */
+  private static class RowRange {
+    long start;

Review comment:
If this should be immutable by definition of this struct, maybe `final`?
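The `final` suggestion in the review comment could look roughly like the sketch below. Only the `start` field is visible in the quoted hunk; the `end` field is inferred from `currentRange.end` elsewhere in the diff, and the wrapper class, constructor shape, and `toString` are illustrative assumptions rather than the PR's actual code.

```java
// Sketch of an immutable range holder, assuming a two-argument constructor
// like the `new RowRange(currentStart, previous)` calls in the diff.
final class RowRangeSketch {
  /** Closed range of row indexes [start, end]; both fields are fixed at construction. */
  static final class RowRange {
    final long start;
    final long end;

    RowRange(long start, long end) {
      this.start = start;
      this.end = end;
    }

    // Hypothetical helper for printing; not part of the quoted diff.
    @Override
    public String toString() {
      return "[" + start + "-" + end + "]";
    }
  }

  public static void main(String[] args) {
    RowRange range = new RowRange(4L, 5L);
    System.out.println(range);  // [4-5]
    // range.start = 0L;        // would not compile: cannot assign a value to a final field
  }
}
```

With `final` fields the compiler rejects any later reassignment, which also makes shared constants such as `MAX_ROW_RANGE` safe to reuse across instances.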
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
dongjoon-hyun commented on a change in pull request #32753: URL: https://github.com/apache/spark/pull/32753#discussion_r659108572

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##########
@@ -33,31 +51,107 @@
   /** The remaining number of values to read in the current batch */
   int valuesToReadInBatch;
 
-  ParquetReadState(int maxDefinitionLevel) {
+  ParquetReadState(int maxDefinitionLevel, PrimitiveIterator.OfLong rowIndexes) {
     this.maxDefinitionLevel = maxDefinitionLevel;
+    this.rowRanges = rowIndexes == null ? null : constructRanges(rowIndexes);
+    nextRange();
   }
 
   /**
-   * Called at the beginning of reading a new batch.
+   * Construct a list of row ranges from the given `rowIndexes`. For example, suppose the
+   * `rowIndexes` are `[0, 1, 2, 4, 5, 7, 8, 9]`, it will be converted into 3 row ranges:
+   * `[0-2], [4-5], [7-9]`.
    */
-  void resetForBatch(int batchSize) {
+  private Iterator<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
+    List<RowRange> rowRanges = new ArrayList<>();
+    long currentStart = Long.MIN_VALUE;
+    long previous = Long.MIN_VALUE;
+
+    while (rowIndexes.hasNext()) {
+      long idx = rowIndexes.nextLong();
+      if (previous == Long.MIN_VALUE) {
+        currentStart = previous = idx;
+      } else if (previous + 1 != idx) {
+        RowRange range = new RowRange(currentStart, previous);
+        rowRanges.add(range);
+        currentStart = previous = idx;
+      } else {
+        previous = idx;
+      }
+    }
+
+    if (previous != Long.MIN_VALUE) {
+      rowRanges.add(new RowRange(currentStart, previous));
+    }
+
+    return rowRanges.iterator();
+  }
+
+  /**
+   * Must be called at the beginning of reading a new batch.
+   */
+  void resetForNewBatch(int batchSize) {
     this.offset = 0;
     this.valuesToReadInBatch = batchSize;
   }
 
   /**
-   * Called at the beginning of reading a new page.
+   * Must be called at the beginning of reading a new page.
    */
-  void resetForPage(int totalValuesInPage) {
+  void resetForNewPage(int totalValuesInPage, long pageFirstRowIndex) {
     this.valuesToReadInPage = totalValuesInPage;
+    this.rowId = pageFirstRowIndex;
   }
 
   /**
-   * Advance the current offset to the new values.
+   * Returns the start index of the current row range.
    */
-  void advanceOffset(int newOffset) {
+  long currentRangeStart() {
+    return currentRange.start;
+  }
+
+  /**
+   * Returns the end index of the current row range.
+   */
+  long currentRangeEnd() {
+    return currentRange.end;
+  }
+
+  /**
+   * Advance the current offset and rowId to the new values.
+   */
+  void advanceOffsetAndRowId(int newOffset, long newRowId) {
     valuesToReadInBatch -= (newOffset - offset);
-    valuesToReadInPage -= (newOffset - offset);
+    valuesToReadInPage -= (newRowId - rowId);
     offset = newOffset;
+    rowId = newRowId;
+  }
+
+  /**
+   * Advance to the next range.
+   */
+  void nextRange() {
+    if (rowRanges == null) {
+      currentRange = MAX_ROW_RANGE;
+    } else {
+      if (!rowRanges.hasNext()) {
+        currentRange = MIN_ROW_RANGE;
+      } else {
+        currentRange = rowRanges.next();
+      }
+    }

Review comment:
Shall we flatten more?

```java
if (rowRanges == null) {
  currentRange = MAX_ROW_RANGE;
} else if (!rowRanges.hasNext()) {
  currentRange = MIN_ROW_RANGE;
} else {
  currentRange = rowRanges.next();
}
```
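To see the flattened `nextRange()` in context, here is a minimal, self-contained sketch of the three cases it distinguishes. It is not the PR's class: `RangeCursor` is a hypothetical name, and ranges are modelled as two-element `long[]` arrays (`{start, end}`) instead of `RowRange` to keep the example short.

```java
import java.util.Collections;
import java.util.Iterator;

// Sketch: a cursor over row ranges using the flattened if / else-if / else chain.
final class RangeCursor {
  // Sentinels mirroring the diff: MAX spans all longs, MIN is an empty range (start > end).
  static final long[] MAX_ROW_RANGE = {Long.MIN_VALUE, Long.MAX_VALUE};
  static final long[] MIN_ROW_RANGE = {Long.MAX_VALUE, Long.MIN_VALUE};

  private final Iterator<long[]> rowRanges;  // null models "no column index present"
  long[] currentRange;

  RangeCursor(Iterator<long[]> rowRanges) {
    this.rowRanges = rowRanges;
    nextRange();
  }

  /** Advance to the next range, falling back to a sentinel when there is none. */
  void nextRange() {
    if (rowRanges == null) {
      currentRange = MAX_ROW_RANGE;
    } else if (!rowRanges.hasNext()) {
      currentRange = MIN_ROW_RANGE;
    } else {
      currentRange = rowRanges.next();
    }
  }

  public static void main(String[] args) {
    RangeCursor unfiltered = new RangeCursor(null);
    System.out.println(unfiltered.currentRange == MAX_ROW_RANGE);  // true

    RangeCursor filtered =
        new RangeCursor(Collections.singletonList(new long[] {0L, 2L}).iterator());
    System.out.println(filtered.currentRange[0] + "-" + filtered.currentRange[1]);  // 0-2
    filtered.nextRange();
    System.out.println(filtered.currentRange == MIN_ROW_RANGE);  // true
  }
}
```

The flattened chain behaves the same as the nested version because the inner `if`/`else` was the only statement in the outer `else` branch.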
[GitHub] [spark] SparkQA removed a comment on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
SparkQA removed a comment on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868928627 **[Test build #140352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140352/testReport)** for PR 33097 at commit [`53febc2`](https://github.com/apache/spark/commit/53febc20416506da2877a4e988ddad606859e732).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
dongjoon-hyun commented on a change in pull request #32753: URL: https://github.com/apache/spark/pull/32753#discussion_r659108216

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##########
@@ -33,31 +51,107 @@
   /** The remaining number of values to read in the current batch */
   int valuesToReadInBatch;
 
-  ParquetReadState(int maxDefinitionLevel) {
+  ParquetReadState(int maxDefinitionLevel, PrimitiveIterator.OfLong rowIndexes) {
     this.maxDefinitionLevel = maxDefinitionLevel;
+    this.rowRanges = rowIndexes == null ? null : constructRanges(rowIndexes);
+    nextRange();
   }
 
   /**
-   * Called at the beginning of reading a new batch.
+   * Construct a list of row ranges from the given `rowIndexes`. For example, suppose the
+   * `rowIndexes` are `[0, 1, 2, 4, 5, 7, 8, 9]`, it will be converted into 3 row ranges:
+   * `[0-2], [4-5], [7-9]`.
    */
-  void resetForBatch(int batchSize) {
+  private Iterator<RowRange> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
+    List<RowRange> rowRanges = new ArrayList<>();
+    long currentStart = Long.MIN_VALUE;
+    long previous = Long.MIN_VALUE;
+
+    while (rowIndexes.hasNext()) {
+      long idx = rowIndexes.nextLong();
+      if (previous == Long.MIN_VALUE) {
+        currentStart = previous = idx;
+      } else if (previous + 1 != idx) {
+        RowRange range = new RowRange(currentStart, previous);
+        rowRanges.add(range);
+        currentStart = previous = idx;
+      } else {
+        previous = idx;
+      }

Review comment:
If we always do `previous = idx` in all three cases, shall we simplify the logic by moving `previous = idx` out of the `if-else-statements`?
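The simplification suggested in the review comment can be sketched as follows, with `previous = idx` hoisted out of the branches. This is a standalone illustration, not the PR's code: `RowRangeBuilder` and `format` are hypothetical names, and ranges are returned as a list of `{start, end}` arrays instead of an `Iterator<RowRange>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

// Sketch: collapse sorted row indexes into closed ranges, assigning `previous` once per loop.
final class RowRangeBuilder {
  static List<long[]> constructRanges(PrimitiveIterator.OfLong rowIndexes) {
    List<long[]> rowRanges = new ArrayList<>();
    long currentStart = Long.MIN_VALUE;
    long previous = Long.MIN_VALUE;  // MIN_VALUE doubles as the "nothing seen yet" marker

    while (rowIndexes.hasNext()) {
      long idx = rowIndexes.nextLong();
      if (previous == Long.MIN_VALUE) {
        currentStart = idx;                                     // first index opens the first range
      } else if (previous + 1 != idx) {
        rowRanges.add(new long[] {currentStart, previous});     // gap: close the current range
        currentStart = idx;
      }
      previous = idx;                                           // common to all three cases
    }

    if (previous != Long.MIN_VALUE) {
      rowRanges.add(new long[] {currentStart, previous});       // close the trailing range
    }
    return rowRanges;
  }

  // Hypothetical formatting helper for the demo output.
  static String format(List<long[]> ranges) {
    StringBuilder sb = new StringBuilder();
    for (long[] r : ranges) {
      if (sb.length() > 0) sb.append(", ");
      sb.append('[').append(r[0]).append('-').append(r[1]).append(']');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    PrimitiveIterator.OfLong indexes = LongStream.of(0, 1, 2, 4, 5, 7, 8, 9).iterator();
    System.out.println(format(constructRanges(indexes)));  // [0-2], [4-5], [7-9]
  }
}
```

The output matches the example in the method's Javadoc, and the third `else` branch disappears because its only job was the assignment that now runs unconditionally.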
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
dongjoon-hyun commented on a change in pull request #32753: URL: https://github.com/apache/spark/pull/32753#discussion_r659107932

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##########
@@ -17,13 +17,31 @@
 package org.apache.spark.sql.execution.datasources.parquet;
 
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.PrimitiveIterator;
+
 /**
  * Helper class to store intermediate state while reading a Parquet column chunk.
  */
 final class ParquetReadState {
-  /** Maximum definition level */
+  private static final RowRange MAX_ROW_RANGE = new RowRange(Long.MIN_VALUE, Long.MAX_VALUE);
+  private static final RowRange MIN_ROW_RANGE = new RowRange(Long.MAX_VALUE, Long.MIN_VALUE);
+
+  /** Iterator over all row ranges, only not-null if column index is present */
+  private final Iterator<RowRange> rowRanges;

Review comment:
Thank you for the illustration.
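The two sentinel ranges in this hunk are easiest to understand through a containment check. The sketch below assumes a row is accepted when `start <= rowId <= end`; the quoted diff does not show where that comparison happens, so `inRange` and the class name are hypothetical.

```java
// Sketch: why (MIN_VALUE, MAX_VALUE) accepts every row and (MAX_VALUE, MIN_VALUE) accepts none.
final class SentinelRanges {
  /** Assumed acceptance rule for a closed range [start, end]. */
  static boolean inRange(long start, long end, long rowId) {
    return start <= rowId && rowId <= end;
  }

  public static void main(String[] args) {
    long rowId = 42L;
    // MAX_ROW_RANGE: used when there is no column index, so nothing is filtered out.
    System.out.println(inRange(Long.MIN_VALUE, Long.MAX_VALUE, rowId));  // true
    // MIN_ROW_RANGE: start > end, so the range is empty and every row is skipped.
    System.out.println(inRange(Long.MAX_VALUE, Long.MIN_VALUE, rowId));  // false
  }
}
```

Using sentinels this way lets the reading code compare against `currentRange` unconditionally instead of special-casing a missing or exhausted iterator.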
[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
SparkQA commented on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868932228

**[Test build #140352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140352/testReport)** for PR 33097 at commit [`53febc2`](https://github.com/apache/spark/commit/53febc20416506da2877a4e988ddad606859e732).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class RollingAndExpanding(Generic[T_Frame], metaclass=ABCMeta):`
  * `class RollingLike(RollingAndExpanding[T_Frame]):`
  * `class Rolling(RollingLike[T_Frame]):`
  * `class RollingGroupby(RollingLike[T_Frame]):`
  * `class ExpandingLike(RollingAndExpanding[T_Frame]):`
  * `class Expanding(ExpandingLike[T_Frame]):`
  * `class ExpandingGroupby(ExpandingLike[T_Frame]):`
[GitHub] [spark] dongjoon-hyun commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
dongjoon-hyun commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868932131 Thank you for rebasing, @aokolnychyi.
[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868930686 **[Test build #140353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140353/testReport)** for PR 32921 at commit [`d5781be`](https://github.com/apache/spark/commit/d5781be3d3f771610a779cf085486a9d8646a0b8).
[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868930518 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44884/
[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
SparkQA commented on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868930350 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44883/
[GitHub] [spark] dongjoon-hyun opened a new pull request #33098: [SPARK-35903][TESTS] Parameterize 'master' in TPCDSQueryBenchmark
dongjoon-hyun opened a new pull request #33098: URL: https://github.com/apache/spark/pull/33098 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] SparkQA commented on pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
SparkQA commented on pull request #33097: URL: https://github.com/apache/spark/pull/33097#issuecomment-868928627 **[Test build #140352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140352/testReport)** for PR 33097 at commit [`53febc2`](https://github.com/apache/spark/commit/53febc20416506da2877a4e988ddad606859e732).
[GitHub] [spark] SparkQA removed a comment on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator
SparkQA removed a comment on pull request #33065: URL: https://github.com/apache/spark/pull/33065#issuecomment-868853841 **[Test build #140346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140346/testReport)** for PR 33065 at commit [`4d05f7a`](https://github.com/apache/spark/commit/4d05f7af67486addfb02d2d62998b662c5976006).
[GitHub] [spark] SparkQA commented on pull request #33065: [SPARK-35880][SS] Track the duplicates dropped count in dedupe operator
SparkQA commented on pull request #33065: URL: https://github.com/apache/spark/pull/33065#issuecomment-868927200 **[Test build #140346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140346/testReport)** for PR 33065 at commit [`4d05f7a`](https://github.com/apache/spark/commit/4d05f7af67486addfb02d2d62998b662c5976006). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] mridulm commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle
mridulm commented on a change in pull request #33034: URL: https://github.com/apache/spark/pull/33034#discussion_r659103667 ## File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java ## @@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq( handler.addRpcRequest(requestId, callback); RpcChannelListener listener = new RpcChannelListener(requestId, callback); channel.writeAndFlush( - new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener); + new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener); Review comment: When an indeterminate stage is retried due to a fetch failure, Spark fails all child stages that depend on that stage, across all jobs, so that they are retried. See [here](https://github.com/apache/spark/blob/b5a15035851bfba12ef1c68d10103cec42cbac0c/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1807). If a task of a failed stage does not honor task cancellation and runs to completion, its output would still not be consumed (and would not be a candidate for finalization either).
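Editorial sketch of the point above, not code from the PR: one way a merger could keep a late task's output from being consumed is to tag every pushed block with the retry's sequence id and serve metadata only for the latest id. The class `MergedShuffleStore` and its methods are hypothetical names chosen for illustration; only the idea of a `shuffleSequenceId` travelling with the request comes from the diff, and how Spark actually uses it is an assumption here.

```python
from typing import Dict, List, Tuple


class MergedShuffleStore:
    """Hypothetical in-memory merger keyed by (shuffle, sequence, reduce)."""

    def __init__(self) -> None:
        # Latest sequence id seen for each shuffle id.
        self._latest: Dict[int, int] = {}
        self._blocks: Dict[Tuple[int, int, int], List[bytes]] = {}

    def push(self, shuffle_id: int, sequence_id: int, reduce_id: int, block: bytes) -> bool:
        """Accept a block unless it belongs to a superseded stage attempt."""
        latest = self._latest.get(shuffle_id, -1)
        if sequence_id < latest:
            # A task from an older attempt ran to completion; ignore its output.
            return False
        if sequence_id > latest:
            # A retry started: forget everything merged for older attempts.
            self._latest[shuffle_id] = sequence_id
            self._blocks = {
                key: value
                for key, value in self._blocks.items()
                if not (key[0] == shuffle_id and key[1] < sequence_id)
            }
        self._blocks.setdefault((shuffle_id, sequence_id, reduce_id), []).append(block)
        return True

    def merged_block_count(self, shuffle_id: int, sequence_id: int, reduce_id: int) -> int:
        """Number of merged blocks visible to a reader of the given attempt."""
        if sequence_id != self._latest.get(shuffle_id):
            return 0
        return len(self._blocks.get((shuffle_id, sequence_id, reduce_id), []))


store = MergedShuffleStore()
store.push(0, 0, 3, b"first attempt")
store.push(0, 1, 3, b"retry")
late_accepted = store.push(0, 0, 3, b"late task from first attempt")
```

In this toy model the late push is rejected and a reader asking for sequence 0 sees nothing, which mirrors the "would still not be consumed" remark in the review comment.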
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
AmplabJenkins removed a comment on pull request #32753: URL: https://github.com/apache/spark/pull/32753#issuecomment-868924203 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140345/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
AmplabJenkins removed a comment on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868924198 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44881/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
AmplabJenkins removed a comment on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868924197 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44882/
[GitHub] [spark] wangyum commented on a change in pull request #32932: [SPARK-35786][SQL] Add a new operator to rebalance the query output if AQE is enabled
wangyum commented on a change in pull request #32932: URL: https://github.com/apache/spark/pull/32932#discussion_r659102206 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -1351,6 +1351,31 @@ object RepartitionByExpression { } } +/** + * This operator is used to rebalance the output partitions of the given `child`, so that every + * partition is of a reasonable size (not too small and not too big). It also try its best to + * partition the child output by `partitionExpressions`. If there are skews, Spark will split the + * skewed partitions, to make these partitions not too big. This operator is useful when you need + * to write the result of `child` to a table, to avoid too small/big files. + * + * Note that, this operator only makes sense when AQE is enabled. + */ +case class RebalancePartitions( +partitionExpressions: Seq[Expression], +child: LogicalPlan) extends UnaryNode { Review comment: Should `RebalancePartitions` extend `RepartitionOperation`?
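Editorial sketch of what "not too small and not too big" in the doc comment above could mean in practice. This is a toy model on partition sizes only, not Spark's AQE implementation: the function `rebalance` and the single `target` size parameter are assumptions made for illustration. Oversized partitions are split into near-equal pieces, then adjacent small pieces are merged as long as they stay within the target.

```python
from typing import List


def rebalance(sizes: List[int], target: int) -> List[int]:
    """Return new partition sizes, each at most `target`, preserving the total."""
    if target <= 0:
        raise ValueError("target must be positive")

    # Step 1: split skewed (oversized) partitions into near-equal pieces.
    pieces: List[int] = []
    for size in sizes:
        if size > target:
            count = -(-size // target)  # ceiling division
            base, extra = divmod(size, count)
            pieces.extend(base + (1 if i < extra else 0) for i in range(count))
        else:
            pieces.append(size)

    # Step 2: coalesce adjacent small pieces while they still fit in the target.
    merged: List[int] = []
    current = 0
    for size in pieces:
        if current and current + size > target:
            merged.append(current)
            current = 0
        current += size
    if current:
        merged.append(current)
    return merged


balanced = rebalance([5, 5, 300, 10], target=100)
```

The sketch ignores `partitionExpressions` entirely; the real operator also tries to keep rows with the same key together, which a size-only model cannot show.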
[GitHub] [spark] AmplabJenkins commented on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
AmplabJenkins commented on pull request #32753: URL: https://github.com/apache/spark/pull/32753#issuecomment-868924203 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140345/
[GitHub] [spark] AmplabJenkins commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
AmplabJenkins commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868924198 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44881/
[GitHub] [spark] AmplabJenkins commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
AmplabJenkins commented on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868924197 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44882/
[GitHub] [spark] AmplabJenkins commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a
AmplabJenkins commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-868924160 Can one of the admins verify this patch?
[GitHub] [spark] aokolnychyi commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
aokolnychyi commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659099103 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper +import org.apache.spark.sql.catalyst.analysis.Resolver +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue} +import org.apache.spark.sql.errors.QueryCompilationErrors + +/** + * A utility class that converts public connector expressions into Catalyst expressions. + */ +private[sql] object V2ExpressionUtils extends SQLConfHelper { + import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper + + def resolveRef[T <: NamedExpression]( Review comment: Yeah, totally!
[GitHub] [spark] viirya commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
viirya commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868902634 lgtm too
[GitHub] [spark] viirya commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
viirya commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659099034 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper +import org.apache.spark.sql.catalyst.analysis.Resolver +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue} +import org.apache.spark.sql.errors.QueryCompilationErrors + +/** + * A utility class that converts public connector expressions into Catalyst expressions. + */ +private[sql] object V2ExpressionUtils extends SQLConfHelper { + import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper + + def resolveRef[T <: NamedExpression]( Review comment: Oh, I see. I could not tell that from this change. Thanks.
[GitHub] [spark] aokolnychyi commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
aokolnychyi commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659098919 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper +import org.apache.spark.sql.catalyst.analysis.Resolver +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue} +import org.apache.spark.sql.errors.QueryCompilationErrors + +/** + * A utility class that converts public connector expressions into Catalyst expressions. + */ +private[sql] object V2ExpressionUtils extends SQLConfHelper { + import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper + + def resolveRef[T <: NamedExpression]( Review comment: There are certain places where we need a bit more specific types. 
I could cast after the call, but that does not seem better. For example, I need `Attribute` [here](https://github.com/apache/spark/pull/32921/files#diff-9dc37f97148227618575e1c56f6177260412561e7b44ef93eb5d7acf7a0fee52R76). I also need `AttributeReference` [here](https://github.com/apache/spark/pull/33008/files#diff-c5574d47ec4d5764008276aab9acc836e4526d3a95c3fcbbf9c53c67b05538f8R110).
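Editorial sketch of the design choice discussed in this thread, written as a Python analogue rather than the Scala under review. The signature of `resolveRef` is truncated in the quoted diff, so everything below is an assumption: `resolve_ref`, its `expected` parameter, and the stand-in classes are hypothetical and only mirror the names mentioned in the comment. The point it illustrates is that a type parameter bounded by `NamedExpression` lets each call site get back the narrower type it needs without a separate cast.

```python
from dataclasses import dataclass
from typing import Dict, Type, TypeVar


@dataclass(frozen=True)
class NamedExpression:
    name: str


@dataclass(frozen=True)
class Attribute(NamedExpression):
    pass


@dataclass(frozen=True)
class AttributeReference(Attribute):
    pass


@dataclass(frozen=True)
class Alias(NamedExpression):
    pass


# Bounded type variable, the analogue of `T <: NamedExpression` in Scala.
T = TypeVar("T", bound=NamedExpression)


def resolve_ref(name: str, output: Dict[str, NamedExpression], expected: Type[T]) -> T:
    """Look up `name` and return it typed as `expected`, or raise."""
    resolved = output.get(name)
    if resolved is None:
        raise LookupError(f"cannot resolve {name!r}")
    if not isinstance(resolved, expected):
        raise TypeError(f"{name!r} resolved to {type(resolved).__name__}")
    return resolved


output: Dict[str, NamedExpression] = {
    "id": AttributeReference("id"),
    "total": Alias("total"),
}

# One caller wants an Attribute, another an AttributeReference; neither casts.
as_attribute = resolve_ref("id", output, Attribute)
as_reference = resolve_ref("id", output, AttributeReference)
```

Returning plain `NamedExpression` would push an equivalent check or cast to every caller, which is the alternative the comment describes as not seeming better.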
[GitHub] [spark] ueshin closed pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*
ueshin closed pull request #33094: URL: https://github.com/apache/spark/pull/33094
[GitHub] [spark] ueshin commented on pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*
ueshin commented on pull request #33094: URL: https://github.com/apache/spark/pull/33094#issuecomment-868902241 Thanks! merging to master.
[GitHub] [spark] SparkQA removed a comment on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
SparkQA removed a comment on pull request #32753: URL: https://github.com/apache/spark/pull/32753#issuecomment-868829589 **[Test build #140345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140345/testReport)** for PR 32753 at commit [`4e16bbd`](https://github.com/apache/spark/commit/4e16bbd0948db19696eb296cf4189319e3adc05a).
[GitHub] [spark] SparkQA commented on pull request #32753: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
SparkQA commented on pull request #32753: URL: https://github.com/apache/spark/pull/32753#issuecomment-868901657 **[Test build #140345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140345/testReport)** for PR 32753 at commit [`4e16bbd`](https://github.com/apache/spark/commit/4e16bbd0948db19696eb296cf4189319e3adc05a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
viirya commented on a change in pull request #33096: URL: https://github.com/apache/spark/pull/33096#discussion_r659097582 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper +import org.apache.spark.sql.catalyst.analysis.Resolver +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, FieldReference, IdentityTransform, NamedReference, NullOrdering => V2NullOrdering, SortDirection => V2SortDirection, SortValue} +import org.apache.spark.sql.errors.QueryCompilationErrors + +/** + * A utility class that converts public connector expressions into Catalyst expressions. + */ +private[sql] object V2ExpressionUtils extends SQLConfHelper { + import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper + + def resolveRef[T <: NamedExpression]( Review comment: Do we need a generic here? I think `resolve` just returns `NamedExpression`?
[GitHub] [spark] aokolnychyi commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
aokolnychyi commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868900677 Thank you, @dongjoon-hyun!
[GitHub] [spark] SparkQA commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
SparkQA commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868900609 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44881/
[GitHub] [spark] ueshin opened a new pull request #33097: [SPARK-35901][PYTHON] Refine type hints in pyspark.pandas.window
ueshin opened a new pull request #33097: URL: https://github.com/apache/spark/pull/33097 ### What changes were proposed in this pull request? Refines type hints in `pyspark.pandas.window`. Also, some refactoring is included to clean up the type hierarchy of `Rolling` and `Expanding`. ### Why are the changes needed? We can use stricter type hints for functions in `pyspark.pandas.window` by making them generic. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests.
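Editorial sketch of what generic type hints for window classes can look like. It does not reproduce the code in this PR: `ToySeries`, `ToyFrame`, `ToyWindow`, and the `FrameLike` type variable are stand-ins chosen for illustration, and only the names `Rolling` and `Expanding` come from the description. The idea shown is that parameterizing the window class by the input type lets a type checker infer that a window over a series returns a series and a window over a frame returns a frame.

```python
from typing import Callable, Dict, Generic, List, TypeVar

Values = List[float]


class ToySeries:
    def __init__(self, values: Values) -> None:
        self.values = list(values)

    def apply_window(self, func: Callable[[Values], Values]) -> "ToySeries":
        return ToySeries(func(self.values))


class ToyFrame:
    def __init__(self, columns: Dict[str, Values]) -> None:
        self.columns = {name: list(col) for name, col in columns.items()}

    def apply_window(self, func: Callable[[Values], Values]) -> "ToyFrame":
        return ToyFrame({name: func(col) for name, col in self.columns.items()})


# The window is generic in the kind of object it was created from.
FrameLike = TypeVar("FrameLike", ToySeries, ToyFrame)


class ToyWindow(Generic[FrameLike]):
    def __init__(self, obj: FrameLike) -> None:
        self._obj = obj

    def _start(self, i: int) -> int:
        """Index of the first element of the window that ends at position i."""
        raise NotImplementedError

    def sum(self) -> FrameLike:
        # Unlike pandas, partial windows at the start are summed, not NaN.
        def windowed(values: Values) -> Values:
            return [sum(values[self._start(i): i + 1]) for i in range(len(values))]

        return self._obj.apply_window(windowed)


class Rolling(ToyWindow[FrameLike]):
    def __init__(self, obj: FrameLike, window: int) -> None:
        super().__init__(obj)
        self._window = window

    def _start(self, i: int) -> int:
        return max(0, i - self._window + 1)


class Expanding(ToyWindow[FrameLike]):
    def _start(self, i: int) -> int:
        return 0


rolled = Rolling(ToySeries([1, 2, 3, 4]), window=2).sum()
expanded = Expanding(ToyFrame({"a": [1, 2, 3]})).sum()
```

With these hints a checker treats `rolled` as a `ToySeries` and `expanded` as a `ToyFrame`, so callers do not need casts or runtime type checks on the result.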
[GitHub] [spark] dongjoon-hyun closed pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
dongjoon-hyun closed pull request #33096: URL: https://github.com/apache/spark/pull/33096
[GitHub] [spark] dongjoon-hyun commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
dongjoon-hyun commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868900348

I checked the GitHub Action. There is one irrelevant failure and the others passed.

```
- SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path *** FAILED *** (3 minutes, 19 seconds)
```

Merged to master for Apache Spark 3.2.0.
[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
SparkQA commented on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868900344 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44882/
[GitHub] [spark] AmplabJenkins commented on pull request #33080: [SPARK-35728][SPARK-35778][FOLLOWUP][TESTS] Add test case to check multiply/divide of day-time interval and year-month interval of any
AmplabJenkins commented on pull request #33080: URL: https://github.com/apache/spark/pull/33080#issuecomment-868899123 Can one of the admins verify this patch?
[GitHub] [spark] aokolnychyi commented on a change in pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
aokolnychyi commented on a change in pull request #32921: URL: https://github.com/apache/spark/pull/32921#discussion_r659095549

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala ##
@@ -227,3 +228,14 @@ object ReuseSubquery extends Rule[SparkPlan] {
     }
   }
 }
+
+object PrepareScans extends Rule[SparkPlan] {
+  def apply(plan: SparkPlan): SparkPlan = {
+    val scans = plan.collect {
+      case scan: BatchScanExec => scan

Review comment: Resolving this one too as it no longer applies.
[GitHub] [spark] aokolnychyi commented on a change in pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
aokolnychyi commented on a change in pull request #32921: URL: https://github.com/apache/spark/pull/32921#discussion_r659095329

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ##
@@ -96,6 +96,7 @@ case class AdaptiveSparkPlanExec(
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     PlanAdaptiveDynamicPruningFilters(this),
     ReuseAdaptiveSubquery(context.subqueryCache),
+    PrepareScans,

Review comment: I am resolving this thread to not mislead other reviewers. It no longer applies.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
AmplabJenkins removed a comment on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868896849 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44879/
[GitHub] [spark] aokolnychyi commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
aokolnychyi commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868897551 Good call, @dongjoon-hyun. Added to the PR description. Could you check, please?
[GitHub] [spark] HeartSaVioR closed pull request #33085: [SPARK-35894][BUILD] Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
HeartSaVioR closed pull request #33085: URL: https://github.com/apache/spark/pull/33085
[GitHub] [spark] AmplabJenkins commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
AmplabJenkins commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868896849 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44879/
[GitHub] [spark] HeartSaVioR commented on pull request #33085: [SPARK-35894][BUILD] Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
HeartSaVioR commented on pull request #33085: URL: https://github.com/apache/spark/pull/33085#issuecomment-868896779 Thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
SparkQA commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868895421 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44881/
[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
SparkQA commented on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868894980 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44882/
[GitHub] [spark] ueshin commented on pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*
ueshin commented on pull request #33094: URL: https://github.com/apache/spark/pull/33094#issuecomment-868893445 cc @HyukjinKwon @itholic
[GitHub] [spark] github-actions[bot] commented on pull request #29113: [SPARK-32314][SHS] Add config to control whether log old format of stacktrace
github-actions[bot] commented on pull request #29113: URL: https://github.com/apache/spark/pull/29113#issuecomment-868891093 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #31840: [SPARK-34745][SQL] Unify overflow exception error message of integral types
github-actions[bot] closed pull request #31840: URL: https://github.com/apache/spark/pull/31840
[GitHub] [spark] github-actions[bot] commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata
github-actions[bot] commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-868891088 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868890661 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44879/
[GitHub] [spark] AmplabJenkins commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
AmplabJenkins commented on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868889338 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44880/
[GitHub] [spark] SparkQA commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
SparkQA commented on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868889333 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44880/
[GitHub] [spark] AmplabJenkins commented on pull request #33083: Allow sequences (tuples and lists) as pivot values argument in PySpark.
AmplabJenkins commented on pull request #33083: URL: https://github.com/apache/spark/pull/33083#issuecomment-86298 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA removed a comment on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
SparkQA removed a comment on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868886086 **[Test build #140351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140351/testReport)** for PR 33095 at commit [`48a364f`](https://github.com/apache/spark/commit/48a364f8228817cc3cef0946fd8184fbad70e829).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
AmplabJenkins removed a comment on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868885474 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140349/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
AmplabJenkins removed a comment on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-868885472 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140339/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
AmplabJenkins removed a comment on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868886338 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140351/
[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
SparkQA commented on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868886329

**[Test build #140351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140351/testReport)** for PR 33095 at commit [`48a364f`](https://github.com/apache/spark/commit/48a364f8228817cc3cef0946fd8184fbad70e829).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
AmplabJenkins commented on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868886338 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140351/
[GitHub] [spark] SparkQA commented on pull request #33095: [WIP][SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
SparkQA commented on pull request #33095: URL: https://github.com/apache/spark/pull/33095#issuecomment-868886086 **[Test build #140351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140351/testReport)** for PR 33095 at commit [`48a364f`](https://github.com/apache/spark/commit/48a364f8228817cc3cef0946fd8184fbad70e829).
[GitHub] [spark] SparkQA commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
SparkQA commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868886081 **[Test build #140350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140350/testReport)** for PR 33096 at commit [`3ac51da`](https://github.com/apache/spark/commit/3ac51da333f0400264081a3425e332646282c3a9).
[GitHub] [spark] AmplabJenkins commented on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
AmplabJenkins commented on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-868885472 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140339/
[GitHub] [spark] AmplabJenkins commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
AmplabJenkins commented on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868885474 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140349/
[GitHub] [spark] HeartSaVioR commented on pull request #33085: [SPARK-35894][BUILD] Introduce new style enforce to not import scala.collection.Seq/IndexedSeq
HeartSaVioR commented on pull request #33085: URL: https://github.com/apache/spark/pull/33085#issuecomment-868883808 GA build passed for Scala 2.13 build, and style check with new rule is now passed. @srowen Would it be good to go?
[GitHub] [spark] dongjoon-hyun commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
dongjoon-hyun commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868883368 cc @viirya
[GitHub] [spark] dongjoon-hyun commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
dongjoon-hyun commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868883315 Thank you for pinging me, @aokolnychyi !
[GitHub] [spark] SparkQA commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
SparkQA commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868882933 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44879/
[GitHub] [spark] aokolnychyi commented on pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
aokolnychyi commented on pull request #33096: URL: https://github.com/apache/spark/pull/33096#issuecomment-868882776 This PR contains a utility class I need for dynamic filtering. cc @sunchao @huaxingao @viirya @dongjoon-hyun @cloud-fan @HyukjinKwon @rdblue @holdenk
[GitHub] [spark] aokolnychyi commented on pull request #32921: [SPARK-35779][SQL] Dynamic filtering for Data Source V2
aokolnychyi commented on pull request #32921: URL: https://github.com/apache/spark/pull/32921#issuecomment-868882181 Submitted #33096 for the utility class.
[GitHub] [spark] aokolnychyi opened a new pull request #33096: [SPARK-35899][SQL] Utility to convert connector expressions to Catalyst
aokolnychyi opened a new pull request #33096: URL: https://github.com/apache/spark/pull/33096

### What changes were proposed in this pull request?

This PR adds a utility to convert public connector expressions to Catalyst expressions.

### Why are the changes needed?

These changes are needed as more and more places require this logic and it is better to implement it in a single place.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.
[GitHub] [spark] SparkQA commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
SparkQA commented on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868882011 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44880/
[GitHub] [spark] ueshin commented on a change in pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*
ueshin commented on a change in pull request #33094: URL: https://github.com/apache/spark/pull/33094#discussion_r659078486

## File path: python/pyspark/pandas/data_type_ops/base.py ##
@@ -65,6 +65,7 @@
 T_IndexOps = TypeVar("T_IndexOps", bound="IndexOpsMixin")
+IndexOpsLike = Union["Series", "Index"]

Review comment: Good reference is here: https://stackoverflow.com/questions/58903906/whats-the-difference-between-a-constrained-typevar-and-a-union
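The distinction raised in this review thread can be illustrated with a minimal sketch. The two alias definitions are copied from the diff; everything else (the stand-in `IndexOpsMixin`, `Series`, and `Index` classes and the `rename` and `describe` helpers) is hypothetical and is not the pyspark.pandas code. The sketch assumes only the standard `typing` module.

```python
from typing import TypeVar, Union


class IndexOpsMixin:
    """Stand-in base class (hypothetical, not the pyspark.pandas one)."""

    def __init__(self, name: str) -> None:
        self.name = name


class Series(IndexOpsMixin):
    """Stand-in for a Series type."""


class Index(IndexOpsMixin):
    """Stand-in for an Index type."""


# Both definitions below are taken from the diff under review.
T_IndexOps = TypeVar("T_IndexOps", bound="IndexOpsMixin")
IndexOpsLike = Union["Series", "Index"]


def rename(obj: T_IndexOps, name: str) -> T_IndexOps:
    # The TypeVar ties the return type to the argument type:
    # passing a Series lets a type checker infer a Series result.
    obj.name = name
    return obj


def describe(obj: IndexOpsLike) -> str:
    # The Union only states that either type is accepted; it carries
    # no relationship between the input type and any other type.
    return f"{type(obj).__name__}({obj.name})"


s = rename(Series("a"), "b")
label = describe(Index("i"))
```

Under these assumptions, a checker such as mypy would treat `s` as a `Series` rather than the wider `IndexOpsMixin`, which is the kind of information a plain `Union` annotation cannot express.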
[GitHub] [spark] SparkQA removed a comment on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
SparkQA removed a comment on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868868541 **[Test build #140349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140349/testReport)** for PR 31490 at commit [`a0b4ecd`](https://github.com/apache/spark/commit/a0b4ecd336b63b2bbad3ff80a30249cba590f053).
[GitHub] [spark] SparkQA commented on pull request #31490: [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching
SparkQA commented on pull request #31490: URL: https://github.com/apache/spark/pull/31490#issuecomment-868877249 **[Test build #140349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140349/testReport)** for PR 31490 at commit [`a0b4ecd`](https://github.com/apache/spark/commit/a0b4ecd336b63b2bbad3ff80a30249cba590f053).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Victsm commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle
Victsm commented on a change in pull request #33034: URL: https://github.com/apache/spark/pull/33034#discussion_r659076673

## File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java ##
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
     handler.addRpcRequest(requestId, callback);
     RpcChannelListener listener = new RpcChannelListener(requestId, callback);
     channel.writeAndFlush(
-      new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+      new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);

Review comment: What about the following scenario:
1. An indeterminate stage generates the shuffle data for a given shuffle.
2. Downstream reduce stage experienced shuffle fetch failure, leading to retry of the indeterminate stage.
3. Tasks from the retry of the indeterminate stage start pushing blocks, which would lead to invalidating the shuffle data from the 1st attempt.
4. In the meantime, we might still have dangling tasks from the first failed reduce stage trying to fetch shuffle blocks corresponding to the 1st attempt of the indeterminate stage.

Is the above scenario possible with indeterminate stage retry, and would we run into issues if the seq ID is only used on the push side but not the fetch side?
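The four-step scenario in this review comment can be walked through with a toy model. This is a sketch under stated assumptions, not Spark's shuffle service: `ToyMergedShuffleStore`, `StaleFetchError`, and the `push` and `fetch` signatures are all hypothetical, and the only idea borrowed from the thread is that a shuffle sequence ID identifies the stage attempt. It shows what a fetch without a sequence ID would return to a dangling reducer, and how a fetch that carries the ID could detect the mismatch instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class StaleFetchError(Exception):
    """Raised when a fetch names a sequence ID that is not the current one."""


@dataclass
class ToyMergedShuffleStore:
    """Hypothetical in-memory stand-in for a merged shuffle block store."""

    # shuffle_id -> sequence ID of the attempt whose blocks are currently kept
    current_seq: Dict[int, int] = field(default_factory=dict)
    # (shuffle_id, reduce_id) -> pushed block payloads
    blocks: Dict[Tuple[int, int], List[str]] = field(default_factory=dict)

    def push(self, shuffle_id: int, seq_id: int, reduce_id: int, payload: str) -> None:
        known = self.current_seq.get(shuffle_id)
        if known is not None and seq_id < known:
            return  # late push from an older attempt: ignored in this model
        if known is None or seq_id > known:
            # A newer attempt invalidates all blocks kept for this shuffle.
            for key in [k for k in self.blocks if k[0] == shuffle_id]:
                del self.blocks[key]
            self.current_seq[shuffle_id] = seq_id
        self.blocks.setdefault((shuffle_id, reduce_id), []).append(payload)

    def fetch(self, shuffle_id: int, reduce_id: int,
              seq_id: Optional[int] = None) -> List[str]:
        # With seq_id=None the fetch side cannot tell which attempt it reads.
        if seq_id is not None and seq_id != self.current_seq.get(shuffle_id):
            raise StaleFetchError(f"shuffle {shuffle_id}: sequence {seq_id} is stale")
        return list(self.blocks.get((shuffle_id, reduce_id), []))


store = ToyMergedShuffleStore()
store.push(0, 0, 7, "attempt0-block")  # step 1: first attempt writes data
store.push(0, 1, 7, "attempt1-block")  # step 3: retry pushes, invalidating attempt 0

# Step 4: a dangling reducer from the first attempt fetches without a sequence ID
# and silently receives data produced by the retry.
unchecked = store.fetch(0, 7)

# The same reducer, if it sent the sequence ID it expects, would be rejected.
try:
    store.fetch(0, 7, seq_id=0)
    detected = False
except StaleFetchError:
    detected = True
```

In this model the unchecked fetch returns the retry's block to a task that expected the first attempt's output, which is the kind of issue the comment asks about; whether the real fetch path behaves this way is exactly the open question in the review.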
[GitHub] [spark] xinrong-databricks commented on pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*
xinrong-databricks commented on pull request #33094: URL: https://github.com/apache/spark/pull/33094#issuecomment-868875080 Thanks for working on that!
[GitHub] [spark] xinrong-databricks commented on a change in pull request #33094: [SPARK-35466][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.data_type_ops.*
xinrong-databricks commented on a change in pull request #33094: URL: https://github.com/apache/spark/pull/33094#discussion_r659076301

## File path: python/pyspark/pandas/data_type_ops/base.py ##
@@ -65,6 +65,7 @@
 T_IndexOps = TypeVar("T_IndexOps", bound="IndexOpsMixin")
+IndexOpsLike = Union["Series", "Index"]

Review comment: Why do we still need `IndexOpsLike` since we have `T_IndexOps`?
[GitHub] [spark] SparkQA removed a comment on pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
SparkQA removed a comment on pull request #32787: URL: https://github.com/apache/spark/pull/32787#issuecomment-868762786 **[Test build #140339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140339/testReport)** for PR 32787 at commit [`b093b97`](https://github.com/apache/spark/commit/b093b97e98668cecd7f0cf52ccc830c158e0b22c).