[GitHub] [spark] SparkQA removed a comment on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
SparkQA removed a comment on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879558541 **[Test build #140993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140993/testReport)** for PR 33333 at commit [`ad3da13`](https://github.com/apache/spark/commit/ad3da133d369c9d3bcf587c5810f3d526b87ea61). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
SparkQA commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879614152 **[Test build #140993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140993/testReport)** for PR 33333 at commit [`ad3da13`](https://github.com/apache/spark/commit/ad3da133d369c9d3bcf587c5810f3d526b87ea61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] sarutak edited a comment on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
sarutak edited a comment on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879613028 Yeah, I'll do it. Thanks!
[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
sunchao commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669311658
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;
Review comment: Hmm, somehow it survived - let me remove it for real.
[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
viirya commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669311526
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ##
@@ -31,24 +31,25 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
    * |---|-|-|---|---|---|---|---|
    * col_2 400 300 200 200 200 200 200 200
Review comment: And also the PR description. :)
[GitHub] [spark] sarutak commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
sarutak commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879613028 Yeah, I'll do it.
[GitHub] [spark] dongjoon-hyun commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
dongjoon-hyun commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879612808 BTW, for the other old branches, we might need to revisit.
[GitHub] [spark] dongjoon-hyun commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
dongjoon-hyun commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879612523 @gengliangwang I landed this to branch-3.2 since this is related to some CVEs.
[GitHub] [spark] dongjoon-hyun closed pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
dongjoon-hyun closed pull request #33333: URL: https://github.com/apache/spark/pull/33333
[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
viirya commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669310575
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;
Review comment: Did you remove the `advance()` method? I don't see it removed here.
[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
SparkQA commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879611464 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45509/
[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
sunchao commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669309831
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;
Review comment: No - it used to call `advance`, which does the same thing, but since we removed the method I just replaced the call with the old method body.
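Editorial sketch: the inlined loop in the hunk above can be tried in isolation with a toy reader. The class below is a minimal sketch under stated assumptions, not Spark's `VectorizedRleValuesReader`: `RunSkipSketch`, `Group`, and the in-memory queue of groups are hypothetical stand-ins, and only the field names `mode`, `currentCount`, and `currentBufferIdx` are borrowed from the diff. It shows that skipping inside an RLE run only decrements the remaining count, while skipping inside a PACKED group also moves the buffer index.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stand-in for a hybrid RLE / bit-packed reader (hypothetical, for illustration only). */
final class RunSkipSketch {
  enum Mode { RLE, PACKED }

  /** One already-decoded group: RLE repeats a single value, PACKED carries explicit values. */
  static final class Group {
    final Mode mode;
    final int count;
    final int[] packed; // only used for PACKED groups; may be null for RLE

    Group(Mode mode, int count, int[] packed) {
      this.mode = mode;
      this.count = count;
      this.packed = packed;
    }
  }

  private final Deque<Group> groups = new ArrayDeque<>();
  Mode mode;
  int currentCount;      // values left in the current group
  int currentBufferIdx;  // read position inside the current PACKED group
  int[] currentBuffer;

  RunSkipSketch(Group... input) {
    for (Group g : input) {
      groups.add(g);
    }
  }

  private void readNextGroup() {
    Group g = groups.remove(); // throws NoSuchElementException when the input is exhausted
    mode = g.mode;
    currentCount = g.count;
    currentBuffer = g.packed;
    currentBufferIdx = 0;
  }

  /** Skips `total` values, mirroring the shape of the loop body shown in the diff. */
  void skip(int total) {
    int left = total;
    while (left > 0) {
      if (currentCount == 0) readNextGroup();
      int n = Math.min(left, currentCount);
      switch (mode) {
        case RLE:
          break; // a repeated value needs no cursor movement
        case PACKED:
          currentBufferIdx += n;
          break;
      }
      currentCount -= n;
      left -= n;
    }
  }
}
```

Skipping 6 values over an RLE run of 5 followed by a PACKED group of 3 leaves the reader one position into the packed group with two values remaining.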
[GitHub] [spark] dongjoon-hyun commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
dongjoon-hyun commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879611200 I reviewed and merged https://github.com/apache/spark/pull/33334 . Please rebase this PR onto master.
[GitHub] [spark] dongjoon-hyun closed pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite
dongjoon-hyun closed pull request #33334: URL: https://github.com/apache/spark/pull/33334
[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
sunchao commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669309298
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -255,6 +253,36 @@ private void readBatchInternal(
     state.advanceOffsetAndRowId(offset, rowId);
   }
+  /**
+   * Skip the next `n` values (either null or non-null) from this definition level reader and
+   * `valueReader`.
+   */
+  private void skipValues(
+      int n,
+      ParquetReadState state,
+      VectorizedValuesReader valuesReader,
+      ParquetVectorUpdater updater) {
+    while (n > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int num = Math.min(n, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == state.maxDefinitionLevel) {
+            updater.skipValues(num, valuesReader);
+          }
Review comment: Yes, the else case is when the value is null - let me add some comments.
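Editorial sketch: the point confirmed in this exchange is that a row whose definition level is below the maximum is null and has no entry in the value stream, so only rows at the maximum level consume a value when skipped. The helper below is a hypothetical illustration of that counting rule (the names `DefinitionLevelSkipSketch` and `valuesToSkip` are assumptions, not part of the PR), using a plain array of definition levels instead of Spark's readers.

```java
/** Hypothetical illustration: how many stored values a skip of `n` rows must consume. */
final class DefinitionLevelSkipSketch {
  /**
   * Counts the non-null rows in defLevels[start, start + n).
   * A row is non-null when its definition level equals maxDefinitionLevel;
   * null rows occupy a definition level but no slot in the value stream.
   */
  static int valuesToSkip(int[] defLevels, int start, int n, int maxDefinitionLevel) {
    int skipped = 0;
    for (int i = start; i < start + n; i++) {
      if (defLevels[i] == maxDefinitionLevel) {
        skipped++;
      }
      // else: the row is null, so the value reader must not be advanced
    }
    return skipped;
  }
}
```

For example, with definition levels `{1, 0, 1, 1, 0}` and a maximum level of 1, skipping the first four rows should advance the value reader by three entries, not four.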
[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
viirya commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669309094
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -358,7 +386,14 @@ public void skipIntegers(int total) {
     while (left > 0) {
       if (this.currentCount == 0) this.readNextGroup();
       int n = Math.min(left, this.currentCount);
-      advance(n);
+      switch (mode) {
+        case RLE:
+          break;
+        case PACKED:
+          currentBufferIdx += n;
+          break;
+      }
+      currentCount -= n;
Review comment: Hmm, is this an old bug?
[GitHub] [spark] dongjoon-hyun commented on pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite
dongjoon-hyun commented on pull request #33334: URL: https://github.com/apache/spark/pull/33334#issuecomment-879610496 Merged to master/3.2. cc @gengliangwang .
[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
sunchao commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669308991
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ##
@@ -31,24 +31,25 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
    * |---|-|-|---|---|---|---|---|
    * col_2 400 300 200 200 200 200 200 200
Review comment: Oops, you are right - it should be 400, 300, 300, 200, 200, 200, 200, 200
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite
dongjoon-hyun commented on a change in pull request #33334: URL: https://github.com/apache/spark/pull/33334#discussion_r669308630
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ##
@@ -31,96 +40,56 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar
    * |---|-|-|---|---|---|---|---|
    * col_2 400 300 200 200 200 200 200 200
    */
-  def checkUnalignedPages(actions: (DataFrame => DataFrame)*): Unit = {
-    withTempPath(file => {
-      val ds = spark.range(0, 2000).map(i => (i, i + ":" + "o" * (i / 100).toInt))
-      ds.coalesce(1)
-          .write
-          .option("parquet.page.size", "4096")
-          .parquet(file.getCanonicalPath)
+  def checkUnalignedPages(df: DataFrame)(actions: (DataFrame => DataFrame)*): Unit = {
+    Seq(true, false).foreach { enableDictionary =>
Review comment: I added this to the PR description.
[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
viirya commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669308507
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -255,6 +253,36 @@ private void readBatchInternal(
     state.advanceOffsetAndRowId(offset, rowId);
   }
+  /**
+   * Skip the next `n` values (either null or non-null) from this definition level reader and
+   * `valueReader`.
+   */
+  private void skipValues(
+      int n,
+      ParquetReadState state,
+      VectorizedValuesReader valuesReader,
+      ParquetVectorUpdater updater) {
+    while (n > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int num = Math.min(n, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == state.maxDefinitionLevel) {
+            updater.skipValues(num, valuesReader);
+          }
+          break;
+        case PACKED:
+          for (int i = 0; i < num; ++i) {
+            if (currentBuffer[currentBufferIdx++] == state.maxDefinitionLevel) {
+              updater.skipValues(1, valuesReader);
+            }
Review comment: Ditto. If so, maybe it is good to add a comment.
[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
viirya commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669308360
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ##
@@ -255,6 +253,36 @@ private void readBatchInternal(
     state.advanceOffsetAndRowId(offset, rowId);
   }
+  /**
+   * Skip the next `n` values (either null or non-null) from this definition level reader and
+   * `valueReader`.
+   */
+  private void skipValues(
+      int n,
+      ParquetReadState state,
+      VectorizedValuesReader valuesReader,
+      ParquetVectorUpdater updater) {
+    while (n > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int num = Math.min(n, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == state.maxDefinitionLevel) {
+            updater.skipValues(num, valuesReader);
+          }
Review comment: Is the else case for null values?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
AmplabJenkins removed a comment on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879608970 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45508/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
AmplabJenkins removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879608822 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140989/
[GitHub] [spark] AmplabJenkins commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
AmplabJenkins commented on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879608970 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45508/
[GitHub] [spark] SparkQA commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
SparkQA commented on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879608954 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45508/
[GitHub] [spark] zhouyejoe commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta
zhouyejoe commented on a change in pull request #33078: URL: https://github.com/apache/spark/pull/33078#discussion_r669306705
## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ##
@@ -567,7 +598,8 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
       // memory, while still providing the necessary guarantee.
       synchronized (partitionInfo) {
         Map shufflePartitions =
-          mergeManager.partitions.get(partitionInfo.appShuffleId);
+          mergeManager.appsShuffleInfo.get(partitionInfo.appId).partitions
Review comment: Updated a UT to test this case where a NullPointerException will be thrown out.
[GitHub] [spark] AmplabJenkins commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
AmplabJenkins commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879608822 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140989/
[GitHub] [spark] cfmcgrady commented on pull request #33335: [SPARK-36130][SQL] UnwrapCastInBinaryComparison should skip In expression when in.list contains an expression that is not literal
cfmcgrady commented on pull request #33335: URL: https://github.com/apache/spark/pull/33335#issuecomment-879608835 cc @allisonwang-db @cloud-fan
[GitHub] [spark] SparkQA removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
SparkQA removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879512560 **[Test build #140989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140989/testReport)** for PR 33077 at commit [`e4a74a3`](https://github.com/apache/spark/commit/e4a74a3375ff0de4ac72aa951d36aa36f2492e01).
[GitHub] [spark] zhouyejoe commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta
zhouyejoe commented on a change in pull request #33078: URL: https://github.com/apache/spark/pull/33078#discussion_r669306864
## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ##
@@ -403,38 +394,78 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) throws IOExc
             reduceIds.add(partition.reduceId);
             sizes.add(partition.getLastChunkOffset());
           } catch (IOException ioe) {
-            logger.warn("Exception while finalizing shuffle partition {} {} {}", msg.appId,
-              msg.shuffleId, partition.reduceId, ioe);
+            logger.warn("Exception while finalizing shuffle partition {}_{} {} {}", msg.appId,
+              msg.attemptId, msg.shuffleId, partition.reduceId, ioe);
           } finally {
             partition.closeAllFiles();
-            // The partition should be removed after the files are written so that any new stream
-            // for the same reduce partition will see that the data file exists.
-            partitionsIter.remove();
           }
         }
       }
       mergeStatuses = new MergeStatuses(msg.shuffleId,
         bitmaps.toArray(new RoaringBitmap[bitmaps.size()]), Ints.toArray(reduceIds),
         Longs.toArray(sizes));
     }
-    partitions.remove(appShuffleId);
-    logger.info("Finalized shuffle {} from Application {}.", msg.shuffleId, msg.appId);
+    logger.info("Finalized shuffle {} from Application {}_{}.",
+      msg.shuffleId, msg.appId, msg.attemptId);
     return mergeStatuses;
   }
   @Override
   public void registerExecutor(String appId, ExecutorShuffleInfo executorInfo) {
     if (logger.isDebugEnabled()) {
       logger.debug("register executor with RemoteBlockPushResolver {} local-dirs {} "
-        + "num sub-dirs {}", appId, Arrays.toString(executorInfo.localDirs),
-          executorInfo.subDirsPerLocalDir);
+        + "num sub-dirs {} shuffleManager {}", appId, Arrays.toString(executorInfo.localDirs),
+        executorInfo.subDirsPerLocalDir, executorInfo.shuffleManager);
+    }
+    String shuffleManagerMeta = executorInfo.shuffleManager;
+    if (shuffleManagerMeta.contains(":")) {
+      String mergeDirInfo = shuffleManagerMeta.substring(shuffleManagerMeta.indexOf(":") + 1);
+      try {
+        ObjectMapper mapper = new ObjectMapper();
+        MergeDirectoryMeta mergeDirectoryMeta =
+          mapper.readValue(mergeDirInfo, MergeDirectoryMeta.class);
+        if (mergeDirectoryMeta.attemptId == ATTEMPT_ID_UNDEFINED) {
+          // When attemptId is -1, there is no attemptId stored in the ExecutorShuffleInfo.
+          // Only the first ExecutorRegister message can register the merge dirs
+          appsShuffleInfo.computeIfAbsent(appId, id ->
+            new AppShuffleInfo(
+              appId, mergeDirectoryMeta.attemptId,
+              new AppPathsInfo(appId, executorInfo.localDirs,
+                mergeDirectoryMeta.mergeDir, executorInfo.subDirsPerLocalDir)
+            ));
+        } else {
+          // If attemptId is not -1, there is attemptId stored in the ExecutorShuffleInfo.
+          // The first ExecutorRegister message from the same application attempt wil register
+          // the merge dirs in External Shuffle Service. Any later ExecutorRegister message
+          // from the same application attempt will not override the merge dirs. But it can
+          // be overridden by ExecutorRegister message from newer application attempt,
+          // and former attempts' shuffle partitions information will also be cleaned up.
+          ConcurrentMap appShuffleInfoToBeCleanedUp =
+            Maps.newConcurrentMap();
+          appsShuffleInfo.compute(appId, (id, appShuffleInfo) -> {
+            if (appShuffleInfo == null || (appShuffleInfo != null
+              && mergeDirectoryMeta.attemptId > appShuffleInfo.attemptId)) {
+              appShuffleInfoToBeCleanedUp.putIfAbsent(appShuffleInfo.attemptId, appShuffleInfo);
+              appShuffleInfo =
+                new AppShuffleInfo(
+                  appId, mergeDirectoryMeta.attemptId,
+                  new AppPathsInfo(appId, executorInfo.localDirs,
+                    mergeDirectoryMeta.mergeDir, executorInfo.subDirsPerLocalDir));
+            }
+            return appShuffleInfo;
+          });
+          for (AppShuffleInfo appShuffleInfo: appShuffleInfoToBeCleanedUp.values()) {
+            logger.info("Remove shuffle info for {}_{} as new application attempt registered",
+              appId, appShuffleInfo.attemptId);
+            appShuffleInfo.cleanupShufflePartitionInfo();
Review comment: Added UT to check if the channels are closed and set to null.
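Editorial sketch: the comment block in the hunk above describes a register-or-supersede rule keyed by application attempt: the first registration wins within an attempt, a newer attempt replaces the entry, and the replaced attempt is handed off for cleanup. The class below is a minimal sketch of that rule using `ConcurrentHashMap.compute`; `AttemptRegistrySketch`, `AttemptInfo`, and `register` are hypothetical names, not Spark's classes. Note one deliberate difference from the quoted hunk: the sketch handles the "no previous entry" case separately, since as written the hunk dereferences `appShuffleInfo.attemptId` on a branch where `appShuffleInfo` may be null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical registry keeping only the newest attempt per application. */
final class AttemptRegistrySketch {
  static final class AttemptInfo {
    final int attemptId;
    final String mergeDir;

    AttemptInfo(int attemptId, String mergeDir) {
      this.attemptId = attemptId;
      this.mergeDir = mergeDir;
    }
  }

  private final ConcurrentMap<String, AttemptInfo> apps = new ConcurrentHashMap<>();

  /**
   * Registers an attempt and returns the attempts it superseded (possibly empty),
   * so the caller can clean them up outside the map's critical section.
   */
  List<AttemptInfo> register(String appId, int attemptId, String mergeDir) {
    List<AttemptInfo> superseded = new ArrayList<>();
    apps.compute(appId, (id, current) -> {
      if (current == null) {
        // First registration for this application: nothing to clean up.
        return new AttemptInfo(attemptId, mergeDir);
      }
      if (attemptId > current.attemptId) {
        // A newer attempt replaces the old entry; remember the old one for cleanup.
        superseded.add(current);
        return new AttemptInfo(attemptId, mergeDir);
      }
      // Same or older attempt: the existing registration is kept unchanged.
      return current;
    });
    return superseded;
  }

  AttemptInfo get(String appId) {
    return apps.get(appId);
  }
}
```

Returning the superseded entries instead of cleaning them up inside the lambda keeps the remapping function short, which matters because `ConcurrentHashMap.compute` may block other updates to the map while it runs.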
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For
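The attempt-aware registration described in the diff comments above can be sketched in isolation. This is a simplified illustration, not the code from the pull request: `AppInfo`, `AttemptRegistry`, and `register` are hypothetical names, and the sketch assumes only the behavior stated in those comments (the first registration for an attempt wins, a newer attempt replaces an older one, and the replaced entry is handed back for cleanup). It only records a previous entry for cleanup when one actually exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified stand-in for the per-application state kept by the shuffle service.
final class AppInfo {
  final String appId;
  final int attemptId;
  final String mergeDir;

  AppInfo(String appId, int attemptId, String mergeDir) {
    this.appId = appId;
    this.attemptId = attemptId;
    this.mergeDir = mergeDir;
  }
}

final class AttemptRegistry {
  private final ConcurrentMap<String, AppInfo> apps = new ConcurrentHashMap<>();

  /**
   * Registers merge-dir info for (appId, attemptId).
   * Returns the entries of older attempts that were replaced and should be cleaned up.
   */
  List<AppInfo> register(String appId, int attemptId, String mergeDir) {
    List<AppInfo> replaced = new ArrayList<>();
    apps.compute(appId, (id, current) -> {
      if (current == null) {
        // First registration for this application: nothing to clean up.
        return new AppInfo(appId, attemptId, mergeDir);
      }
      if (attemptId > current.attemptId) {
        // Newer attempt: remember the old entry for cleanup, then replace it.
        replaced.add(current);
        return new AppInfo(appId, attemptId, mergeDir);
      }
      // Same or older attempt: keep the existing registration.
      return current;
    });
    return replaced;
  }

  AppInfo get(String appId) {
    return apps.get(appId);
  }
}
```

Doing the comparison inside `compute` keeps the check and the replacement atomic per application key, so two executors registering at the same time cannot both believe they were first.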
[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879607967 **[Test build #140989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140989/testReport)** for PR 33077 at commit [`e4a74a3`](https://github.com/apache/spark/commit/e4a74a3375ff0de4ac72aa951d36aa36f2492e01). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MergingSortWithSessionWindowStateIterator(` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on a change in pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta
zhouyejoe commented on a change in pull request #33078: URL: https://github.com/apache/spark/pull/33078#discussion_r669306705 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java ## @@ -567,7 +598,8 @@ public void onData(String streamId, ByteBuffer buf) throws IOException { // memory, while still providing the necessary guarantee. synchronized (partitionInfo) { Map shufflePartitions = - mergeManager.partitions.get(partitionInfo.appShuffleId); + mergeManager.appsShuffleInfo.get(partitionInfo.appId).partitions Review comment: Added a UT for the case where a NullPointerException would be thrown.
[GitHub] [spark] viirya commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
viirya commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669306577 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ## @@ -31,24 +31,25 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar * |---|-|-|---|---|---|---|---| * col_2 400 300 200 200 200 200 200 200 Review comment: BTW, about the layout: shouldn't the totals for col_1 and col_2 both be 2000? col_2 appears to add up to only 1900.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite
dongjoon-hyun commented on a change in pull request #33334: URL: https://github.com/apache/spark/pull/33334#discussion_r669305829 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ## @@ -31,96 +40,56 @@ class ParquetColumnIndexSuite extends QueryTest with ParquetTest with SharedSpar * |---|-|-|---|---|---|---|---| * col_2 400 300 200 200 200 200 200 200 */ - def checkUnalignedPages(actions: (DataFrame => DataFrame)*): Unit = { -withTempPath(file => { - val ds = spark.range(0, 2000).map(i => (i, i + ":" + "o" * (i / 100).toInt)) - ds.coalesce(1) - .write - .option("parquet.page.size", "4096") - .parquet(file.getCanonicalPath) + def checkUnalignedPages(df: DataFrame)(actions: (DataFrame => DataFrame)*): Unit = { +Seq(true, false).foreach { enableDictionary => Review comment: Thank you for adding this!
[GitHub] [spark] AmplabJenkins commented on pull request #33335: [SPARK-36130][SQL] Fix UnwrapCastInBinaryComparison bug.
AmplabJenkins commented on pull request #33335: URL: https://github.com/apache/spark/pull/33335#issuecomment-879606997 Can one of the admins verify this patch?
[GitHub] [spark] cfmcgrady opened a new pull request #33335: [SPARK-36130][SQL] Fix UnwrapCastInBinaryComparison bug.
cfmcgrady opened a new pull request #33335: URL: https://github.com/apache/spark/pull/33335

### What changes were proposed in this pull request?

This PR fixes a bug in the rule `UnwrapCastInBinaryComparison`: the rule should skip an `In` expression when `in.list` contains an expression that is not a literal.

- In

  Before this PR, the following example throws an exception.

  ```scala
  withTable("tbl") {
    sql("CREATE TABLE tbl (d decimal(33, 27)) USING PARQUET")
    sql("SELECT d FROM tbl WHERE d NOT IN (d + 1)")
  }
  ```

- InSet

  The analyzer guarantees that all elements of `inSet.hset` are literals, so this is not an issue for `InSet`. https://github.com/apache/spark/blob/fbf53dee37129a493a4e5d5a007625b35f44fbda/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L264-L279

### Does this PR introduce _any_ user-facing change?

No, only a bug fix.

### How was this patch tested?

New test.
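The guard described in this pull request amounts to checking that every element of the `IN` list is a literal before the rewrite is attempted. A minimal sketch of that check follows, written in Java purely for illustration (the actual rule lives in Spark's Scala optimizer). `Expr`, `Literal`, `Column`, `Add`, and `InListGuard` are hypothetical stand-ins, not Catalyst classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature expression tree; these are not Catalyst classes.
interface Expr {}

final class Literal implements Expr {
  final Object value;
  Literal(Object value) { this.value = value; }
}

final class Column implements Expr {
  final String name;
  Column(String name) { this.name = name; }
}

final class Add implements Expr {
  final Expr left;
  final Expr right;
  Add(Expr left, Expr right) { this.left = left; this.right = right; }
}

final class InListGuard {
  // The rewrite is only considered when every list element is a literal.
  static boolean canUnwrap(List<? extends Expr> inList) {
    return inList.stream().allMatch(e -> e instanceof Literal);
  }

  public static void main(String[] args) {
    // Corresponds to `d NOT IN (d + 1)`: the list holds a non-literal expression.
    List<Expr> nonLiteral = Arrays.asList(new Add(new Column("d"), new Literal(1)));
    // Corresponds to a list made only of literals.
    List<Expr> literals = Arrays.asList(new Literal(1), new Literal(2));
    System.out.println(canUnwrap(nonLiteral)); // false
    System.out.println(canUnwrap(literals));   // true
  }
}
```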
[GitHub] [spark] dongjoon-hyun commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
dongjoon-hyun commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879605733 Thank you, @sunchao.
[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
SparkQA commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879603860 **[Test build #141000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141000/testReport)** for PR 32401 at commit [`230648c`](https://github.com/apache/spark/commit/230648ce0dcc1762d9690aecfd63568721f82e4e).
[GitHub] [spark] SparkQA commented on pull request #33081: [SPARK-34893][SS] Support session window natively
SparkQA commented on pull request #33081: URL: https://github.com/apache/spark/pull/33081#issuecomment-879603643 **[Test build #140999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140999/testReport)** for PR 33081 at commit [`0af2fd1`](https://github.com/apache/spark/commit/0af2fd12f2bcd09ea3b788b9c6ec99c022199398).
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879603580 **[Test build #140998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140998/testReport)** for PR 33258 at commit [`a843bc3`](https://github.com/apache/spark/commit/a843bc3efe1f52dd72f9f5bf5028f1e3e5aaa0a1).
[GitHub] [spark] SparkQA commented on pull request #33334: [SPARK-36131][SQL][TEST] Refactor ParquetColumnIndexSuite
SparkQA commented on pull request #33334: URL: https://github.com/apache/spark/pull/33334#issuecomment-879603443 **[Test build #140996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140996/testReport)** for PR 33334 at commit [`9502871`](https://github.com/apache/spark/commit/95028716f1a69fc48ec5d1f6e1228f458574b65f).
[GitHub] [spark] SparkQA commented on pull request #33324: [SPARK-36093][SQL] RemoveRedundantAliases should not change Command's parameter's expression's name
SparkQA commented on pull request #33324: URL: https://github.com/apache/spark/pull/33324#issuecomment-879603497 **[Test build #140997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140997/testReport)** for PR 33324 at commit [`60401a1`](https://github.com/apache/spark/commit/60401a1062c9af68a3d2d55ecbd72f60fc1cf142).
[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
SparkQA commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879598200 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45509/
[GitHub] [spark] SparkQA commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
SparkQA commented on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879596706 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45508/
[GitHub] [spark] HeartSaVioR commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
HeartSaVioR commented on a change in pull request #32401: URL: https://github.com/apache/spark/pull/32401#discussion_r669288375 ## File path: core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java ## @@ -0,0 +1,83 @@ +package org.apache.spark.shuffle.checksum; Review comment: RAT is a license checker.

> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140992/console

```
Running Apache RAT checks
Attempting to fetch rat
Could not find Apache license headers in the following files:
!? /home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/check-license ; received return code 1
```
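As context for the RAT failure quoted above: a license check of this kind boils down to looking for the Apache license notice near the top of each source file. The following is a rough sketch of that idea only. It is not how Apache RAT is implemented, and `LicenseHeaderCheck`, `hasApacheHeader`, and the 20-line window are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class LicenseHeaderCheck {
  // Phrase that opens the standard ASF source header.
  private static final String MARKER = "Licensed to the Apache Software Foundation (ASF)";
  // Assumption for this sketch: only the first lines of a file are inspected.
  private static final int MAX_LINES = 20;

  static boolean hasApacheHeader(List<String> lines) {
    return lines.stream().limit(MAX_LINES).anyMatch(line -> line.contains(MARKER));
  }

  public static void main(String[] args) {
    // A file that starts directly with the package declaration, as in the diff above.
    List<String> missing = Arrays.asList(
        "package org.apache.spark.shuffle.checksum;",
        "",
        "public class ShuffleChecksumHelper {}");
    // The same file with the license comment in front.
    List<String> present = Arrays.asList(
        "/*",
        " * Licensed to the Apache Software Foundation (ASF) under one or more",
        " * contributor license agreements.",
        " */",
        "package org.apache.spark.shuffle.checksum;");
    System.out.println(hasApacheHeader(missing)); // false
    System.out.println(hasApacheHeader(present)); // true
  }
}
```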
[GitHub] [spark] beliefer commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
beliefer commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879587025 retest this please
[GitHub] [spark] sarutak commented on a change in pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
sarutak commented on a change in pull request #33253: URL: https://github.com/apache/spark/pull/33253#discussion_r669284266 ## File path: core/src/test/resources/HistoryServerExpectations/running_app_list_json_expectation.json ## @@ -1 +1 @@ -[ ] +[ ] Review comment: Unnecessary change here. ## File path: core/src/test/resources/spark-events/application_1625351839633_843287 ## @@ -0,0 +1,37 @@ +{"Event":"SparkListenerLogStart","Spark Version":"3.3.0-SNAPSHOT"} Review comment: This file needs to be added to `.rat-exclude`.
[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
Ngone51 commented on a change in pull request #32401: URL: https://github.com/apache/spark/pull/32401#discussion_r669282673 ## File path: core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java ## @@ -0,0 +1,83 @@ +package org.apache.spark.shuffle.checksum; Review comment: Oh yeah, this is definitely a mistake, though it doesn't seem to be related to the `RAT` failure.
[GitHub] [spark] sunchao commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
sunchao commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879583404 Thanks @gengliangwang - I opened #33334 for this.
[GitHub] [spark] sunchao opened a new pull request #33334: [SPARK-35743][SQL][TEST] Refactor ParquetColumnIndexSuite
sunchao opened a new pull request #33334: URL: https://github.com/apache/spark/pull/33334

### What changes were proposed in this pull request?

Refactor `ParquetColumnIndexSuite` and allow better code reuse.

### Why are the changes needed?

A few methods in the test suite can share the same utility method `checkUnalignedPages`, so it's better to do that and remove code duplication.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.
[GitHub] [spark] cloud-fan commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced
cloud-fan commented on a change in pull request #32872: URL: https://github.com/apache/spark/pull/32872#discussion_r669277698 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -87,8 +87,15 @@ case class CustomShuffleReaderExec private( Iterator(desc) } - def hasCoalescedPartition: Boolean = -partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec]) + /** + * Returns true iff some non-empty partitions were combined + */ + def hasCoalescedPartition: Boolean = { +partitionSpecs.exists { + case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1 Review comment:

```
// If all input RDDs have 0 partition, we create an empty partition for every shuffle reader.
if (validMetrics.isEmpty) {
  return Seq.fill(numShuffles)(Seq(CoalescedPartitionSpec(0, 0, 0)))
}
```

Shall we use `CoalescedPartitionSpec(0, numReducers, 0)`?
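To make the question concrete, here is a small sketch of the predicate from the diff, written in Java with hypothetical `PartitionSpec`, `CoalescedSpec`, and `OtherSpec` stand-ins for Spark's classes. The two-argument constructor is an assumption made for brevity (the spec quoted above takes three arguments). Under this predicate a spec counts as coalesced only if it spans more than one reducer index, so a placeholder spec with both indices at 0 does not count, while one spanning `0` to `numReducers` does whenever `numReducers` exceeds 1.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Spark's partition spec classes.
interface PartitionSpec {}

final class CoalescedSpec implements PartitionSpec {
  final int startReducerIndex;
  final int endReducerIndex;

  CoalescedSpec(int startReducerIndex, int endReducerIndex) {
    this.startReducerIndex = startReducerIndex;
    this.endReducerIndex = endReducerIndex;
  }
}

final class OtherSpec implements PartitionSpec {}

final class ShuffleReaderSketch {
  // Mirrors the predicate in the diff: true if any coalesced spec spans more than one reducer index.
  static boolean hasCoalescedPartition(List<? extends PartitionSpec> specs) {
    return specs.stream().anyMatch(s ->
        s instanceof CoalescedSpec
            && ((CoalescedSpec) s).endReducerIndex - ((CoalescedSpec) s).startReducerIndex > 1);
  }

  public static void main(String[] args) {
    int numReducers = 5; // illustrative value only
    System.out.println(hasCoalescedPartition(Arrays.asList(new CoalescedSpec(0, 0))));           // false
    System.out.println(hasCoalescedPartition(Arrays.asList(new CoalescedSpec(0, numReducers)))); // true
    System.out.println(hasCoalescedPartition(Arrays.asList(new CoalescedSpec(3, 4), new OtherSpec()))); // false
  }
}
```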
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
AmplabJenkins removed a comment on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879580591 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140995/
[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
SparkQA commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879580580 **[Test build #140995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140995/testReport)** for PR 33253 at commit [`77920b8`](https://github.com/apache/spark/commit/77920b86a9acdff45d9d35e3179b9daafe3bb84f). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
SparkQA removed a comment on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879580409 **[Test build #140995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140995/testReport)** for PR 33253 at commit [`77920b8`](https://github.com/apache/spark/commit/77920b86a9acdff45d9d35e3179b9daafe3bb84f).
[GitHub] [spark] AmplabJenkins commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
AmplabJenkins commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879580591 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140995/
[GitHub] [spark] SparkQA commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
SparkQA commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879580409 **[Test build #140995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140995/testReport)** for PR 33253 at commit [`77920b8`](https://github.com/apache/spark/commit/77920b86a9acdff45d9d35e3179b9daafe3bb84f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests
AmplabJenkins removed a comment on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879578133 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140991/
[GitHub] [spark] venkata91 commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
venkata91 commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879579744

> @venkata91 Could you fix the style issue first?
>
> ```
> [error] /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:195: File line length exceeds 100 characters
> ```

Yeah, fixed it. Also, GitHub Actions now seems to run, which is good.
[GitHub] [spark] AmplabJenkins commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests
AmplabJenkins commented on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879578133 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140991/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
AmplabJenkins removed a comment on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879136851 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA removed a comment on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests
SparkQA removed a comment on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879537921 **[Test build #140991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140991/testReport)** for PR 33174 at commit [`c6d4f21`](https://github.com/apache/spark/commit/c6d4f21ca368bbc7ba4236dcd9d09904e7b82e5b).
[GitHub] [spark] SparkQA commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
SparkQA commented on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879577688 **[Test build #140994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140994/testReport)** for PR 33323 at commit [`7beee40`](https://github.com/apache/spark/commit/7beee40113ccf7317143c51b123bca0d65d9a0c1).
[GitHub] [spark] SparkQA commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests
SparkQA commented on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879577612 **[Test build #140991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140991/testReport)** for PR 33174 at commit [`c6d4f21`](https://github.com/apache/spark/commit/c6d4f21ca368bbc7ba4236dcd9d09904e7b82e5b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
AmplabJenkins removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879577172 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140987/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
AmplabJenkins removed a comment on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879577174 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140986/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
AmplabJenkins removed a comment on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879577176 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45506/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879577173 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140990/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
AmplabJenkins removed a comment on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879577171 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45507/
[GitHub] [spark] AmplabJenkins commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
AmplabJenkins commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879577171 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45507/
[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879577173 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140990/
[GitHub] [spark] AmplabJenkins commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
AmplabJenkins commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879577172 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140987/
[GitHub] [spark] AmplabJenkins commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
AmplabJenkins commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879577174 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140986/
[GitHub] [spark] AmplabJenkins commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
AmplabJenkins commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879577176 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45506/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33332: [SQL] Warn if less files visible after stats write
HyukjinKwon commented on a change in pull request #33332:
URL: https://github.com/apache/spark/pull/33332#discussion_r669267871

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ##
@@ -166,7 +166,7 @@ class BasicWriteTaskStatsTracker(hadoopConf: Configuration)
     }

     if (numSubmittedFiles != numFiles) {
-      logInfo(s"Expected $numSubmittedFiles files, but only saw $numFiles. " +
+      logWarning(s"Expected $numSubmittedFiles files, but only saw $numFiles. " +

Review comment:
WDYT @steveloughran ?
[GitHub] [spark] SparkQA commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
SparkQA commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879572572 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45507/
[GitHub] [spark] SparkQA removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879537873 **[Test build #140990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140990/testReport)** for PR 33258 at commit [`a843bc3`](https://github.com/apache/spark/commit/a843bc3efe1f52dd72f9f5bf5028f1e3e5aaa0a1).
[GitHub] [spark] HyukjinKwon commented on pull request #33332: [SQL] Warn if less files visible after stats write
HyukjinKwon commented on pull request #33332: URL: https://github.com/apache/spark/pull/33332#issuecomment-879571129 @tooptoop4 please refer to https://spark.apache.org/contributing.html and write the PR description and title properly, with a JIRA.
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879571046 **[Test build #140990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140990/testReport)** for PR 33258 at commit [`a843bc3`](https://github.com/apache/spark/commit/a843bc3efe1f52dd72f9f5bf5028f1e3e5aaa0a1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ParseToTimestampLTZ(` * `case class DomainJoin(`
[GitHub] [spark] HyukjinKwon edited a comment on pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used
HyukjinKwon edited a comment on pull request #33329: URL: https://github.com/apache/spark/pull/33329#issuecomment-879569468 Yeah, I think we won't necessarily have to make it fail when it's enabled. I believe it's fine to explicitly document that this feature is unstable, and that neither correctness nor backward compatibility is guaranteed at this moment - to be clear, is it not usable at all? In fact, I guess we haven't even properly documented this yet (?).
[GitHub] [spark] HyukjinKwon commented on pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used
HyukjinKwon commented on pull request #33329: URL: https://github.com/apache/spark/pull/33329#issuecomment-879569468 Yeah, I think we won't necessarily have to make it fail when it's enabled. I believe it's fine to explicitly document that this feature is unstable, and that neither correctness nor backward compatibility is guaranteed. In fact, I guess we haven't even properly documented this yet (?).
[GitHub] [spark] gengliangwang commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
gengliangwang commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879568535 @sunchao Thanks for the work. I think it's OK to have a PR for test refactoring.
[GitHub] [spark] HyukjinKwon commented on pull request #33325: [SPARK-36076][SQL][3.0] ArrayIndexOutOfBounds in Cast string to timestamp
HyukjinKwon commented on pull request #33325: URL: https://github.com/apache/spark/pull/33325#issuecomment-879567089 The SparkR test failure should be ignorable.
[GitHub] [spark] ekoifman commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced
ekoifman commented on a change in pull request #32872:
URL: https://github.com/apache/spark/pull/32872#discussion_r669262650

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ##
@@ -87,8 +87,15 @@ case class CustomShuffleReaderExec private(
     Iterator(desc)
   }

-  def hasCoalescedPartition: Boolean =
-    partitionSpecs.exists(_.isInstanceOf[CoalescedPartitionSpec])
+  /**
+   * Returns true iff some non-empty partitions were combined
+   */
+  def hasCoalescedPartition: Boolean = {
+    partitionSpecs.exists {
+      case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1

Review comment:
Interesting. I didn't realize Spark could produce `spark.sql.shuffle.partitions` empty partitions.
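For readers following the thread, the difference between the old and the proposed check can be sketched outside Spark. This is only an illustration under stated assumptions: the case classes below are simplified stand-ins for Spark's partition specs (only the two index fields from the diff are modelled), the object and method names are hypothetical, the end index is assumed to be exclusive, and the `case _ => false` branch is a guess because the quoted hunk is cut off before it.

```scala
// Simplified, hypothetical stand-ins for Spark's partition spec classes.
// Only the fields referenced in the quoted diff are modelled.
sealed trait ShufflePartitionSpec
final case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec
final case class PartialReducerPartitionSpec(reducerIndex: Int)
  extends ShufflePartitionSpec

object CoalesceCheck {
  // Old check from the diff: true as soon as any CoalescedPartitionSpec exists,
  // even one that covers a single reducer partition.
  def hasAnyCoalescedSpec(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists(_.isInstanceOf[CoalescedPartitionSpec])

  // Proposed check from the diff: true only if some spec spans more than one
  // reducer index, i.e. partitions were actually merged.
  def hasCoalescedPartition(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists {
      case s: CoalescedPartitionSpec => s.endReducerIndex - s.startReducerIndex > 1
      case _ => false // assumed default branch; not visible in the quoted hunk
    }
}
```

With these stand-ins, `Seq(CoalescedPartitionSpec(0, 1))` satisfies the old check but not the proposed one, which is the behavioural change the PR title describes.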
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33324: [SPARK-36093][SQL] RemoveRedundantAliases should not change Command's parameter's expression's name
HyukjinKwon commented on a change in pull request #33324:
URL: https://github.com/apache/spark/pull/33324#discussion_r669262443

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ##
@@ -4058,6 +4058,44 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         Row(1, 2, 1, 2) :: Nil)
     }
   }
+
+  test("SPARK-36093") {

Review comment:
Can you add a test title?
[GitHub] [spark] HyukjinKwon commented on pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
HyukjinKwon commented on pull request #33323: URL: https://github.com/apache/spark/pull/33323#issuecomment-879566146
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r669261877

## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ##
@@ -981,6 +1006,58 @@ class Dataset[T] private[sql](
     join(right, usingColumns, "inner")
   }

+  /**
+   * Equi-join with another `DataFrame` using the given column. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join column will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumn Name of the column to join on. This column must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   *                 `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   *                 `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3
+   */
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
+    join(right, Seq(usingColumn), joinType)
+  }
+
+  /**
+   * (Java-specific) Equi-join with another `DataFrame` using the given columns. A cross join with
+   * a predicate is specified as an inner join. If you would explicitly like to perform a cross
+   * join use the `crossJoin` method.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   *                 `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   *                 `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3
+   */
+  def join(right: Dataset[_], usingColumns: Array[String], joinType: String): DataFrame = {
+    join(right, usingColumns.toSeq, joinType)
+  }
+
   /**
    * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate

Review comment:
Please add "(Scala-specific)" on other methods
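The overload pattern under review (an `Array[String]` variant that delegates to the existing `Seq[String]` one) can be tried without a Spark dependency. The sketch below is an assumption-laden illustration, not Spark code: `Table` and `JoinPlan` are hypothetical stand-ins for `Dataset` and its result, and the methods only record the request instead of executing a join. The point it shows is that a Scala `Array[String]` parameter appears to Java callers as `String[]`, which is why the extra overload makes the API easier to call from Java.

```scala
// Hypothetical stand-in for the result of a join; it only records the request.
final case class JoinPlan(columns: Seq[String], joinType: String)

// Hypothetical stand-in for Dataset, modelling only the overloads discussed above.
class Table {
  // Scala-facing variant: takes a Seq of column names.
  def join(right: Table, usingColumns: Seq[String], joinType: String): JoinPlan =
    JoinPlan(usingColumns, joinType)

  // Java-facing variant: Array[String] is a plain String[] on the Java side.
  // It converts the array and delegates to the Seq variant.
  def join(right: Table, usingColumns: Array[String], joinType: String): JoinPlan =
    join(right, usingColumns.toSeq, joinType)

  // Single-column convenience variant, delegating the same way.
  def join(right: Table, usingColumn: String, joinType: String): JoinPlan =
    join(right, Seq(usingColumn), joinType)
}
```

From Java, a call against such a class would look like `t.join(other, new String[] {"user_id", "user_name"}, "inner")`, mirroring the example in the Scaladoc of the sibling overload.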
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r669261712

## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ##
@@ -956,6 +956,31 @@ class Dataset[T] private[sql](
     join(right, Seq(usingColumn))
   }

+  /**
+   * (Java-specific) Inner equi-join with another `DataFrame` using the given columns.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * {{{
+   *   // Joining df1 and df2 using the columns "user_id" and "user_name"
+   *   df1.join(df2, new String[] {"user_id", "user_name"});
+   * }}}
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3
+   */
+  def join(right: Dataset[_], usingColumns: Array[String]): DataFrame = {
+    join(right, usingColumns.toSeq)
+  }
+
   /**
    * Inner equi-join with another `DataFrame` using the given columns.

Review comment:
Can you add "(Scala-specific)" here?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r669261605

## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ##
@@ -956,6 +956,31 @@ class Dataset[T] private[sql](
     join(right, Seq(usingColumn))
   }

+  /**
+   * (Java-specific) Inner equi-join with another `DataFrame` using the given columns.
+   *
+   * Different from other join functions, the join columns will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * {{{
+   *   // Joining df1 and df2 using the columns "user_id" and "user_name"
+   *   df1.join(df2, new String[] {"user_id", "user_name"});
+   * }}}
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.1.3

Review comment:
Let's target 3.3.0
[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
SparkQA commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879565484 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45506/
[GitHub] [spark] SparkQA removed a comment on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
SparkQA removed a comment on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879470999 **[Test build #140986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140986/testReport)** for PR 33330 at commit [`41a7ca8`](https://github.com/apache/spark/commit/41a7ca8b7b2464a7363ba6d89dfdbeb8ed7c96aa).
[GitHub] [spark] SparkQA commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly
SparkQA commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879564742 **[Test build #140986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140986/testReport)** for PR 33330 at commit [`41a7ca8`](https://github.com/apache/spark/commit/41a7ca8b7b2464a7363ba6d89dfdbeb8ed7c96aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
SparkQA removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879471127 **[Test build #140987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140987/testReport)** for PR 33077 at commit [`b540632`](https://github.com/apache/spark/commit/b540632e1180dcb1ce9f49626a9fb925fd503742).
[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879564143 **[Test build #140987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140987/testReport)** for PR 33077 at commit [`b540632`](https://github.com/apache/spark/commit/b540632e1180dcb1ce9f49626a9fb925fd503742). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] sarutak commented on pull request #33253: [SPARK-36038][CORE] Speculation metrics summary at stage level
sarutak commented on pull request #33253: URL: https://github.com/apache/spark/pull/33253#issuecomment-879563645 @venkata91 Could you fix the style issue first? ``` [error] /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala:195: File line length exceeds 100 characters ```
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879562230 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45504/
[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879562217 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45504/
[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function
AmplabJenkins commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879562230 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45504/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]
AmplabJenkins removed a comment on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879558620 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140985/
[GitHub] [spark] AmplabJenkins commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]
AmplabJenkins commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879558620 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140985/
[GitHub] [spark] SparkQA commented on pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs
SparkQA commented on pull request #33333: URL: https://github.com/apache/spark/pull/33333#issuecomment-879558541 **[Test build #140993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140993/testReport)** for PR 33333 at commit [`ad3da13`](https://github.com/apache/spark/commit/ad3da133d369c9d3bcf587c5810f3d526b87ea61).