[GitHub] [spark] AmplabJenkins commented on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class
AmplabJenkins commented on pull request #29322: URL: https://github.com/apache/spark/pull/29322#issuecomment-672394105 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] rednaxelafx commented on a change in pull request #29407: [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests
rednaxelafx commented on a change in pull request #29407:
URL: https://github.com/apache/spark/pull/29407#discussion_r468936303

## File path: core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala ##
@@ -214,6 +216,10 @@ class SizeEstimatorSuite
   }

   test("class field blocks rounding on 64-bit VM without useCompressedOops") {
+    System.setProperty(TEST_USE_COMPRESSED_OOPS_KEY, "false")

Review comment:
Since this PR is all about explicit re-initialization, should we add `System.setProperty("os.arch", "amd64")` to this block as well? Actually would it be a good idea to wrap this sequence into a helper method or something?
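One way the helper suggested in this review comment could look is sketched below. This is an illustration only, not code from the PR: `SystemPropertyHelper`, `withSystemProperties`, and the `example.useCompressedOops` key are hypothetical names, and the sketch does not call any SizeEstimator API. It sets a group of system properties, runs a block, and restores the previous values afterwards.

```scala
object SystemPropertyHelper {
  /**
   * Sets the given system properties, runs `body`, then restores the
   * previous values (clearing any property that was unset before).
   * Hypothetical helper; not part of Spark.
   */
  def withSystemProperties[T](props: (String, String)*)(body: => T): T = {
    // Capture the old value of every key before changing anything.
    val saved = props.map { case (key, _) => key -> Option(System.getProperty(key)) }
    props.foreach { case (key, value) => System.setProperty(key, value) }
    try {
      body
    } finally {
      // Runs whether `body` returned normally or threw.
      saved.foreach {
        case (key, Some(old)) => System.setProperty(key, old)
        case (key, None) => System.clearProperty(key)
      }
    }
  }
}

object SystemPropertyHelperExample {
  def run(): String =
    SystemPropertyHelper.withSystemProperties(
      "os.arch" -> "amd64",
      "example.useCompressedOops" -> "false") { // placeholder key, an assumption
      // A test would re-initialize the estimator here and run its assertions.
      System.getProperty("os.arch")
    }
}
```

Restoring the old values in `finally` keeps one test's overrides from leaking into the next, which seems to be the concern behind making the re-initialization explicit.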
[GitHub] [spark] ral51 commented on pull request #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insufficient memory and checkpoints enabled
ral51 commented on pull request #19410: URL: https://github.com/apache/spark/pull/19410#issuecomment-672393523 I ran into the same issue myself. Is there a workaround? @szhem @EthanRock
[GitHub] [spark] ral51 edited a comment on pull request #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insufficient memory and checkpoints enabled
ral51 edited a comment on pull request #19410: URL: https://github.com/apache/spark/pull/19410#issuecomment-672393523 I ran into the same issue today. Is there a workaround? @szhem @EthanRock
[GitHub] [spark] SparkQA commented on pull request #29322: [SPARK-32511][SQL] Add dropFields method to Column class
SparkQA commented on pull request #29322: URL: https://github.com/apache/spark/pull/29322#issuecomment-672393733 **[Test build #127351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127351/testReport)** for PR 29322 at commit [`ad111ba`](https://github.com/apache/spark/commit/ad111ba8e1e95586378dc9300358703e8aabb7ad).
[GitHub] [spark] maropu commented on pull request #29401: [SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec
maropu commented on pull request #29401: URL: https://github.com/apache/spark/pull/29401#issuecomment-672392509 Looks okay and I have no more comment.
[GitHub] [spark] rohitmishr1484 commented on pull request #29410: [SPARK-32180][PYSPARK][DOCS] Getting started-Installation guide for pyspark doc
rohitmishr1484 commented on pull request #29410:
URL: https://github.com/apache/spark/pull/29410#issuecomment-672391444

Hi @HyukjinKwon, I was not sure how to add you as a reviewer for this pull request, so I am adding this comment instead. I would like to mention a few points:
1. Baseline description: I have used this pull request as a reference since this is my first pull request: #29385 (https://github.com/apache/spark/pull/29385).
2. Most of the information used for the "Installation Page" has come from the Koalas documentation: https://koalas.readthedocs.io/en/latest/getting_started/install.html. Am I supposed to mention this as a reference/credit anywhere in this documentation?
3. Personally I have used only PyPI as an installation mechanism and haven't tried the other three. A prerequisite for the PyPI installation was that the JAVA_HOME environment variable points to a Java 8 installation. Please let me know if that's something I need to add explicitly in the "Dependencies" section.
4. Wherever necessary I have updated the links.
[GitHub] [spark] maropu commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
maropu commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r468932486

## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ##
@@ -0,0 +1,312 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Check that TPCDS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last explain: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   actual plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain: /path/to/tmp/q1.actual.explain.txt
+ *   [side by side plan diff]
+ * The explain files are saved to help debug later, they are not checked. Only the simplified
+ * plans are checked (by string comparison).
+ *
+ * Approving new plans:
+ * IF the plan change is intended then re-running the test
+ * with environ var SPARK_GENERATE_GOLDEN_FILES=1 will make the new plan canon.
+ * This should be done only for the queries that need it, to avoid unnecessary diffs in the
+ * other approved plans.
+ * This can be done by running sbt test-only *PlanStability[WithStats]Suite* -- -z "q31"
+ * The new plan files should be part of the PR and reviewed.
+ */
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+    conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, 100)
+    super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+    super.afterAll()
+    conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+    // use the same way as `SQLQueryTestSuite` to get the resource path
+    java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile
+  }
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+    new File(goldenFilePath, name)
+  }
+
+  private def isApproved(name: String, dir: File, actualSimplifiedPlan: String): Boolean = {
+    val file = new File(dir, "simplified.txt")
+    val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+    approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param plan    the [[SparkPlan]]
+   * @param name    the name of the query
+   * @param explain the full explain output; this is saved to help debug later as the simplified
+   *                plan is not too useful for debugging
+   */
+  private def approvePlan(

Review comment:
nit: `approvePlan` -> `generateApprovedPlanFile`?
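The approve-or-compare flow described in the quoted doc comment can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `GoldenFileCheck` and `checkOrRegenerate` are hypothetical names, the sketch uses `java.nio.file` rather than commons-io, and the `regenerate` flag stands in for whatever the suite derives from `SPARK_GENERATE_GOLDEN_FILES`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object GoldenFileCheck {
  /**
   * Compares `actual` with the approved text stored at `golden`.
   * When `regenerate` is true, the golden file is overwritten with `actual`
   * and the check passes. Hypothetical helper, not part of Spark.
   */
  def checkOrRegenerate(golden: Path, actual: String, regenerate: Boolean): Boolean = {
    if (regenerate) {
      // Create the per-query directory if this is the first approval.
      Option(golden.getParent).foreach(parent => Files.createDirectories(parent))
      Files.write(golden, actual.getBytes(StandardCharsets.UTF_8))
      true
    } else {
      // A missing golden file counts as a mismatch; otherwise plain string comparison.
      Files.exists(golden) &&
        new String(Files.readAllBytes(golden), StandardCharsets.UTF_8) == actual
    }
  }
}
```

A caller might pass `regenerate = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"`, mirroring the flag in the quoted diff, so that the same test either verifies or rewrites the approved plan.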
[GitHub] [spark] maropu commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
maropu commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r468931369

## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ##
@@ -0,0 +1,312 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Check that TPCDS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last explain: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   actual plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain: /path/to/tmp/q1.actual.explain.txt
+ *   [side by side plan diff]
+ * The explain files are saved to help debug later, they are not checked. Only the simplified
+ * plans are checked (by string comparison).
+ *
+ * Approving new plans:
+ * IF the plan change is intended then re-running the test
+ * with environ var SPARK_GENERATE_GOLDEN_FILES=1 will make the new plan canon.
+ * This should be done only for the queries that need it, to avoid unnecessary diffs in the
+ * other approved plans.
+ * This can be done by running sbt test-only *PlanStability[WithStats]Suite* -- -z "q31"
+ * The new plan files should be part of the PR and reviewed.
+ */
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+    conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, 100)

Review comment:
Why did you choose `100`? For debugging purposes, a larger value looks better, though.
[GitHub] [spark] maropu commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
maropu commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r468925917

## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ##
@@ -0,0 +1,312 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Check that TPCDS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last explain: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   actual plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain: /path/to/tmp/q1.actual.explain.txt
+ *   [side by side plan diff]
+ * The explain files are saved to help debug later, they are not checked. Only the simplified
+ * plans are checked (by string comparison).
+ *
+ * Approving new plans:
+ * IF the plan change is intended then re-running the test
+ * with environ var SPARK_GENERATE_GOLDEN_FILES=1 will make the new plan canon.
+ * This should be done only for the queries that need it, to avoid unnecessary diffs in the
+ * other approved plans.
+ * This can be done by running sbt test-only *PlanStability[WithStats]Suite* -- -z "q31"
+ * The new plan files should be part of the PR and reviewed.
+ */
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {

Review comment:
Is this test related to the adaptive execution?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
AmplabJenkins removed a comment on pull request #29411: URL: https://github.com/apache/spark/pull/29411#issuecomment-672354973 Can one of the admins verify this patch?
[GitHub] [spark] mridulm commented on pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
mridulm commented on pull request #29411: URL: https://github.com/apache/spark/pull/29411#issuecomment-672367418 ok to test
[GitHub] [spark] mridulm commented on a change in pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
mridulm commented on a change in pull request #29411:
URL: https://github.com/apache/spark/pull/29411#discussion_r468923013

## File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ##
@@ -1400,11 +1399,10 @@ private[spark] object SparkSubmitUtils {
           "[organization]_[artifact]-[revision](-[classifier]).[ext]",
           retrieveOptions.setConfs(Array(ivyConfName)))
         val paths = resolveDependencyPaths(rr.getArtifacts.toArray, packagesDirectory)

Review comment:
nit: you don't need `paths` anymore
[GitHub] [spark] mridulm commented on pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
mridulm commented on pull request #29411: URL: https://github.com/apache/spark/pull/29411#issuecomment-672367215 +CC @vanzin, @brkyvz
[GitHub] [spark] maropu commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
maropu commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-672365902 @Ngone51 The `Files changed` page is too heavy to load! How about reducing the number of generated files for review? A few examples of the generated files are probably enough.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
AmplabJenkins removed a comment on pull request #29411: URL: https://github.com/apache/spark/pull/29411#issuecomment-672354589 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
AmplabJenkins commented on pull request #29411: URL: https://github.com/apache/spark/pull/29411#issuecomment-672354973 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
AmplabJenkins commented on pull request #29411: URL: https://github.com/apache/spark/pull/29411#issuecomment-672354589 Can one of the admins verify this patch?
[GitHub] [spark] venkata91 opened a new pull request #29411: [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
venkata91 opened a new pull request #29411:
URL: https://github.com/apache/spark/pull/29411

### What changes were proposed in this pull request?
Clear Ivy resolution files as part of a finally block; otherwise, failures during artifact resolution can leave the resolution files around. Use tempIvyPath for SparkSubmitUtils.buildIvySettings in tests. This is why the test "SPARK-10878: test resolution files cleaned after resolving artifact" did not capture these issues.

### Why are the changes needed?
This is a bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests
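The cleanup pattern this PR description refers to can be illustrated with a small standalone sketch. It is not the code from SparkSubmit.scala: `ResolutionCleanup` and `resolveWithCleanup` are hypothetical names, and the "resolution" step is just a by-name block supplied by the caller.

```scala
import java.io.File
import java.nio.file.Files

object ResolutionCleanup {
  /**
   * Runs `resolve` and deletes `tempFiles` afterwards, whether `resolve`
   * returned normally or threw. Hypothetical helper, not part of Spark.
   */
  def resolveWithCleanup[T](tempFiles: Seq[File])(resolve: => T): T = {
    try {
      resolve
    } finally {
      // Executed on success and on failure alike, so a failed resolution
      // does not leave its temporary files behind.
      tempFiles.foreach(file => Files.deleteIfExists(file.toPath))
    }
  }
}
```

One design caveat of this sketch: if deletion itself throws, that exception replaces the one raised by `resolve`, so a production version may want to log cleanup failures instead of propagating them.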
[GitHub] [spark] maropu commented on pull request #29407: [SPARK-32588][TEST] Fix SizeEstimator initialization in tests.
maropu commented on pull request #29407: URL: https://github.com/apache/spark/pull/29407#issuecomment-672345352 cc: @kiszk @rednaxelafx
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
AmplabJenkins removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672339519
[GitHub] [spark] AmplabJenkins commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
AmplabJenkins commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672339519
[GitHub] [spark] SparkQA commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
SparkQA commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672338989 **[Test build #127350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127350/testReport)** for PR 29328 at commit [`63c9383`](https://github.com/apache/spark/commit/63c93839b25a50855851199c5286becd4674d0cf).
[GitHub] [spark] imback82 commented on a change in pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
imback82 commented on a change in pull request #29328:
URL: https://github.com/apache/spark/pull/29328#discussion_r468910024

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ##
@@ -245,12 +245,19 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
         "read files of Hive data source directly.")
     }

+    if (extraOptions.contains("path") && paths.nonEmpty) {

Review comment:
I added the check for `paths` option. Should we handle the case where `path` and `paths` options are both defined?
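The guard in the quoted diff line can be exercised on its own with a sketch like the following. It is an illustration under assumptions: `PathOptionCheck` and `validate` are hypothetical names, a plain `Map` stands in for the reader's option map, and the exception type and message are made up for the example rather than taken from the PR.

```scala
object PathOptionCheck {
  /**
   * Rejects a call that sets the "path" option and also passes explicit
   * path parameters. Hypothetical helper, not part of Spark.
   */
  def validate(extraOptions: Map[String, String], paths: Seq[String]): Unit = {
    if (extraOptions.contains("path") && paths.nonEmpty) {
      // Exception type and wording are assumptions for this sketch.
      throw new IllegalArgumentException(
        "Specify either the 'path' option or explicit path parameters, not both.")
    }
  }
}
```

The open question in the review comment, whether a `path` option and a `paths` option set together should also be rejected, would be an additional condition on `extraOptions` alone; this sketch leaves it out.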
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672337529
[GitHub] [spark] AmplabJenkins commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672337529
[GitHub] [spark] SparkQA removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672264107 **[Test build #127345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127345/testReport)** for PR 29367 at commit [`6a69126`](https://github.com/apache/spark/commit/6a6912606761a503ec85dbc54fa5ea465771effc).
[GitHub] [spark] SparkQA commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA commented on pull request #29367:
URL: https://github.com/apache/spark/pull/29367#issuecomment-672336829

**[Test build #127345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127345/testReport)** for PR 29367 at commit [`6a69126`](https://github.com/apache/spark/commit/6a6912606761a503ec85dbc54fa5ea465771effc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] holdenk commented on a change in pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
holdenk commented on a change in pull request #29367:
URL: https://github.com/apache/spark/pull/29367#discussion_r468905696

## File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

@@ -503,6 +450,88 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   protected def minRegisteredRatio: Double = _minRegisteredRatio

+  /**
+   * Request that the cluster manager decommission the specified executors.
+   *
+   * @param executorsAndDecomInfo Identifiers of executors & decommission info.
+   * @param adjustTargetNumExecutors whether the target number of executors will be adjusted down
+   *                                 after these executors have been decommissioned.
+   * @return the ids of the executors acknowledged by the cluster manager to be removed.
+   */
+  override def decommissionExecutors(
+      executorsAndDecomInfo: Array[(String, ExecutorDecommissionInfo)],
+      adjustTargetNumExecutors: Boolean): Seq[String] = {
+
+    val executorsToDecommission = executorsAndDecomInfo.filter { case (executorId, _) =>
+      CoarseGrainedSchedulerBackend.this.synchronized {
+        // Only bother decommissioning executors which are alive.
+        if (isExecutorActive(executorId)) {
+          executorsPendingDecommission += executorId
+          true
+        } else {
+          false
+        }
+      }
+    }
+
+    // If we don't want to replace the executors we are decommissioning
+    if (adjustTargetNumExecutors) {
+      adjustExecutors(executorsToDecommission.map(_._1))
+    }
+
+    val decommissioned = executorsToDecommission.filter { case (executorId, decomInfo) =>
+      doDecommission(executorId, decomInfo)
+    }.map(_._1)
+    decommissioned
+  }
+
+
+  private def doDecommission(executorId: String,
+      decomInfo: ExecutorDecommissionInfo): Boolean = {
+
+    logInfo(s"Asking executor $executorId to decommissioning.")
+    try {
+      scheduler.executorDecommission(executorId, decomInfo)
+      if (driverEndpoint != null) {
+        logInfo("Propagating executor decommission to driver.")
+        driverEndpoint.send(DecommissionExecutor(executorId, decomInfo))

Review comment:
OK, so as I read it: in CoarseGrainedSchedulerBackend, if we receive a `DecommissionExecutor` we call `decommissionExecutor` in the EAM base class, which then calls `decommissionExecutors` in the Scheduler. If the executor has not already been marked as decommissioning, that marks the executor as decommissioning and then calls into `doDecommission`, which asks the scheduler, driver, and block manager (as appropriate) to take on the next steps in decommissioning. On the driver endpoint we delegate to this same function, and since the executor is already marked as decommissioned we no-op. We could drop this message from the driver endpoint, but I figured it makes sense to leave it there in case it's easier for someone to send a no-reply message to the driver endpoint than to do the boolean ask. WDYT?
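The no-op argument in the comment above can be illustrated with a small, self-contained sketch. It is not the code from the PR: `ToyBackend`, `decommission`, `receiveDecommissionMessage`, and `sentToDriver` are hypothetical names, and the sketch only assumes that marking an executor as pending decommission is what makes a second request a no-op.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a scheduler backend; none of these names are Spark APIs.
final class ToyBackend(alive: Set[String]) {
  private val pendingDecommission = mutable.Set.empty[String]
  // Messages "sent" to the driver endpoint, recorded so the flow can be inspected.
  val sentToDriver = mutable.ArrayBuffer.empty[String]

  // Returns true only the first time a live executor is marked.
  def decommission(executorId: String): Boolean = synchronized {
    // mutable.Set#add returns false when the element was already present.
    val isNew = alive.contains(executorId) && pendingDecommission.add(executorId)
    if (isNew) {
      sentToDriver += executorId
    }
    isNew
  }

  // Handler for the message sent above: it delegates to the same entry point.
  def receiveDecommissionMessage(executorId: String): Boolean = decommission(executorId)
}
```

Under these assumptions the message is sent at most once per executor, because the handler re-enters `decommission` only after the executor has already been marked.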
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins removed a comment on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672331249
[GitHub] [spark] AmplabJenkins commented on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins commented on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672331249
[GitHub] [spark] SparkQA commented on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
SparkQA commented on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672330830 **[Test build #127349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127349/testReport)** for PR 29396 at commit [`c53d683`](https://github.com/apache/spark/commit/c53d6836ca3e0b2d211d5322a7cef4df401caeb7).
[GitHub] [spark] holdenk commented on a change in pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
holdenk commented on a change in pull request #29367:
URL: https://github.com/apache/spark/pull/29367#discussion_r468902557

## File path: core/src/test/scala/org/apache/spark/deploy/DecommissionWorkerSuite.scala

@@ -242,8 +242,10 @@ class DecommissionWorkerSuite
       assert(jobResult === 2)
     }
     // 6 tasks: 2 from first stage, 2 rerun again from first stage, 2nd stage attempt 1 and 2.
-    val tasksSeen = listener.getTasksFinished()

Review comment:
Yeah good point, it's probably not the listener. It's only showing up for me in GHA though - https://github.com/apache/spark/pull/29367/checks?check_run_id=972990200#step:14:13579
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins removed a comment on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672304579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127348/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
SparkQA commented on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672304557 **[Test build #127348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127348/testReport)** for PR 29396 at commit [`307a693`](https://github.com/apache/spark/commit/307a693a40b7fc59262ba27fc1502b8ba70a86b4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins removed a comment on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672304569 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
SparkQA removed a comment on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672302622 **[Test build #127348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127348/testReport)** for PR 29396 at commit [`307a693`](https://github.com/apache/spark/commit/307a693a40b7fc59262ba27fc1502b8ba70a86b4).
[GitHub] [spark] AmplabJenkins commented on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins commented on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672304569
[GitHub] [spark] AmplabJenkins commented on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins commented on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672303780
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
AmplabJenkins removed a comment on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672303780
[GitHub] [spark] SparkQA commented on pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
SparkQA commented on pull request #29396: URL: https://github.com/apache/spark/pull/29396#issuecomment-672302622 **[Test build #127348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127348/testReport)** for PR 29396 at commit [`307a693`](https://github.com/apache/spark/commit/307a693a40b7fc59262ba27fc1502b8ba70a86b4).
[GitHub] [spark] huaxingao commented on a change in pull request #29396: [SPARK-32579][SQL] Implement JDBCScan/ScanBuilder/WriteBuilder
huaxingao commented on a change in pull request #29396:
URL: https://github.com/apache/spark/pull/29396#discussion_r46891

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2.jdbc
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters, SupportsPushDownRequiredColumns}
+import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JDBCRDD, JDBCRelation}
+import org.apache.spark.sql.jdbc.JdbcDialects
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.types.StructType
+
+case class JDBCScanBuilder(
+    session: SparkSession,
+    schema: StructType,
+    jdbcOptions: JDBCOptions)
+  extends ScanBuilder with SupportsPushDownFilters with SupportsPushDownRequiredColumns {
+
+  private var pushedFilter = Array.empty[Filter]
+
+  private var prunedSchema = schema
+
+  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
+    if (jdbcOptions.pushDownPredicate) {
+      val dialect = JdbcDialects.get(jdbcOptions.url)
+      val (pushed, unSupported) = filters.partition(JDBCRDD.compileFilter(_, dialect).isDefined)

Review comment:
I looked at the code, and it seems there is no easy way to call `compileFilter` only once. After calling `compileFilter` to decide which filters to push down, `buildScan` takes the pushed-down filters as an array of `sources.Filter`, which need to be turned into SQL filter strings by calling `compileFilter` again.
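The two-pass pattern described in this comment can be sketched without Spark on the classpath. Everything below is a simplified assumption for illustration: `ToyFilter`, `EqualTo`, `Custom`, and `ToyPushdown` are hypothetical stand-ins, not Spark's `Filter` classes or `JDBCRDD.compileFilter`.

```scala
// Hypothetical, simplified stand-ins; these are not Spark's Filter classes.
sealed trait ToyFilter
final case class EqualTo(column: String, value: Int) extends ToyFilter
final case class Custom(description: String) extends ToyFilter

object ToyPushdown {
  // Returns a SQL fragment when the filter can be expressed in SQL, None otherwise.
  def compile(filter: ToyFilter): Option[String] = filter match {
    case EqualTo(column, value) => Some(s"$column = $value")
    case Custom(_)              => None
  }

  // First pass: compile only to decide what can be pushed down.
  def split(filters: Array[ToyFilter]): (Array[ToyFilter], Array[ToyFilter]) =
    filters.partition(f => compile(f).isDefined)

  // Second pass: the scan only receives filter objects, so it compiles them again.
  def whereClause(pushed: Array[ToyFilter]): String =
    pushed.flatMap(f => compile(f).toList).map(s => s"($s)").mkString(" AND ")
}
```

In this sketch the compiled string from the first pass is discarded because only the filter objects cross the boundary between `split` and `whereClause`, which mirrors why the compile step runs twice.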
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-672297178
[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-672297178
[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-672296315 **[Test build #127340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127340/testReport)** for PR 28804 at commit [`11572a1`](https://github.com/apache/spark/commit/11572a105ef870c9e95b4302e6613b7f0e73d0de). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality
SparkQA removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-672091607 **[Test build #127340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127340/testReport)** for PR 28804 at commit [`11572a1`](https://github.com/apache/spark/commit/11572a105ef870c9e95b4302e6613b7f0e73d0de).
[GitHub] [spark] agrawaldevesh commented on a change in pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
agrawaldevesh commented on a change in pull request #29367:
URL: https://github.com/apache/spark/pull/29367#discussion_r468882865

## File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

@@ -503,6 +450,88 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   protected def minRegisteredRatio: Double = _minRegisteredRatio

+  /**
+   * Request that the cluster manager decommission the specified executors.
+   *
+   * @param executorsAndDecomInfo Identifiers of executors & decommission info.
+   * @param adjustTargetNumExecutors whether the target number of executors will be adjusted down
+   *                                 after these executors have been decommissioned.
+   * @return the ids of the executors acknowledged by the cluster manager to be removed.
+   */
+  override def decommissionExecutors(
+      executorsAndDecomInfo: Array[(String, ExecutorDecommissionInfo)],
+      adjustTargetNumExecutors: Boolean): Seq[String] = {
+
+    val executorsToDecommission = executorsAndDecomInfo.filter { case (executorId, _) =>
+      CoarseGrainedSchedulerBackend.this.synchronized {
+        // Only bother decommissioning executors which are alive.
+        if (isExecutorActive(executorId)) {
+          executorsPendingDecommission += executorId
+          true
+        } else {
+          false
+        }
+      }
+    }
+
+    // If we don't want to replace the executors we are decommissioning
+    if (adjustTargetNumExecutors) {
+      adjustExecutors(executorsToDecommission.map(_._1))
+    }
+
+    val decommissioned = executorsToDecommission.filter { case (executorId, decomInfo) =>
+      doDecommission(executorId, decomInfo)
+    }.map(_._1)
+    decommissioned
+  }
+
+
+  private def doDecommission(executorId: String,
+      decomInfo: ExecutorDecommissionInfo): Boolean = {
+
+    logInfo(s"Asking executor $executorId to decommissioning.")
+    try {
+      scheduler.executorDecommission(executorId, decomInfo)
+      if (driverEndpoint != null) {
+        logInfo("Propagating executor decommission to driver.")
+        driverEndpoint.send(DecommissionExecutor(executorId, decomInfo))

Review comment:
I believe there is still a cycle here. Please trace through how the `DecommissionExecutor` message is handled: it will eventually call this `doDecommission`, which will send the message again ... If I understand correctly, this may end up live-locking the driver until the poor executor actually dies for good. One way to break this cycle is to call `doDecommission` directly in the handlers for `DecommissionExecutor` in this class's `receive` and `receiveAndReply` methods, with a special flag that forbids re-sending the message. [blocker]

## File path: core/src/test/scala/org/apache/spark/deploy/DecommissionWorkerSuite.scala

@@ -242,8 +242,10 @@ class DecommissionWorkerSuite
       assert(jobResult === 2)
     }
     // 6 tasks: 2 from first stage, 2 rerun again from first stage, 2nd stage attempt 1 and 2.
-    val tasksSeen = listener.getTasksFinished()

Review comment:
Would you happen to recall the GitHub Actions error you got that led to this change? I would like to dig further because I invoke the listener using `TestUtils.withListener(sc, listener)`, which waits for the listener to drain and also removes the listener. So I don't think wrapping this in an `eventually` should actually be doing anything: the listener has already been removed. Perhaps I ought to bring back the "waiting for job done" inside of `getTasksFinished` or as a separate call. I would like to understand further just so that I can learn about some of the gotchas with this listener stuff.
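The "special flag that forbids re-sending" suggestion can be sketched as follows. This is a toy model under stated assumptions, not the PR's code: `ToyEndpoint`, its `inbox` queue, the `notifyDriver` parameter, and `drain` are hypothetical names standing in for the RPC endpoint and its message handlers.

```scala
import scala.collection.mutable

// Hypothetical toy model of an endpoint whose handler calls back into the sender.
final class ToyEndpoint {
  // Queue of pending executor ids standing in for the RPC inbox.
  private val inbox = mutable.Queue.empty[String]
  // Number of times the decommission work has run.
  var handled = 0

  // notifyDriver = false is the flag that forbids re-sending the message.
  def doDecommission(executorId: String, notifyDriver: Boolean): Unit = {
    handled += 1
    if (notifyDriver) {
      inbox.enqueue(executorId)
    }
  }

  // Message handler: performs the work directly and never re-enqueues.
  def drain(): Unit = {
    while (inbox.nonEmpty) {
      doDecommission(inbox.dequeue(), notifyDriver = false)
    }
  }
}
```

If the handler instead passed `notifyDriver = true`, every handled message would enqueue another one and `drain` would never finish, which is the cycle the comment warns about.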
[GitHub] [spark] Udbhav30 commented on pull request #29387: [SPARK-32481] Support truncate table to move data to trash
Udbhav30 commented on pull request #29387: URL: https://github.com/apache/spark/pull/29387#issuecomment-67229 Gentle ping @dongjoon-hyun
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672288827
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
AmplabJenkins removed a comment on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672288736
[GitHub] [spark] AmplabJenkins commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672288827
[GitHub] [spark] AmplabJenkins commented on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
AmplabJenkins commented on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672288736
[GitHub] [spark] c21 commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join
c21 commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-672288194 @agrawaldevesh - thanks for the notes. I totally agree. Just to point out: in the current approach, I already use an unsafe row boolean field to store the matched bit in `BytesToBytesMap`. For CPU usage, I think the current approach works better, as it does not need an extra lookup in the key array when iterating over all values of the map (which my gut feeling says is not very efficient). For memory usage, the newly proposed approach works better, as it only stores information for matched rows rather than for all rows. So there is a trade-off here. Side note: our internal workloads were originally CPU bound rather than memory bound, and we are gradually becoming more and more memory bound with newer types of machines. Not sure whether caring more about memory usage is a trend for others as well.
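The trade-off in this comment can be made concrete with a toy comparison. It is only a sketch under loud assumptions: `BuildRow` and `MatchedTracking` are hypothetical, the matching uses a plain nested loop rather than a hash map, and nothing here reflects how `BytesToBytesMap` stores rows. It contrasts a per-row matched flag with a structure that records only the rows that matched, when emitting the unmatched build-side rows of a full outer join.

```scala
import scala.collection.mutable

// Toy build side of a join. Not Spark's BytesToBytesMap.
final case class BuildRow(key: Int, payload: String)

object MatchedTracking {
  // Approach A: one matched flag per build row, kept for every row.
  def unmatchedWithFlags(build: Vector[BuildRow], probeKeys: Seq[Int]): Vector[BuildRow] = {
    val matched = Array.fill(build.length)(false)
    for (k <- probeKeys; i <- build.indices if build(i).key == k) matched(i) = true
    // Output pass reads the flag stored alongside each row.
    build.indices.filterNot(i => matched(i)).map(build).toVector
  }

  // Approach B: remember only the indices of rows that matched.
  def unmatchedWithSet(build: Vector[BuildRow], probeKeys: Seq[Int]): Vector[BuildRow] = {
    val matched = mutable.BitSet.empty
    for (k <- probeKeys; i <- build.indices if build(i).key == k) matched += i
    // Output pass needs one extra membership lookup per row.
    build.indices.filterNot(matched.contains).map(build).toVector
  }
}
```

Both functions return the same rows; the difference is where the bookkeeping lives, which is the CPU-versus-memory choice the comment describes.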
[GitHub] [spark] SparkQA removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672181632 **[Test build #127343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127343/testReport)** for PR 29367 at commit [`cc76ff5`](https://github.com/apache/spark/commit/cc76ff5104e8376063f640568d8983adc5dda32d).
[GitHub] [spark] SparkQA commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672288032 **[Test build #127343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127343/testReport)** for PR 29367 at commit [`cc76ff5`](https://github.com/apache/spark/commit/cc76ff5104e8376063f640568d8983adc5dda32d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
SparkQA removed a comment on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672075384 **[Test build #127338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127338/testReport)** for PR 28617 at commit [`c96e0fc`](https://github.com/apache/spark/commit/c96e0fc4377adb3435848acddf0cd0e8f44ab3e4).
[GitHub] [spark] SparkQA commented on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
SparkQA commented on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672287681 **[Test build #127338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127338/testReport)** for PR 28617 at commit [`c96e0fc`](https://github.com/apache/spark/commit/c96e0fc4377adb3435848acddf0cd0e8f44ab3e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29410: [WIP][SPARK-32180][PYTHON][DOCS] Getting started-Installation guide for pyspark doc
AmplabJenkins removed a comment on pull request #29410: URL: https://github.com/apache/spark/pull/29410#issuecomment-672277585 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29410: [WIP][SPARK-32180][PYTHON][DOCS] Getting started-Installation guide for pyspark doc
AmplabJenkins commented on pull request #29410: URL: https://github.com/apache/spark/pull/29410#issuecomment-672278185 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29410: [WIP][SPARK-32180][PYTHON][DOCS] Getting started-Installation guide for pyspark doc
AmplabJenkins commented on pull request #29410: URL: https://github.com/apache/spark/pull/29410#issuecomment-672277585 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29383: [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big endian platforms
AmplabJenkins commented on pull request #29383: URL: https://github.com/apache/spark/pull/29383#issuecomment-672276640
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29383: [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big endian platforms
AmplabJenkins removed a comment on pull request #29383: URL: https://github.com/apache/spark/pull/29383#issuecomment-672276640
[GitHub] [spark] SparkQA removed a comment on pull request #29383: [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big endian platforms
SparkQA removed a comment on pull request #29383: URL: https://github.com/apache/spark/pull/29383#issuecomment-672070547 **[Test build #127336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127336/testReport)** for PR 29383 at commit [`f160da0`](https://github.com/apache/spark/commit/f160da0035d8b1df08b75832f5871f7afc5cb839).
[GitHub] [spark] SparkQA commented on pull request #29383: [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big endian platforms
SparkQA commented on pull request #29383: URL: https://github.com/apache/spark/pull/29383#issuecomment-672275557 **[Test build #127336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127336/testReport)** for PR 29383 at commit [`f160da0`](https://github.com/apache/spark/commit/f160da0035d8b1df08b75832f5871f7afc5cb839). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] agrawaldevesh commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join
agrawaldevesh commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-672275450 Hi Cheng, I am wondering if you might have a perf test handy to compare this new implementation against your old approach? After going through the code and following along, I think my random idea of a Bitset/auxiliary data structure might be a lot more work/plumbing and may be inefficient compared to your older approach :-(. Let me explain: The hashed relation has a list of (key, value) pairs stored in pages. We want to know which of those key-value pairs didn't match. Unfortunately, due to the indirection of dataPages, this list is stored in a chunked way inside pages. You don't have ready access to the "index" into this list, only to other things like (keyIndex, key, value, valueOffset, valueBase) and the like. I don't want to lead you astray, but I would advise you not to throw away your old implementation, and to do some perf testing to validate that the auxiliary approach works better. Another way to solve this would be to go with the spirit of your existing solution of putting the "matched bit" in the key-value payloads, but to put it as an unsafe field deep inside BinaryMap, so that you can avail yourself of all the unsafe goodness. There would be a "Location.markMatched()" call to set this matched bit. Later on, when you iterate over these page entries in the BinaryMap, you can skip those with the matched bit set. But this "unsafe" approach is more of an optimization and can cause other regressions. So I would be curious to see perf numbers for the two approaches: out-of-band matched bit vs. matched bit within the hashed relation (older approach). All the best, and looking forward to seeing how this new approach pans out!
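As a rough illustration of the "out of band matched-bit" idea discussed in the comment above, the following is a minimal sketch of a full outer hash join that records matched build-side rows in a separate `java.util.BitSet`. It assumes the build side is a plain in-memory list, so a row index is trivially available; the comment points out that Spark's hashed relation does not expose such an index, which is exactly what makes this approach harder there. The class `FullOuterHashJoinSketch` and its `join` method are hypothetical names, not part of Spark.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Spark's HashedRelation or BytesToBytesMap.
final class FullOuterHashJoinSketch {

    // Each input row is {key, value}. Each output row is {key, probeValue, buildValue},
    // with null standing in for the side that has no match.
    static List<String[]> join(List<String[]> build, List<String[]> probe) {
        // Build phase: map each key to the indices of the build rows carrying it.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < build.size(); i++) {
            index.computeIfAbsent(build.get(i)[0], k -> new ArrayList<>()).add(i);
        }

        // Out-of-band matched bits: one bit per build row, kept outside the hash table.
        BitSet matched = new BitSet(build.size());
        List<String[]> out = new ArrayList<>();

        // Probe phase: emit matches, or the probe row padded with null.
        for (String[] p : probe) {
            List<Integer> hits = index.get(p[0]);
            if (hits == null) {
                out.add(new String[] {p[0], p[1], null});
            } else {
                for (int i : hits) {
                    matched.set(i);
                    out.add(new String[] {p[0], p[1], build.get(i)[1]});
                }
            }
        }

        // Final pass: emit build rows whose matched bit was never set.
        for (int i = matched.nextClearBit(0); i < build.size(); i = matched.nextClearBit(i + 1)) {
            out.add(new String[] {build.get(i)[0], null, build.get(i)[1]});
        }
        return out;
    }
}
```

The alternative described in the comment would store the matched flag next to each key-value entry instead of in a separate structure, trading the extra bookkeeping here for a change to the entry layout.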
[GitHub] [spark] rohitmishr1484 opened a new pull request #29410: [WIP][SPARK-32180][PYTHON][DOCS] Getting started-Installation guide for pyspark doc
rohitmishr1484 opened a new pull request #29410: URL: https://github.com/apache/spark/pull/29410 ### What changes were proposed in this pull request? This PR proposes to add a Getting Started installation guide to the new PySpark docs. ### Why are the changes needed? Better documentation. ### Does this PR introduce _any_ user-facing change? No. Documentation only. ### How was this patch tested? By generating the documents locally. Since it's a WIP, I will retest it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672270731
[GitHub] [spark] AmplabJenkins commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672270731
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672269109 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127341/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672269751 **[Test build #127342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127342/testReport)** for PR 29367 at commit [`4d8b6cd`](https://github.com/apache/spark/commit/4d8b6cd840e56beef028f2d2c34b42486604cd36). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672137339 **[Test build #127342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127342/testReport)** for PR 29367 at commit [`4d8b6cd`](https://github.com/apache/spark/commit/4d8b6cd840e56beef028f2d2c34b42486604cd36).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672269096 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672107180 **[Test build #127341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127341/testReport)** for PR 28841 at commit [`b090639`](https://github.com/apache/spark/commit/b090639886470f0320683a24a92ab9ebdae2a00a).
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672269096
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672268829 **[Test build #127341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127341/testReport)** for PR 28841 at commit [`b090639`](https://github.com/apache/spark/commit/b090639886470f0320683a24a92ab9ebdae2a00a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29409: [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
AmplabJenkins removed a comment on pull request #29409: URL: https://github.com/apache/spark/pull/29409#issuecomment-672268078
[GitHub] [spark] AmplabJenkins commented on pull request #29409: [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
AmplabJenkins commented on pull request #29409: URL: https://github.com/apache/spark/pull/29409#issuecomment-672268078
[GitHub] [spark] SparkQA commented on pull request #29409: [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
SparkQA commented on pull request #29409: URL: https://github.com/apache/spark/pull/29409#issuecomment-672267613 **[Test build #127347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127347/testReport)** for PR 29409 at commit [`713f6ee`](https://github.com/apache/spark/commit/713f6ee1b6b63edb8d184dbdd4115838800fe412).
[GitHub] [spark] MaxGekk opened a new pull request #29409: [SPARK-32594][SQL] Fix serialization of dates inserted to Hive tables
MaxGekk opened a new pull request #29409: URL: https://github.com/apache/spark/pull/29409 ### What changes were proposed in this pull request? Fix `DaysWritable` by overriding the parent's method `def get(doesTimeMatter: Boolean): Date` from `DateWritable` instead of `Date get()`, because the latter delegates to the former. The bug occurs because `HiveOutputWriter.write()` transitively calls `def get(doesTimeMatter: Boolean): Date` with the default implementation from the parent class `DateWritable`, which doesn't respect date rebases and uses the uninitialized `daysSinceEpoch` (0, which is `1970-01-01`). ### Why are the changes needed? The changes fix the bug: ```sql spark-sql> CREATE TABLE table1 (d date); spark-sql> INSERT INTO table1 VALUES (date '2020-08-11'); spark-sql> SELECT * FROM table1; 1970-01-01 ``` The expected result of the last SQL statement is **2020-08-11**, but the actual result is **1970-01-01**. ### Does this PR introduce _any_ user-facing change? Yes. After the fix, `INSERT` works correctly: ```sql spark-sql> SELECT * FROM table1; 2020-08-11 ``` ### How was this patch tested? Added a new test to `HiveSerDeReadWriteSuite`.
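The overload pitfall described in this PR can be reproduced in isolation. The sketch below uses hypothetical stand-in classes (`BaseWritable`, `OverridesNoArgOnly`, `OverridesDelegateTarget`) and plain `int` day counts instead of `Date`; it is not the code of Hive's `DateWritable` or Spark's `DaysWritable`, and it assumes the parent's no-argument `get()` is implemented by calling `get(boolean)`. Under that assumption, a subclass that overrides only `get()` still exposes the parent's `get(boolean)`, which reads a field the subclass never sets.

```java
// Hypothetical stand-ins for illustration; not Hive's DateWritable or Spark's DaysWritable.
class BaseWritable {
    // Stays 0 (1970-01-01) because the subclasses below keep their own state.
    protected int daysSinceEpoch = 0;

    // Convenience overload that delegates to the one-argument version.
    int get() {
        return get(true);
    }

    int get(boolean doesTimeMatter) {
        return daysSinceEpoch;
    }
}

// Overrides only the no-argument overload: callers of get(boolean) bypass it.
class OverridesNoArgOnly extends BaseWritable {
    private final int days;

    OverridesNoArgOnly(int days) {
        this.days = days;
    }

    @Override
    int get() {
        return days;
    }
}

// Overrides the method that get() delegates to: both entry points now agree.
class OverridesDelegateTarget extends BaseWritable {
    private final int days;

    OverridesDelegateTarget(int days) {
        this.days = days;
    }

    @Override
    int get(boolean doesTimeMatter) {
        return days;
    }
}
```

With 18485 (2020-08-11 as days since 1970-01-01) passed to the constructor, `new OverridesNoArgOnly(18485).get(false)` returns 0, while `new OverridesDelegateTarget(18485).get(false)` and `new OverridesDelegateTarget(18485).get()` both return 18485.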
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
AmplabJenkins removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672264729
[GitHub] [spark] AmplabJenkins commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
AmplabJenkins commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672264729
[GitHub] [spark] SparkQA commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
SparkQA commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672264107 **[Test build #127345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127345/testReport)** for PR 29367 at commit [`6a69126`](https://github.com/apache/spark/commit/6a6912606761a503ec85dbc54fa5ea465771effc).
[GitHub] [spark] SparkQA commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
SparkQA commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672264110 **[Test build #127346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127346/testReport)** for PR 29328 at commit [`808b7c0`](https://github.com/apache/spark/commit/808b7c0315849586d0b3e68e55914da525e2baba).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
AmplabJenkins removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672261135
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins removed a comment on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672261183
[GitHub] [spark] AmplabJenkins commented on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
AmplabJenkins commented on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672260894
[GitHub] [spark] AmplabJenkins commented on pull request #29367: [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling
AmplabJenkins commented on pull request #29367: URL: https://github.com/apache/spark/pull/29367#issuecomment-672261183
[GitHub] [spark] AmplabJenkins commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
AmplabJenkins commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672261135
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
AmplabJenkins removed a comment on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672260894
[GitHub] [spark] SparkQA commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters
SparkQA commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-672260512 **[Test build #127344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127344/testReport)** for PR 29328 at commit [`650d45d`](https://github.com/apache/spark/commit/650d45d0594734ef2fc3b9c037440d5d5f5f34cb).
[GitHub] [spark] SparkQA removed a comment on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
SparkQA removed a comment on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672019338 **[Test build #127334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127334/testReport)** for PR 28617 at commit [`4bf9711`](https://github.com/apache/spark/commit/4bf97113834666c5505d4a550cca77969beeaaed).
[GitHub] [spark] SparkQA commented on pull request #28617: [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
SparkQA commented on pull request #28617: URL: https://github.com/apache/spark/pull/28617#issuecomment-672259677 **[Test build #127334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127334/testReport)** for PR 28617 at commit [`4bf9711`](https://github.com/apache/spark/commit/4bf97113834666c5505d4a550cca77969beeaaed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
AmplabJenkins commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672248376
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
AmplabJenkins removed a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672248376
[GitHub] [spark] SparkQA removed a comment on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
SparkQA removed a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672003736 **[Test build #127332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127332/testReport)** for PR 29360 at commit [`5bda4ad`](https://github.com/apache/spark/commit/5bda4adbd94fd2f33f40ac7b210c6d6ab1774e08).
[GitHub] [spark] SparkQA commented on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
SparkQA commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672247240 **[Test build #127332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127332/testReport)** for PR 29360 at commit [`5bda4ad`](https://github.com/apache/spark/commit/5bda4adbd94fd2f33f40ac7b210c6d6ab1774e08). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] jkleckner commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s
jkleckner commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-672244924 It looks a bit different from what I see. For me, it appears to get stuck at the very end of writing data to Bigtable, in the very last task of a job. Our partner is working to backport the fix I mentioned, and I will let you know if that addresses the hang.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
AmplabJenkins removed a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672233491 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127335/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
AmplabJenkins commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672233431
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis
AmplabJenkins removed a comment on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-672233431 Merged build finished. Test FAILed.