[GitHub] [spark] HyukjinKwon commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
HyukjinKwon commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629963202

retest this please
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on a change in pull request #28566: URL: https://github.com/apache/spark/pull/28566#discussion_r426384293

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala
## @@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
-    val resource = Resource.newInstance(8 * 1024, 4)

Review comment: should be fixed now. Thanks for pointing it out quickly.
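A minimal sketch of what the branch-3.0 follow-up presumably restores (an assumption based on the compile error quoted further down this thread, not the actual commit):

```scala
import org.apache.hadoop.yarn.api.records.Resource

// The PR deleted this definition on master, but the branch-3.0 suite still
// passes `resource` into the placement strategy, so it needs the definition
// back (or inlined at the call site): 8 * 1024 MB of memory, 4 virtual cores.
val resource = Resource.newInstance(8 * 1024, 4)
```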
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426383975

## File path: docs/sql-ref-datetime-pattern.md
## @@ -76,6 +76,57 @@ The count of pattern letters determines the format.

- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters - 'M' denotes the 'standard' form, and 'L' is for the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe, as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding.
+```sql
+spark-sql> select date_format(date '1970-01-01', "M");
+1
+spark-sql> select date_format(date '1970-12-01', "L");
+12
+```
+  - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9.
+```sql
+spark-sql> select date_format(date '1970-1-01', "LL");
+01
+spark-sql> select date_format(date '1970-09-01', "MM");
+09
+```
+  - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except in locales where there is no difference between standard and stand-alone forms, like in English.
+```sql
+spark-sql> select date_format(date '1970-01-01', "d MMM");
+1 Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU'));
+01 янв.
+```
+  - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLL");
+Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU'));
+янв.
+```
+  - `'MMMM'`: Full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
+```sql
+spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
+January 1970
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
+1 января
+```
+  - `'LLLL'`: Full textual month representation in the stand-alone form. The pattern can be used to format/parse only months.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLLL");
+January
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU'));
+январь
+```
+  - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of standard or stand-alone forms. Typically it is a single letter.
Review comment: Added
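The standard/stand-alone distinction documented above comes straight from `java.time`. A quick sketch of the underlying behavior (illustration only, not part of the PR; exact month names depend on the JDK's locale data):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

val jan = LocalDate.of(1970, 1, 1)
val ru = new Locale("ru")

// Standard (format) form: the month name agrees with the surrounding date.
DateTimeFormatter.ofPattern("d MMMM", ru).format(jan)       // "1 января"
// Stand-alone form: the dictionary (nominative) month name.
DateTimeFormatter.ofPattern("LLLL", ru).format(jan)         // "январь"
// Narrow form: typically a single letter.
DateTimeFormatter.ofPattern("MMMMM", Locale.US).format(jan) // "J"
```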
[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
maropu commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629962468

It seems the failure above is not related to this PR. See: https://github.com/apache/spark/pull/28566
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426383919

## File path: docs/sql-ref-datetime-pattern.md
## @@ -76,6 +76,57 @@ The count of pattern letters determines the format.

- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters - 'M' denotes the 'standard' form, and 'L' is for the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe, as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters:

Review comment: Added
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on a change in pull request #28566: URL: https://github.com/apache/spark/pull/28566#discussion_r426383725

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala
## @@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
-    val resource = Resource.newInstance(8 * 1024, 4)

Review comment: ah, yeah. It seems it's still used in branch-3.0. I will fix.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
AmplabJenkins removed a comment on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629961956
[GitHub] [spark] AmplabJenkins commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
AmplabJenkins commented on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629961956
[GitHub] [spark] maropu commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
maropu commented on a change in pull request #28566: URL: https://github.com/apache/spark/pull/28566#discussion_r426383379

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala
## @@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
     // goal is to create enough requests for localized containers (so there should be many
     // tasks on several hosts that have no allocated containers).
-    val resource = Resource.newInstance(8 * 1024, 4)

Review comment: https://github.com/apache/spark/pull/28566#issuecomment-629961204
[GitHub] [spark] maropu edited a comment on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
maropu edited a comment on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629961204

@HyukjinKwon It seems branch-3.0 is broken?
```
[info] Done packaging.
[error] /home/jenkins/workspace/SparkPullRequestBuilder@4/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala:65: not found: value resource
[error]       yarnConf, resource, new MockResolver())
[error]                 ^
[info] Packaging /home/jenkins/workspace/SparkPullRequestBuilder@4/external/kafka-0-10-token-provider/target/scala-2.12/spark-token-provider-kafka-0-10_2.12-3.0.1-SNAPSHOT-tests.jar ...
```
[GitHub] [spark] SparkQA removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
SparkQA removed a comment on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629879870

**[Test build #122764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122764/testReport)** for PR 28527 at commit [`7d50c17`](https://github.com/apache/spark/commit/7d50c17ceca0051e455ec1faf17f3c9ad05a206f).
[GitHub] [spark] maropu commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
maropu commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629961204

@HyukjinKwon It seems branch-3.0 is broken?
```
LocalityPlacementStrategySuite
```
[GitHub] [spark] SparkQA commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
SparkQA commented on pull request #28527: URL: https://github.com/apache/spark/pull/28527#issuecomment-629961162

**[Test build #122764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122764/testReport)** for PR 28527 at commit [`7d50c17`](https://github.com/apache/spark/commit/7d50c17ceca0051e455ec1faf17f3c9ad05a206f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960072

Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122788/
Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960051

**[Test build #122788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960066
[GitHub] [spark] SparkQA removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957300

**[Test build #122788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629960066

Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
AmplabJenkins commented on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629959652
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
AmplabJenkins removed a comment on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629959652
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426381332

## File path: sql/hive/benchmarks/InsertIntoHiveTableBenchmark-results.txt
## @@ -0,0 +1,11 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
+Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+insert hive table benchmark:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------
+INSERT INTO DYNAMIC                     7346           7470         175         0.0      717423.0       1.0X
+INSERT INTO HYBRID                      1179           1188          13         0.0      115184.2       6.2X
+INSERT INTO STATIC                       344            367          48         0.0       33585.1      21.4X
+INSERT OVERWRITE DYNAMIC                7656           7714          82         0.0      747622.7       1.0X
+INSERT OVERWRITE HYBRID                 1179           1183           6         0.0      115163.3       6.2X
+INSERT OVERWRITE STATIC                  400            408          10         0.0       39014.2      18.4X

Review comment: Let me run this benchmark on the master branch and update the result later in the PR description.
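The class name is elided in the benchmark file's scaladoc as quoted later in this digest; judging from the `object InsertIntoHiveTableBenchmark` it defines, the result-generating run presumably looks like this (an assumption, not a command from the PR):

```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt -Phive-2.3 \
  "hive/test:runMain org.apache.spark.sql.execution.benchmark.InsertIntoHiveTableBenchmark"
```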
[GitHub] [spark] SparkQA removed a comment on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
SparkQA removed a comment on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629892529

**[Test build #122773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122773/testReport)** for PR 28563 at commit [`a0aff8a`](https://github.com/apache/spark/commit/a0aff8af880f6d64e43a7229f44fa7237dfd718e).
[GitHub] [spark] SparkQA commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
SparkQA commented on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629958887

**[Test build #122773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122773/testReport)** for PR 28563 at commit [`a0aff8a`](https://github.com/apache/spark/commit/a0aff8af880f6d64e43a7229f44fa7237dfd718e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
AmplabJenkins commented on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629958185
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426380145

## File path: sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
## @@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class
+ *      --jars ,,
+ *      --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *
+ *   2. build/sbt "hive/test:runMain " -Phive-1.2 or
+ *      build/sbt "hive/test:runMain " -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain "
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempTable = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+    val ds = spark.range(numRows)
+    tableNames.foreach { name =>
+      ds.createOrReplaceTempView(name)
+    }
+    try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE HYBRID") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE STATIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO DYNAMIC") { _ =>
+      sql(s"INSERT INTO TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO HYBRID") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO STATIC") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempTable(tempTable) {
+      val t1 =
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957757

Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122775/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
AmplabJenkins removed a comment on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629958185
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same
yaooqinn commented on a change in pull request #28511: URL: https://github.com/apache/spark/pull/28511#discussion_r426379977

## File path: sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
## @@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class
+ *      --jars ,,
+ *      --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *
+ *   2. build/sbt "hive/test:runMain " -Phive-1.2 or
+ *      build/sbt "hive/test:runMain " -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain "
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempTable = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+    val ds = spark.range(numRows)
+    tableNames.foreach { name =>
+      ds.createOrReplaceTempView(name)
+    }
+    try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE HYBRID") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE STATIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO DYNAMIC") { _ =>
+      sql(s"INSERT INTO TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO HYBRID") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO STATIC") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempTable(tempTable) {
+      val t1 =
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957755

Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
AmplabJenkins commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957755
[GitHub] [spark] SparkQA removed a comment on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
SparkQA removed a comment on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629891408

**[Test build #122772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122772/testReport)** for PR 28562 at commit [`8bc6df2`](https://github.com/apache/spark/commit/8bc6df2c2e86d917e48c4debd78e87f714a27151).
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957600
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957600
[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA removed a comment on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629900457

**[Test build #122775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122775/testReport)** for PR 27066 at commit [`7f76539`](https://github.com/apache/spark/commit/7f76539ac50f59264cc443cfec93e4a8f4e495ab).
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629957411

**[Test build #122775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122775/testReport)** for PR 27066 at commit [`7f76539`](https://github.com/apache/spark/commit/7f76539ac50f59264cc443cfec93e4a8f4e495ab).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
SparkQA commented on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629957428

**[Test build #122772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122772/testReport)** for PR 28562 at commit [`8bc6df2`](https://github.com/apache/spark/commit/8bc6df2c2e86d917e48c4debd78e87f714a27151).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629957300

**[Test build #122788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
[GitHub] [spark] HyukjinKwon closed pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon closed pull request #28566: URL: https://github.com/apache/spark/pull/28566
[GitHub] [spark] HyukjinKwon commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629956109

Merged to master and branch-3.0.
[GitHub] [spark] HyukjinKwon commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
HyukjinKwon commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629955913

I am going to merge this, see https://github.com/apache/spark/pull/28463#issuecomment-629955825.
[GitHub] [spark] HyukjinKwon commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
HyukjinKwon commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955825

`LocalityPlacementStrategySuite` failed again. Potentially related. I am going to merge https://github.com/apache/spark/pull/28566 together.
[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
maropu commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629955864

retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
AmplabJenkins removed a comment on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955676

Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
AmplabJenkins commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955676
[GitHub] [spark] cloud-fan closed pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
cloud-fan closed pull request #28463: URL: https://github.com/apache/spark/pull/28463
[GitHub] [spark] SparkQA commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
SparkQA commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955235

**[Test build #122781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122781/testReport)** for PR 28463 at commit [`978e60e`](https://github.com/apache/spark/commit/978e60e171e35b01ee166e00c4f63da3db877aad).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `// starting closure (in class T)`
  * `// we need to track calls from \"inner closure\" to outer classes relative to it (class T, A, B)`
  * `logDebug(s\"found inner class $ownerExternalName\")`
[GitHub] [spark] cloud-fan commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
cloud-fan commented on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629955260

We don't have many critical changes after the last successful build: https://github.com/apache/spark/pull/28463#issuecomment-624694820 The failed flaky tests are unrelated to this PR, and we need to unblock 3.0 ASAP. I'm merging it first and will monitor the Jenkins builds later. Thanks!
[GitHub] [spark] SparkQA removed a comment on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner
SparkQA removed a comment on pull request #28463: URL: https://github.com/apache/spark/pull/28463#issuecomment-629923433

**[Test build #122781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122781/testReport)** for PR 28463 at commit [`978e60e`](https://github.com/apache/spark/commit/978e60e171e35b01ee166e00c4f63da3db877aad).
[GitHub] [spark] AmplabJenkins commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629954597
[GitHub] [spark] SparkQA commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629954432

**[Test build #122786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)** for PR 28561 at commit [`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA removed a comment on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629949398

**[Test build #122786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)** for PR 28561 at commit [`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins removed a comment on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629954597
[GitHub] [spark] cloud-fan commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
cloud-fan commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426375094

## File path: docs/sql-ref-datetime-pattern.md
## @@ -76,6 +76,57 @@ The count of pattern letters determines the format.

- Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present.
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters - 'M' denotes the 'standard' form, and 'L' is for the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe, as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding.
+```sql
+spark-sql> select date_format(date '1970-01-01', "M");
+1
+spark-sql> select date_format(date '1970-12-01', "L");
+12
+```
+  - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9.
+```sql
+spark-sql> select date_format(date '1970-1-01', "LL");
+01
+spark-sql> select date_format(date '1970-09-01', "MM");
+09
+```
+  - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except in locales where there is no difference between standard and stand-alone forms, like in English.
+```sql
+spark-sql> select date_format(date '1970-01-01', "d MMM");
+1 Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU'));
+01 янв.
+```
+  - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLL");
+Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU'));
+янв.
+```
+  - `'MMMM'`: Full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps.
+```sql
+spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
+January 1970
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU'));
+1 января
+```
+  - `'LLLL'`: Full textual month representation in the stand-alone form. The pattern can be used to format/parse only months.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLLL");
+January
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU'));
+январь
+```
+  - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of standard or stand-alone forms. Typically it is a single letter.
Review comment: how about ``` Here are examples for all supported pattern letters (more than 5 letters is invalid): ```
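For context on the five-letter limit cloud-fan references: Spark 3.0's pattern handling is built on java.time, where a run of more than five 'M' or 'L' letters is rejected when the pattern is compiled. A minimal Scala sketch (illustrative only, assuming an English locale; not part of the PR):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object PatternLetterLimit {
  def main(args: Array[String]): Unit = {
    // Five letters is the valid "narrow" form: a single-letter month name.
    val narrow = DateTimeFormatter.ofPattern("MMMMM", Locale.US)
    println(narrow.format(LocalDate.of(1970, 1, 1))) // prints "J"

    // Six or more repeated letters fail at pattern-compile time.
    try DateTimeFormatter.ofPattern("MMMMMM", Locale.US)
    catch {
      case e: IllegalArgumentException =>
        println(e.getMessage) // prints "Too many pattern letters: M"
    }
  }
}
```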
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
AmplabJenkins removed a comment on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629951647
[GitHub] [spark] AmplabJenkins commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
AmplabJenkins commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629951647
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426374454 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ## @@ -1829,7 +1901,58 @@ private[spark] class BlockManager( data.dispose() } + /** + * Class to handle block manager decommissioning retries + * It creates a Thread to retry offloading all RDD cache blocks + */ + private class BlockManagerDecommissionManager(conf: SparkConf) { Review comment: So if you look at the parent issue you can see there is another sub-issue that says to migrate shuffle blocks. It's ok to ask for a follow-up even if there is one (we all miss things in reading), but an attempt to vote a -1 has a higher bar than just asking for something.
[GitHub] [spark] SparkQA commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
SparkQA commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629951351 **[Test build #122787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122787/testReport)** for PR 28523 at commit [`1955f01`](https://github.com/apache/spark/commit/1955f01fa870cd180f66a22070ee1b0ca9a73ca3).
[GitHub] [spark] dongjoon-hyun closed pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
dongjoon-hyun closed pull request #28561: URL: https://github.com/apache/spark/pull/28561
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426373154 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ## @@ -1829,7 +1901,58 @@ private[spark] class BlockManager( data.dispose() } + /** + * Class to handle block manager decommissioning retries + * It creates a Thread to retry offloading all RDD cache blocks + */ + private class BlockManagerDecommissionManager(conf: SparkConf) { +@volatile private var stopped = false +private val sleepInterval = conf.get( + config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL) + +private val blockReplicationThread = new Thread { + override def run(): Unit = { +var failures = 0 +while (blockManagerDecommissioning + && !stopped + && !Thread.interrupted() Review comment: If an interrupt exception is caught, the thread would still be marked as interrupted.
[GitHub] [spark] cloud-fan commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
cloud-fan commented on pull request #28523: URL: https://github.com/apache/spark/pull/28523#issuecomment-629949602 retest this please
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426372989 ## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionSuite.scala ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import java.util.concurrent.Semaphore + +import scala.collection.mutable.ArrayBuffer +import scala.concurrent.duration._ + +import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite, Success} +import org.apache.spark.internal.config +import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, SparkListenerTaskStart} +import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend +import org.apache.spark.util.{ResetSystemProperties, ThreadUtils} + +class BlockManagerDecommissionSuite extends SparkFunSuite with LocalSparkContext +with ResetSystemProperties { + + override def beforeEach(): Unit = { +val conf = new SparkConf().setAppName("test") + .set(config.Worker.WORKER_DECOMMISSION_ENABLED, true) + .set(config.STORAGE_DECOMMISSION_ENABLED, true) + +sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf) + } + + test(s"verify that an already running task which is going to cache data succeeds " + +s"on a decommissioned executor") { +// Create input RDD with 10 partitions +val input = sc.parallelize(1 to 10, 10) +val accum = sc.longAccumulator("mapperRunAccumulator") +// Do a count to wait for the executors to be registered. Review comment: That's ok for this test. But no harm in changing to the utility function
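The utility function holdenk alludes to is not named in the comment; assuming it is `TestUtils.waitUntilExecutorsUp`, the count() could be replaced by an explicit wait along these lines (a sketch, not the PR's code):

```scala
import org.apache.spark.{SparkContext, TestUtils}

// Wait for both local-cluster executors to register, instead of forcing
// registration as a side effect of an input.count() action.
def waitForExecutors(sc: SparkContext): Unit =
  TestUtils.waitUntilExecutorsUp(sc, 2, 60000) // 2 executors, 60s timeout
```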
[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
holdenk commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r426372857 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala ## @@ -299,6 +310,39 @@ class BlockManagerMasterEndpoint( blockManagerIdByExecutor.get(execId).foreach(removeBlockManager) } + /** + * Decommission the given Seq of blockmanagers + *- Adds these block managers to decommissioningBlockManagerSet Set + *- Sends the DecommissionBlockManager message to each of the [[BlockManagerSlaveEndpoint]] + */ + def decommissionBlockManagers(blockManagerIds: Seq[BlockManagerId]): Future[Seq[Unit]] = { +val newBlockManagersToDecommission = blockManagerIds.toSet.diff(decommissioningBlockManagerSet) +val futures = newBlockManagersToDecommission.map { blockManagerId => + decommissioningBlockManagerSet.add(blockManagerId) + val info = blockManagerInfo(blockManagerId) + info.slaveEndpoint.ask[Unit](DecommissionBlockManager) +} +Future.sequence{ futures.toSeq } + } + + /** + * Returns a Seq of ReplicateBlock for each RDD block stored by given blockManagerId + * @param blockManagerId - block manager id for which ReplicateBlock info is needed + * @return Seq of ReplicateBlock + */ + private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): Seq[ReplicateBlock] = { +val info = blockManagerInfo(blockManagerId) + +val rddBlocks = info.blocks.keySet().asScala.filter(_.isRDD) +rddBlocks.map { blockId => + val currentBlockLocations = blockLocations.get(blockId) + val maxReplicas = currentBlockLocations.size + 1 Review comment: Reasonable then to add a comment
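A sketch of the kind of comment being asked for; the reasoning is inferred from the surrounding diff, and the PR's final wording may differ:

```scala
// The locations returned by blockLocations.get(blockId) include the
// decommissioning block manager itself, so requesting size + 1 replicas
// asks the replication path to place exactly one new copy of the block
// on a peer that does not already hold it.
val maxReplicas = currentBlockLocations.size + 1
```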
[GitHub] [spark] SparkQA commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629949398 **[Test build #122786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)** for PR 28561 at commit [`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205).
[GitHub] [spark] HyukjinKwon commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
HyukjinKwon commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629949070 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins removed a comment on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629948590 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/27422/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629948571 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27422/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins removed a comment on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629948587 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
AmplabJenkins commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629948587
[GitHub] [spark] cloud-fan commented on pull request #28549: [SPARK-31727][SQL] Fix error message of casting timestamp to int in ANSI non-codegen mode
cloud-fan commented on pull request #28549: URL: https://github.com/apache/spark/pull/28549#issuecomment-629946729 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan closed pull request #28549: [SPARK-31727][SQL] Fix error message of casting timestamp to int in ANSI non-codegen mode
cloud-fan closed pull request #28549: URL: https://github.com/apache/spark/pull/28549
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426370104 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: + - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding. +```sql +spark-sql> select date_format(date '1970-01-01', "M"); +1 +spark-sql> select date_format(date '1970-12-01', "L"); +12 +``` + - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9. + ```sql + spark-sql> select date_format(date '1970-1-01', "LL"); + 01 + spark-sql> select date_format(date '1970-09-01', "MM"); + 09 + ``` + - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except in locales where there is no difference between the standard and stand-alone forms, like in English. +```sql +spark-sql> select date_format(date '1970-01-01', "d MMM"); +1 Jan +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU')); +01 янв. +``` + - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields. +```sql +spark-sql> select date_format(date '1970-01-01', "LLL"); +Jan +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU')); +янв. +``` + - `'MMMM'`: Full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps. +```sql +spark-sql> select date_format(date '1970-01-01', "MMMM yyyy"); +January 1970 +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU')); +1 января +``` + - `'LLLL'`: Full textual month representation in the stand-alone form. The pattern can be used to format/parse only months. +```sql +spark-sql> select date_format(date '1970-01-01', "LLLL"); +January +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU')); +январь +``` + - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of the standard or stand-alone forms. Typically it is a single letter.
Review comment: I wrote above that this is the list of all supported patterns, or do you think it is not enough?
[GitHub] [spark] HyukjinKwon commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
HyukjinKwon commented on pull request #28562: URL: https://github.com/apache/spark/pull/28562#issuecomment-629946373 Thanks all!
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426370309 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: Review comment: This is the official doc, isn't it, as we decided to not refer to the Java doc anymore.
[GitHub] [spark] cloud-fan commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
cloud-fan commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426368941 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: Review comment: It's better to give a quick example here, so that some users can stop reading the examples to save time.
[GitHub] [spark] cloud-fan commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
cloud-fan commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426368640 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: + - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding. +```sql +spark-sql> select date_format(date '1970-01-01', "M"); +1 +spark-sql> select date_format(date '1970-12-01', "L"); +12 +``` + - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9. + ```sql + spark-sql> select date_format(date '1970-1-01', "LL"); + 01 + spark-sql> select date_format(date '1970-09-01', "MM"); + 09 + ``` + - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except in locales where there is no difference between the standard and stand-alone forms, like in English. +```sql +spark-sql> select date_format(date '1970-01-01', "d MMM"); +1 Jan +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU')); +01 янв. +``` + - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields. +```sql +spark-sql> select date_format(date '1970-01-01', "LLL"); +Jan +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU')); +янв. +``` + - `'MMMM'`: Full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps. +```sql +spark-sql> select date_format(date '1970-01-01', "MMMM yyyy"); +January 1970 +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU')); +1 января +``` + - `'LLLL'`: Full textual month representation in the stand-alone form. The pattern can be used to format/parse only months. +```sql +spark-sql> select date_format(date '1970-01-01', "LLLL"); +January +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU')); +январь +``` + - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of the standard or stand-alone forms. Typically it is a single letter.
Review comment: do we document somewhere that 6 or more `L/M`s are invalid?
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629943931 **[Test build #122785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122785/testReport)** for PR 27066 at commit [`c99c086`](https://github.com/apache/spark/commit/c99c086a2796aef8727328d15f13f9ecb0dc2977).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629943290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122783/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629943283 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629940467 **[Test build #122783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122783/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629943283
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28559: [SPARK-31739][PYSPARK][DOCS][MINOR] Fix docstring syntax issues and misplaced space characters.
dongjoon-hyun commented on a change in pull request #28559: URL: https://github.com/apache/spark/pull/28559#discussion_r426367501 ## File path: python/pyspark/sql/dataframe.py ## @@ -2150,9 +2150,9 @@ def toDF(self, *cols): @since(3.0) def transform(self, func): -"""Returns a new class:`DataFrame`. Concise syntax for chaining custom transformations. +"""Returns a new :class:`DataFrame`. Concise syntax for chaining custom transformations. -:param func: a function that takes and returns a class:`DataFrame`. +:param func: a function that takes and returns a :class:`DataFrame`. Review comment: Could you fix `classification.py` and `regression.py`, too? ``` pyspark/ml/classification.py:To be mixed in with class:`pyspark.ml.JavaModel` pyspark/ml/regression.py:To be mixed in with class:`pyspark.ml.JavaModel` ```
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629943270 **[Test build #122783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122783/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] xuanyuanking commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
xuanyuanking commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426367390 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: Review comment: Do we also need to add an extra official description link for the 'standard' and 'stand-alone' modes?
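For readers following the standard vs. stand-alone discussion: the distinction the PR documents can be reproduced with plain java.time, which Spark 3.0's formatter wraps. A sketch using the Russian locale from the PR's examples (the exact output assumes a CLDR-based JDK, i.e. Java 9+):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object StandAloneMonthDemo {
  def main(args: Array[String]): Unit = {
    val ru = new Locale("ru")
    val jan1 = LocalDate.of(1970, 1, 1)
    // Standard form ('MMMM'): the month as part of a complete date.
    println(DateTimeFormatter.ofPattern("d MMMM", ru).format(jan1)) // 1 января
    // Stand-alone form ('LLLL'): the month by itself, e.g. in a date picker.
    println(DateTimeFormatter.ofPattern("LLLL", ru).format(jan1)) // январь
  }
}
```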
[GitHub] [spark] Ngone51 commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
Ngone51 commented on a change in pull request #28370: URL: https://github.com/apache/spark/pull/28370#discussion_r42630 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ## @@ -1829,7 +1901,58 @@ private[spark] class BlockManager( data.dispose() } + /** + * Class to handle block manager decommissioning retries + * It creates a Thread to retry offloading all RDD cache blocks + */ + private class BlockManagerDecommissionManager(conf: SparkConf) { +@volatile private var stopped = false +private val sleepInterval = conf.get( + config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL) + +private val blockReplicationThread = new Thread { + override def run(): Unit = { +var failures = 0 +while (blockManagerDecommissioning + && !stopped + && !Thread.interrupted() Review comment: > Unless the interrupt exception is caught inside of the block transfer If there's an `InterruptedException` caught by the block transfer, then `Thread.interrupted()` inside `blockReplicationThread` would return false. And what do you expect for this case? If you want the decommission thread to stop, then `Thread.interrupted()` won't work; or if you want the decommission thread to keep working, then `Thread.interrupted()` is useless because the status has already been cleared (unless the block transfer sets it to interrupted again).
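The flag-clearing behavior Ngone51 describes is plain JVM semantics and can be checked independently of the PR; a self-contained sketch:

```scala
object InterruptFlagDemo {
  def main(args: Array[String]): Unit = {
    val worker = new Thread(() => {
      try Thread.sleep(10000)
      catch {
        case _: InterruptedException =>
          // Catching InterruptedException clears the interrupt status, so a
          // later Thread.interrupted() check in the same thread sees false...
          println(s"after catch: ${Thread.interrupted()}") // false
          // ...unless the catcher deliberately restores the flag:
          Thread.currentThread().interrupt()
          println(s"after restore: ${Thread.currentThread().isInterrupted}") // true
      }
    })
    worker.start()
    worker.interrupt()
    worker.join()
  }
}
```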
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins removed a comment on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629940841
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
AmplabJenkins removed a comment on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629940854
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426365527 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: + - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding. +```sql +spark-sql> select date_format(date '1970-01-01', "M"); +1 +spark-sql> select date_format(date '1970-12-01', "L"); +12 +``` + - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is added for months 1-9. + ```sql + spark-sql> select date_format(date '1970-1-01', "LL"); + 01 + spark-sql> select date_format(date '1970-09-01', "MM"); + 09 + ``` + - `'MMM'`: Short textual representation in the standard form. The month pattern should be a part of a date pattern, not just a stand-alone month, except in locales where there is no difference between the standard and stand-alone forms, like in English. +```sql +spark-sql> select date_format(date '1970-01-01', "d MMM"); +1 Jan +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'dd MMM', 'locale', 'RU')); +01 янв. +``` + - `'LLL'`: Short textual representation in the stand-alone form. It should be used to format/parse only months without any other date fields. +```sql +spark-sql> select date_format(date '1970-01-01', "LLL"); +Jan +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLL', 'locale', 'RU')); +янв. +``` + - `'MMMM'`: Full textual month representation in the standard form. It is used for parsing/formatting months as a part of dates/timestamps. +```sql +spark-sql> select date_format(date '1970-01-01', "MMMM yyyy"); +January 1970 +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'd MMMM', 'locale', 'RU')); +1 января +``` + - `'LLLL'`: Full textual month representation in the stand-alone form. The pattern can be used to format/parse only months. +```sql +spark-sql> select date_format(date '1970-01-01', "LLLL"); +January +spark-sql> select to_csv(named_struct('date', date '1970-01-01'), map('dateFormat', 'LLLL', 'locale', 'RU')); +январь +``` + - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of the standard or stand-alone forms. Typically it is a single letter.
Review comment: I demonstrate only the supported patterns.
[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters
MaxGekk commented on a change in pull request #28558: URL: https://github.com/apache/spark/pull/28558#discussion_r426365426 ## File path: docs/sql-ref-datetime-pattern.md ## @@ -76,6 +76,57 @@ The count of pattern letters determines the format. - Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years. Otherwise, the sign is output if the pad width is exceeded when 'G' is not present. +- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letter used: 'M' denotes the 'standard' form, and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is trickier to describe as there is no difference in English. However, in other languages there is a difference in the word used when the text is used alone, as opposed to in a complete date. For example, the word used for a month when used alone in a date picker is different to the word used for a month in association with a day and year in a date. Here are examples for all supported pattern letters: Review comment: The PR contains examples in Russian already.
[GitHub] [spark] SparkQA commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link
SparkQA commented on pull request #28561: URL: https://github.com/apache/spark/pull/28561#issuecomment-629940891 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27422/
[GitHub] [spark] AmplabJenkins commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
AmplabJenkins commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629940854 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
AmplabJenkins commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629940841 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
dongjoon-hyun commented on pull request #28563: URL: https://github.com/apache/spark/pull/28563#issuecomment-629940568 Thank you all. This is pretty minor, so I'll merge it to branch-3.0 to complete the 3.0 feature. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource
dongjoon-hyun closed pull request #28563: URL: https://github.com/apache/spark/pull/28563 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column
SparkQA commented on pull request #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-629940521 **[Test build #122784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122784/testReport)** for PR 27066 at commit [`35351b3`](https://github.com/apache/spark/commit/35351b3aa611acd1a3068cb5e14ea05b57a38a6d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite
SparkQA commented on pull request #28566: URL: https://github.com/apache/spark/pull/28566#issuecomment-629940475 **[Test build #122782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122782/testReport)** for PR 28566 at commit [`c7bc490`](https://github.com/apache/spark/commit/c7bc490e94dd9d677f107508280cae943e20818d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment
SparkQA commented on pull request #28565: URL: https://github.com/apache/spark/pull/28565#issuecomment-629940467 **[Test build #122783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122783/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness
dongjoon-hyun closed pull request #28562: URL: https://github.com/apache/spark/pull/28562 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org