[GitHub] [spark] yaooqinn commented on a change in pull request #28442: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
yaooqinn commented on a change in pull request #28442: URL: https://github.com/apache/spark/pull/28442#discussion_r419878281 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala ## @@ -131,11 +130,7 @@ class KafkaTestUtils( } private def setUpMiniKdc(): Unit = { -val kdcDir = Utils.createTempDir() Review comment: Do I still need to address this comment? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
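PR #28442 fixes flaky tests by retrying MiniKdc setup when it fails with an "address in use" `BindException`. Below is a minimal, stdlib-only Scala sketch of that retry pattern. `BindRetry` and `retryOnBindException` are hypothetical names, not the PR's code. The real fix may also pick a new port or recreate the KDC directory between attempts.

```scala
import java.net.BindException
import scala.annotation.tailrec

object BindRetry {
  /**
   * Hypothetical helper: runs `start(attempt)` and retries up to `maxAttempts` times
   * when it throws java.net.BindException (e.g. "Address already in use").
   * Other exceptions, or a BindException on the last attempt, propagate unchanged.
   */
  def retryOnBindException[T](maxAttempts: Int)(start: Int => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def loop(attempt: Int): T = {
      // Capture the outcome first so the recursive call stays in tail position.
      val result: Either[BindException, T] =
        try Right(start(attempt))
        catch { case e: BindException if attempt < maxAttempts => Left(e) }
      result match {
        case Right(value) => value
        case Left(_)      => loop(attempt + 1) // port clash: try again
      }
    }
    loop(1)
  }
}
```

The guard on the `catch` clause means the final failed attempt rethrows the original exception instead of hiding it.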
[GitHub] [spark] AmplabJenkins commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-623869047
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
AmplabJenkins removed a comment on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-623869047
[GitHub] [spark] cloud-fan commented on a change in pull request #28366: [SPARK-31365][SQL] Enable nested predicate pushdown per data sources
cloud-fan commented on a change in pull request #28366: URL: https://github.com/apache/spark/pull/28366#discussion_r419877636 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2063,16 +2063,17 @@ object SQLConf { .booleanConf .createWithDefault(true) - val NESTED_PREDICATE_PUSHDOWN_ENABLED = -buildConf("spark.sql.optimizer.nestedPredicatePushdown.enabled") - .internal() - .doc("When true, Spark tries to push down predicates for nested columns and or names " + -"containing `dots` to data sources. Currently, Parquet implements both optimizations " + -"while ORC only supports predicates for names containing `dots`. The other data sources" + -"don't support this feature yet.") + val NESTED_PREDICATE_PUSHDOWN_V1_SOURCE_LIST = +buildConf("spark.sql.optimizer.nestedPredicatePushdown.supportedV1Sources") Review comment: `supportedV1Sources` -> `supportedFileSources`? DS v1 and file source are different APIs and have different planner rules/physical nodes.
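The renamed config holds a list of source names for which nested predicate pushdown is enabled. Below is a hedged sketch of how a comma-separated value like that could be parsed and checked case-insensitively. `NestedPushdownConf`, `parseSourceList`, and `supportsNestedPushdown` are hypothetical. The `"parquet,orc"` value used in the checks is only an example input, not a claim about the config's default.

```scala
import java.util.Locale

object NestedPushdownConf {
  /** Splits a value such as " Parquet , orc " into normalized, non-empty names. */
  def parseSourceList(value: String): Set[String] =
    value.split(",").iterator
      .map(_.trim.toLowerCase(Locale.ROOT))
      .filter(_.nonEmpty)
      .toSet

  /** True if `sourceName` (e.g. a file format's short name) appears in the configured list. */
  def supportsNestedPushdown(configValue: String, sourceName: String): Boolean =
    parseSourceList(configValue).contains(sourceName.toLowerCase(Locale.ROOT))
}
```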
[GitHub] [spark] SparkQA commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
SparkQA commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-623868721 **[Test build #122302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122302/testReport)** for PR 28370 at commit [`c645582`](https://github.com/apache/spark/commit/c645582e2df06fe4736dc3b1673b377d4baf96f0).
[GitHub] [spark] cloud-fan commented on a change in pull request #28366: [SPARK-31365][SQL] Enable nested predicate pushdown per data sources
cloud-fan commented on a change in pull request #28366: URL: https://github.com/apache/spark/pull/28366#discussion_r419877190 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ## @@ -179,15 +179,22 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat case OverwriteByExpression(r: DataSourceV2Relation, deleteExpr, query, writeOptions, _) => // fail if any filter cannot be converted. correctness depends on removing all matching data. - val filters = splitConjunctivePredicates(deleteExpr).map { -filter => DataSourceStrategy.translateFilter(deleteExpr).getOrElse( - throw new AnalysisException(s"Cannot translate expression to source filter: $filter")) - }.toArray + val filters = splitConjunctivePredicates(deleteExpr) + def transferFilters = +(filters: Seq[Expression], supportNestedPredicatePushdown: Boolean) => { Review comment: Do we need the `supportNestedPredicatePushdown` parameter here, given that the caller always passes true?
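The comment in this hunk states the invariant: the overwrite must fail if any conjunct cannot be converted, because correctness depends on removing all matching data. Below is a self-contained sketch of that all-or-nothing translation over a toy expression tree. `Expr`, `SourceEqualTo`, and `translateAll` are hypothetical stand-ins for Catalyst expressions and source filters. `IllegalArgumentException` replaces Spark's `AnalysisException` so the sketch has no Spark dependency. The sketch translates each split `filter`, whereas the removed line in the hunk passes `deleteExpr`.

```scala
object FilterTranslation {
  // Toy expression tree standing in for Catalyst expressions (hypothetical).
  sealed trait Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class EqualTo(column: String, value: Any) extends Expr
  final case class Opaque(description: String) extends Expr // has no source-filter form

  // Toy source-level filter (hypothetical).
  final case class SourceEqualTo(column: String, value: Any)

  /** Flattens nested ANDs into a sequence of conjuncts. */
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  /** Translates one conjunct, or returns None if it is unsupported. */
  def translate(e: Expr): Option[SourceEqualTo] = e match {
    case EqualTo(c, v) => Some(SourceEqualTo(c, v))
    case _             => None
  }

  /** All-or-nothing: throws if any single conjunct cannot be translated. */
  def translateAll(deleteExpr: Expr): Array[SourceEqualTo] =
    splitConjuncts(deleteExpr).map { filter =>
      translate(filter).getOrElse(
        throw new IllegalArgumentException(s"Cannot translate expression to source filter: $filter"))
    }.toArray
}
```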
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins removed a comment on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623867741
[GitHub] [spark] SparkQA removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
SparkQA removed a comment on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623865188 **[Test build #122301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122301/testReport)** for PR 28451 at commit [`289e5ae`](https://github.com/apache/spark/commit/289e5aea19b2b027efa37fbfe7bd723824b02b92).
[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
SparkQA commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623867673 **[Test build #122301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122301/testReport)** for PR 28451 at commit [`289e5ae`](https://github.com/apache/spark/commit/289e5aea19b2b027efa37fbfe7bd723824b02b92). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623867741
[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623867478 I'm also OK with removing the default appId and appName. Users can add what they need.
[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
dilipbiswal commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419874633 ## File path: docs/sql-ref-identifier.md ## @@ -27,41 +27,34 @@ An identifier is a string used to identify a database object such as a table, vi Regular Identifier -{% highlight sql %} +```sql { letter | digit | '_' } [ , ... ] -{% endhighlight %} +``` Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to [ANSI Compliance](sql-ref-ansi-compliance.html). Review comment: @huaxingao Should we bold "Note"? I see that we bold it in other places.
[GitHub] [spark] cloud-fan commented on pull request #28431: [SPARK-31623][SQL][TESTS] Benchmark rebasing of INT96 and TIMESTAMP_MILLIS timestamps in read/write
cloud-fan commented on pull request #28431: URL: https://github.com/apache/spark/pull/28431#issuecomment-623865659 thanks, merging to master/3.0!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins removed a comment on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623865497
[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623865497
[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
SparkQA commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623865188 **[Test build #122301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122301/testReport)** for PR 28451 at commit [`289e5ae`](https://github.com/apache/spark/commit/289e5aea19b2b027efa37fbfe7bd723824b02b92).
[GitHub] [spark] cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add back the legacy date/timestamp format support in CSV/JSON parser
cloud-fan commented on a change in pull request #27710: URL: https://github.com/apache/spark/pull/27710#discussion_r419872705 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ## @@ -239,7 +246,23 @@ class JacksonParser( case DateType => (parser: JsonParser) => parseJsonToken[java.lang.Integer](parser, dataType) { case VALUE_STRING if parser.getTextLength >= 1 => - dateFormatter.parse(parser.getText) + try { +dateFormatter.parse(parser.getText) + } catch { +case NonFatal(e) => + // If fails to parse, then tries the way used in 2.0 and 1.x for backwards + // compatibility. + val str = UTF8String.fromString(DateTimeUtils.cleanLegacyTimestampStr(parser.getText)) + DateTimeUtils.stringToDate(str, options.zoneId).getOrElse { +// In Spark 1.5.0, we store the data as number of days since epoch in string. +// So, we just convert it to Int. +try { + parser.getText.toInt Review comment: good catch! I think we should.
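The hunk adds a fallback chain for JSON date strings. It tries the current formatter first, then the 2.0/1.x-style parser, and finally reads the string as a number of days since the epoch (the Spark 1.5.0 format named in the code comments). Below is a stdlib-only sketch of that chain using `java.time`. `LegacyDateParsing` and `parseDateLegacyAware` are hypothetical. The strict ISO formatter stands in for `dateFormatter`. The lenient `yyyy-M-d` step is an assumed stand-in for `DateTimeUtils.stringToDate`, which accepts more shapes than this.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object LegacyDateParsing {
  // Assumed stand-in for the new, strict formatter.
  private val strict = DateTimeFormatter.ISO_LOCAL_DATE
  // Assumed stand-in for the looser legacy parser (single-digit month/day allowed).
  private val lenient = DateTimeFormatter.ofPattern("yyyy-M-d")

  /** Returns days since 1970-01-01, or throws if every strategy fails. */
  def parseDateLegacyAware(text: String): Int = {
    val s = text.trim
    Try(LocalDate.parse(s, strict))
      .orElse(Try(LocalDate.parse(s, lenient)))
      .map(_.toEpochDay.toInt)
      // Last resort: Spark 1.5.0 stored dates as a day count in a string.
      .orElse(Try(s.toInt))
      .getOrElse(throw new IllegalArgumentException(s"Cannot parse '$text' as a date"))
  }
}
```

Wrapping each step in `Try` keeps the chain flat. The last step still fails loudly when nothing matches, which matches the reviewer's concern about what happens when the final `toInt` fails.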
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419872224 ## File path: docs/sql-ref-literals.md ## @@ -71,128 +68,114 @@ SELECT 'it\'s $10.' AS col; +-+ |It's $10.| +-+ -{% endhighlight %} +``` ### Binary Literal A binary literal is used to specify a byte sequence value. Syntax -{% highlight sql %} +```sql X { 'c [ ... ]' | "c [ ... ]" } -{% endhighlight %} +``` + + Parameters - Parameters +* **c** - - c - One character from the character set. Review comment: seems to be hexadecimal. Changed to the following: ``` Syntax X { 'num [ ... ]' | "num [ ... ]" } Parameters * **num** Any hexadecimal number from 0 to F. ``` cc @yaooqinn
[GitHub] [spark] cloud-fan commented on a change in pull request #28310: [SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode
cloud-fan commented on a change in pull request #28310: URL: https://github.com/apache/spark/pull/28310#discussion_r419872162 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -618,6 +618,22 @@ object DateTimeUtils { instantToMicros(resultTimestamp.toInstant) } + /** + * Add the date and the interval's months and days. + * Returns a date value, expressed in days since 1.1.1970. + * + * @throws DateTimeException if the result exceeds the supported date range + * @throws IllegalArgumentException if the interval has `microseconds` part + */ + def dateAddInterval( + start: SQLDate, + interval: CalendarInterval): SQLDate = { +require(interval.microseconds == 0, + "Cannot add hours, minutes or seconds, milliseconds, microseconds to a date") +val ld = LocalDate.ofEpochDay(start).plusMonths(interval.months).plusDays(interval.days) Review comment: FYI, in Snowflake `interval '1 month 1 day'` is different from `interval '1 day 1 month'`. We should at least document our own behavior.
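The ordering point can be checked directly with `java.time`. Adding months before days, as `dateAddInterval` does, can give a different date than adding days first, because `plusMonths` clamps to the last valid day of the month. Below is a sketch in which `Interval` is a hypothetical stand-in for Spark's `CalendarInterval`.

```scala
import java.time.LocalDate

object DateIntervalOrder {
  // Hypothetical stand-in for CalendarInterval.
  final case class Interval(months: Int, days: Int, microseconds: Long)

  /** Months first, then days (the order used in the hunk); returns epoch days. */
  def addMonthsThenDays(startEpochDay: Int, interval: Interval): Int = {
    require(interval.microseconds == 0,
      "Cannot add hours, minutes or seconds, milliseconds, microseconds to a date")
    LocalDate.ofEpochDay(startEpochDay)
      .plusMonths(interval.months) // clamps, e.g. Jan 30 + 1 month -> Feb 29 in 2020
      .plusDays(interval.days)
      .toEpochDay.toInt
  }

  /** The opposite order, for comparison only. */
  def addDaysThenMonths(startEpochDay: Int, interval: Interval): Int =
    LocalDate.ofEpochDay(startEpochDay)
      .plusDays(interval.days)
      .plusMonths(interval.months)
      .toEpochDay.toInt
}
```

Take a start date of 2020-01-30 and an interval of 1 month 1 day. Months-first goes through the clamped 2020-02-29 and yields 2020-03-01. Days-first goes through 2020-01-31 and yields 2020-02-29.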
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419872224 ## File path: docs/sql-ref-literals.md ## @@ -71,128 +68,114 @@ SELECT 'it\'s $10.' AS col; +-+ |It's $10.| +-+ -{% endhighlight %} +``` ### Binary Literal A binary literal is used to specify a byte sequence value. Syntax -{% highlight sql %} +```sql X { 'c [ ... ]' | "c [ ... ]" } -{% endhighlight %} +``` + + Parameters - Parameters +* **c** - - c - One character from the character set. Review comment: seems to be hexadecimal. Changed to the following: ``` Syntax ```sql X { 'num [ ... ]' | "num [ ... ]" } ``` Parameters * **num** Any hexadecimal number from 0 to F. ``` cc @yaooqinn
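The proposed wording treats each character of a binary literal as a hexadecimal digit. Below is a sketch that decodes such a literal body into bytes, two hex digits per byte. `HexLiteral` and `decodeHexLiteral` are hypothetical. Left-padding an odd-length body with `0` is an assumption made for illustration, not a statement of Spark's behavior.

```scala
object HexLiteral {
  /** Decodes the body of X'...' (e.g. "1F2e") into bytes, two hex digits per byte. */
  def decodeHexLiteral(body: String): Array[Byte] = {
    require(body.forall(c => Character.digit(c, 16) >= 0), s"Not a hex string: $body")
    // Assumption: an odd number of digits is padded with a leading zero.
    val padded = if (body.length % 2 != 0) "0" + body else body
    padded.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
  }
}
```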
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419871991 ## File path: docs/sql-ref-identifier.md ## @@ -27,54 +27,47 @@ An identifier is a string used to identify a database object such as a table, vi Regular Identifier -{% highlight sql %} +```sql { letter | digit | '_' } [ , ... ] -{% endhighlight %} +``` Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to [ANSI Compliance](sql-ref-ansi-compliance.html). Delimited Identifier -{% highlight sql %} +```sql `c [ ... ]` -{% endhighlight %} +``` ### Parameters - - letter - +* **letter** + Any letter from A-Z or a-z. - - - - digit - + +* **digit** + Any numeral from 0 to 9. - - - - c - + +* **c** + Any character from the character set. Use ` to escape special characters (e.g., `). - - ### Examples -{% highlight sql %} +```sql -- This CREATE TABLE fails with ParseException because of the illegal identifier name a.b CREATE TABLE test (a.b int); -org.apache.spark.sql.catalyst.parser.ParseException: -no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20) + org.apache.spark.sql.catalyst.parser.ParseException: Review comment: Fixed.
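The grammar in this hunk has two kinds of identifier. A regular identifier uses only letters, digits, and `_`. A delimited identifier is wrapped in backticks, and a backtick inside the name is escaped by doubling it. Below is a sketch of a helper that quotes a name only when it is not a regular identifier. `Identifiers` and `quoteIfNeeded` are hypothetical names, and the helper ignores the ANSI reserved-keyword rule from the Note.

```scala
object Identifiers {
  // Characters allowed in a regular identifier: { letter | digit | '_' }.
  private def isRegularChar(c: Char): Boolean =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_'

  /** Returns `name` as-is if it is a regular identifier, else a backtick-delimited form. */
  def quoteIfNeeded(name: String): String =
    if (name.nonEmpty && name.forall(isRegularChar)) name
    else "`" + name.replace("`", "``") + "`" // escape embedded backticks by doubling
}
```

With this helper, the failing `a.b` example from the hunk becomes `` `a.b` ``, which the delimited-identifier rule should accept.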
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419871927 ## File path: docs/sql-ref-literals.md ## @@ -35,22 +35,19 @@ A string literal is used to specify a character string value. Syntax -{% highlight sql %} +```sql 'c [ ... ]' | "c [ ... ]" Review comment: changed to ```char```
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419871848 ## File path: docs/sql-ref-ansi-compliance.md ## @@ -66,7 +66,7 @@ This means that in case an operation causes overflows, the result is the same wi On the other hand, Spark SQL returns null for decimal overflows. When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime. -{% highlight sql %} +```sql -- `spark.sql.ansi.enabled=true` Review comment: I don't have a strong opinion on this. It seems to me a comment is OK too.
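The doc text under discussion contrasts two overflow behaviors. With `spark.sql.ansi.enabled` set to `true`, an overflow throws an arithmetic exception; otherwise the result wraps the way a Java/Scala program would. The JVM exposes both semantics for `Int`, so the difference is easy to see. This sketch only mirrors the idea and does not call Spark. `OverflowModes` and its `ansiEnabled` flag are hypothetical.

```scala
object OverflowModes {
  /** Throws on overflow when ansiEnabled is true; otherwise wraps like plain JVM arithmetic. */
  def addInt(a: Int, b: Int, ansiEnabled: Boolean): Int =
    if (ansiEnabled) Math.addExact(a, b) // throws ArithmeticException on overflow
    else a + b                           // silently wraps (two's complement)
}
```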
[GitHub] [spark] cloud-fan commented on pull request #28445: [SPARK-31212][SQL][2.4] Fix Failure of casting the '1000-02-29' string to the date type
cloud-fan commented on pull request #28445: URL: https://github.com/apache/spark/pull/28445#issuecomment-623863062 @MaxGekk what's your opinion? I'm fine with this fix, but I wouldn't encourage people to spend much time fixing datetime-related bugs in 2.4. The datetime part is completely rewritten in 3.0.
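Some background on why '1000-02-29' is tricky. The year 1000 is a leap year under Julian rules (divisible by 4) but not under proleptic Gregorian rules (divisible by 100 but not by 400). The JDK exposes both views. `java.time` is proleptic Gregorian, while `java.util.GregorianCalendar` applies Julian rules before its default 1582 cutover. This is only an illustration; which of these the 2.4 cast path relies on is not shown in the thread.

```scala
import java.time.LocalDate
import java.time.chrono.IsoChronology
import java.util.GregorianCalendar
import scala.util.Try

object Year1000 {
  // Proleptic Gregorian (java.time): 1000 is not a leap year.
  def isoLeap(year: Int): Boolean = IsoChronology.INSTANCE.isLeapYear(year)

  // Hybrid Julian/Gregorian (java.util): Julian rules before the 1582 cutover.
  def hybridLeap(year: Int): Boolean = new GregorianCalendar().isLeapYear(year)

  // Whether java.time accepts February 29 in the given year.
  def isoFeb29Exists(year: Int): Boolean = Try(LocalDate.of(year, 2, 29)).isSuccess
}
```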
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r419869698 ## File path: docs/configuration.md ## @@ -2670,6 +2670,9 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can config `log4j.properties` file in the `conf` directory. One way to start is to copy the existing `log4j.properties.template` located there. +By default, Spark adds to the MDC 3 records: `appId`, `appName` and `taskName` you can add that to your patternLayout `%X{appId}` in order to print in the logs Review comment: Maybe in both places?
[GitHub] [spark] cloud-fan commented on pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters
cloud-fan commented on pull request #28383: URL: https://github.com/apache/spark/pull/28383#issuecomment-623861615 Shall we remove `OptimizeMetadataOnlyQuery`? IIRC it has a correctness issue, and we disable it by default. cc @gengliangwang
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
AmplabJenkins removed a comment on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623857362
[GitHub] [spark] SparkQA removed a comment on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
SparkQA removed a comment on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623778576 **[Test build #122295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122295/testReport)** for PR 28239 at commit [`453c5a5`](https://github.com/apache/spark/commit/453c5a5e0717d2681fc2e0ed4f48ca093d4020a0).
[GitHub] [spark] AmplabJenkins commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
AmplabJenkins commented on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623857362
[GitHub] [spark] dongjoon-hyun commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation
dongjoon-hyun commented on pull request #28452: URL: https://github.com/apache/spark/pull/28452#issuecomment-623857249 Thank you all!
[GitHub] [spark] SparkQA commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
SparkQA commented on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623856851 **[Test build #122295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122295/testReport)** for PR 28239 at commit [`453c5a5`](https://github.com/apache/spark/commit/453c5a5e0717d2681fc2e0ed4f48ca093d4020a0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
cloud-fan commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623855672 LGTM except for the app id/name. I'm still not convinced that it's working; at least, @Ngone51 reported he can't see app id/name in local testing. Can you clearly point out the code that sets app id/name? You mentioned it's in the DAG scheduler; can you point out which line? It's even better if you can add a test. BTW I think it's OK to ask users to set app id/name themselves by `mdc.appId/Name`. I'm good with this patch if we just remove the handling of app id/name.
[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
cloud-fan commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r419860787 ## File path: docs/configuration.md ## @@ -2670,6 +2670,9 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can config `log4j.properties` file in the `conf` directory. One way to start is to copy the existing `log4j.properties.template` located there. +By default, Spark adds to the MDC 3 records: `appId`, `appName` and `taskName` you can add that to your patternLayout `%X{appId}` in order to print in the logs Review comment: I think it's better to put the doc in `conf/log4j.properties.template`, where users use this feature.
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451: URL: https://github.com/apache/spark/pull/28451#discussion_r419857329 ## File path: docs/sql-ref-functions-udf-aggregate.md ## @@ -113,26 +102,26 @@ OPTIONS ( ); SELECT * FROM employees; --- +---+--+ --- | name|salary| --- +---+--+ --- |Michael| 3000| --- | Andy| 4500| --- | Justin| 3500| --- | Berta| 4000| --- +---+--+ ++---+--+ +| name|salary| ++---+--+ +|Michael| 3000| +| Andy| 4500| +| Justin| 3500| +| Berta| 4000| ++---+--+ SELECT myAverage(salary) as average_salary FROM employees; --- +--+ --- |average_salary| --- +--+ --- |3750.0| --- +--+ -{% endhighlight %} ++--+ +|average_salary| ++--+ +|3750.0| ++--+ +``` Review comment: This is for examples ``. I prefer to keep this since we use this format for all the examples.
[GitHub] [spark] gatorsmile commented on a change in pull request #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
gatorsmile commented on a change in pull request #28224: URL: https://github.com/apache/spark/pull/28224#discussion_r419857270 ## File path: docs/sql-ref-functions-builtin.md ## @@ -0,0 +1,77 @@ +--- +layout: global +title: Built-in Functions +displayTitle: Built-in Functions +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+--- + +{% for static_file in site.static_files %} +{% if static_file.name == 'generated-agg-funcs-table.html' %} +### Aggregate Functions +{% include_relative generated-agg-funcs-table.html %} + Examples +{% include_relative generated-agg-funcs-examples.html %} +{% break %} +{% endif %} +{% endfor %} + +{% for static_file in site.static_files %} +{% if static_file.name == 'generated-window-funcs-table.html' %} +### Window Functions +{% include_relative generated-window-funcs-table.html %} +{% break %} +{% endif %} +{% endfor %} + +{% for static_file in site.static_files %} +{% if static_file.name == 'generated-array-funcs-table.html' %} +### Array Functions +{% include_relative generated-array-funcs-table.html %} + Examples +{% include_relative generated-array-funcs-examples.html %} +{% break %} +{% endif %} +{% endfor %} + +{% for static_file in site.static_files %} +{% if static_file.name == 'generated-map-funcs-table.html' %} +### Map Functions +{% include_relative generated-map-funcs-table.html %} + Examples +{% include_relative generated-map-funcs-examples.html %} +{% break %} +{% endif %} +{% endfor %} + +{% for static_file in site.static_files %} +{% if static_file.name == 'generated-datetime-funcs-table.html' %} +### Date and Timestamp Functions +{% include_relative generated-datetime-funcs-table.html %} + Examples +{% include_relative generated-datetime-funcs-examples.html %} +{% break %} +{% endif %} +{% endfor %} + +{% for static_file in site.static_files %} +{% if static_file.name == 'generated-json-funcs-table.html' %} +### JSON Functions +{% include_relative generated-json-funcs-table.html %} + Examples +{% include_relative generated-agg-funcs-examples.html %} Review comment: generated-agg-funcs-examples.html -> generated-json-funcs-examples.html ?
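One way to avoid the copy-paste slip flagged here (the JSON section pulling in the aggregate examples) is to derive both include names from a single category key. The sketch below is a hypothetical Java helper, `BuiltinFunctionDocs.section`, and is not part of the PR. The markup of the "Examples" heading is not recoverable from the quoted diff, so `#### Examples` is an assumption.

```java
// Hypothetical generator for one category's Liquid block; not part of the PR.
final class BuiltinFunctionDocs {
    static String section(String key, String title, boolean hasExamples) {
        // Both file names come from the same key, so they cannot disagree.
        String table = "generated-" + key + "-funcs-table.html";
        String examples = "generated-" + key + "-funcs-examples.html";
        StringBuilder sb = new StringBuilder();
        sb.append("{% for static_file in site.static_files %}\n");
        sb.append("{% if static_file.name == '").append(table).append("' %}\n");
        sb.append("### ").append(title).append('\n');
        sb.append("{% include_relative ").append(table).append(" %}\n");
        if (hasExamples) {
            sb.append("#### Examples\n"); // heading level assumed
            sb.append("{% include_relative ").append(examples).append(" %}\n");
        }
        sb.append("{% break %}\n{% endif %}\n{% endfor %}\n");
        return sb.toString();
    }
}
```

Categories without examples, such as window functions, pass `false` and get only the table include.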
[GitHub] [spark] SparkQA removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
SparkQA removed a comment on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623844664 **[Test build #122299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122299/testReport)** for PR 28009 at commit [`b762753`](https://github.com/apache/spark/commit/b762753d9642d7c5b1faa8d5dcaa6402c95730c1).
[GitHub] [spark] SparkQA commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression
SparkQA commented on pull request #28164: URL: https://github.com/apache/spark/pull/28164#issuecomment-623848821 **[Test build #122300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122300/testReport)** for PR 28164 at commit [`cc4ee4c`](https://github.com/apache/spark/commit/cc4ee4c7b09ee9c09f40ac1d4f714db6a83838d4).
[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623848806
[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
SparkQA commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623848721 **[Test build #122299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122299/testReport)** for PR 28009 at commit [`b762753`](https://github.com/apache/spark/commit/b762753d9642d7c5b1faa8d5dcaa6402c95730c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins removed a comment on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623848806
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression
AmplabJenkins removed a comment on pull request #28164: URL: https://github.com/apache/spark/pull/28164#issuecomment-623847685
[GitHub] [spark] AmplabJenkins commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression
AmplabJenkins commented on pull request #28164: URL: https://github.com/apache/spark/pull/28164#issuecomment-623847685
[GitHub] [spark] SparkQA removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
SparkQA removed a comment on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623821922 **[Test build #122298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122298/testReport)** for PR 28009 at commit [`4599e18`](https://github.com/apache/spark/commit/4599e18141efce8cc241fd1f9f4d5b84e3a297e7).
[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623846376
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins removed a comment on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623846376
[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
SparkQA commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623846317 **[Test build #122298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122298/testReport)** for PR 28009 at commit [`4599e18`](https://github.com/apache/spark/commit/4599e18141efce8cc241fd1f9f4d5b84e3a297e7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623845229
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins removed a comment on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623845229
[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
SparkQA commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623844664 **[Test build #122299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122299/testReport)** for PR 28009 at commit [`b762753`](https://github.com/apache/spark/commit/b762753d9642d7c5b1faa8d5dcaa6402c95730c1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins removed a comment on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623824088
[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
AmplabJenkins commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623824088
[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications
SparkQA commented on pull request #28009: URL: https://github.com/apache/spark/pull/28009#issuecomment-623821922 **[Test build #122298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122298/testReport)** for PR 28009 at commit [`4599e18`](https://github.com/apache/spark/commit/4599e18141efce8cc241fd1f9f4d5b84e3a297e7).
[GitHub] [spark] beliefer commented on pull request #28430: [SPARK-31372][SQL][TEST][FOLLOW-UP] Improve ExpressionsSchemaSuite so that easy to track the diff.
beliefer commented on pull request #28430: URL: https://github.com/apache/spark/pull/28430#issuecomment-623813026 @HyukjinKwon Thanks for your help!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623811893
[GitHub] [spark] AmplabJenkins commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623811893
[GitHub] [spark] SparkQA commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
SparkQA commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623811305 **[Test build #122294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122294/testReport)** for PR 26624 at commit [`50a68c7`](https://github.com/apache/spark/commit/50a68c7eded44ce6eb5afaff6f8170c4add70a25). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
SparkQA removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623774430 **[Test build #122294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122294/testReport)** for PR 26624 at commit [`50a68c7`](https://github.com/apache/spark/commit/50a68c7eded44ce6eb5afaff6f8170c4add70a25).
[GitHub] [spark] maropu commented on a change in pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters
maropu commented on a change in pull request #28383: URL: https://github.com/apache/spark/pull/28383#discussion_r419817895 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ## @@ -117,7 +117,7 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic case a: AttributeReference => a.withName(relation.output.find(_.semanticEquals(a)).get.name) } -} Review comment: Could you filter out this unsupported case outside `replaceTableScanWithPartitionMetadata` (I think this filtering is not related to normalization)? For example, in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala#L53-L55
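The suggestion separates two concerns: deciding whether the rule applies (no subquery in the partition filters) and the scan replacement itself. A minimal Java sketch of that split, using a hypothetical miniature expression tree (`Expr`, `Attr`, `Lit`, `EqualTo`, `Subquery`) instead of Catalyst's classes; `MetadataOnlyGuard.partitionFiltersSupported` is likewise a made-up name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature expression tree standing in for Catalyst expressions.
abstract class Expr {
    final List<Expr> children;

    Expr(Expr... children) {
        this.children = Arrays.asList(children);
    }

    // True if this node, or any node below it, is a subquery.
    boolean containsSubquery() {
        if (this instanceof Subquery) {
            return true;
        }
        for (Expr child : children) {
            if (child.containsSubquery()) {
                return true;
            }
        }
        return false;
    }
}

final class Attr extends Expr {
    final String name;
    Attr(String name) { super(); this.name = name; }
}

final class Lit extends Expr {
    final Object value;
    Lit(Object value) { super(); this.value = value; }
}

final class EqualTo extends Expr {
    EqualTo(Expr left, Expr right) { super(left, right); }
}

final class Subquery extends Expr {
    Subquery() { super(); }
}

final class MetadataOnlyGuard {
    // Checked where the rule matches the plan, before any scan replacement runs.
    static boolean partitionFiltersSupported(List<Expr> partitionFilters) {
        for (Expr filter : partitionFilters) {
            if (filter.containsSubquery()) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the check at the match site means the replacement and normalization helper only ever sees filters it can handle.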
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r419832871 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -17,21 +17,106 @@ package org.apache.spark.util +import java.util import java.util.concurrent._ import java.util.concurrent.locks.ReentrantLock +import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder} import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future} import scala.concurrent.duration.{Duration, FiniteDuration} import scala.language.higherKinds import scala.util.control.NonFatal -import com.google.common.util.concurrent.ThreadFactoryBuilder - import org.apache.spark.SparkException import org.apache.spark.rpc.RpcAbortException private[spark] object ThreadUtils { + object MDCAwareThreadPoolExecutor { +def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = { + // The values needs to be synced with `Executors.newCachedThreadPool` + new MDCAwareThreadPoolExecutor( +0, +Integer.MAX_VALUE, +60L, +TimeUnit.SECONDS, +new SynchronousQueue[Runnable], +threadFactory) +} + +def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = { + // The values needs to be synced with `Executors.newFixedThreadPool` + new MDCAwareThreadPoolExecutor( +nThreads, +nThreads, +0L, +TimeUnit.MILLISECONDS, +new LinkedBlockingQueue[Runnable], +threadFactory) +} + +def newSingleThreadExecutor(threadFactory: ThreadFactory): ExecutorService = { + // The values needs to be synced with `Executors.newSingleThreadExecutor` + Executors.unconfigurableExecutorService( +new MDCAwareThreadPoolExecutor( Review comment: @viirya @HyukjinKwon
1. Yes, I am confident: `finalize` is best-effort; the JVM does not guarantee that it runs. Also, in later JVMs it is deprecated:
```
@Deprecated(since="9")
protected void finalize() throws Throwable { }
```
2. OK, I will add a comment, even though I think no one should rely on `finalize` in any case.
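The `MDCAwareThreadPoolExecutor` factories in this hunk mirror the standard pool configurations so that logging context can follow each task onto a pool thread. The PR's Scala implementation of the executor itself is not in the quoted hunk, so the following is only a Java sketch of the general technique: capture the submitter's context in `execute`, install it on the worker while the task runs, then restore the worker's previous context. `SimpleMdc` is a hypothetical `ThreadLocal` stand-in for a real MDC, and `ContextAwareThreadPoolExecutor` is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a logging MDC: one key/value map per thread.
final class SimpleMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void clear() { CONTEXT.get().clear(); }
    static Map<String, String> snapshot() { return new HashMap<>(CONTEXT.get()); }
    static void restore(Map<String, String> ctx) { CONTEXT.set(new HashMap<>(ctx)); }
}

// Copies the submitting thread's context onto the worker for the duration of each task.
final class ContextAwareThreadPoolExecutor extends ThreadPoolExecutor {
    ContextAwareThreadPoolExecutor(int core, int max, long keepAlive, TimeUnit unit,
                                   BlockingQueue<Runnable> queue) {
        super(core, max, keepAlive, unit, queue);
    }

    @Override
    public void execute(Runnable task) {
        // Runs on the caller's thread, so this captures the caller's context.
        Map<String, String> captured = SimpleMdc.snapshot();
        super.execute(() -> {
            Map<String, String> previous = SimpleMdc.snapshot();
            SimpleMdc.restore(captured);
            try {
                task.run();
            } finally {
                // Put back the worker's own context so nothing leaks into the next task.
                SimpleMdc.restore(previous);
            }
        });
    }
}
```

Because `AbstractExecutorService.submit` delegates to `execute`, tasks passed to `submit` are wrapped too. In line with the `finalize` discussion above, a pool like this should be stopped explicitly with `shutdown()` rather than left to finalization.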
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-623800734
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-623800734
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r419830462 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule adds a `CoalesceBuckets` logical plan if one side of two bucketed tables can be + * coalesced when the two bucketed tables are joined and they differ in the number of buckets. + */ +object CoalesceBucketsInJoin extends Rule[LogicalPlan] { Review comment: done
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-623800494 **[Test build #122297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122297/testReport)** for PR 28123 at commit [`eeb0ec7`](https://github.com/apache/spark/commit/eeb0ec7385a9253df0e64316ef2bf069cccf9b6f).
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r419830393 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule adds a `CoalesceBuckets` logical plan if one side of two bucketed tables can be + * coalesced when the two bucketed tables are joined and they differ in the number of buckets. Review comment: Added more comments
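The rule description says one side can be coalesced when the two bucket counts differ. A sketch of why that can work, assuming bucket ids are a non-negative hash modulo the bucket count and that the larger count is a multiple of the smaller: when `n2` divides `n1`, `(h mod n1) mod n2 == h mod n2`, so bucket `i` of the larger table belongs in bucket `i % n2` of the coalesced side. All names, and the `maxRatio` limit, are hypothetical parameters of this sketch, not the PR's API or configuration.

```java
// Hypothetical helpers illustrating bucket coalescing; not the PR's code.
final class BucketCoalescing {
    // Assumed bucket assignment: non-negative hash modulo the bucket count.
    static int bucketId(int hash, int numBuckets) {
        return Math.floorMod(hash, numBuckets);
    }

    // Counts must differ, the larger must be a multiple of the smaller,
    // and the ratio must stay within a caller-supplied limit.
    static boolean canCoalesce(int numBuckets1, int numBuckets2, int maxRatio) {
        int large = Math.max(numBuckets1, numBuckets2);
        int small = Math.min(numBuckets1, numBuckets2);
        return small > 0 && large != small && large % small == 0 && large / small <= maxRatio;
    }

    // Bucket i of the larger table maps to coalesced bucket i % small.
    static int coalescedBucket(int largeBucketId, int small) {
        return largeBucketId % small;
    }
}
```

For example, with 8 and 4 buckets, buckets 1 and 5 of the 8-bucket table both feed bucket 1 of the coalesced side, which lines up with bucket 1 of the 4-bucket table.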
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r419830249 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ## @@ -221,3 +223,22 @@ object FileSourceStrategy extends Strategy with Logging { case _ => Nil } } + +/** + * Extractor that handles `CoalesceBuckets` in the child plan extracted from `ScanOperation`. + */ +object ScanOperationWithCoalescedBuckets { Review comment: I added it in CoalesceBucketsInEquiJoinSuite.scala. (Please let me know if it makes more sense to have it in FileSourceStrategySuite.scala)
[GitHub] [spark] dilipbiswal commented on pull request #28433: [SPARK-31030] [DOCS] [FOLLOWUP] Replace HTML Table by Markdown Table
dilipbiswal commented on pull request #28433:
URL: https://github.com/apache/spark/pull/28433#issuecomment-623799161

@maropu @srowen Can this get in now, if there are no other comments? The reason I ask is that @huaxingao has a big PR that changes a lot of files. If this can get in first, she can then rebase and push.
[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419827929

## File path: docs/sql-ref-literals.md ##
@@ -71,128 +68,114 @@ SELECT 'it\'s $10.' AS col;
 +---------+
 |It's $10.|
 +---------+
-{% endhighlight %}
+```
 ### Binary Literal
 A binary literal is used to specify a byte sequence value.
 Syntax
-{% highlight sql %}
+```sql
 X { 'c [ ... ]' | "c [ ... ]" }
-{% endhighlight %}
+```
+
+ Parameters
- Parameters
+* **c**
-
- c
-
 One character from the character set.

Review comment:
   I believe there is a limitation on the characters allowed in a binary literal. For example, I tried `SELECT X'zz' AS col` and got an exception.
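A small sketch of the restriction the comment points at, assuming (not confirmed in this thread) that the body of `X'...'` must consist of hexadecimal digits only and that an odd number of digits is left-padded with `0`. `parseBinaryLiteral` is a hypothetical helper, not Spark's parser; check the SQL parser before documenting the exact rule.

```scala
// Hypothetical helper: validate and decode the body of a binary literal like X'1F'.
// Assumptions: only [0-9A-Fa-f] is allowed; odd-length input is left-padded with '0'.
def parseBinaryLiteral(body: String): Either[String, Array[Byte]] =
  if (!body.matches("[0-9A-Fa-f]*")) {
    Left(s"invalid hexadecimal digits in X'$body'") // e.g. X'zz' is rejected
  } else {
    val padded = if (body.length % 2 != 0) "0" + body else body
    // Each pair of hex digits becomes one (signed) byte.
    Right(padded.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray)
  }
```

Under these assumptions, the parameter description would read "one hexadecimal digit (0-9, A-F, a-f)" rather than "one character from the character set".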
[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419826842

## File path: docs/sql-ref-literals.md ##
@@ -35,22 +35,19 @@ A string literal is used to specify a character string value.
 Syntax
-{% highlight sql %}
+```sql
 'c [ ... ]' | "c [ ... ]"

Review comment:
   @huaxingao The parameter `c` looks a bit odd, especially in the new format. What do you think of `character`, `any_char`, or something like that?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins removed a comment on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623793408
[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623793408
[GitHub] [spark] SparkQA removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
SparkQA removed a comment on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623791128 **[Test build #122296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122296/testReport)** for PR 28451 at commit [`66d82ca`](https://github.com/apache/spark/commit/66d82ca5a45d3de05429cab2e09ef723cb4d426b).
[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
SparkQA commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623793351 **[Test build #122296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122296/testReport)** for PR 28451 at commit [`66d82ca`](https://github.com/apache/spark/commit/66d82ca5a45d3de05429cab2e09ef723cb4d426b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins removed a comment on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623791394
[GitHub] [spark] huaxingao commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623791469

> Rather, we should remove indents in the other places for following the result format?

It's better to remove the indents. I will spend some time finding all the error messages.
[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
AmplabJenkins commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623791394
[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
SparkQA commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-623791128 **[Test build #122296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122296/testReport)** for PR 28451 at commit [`66d82ca`](https://github.com/apache/spark/commit/66d82ca5a45d3de05429cab2e09ef723cb4d426b).
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419818939

## File path: docs/_data/menu-sql.yaml ##
@@ -156,22 +156,22 @@
         url: sql-ref-syntax-qry-select-distribute-by.html
       - text: LIMIT Clause
         url: sql-ref-syntax-qry-select-limit.html
+      - text: Common Table Expression
+        url: sql-ref-syntax-qry-select-cte.html
+      - text: Inline Table
+        url: sql-ref-syntax-qry-select-inline-table.html
       - text: JOIN
         url: sql-ref-syntax-qry-select-join.html
       - text: Join Hints
         url: sql-ref-syntax-qry-select-hints.html
+      - text: LIKE Predicate
+        url: sql-ref-syntax-qry-select-like.html
       - text: Set Operators

Review comment:
   I didn't change the order of the first 8 clauses, because I think these should be grouped together. I changed the rest to put them in alphabetical order.
   https://user-images.githubusercontent.com/13592258/81027881-33663000-8e34-11ea-9305-3d62a1769362.png
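The ordering rule described in the comment (keep the first 8 clauses as a fixed group, alphabetize the rest) can be stated as a small check. This is a hypothetical sketch: `isMenuOrdered` is not part of the docs build, and it assumes that "alphabetical" means case-insensitive comparison of the menu titles.

```scala
// Hypothetical check: the first `fixedPrefix` entries keep their hand-picked order;
// the remaining entries must be in case-insensitive alphabetical order.
def isMenuOrdered(titles: Seq[String], fixedPrefix: Int): Boolean = {
  val rest = titles.drop(fixedPrefix).map(_.toLowerCase)
  rest == rest.sorted
}
```

For example, the tail shown in the diff ("Common Table Expression", "Inline Table", "JOIN", "Join Hints", "LIKE Predicate", "Set Operators") passes this check.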
[GitHub] [spark] maropu commented on pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters
maropu commented on pull request #28383:
URL: https://github.com/apache/spark/pull/28383#issuecomment-623791024

> Applying OptimizeMetadataOnlyQuery rule will generate scalar-subquery.

Is this statement true? It seems the test query itself has a subquery.
```
// Analyzed plan of the test query
Aggregate [partcol1#40], [partcol1#40, max(partcol2#41) AS partcol2#71]
+- Filter ((partcol1#40 = scalar-subquery#70 []) AND (partcol2#41 = even))
   :  +- Aggregate [max(partcol1#40) AS max(partcol1)#73]
   :     +- SubqueryAlias spark_catalog.default.srcpart
   :        +- Relation[col1#38,col2#39,partcol1#40,partcol2#41] parquet
   +- SubqueryAlias spark_catalog.default.srcpart
      +- Relation[col1#38,col2#39,partcol1#40,partcol2#41] parquet
```
I think the root cause is simply that an unsupported `partitionFilters` entry (a subquery) is passed into `FileIndex.listFiles`.
[GitHub] [spark] maropu commented on a change in pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters
maropu commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r419817895

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ##
@@ -117,7 +117,7 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
         case a: AttributeReference =>
           a.withName(relation.output.find(_.semanticEquals(a)).get.name)
       }
-    }

Review comment:
   Could you filter out this unsupported case in advance? For example, in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala#L53-L55
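One way to read "filter out this unsupported case in advance" is to skip the metadata-only rewrite whenever any partition filter contains a subquery, so those filters never reach file listing. The sketch below uses a toy, hypothetical expression tree instead of Catalyst classes; `containsSubquery` and `canOptimizeMetadataOnly` are illustrative names, not Spark APIs.

```scala
// Toy, hypothetical expression tree standing in for Catalyst expressions.
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: Any) extends Expr { def children: Seq[Expr] = Nil }
final case class EqualTo(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }
final case class And(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }
final case class ScalarSubquery(exprId: Int) extends Expr { def children: Seq[Expr] = Nil }

// True if the expression, or anything nested inside it, is a subquery.
def containsSubquery(e: Expr): Boolean = e match {
  case _: ScalarSubquery => true
  case other             => other.children.exists(containsSubquery)
}

// Guard evaluated before rewriting: bail out if any partition filter has a subquery.
def canOptimizeMetadataOnly(partitionFilters: Seq[Expr]): Boolean =
  !partitionFilters.exists(containsSubquery)
```

Modeled on the analyzed plan quoted in this thread, the filter `(partcol1 = scalar-subquery) AND (partcol2 = even)` would make the guard return `false`.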
[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419817932

## File path: docs/sql-ref-functions-udf-aggregate.md ##
@@ -113,26 +102,26 @@ OPTIONS (
 );
 SELECT * FROM employees;
--- +-------+------+
--- |   name|salary|
--- +-------+------+
--- |Michael|  3000|
--- |   Andy|  4500|
--- | Justin|  3500|
--- |  Berta|  4000|
--- +-------+------+
++-------+------+
+|   name|salary|
++-------+------+
+|Michael|  3000|
+|   Andy|  4500|
+| Justin|  3500|
+|  Berta|  4000|
++-------+------+
 SELECT myAverage(salary) as average_salary FROM employees;
--- +--------------+
--- |average_salary|
--- +--------------+
--- |        3750.0|
--- +--------------+
-{% endhighlight %}
++--------------+
+|average_salary|
++--------------+
+|        3750.0|
++--------------+
+```

Review comment:
   I will take a look at this.
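The `3750.0` in the example output is the mean of the four salaries: (3000 + 4500 + 3500 + 4000) / 4 = 15000 / 4. Below is a plain-Scala sketch of how an average aggregate reaches that value with a (sum, count) buffer, loosely following the zero/reduce/merge/finish shape of Spark's `Aggregator`. `AvgBuffer` and `AverageSketch` are hypothetical and do not depend on Spark.

```scala
// Hypothetical aggregation buffer: running sum and row count.
final case class AvgBuffer(sum: Long, count: Long)

object AverageSketch {
  val zero: AvgBuffer = AvgBuffer(0L, 0L)
  // Fold one input row into a partial buffer.
  def reduce(b: AvgBuffer, salary: Long): AvgBuffer = AvgBuffer(b.sum + salary, b.count + 1)
  // Combine partial buffers computed on different partitions.
  def merge(a: AvgBuffer, b: AvgBuffer): AvgBuffer = AvgBuffer(a.sum + b.sum, a.count + b.count)
  // Produce the final result from the merged buffer.
  def finish(b: AvgBuffer): Double = b.sum.toDouble / b.count
}

// Simulate two partitions reduced independently, then merged.
val partition1 = Seq(3000L, 4500L).foldLeft(AverageSketch.zero)(AverageSketch.reduce)
val partition2 = Seq(3500L, 4000L).foldLeft(AverageSketch.zero)(AverageSketch.reduce)
val averageSalary = AverageSketch.finish(AverageSketch.merge(partition1, partition2))
```

Keeping the sum and count separate until `finish` is what allows partial results to be merged correctly. Averaging the per-partition averages would only give the right answer by coincidence.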
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic alloca
AmplabJenkins removed a comment on pull request #28452: URL: https://github.com/apache/spark/pull/28452#issuecomment-623789508 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122293/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic alloca
AmplabJenkins removed a comment on pull request #28452: URL: https://github.com/apache/spark/pull/28452#issuecomment-623789506 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation
SparkQA removed a comment on pull request #28452: URL: https://github.com/apache/spark/pull/28452#issuecomment-623755415 **[Test build #122293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122293/testReport)** for PR 28452 at commit [`34c0724`](https://github.com/apache/spark/commit/34c072421e664b8b366574d2f77fce7b0bea3412).
[GitHub] [spark] AmplabJenkins commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation
AmplabJenkins commented on pull request #28452: URL: https://github.com/apache/spark/pull/28452#issuecomment-623789506
[GitHub] [spark] SparkQA commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation
SparkQA commented on pull request #28452: URL: https://github.com/apache/spark/pull/28452#issuecomment-623789260 **[Test build #122293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122293/testReport)** for PR 28452 at commit [`34c0724`](https://github.com/apache/spark/commit/34c072421e664b8b366574d2f77fce7b0bea3412).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on pull request #28336: [SPARK-31559][YARN] Re-obtain tokens at the startup of AM for yarn cluster mode if principal and keytab are available
HeartSaVioR commented on pull request #28336:
URL: https://github.com/apache/spark/pull/28336#issuecomment-623787519

Friendly reminder to @vanzin @squito. Also cc @jerryshao and @tgravescs to widen the pool of available reviewers.
[GitHub] [spark] HyukjinKwon commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation
HyukjinKwon commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623787379

Merged to master and branch-3.0, since the related linter tests had already passed. I don't believe this change affects other tests or the build.
[GitHub] [spark] HyukjinKwon commented on pull request #28452: [MINOR][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation
HyukjinKwon commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623787185

Let me just turn this into a follow-up of SPARK-27963.
[GitHub] [spark] HyukjinKwon commented on pull request #28430: [SPARK-31372][SQL][TEST][FOLLOW-UP] Improve ExpressionsSchemaSuite so that easy to track the diff.
HyukjinKwon commented on pull request #28430:
URL: https://github.com/apache/spark/pull/28430#issuecomment-623786752

Merged to master and branch-3.0.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
HyukjinKwon commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419813534

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,106 @@
 package org.apache.spark.util

+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock

+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal

-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException

 private[spark] object ThreadUtils {

+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      // The values needs to be synced with `Executors.newCachedThreadPool`
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      // The values needs to be synced with `Executors.newFixedThreadPool`
+      new MDCAwareThreadPoolExecutor(
+        nThreads,
+        nThreads,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+    def newSingleThreadExecutor(threadFactory: ThreadFactory): ExecutorService = {
+      // The values needs to be synced with `Executors.newSingleThreadExecutor`
+      Executors.unconfigurableExecutorService(
+        new MDCAwareThreadPoolExecutor(

Review comment:
   But let's at least leave a comment here, in case people use it without knowing that it differs from the built-in `newSingleThreadExecutor`.
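To show what an MDC-aware executor has to do, here is a sketch of context propagation. It uses a plain `ThreadLocal` as a hypothetical stand-in for SLF4J's MDC so that it runs without logging dependencies. `Ctx`, `ContextAwareThreadPoolExecutor`, and `newSingleThreadContextExecutor` are illustrative names, not the PR's classes. The idea is to capture the caller's context in `execute` and install it on the worker thread for the duration of the task. In the JDK, `submit` goes through `execute`, so both paths are covered.

```scala
import java.util.concurrent._

// Hypothetical stand-in for SLF4J's MDC: one context value per thread.
object Ctx {
  val current = new ThreadLocal[String]
}

// Sketch: copies the submitting thread's context onto the worker thread per task.
class ContextAwareThreadPoolExecutor(
    corePoolSize: Int,
    maxPoolSize: Int,
    keepAliveTime: Long,
    timeUnit: TimeUnit,
    workQueue: BlockingQueue[Runnable],
    threadFactory: ThreadFactory)
  extends ThreadPoolExecutor(corePoolSize, maxPoolSize, keepAliveTime, timeUnit, workQueue, threadFactory) {

  override def execute(task: Runnable): Unit = {
    val captured = Ctx.current.get() // read on the caller's thread
    super.execute(new Runnable {
      override def run(): Unit = {
        val previous = Ctx.current.get()
        Ctx.current.set(captured)
        try task.run() finally Ctx.current.set(previous) // do not leak context between tasks
      }
    })
  }
}

// Same parameters as Executors.newSingleThreadExecutor, wrapped so callers cannot
// reconfigure the pool. Unlike the built-in factory, this wrapper has no
// finalizer-based shutdown, so callers must call shutdown() themselves.
def newSingleThreadContextExecutor(factory: ThreadFactory): ExecutorService =
  Executors.unconfigurableExecutorService(
    new ContextAwareThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
      new LinkedBlockingQueue[Runnable], factory))
```

The comment at the end of the factory is the kind of note the review asks for, because it spells out how this factory differs from the JDK's `newSingleThreadExecutor`.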
[GitHub] [spark] maropu commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
maropu commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419790030

## File path: docs/_data/menu-sql.yaml ##
@@ -156,22 +156,22 @@
         url: sql-ref-syntax-qry-select-distribute-by.html
       - text: LIMIT Clause
         url: sql-ref-syntax-qry-select-limit.html
+      - text: Common Table Expression
+        url: sql-ref-syntax-qry-select-cte.html
+      - text: Inline Table
+        url: sql-ref-syntax-qry-select-inline-table.html
       - text: JOIN
         url: sql-ref-syntax-qry-select-join.html
       - text: Join Hints
         url: sql-ref-syntax-qry-select-hints.html
+      - text: LIKE Predicate
+        url: sql-ref-syntax-qry-select-like.html
       - text: Set Operators

Review comment:
   Why do we need the changes in this file?

## File path: docs/sql-ref-identifier.md ##
@@ -27,54 +27,47 @@ An identifier is a string used to identify a database object such as a table, vi
 Regular Identifier
-{% highlight sql %}
+```sql
 { letter | digit | '_' } [ , ... ]
-{% endhighlight %}
+```
 Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to [ANSI Compliance](sql-ref-ansi-compliance.html).
 Delimited Identifier
-{% highlight sql %}
+```sql
 `c [ ... ]`
-{% endhighlight %}
+```
 ### Parameters
-
- letter
-
+* **letter**
+
 Any letter from A-Z or a-z.
-
-
-
- digit
-
+
+* **digit**
+
 Any numeral from 0 to 9.
-
-
-
- c
-
+
+* **c**
+
 Any character from the character set. Use ` to escape special characters (e.g., `).
-
-
 ### Examples
-{% highlight sql %}
+```sql
 -- This CREATE TABLE fails with ParseException because of the illegal identifier name a.b
 CREATE TABLE test (a.b int);
-org.apache.spark.sql.catalyst.parser.ParseException:
-no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
+  org.apache.spark.sql.catalyst.parser.ParseException:

Review comment:
   Rather, we should remove indents in the other places for following the result format?
## File path: docs/sql-ref-functions-udf-aggregate.md ##
@@ -113,26 +102,26 @@ OPTIONS (
 );
 SELECT * FROM employees;
--- +-------+------+
--- |   name|salary|
--- +-------+------+
--- |Michael|  3000|
--- |   Andy|  4500|
--- | Justin|  3500|
--- |  Berta|  4000|
--- +-------+------+
++-------+------+
+|   name|salary|
++-------+------+
+|Michael|  3000|
+|   Andy|  4500|
+| Justin|  3500|
+|  Berta|  4000|
++-------+------+
 SELECT myAverage(salary) as average_salary FROM employees;
--- +--------------+
--- |average_salary|
--- +--------------+
--- |        3750.0|
--- +--------------+
-{% endhighlight %}
++--------------+
+|average_salary|
++--------------+
+|        3750.0|
++--------------+
+```

Review comment:
   Can't we avoid this tag, too?
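The identifier rules in the identifier diff above (a regular identifier uses only letters, digits, and `_`; anything else must be delimited with backticks) can be summarized as a quoting helper. `quoteIfNeeded` is a hypothetical sketch. It assumes that a literal backtick inside a delimited identifier is escaped by doubling it, and it ignores the ANSI reserved-keyword restriction mentioned in the note.

```scala
// Hypothetical helper: keep regular identifiers as-is and delimit everything else.
// Assumption: a backtick inside a delimited identifier is escaped by doubling it.
// Not handled: ANSI reserved keywords when spark.sql.ansi.enabled is true.
def quoteIfNeeded(name: String): String =
  if (name.matches("[A-Za-z0-9_]+")) name
  else "`" + name.replace("`", "``") + "`"
```

Under these assumptions, the failing example `CREATE TABLE test (a.b int)` would need to be written with the column name as `` `a.b` ``.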
[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419810299

## File path: docs/sql-ref-ansi-compliance.md ##
@@ -66,7 +66,7 @@ This means that in case an operation causes overflows, the result is the same wi
 On the other hand, Spark SQL returns null for decimal overflows.
 When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime.
-{% highlight sql %}
+```sql
 -- `spark.sql.ansi.enabled=true`

Review comment:
   @huaxingao I know this is not related to the format change you are making in this PR. But shouldn't we have a SET statement here, so users can copy and paste the commands into their shell to see the behavior? Perhaps we discussed it in the PR that added this section. Just a question :-)
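The two integer-overflow behaviors contrasted in that doc section can be illustrated with plain JVM arithmetic. This is only an analogy, not Spark's implementation: `legacyAdd` and `ansiAdd` are hypothetical names, and `Math.addExact` is the standard JDK method that throws `ArithmeticException` on `Int` overflow.

```scala
// Analogy for the non-ANSI behavior: JVM Int addition wraps around silently.
def legacyAdd(a: Int, b: Int): Int = a + b

// Analogy for spark.sql.ansi.enabled=true: overflow raises an ArithmeticException.
def ansiAdd(a: Int, b: Int): Int = Math.addExact(a, b)
```

With these definitions, `legacyAdd(Int.MaxValue, 1)` returns `Int.MinValue`, while `ansiAdd(Int.MaxValue, 1)` throws.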
[GitHub] [spark] viirya commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
viirya commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419808850

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ##
@@ -17,21 +17,106 @@
 package org.apache.spark.util

+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock

+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal

-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException

 private[spark] object ThreadUtils {

+  object MDCAwareThreadPoolExecutor {
+    def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      // The values needs to be synced with `Executors.newCachedThreadPool`
+      new MDCAwareThreadPoolExecutor(
+        0,
+        Integer.MAX_VALUE,
+        60L,
+        TimeUnit.SECONDS,
+        new SynchronousQueue[Runnable],
+        threadFactory)
+    }
+
+    def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): ThreadPoolExecutor = {
+      // The values needs to be synced with `Executors.newFixedThreadPool`
+      new MDCAwareThreadPoolExecutor(
+        nThreads,
+        nThreads,
+        0L,
+        TimeUnit.MILLISECONDS,
+        new LinkedBlockingQueue[Runnable],
+        threadFactory)
+    }
+
+    def newSingleThreadExecutor(threadFactory: ThreadFactory): ExecutorService = {
+      // The values needs to be synced with `Executors.newSingleThreadExecutor`
+      Executors.unconfigurableExecutorService(
+        new MDCAwareThreadPoolExecutor(

Review comment:
   The finalize method helps to shut down the underlying thread pool. I'm not sure whether we rely on this in Spark. If you are confident in this change, then it should be fine.
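If no finalizer shuts down the pool, callers should not rely on one. A common way to make that explicit is a loan-pattern helper that always shuts the executor down. This is a hypothetical sketch: `withExecutor` is not a Spark utility, and the 10-second grace period is an arbitrary choice for illustration.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors, TimeUnit}

// Hypothetical loan-pattern helper: run `body` with the pool, then always shut it down
// instead of relying on garbage collection or a finalizer to do it.
def withExecutor[T](pool: ExecutorService)(body: ExecutorService => T): T =
  try body(pool)
  finally {
    pool.shutdown() // stop accepting new tasks; queued tasks still run
    if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
      pool.shutdownNow() // interrupt whatever is still running after the grace period
    }
  }
```

For example, `withExecutor(Executors.unconfigurableExecutorService(Executors.newFixedThreadPool(1))) { p => p.submit(task).get() }` returns the task's result and leaves the pool shut down.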
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
AmplabJenkins removed a comment on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623778968
[GitHub] [spark] AmplabJenkins commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
AmplabJenkins commented on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623778968
[GitHub] [spark] SparkQA commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
SparkQA commented on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623778576 **[Test build #122295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122295/testReport)** for PR 28239 at commit [`453c5a5`](https://github.com/apache/spark/commit/453c5a5e0717d2681fc2e0ed4f48ca093d4020a0).
[GitHub] [spark] maropu commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException
maropu commented on pull request #28239: URL: https://github.com/apache/spark/pull/28239#issuecomment-623775852 retest this please
[GitHub] [spark] SparkQA commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
SparkQA commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623774430 **[Test build #122294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122294/testReport)** for PR 26624 at commit [`50a68c7`](https://github.com/apache/spark/commit/50a68c7eded44ce6eb5afaff6f8170c4add70a25).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623772844
[GitHub] [spark] AmplabJenkins commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
AmplabJenkins commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-623772844