[GitHub] [spark] yaooqinn commented on a change in pull request #28442: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry

2020-05-04 Thread GitBox


yaooqinn commented on a change in pull request #28442:
URL: https://github.com/apache/spark/pull/28442#discussion_r419878281



##
File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
##
@@ -131,11 +130,7 @@ class KafkaTestUtils(
   }
 
   private def setUpMiniKdc(): Unit = {
-val kdcDir = Utils.createTempDir()

Review comment:
   Do I still need to address this comment?
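
   For context on the fix itself: the retry approach described in the PR title could look roughly like the sketch below. This is a minimal illustration, not the actual patch; the exact MiniKdc configuration and whether `start()` surfaces the `BindException` directly are assumptions.

   ```scala
   import java.io.File
   import java.net.BindException

   import org.apache.hadoop.minikdc.MiniKdc

   // Minimal sketch: retry MiniKdc startup a few times when the randomly chosen
   // KDC port is already taken and startup fails with a BindException.
   def startMiniKdcWithRetry(workDir: File, maxAttempts: Int = 3): MiniKdc = {
     var attempt = 0
     var kdc: MiniKdc = null
     while (kdc == null) {
       attempt += 1
       val candidate = new MiniKdc(MiniKdc.createConf(), workDir)
       try {
         candidate.start()
         kdc = candidate
       } catch {
         case _: BindException if attempt < maxAttempts =>
           candidate.stop() // release whatever was bound before retrying
       }
     }
     kdc
   }
   ```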





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-623869047







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-623869047







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28366: [SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-05-04 Thread GitBox


cloud-fan commented on a change in pull request #28366:
URL: https://github.com/apache/spark/pull/28366#discussion_r419877636



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2063,16 +2063,17 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
-  val NESTED_PREDICATE_PUSHDOWN_ENABLED =
-buildConf("spark.sql.optimizer.nestedPredicatePushdown.enabled")
-  .internal()
-  .doc("When true, Spark tries to push down predicates for nested columns 
and or names " +
-"containing `dots` to data sources. Currently, Parquet implements both 
optimizations " +
-"while ORC only supports predicates for names containing `dots`. The 
other data sources" +
-"don't support this feature yet.")
+  val NESTED_PREDICATE_PUSHDOWN_V1_SOURCE_LIST =
+buildConf("spark.sql.optimizer.nestedPredicatePushdown.supportedV1Sources")

Review comment:
   `supportedV1Sources` -> `supportedFileSources`?
   
   DS v1 and file source are different APIs and have different planner 
rules/physical nodes.
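
   For readers following along: whichever name is chosen, the new entry is a comma-separated source list rather than a boolean. A rough shape, inside `SQLConf`'s usual builder DSL (illustrative only; the doc text and default below are not the merged wording):

   ```scala
   // Illustrative sketch of a string-list conf keyed by data source short names.
   val NESTED_PREDICATE_PUSHDOWN_V1_SOURCE_LIST =
     buildConf("spark.sql.optimizer.nestedPredicatePushdown.supportedV1Sources")
       .internal()
       .doc("A comma-separated list of data source short names for which nested " +
         "predicate pushdown is enabled.")
       .stringConf
       .createWithDefault("parquet,orc")
   ```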





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-04 Thread GitBox


SparkQA commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-623868721


   **[Test build #122302 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122302/testReport)**
 for PR 28370 at commit 
[`c645582`](https://github.com/apache/spark/commit/c645582e2df06fe4736dc3b1673b377d4baf96f0).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28366: [SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-05-04 Thread GitBox


cloud-fan commented on a change in pull request #28366:
URL: https://github.com/apache/spark/pull/28366#discussion_r419877190



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##
@@ -179,15 +179,22 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
 
 case OverwriteByExpression(r: DataSourceV2Relation, deleteExpr, query, 
writeOptions, _) =>
   // fail if any filter cannot be converted. correctness depends on 
removing all matching data.
-  val filters = splitConjunctivePredicates(deleteExpr).map {
-filter => DataSourceStrategy.translateFilter(deleteExpr).getOrElse(
-  throw new AnalysisException(s"Cannot translate expression to source 
filter: $filter"))
-  }.toArray
+  val filters = splitConjunctivePredicates(deleteExpr)
+  def transferFilters =
+(filters: Seq[Expression], supportNestedPredicatePushdown: Boolean) => 
{

Review comment:
   Do we need the `supportNestedPredicatePushdown` parameter here, as the caller side always passes true?
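
   If the flag really is always `true` on this code path, the helper could be collapsed to something like the following (a sketch in the context of the quoted method, assuming `translateFilter` takes the nested-pushdown flag as added in this PR):

   ```scala
   // Sketch: translate each conjunct with nested predicate pushdown enabled,
   // failing fast if any filter cannot be converted to a source filter.
   val filters = splitConjunctivePredicates(deleteExpr).map { filter =>
     DataSourceStrategy.translateFilter(filter, supportNestedPredicatePushdown = true)
       .getOrElse(throw new AnalysisException(
         s"Cannot translate expression to source filter: $filter"))
   }.toArray
   ```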





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623867741







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623865188


   **[Test build #122301 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122301/testReport)**
 for PR 28451 at commit 
[`289e5ae`](https://github.com/apache/spark/commit/289e5aea19b2b027efa37fbfe7bd723824b02b92).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


SparkQA commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623867673


   **[Test build #122301 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122301/testReport)**
 for PR 28451 at commit 
[`289e5ae`](https://github.com/apache/spark/commit/289e5aea19b2b027efa37fbfe7bd723824b02b92).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623867741







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


igreenfield commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623867478


   I'm also OK with removing the default appId and appName. Users will add what they need.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419874633



##
File path: docs/sql-ref-identifier.md
##
@@ -27,41 +27,34 @@ An identifier is a string used to identify a database 
object such as a table, vi
 
  Regular Identifier
 
-{% highlight sql %}
+```sql
 { letter | digit | '_' } [ , ... ]
-{% endhighlight %}
+```
 Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords 
cannot be used as identifiers. For more details, please refer to [ANSI 
Compliance](sql-ref-ansi-compliance.html).

Review comment:
   @huaxingao Should we bold "Note"? I see that in other places we do bold it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28431: [SPARK-31623][SQL][TESTS] Benchmark rebasing of INT96 and TIMESTAMP_MILLIS timestamps in read/write

2020-05-04 Thread GitBox


cloud-fan commented on pull request #28431:
URL: https://github.com/apache/spark/pull/28431#issuecomment-623865659


   thanks, merging to master/3.0!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623865497







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623865497







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


SparkQA commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623865188


   **[Test build #122301 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122301/testReport)**
 for PR 28451 at commit 
[`289e5ae`](https://github.com/apache/spark/commit/289e5aea19b2b027efa37fbfe7bd723824b02b92).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #27710: [SPARK-30960][SQL] add back the legacy date/timestamp format support in CSV/JSON parser

2020-05-04 Thread GitBox


cloud-fan commented on a change in pull request #27710:
URL: https://github.com/apache/spark/pull/27710#discussion_r419872705



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
##
@@ -239,7 +246,23 @@ class JacksonParser(
 case DateType =>
   (parser: JsonParser) => parseJsonToken[java.lang.Integer](parser, 
dataType) {
 case VALUE_STRING if parser.getTextLength >= 1 =>
-  dateFormatter.parse(parser.getText)
+  try {
+dateFormatter.parse(parser.getText)
+  } catch {
+case NonFatal(e) =>
+  // If fails to parse, then tries the way used in 2.0 and 1.x for 
backwards
+  // compatibility.
+  val str = 
UTF8String.fromString(DateTimeUtils.cleanLegacyTimestampStr(parser.getText))
+  DateTimeUtils.stringToDate(str, options.zoneId).getOrElse {
+// In Spark 1.5.0, we store the data as number of days since 
epoch in string.
+// So, we just convert it to Int.
+try {
+  parser.getText.toInt

Review comment:
   good catch! I think we should.
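
   Presumably the agreed follow-up is to also guard the final integer fallback; in the context of the quoted parser the chain would then look roughly like this (simplified sketch, reusing the names from the diff):

   ```scala
   // Sketch: try the configured formatter, then the legacy 2.x/1.x string parser,
   // then the Spark 1.5-era "days since epoch" integer form, guarded as well.
   try {
     dateFormatter.parse(parser.getText)
   } catch {
     case NonFatal(_) =>
       val str = UTF8String.fromString(
         DateTimeUtils.cleanLegacyTimestampStr(parser.getText))
       DateTimeUtils.stringToDate(str, options.zoneId).getOrElse {
         try {
           parser.getText.toInt
         } catch {
           case _: NumberFormatException =>
             throw new RuntimeException(s"Cannot parse '${parser.getText}' as a date")
         }
       }
   }
   ```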





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419872224



##
File path: docs/sql-ref-literals.md
##
@@ -71,128 +68,114 @@ SELECT 'it\'s $10.' AS col;
 +-+
 |It's $10.|
 +-+
-{% endhighlight %}
+```
 
 ### Binary Literal
 
 A binary literal is used to specify a byte sequence value.
 
  Syntax
 
-{% highlight sql %}
+```sql
 X { 'c [ ... ]' | "c [ ... ]" }
-{% endhighlight %}
+```
+
+ Parameters
 
- Parameters
+* **c**
 
-
-  c
-  
 One character from the character set.

Review comment:
   seems to be hexadecimal. Changed to the following: 
   ```
    Syntax
   
   X { 'num [ ... ]' | "num [ ... ]" }
   
    Parameters
   
   * **num**
   
   Any hexadecimal number from 0 to F.
   ```
   cc @yaooqinn 
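
   For a concrete feel of the syntax being documented, a quick check in spark-shell could look like this (output formatting is indicative only):

   ```scala
   // A binary literal is X followed by a quoted run of hexadecimal digits.
   spark.sql("SELECT X'123456' AS col").show()
   // +----------+
   // |       col|
   // +----------+
   // |[12 34 56]|
   // +----------+
   ```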





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28310: [SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode

2020-05-04 Thread GitBox


cloud-fan commented on a change in pull request #28310:
URL: https://github.com/apache/spark/pull/28310#discussion_r419872162



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -618,6 +618,22 @@ object DateTimeUtils {
 instantToMicros(resultTimestamp.toInstant)
   }
 
+  /**
+   * Add the date and the interval's months and days.
+   * Returns a date value, expressed in days since 1.1.1970.
+   *
+   * @throws DateTimeException if the result exceeds the supported date range
+   * @throws IllegalArgumentException if the interval has `microseconds` part
+   */
+  def dateAddInterval(
+ start: SQLDate,
+ interval: CalendarInterval): SQLDate = {
+require(interval.microseconds == 0,
+  "Cannot add hours, minutes or seconds, milliseconds, microseconds to a 
date")
+val ld = 
LocalDate.ofEpochDay(start).plusMonths(interval.months).plusDays(interval.days)

Review comment:
   FYI, in Snowflake `interval '1 month 1 day'` is different from `interval '1 day 1 month'`. We should at least document our own behavior.
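
   To make the ordering point concrete: with the quoted implementation, months are always applied before days no matter how the interval literal is written, which a small `java.time` check illustrates (hypothetical snippet mirroring the quoted code, not a test from the PR):

   ```scala
   import java.time.LocalDate

   // Mirror of the quoted logic: months are applied first, then days.
   def addInterval(start: LocalDate, months: Int, days: Int): LocalDate =
     start.plusMonths(months).plusDays(days)

   addInterval(LocalDate.of(2020, 1, 30), months = 1, days = 2)
   // months first: 2020-01-30 -> 2020-02-29 (clamped) -> 2020-03-02
   // days first would instead give 2020-02-01 -> 2020-03-01, a different result,
   // which is why the behavior is worth documenting.
   ```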





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419872224



##
File path: docs/sql-ref-literals.md
##
@@ -71,128 +68,114 @@ SELECT 'it\'s $10.' AS col;
 +-+
 |It's $10.|
 +-+
-{% endhighlight %}
+```
 
 ### Binary Literal
 
 A binary literal is used to specify a byte sequence value.
 
  Syntax
 
-{% highlight sql %}
+```sql
 X { 'c [ ... ]' | "c [ ... ]" }
-{% endhighlight %}
+```
+
+ Parameters
 
- Parameters
+* **c**
 
-
-  c
-  
 One character from the character set.

Review comment:
   seems to be hexadecimal. Changed to the following: 
   ```
    Syntax
   
   ```sql
   X { 'num [ ... ]' | "num [ ... ]" }
   ```
   
    Parameters
   
   * **num**
   
   Any hexadecimal number from 0 to F.
   ```
   cc @yaooqinn 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419871991



##
File path: docs/sql-ref-identifier.md
##
@@ -27,54 +27,47 @@ An identifier is a string used to identify a database 
object such as a table, vi
 
  Regular Identifier
 
-{% highlight sql %}
+```sql
 { letter | digit | '_' } [ , ... ]
-{% endhighlight %}
+```
 Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords 
cannot be used as identifiers. For more details, please refer to [ANSI 
Compliance](sql-ref-ansi-compliance.html).
 
  Delimited Identifier
 
-{% highlight sql %}
+```sql
 `c [ ... ]`
-{% endhighlight %}
+```
 
 ### Parameters
 
-
-  letter
-  
+* **letter**
+
 Any letter from A-Z or a-z.
-  
-
-
-  digit
-  
+
+* **digit**
+
 Any numeral from 0 to 9.
-  
-
-
-  c
-  
+
+* **c**
+
 Any character from the character set. Use ` to escape special 
characters (e.g., `).
-  
-
 
 ### Examples
 
-{% highlight sql %}
+```sql
 -- This CREATE TABLE fails with ParseException because of the illegal 
identifier name a.b
 CREATE TABLE test (a.b int);
-org.apache.spark.sql.catalyst.parser.ParseException:
-no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
+  org.apache.spark.sql.catalyst.parser.ParseException:

Review comment:
   Fixed. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419871927



##
File path: docs/sql-ref-literals.md
##
@@ -35,22 +35,19 @@ A string literal is used to specify a character string 
value.
 
  Syntax
 
-{% highlight sql %}
+```sql
 'c [ ... ]' | "c [ ... ]"

Review comment:
   changed to ```char```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419871848



##
File path: docs/sql-ref-ansi-compliance.md
##
@@ -66,7 +66,7 @@ This means that in case an operation causes overflows, the 
result is the same wi
 On the other hand, Spark SQL returns null for decimal overflows.
 When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in 
numeric and interval arithmetic operations, it throws an arithmetic exception 
at runtime.
 
-{% highlight sql %}
+```sql
 -- `spark.sql.ansi.enabled=true`

Review comment:
   I don't have a strong opinion on this. Seems to me a comment is OK too.
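
   As a quick illustration of the behavior this section of the doc describes (the exact exception message varies by version):

   ```scala
   // With ANSI mode on, integer overflow in arithmetic throws instead of
   // returning a wrapped or null result.
   spark.conf.set("spark.sql.ansi.enabled", "true")
   spark.sql("SELECT 2147483647 + 1").show()
   // java.lang.ArithmeticException: integer overflow
   ```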





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28445: [SPARK-31212][SQL][2.4] Fix Failure of casting the '1000-02-29' string to the date type

2020-05-04 Thread GitBox


cloud-fan commented on pull request #28445:
URL: https://github.com/apache/spark/pull/28445#issuecomment-623863062


   @MaxGekk what's your opinion? I'm fine with this fix but I won't encourage 
people to spend much time fixing datetime related bugs in 2.4. The datetime 
part is completely rewritten in 3.0.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


igreenfield commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419869698



##
File path: docs/configuration.md
##
@@ -2670,6 +2670,9 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for 
logging. You can config
 `log4j.properties` file in the `conf` directory. One way to start is to copy 
the existing
 `log4j.properties.template` located there.
 
+By default, Spark adds to the MDC 3 records: `appId`, `appName` and `taskName` 
you can add that to your patternLayout `%X{appId}` in order to print in the logs

Review comment:
   Maybe in both places?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters

2020-05-04 Thread GitBox


cloud-fan commented on pull request #28383:
URL: https://github.com/apache/spark/pull/28383#issuecomment-623861615


   Shall we remove `OptimizeMetadataOnlyQuery`? IIRC it has a correctness issue and we disable it by default. cc @gengliangwang 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623857362







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623778576


   **[Test build #122295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122295/testReport)**
 for PR 28239 at commit 
[`453c5a5`](https://github.com/apache/spark/commit/453c5a5e0717d2681fc2e0ed4f48ca093d4020a0).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623857362







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation

2020-05-04 Thread GitBox


dongjoon-hyun commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623857249


   Thank you all!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


SparkQA commented on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623856851


   **[Test build #122295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122295/testReport)**
 for PR 28239 at commit 
[`453c5a5`](https://github.com/apache/spark/commit/453c5a5e0717d2681fc2e0ed4f48ca093d4020a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


cloud-fan commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623855672


   LGTM except for the app id/name. I'm still not convinced that it's working, 
at least @Ngone51 reported he can't see app id/name by local testing.
   
   Can you clearly point out the code that sets app id/name? You mentioned it's 
in DAG scheduler, can you point out which line? It's even better if you can add 
a test.
   
   BTW I think it's OK to ask users to set app id/name themselves by 
`mdc.appId/Name`. I'm good with this patch if we just remove the handling of 
app id/name.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


cloud-fan commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419860787



##
File path: docs/configuration.md
##
@@ -2670,6 +2670,9 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for 
logging. You can config
 `log4j.properties` file in the `conf` directory. One way to start is to copy 
the existing
 `log4j.properties.template` located there.
 
+By default, Spark adds to the MDC 3 records: `appId`, `appName` and `taskName` 
you can add that to your patternLayout `%X{appId}` in order to print in the logs

Review comment:
   I think it's better to put the doc in `conf/log4j.properties.template`, 
where users use this feature.
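
   For example, the template could gain a commented pattern along these lines (illustrative only; log4j 1.x `PatternLayout` syntax is assumed, and the MDC keys are the ones named in the quoted doc):

   ```
   # Illustrative: include Spark's MDC keys in the console pattern.
   log4j.appender.console.layout=org.apache.log4j.PatternLayout
   log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %X{appId} %X{appName} %X{taskName} %c{1}: %m%n
   ```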





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419857329



##
File path: docs/sql-ref-functions-udf-aggregate.md
##
@@ -113,26 +102,26 @@ OPTIONS (
 );
 
 SELECT * FROM employees;
--- +---+--+
--- |   name|salary|
--- +---+--+
--- |Michael|  3000|
--- |   Andy|  4500|
--- | Justin|  3500|
--- |  Berta|  4000|
--- +---+--+
++---+--+
+|   name|salary|
++---+--+
+|Michael|  3000|
+|   Andy|  4500|
+| Justin|  3500|
+|  Berta|  4000|
++---+--+
 
 SELECT myAverage(salary) as average_salary FROM employees;
--- +--+
--- |average_salary|
--- +--+
--- |3750.0|
--- +--+
-{% endhighlight %}
++--+
+|average_salary|
++--+
+|3750.0|
++--+
+```
 

Review comment:
   This is for examples ``. I prefer to keep this 
since we use this format for all the examples. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions

2020-05-04 Thread GitBox


gatorsmile commented on a change in pull request #28224:
URL: https://github.com/apache/spark/pull/28224#discussion_r419857270



##
File path: docs/sql-ref-functions-builtin.md
##
@@ -0,0 +1,77 @@
+---
+layout: global
+title: Built-in Functions
+displayTitle: Built-in Functions
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-agg-funcs-table.html' %}
+### Aggregate Functions
+{% include_relative generated-agg-funcs-table.html %}
+ Examples
+{% include_relative generated-agg-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-window-funcs-table.html' %}
+### Window Functions
+{% include_relative generated-window-funcs-table.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-array-funcs-table.html' %}
+### Array Functions
+{% include_relative generated-array-funcs-table.html %}
+ Examples
+{% include_relative generated-array-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-map-funcs-table.html' %}
+### Map Functions
+{% include_relative generated-map-funcs-table.html %}
+ Examples
+{% include_relative generated-map-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-datetime-funcs-table.html' %}
+### Date and Timestamp Functions
+{% include_relative generated-datetime-funcs-table.html %}
+ Examples
+{% include_relative generated-datetime-funcs-examples.html %}
+{% break %}
+{% endif %}
+{% endfor %}
+
+{% for static_file in site.static_files %}
+{% if static_file.name == 'generated-json-funcs-table.html' %}
+### JSON Functions
+{% include_relative generated-json-funcs-table.html %}
+ Examples
+{% include_relative generated-agg-funcs-examples.html %}

Review comment:
   generated-agg-funcs-examples.html -> generated-json-funcs-examples.html ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623844664


   **[Test build #122299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122299/testReport)**
 for PR 28009 at commit 
[`b762753`](https://github.com/apache/spark/commit/b762753d9642d7c5b1faa8d5dcaa6402c95730c1).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression

2020-05-04 Thread GitBox


SparkQA commented on pull request #28164:
URL: https://github.com/apache/spark/pull/28164#issuecomment-623848821


   **[Test build #122300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122300/testReport)**
 for PR 28164 at commit 
[`cc4ee4c`](https://github.com/apache/spark/commit/cc4ee4c7b09ee9c09f40ac1d4f714db6a83838d4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623848806







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


SparkQA commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623848721


   **[Test build #122299 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122299/testReport)**
 for PR 28009 at commit 
[`b762753`](https://github.com/apache/spark/commit/b762753d9642d7c5b1faa8d5dcaa6402c95730c1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623848806







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28164:
URL: https://github.com/apache/spark/pull/28164#issuecomment-623847685







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28164:
URL: https://github.com/apache/spark/pull/28164#issuecomment-623847685







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623821922


   **[Test build #122298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122298/testReport)**
 for PR 28009 at commit 
[`4599e18`](https://github.com/apache/spark/commit/4599e18141efce8cc241fd1f9f4d5b84e3a297e7).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623846376







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623846376







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


SparkQA commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623846317


   **[Test build #122298 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122298/testReport)**
 for PR 28009 at commit 
[`4599e18`](https://github.com/apache/spark/commit/4599e18141efce8cc241fd1f9f4d5b84e3a297e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623845229







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623845229







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


SparkQA commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623844664


   **[Test build #122299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122299/testReport)**
 for PR 28009 at commit 
[`b762753`](https://github.com/apache/spark/commit/b762753d9642d7c5b1faa8d5dcaa6402c95730c1).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623824088







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623824088







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28009: [SPARK-31235][YARN] Separates different categories of applications

2020-05-04 Thread GitBox


SparkQA commented on pull request #28009:
URL: https://github.com/apache/spark/pull/28009#issuecomment-623821922


   **[Test build #122298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122298/testReport)**
 for PR 28009 at commit 
[`4599e18`](https://github.com/apache/spark/commit/4599e18141efce8cc241fd1f9f4d5b84e3a297e7).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #28430: [SPARK-31372][SQL][TEST][FOLLOW-UP] Improve ExpressionsSchemaSuite so that easy to track the diff.

2020-05-04 Thread GitBox


beliefer commented on pull request #28430:
URL: https://github.com/apache/spark/pull/28430#issuecomment-623813026


   @HyukjinKwon Thanks for your help!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623811893







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623811893







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


SparkQA commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623811305


   **[Test build #122294 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122294/testReport)**
 for PR 26624 at commit 
[`50a68c7`](https://github.com/apache/spark/commit/50a68c7eded44ce6eb5afaff6f8170c4add70a25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623774430


   **[Test build #122294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122294/testReport)**
 for PR 26624 at commit 
[`50a68c7`](https://github.com/apache/spark/commit/50a68c7eded44ce6eb5afaff6f8170c4add70a25).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters

2020-05-04 Thread GitBox


maropu commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r419817895



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
##
@@ -117,7 +117,7 @@ case class OptimizeMetadataOnlyQuery(catalog: 
SessionCatalog) extends Rule[Logic
 case a: AttributeReference =>
   a.withName(relation.output.find(_.semanticEquals(a)).get.name)
   }
-}

Review comment:
   Could you filter out this unsupported case outside 
`replaceTableScanWithPartitionMetadata` (I think this filtering is not related 
to normalization)? e.g., in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala#L53-L55





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


igreenfield commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419832871



##
File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
##
@@ -17,21 +17,106 @@
 
 package org.apache.spark.util
 
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
 
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, 
ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
 
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 
 private[spark] object ThreadUtils {
 
+  object MDCAwareThreadPoolExecutor {
+def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor 
= {
+  // The values needs to be synced with `Executors.newCachedThreadPool`
+  new MDCAwareThreadPoolExecutor(
+0,
+Integer.MAX_VALUE,
+60L,
+TimeUnit.SECONDS,
+new SynchronousQueue[Runnable],
+threadFactory)
+}
+
+def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): 
ThreadPoolExecutor = {
+  // The values needs to be synced with `Executors.newFixedThreadPool`
+  new MDCAwareThreadPoolExecutor(
+nThreads,
+nThreads,
+0L,
+TimeUnit.MILLISECONDS,
+new LinkedBlockingQueue[Runnable],
+threadFactory)
+}
+
+def newSingleThreadExecutor(threadFactory: ThreadFactory): ExecutorService 
= {
+  // The values needs to be synced with `Executors.newSingleThreadExecutor`
+  Executors.unconfigurableExecutorService(
+new MDCAwareThreadPoolExecutor(

Review comment:
   @viirya @HyukjinKwon 
   1. Yes, I am confident: finalize is "best-effort"; the JVM does not guarantee to 
run finalize, and in later JVM versions it becomes 
   ```
   @Deprecated(since="9")
   protected void finalize() throws Throwable { }
   ```
   2. OK, I will add a comment, even though I think no one should rely on 
finalize in any case.
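   For context on what an MDC-aware pool has to do, here is a minimal sketch of the 
propagation technique (illustration only; `MdcPropagation` and its method are 
hypothetical names, not the PR's implementation): the caller's MDC map is captured at 
submission time, installed on the worker thread before the task runs, and the worker's 
previous map is restored afterwards.
   ```scala
   import java.util.concurrent.{Executors, ExecutorService}

   import org.slf4j.MDC

   // Hypothetical helper: wrap a task so it runs with the MDC captured at
   // submission time, then restore whatever the worker thread had before.
   object MdcPropagation {
     def wrap(task: Runnable): Runnable = {
       val captured = MDC.getCopyOfContextMap // may be null if the caller set nothing
       new Runnable {
         override def run(): Unit = {
           val previous = MDC.getCopyOfContextMap
           if (captured != null) MDC.setContextMap(captured) else MDC.clear()
           try task.run()
           finally {
             if (previous != null) MDC.setContextMap(previous) else MDC.clear()
           }
         }
       }
     }
   }

   // Usage sketch (e.g. in spark-shell): decorate tasks before handing them to a plain pool.
   val pool: ExecutorService = Executors.newFixedThreadPool(4)
   MDC.put("mdcTaskName", "task 0.0 in stage 1.0")
   pool.execute(MdcPropagation.wrap(new Runnable {
     override def run(): Unit = println(MDC.get("mdcTaskName"))
   }))
   ```
   The factory methods quoted in the diff mirror the `Executors` defaults on purpose, 
so only the MDC handling should differ from the built-in pools.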





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-623800734







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-623800734







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-04 Thread GitBox


imback82 commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r419830462



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule adds a `CoalesceBuckets` logical plan if one side of two bucketed 
tables can be
+ * coalesced when the two bucketed tables are joined and they differ in the 
number of buckets.
+ */
+object CoalesceBucketsInJoin extends Rule[LogicalPlan]  {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-04 Thread GitBox


SparkQA commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-623800494


   **[Test build #122297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122297/testReport)**
 for PR 28123 at commit 
[`eeb0ec7`](https://github.com/apache/spark/commit/eeb0ec7385a9253df0e64316ef2bf069cccf9b6f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-04 Thread GitBox


imback82 commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r419830393



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule adds a `CoalesceBuckets` logical plan if one side of two bucketed 
tables can be
+ * coalesced when the two bucketed tables are joined and they differ in the 
number of buckets.

Review comment:
   Added more comments
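   For readers skimming the thread, a rough sketch of the coalescing condition the 
description above refers to (an assumption about the general approach, not the code 
under review): one side is coalesced down to the other's bucket count only when the 
larger count is divisible by the smaller one.
   ```scala
   // Hypothetical helper: returns the coalesced bucket count, if coalescing is possible.
   def coalescedNumBuckets(leftBuckets: Int, rightBuckets: Int): Option[Int] = {
     val large = math.max(leftBuckets, rightBuckets)
     val small = math.min(leftBuckets, rightBuckets)
     // Safe only when the larger count is a multiple of the smaller one: every row in
     // bucket i of the larger side hashes into bucket (i % small) of the smaller side.
     if (large != small && large % small == 0) Some(small) else None
   }

   // e.g. an 8-bucket table joined with a 4-bucket table can be read as 4 buckets:
   assert(coalescedNumBuckets(8, 4) == Some(4))
   assert(coalescedNumBuckets(8, 3).isEmpty)
   ```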





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-04 Thread GitBox


imback82 commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r419830249



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##
@@ -221,3 +223,22 @@ object FileSourceStrategy extends Strategy with Logging {
 case _ => Nil
   }
 }
+
+/**
+ * Extractor that handles `CoalesceBuckets` in the child plan extracted from 
`ScanOperation`.
+ */
+object ScanOperationWithCoalescedBuckets {

Review comment:
   I added it in CoalesceBucketsInEquiJoinSuite.scala. (Please let me know 
if it makes more sense to have it in FileSourceStrategySuite.scala)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dilipbiswal commented on pull request #28433: [SPARK-31030] [DOCS] [FOLLOWUP] Replace HTML Table by Markdown Table

2020-05-04 Thread GitBox


dilipbiswal commented on pull request #28433:
URL: https://github.com/apache/spark/pull/28433#issuecomment-623799161


   @maropu @srowen Can this get in now, if there are no other comments? The 
reason I ask is that @huaxingao has a big PR which is changing a lot of files. If 
this can get in first, then she can rebase and push.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419827929



##
File path: docs/sql-ref-literals.md
##
@@ -71,128 +68,114 @@ SELECT 'it\'s $10.' AS col;
 +-+
 |It's $10.|
 +-+
-{% endhighlight %}
+```
 
 ### Binary Literal
 
 A binary literal is used to specify a byte sequence value.
 
  Syntax
 
-{% highlight sql %}
+```sql
 X { 'c [ ... ]' | "c [ ... ]" }
-{% endhighlight %}
+```
+
+ Parameters
 
- Parameters
+* **c**
 
-
-  c
-  
 One character from the character set.

Review comment:
   I believe there is a limitation on the characters that are allowed in the 
binary literal?
   For example, I tried:
   SELECT X'zz' AS col and got an exception.
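   For reference, a minimal illustration of that limitation in spark-shell (a sketch; 
the exact parser error may vary by version): only hexadecimal digits are accepted 
between the quotes, so `X'zz'` is rejected.
   ```scala
   // Valid binary literal: the characters form a hexadecimal byte sequence.
   spark.sql("SELECT X'123456' AS col").show()

   // Invalid binary literal: 'z' is not a hex digit, so this fails with an exception,
   // as observed above.
   spark.sql("SELECT X'zz' AS col").show()
   ```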





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419826842



##
File path: docs/sql-ref-literals.md
##
@@ -35,22 +35,19 @@ A string literal is used to specify a character string 
value.
 
  Syntax
 
-{% highlight sql %}
+```sql
 'c [ ... ]' | "c [ ... ]"

Review comment:
   @huaxingao the parameter `c` looks a bit odd, especially in the new 
format. What do you think of `character` or `any_char`, or something like that?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623793408







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623793408







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623791128


   **[Test build #122296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122296/testReport)**
 for PR 28451 at commit 
[`66d82ca`](https://github.com/apache/spark/commit/66d82ca5a45d3de05429cab2e09ef723cb4d426b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


SparkQA commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623793351


   **[Test build #122296 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122296/testReport)**
 for PR 28451 at commit 
[`66d82ca`](https://github.com/apache/spark/commit/66d82ca5a45d3de05429cab2e09ef723cb4d426b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623791394







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623791469


   > Rather, we should remove indents in the other places for following the 
result format?
   
   It's better to remove the indents. I will spend some time finding all the error 
messages. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623791394







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


SparkQA commented on pull request #28451:
URL: https://github.com/apache/spark/pull/28451#issuecomment-623791128


   **[Test build #122296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122296/testReport)**
 for PR 28451 at commit 
[`66d82ca`](https://github.com/apache/spark/commit/66d82ca5a45d3de05429cab2e09ef723cb4d426b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419818939



##
File path: docs/_data/menu-sql.yaml
##
@@ -156,22 +156,22 @@
   url: sql-ref-syntax-qry-select-distribute-by.html
 - text: LIMIT Clause 
   url: sql-ref-syntax-qry-select-limit.html
+- text: Common Table Expression
+  url: sql-ref-syntax-qry-select-cte.html
+- text: Inline Table
+  url: sql-ref-syntax-qry-select-inline-table.html
 - text: JOIN
   url: sql-ref-syntax-qry-select-join.html
 - text: Join Hints
   url: sql-ref-syntax-qry-select-hints.html
+- text: LIKE Predicate
+  url: sql-ref-syntax-qry-select-like.html
 - text: Set Operators

Review comment:
   I didn't change the order of the first 8 clauses. I think these should 
be grouped together. But I changed the rest to put them in alphabetical order.
   Screenshot: https://user-images.githubusercontent.com/13592258/81027881-33663000-8e34-11ea-9305-3d62a1769362.png
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters

2020-05-04 Thread GitBox


maropu commented on pull request #28383:
URL: https://github.com/apache/spark/pull/28383#issuecomment-623791024


   > Applying OptimizeMetadataOnlyQuery rule will generate scalar-subquery.
   
   Is this statement true? It seems the test query itself has a subquery.
   ```
   // Analyzed plan of the test query
   Aggregate [partcol1#40], [partcol1#40, max(partcol2#41) AS partcol2#71]
   +- Filter ((partcol1#40 = scalar-subquery#70 []) AND (partcol2#41 = even))
  :  +- Aggregate [max(partcol1#40) AS max(partcol1)#73]
  : +- SubqueryAlias spark_catalog.default.srcpart
  :+- Relation[col1#38,col2#39,partcol1#40,partcol2#41] parquet
  +- SubqueryAlias spark_catalog.default.srcpart
 +- Relation[col1#38,col2#39,partcol1#40,partcol2#41] parquet
   ```
   I think the root cause is just that an unsupported `partitionFilters` expression 
(a subquery) is passed into `FileIndex.listFiles`.
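   As a concrete illustration of that root cause and the guard being suggested, a 
minimal sketch (names and placement are assumptions, not the actual patch): detect 
whether any candidate partition filter still embeds a subquery, and if so skip the 
metadata-only rewrite instead of handing those filters to `FileIndex.listFiles`.
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{Expression, SubqueryExpression}

   // True if any candidate partition filter still contains a subquery, which
   // FileIndex.listFiles cannot evaluate.
   def containsSubquery(partitionFilters: Seq[Expression]): Boolean =
     partitionFilters.exists(_.find(_.isInstanceOf[SubqueryExpression]).isDefined)

   // A rule could then bail out early, e.g.: if (containsSubquery(filters)) return plan
   ```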



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28383: [SPARK-31590][SQL] Metadata-only queries should not include subquery in partition filters

2020-05-04 Thread GitBox


maropu commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r419817895



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
##
@@ -117,7 +117,7 @@ case class OptimizeMetadataOnlyQuery(catalog: 
SessionCatalog) extends Rule[Logic
 case a: AttributeReference =>
   a.withName(relation.output.find(_.semanticEquals(a)).get.name)
   }
-}

Review comment:
   Could you filter out this unsupported case in advance? e.g., in 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala#L53-L55





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


huaxingao commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419817932



##
File path: docs/sql-ref-functions-udf-aggregate.md
##
@@ -113,26 +102,26 @@ OPTIONS (
 );
 
 SELECT * FROM employees;
--- +---+--+
--- |   name|salary|
--- +---+--+
--- |Michael|  3000|
--- |   Andy|  4500|
--- | Justin|  3500|
--- |  Berta|  4000|
--- +---+--+
++---+--+
+|   name|salary|
++---+--+
+|Michael|  3000|
+|   Andy|  4500|
+| Justin|  3500|
+|  Berta|  4000|
++---+--+
 
 SELECT myAverage(salary) as average_salary FROM employees;
--- +--+
--- |average_salary|
--- +--+
--- |3750.0|
--- +--+
-{% endhighlight %}
++--+
+|average_salary|
++--+
+|3750.0|
++--+
+```
 

Review comment:
   I will take a look at this. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic alloca

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623789508


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122293/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic alloca

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623789506


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation

2020-05-04 Thread GitBox


SparkQA removed a comment on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623755415


   **[Test build #122293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122293/testReport)**
 for PR 28452 at commit 
[`34c0724`](https://github.com/apache/spark/commit/34c072421e664b8b366574d2f77fce7b0bea3412).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623789506







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation

2020-05-04 Thread GitBox


SparkQA commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623789260


   **[Test build #122293 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122293/testReport)**
 for PR 28452 at commit 
[`34c0724`](https://github.com/apache/spark/commit/34c072421e664b8b366574d2f77fce7b0bea3412).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28336: [SPARK-31559][YARN] Re-obtain tokens at the startup of AM for yarn cluster mode if principal and keytab are available

2020-05-04 Thread GitBox


HeartSaVioR commented on pull request #28336:
URL: https://github.com/apache/spark/pull/28336#issuecomment-623787519


   Friendly reminder to @vanzin @squito.
   Also cc @jerryshao, @tgravescs to widen the pool of available reviewers.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28452: [SPARK-27963][FOLLOW-UP][DOCS][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation

2020-05-04 Thread GitBox


HyukjinKwon commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623787379


   Merged to master and branch-3.0 since the related linter tests already 
passed. I don't believe this change affects other tests or the build.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28452: [MINOR][CORE] Remove `for testing` because CleanerListener is used ExecutorMonitor during dynamic allocation

2020-05-04 Thread GitBox


HyukjinKwon commented on pull request #28452:
URL: https://github.com/apache/spark/pull/28452#issuecomment-623787185


   Let me just turn this into a follow-up of SPARK-27963.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28430: [SPARK-31372][SQL][TEST][FOLLOW-UP] Improve ExpressionsSchemaSuite so that easy to track the diff.

2020-05-04 Thread GitBox


HyukjinKwon commented on pull request #28430:
URL: https://github.com/apache/spark/pull/28430#issuecomment-623786752


   Merged to master and branch-3.0.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


HyukjinKwon commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419813534



##
File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
##
@@ -17,21 +17,106 @@
 
 package org.apache.spark.util
 
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
 
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, 
ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
 
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 
 private[spark] object ThreadUtils {
 
+  object MDCAwareThreadPoolExecutor {
+def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor 
= {
+  // The values needs to be synced with `Executors.newCachedThreadPool`
+  new MDCAwareThreadPoolExecutor(
+0,
+Integer.MAX_VALUE,
+60L,
+TimeUnit.SECONDS,
+new SynchronousQueue[Runnable],
+threadFactory)
+}
+
+def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): 
ThreadPoolExecutor = {
+  // The values needs to be synced with `Executors.newFixedThreadPool`
+  new MDCAwareThreadPoolExecutor(
+nThreads,
+nThreads,
+0L,
+TimeUnit.MILLISECONDS,
+new LinkedBlockingQueue[Runnable],
+threadFactory)
+}
+
+def newSingleThreadExecutor(threadFactory: ThreadFactory): ExecutorService 
= {
+  // The values needs to be synced with `Executors.newSingleThreadExecutor`
+  Executors.unconfigurableExecutorService(
+new MDCAwareThreadPoolExecutor(

Review comment:
   But let's at least leave a comment here in case people use it without 
knowing there's a difference vs. the built-in `newSingleThreadExecutor`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


maropu commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419790030



##
File path: docs/_data/menu-sql.yaml
##
@@ -156,22 +156,22 @@
   url: sql-ref-syntax-qry-select-distribute-by.html
 - text: LIMIT Clause 
   url: sql-ref-syntax-qry-select-limit.html
+- text: Common Table Expression
+  url: sql-ref-syntax-qry-select-cte.html
+- text: Inline Table
+  url: sql-ref-syntax-qry-select-inline-table.html
 - text: JOIN
   url: sql-ref-syntax-qry-select-join.html
 - text: Join Hints
   url: sql-ref-syntax-qry-select-hints.html
+- text: LIKE Predicate
+  url: sql-ref-syntax-qry-select-like.html
 - text: Set Operators

Review comment:
   Why do we need the changes in this file?

##
File path: docs/sql-ref-identifier.md
##
@@ -27,54 +27,47 @@ An identifier is a string used to identify a database 
object such as a table, vi
 
  Regular Identifier
 
-{% highlight sql %}
+```sql
 { letter | digit | '_' } [ , ... ]
-{% endhighlight %}
+```
 Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords 
cannot be used as identifiers. For more details, please refer to [ANSI 
Compliance](sql-ref-ansi-compliance.html).
 
  Delimited Identifier
 
-{% highlight sql %}
+```sql
 `c [ ... ]`
-{% endhighlight %}
+```
 
 ### Parameters
 
-
-  letter
-  
+* **letter**
+
 Any letter from A-Z or a-z.
-  
-
-
-  digit
-  
+
+* **digit**
+
 Any numeral from 0 to 9.
-  
-
-
-  c
-  
+
+* **c**
+
 Any character from the character set. Use ` to escape special 
characters (e.g., `).
-  
-
 
 ### Examples
 
-{% highlight sql %}
+```sql
 -- This CREATE TABLE fails with ParseException because of the illegal 
identifier name a.b
 CREATE TABLE test (a.b int);
-org.apache.spark.sql.catalyst.parser.ParseException:
-no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
+  org.apache.spark.sql.catalyst.parser.ParseException:

Review comment:
   Rather, we should remove indents in the other places for following the 
result format?

##
File path: docs/sql-ref-functions-udf-aggregate.md
##
@@ -113,26 +102,26 @@ OPTIONS (
 );
 
 SELECT * FROM employees;
--- +---+--+
--- |   name|salary|
--- +---+--+
--- |Michael|  3000|
--- |   Andy|  4500|
--- | Justin|  3500|
--- |  Berta|  4000|
--- +---+--+
++---+--+
+|   name|salary|
++---+--+
+|Michael|  3000|
+|   Andy|  4500|
+| Justin|  3500|
+|  Berta|  4000|
++---+--+
 
 SELECT myAverage(salary) as average_salary FROM employees;
--- +--+
--- |average_salary|
--- +--+
--- |3750.0|
--- +--+
-{% endhighlight %}
++--+
+|average_salary|
++--+
+|3750.0|
++--+
+```
 

Review comment:
   Can't we avoid this tag, too?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dilipbiswal commented on a change in pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

2020-05-04 Thread GitBox


dilipbiswal commented on a change in pull request #28451:
URL: https://github.com/apache/spark/pull/28451#discussion_r419810299



##
File path: docs/sql-ref-ansi-compliance.md
##
@@ -66,7 +66,7 @@ This means that in case an operation causes overflows, the 
result is the same wi
 On the other hand, Spark SQL returns null for decimal overflows.
 When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in 
numeric and interval arithmetic operations, it throws an arithmetic exception 
at runtime.
 
-{% highlight sql %}
+```sql
 -- `spark.sql.ansi.enabled=true`

Review comment:
   @huaxingao I know that it's not related to the format change that you are 
making in this PR. But shouldn't we have a SET statement here, so users can 
copy-paste the command into their shell to see the behavior? Perhaps we discussed 
it in the PR that added this clause. Just a question :-)
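   A sketch of what such a copy-pasteable snippet could look like (the wording is an 
assumption, not the current doc text); per the section being edited, the overflow then 
fails at runtime once ANSI mode is on:
   ```scala
   // e.g. in spark-shell:
   spark.sql("SET spark.sql.ansi.enabled=true")
   // With ANSI mode enabled, integer overflow raises an arithmetic exception at
   // runtime instead of silently wrapping around.
   spark.sql("SELECT 2147483647 + 1").show()
   ```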





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


viirya commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r419808850



##
File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
##
@@ -17,21 +17,106 @@
 
 package org.apache.spark.util
 
+import java.util
 import java.util.concurrent._
 import java.util.concurrent.locks.ReentrantLock
 
+import com.google.common.util.concurrent.{MoreExecutors, ThreadFactoryBuilder}
 import scala.concurrent.{Awaitable, ExecutionContext, 
ExecutionContextExecutor, Future}
 import scala.concurrent.duration.{Duration, FiniteDuration}
 import scala.language.higherKinds
 import scala.util.control.NonFatal
 
-import com.google.common.util.concurrent.ThreadFactoryBuilder
-
 import org.apache.spark.SparkException
 import org.apache.spark.rpc.RpcAbortException
 
 private[spark] object ThreadUtils {
 
+  object MDCAwareThreadPoolExecutor {
+def newCachedThreadPool(threadFactory: ThreadFactory): ThreadPoolExecutor 
= {
+  // The values needs to be synced with `Executors.newCachedThreadPool`
+  new MDCAwareThreadPoolExecutor(
+0,
+Integer.MAX_VALUE,
+60L,
+TimeUnit.SECONDS,
+new SynchronousQueue[Runnable],
+threadFactory)
+}
+
+def newFixedThreadPool(nThreads: Int, threadFactory: ThreadFactory): 
ThreadPoolExecutor = {
+  // The values needs to be synced with `Executors.newFixedThreadPool`
+  new MDCAwareThreadPoolExecutor(
+nThreads,
+nThreads,
+0L,
+TimeUnit.MILLISECONDS,
+new LinkedBlockingQueue[Runnable],
+threadFactory)
+}
+
+def newSingleThreadExecutor(threadFactory: ThreadFactory): ExecutorService 
= {
+  // The values needs to be synced with `Executors.newSingleThreadExecutor`
+  Executors.unconfigurableExecutorService(
+new MDCAwareThreadPoolExecutor(

Review comment:
   The finalize method helps to shut down the underlying thread pool. I'm 
not sure whether we rely on this in Spark. If you are confident about this 
change, then it should be fine. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623778968







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623778968







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


SparkQA commented on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623778576


   **[Test build #122295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122295/testReport)**
 for PR 28239 at commit 
[`453c5a5`](https://github.com/apache/spark/commit/453c5a5e0717d2681fc2e0ed4f48ca093d4020a0).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #28239: [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException

2020-05-04 Thread GitBox


maropu commented on pull request #28239:
URL: https://github.com/apache/spark/pull/28239#issuecomment-623775852


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


SparkQA commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623774430


   **[Test build #122294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122294/testReport)**
 for PR 26624 at commit 
[`50a68c7`](https://github.com/apache/spark/commit/50a68c7eded44ce6eb5afaff6f8170c4add70a25).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


AmplabJenkins removed a comment on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623772844







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-04 Thread GitBox


AmplabJenkins commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-623772844







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


