date:20200804

[GitHub] [spark] yanxiaole commented on a change in pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



yanxiaole commented on a change in pull request #29350:
URL: https://github.com/apache/spark/pull/29350#discussion_r465514321



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -530,10 +530,16 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
   // If the file is currently not being tracked by the SHS, add an entry for it and try
   // to parse it. This will allow the cleaner code to detect the file as stale later on
   // if it was not possible to parse it.
-  listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, LogType.EventLogs,
-    None, None, reader.fileSizeForLastIndex, reader.lastIndex, None,
-    reader.completed))
-  reader.fileSizeForLastIndex > 0
+  try {
+    listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime,
+      LogType.EventLogs, None, None, reader.fileSizeForLastIndex, reader.lastIndex,
+      None, reader.completed))
+    reader.fileSizeForLastIndex > 0
+  } catch {
+    case _: FileNotFoundException => false
+  }
+case _: FileNotFoundException =>

Review comment:
   added
   
   > nit: I'd have empty new line after `}`.
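
   The pattern in the hunk above, sketched in isolation. This is only an illustration under stated assumptions: `Listing`, its `write` signature, and `shouldProcess` are hypothetical stand-ins, not the real `FsHistoryProvider` code. The point is that a log file vanishing between listing and writing makes the predicate return `false` for that one file instead of aborting the whole scan.

```scala
import java.io.FileNotFoundException

object HistoryScanSketch {
  // Hypothetical stand-in for the listing store; not Spark's API.
  final class Listing {
    private var entries = Vector.empty[String]

    // Simulates a write that fails when the underlying file has disappeared.
    def write(path: String, exists: Boolean): Unit = {
      if (!exists) throw new FileNotFoundException(path)
      entries = entries :+ path
    }

    def size: Int = entries.size
  }

  // Returns true when the log should be parsed. A log that disappears
  // mid-scan is skipped (false) rather than propagating the exception.
  def shouldProcess(listing: Listing, path: String, exists: Boolean, fileSize: Long): Boolean =
    try {
      listing.write(path, exists)
      fileSize > 0
    } catch {
      case _: FileNotFoundException => false
    }
}
```

   Used as a filter predicate, one missing file then only drops that file from the current scan; the remaining entries are still processed.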
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



SparkQA commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-669018092


   **[Test build #127084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127084/testReport)**
 for PR 28761 at commit 
[`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).




[GitHub] [spark] viirya commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



viirya commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-669016101


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-669015435


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127077/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29311:
URL: https://github.com/apache/spark/pull/29311#issuecomment-669015509








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-669015466








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29347:
URL: https://github.com/apache/spark/pull/29347#issuecomment-669015453








[GitHub] [spark] AmplabJenkins commented on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29311:
URL: https://github.com/apache/spark/pull/29311#issuecomment-669015509








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-669015424


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-669015424








[GitHub] [spark] AmplabJenkins commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-669015466








[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



SparkQA commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-669015315


   **[Test build #127077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127077/testReport)**
 for PR 29031 at commit 
[`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



SparkQA removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668981110


   **[Test build #127077 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127077/testReport)**
 for PR 29031 at commit 
[`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).




[GitHub] [spark] AmplabJenkins commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29347:
URL: https://github.com/apache/spark/pull/29347#issuecomment-669015443








[GitHub] [spark] SparkQA commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-04 Thread GitBox



SparkQA commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-669014937


   **[Test build #127082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127082/testReport)**
 for PR 29324 at commit 
[`dfc7387`](https://github.com/apache/spark/commit/dfc73873f6342774a8f96782c2e4853ede86a190).




[GitHub] [spark] SparkQA commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



SparkQA commented on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-669014868


   **[Test build #127080 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127080/testReport)**
 for PR 29350 at commit 
[`d1bf4ca`](https://github.com/apache/spark/commit/d1bf4caa30231a41fd4e6025c34af71c5f15e07e).




[GitHub] [spark] SparkQA commented on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



SparkQA commented on pull request #29311:
URL: https://github.com/apache/spark/pull/29311#issuecomment-669014971


   **[Test build #127083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127083/testReport)**
 for PR 29311 at commit 
[`9547682`](https://github.com/apache/spark/commit/95476826b8e99e0d4e0453d2d161e1f8bfabcd8f).




[GitHub] [spark] SparkQA commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-04 Thread GitBox



SparkQA commented on pull request #29347:
URL: https://github.com/apache/spark/pull/29347#issuecomment-669014908


   **[Test build #127081 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127081/testReport)**
 for PR 29347 at commit 
[`61ba3ce`](https://github.com/apache/spark/commit/61ba3cee43b06d4987f5ae71bd01b20ce674766a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-669014517


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127078/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-669014509








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-669014509


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



SparkQA removed a comment on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-668983196


   **[Test build #127078 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127078/testReport)**
 for PR 28761 at commit 
[`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).




[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



SparkQA commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-669014400


   **[Test build #127078 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127078/testReport)**
 for PR 28761 at commit 
[`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `abstract class ParquetFilterSuite extends QueryTest with ParquetTest 
with SharedSparkSession `
 * `class OrcFilterSuite extends OrcTest with SharedSparkSession `
 * `class OrcFilterSuite extends OrcTest with SharedSparkSession `




[GitHub] [spark] yaooqinn commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-04 Thread GitBox



yaooqinn commented on pull request #29347:
URL: https://github.com/apache/spark/pull/29347#issuecomment-669011351


   retest this please




[GitHub] [spark] gengliangwang commented on a change in pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string

2020-08-04 Thread GitBox



gengliangwang commented on a change in pull request #29137:
URL: https://github.com/apache/spark/pull/29137#discussion_r465503960



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##
@@ -279,34 +280,69 @@ case class AdaptiveSparkPlanExec(
       prefix: String = "",
       addSuffix: Boolean = false,
       maxFields: Int,
-      printNodeId: Boolean): Unit = {
-    super.generateTreeString(depth,
+      printNodeId: Boolean,
+      indent: Int = 0): Unit = {
+    super.generateTreeString(
+      depth,
       lastChildren,
       append,
       verbose,
       prefix,
       addSuffix,
       maxFields,
+      printNodeId,
+      indent)
+    generateTreeStringWithHeader(
+      if (isFinalPlan) "Final Plan" else "Current Plan",

Review comment:
   > Why it's Current Plan not Final Plan in EXPLAIN FORMATTED
   
   @cloud-fan I think it is from here.
   @allisonwang-db Could you update the PR description to explain the 
difference?






[GitHub] [spark] MaxGekk commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



MaxGekk commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465503811



##
File path: docs/sql-migration-guide.md
##
@@ -34,6 +34,8 @@ license: |
 
   - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting 
them to strings. For instance, the `show()` action and the `CAST` expression 
use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the 
same purpose. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
 
+  - In Spark 3.1, `CAST` converts NULL elements of structures, arrays and maps 
to "null". In Spark 3.0 or earlier, NULL elements are converted to empty 
strings. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.

Review comment:
   https://user-images.githubusercontent.com/1580697/89379682-e594e880-d6fe-11ea-8594-040a428cdce3.png
   






[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-04 Thread GitBox



viirya commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r465502978



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##
@@ -184,15 +189,61 @@ abstract class JdbcDialect extends Serializable {
   /**
    * Rename an existing table.
    *
-   * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax.
-   *
    * @param oldTable The existing table.
    * @param newTable New name of the table.
    * @return The SQL statement to use for renaming the table.
    */
   def renameTable(oldTable: String, newTable: String): String = {
     s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = {
+    val updateClause = ArrayBuilder.make[String]
+    for (change <- changes) {
+      change match {
+        case add: AddColumn if add.fieldNames.length == 1 =>
+          add.fieldNames match {
+            case Array(name) =>
+              val dataType = JdbcUtils.getJdbcType(add.dataType(), this).databaseTypeDefinition
+              updateClause += s"ALTER TABLE $tableName ADD COLUMN $name $dataType"
+          }
+        case rename: RenameColumn if rename.fieldNames.length == 1 =>
+          rename.fieldNames match {
+            case Array(name) =>
+              updateClause += s"ALTER TABLE $tableName RENAME COLUMN $name TO ${rename.newName}"
+          }
+        case delete: DeleteColumn if delete.fieldNames.length == 1 =>
+          delete.fieldNames match {
+            case Array(name) =>
+              updateClause += s"ALTER TABLE $tableName DROP COLUMN $name"
+          }
+        case update: UpdateColumnType if update.fieldNames.length == 1 =>
+          update.fieldNames match {
+            case Array(name) =>
+              val dataType = JdbcUtils.getJdbcType(update.newDataType(), this)

Review comment:
   nit: We know `fieldNames` must be one element now. We don't need `match` 
and can just access `fieldNames(0)`.
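
   A minimal sketch of what the nit above suggests, under stated assumptions: `Change`, `RenameColumn`, and `DeleteColumn` here are simplified, hypothetical stand-ins for Spark's `TableChange` classes, and only two of the cases are shown. Because the guard already establishes `fieldNames.length == 1`, the inner `match` on `Array(name)` can be replaced by `fieldNames(0)`.

```scala
object AlterTableSketch {
  // Hypothetical, simplified stand-ins for Spark's TableChange classes.
  sealed trait Change { def fieldNames: Array[String] }
  final case class RenameColumn(fieldNames: Array[String], newName: String) extends Change
  final case class DeleteColumn(fieldNames: Array[String]) extends Change

  // Builds one ALTER TABLE statement per supported change.
  // Changes on nested columns (more than one name part) are skipped here;
  // how the real method treats them is not visible in the quoted hunk.
  def alterTable(tableName: String, changes: Seq[Change]): Array[String] =
    changes.collect {
      case rename: RenameColumn if rename.fieldNames.length == 1 =>
        // The guard guarantees exactly one element, so index 0 is safe.
        s"ALTER TABLE $tableName RENAME COLUMN ${rename.fieldNames(0)} TO ${rename.newName}"
      case delete: DeleteColumn if delete.fieldNames.length == 1 =>
        s"ALTER TABLE $tableName DROP COLUMN ${delete.fieldNames(0)}"
    }.toArray
}
```

   Whether to keep the guard or move the length check into the pattern is a style choice; the sketch keeps the guard because that is the form the diff already uses.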






[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-04 Thread GitBox



viirya commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r465501617



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##
@@ -184,15 +189,61 @@ abstract class JdbcDialect extends Serializable {
   /**
    * Rename an existing table.
    *
-   * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax.
-   *
    * @param oldTable The existing table.
    * @param newTable New name of the table.
    * @return The SQL statement to use for renaming the table.
    */
   def renameTable(oldTable: String, newTable: String): String = {
     s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.

Review comment:
   Because you will override this method in other places, not here. 
Remember to remove this later. :)






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-669006266








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29347:
URL: https://github.com/apache/spark/pull/29347#issuecomment-669006278








[GitHub] [spark] AmplabJenkins commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29347:
URL: https://github.com/apache/spark/pull/29347#issuecomment-669006278








[GitHub] [spark] AmplabJenkins commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-669006266








[GitHub] [spark] MaxGekk commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



MaxGekk commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465500077



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -321,7 +321,9 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
           var i = 1
           while (i < array.numElements) {
             builder.append(",")
-            if (!array.isNullAt(i)) {
+            if (array.isNullAt(i)) {

Review comment:
   Here, we have 3 branches. Don't think your proposal is applicable.
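
   One plausible reading of the three branches, sketched outside Spark. Everything here is an assumption made for illustration: `arrayToString`, the `legacy` parameter (standing in for `spark.sql.legacy.castComplexTypesToString.enabled`), and the use of `Option[String]` for nullable elements are hypothetical, and the output format is simplified compared with the real `Cast` code. It follows the migration-guide wording from this PR: NULL elements become "null" by default and empty strings in legacy mode.

```scala
object CastNullSketch {
  // legacy = true mimics the pre-3.1 behaviour described in the migration note.
  def arrayToString(elems: Seq[Option[String]], legacy: Boolean): String = {
    val builder = new StringBuilder("[")
    var i = 0
    while (i < elems.length) {
      if (i > 0) builder.append(",")
      elems(i) match {
        case None if legacy => ()                    // branch 1: NULL, legacy mode appends nothing
        case None           => builder.append("null") // branch 2: NULL rendered as "null"
        case Some(value)    => builder.append(value)  // branch 3: non-NULL element
      }
      i += 1
    }
    builder.append("]").toString
  }
}
```

   With three outcomes, a plain `if`/`else` on `isNullAt` is not enough without nesting, which may be why a two-way rewrite does not apply here.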






[GitHub] [spark] MaxGekk commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



MaxGekk commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465499676



##
File path: docs/sql-migration-guide.md
##
@@ -34,6 +34,8 @@ license: |
 
   - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting 
them to strings. For instance, the `show()` action and the `CAST` expression 
use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the 
same purpose. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
 
+  - In Spark 3.1, `CAST` converts NULL elements of structures, arrays and maps 
to "null". In Spark 3.0 or earlier, NULL elements are converted to empty 
strings. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.

Review comment:
   IntelliJ IDEA recommended against the passive voice ;-) Native 
speakers don't like it.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-669004419








[GitHub] [spark] AmplabJenkins commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-669004419








[GitHub] [spark] SparkQA commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



SparkQA commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-669003736


   **[Test build #127072 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127072/testReport)**
 for PR 29339 at commit 
[`61cae52`](https://github.com/apache/spark/commit/61cae52344983f267867da0360e34b4a2a0c9e83).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



SparkQA removed a comment on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-668927081


   **[Test build #127072 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127072/testReport)**
 for PR 29339 at commit 
[`61cae52`](https://github.com/apache/spark/commit/61cae52344983f267867da0360e34b4a2a0c9e83).




[GitHub] [spark] agrawaldevesh commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-08-04 Thread GitBox



agrawaldevesh commented on pull request #29304:
URL: https://github.com/apache/spark/pull/29304#issuecomment-669001099


   @leanken ... this was a GREAT GREAT attempt and I certainly learned a ton 
from it :-P. I am curious whether you profiled it while running Q16 and have 
a sense of where the low-hanging fruit might be?
   
   We can also consider the hybrid approach we discussed, where we double the 
memory and keep the original HashedRelation for steps 1 and 2 of the paper but 
use the inverted indices only for step 3. That might help with the regression 
that the inverted index causes in the single-key case.
   
   In any case, I am totally with @cloud-fan that supporting single-key 
shuffled hash join is more important (as I also noted in my previous comment):
   
   > As a diversion, I wonder if it makes sense instead to support the single 
key case but for distributed scenario (shuffle hash join and like) if this 
multi-key stuff is really hard. I think the single-key distributed case would 
be more common.
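
For readers unfamiliar with the single-key case discussed above, here is a minimal sketch of null-aware anti join semantics (SQL `NOT IN`) over in-memory sequences. It is an illustration under stated assumptions, not Spark's implementation and not a mapping to the paper's steps: `nullAwareAntiJoin`, `stream`, and `build` are hypothetical names, and `Option[Int]` models a nullable join key.

```scala
// Hypothetical helper: single-key null-aware anti join (SQL NOT IN).
def nullAwareAntiJoin(
    stream: Seq[Option[Int]],
    build: Seq[Option[Int]]): Seq[Option[Int]] = {
  if (build.isEmpty) {
    // NOT IN over an empty set holds for every row, including NULL keys.
    stream
  } else if (build.contains(None)) {
    // A NULL on the build side makes every comparison unknown: no row qualifies.
    Seq.empty
  } else {
    val keys = build.flatten.toSet
    // A NULL stream key compares as unknown; keep non-null keys without a match.
    stream.filter {
      case Some(k) => !keys.contains(k)
      case None => false
    }
  }
}
```

The three cases are what make the join "null aware"; a plain anti join would only need the final set lookup.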




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-04 Thread GitBox



HyukjinKwon commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r465493233



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -184,7 +179,7 @@ object JdbcUtils extends Logging {
 }
   }
 
-  private def getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType = {
+  private[sql] def getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType = 
{

Review comment:
   It's okay to remove `private[sql]` because `execution` is already in the 
private package (see also SPARK-16964)






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29332:
URL: https://github.com/apache/spark/pull/29332#issuecomment-668996964








[GitHub] [spark] AmplabJenkins commented on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29332:
URL: https://github.com/apache/spark/pull/29332#issuecomment-668996964








[GitHub] [spark] SparkQA removed a comment on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources

2020-08-04 Thread GitBox



SparkQA removed a comment on pull request #29332:
URL: https://github.com/apache/spark/pull/29332#issuecomment-668959450


   **[Test build #127074 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127074/testReport)**
 for PR 29332 at commit 
[`786b145`](https://github.com/apache/spark/commit/786b145771a12004668ea35afdc6e4054924bfed).




[GitHub] [spark] SparkQA commented on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources

2020-08-04 Thread GitBox



SparkQA commented on pull request #29332:
URL: https://github.com/apache/spark/pull/29332#issuecomment-668996210


   **[Test build #127074 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127074/testReport)**
 for PR 29332 at commit 
[`786b145`](https://github.com/apache/spark/commit/786b145771a12004668ea35afdc6e4054924bfed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



cloud-fan commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465489920



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -895,6 +905,10 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 """
   }
 
+  private def outNullElem(buffer: ExprValue): Block = {
+if (legacyCastToStr) code";" else code"""$buffer.append(" null");"""

Review comment:
   `code";"` -> `code""`?






[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



cloud-fan commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465489374



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -342,7 +344,9 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
   val valueToUTF8String = castToString(vt)
   builder.append(keyToUTF8String(keyArray.get(0, 
kt)).asInstanceOf[UTF8String])
   builder.append(" ->")
-  if (!valueArray.isNullAt(0)) {
+  if (valueArray.isNullAt(0)) {

Review comment:
   ditto






[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



cloud-fan commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465489447



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -351,7 +355,9 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 builder.append(", ")
 builder.append(keyToUTF8String(keyArray.get(i, 
kt)).asInstanceOf[UTF8String])
 builder.append(" ->")
-if (!valueArray.isNullAt(i)) {
+if (valueArray.isNullAt(i)) {

Review comment:
   ditto

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -369,13 +375,17 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 if (row.numFields > 0) {
   val st = fields.map(_.dataType)
   val toUTF8StringFuncs = st.map(castToString)
-  if (!row.isNullAt(0)) {
+  if (row.isNullAt(0)) {

Review comment:
   ditto






[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



cloud-fan commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465489268



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -321,7 +321,9 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
   var i = 1
   while (i < array.numElements) {
 builder.append(",")
-if (!array.isNullAt(i)) {
+if (array.isNullAt(i)) {

Review comment:
   `if (array.isNullAt(i) && !legacyCastToStr)`






[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



cloud-fan commented on a change in pull request #29311:
URL: https://github.com/apache/spark/pull/29311#discussion_r465488975



##
File path: docs/sql-migration-guide.md
##
@@ -34,6 +34,8 @@ license: |
 
   - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting 
them to strings. For instance, the `show()` action and the `CAST` expression 
use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the 
same purpose. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
 
+  - In Spark 3.1, `CAST` converts NULL elements of structures, arrays and maps 
to "null". In Spark 3.0 or earlier, NULL elements are converted to empty 
strings. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.

Review comment:
   `In Spark 3.1, NULL elements of structures, arrays and maps are 
converted to "null" in casting them to strings.`
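
To make the migration note concrete for maps, the sketch below models how a map with a NULL value might render under each setting. It assumes, based on the two migration-guide entries quoted above, that the same legacy flag controls both the bracket style and the NULL rendering; `mapToString`, `entries`, and `legacy` are hypothetical names and this is not the code in `Cast.scala`.

```scala
// Hypothetical model of casting a map to a string.
// `legacy` stands in for spark.sql.legacy.castComplexTypesToString.enabled.
def mapToString(entries: Seq[(String, Option[String])], legacy: Boolean): String = {
  // Spark 3.0 and earlier wrap maps in [], Spark 3.1 wraps them in {}.
  val (open, close) = if (legacy) ("[", "]") else ("{", "}")
  val body = entries.map { case (k, v) =>
    val value = v match {
      case Some(s) => " " + s
      // NULL value: empty string in legacy mode, "null" otherwise.
      case None => if (legacy) "" else " null"
    }
    k + " ->" + value
  }.mkString(", ")
  open + body + close
}
```

Under these assumptions, a map with entries `a -> x` and `b -> NULL` renders as `{a -> x, b -> null}` by default and as `[a -> x, b ->]` with the legacy flag on.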






[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s

2020-08-04 Thread GitBox



stijndehaes commented on pull request #28423:
URL: https://github.com/apache/spark/pull/28423#issuecomment-668991161


   @jkleckner I have never had a problem with the driver watching the 
executors. I think there was already a fallback mechanism there, but I never 
looked into the code for that one.




[GitHub] [spark] JkSelf commented on pull request #29266: [SPARK-32464][SQL] Support skew handling on join that has one side wi…

2020-08-04 Thread GitBox



JkSelf commented on pull request #29266:
URL: https://github.com/apache/spark/pull/29266#issuecomment-668989319


   Can you show the plan changes in the UI? Also, does changing the partition 
number on the bucketed side introduce an additional shuffle?




[GitHub] [spark] cloud-fan commented on pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string

2020-08-04 Thread GitBox



cloud-fan commented on pull request #29137:
URL: https://github.com/apache/spark/pull/29137#issuecomment-668988346


   ```
   == Physical Plan ==
   AdaptiveSparkPlan (9)
   +- == Current Plan ==
  BroadcastHashJoin Inner BuildRight (8)
  :- Project (3)
  :  +- Filter (2)
   +- == Initial Plan ==
  BroadcastHashJoin Inner BuildRight (8)
  :- Project (3)
  :  +- Filter (2)
   ```
   
   Why is it `Current Plan` rather than `Final Plan` in `EXPLAIN FORMATTED`? 
And can we use an example where AQE changes the plan?




[GitHub] [spark] ScrapCodes commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



ScrapCodes commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668988454


   Alrighty, then I will skip this for the 2.4.7 release, even though I still 
feel that this might be safe and good for people in general, given that jackson 
2.6.7 had its last release in 2017 (ignoring the micro CVE backport release in 2019).
   
   If we do need to revisit this at a later point, we can always take care of 
it in the next release. Thanks !
   
   > Alright, one more 
[FasterXML/jackson-databind#2798](https://github.com/FasterXML/jackson-databind/issues/2798),
 Shall we consider 2.10.x ? change is the same and the later is free from whole 
store house of CVEs.
   
   We can ignore this, as it concerns code paths (org.jsecurity and 
com.pastdev.httpcomponents) that Spark (hopefully) does not use.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29353:
URL: https://github.com/apache/spark/pull/29353#issuecomment-668986802








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29352:
URL: https://github.com/apache/spark/pull/29352#issuecomment-668986647








[GitHub] [spark] AmplabJenkins commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29353:
URL: https://github.com/apache/spark/pull/29353#issuecomment-668986802








[GitHub] [spark] AmplabJenkins commented on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29352:
URL: https://github.com/apache/spark/pull/29352#issuecomment-668986647








[GitHub] [spark] HyukjinKwon commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



HyukjinKwon commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668986454


   For the PR itself, I agree with @srowen's and @dongjoon-hyun's comments at 
https://github.com/apache/spark/pull/29334#issuecomment-668044607 and 
https://github.com/apache/spark/pull/29334#issuecomment-668381120. Sorry for 
the late response.




[GitHub] [spark] SparkQA removed a comment on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



SparkQA removed a comment on pull request #29353:
URL: https://github.com/apache/spark/pull/29353#issuecomment-668914387


   **[Test build #127071 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127071/testReport)**
 for PR 29353 at commit 
[`13af454`](https://github.com/apache/spark/commit/13af454cf4bb53a1c6b2c5f6aef19b6b92420e26).




[GitHub] [spark] SparkQA commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



SparkQA commented on pull request #29353:
URL: https://github.com/apache/spark/pull/29353#issuecomment-668986200


   **[Test build #127071 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127071/testReport)**
 for PR 29353 at commit 
[`13af454`](https://github.com/apache/spark/commit/13af454cf4bb53a1c6b2c5f6aef19b6b92420e26).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats

2020-08-04 Thread GitBox



SparkQA removed a comment on pull request #29352:
URL: https://github.com/apache/spark/pull/29352#issuecomment-668911605


   **[Test build #127070 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127070/testReport)**
 for PR 29352 at commit 
[`b10f1fc`](https://github.com/apache/spark/commit/b10f1fc4e3b6ac7564df9ca1503cc30e43043223).




[GitHub] [spark] SparkQA commented on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats

2020-08-04 Thread GitBox



SparkQA commented on pull request #29352:
URL: https://github.com/apache/spark/pull/29352#issuecomment-668985998


   **[Test build #127070 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127070/testReport)**
 for PR 29352 at commit 
[`b10f1fc`](https://github.com/apache/spark/commit/b10f1fc4e3b6ac7564df9ca1503cc30e43043223).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `trait ArrayWithNestedStructTest extends ReadSchemaTest `
 * `trait MapWithNestedStructTest extends ReadSchemaTest `




[GitHub] [spark] HyukjinKwon commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



HyukjinKwon commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668985839


   Yes, I think it does. That was one of the reasons why I was hesitant. FYI, 
there was a bit of discussion and some updates about resources at SPARK-32264.
   
   Given that PRs to `branch-2.4` or `branch-3.0` are not very frequent, 
I guess it might be okay ...




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-668985557








[GitHub] [spark] AmplabJenkins commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-668985557








[GitHub] [spark] SparkQA commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



SparkQA commented on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-668985252


   **[Test build #127079 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127079/testReport)**
 for PR 29350 at commit 
[`d113709`](https://github.com/apache/spark/commit/d11370942478256158fc70b755ca536d95606a04).




[GitHub] [spark] HeartSaVioR commented on a change in pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



HeartSaVioR commented on a change in pull request #29350:
URL: https://github.com/apache/spark/pull/29350#discussion_r465477234



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -530,10 +530,16 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
   // If the file is currently not being tracked by the SHS, add an entry for it and try
   // to parse it. This will allow the cleaner code to detect the file as stale later on
   // if it was not possible to parse it.
-  listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, LogType.EventLogs,
-None, None, reader.fileSizeForLastIndex, reader.lastIndex, None,
-reader.completed))
-  reader.fileSizeForLastIndex > 0
+  try {
+listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime,
+  LogType.EventLogs, None, None, reader.fileSizeForLastIndex, reader.lastIndex,
+  None, reader.completed))
+reader.fileSizeForLastIndex > 0
+  } catch {
+case _: FileNotFoundException => false
+  }
+case _: FileNotFoundException =>

Review comment:
   nit: I'd have empty new line after `}`.
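
For readers following the diff above, the pattern under review can be sketched in isolation. This is a minimal sketch, not the code from the PR: `ScanSketch`, `shouldProcess`, and `sizeOf` are hypothetical names, and it assumes the `FileNotFoundException` arises because a log file is renamed or removed while the scan is running.

```scala
import java.io.FileNotFoundException

object ScanSketch {
  // `sizeOf` is a hypothetical stand-in for the file system calls made
  // while inspecting an event log; it may throw if the file disappears.
  def shouldProcess(path: String, sizeOf: String => Long): Boolean =
    try {
      sizeOf(path) > 0
    } catch {
      // Skip only this entry instead of letting the exception abort
      // the scan of every other log in the directory.
      case _: FileNotFoundException => false
    }
}
```

With this shape, a caller filtering a list of log paths keeps going when one entry vanishes: the missing entry simply yields `false`.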






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-668691724


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-668983580








[GitHub] [spark] HeartSaVioR commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-04 Thread GitBox



HeartSaVioR commented on pull request #29350:
URL: https://github.com/apache/spark/pull/29350#issuecomment-668983630


   ok to test




[GitHub] [spark] ScrapCodes commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



ScrapCodes commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668983574


   @HyukjinKwon Thanks for looking into it. It is my mistake; I did not 
know that GitHub Actions are not ported to other branches yet. I am not 100% 
sure whether they should be ported; does it affect our total resource 
capacity on GitHub Actions?




[GitHub] [spark] AmplabJenkins commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-668983580








[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



SparkQA commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-668983196


   **[Test build #127078 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127078/testReport)**
 for PR 28761 at commit 
[`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).




[GitHub] [spark] viirya commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC

2020-08-04 Thread GitBox



viirya commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-668982975


   retest this please




[GitHub] [spark] HyukjinKwon commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



HyukjinKwon commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668981971


   @ScrapCodes, yes, the m2 is corrupted on the Jenkins machine. In master, 
this dependency check is skipped in Jenkins, and the GitHub Actions build runs 
it instead.
   
   In other branches, this is not done yet (see SPARK-32249). I was hesitant 
about porting GitHub Actions back, but maybe we should go ahead and port it.
   
   In the meantime, maybe we can try removing `.m2` on the Jenkins machine, 
@shaneknapp. This solves the problem temporarily, IIRC.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668981428








[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668981428








[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



SparkQA commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668981110


   **[Test build #127077 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127077/testReport)**
 for PR 29031 at commit 
[`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).




[GitHub] [spark] cloud-fan commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



cloud-fan commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668980317


   add to whitelist




[GitHub] [spark] allisonwang-db opened a new pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



allisonwang-db opened a new pull request #29031:
URL: https://github.com/apache/spark/pull/29031


   ### What changes were proposed in this pull request?
   This PR added a physical rule to remove redundant project nodes. A 
`ProjectExec` is redundant when
   1. It has the same output attributes and order as its child's output when 
ordering of these attributes is required.
   2. It has the same output attributes as its child's output when attribute 
output ordering is not required.
   
   For example:
   After Filter: 
   ```
   == Physical Plan ==
   *(1) Project [a#14L, b#15L, c#16, key#17]
   +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 5))
      +- *(1) ColumnarToRow
         +- FileScan parquet [a#14L,b#15L,c#16,key#17]
   ```
   The `Project a#14L, b#15L, c#16, key#17` is redundant because its output is 
exactly the same as the filter's output.
   
   Before Aggregate:
   ```
   == Physical Plan ==
   *(2) HashAggregate(keys=[key#17], functions=[sum(a#14L), last(b#15L, false)], output=[sum_a#39L, key#17, last_b#41L])
   +- Exchange hashpartitioning(key#17, 5), true, [id=#77]
      +- *(1) HashAggregate(keys=[key#17], functions=[partial_sum(a#14L), partial_last(b#15L, false)], output=[key#17, sum#49L, last#50L, valueSet#51])
         +- *(1) Project [key#17, a#14L, b#15L]
            +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 100))
               +- *(1) ColumnarToRow
                  +- FileScan parquet [a#14L,b#15L,key#17]
   ```
   The `Project key#17, a#14L, b#15L` is redundant because the hash aggregate 
does not require its child plan's output to be in a specific order.
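
The two conditions above can be sketched as a small predicate. This is a simplified illustration, not the rule from the PR: `Attr`, `RedundantProjectSketch`, and `isRedundant` are hypothetical names, and attributes are compared by a plain numeric id rather than by Spark's own attribute classes.

```scala
// Hypothetical, simplified attribute: a name plus a unique expression id.
final case class Attr(name: String, exprId: Long)

object RedundantProjectSketch {
  // A project is treated as redundant when it outputs the same attributes
  // as its child: in the same order if the parent requires ordering,
  // in any order otherwise.
  def isRedundant(
      projectOutput: Seq[Attr],
      childOutput: Seq[Attr],
      orderMatters: Boolean): Boolean = {
    val projectIds = projectOutput.map(_.exprId)
    val childIds = childOutput.map(_.exprId)
    if (orderMatters) projectIds == childIds
    else projectIds.sorted == childIds.sorted
  }
}
```

In the aggregate example, the project outputs `[key, a, b]` while the scan below it outputs `[a, b, key]`, so the predicate holds only when `orderMatters` is false.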
   
   ### Why are the changes needed?
   
   It removes unnecessary query nodes and makes the query plan cleaner.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests




[GitHub] [spark] cloud-fan commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



cloud-fan commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668980249


   retest this please




[GitHub] [spark] cloud-fan closed pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-04 Thread GitBox



cloud-fan closed pull request #29031:
URL: https://github.com/apache/spark/pull/29031


   




[GitHub] [spark] cloud-fan commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-08-04 Thread GitBox



cloud-fan commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-668980063


   @skambha you will still hit the sum bug when whole-stage codegen is disabled 
(or when Spark falls back from it because the generated code exceeds 64KB), right?
   
   We are not introducing a new correctness bug. It's an existing bug and the 
backport makes it more visible.
   
   We've added a mechanism in the master branch to check the streaming state 
store backward compatibility. If we want to backport the actual fix, we need to 
backport this mechanism as well. I think that's too many things to backport.
   
   How about this: we force-enable ANSI behavior for decimal sum, so that the 
behavior is the same without fixing the UnsafeRow bug? It's not an ideal fix, 
but it should be safer to backport. @skambha what do you think? Can you help 
with it?
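
As a rough illustration of the two behaviors being discussed, the sketch below sums decimals and either returns no value or raises an error when the result exceeds a fixed precision. It is an assumption-laden model, not Spark's implementation: `DecimalSumSketch` and `checkedSum` are hypothetical names, and the `ansi` flag only mimics the idea that ANSI mode reports overflow as an error instead of producing null.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalSumSketch {
  // Sums the values exactly, then checks whether the total still fits
  // into `precision` digits. On overflow: throw when `ansi` is set,
  // otherwise return None (standing in for a SQL null).
  def checkedSum(
      values: Seq[JBigDecimal],
      precision: Int,
      ansi: Boolean): Option[JBigDecimal] = {
    val total = values.foldLeft(JBigDecimal.ZERO)((acc, v) => acc.add(v))
    if (total.precision <= precision) Some(total)
    else if (ansi) throw new ArithmeticException(
      s"Decimal sum needs ${total.precision} digits, limit is $precision")
    else None
  }
}
```

The point of the sketch is only that the same overflow can surface as a missing value or as an error, depending on the mode.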




[GitHub] [spark] ScrapCodes commented on pull request #29334: [WIP][RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



ScrapCodes commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668977421


   Alright, one more: https://github.com/FasterXML/jackson-databind/issues/2798. 
Shall we consider 2.10.x? The change is the same, and the latter is free from a 
whole storehouse of CVEs.




[GitHub] [spark] ScrapCodes commented on pull request #29334: [WIP][RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-04 Thread GitBox



ScrapCodes commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668976387


   @Fokko, @srowen and @dongjoon-hyun Thanks for the feedback. I agree with 
you. But I wanted to give this patch a try: can it be done in a clean way? 
This patch is almost the same as the one in 
https://github.com/apache/spark/pull/21596
   
   Also, the behaviour change was due to a bug in jackson 2.7.x. I have tested 
it, and it worked fine.
   
   Somehow Jenkins does not pass due to a corrupted m2 cache. Do you think it 
is worth a try?




[GitHub] [spark] MaxGekk commented on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings

2020-08-04 Thread GitBox



MaxGekk commented on pull request #29311:
URL: https://github.com/apache/spark/pull/29311#issuecomment-668976181


   @cloud-fan @maropu @HyukjinKwon Please, review this PR.




[GitHub] [spark] HyukjinKwon commented on pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



HyukjinKwon commented on pull request #29354:
URL: https://github.com/apache/spark/pull/29354#issuecomment-668975373


   It would be great if we could elaborate on how it improves performance. We 
can focus on the fix only instead of mixing refactoring in here.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



HyukjinKwon commented on a change in pull request #29354:
URL: https://github.com/apache/spark/pull/29354#discussion_r465466999



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
##
@@ -1,427 +0,0 @@
-/*

Review comment:
   Hey, let's avoid renaming a file like this. It's difficult to track what 
was changed. Can we keep the class and file names?






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



HyukjinKwon commented on a change in pull request #29353:
URL: https://github.com/apache/spark/pull/29353#discussion_r465466392



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
##
@@ -72,137 +74,191 @@ class OrcDeserializer(
   /**
* Creates a writer to write ORC values to Catalyst data structure at the 
given ordinal.
*/
-  private def newWriter(
-  dataType: DataType, updater: CatalystDataUpdater): (Int, 
WritableComparable[_]) => Unit =
-dataType match {
-  case NullType => (ordinal, _) =>
-updater.setNullAt(ordinal)
-
-  case BooleanType => (ordinal, value) =>
-updater.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get)
-
-  case ByteType => (ordinal, value) =>
-updater.setByte(ordinal, value.asInstanceOf[ByteWritable].get)
-
-  case ShortType => (ordinal, value) =>
-updater.setShort(ordinal, value.asInstanceOf[ShortWritable].get)
-
-  case IntegerType => (ordinal, value) =>
-updater.setInt(ordinal, value.asInstanceOf[IntWritable].get)
-
-  case LongType => (ordinal, value) =>
-updater.setLong(ordinal, value.asInstanceOf[LongWritable].get)
-
-  case FloatType => (ordinal, value) =>
-updater.setFloat(ordinal, value.asInstanceOf[FloatWritable].get)
-
-  case DoubleType => (ordinal, value) =>
-updater.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get)
-
-  case StringType => (ordinal, value) =>
-updater.set(ordinal, 
UTF8String.fromBytes(value.asInstanceOf[Text].copyBytes))
-
-  case BinaryType => (ordinal, value) =>
-val binary = value.asInstanceOf[BytesWritable]
-val bytes = new Array[Byte](binary.getLength)
-System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength)
-updater.set(ordinal, bytes)
-
-  case DateType => (ordinal, value) =>
-updater.setInt(ordinal, OrcShimUtils.getGregorianDays(value))
-
-  case TimestampType => (ordinal, value) =>
-updater.setLong(ordinal, 
DateTimeUtils.fromJavaTimestamp(value.asInstanceOf[OrcTimestamp]))
-
-  case DecimalType.Fixed(precision, scale) => (ordinal, value) =>
-val v = OrcShimUtils.getDecimal(value)
-v.changePrecision(precision, scale)
-updater.set(ordinal, v)
-
-  case st: StructType => (ordinal, value) =>
-val result = new SpecificInternalRow(st)
-val fieldUpdater = new RowUpdater(result)
-val fieldConverters = st.map(_.dataType).map { dt =>
-  newWriter(dt, fieldUpdater)
-}.toArray
-val orcStruct = value.asInstanceOf[OrcStruct]
-
-var i = 0
-while (i < st.length) {
-  val value = orcStruct.getFieldValue(i)
-  if (value == null) {
-result.setNullAt(i)
-  } else {
-fieldConverters(i)(i, value)
+  private def newWriter(dataType: DataType, reuseObj: Boolean):
+  (CatalystDataUpdater, Int, WritableComparable[_]) => Unit = dataType match {
+case NullType => (updater, ordinal, _) =>
+  updater.setNullAt(ordinal)
+
+case BooleanType => (updater, ordinal, value) =>
+  updater.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get)
+
+case ByteType => (updater, ordinal, value) =>
+  updater.setByte(ordinal, value.asInstanceOf[ByteWritable].get)
+
+case ShortType => (updater, ordinal, value) =>
+  updater.setShort(ordinal, value.asInstanceOf[ShortWritable].get)
+
+case IntegerType => (updater, ordinal, value) =>
+  updater.setInt(ordinal, value.asInstanceOf[IntWritable].get)
+
+case LongType => (updater, ordinal, value) =>
+  updater.setLong(ordinal, value.asInstanceOf[LongWritable].get)
+
+case FloatType => (updater, ordinal, value) =>
+  updater.setFloat(ordinal, value.asInstanceOf[FloatWritable].get)
+
+case DoubleType => (updater, ordinal, value) =>
+  updater.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get)
+
+case StringType => (updater, ordinal, value) =>
+  updater.set(ordinal, 
UTF8String.fromBytes(value.asInstanceOf[Text].copyBytes))
+
+case BinaryType => (updater, ordinal, value) =>
+  val binary = value.asInstanceOf[BytesWritable]
+  val bytes = new Array[Byte](binary.getLength)
+  System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength)
+  updater.set(ordinal, bytes)
+
+case DateType => (updater, ordinal, value) =>
+  updater.setInt(ordinal, OrcShimUtils.getGregorianDays(value))
+
+case TimestampType => (updater, ordinal, value) =>
+  updater.setLong(ordinal, 
DateTimeUtils.fromJavaTimestamp(value.asInstanceOf[OrcTimestamp]))
+
+case DecimalType.Fixed(precision, scale) => (updater, ordinal, value) =>
+  val v = OrcShimUtils.getDecimal(value)
+  v.changePrecision(precision, scale)
+  updater.set(ordinal, v)
+
+case st: StructType =>
+  val createRow: () => InternalRow = getRowCrea

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-668973946








[GitHub] [spark] AmplabJenkins commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-668973946








[GitHub] [spark] SparkQA commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-04 Thread GitBox



SparkQA commented on pull request #29211:
URL: https://github.com/apache/spark/pull/29211#issuecomment-668973694


   **[Test build #127076 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127076/testReport)**
 for PR 29211 at commit 
[`477e77f`](https://github.com/apache/spark/commit/477e77f5ad3755af81e5c0304eeb37ce273a18d4).




[GitHub] [spark] SparkQA commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



SparkQA commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-668973681


   **[Test build #127075 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127075/testReport)**
 for PR 29339 at commit 
[`b1fc84b`](https://github.com/apache/spark/commit/b1fc84bfd33fadbeccf118a5db2e752e241947dc).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-668972189








[GitHub] [spark] AmplabJenkins commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-668972189








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-04 Thread GitBox



HyukjinKwon commented on a change in pull request #29353:
URL: https://github.com/apache/spark/pull/29353#discussion_r465460837



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
##
@@ -72,137 +74,191 @@ class OrcDeserializer(
   /**
    * Creates a writer to write ORC values to Catalyst data structure at the given ordinal.
    */
-  private def newWriter(
-      dataType: DataType, updater: CatalystDataUpdater): (Int, WritableComparable[_]) => Unit =
-    dataType match {
-      case NullType => (ordinal, _) =>
-        updater.setNullAt(ordinal)
-
-      case BooleanType => (ordinal, value) =>
-        updater.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get)
-
-      case ByteType => (ordinal, value) =>
-        updater.setByte(ordinal, value.asInstanceOf[ByteWritable].get)
-
-      case ShortType => (ordinal, value) =>
-        updater.setShort(ordinal, value.asInstanceOf[ShortWritable].get)
-
-      case IntegerType => (ordinal, value) =>
-        updater.setInt(ordinal, value.asInstanceOf[IntWritable].get)
-
-      case LongType => (ordinal, value) =>
-        updater.setLong(ordinal, value.asInstanceOf[LongWritable].get)
-
-      case FloatType => (ordinal, value) =>
-        updater.setFloat(ordinal, value.asInstanceOf[FloatWritable].get)
-
-      case DoubleType => (ordinal, value) =>
-        updater.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get)
-
-      case StringType => (ordinal, value) =>
-        updater.set(ordinal, UTF8String.fromBytes(value.asInstanceOf[Text].copyBytes))
-
-      case BinaryType => (ordinal, value) =>
-        val binary = value.asInstanceOf[BytesWritable]
-        val bytes = new Array[Byte](binary.getLength)
-        System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength)
-        updater.set(ordinal, bytes)
-
-      case DateType => (ordinal, value) =>
-        updater.setInt(ordinal, OrcShimUtils.getGregorianDays(value))
-
-      case TimestampType => (ordinal, value) =>
-        updater.setLong(ordinal, DateTimeUtils.fromJavaTimestamp(value.asInstanceOf[OrcTimestamp]))
-
-      case DecimalType.Fixed(precision, scale) => (ordinal, value) =>
-        val v = OrcShimUtils.getDecimal(value)
-        v.changePrecision(precision, scale)
-        updater.set(ordinal, v)
-
-      case st: StructType => (ordinal, value) =>
-        val result = new SpecificInternalRow(st)
-        val fieldUpdater = new RowUpdater(result)
-        val fieldConverters = st.map(_.dataType).map { dt =>
-          newWriter(dt, fieldUpdater)
-        }.toArray
-        val orcStruct = value.asInstanceOf[OrcStruct]
-
-        var i = 0
-        while (i < st.length) {
-          val value = orcStruct.getFieldValue(i)
-          if (value == null) {
-            result.setNullAt(i)
-          } else {
-            fieldConverters(i)(i, value)
+  private def newWriter(dataType: DataType, reuseObj: Boolean):
+  (CatalystDataUpdater, Int, WritableComparable[_]) => Unit = dataType match {

Review comment:
   I would keep this style as:
   
   ```scala
   private def newWriter(
       dataType: DataType, reuseObj: Boolean)
       : (CatalystDataUpdater, Int, WritableComparable[_]) => Unit =
     dataType match {
       case NullType => (updater, ordinal, _) =>
         ...
     }
   ```
   
   to reduce the diff.
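
For readers following the signature change in the diff above, here is a minimal sketch of the difference between the two shapes. It assumes simplified stand-in types (`DataType`, `Updater`, `ArrayUpdater` below are hypothetical and are not Spark's classes), and it leaves out the `reuseObj` flag entirely. In the old shape the updater is captured when the writer is built; in the new shape it is an argument of the returned function, so a writer built once per data type can be applied to different updaters.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
sealed trait DataType
case object IntType extends DataType
case object StrType extends DataType

// Hypothetical minimal updater: stores a value at an ordinal.
trait Updater {
  def set(ordinal: Int, value: Any): Unit
}

final class ArrayUpdater(size: Int) extends Updater {
  val slots: Array[Any] = new Array[Any](size)
  def set(ordinal: Int, value: Any): Unit = {
    slots(ordinal) = value
  }
}

object WriterSketch {
  // Old shape: the updater is captured in the closure,
  // so every target updater needs its own writer instance.
  def newWriterCaptured(
      dataType: DataType, updater: Updater): (Int, Any) => Unit =
    dataType match {
      case IntType => (ordinal, value) =>
        updater.set(ordinal, value.asInstanceOf[Int])

      case StrType => (ordinal, value) =>
        updater.set(ordinal, value.toString)
    }

  // New shape: the updater is passed at call time,
  // so one writer per data type can serve any updater.
  def newWriter(dataType: DataType): (Updater, Int, Any) => Unit =
    dataType match {
      case IntType => (updater, ordinal, value) =>
        updater.set(ordinal, value.asInstanceOf[Int])

      case StrType => (updater, ordinal, value) =>
        updater.set(ordinal, value.toString)
    }
}
```

With the second shape, `WriterSketch.newWriter(IntType)` can be created once and then called as `writer(someUpdater, ordinal, value)` for each row or nested struct, rather than rebuilding a closure per updater.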






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-668968321


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127073/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-04 Thread GitBox



AmplabJenkins removed a comment on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-668968315


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-04 Thread GitBox



AmplabJenkins commented on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-668968315







