[GitHub] [spark] yanxiaole commented on a change in pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
yanxiaole commented on a change in pull request #29350: URL: https://github.com/apache/spark/pull/29350#discussion_r465514321 ## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ## @@ -530,10 +530,16 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) // If the file is currently not being tracked by the SHS, add an entry for it and try // to parse it. This will allow the cleaner code to detect the file as stale later on // if it was not possible to parse it. - listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, LogType.EventLogs, -None, None, reader.fileSizeForLastIndex, reader.lastIndex, None, -reader.completed)) - reader.fileSizeForLastIndex > 0 + try { +listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, + LogType.EventLogs, None, None, reader.fileSizeForLastIndex, reader.lastIndex, + None, reader.completed)) +reader.fileSizeForLastIndex > 0 + } catch { +case _: FileNotFoundException => false + } +case _: FileNotFoundException => Review comment: added > nit: I'd have empty new line after `}`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
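The hunk above wraps `listing.write(...)` in a `try`, so a `FileNotFoundException` makes the check return `false` instead of aborting the whole log scan. Below is a minimal sketch of that pattern. `LogEntry`, `EventLogReader` and `ScanSketch` are hypothetical stand-ins, not Spark's `LogInfo` or reader classes. The scenario where the file vanishes because an in-progress log is renamed when the application finishes is an illustrative assumption.

```scala
import java.io.FileNotFoundException
import scala.collection.mutable

// Hypothetical stand-in for the listing entry; not Spark's LogInfo.
final case class LogEntry(path: String, scanTime: Long, fileSize: Long)

// Hypothetical reader abstraction. Reading the size may throw
// FileNotFoundException if the file was renamed or deleted mid-scan.
trait EventLogReader {
  def rootPath: String
  def fileSizeForLastIndex: Long
}

object ScanSketch {
  // In-memory stand-in for the SHS listing store.
  val listing: mutable.Map[String, LogEntry] = mutable.Map.empty

  /** Records a newly seen log; returns true only if it exists and is non-empty. */
  def trackNewLog(reader: EventLogReader, scanTime: Long): Boolean =
    try {
      listing(reader.rootPath) = LogEntry(reader.rootPath, scanTime, reader.fileSizeForLastIndex)
      reader.fileSizeForLastIndex > 0
    } catch {
      // The file vanished between listing and reading: skip it rather than
      // letting the exception abort the scan of every other log.
      case _: FileNotFoundException => false
    }

  /** Returns the paths of logs that should be parsed in this scan. */
  def scan(readers: Seq[EventLogReader], scanTime: Long): Seq[String] =
    readers.filter(r => trackNewLog(r, scanTime)).map(_.rootPath)
}
```

A reader whose size lookup throws is left out of the result and never written to the listing, while the remaining logs are still tracked.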
[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC
SparkQA commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669018092 **[Test build #127084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127084/testReport)** for PR 28761 at commit [`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).
[GitHub] [spark] viirya commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC
viirya commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669016101 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-669015435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127077/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
AmplabJenkins removed a comment on pull request #29311: URL: https://github.com/apache/spark/pull/29311#issuecomment-669015509
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
AmplabJenkins removed a comment on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-669015466
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs
AmplabJenkins removed a comment on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669015453
[GitHub] [spark] AmplabJenkins commented on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
AmplabJenkins commented on pull request #29311: URL: https://github.com/apache/spark/pull/29311#issuecomment-669015509
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-669015424 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-669015424
[GitHub] [spark] AmplabJenkins commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
AmplabJenkins commented on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-669015466
[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
SparkQA commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-669015315 **[Test build #127077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127077/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
SparkQA removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668981110 **[Test build #127077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127077/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).
[GitHub] [spark] AmplabJenkins commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs
AmplabJenkins commented on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669015443
[GitHub] [spark] SparkQA commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
SparkQA commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-669014937 **[Test build #127082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127082/testReport)** for PR 29324 at commit [`dfc7387`](https://github.com/apache/spark/commit/dfc73873f6342774a8f96782c2e4853ede86a190).
[GitHub] [spark] SparkQA commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
SparkQA commented on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-669014868 **[Test build #127080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127080/testReport)** for PR 29350 at commit [`d1bf4ca`](https://github.com/apache/spark/commit/d1bf4caa30231a41fd4e6025c34af71c5f15e07e).
[GitHub] [spark] SparkQA commented on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
SparkQA commented on pull request #29311: URL: https://github.com/apache/spark/pull/29311#issuecomment-669014971 **[Test build #127083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127083/testReport)** for PR 29311 at commit [`9547682`](https://github.com/apache/spark/commit/95476826b8e99e0d4e0453d2d161e1f8bfabcd8f).
[GitHub] [spark] SparkQA commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs
SparkQA commented on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669014908 **[Test build #127081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127081/testReport)** for PR 29347 at commit [`61ba3ce`](https://github.com/apache/spark/commit/61ba3cee43b06d4987f5ae71bd01b20ce674766a).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
AmplabJenkins removed a comment on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669014517 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127078/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
AmplabJenkins commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669014509
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
AmplabJenkins removed a comment on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669014509 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
SparkQA removed a comment on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-668983196 **[Test build #127078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127078/testReport)** for PR 28761 at commit [`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).
[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
SparkQA commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669014400 **[Test build #127078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127078/testReport)** for PR 28761 at commit [`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSparkSession ` * `class OrcFilterSuite extends OrcTest with SharedSparkSession ` * `class OrcFilterSuite extends OrcTest with SharedSparkSession `
[GitHub] [spark] yaooqinn commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs
yaooqinn commented on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669011351 retest this please
[GitHub] [spark] gengliangwang commented on a change in pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string
gengliangwang commented on a change in pull request #29137: URL: https://github.com/apache/spark/pull/29137#discussion_r465503960 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -279,34 +280,69 @@ case class AdaptiveSparkPlanExec( prefix: String = "", addSuffix: Boolean = false, maxFields: Int, - printNodeId: Boolean): Unit = { -super.generateTreeString(depth, + printNodeId: Boolean, + indent: Int = 0): Unit = { +super.generateTreeString( + depth, lastChildren, append, verbose, prefix, addSuffix, maxFields, + printNodeId, + indent) +generateTreeStringWithHeader( + if (isFinalPlan) "Final Plan" else "Current Plan", Review comment: > Why it's Current Plan not Final Plan in EXPLAIN FORMATTED @cloud-fan I think it is from here. @allisonwang-db Could you update the PR description to explain the difference?
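The reviewer traces the "Current Plan" label in EXPLAIN FORMATTED to the `if (isFinalPlan) "Final Plan" else "Current Plan"` argument in this hunk. Any plan rendered before adaptive execution has finalized therefore gets "Current Plan". A toy sketch of that header logic follows. `PlanNode` and `AqeTreeStringSketch` are hypothetical, and the output layout is invented for illustration. It does not reproduce Spark's real `generateTreeString` signature or EXPLAIN format.

```scala
// Hypothetical, simplified plan node; not Spark's SparkPlan or TreeNode API.
final case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

object AqeTreeStringSketch {
  // Appends one node per line, indenting children by depth.
  private def render(node: PlanNode, depth: Int, sb: StringBuilder): Unit = {
    sb.append("   " * depth).append(node.name).append('\n')
    node.children.foreach(child => render(child, depth + 1, sb))
  }

  // Appends a section header followed by the plan tree under it.
  private def withHeader(header: String, plan: PlanNode, sb: StringBuilder): Unit = {
    sb.append("+- == ").append(header).append(" ==\n")
    render(plan, 1, sb)
  }

  def treeString(current: PlanNode, initial: PlanNode, isFinalPlan: Boolean): String = {
    val sb = new StringBuilder
    sb.append(s"AdaptiveSparkPlan isFinalPlan=$isFinalPlan\n")
    // Same condition as the quoted hunk: the label depends only on isFinalPlan.
    withHeader(if (isFinalPlan) "Final Plan" else "Current Plan", current, sb)
    withHeader("Initial Plan", initial, sb)
    sb.toString
  }
}
```

In this sketch, EXPLAIN before execution shows "Current Plan" simply because `isFinalPlan` is still `false` at that point.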
[GitHub] [spark] MaxGekk commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
MaxGekk commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465503811 ## File path: docs/sql-migration-guide.md ## @@ -34,6 +34,8 @@ license: | - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting them to strings. For instance, the `show()` action and the `CAST` expression use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the same purpose. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. + - In Spark 3.1, `CAST` converts NULL elements of structures, arrays and maps to "null". In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. Review comment: https://user-images.githubusercontent.com/1580697/89379682-e594e880-d6fe-11ea-8594-040a428cdce3.png
[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
viirya commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r465502978 ## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ## @@ -184,15 +189,61 @@ abstract class JdbcDialect extends Serializable { /** * Rename an existing table. * - * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax. - * * @param oldTable The existing table. * @param newTable New name of the table. * @return The SQL statement to use for renaming the table. */ def renameTable(oldTable: String, newTable: String): String = { s"ALTER TABLE $oldTable RENAME TO $newTable" } + + /** + * Alter an existing table. + * TODO (SPARK-32523): Override this method in the dialects that have different syntax. + * + * @param tableName The name of the table to be altered. + * @param changes Changes to apply to the table. + * @return The SQL statements to use for altering the table. + */ + def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = { +val updateClause = ArrayBuilder.make[String] +for (change <- changes) { + change match { +case add: AddColumn if add.fieldNames.length == 1 => + add.fieldNames match { +case Array(name) => + val dataType = JdbcUtils.getJdbcType(add.dataType(), this).databaseTypeDefinition + updateClause += s"ALTER TABLE $tableName ADD COLUMN $name $dataType" + } +case rename: RenameColumn if rename.fieldNames.length == 1 => + rename.fieldNames match { +case Array(name) => + updateClause += s"ALTER TABLE $tableName RENAME COLUMN $name TO ${rename.newName}" + } +case delete: DeleteColumn if delete.fieldNames.length == 1 => + delete.fieldNames match { +case Array(name) => + updateClause += s"ALTER TABLE $tableName DROP COLUMN $name" + } +case update: UpdateColumnType if update.fieldNames.length == 1 => + update.fieldNames match { +case Array(name) => + val dataType = JdbcUtils.getJdbcType(update.newDataType(), this) Review comment: nit: We know 
`fieldNames` must be one element now. We don't need `match` and can just access `fieldNames(0)`.
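The nit can be illustrated with a simplified sketch. Once a guard such as `if c.fieldNames.length == 1` has matched, the inner `match` on `Array(name)` is redundant, and `fieldNames(0)` can be read directly. `Change`, `AddCol`, `RenameCol`, `DropCol` and `AlterTableSketch` are hypothetical stand-ins for the connector `TableChange` types. The SQL type is passed as a plain string instead of going through `JdbcUtils.getJdbcType`. Throwing on unsupported changes is also an assumption, because the quoted hunk does not show how the real method handles them.

```scala
// Hypothetical, simplified stand-ins for the connector TableChange types.
sealed trait Change { def fieldNames: Array[String] }
final case class AddCol(fieldNames: Array[String], sqlType: String) extends Change
final case class RenameCol(fieldNames: Array[String], newName: String) extends Change
final case class DropCol(fieldNames: Array[String]) extends Change

object AlterTableSketch {
  def alterTable(tableName: String, changes: Seq[Change]): Array[String] =
    changes.map {
      // Each guard already guarantees exactly one field name, so fieldNames(0)
      // is safe and no nested `match` on Array(name) is needed.
      case c: AddCol if c.fieldNames.length == 1 =>
        s"ALTER TABLE $tableName ADD COLUMN ${c.fieldNames(0)} ${c.sqlType}"
      case c: RenameCol if c.fieldNames.length == 1 =>
        s"ALTER TABLE $tableName RENAME COLUMN ${c.fieldNames(0)} TO ${c.newName}"
      case c: DropCol if c.fieldNames.length == 1 =>
        s"ALTER TABLE $tableName DROP COLUMN ${c.fieldNames(0)}"
      // Assumption: reject nested fields and anything else outright.
      case other =>
        throw new UnsupportedOperationException(s"Unsupported change: $other")
    }.toArray
}
```

For example, `AlterTableSketch.alterTable("t", Seq(DropCol(Array("c2"))))` returns the single statement `ALTER TABLE t DROP COLUMN c2`.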
[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
viirya commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r465501617 ## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ## @@ -184,15 +189,61 @@ abstract class JdbcDialect extends Serializable { /** * Rename an existing table. * - * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax. - * * @param oldTable The existing table. * @param newTable New name of the table. * @return The SQL statement to use for renaming the table. */ def renameTable(oldTable: String, newTable: String): String = { s"ALTER TABLE $oldTable RENAME TO $newTable" } + + /** + * Alter an existing table. + * TODO (SPARK-32523): Override this method in the dialects that have different syntax. Review comment: Because you will override this method in other places, not here. Remember to remove this later. :)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins removed a comment on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-669006266
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs
AmplabJenkins removed a comment on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669006278
[GitHub] [spark] AmplabJenkins commented on pull request #29347: [WIP][SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs
AmplabJenkins commented on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669006278
[GitHub] [spark] AmplabJenkins commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-669006266
[GitHub] [spark] MaxGekk commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
MaxGekk commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465500077 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -321,7 +321,9 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit var i = 1 while (i < array.numElements) { builder.append(",") -if (!array.isNullAt(i)) { +if (array.isNullAt(i)) { Review comment: Here, we have 3 branches. Don't think your proposal is applicable.
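Below is a rough model of the per-element logic under discussion. It assumes null elements are represented as `None` and takes the legacy flag named in the migration-guide hunk (`spark.sql.legacy.castComplexTypesToString.enabled`) as a plain boolean. It separates three cases: a null element, a null element in legacy mode, and a non-null element. `CastArraySketch` is hypothetical and is not Spark's `CastBase` code. The exact legacy spacing it produces is also an assumption.

```scala
object CastArraySketch {
  /**
   * Renders an array whose elements may be null (None).
   * legacy = true models the pre-3.1 behavior described in the migration guide
   * (null elements become empty strings); otherwise they become "null".
   */
  def arrayToString(elems: Seq[Option[String]], legacy: Boolean): String = {
    val sb = new StringBuilder("[")
    elems.zipWithIndex.foreach { case (elem, i) =>
      if (i > 0) sb.append(",")
      elem match {
        // Null element: write "null" unless the legacy flag is set,
        // in which case nothing is written (an empty string).
        case None => if (!legacy) sb.append(if (i > 0) " null" else "null")
        // Non-null element: a space follows the comma separator.
        case Some(v) => sb.append(if (i > 0) " " + v else v)
      }
    }
    sb.append("]").toString
  }
}
```

With this sketch, `arrayToString(Seq(Some("1"), None, Some("3")), legacy = false)` yields `[1, null, 3]`, while `legacy = true` yields `[1,, 3]`.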
[GitHub] [spark] MaxGekk commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
MaxGekk commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465499676 ## File path: docs/sql-migration-guide.md ## @@ -34,6 +34,8 @@ license: | - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting them to strings. For instance, the `show()` action and the `CAST` expression use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the same purpose. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. + - In Spark 3.1, `CAST` converts NULL elements of structures, arrays and maps to "null". In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. Review comment: The passive stuff was not recommended by IntelliJ IDEA ;-) Native speakers don't like it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins removed a comment on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-669004419
[GitHub] [spark] AmplabJenkins commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-669004419
[GitHub] [spark] SparkQA commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
SparkQA commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-669003736 **[Test build #127072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127072/testReport)** for PR 29339 at commit [`61cae52`](https://github.com/apache/spark/commit/61cae52344983f267867da0360e34b4a2a0c9e83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
SparkQA removed a comment on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-668927081 **[Test build #127072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127072/testReport)** for PR 29339 at commit [`61cae52`](https://github.com/apache/spark/commit/61cae52344983f267867da0360e34b4a2a0c9e83).
[GitHub] [spark] agrawaldevesh commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column
agrawaldevesh commented on pull request #29304: URL: https://github.com/apache/spark/pull/29304#issuecomment-669001099 @leanken ... this was a GREAT GREAT attempt and I certainly learned a ton from it :-P. I am curious whether you profiled it while running Q16 and have a sense of where the low-hanging fruit might be. We can also consider the hybrid approach we discussed, where we double the memory and keep the original HashedRelation for steps 1 and 2 of the paper but use the inverted indices only for step 3. That might help with the regression the inverted index causes in the single-key case. In any case, I totally agree with @cloud-fan that supporting the single-key case for shuffled hash join is more important. (As I also noted in my previous comment): > As a diversion, I wonder if it makes sense instead to support the single-key case but for the distributed scenario (shuffled hash join and the like) if this multi-key stuff is really hard. I think the single-key distributed case would be more common.
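For context on the semantics being optimized: a null-aware anti join evaluates `NOT IN`, where NULLs change the result in ways a plain anti join does not. The following is a minimal single-key sketch over in-memory Scala collections, assuming keys are modeled as `Option[Int]` (`None` standing in for SQL NULL). `NullAwareAntiJoinSketch` and `notIn` are hypothetical names; this is not the PR's HashedRelation-based implementation and says nothing about the steps of the paper mentioned above.

```scala
object NullAwareAntiJoinSketch {
  // Hypothetical helper: returns the probe-side keys kept by
  // `probe.key NOT IN (SELECT key FROM build)`. None models SQL NULL.
  def notIn(probe: Seq[Option[Int]], build: Seq[Option[Int]]): Seq[Option[Int]] = {
    if (build.isEmpty) {
      // NOT IN over an empty set is TRUE for every row, even one with a NULL key.
      probe
    } else if (build.contains(None)) {
      // A NULL on the build side makes every comparison FALSE or UNKNOWN,
      // so no probe row qualifies.
      Seq.empty
    } else {
      val keys = build.flatten.toSet
      probe.filter {
        case Some(k) => !keys.contains(k) // ordinary anti-join check
        case None    => false             // NULL NOT IN (non-empty set) is UNKNOWN
      }
    }
  }
}
```

The three branches are what make the single-key case cheap: the empty-build and NULL-in-build checks are decided once per build side, and only the last branch needs a hash lookup per probe row.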
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
HyukjinKwon commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r465493233 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -184,7 +179,7 @@ object JdbcUtils extends Logging { } } - private def getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType = { + private[sql] def getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType = { Review comment: It's okay to remove `private[sql]` because `execution` is already in the private package (see also SPARK-16964)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
AmplabJenkins removed a comment on pull request #29332: URL: https://github.com/apache/spark/pull/29332#issuecomment-668996964
[GitHub] [spark] AmplabJenkins commented on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
AmplabJenkins commented on pull request #29332: URL: https://github.com/apache/spark/pull/29332#issuecomment-668996964
[GitHub] [spark] SparkQA removed a comment on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
SparkQA removed a comment on pull request #29332: URL: https://github.com/apache/spark/pull/29332#issuecomment-668959450 **[Test build #127074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127074/testReport)** for PR 29332 at commit [`786b145`](https://github.com/apache/spark/commit/786b145771a12004668ea35afdc6e4054924bfed).
[GitHub] [spark] SparkQA commented on pull request #29332: [SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
SparkQA commented on pull request #29332: URL: https://github.com/apache/spark/pull/29332#issuecomment-668996210 **[Test build #127074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127074/testReport)** for PR 29332 at commit [`786b145`](https://github.com/apache/spark/commit/786b145771a12004668ea35afdc6e4054924bfed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
cloud-fan commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465489920 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -895,6 +905,10 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit """ } + private def outNullElem(buffer: ExprValue): Block = { +if (legacyCastToStr) code";" else code"""$buffer.append(" null");""" Review comment: `code";"` -> `code""`?
[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
cloud-fan commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465489374 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -342,7 +344,9 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit val valueToUTF8String = castToString(vt) builder.append(keyToUTF8String(keyArray.get(0, kt)).asInstanceOf[UTF8String]) builder.append(" ->") - if (!valueArray.isNullAt(0)) { + if (valueArray.isNullAt(0)) { Review comment: ditto
[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
cloud-fan commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465489447 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -351,7 +355,9 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit builder.append(", ") builder.append(keyToUTF8String(keyArray.get(i, kt)).asInstanceOf[UTF8String]) builder.append(" ->") -if (!valueArray.isNullAt(i)) { +if (valueArray.isNullAt(i)) { Review comment: ditto ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -369,13 +375,17 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit if (row.numFields > 0) { val st = fields.map(_.dataType) val toUTF8StringFuncs = st.map(castToString) - if (!row.isNullAt(0)) { + if (row.isNullAt(0)) { Review comment: ditto
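The map branches flagged in these diffs follow one pattern: after ` ->`, a NULL value contributes either nothing (legacy) or ` null`. A minimal sketch of the map case, assuming keys and values are already rendered as strings and values are modeled as `Option[String]`; `MapCastSketch` is a hypothetical name and a simplification, not Spark's `Cast` code. It also assumes, per the migration-guide note in this PR, that the legacy flag switches the brackets back from `{}` to `[]`.

```scala
object MapCastSketch {
  // Hypothetical helper modelling how a map is rendered when cast to a string.
  // `legacy` stands in for spark.sql.legacy.castComplexTypesToString.enabled.
  def mapToString(entries: Seq[(String, Option[String])], legacy: Boolean): String = {
    val (open, close) = if (legacy) ("[", "]") else ("{", "}")
    val body = entries.map { case (k, v) =>
      v match {
        case Some(value)    => s"$k -> $value"
        case None if legacy => s"$k ->"      // legacy: NULL value renders as nothing
        case None           => s"$k -> null" // new behavior: NULL renders as "null"
      }
    }.mkString(", ")
    open + body + close
  }
}
```

For example, `mapToString(Seq("a" -> Some("1"), "b" -> None), legacy = false)` yields `{a -> 1, b -> null}`, while the legacy mode yields `[a -> 1, b ->]`.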
[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
cloud-fan commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465489268 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -321,7 +321,9 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit var i = 1 while (i < array.numElements) { builder.append(",") -if (!array.isNullAt(i)) { +if (array.isNullAt(i)) { Review comment: `if (array.isNullAt(i) && !legacyCastToStr)`
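The suggested condition `array.isNullAt(i) && !legacyCastToStr` can be illustrated with a small model of the interpreted loop in the diff: a comma is always appended between elements, and a NULL element then contributes either nothing (legacy) or ` null`. This is a sketch assuming elements are `Option[Int]`; `ArrayCastSketch` is a hypothetical name, not Spark code.

```scala
object ArrayCastSketch {
  // Hypothetical model of casting an array to a string.
  // `legacy` stands in for legacyCastToStr in the reviewed code.
  def arrayToString(elems: Seq[Option[Int]], legacy: Boolean): String = {
    val sb = new StringBuilder("[")
    elems.zipWithIndex.foreach { case (elem, i) =>
      if (i > 0) sb.append(",") // separator is emitted regardless of nullness
      elem match {
        case Some(v) =>
          if (i > 0) sb.append(" ")
          sb.append(v)
        case None if !legacy => // the `isNullAt(i) && !legacyCastToStr` branch
          if (i > 0) sb.append(" ")
          sb.append("null")
        case None => // legacy: a NULL element appends nothing
      }
    }
    sb.append("]").toString
  }
}
```

With `Seq(Some(1), None, Some(3))` this gives `[1, null, 3]` in the new mode and `[1,, 3]` in legacy mode, which is the "empty string" behavior the migration note describes.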
[GitHub] [spark] cloud-fan commented on a change in pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
cloud-fan commented on a change in pull request #29311: URL: https://github.com/apache/spark/pull/29311#discussion_r465488975 ## File path: docs/sql-migration-guide.md ## @@ -34,6 +34,8 @@ license: | - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting them to strings. For instance, the `show()` action and the `CAST` expression use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the same purpose. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. + - In Spark 3.1, `CAST` converts NULL elements of structures, arrays and maps to "null". In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`. Review comment: `In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings.`
[GitHub] [spark] stijndehaes commented on pull request #28423: [SPARK-24266][k8s] Restart the watcher when we receive a version changed from k8s
stijndehaes commented on pull request #28423: URL: https://github.com/apache/spark/pull/28423#issuecomment-668991161 @jkleckner I have never had a problem with the driver watching the executors. I think there was already a fallback mechanism there, but I never looked into the code for that one.
[GitHub] [spark] JkSelf commented on pull request #29266: [SPARK-32464][SQL] Support skew handling on join that has one side wi…
JkSelf commented on pull request #29266: URL: https://github.com/apache/spark/pull/29266#issuecomment-668989319 Can you show the plan changes in the UI? And does changing the number of partitions on the bucketed side introduce an additional shuffle?
[GitHub] [spark] cloud-fan commented on pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string
cloud-fan commented on pull request #29137: URL: https://github.com/apache/spark/pull/29137#issuecomment-668988346
```
== Physical Plan ==
AdaptiveSparkPlan (9)
+- == Current Plan ==
   BroadcastHashJoin Inner BuildRight (8)
   :- Project (3)
   :  +- Filter (2)
+- == Initial Plan ==
   BroadcastHashJoin Inner BuildRight (8)
   :- Project (3)
   :  +- Filter (2)
```
Why is it `Current Plan` and not `Final Plan` in `EXPLAIN FORMATTED`? And can we use an example where AQE changes the plan?
[GitHub] [spark] ScrapCodes commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
ScrapCodes commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668988454 Alrighty, then I will skip this for the 2.4.7 release, even though I still feel that this might be safe and good for people in general, given that jackson 2.6.7 had its last release in 2017 (ignoring the micro CVE backport release in 2019). If we do need to revisit this later, we can always take care of it in the next release. Thanks! > Alright, one more [FasterXML/jackson-databind#2798](https://github.com/FasterXML/jackson-databind/issues/2798). Shall we consider 2.10.x? The change is the same, and the latter is free from a whole storehouse of CVEs. We can ignore this one, as it does not involve the code paths (org.jsecurity and com.pastdev.httpcomponents) that Spark uses (hopefully).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs
AmplabJenkins removed a comment on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-668986802
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats
AmplabJenkins removed a comment on pull request #29352: URL: https://github.com/apache/spark/pull/29352#issuecomment-668986647
[GitHub] [spark] AmplabJenkins commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs
AmplabJenkins commented on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-668986802
[GitHub] [spark] AmplabJenkins commented on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats
AmplabJenkins commented on pull request #29352: URL: https://github.com/apache/spark/pull/29352#issuecomment-668986647
[GitHub] [spark] HyukjinKwon commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
HyukjinKwon commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668986454 For the PR itself, I agree with @srowen's and @dongjoon-hyun's comments at https://github.com/apache/spark/pull/29334#issuecomment-668044607 and https://github.com/apache/spark/pull/29334#issuecomment-668381120. Sorry for the late responses.
[GitHub] [spark] SparkQA removed a comment on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs
SparkQA removed a comment on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-668914387 **[Test build #127071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127071/testReport)** for PR 29353 at commit [`13af454`](https://github.com/apache/spark/commit/13af454cf4bb53a1c6b2c5f6aef19b6b92420e26).
[GitHub] [spark] SparkQA commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs
SparkQA commented on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-668986200 **[Test build #127071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127071/testReport)** for PR 29353 at commit [`13af454`](https://github.com/apache/spark/commit/13af454cf4bb53a1c6b2c5f6aef19b6b92420e26). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats
SparkQA removed a comment on pull request #29352: URL: https://github.com/apache/spark/pull/29352#issuecomment-668911605 **[Test build #127070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127070/testReport)** for PR 29352 at commit [`b10f1fc`](https://github.com/apache/spark/commit/b10f1fc4e3b6ac7564df9ca1503cc30e43043223).
[GitHub] [spark] SparkQA commented on pull request #29352: [SPARK-32531][SQL][TEST] Add benchmarks for nested structs and arrays for different file formats
SparkQA commented on pull request #29352: URL: https://github.com/apache/spark/pull/29352#issuecomment-668985998 **[Test build #127070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127070/testReport)** for PR 29352 at commit [`b10f1fc`](https://github.com/apache/spark/commit/b10f1fc4e3b6ac7564df9ca1503cc30e43043223). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait ArrayWithNestedStructTest extends ReadSchemaTest ` * `trait MapWithNestedStructTest extends ReadSchemaTest `
[GitHub] [spark] HyukjinKwon commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
HyukjinKwon commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668985839 Yes, I think it does. That was one of the reasons why I was hesitant. FYI, there were some discussions and updates about resources at SPARK-32264. Given that PRs (to `branch-2.4` or `branch-3.0`) are not very frequent, I guess it might be okay ...
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
AmplabJenkins removed a comment on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-668985557
[GitHub] [spark] AmplabJenkins commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
AmplabJenkins commented on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-668985557
[GitHub] [spark] SparkQA commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
SparkQA commented on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-668985252 **[Test build #127079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127079/testReport)** for PR 29350 at commit [`d113709`](https://github.com/apache/spark/commit/d11370942478256158fc70b755ca536d95606a04).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
HeartSaVioR commented on a change in pull request #29350: URL: https://github.com/apache/spark/pull/29350#discussion_r465477234 ## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ## @@ -530,10 +530,16 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) // If the file is currently not being tracked by the SHS, add an entry for it and try // to parse it. This will allow the cleaner code to detect the file as stale later on // if it was not possible to parse it. - listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, LogType.EventLogs, -None, None, reader.fileSizeForLastIndex, reader.lastIndex, None, -reader.completed)) - reader.fileSizeForLastIndex > 0 + try { +listing.write(LogInfo(reader.rootPath.toString(), newLastScanTime, + LogType.EventLogs, None, None, reader.fileSizeForLastIndex, reader.lastIndex, + None, reader.completed)) +reader.fileSizeForLastIndex > 0 + } catch { +case _: FileNotFoundException => false + } +case _: FileNotFoundException => Review comment: nit: I'd add an empty line after `}`.
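The change wraps the `listing.write` call so that a `FileNotFoundException` marks just that log as not processable instead of aborting the whole scan (the PR title attributes the failure to an application status change while the scan is running). The general pattern can be sketched as follows, assuming a hypothetical `register` callback that stands in for writing a `LogInfo` entry and returns the file size; `LogScanSketch` is not part of `FsHistoryProvider`.

```scala
import java.io.FileNotFoundException

object LogScanSketch {
  // Hypothetical scan loop: keeps the paths that were registered with a non-empty size.
  // `register` stands in for writing a LogInfo entry and may throw if the file was
  // renamed or deleted after the directory listing was taken.
  def scan(paths: Seq[String], register: String => Long): Seq[String] = {
    paths.filter { path =>
      try {
        register(path) > 0
      } catch {
        // The file vanished between listing and processing: skip only this entry
        // rather than letting the exception abort the rest of the scan.
        case _: FileNotFoundException => false
      }
    }
  }
}
```

Catching only `FileNotFoundException` keeps other I/O failures visible, which mirrors the narrow `case _: FileNotFoundException => false` in the diff.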
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
AmplabJenkins removed a comment on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-668691724 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
AmplabJenkins removed a comment on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-668983580
[GitHub] [spark] HeartSaVioR commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change
HeartSaVioR commented on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-668983630 ok to test
[GitHub] [spark] ScrapCodes commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
ScrapCodes commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668983574 @HyukjinKwon Thanks for looking into it, and it is my mistake: I did not know that GitHub Actions had not been ported to other branches yet. I am not 100% sure whether they should be ported; does it affect our total resource capacity on GitHub Actions?
[GitHub] [spark] AmplabJenkins commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
AmplabJenkins commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-668983580
[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
SparkQA commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-668983196 **[Test build #127078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127078/testReport)** for PR 28761 at commit [`0747fcd`](https://github.com/apache/spark/commit/0747fcdef1ffdba5b8ce3cbafdc03cac3559f7d4).
[GitHub] [spark] viirya commented on pull request #28761: [SPARK-25557][SQL][test-hive1.2] Nested column predicate pushdown for ORC
viirya commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-668982975 retest this please
[GitHub] [spark] HyukjinKwon commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
HyukjinKwon commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668981971 @ScrapCodes, yes, the m2 cache is corrupted on the Jenkins machine. In master, this dependency check is skipped in Jenkins and runs in the GitHub Actions build instead. In other branches, this has not been done yet (see SPARK-32249). I was hesitant about porting GitHub Actions back, but maybe we should go ahead and port it. In the meantime, maybe we can try removing `.m2` on the Jenkins machine, @shaneknapp. IIRC, this solves the problem temporarily.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668981428
[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668981428
[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
SparkQA commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668981110 **[Test build #127077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127077/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).
[GitHub] [spark] cloud-fan commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
cloud-fan commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668980317 add to whitelist
[GitHub] [spark] allisonwang-db opened a new pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
allisonwang-db opened a new pull request #29031: URL: https://github.com/apache/spark/pull/29031

### What changes were proposed in this pull request?

This PR adds a physical rule to remove redundant project nodes. A `ProjectExec` is redundant when:

1. It has the same output attributes, in the same order, as its child's output, when the ordering of these attributes is required.
2. It has the same output attributes as its child's output, when attribute output ordering is not required.

For example:

After Filter:
```
== Physical Plan ==
*(1) Project [a#14L, b#15L, c#16, key#17]
+- *(1) Filter (isnotnull(a#14L) AND (a#14L > 5))
   +- *(1) ColumnarToRow
      +- FileScan parquet [a#14L,b#15L,c#16,key#17]
```
The `Project a#14L, b#15L, c#16, key#17` is redundant because its output is exactly the same as the filter's output.

Before Aggregate:
```
== Physical Plan ==
*(2) HashAggregate(keys=[key#17], functions=[sum(a#14L), last(b#15L, false)], output=[sum_a#39L, key#17, last_b#41L])
+- Exchange hashpartitioning(key#17, 5), true, [id=#77]
   +- *(1) HashAggregate(keys=[key#17], functions=[partial_sum(a#14L), partial_last(b#15L, false)], output=[key#17, sum#49L, last#50L, valueSet#51])
      +- *(1) Project [key#17, a#14L, b#15L]
         +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 100))
            +- *(1) ColumnarToRow
               +- FileScan parquet [a#14L,b#15L,key#17]
```
The `Project key#17, a#14L, b#15L` is redundant because hash aggregate doesn't require its child plan's output to be in a specific order.

### Why are the changes needed?

It removes unnecessary query nodes and makes the query plan cleaner.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests
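The two redundancy conditions in this PR description can be sketched as a standalone check. This is a minimal sketch under stated assumptions, not the rule the PR adds: `Attr`, `PlanNode`, `RedundantProjectSketch`, and `isRedundant` are hypothetical stand-ins for Spark's `Attribute` and `SparkPlan`, and attributes are compared only by expression ID (the `14` in `a#14L`).

```scala
// Hypothetical, simplified stand-ins for Spark's Attribute and SparkPlan;
// only what the redundancy check needs is modelled.
final case class Attr(name: String, exprId: Long)
final case class PlanNode(name: String, output: Seq[Attr], children: Seq[PlanNode] = Nil)

object RedundantProjectSketch {
  /** Returns true if `project` re-emits its single child's output unchanged.
    * With `requireOrdering`, attributes must match position by position;
    * otherwise they only need to match as a multiset of expression IDs. */
  def isRedundant(project: PlanNode, requireOrdering: Boolean): Boolean =
    project.children match {
      case Seq(child) =>
        val projectIds = project.output.map(_.exprId)
        val childIds = child.output.map(_.exprId)
        if (requireOrdering) projectIds == childIds
        else projectIds.sorted == childIds.sorted
      case _ => false
    }
}
```

Applied to the "Before Aggregate" plan, `Project [key#17, a#14L, b#15L]` over a filter that outputs `[a#14L, b#15L, key#17]` counts as redundant when `requireOrdering = false`, but not when ordering is required.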
[GitHub] [spark] cloud-fan commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
cloud-fan commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668980249 retest this please
[GitHub] [spark] cloud-fan closed pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
cloud-fan closed pull request #29031: URL: https://github.com/apache/spark/pull/29031
[GitHub] [spark] cloud-fan commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
cloud-fan commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-668980063 @skambha you will still hit the sum bug when you disable whole-stage-codegen (or when execution falls back from it because the generated code exceeds 64KB), right? We are not introducing a new correctness bug. It's an existing bug, and the backport makes it more visible. We've added a mechanism in the master branch to check streaming state store backward compatibility. If we want to backport the actual fix, we need to backport this mechanism as well. I think that's too many things to backport. How about this: we force-enable ANSI mode for decimal sum, so that the behavior is the same without fixing the UnsafeRow bug? It's not an ideal fix but should be safer to backport. @skambha what do you think? Can you help to do it?
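The two overflow outcomes discussed in this thread (storing null for an overflowed decimal, versus raising an error as ANSI mode does) can be illustrated without Spark. This is a hypothetical sketch: `DecimalSumSketch.sum` is not a Spark API and does not reproduce Spark's aggregation code path; it only checks whether a sum still fits `DECIMAL(precision, scale)`, where Spark's `DecimalType` caps precision at 38 digits.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalSumSketch {
  /** Hypothetical helper: sums `values`, rescales to `scale`, and checks that
    * the result fits DECIMAL(precision, scale).
    *  - non-ANSI: an overflowed result becomes None (a SQL NULL), in the
    *    spirit of "setDecimal should set null with overflowed value";
    *  - ANSI: an overflow raises an ArithmeticException instead. */
  def sum(values: Seq[JBigDecimal], precision: Int, scale: Int, ansi: Boolean): Option[JBigDecimal] = {
    val total = values.foldLeft(JBigDecimal.ZERO)(_ add _).setScale(scale, RoundingMode.HALF_UP)
    // precision() counts the digits of the unscaled value, so this is the
    // "does it fit DECIMAL(precision, scale)" test.
    if (total.precision > precision) {
      if (ansi) throw new ArithmeticException(s"$total cannot be represented as DECIMAL($precision, $scale)")
      else None
    } else Some(total)
  }
}
```

Under this model, forcing the ANSI branch for decimal sum means an overflow can never be silently stored; it surfaces as an error instead.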
[GitHub] [spark] ScrapCodes commented on pull request #29334: [WIP][RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
ScrapCodes commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668977421 Alright, one more: https://github.com/FasterXML/jackson-databind/issues/2798. Shall we consider 2.10.x? The change is the same, and the latter is free from a whole storehouse of CVEs.
[GitHub] [spark] ScrapCodes commented on pull request #29334: [WIP][RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
ScrapCodes commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668976387 @Fokko, @srowen and @dongjoon-hyun Thanks for the feedback. I agree with you guys. But I wanted to give this patch a try: can it be done in a clean way? This patch is almost the same as the one at https://github.com/apache/spark/pull/21596. Also, the behaviour change was due to a bug in Jackson 2.7.x. I have tested it, and it worked fine. Somehow Jenkins does not pass due to a corrupted m2 cache. Do you think it is worth a try?
[GitHub] [spark] MaxGekk commented on pull request #29311: [SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays while casting to strings
MaxGekk commented on pull request #29311: URL: https://github.com/apache/spark/pull/29311#issuecomment-668976181 @cloud-fan @maropu @HyukjinKwon Please review this PR.
[GitHub] [spark] HyukjinKwon commented on pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs
HyukjinKwon commented on pull request #29354: URL: https://github.com/apache/spark/pull/29354#issuecomment-668975373 It would be great if we could elaborate on how it improves performance. We can focus on the fix only, instead of mixing in refactoring here.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs
HyukjinKwon commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465466999 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ## @@ -1,427 +0,0 @@ -/* Review comment: Hey, let's avoid renaming a file like this. It's difficult to track what was changed. Can we keep the class and file names?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs
HyukjinKwon commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465466392 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -72,137 +74,191 @@ class OrcDeserializer( /** * Creates a writer to write ORC values to Catalyst data structure at the given ordinal. */ - private def newWriter( - dataType: DataType, updater: CatalystDataUpdater): (Int, WritableComparable[_]) => Unit = -dataType match { - case NullType => (ordinal, _) => -updater.setNullAt(ordinal) - - case BooleanType => (ordinal, value) => -updater.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get) - - case ByteType => (ordinal, value) => -updater.setByte(ordinal, value.asInstanceOf[ByteWritable].get) - - case ShortType => (ordinal, value) => -updater.setShort(ordinal, value.asInstanceOf[ShortWritable].get) - - case IntegerType => (ordinal, value) => -updater.setInt(ordinal, value.asInstanceOf[IntWritable].get) - - case LongType => (ordinal, value) => -updater.setLong(ordinal, value.asInstanceOf[LongWritable].get) - - case FloatType => (ordinal, value) => -updater.setFloat(ordinal, value.asInstanceOf[FloatWritable].get) - - case DoubleType => (ordinal, value) => -updater.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get) - - case StringType => (ordinal, value) => -updater.set(ordinal, UTF8String.fromBytes(value.asInstanceOf[Text].copyBytes)) - - case BinaryType => (ordinal, value) => -val binary = value.asInstanceOf[BytesWritable] -val bytes = new Array[Byte](binary.getLength) -System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength) -updater.set(ordinal, bytes) - - case DateType => (ordinal, value) => -updater.setInt(ordinal, OrcShimUtils.getGregorianDays(value)) - - case TimestampType => (ordinal, value) => -updater.setLong(ordinal, DateTimeUtils.fromJavaTimestamp(value.asInstanceOf[OrcTimestamp])) - - case DecimalType.Fixed(precision, scale) => (ordinal, value) => 
-val v = OrcShimUtils.getDecimal(value) -v.changePrecision(precision, scale) -updater.set(ordinal, v) - - case st: StructType => (ordinal, value) => -val result = new SpecificInternalRow(st) -val fieldUpdater = new RowUpdater(result) -val fieldConverters = st.map(_.dataType).map { dt => - newWriter(dt, fieldUpdater) -}.toArray -val orcStruct = value.asInstanceOf[OrcStruct] - -var i = 0 -while (i < st.length) { - val value = orcStruct.getFieldValue(i) - if (value == null) { -result.setNullAt(i) - } else { -fieldConverters(i)(i, value) + private def newWriter(dataType: DataType, reuseObj: Boolean): + (CatalystDataUpdater, Int, WritableComparable[_]) => Unit = dataType match { +case NullType => (updater, ordinal, _) => + updater.setNullAt(ordinal) + +case BooleanType => (updater, ordinal, value) => + updater.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get) + +case ByteType => (updater, ordinal, value) => + updater.setByte(ordinal, value.asInstanceOf[ByteWritable].get) + +case ShortType => (updater, ordinal, value) => + updater.setShort(ordinal, value.asInstanceOf[ShortWritable].get) + +case IntegerType => (updater, ordinal, value) => + updater.setInt(ordinal, value.asInstanceOf[IntWritable].get) + +case LongType => (updater, ordinal, value) => + updater.setLong(ordinal, value.asInstanceOf[LongWritable].get) + +case FloatType => (updater, ordinal, value) => + updater.setFloat(ordinal, value.asInstanceOf[FloatWritable].get) + +case DoubleType => (updater, ordinal, value) => + updater.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get) + +case StringType => (updater, ordinal, value) => + updater.set(ordinal, UTF8String.fromBytes(value.asInstanceOf[Text].copyBytes)) + +case BinaryType => (updater, ordinal, value) => + val binary = value.asInstanceOf[BytesWritable] + val bytes = new Array[Byte](binary.getLength) + System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength) + updater.set(ordinal, bytes) + +case DateType => (updater, ordinal, value) 
=> + updater.setInt(ordinal, OrcShimUtils.getGregorianDays(value)) + +case TimestampType => (updater, ordinal, value) => + updater.setLong(ordinal, DateTimeUtils.fromJavaTimestamp(value.asInstanceOf[OrcTimestamp])) + +case DecimalType.Fixed(precision, scale) => (updater, ordinal, value) => + val v = OrcShimUtils.getDecimal(value) + v.changePrecision(precision, scale) + updater.set(ordinal, v) + +case st: StructType => + val createRow: () => InternalRow = getRowCrea
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning
AmplabJenkins removed a comment on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-668973946
[GitHub] [spark] AmplabJenkins commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning
AmplabJenkins commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-668973946
[GitHub] [spark] SparkQA commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning
SparkQA commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-668973694 **[Test build #127076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127076/testReport)** for PR 29211 at commit [`477e77f`](https://github.com/apache/spark/commit/477e77f5ad3755af81e5c0304eeb37ce273a18d4).
[GitHub] [spark] SparkQA commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
SparkQA commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-668973681 **[Test build #127075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127075/testReport)** for PR 29339 at commit [`b1fc84b`](https://github.com/apache/spark/commit/b1fc84bfd33fadbeccf118a5db2e752e241947dc).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins removed a comment on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-668972189
[GitHub] [spark] AmplabJenkins commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2
AmplabJenkins commented on pull request #29339: URL: https://github.com/apache/spark/pull/29339#issuecomment-668972189
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs
HyukjinKwon commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465460837 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -72,137 +74,191 @@ class OrcDeserializer( /** * Creates a writer to write ORC values to Catalyst data structure at the given ordinal. */ - private def newWriter( - dataType: DataType, updater: CatalystDataUpdater): (Int, WritableComparable[_]) => Unit = -dataType match { - case NullType => (ordinal, _) => -updater.setNullAt(ordinal) - - case BooleanType => (ordinal, value) => -updater.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get) - - case ByteType => (ordinal, value) => -updater.setByte(ordinal, value.asInstanceOf[ByteWritable].get) - - case ShortType => (ordinal, value) => -updater.setShort(ordinal, value.asInstanceOf[ShortWritable].get) - - case IntegerType => (ordinal, value) => -updater.setInt(ordinal, value.asInstanceOf[IntWritable].get) - - case LongType => (ordinal, value) => -updater.setLong(ordinal, value.asInstanceOf[LongWritable].get) - - case FloatType => (ordinal, value) => -updater.setFloat(ordinal, value.asInstanceOf[FloatWritable].get) - - case DoubleType => (ordinal, value) => -updater.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get) - - case StringType => (ordinal, value) => -updater.set(ordinal, UTF8String.fromBytes(value.asInstanceOf[Text].copyBytes)) - - case BinaryType => (ordinal, value) => -val binary = value.asInstanceOf[BytesWritable] -val bytes = new Array[Byte](binary.getLength) -System.arraycopy(binary.getBytes, 0, bytes, 0, binary.getLength) -updater.set(ordinal, bytes) - - case DateType => (ordinal, value) => -updater.setInt(ordinal, OrcShimUtils.getGregorianDays(value)) - - case TimestampType => (ordinal, value) => -updater.setLong(ordinal, DateTimeUtils.fromJavaTimestamp(value.asInstanceOf[OrcTimestamp])) - - case DecimalType.Fixed(precision, scale) => (ordinal, value) => 
-val v = OrcShimUtils.getDecimal(value) -v.changePrecision(precision, scale) -updater.set(ordinal, v) - - case st: StructType => (ordinal, value) => -val result = new SpecificInternalRow(st) -val fieldUpdater = new RowUpdater(result) -val fieldConverters = st.map(_.dataType).map { dt => - newWriter(dt, fieldUpdater) -}.toArray -val orcStruct = value.asInstanceOf[OrcStruct] - -var i = 0 -while (i < st.length) { - val value = orcStruct.getFieldValue(i) - if (value == null) { -result.setNullAt(i) - } else { -fieldConverters(i)(i, value) + private def newWriter(dataType: DataType, reuseObj: Boolean): + (CatalystDataUpdater, Int, WritableComparable[_]) => Unit = dataType match { Review comment: I would keep this style as:
```scala
private def newWriter(
    dataType: DataType,
    reuseObj: Boolean)
  : (CatalystDataUpdater, Int, WritableComparable[_]) => Unit = dataType match {
  case NullType => (updater, ordinal, _) =>
  ...
}
```
to reduce the diff.
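The `newWriter` signature change in this diff moves the `CatalystDataUpdater` from a captured argument to a parameter of the returned function. One plausible reason this helps nested structs, visible in the removed lines, is that the old struct writer called `newWriter` for its fields inside the per-value lambda, rebuilding child writers for every struct value. The following sketch contrasts the two shapes using hypothetical, heavily simplified types (`DType`, `IntT`, `StructT`, `Updater`, and `WriterShapes` are not Spark classes, and the PR's `reuseObj` flag is not modelled), counting how many writer closures each shape constructs.

```scala
// Hypothetical, heavily simplified stand-ins for Catalyst types; not Spark's API.
sealed trait DType
case object IntT extends DType
final case class StructT(fields: Seq[DType]) extends DType

// Stands in for CatalystDataUpdater: a mutable row that writers fill in.
final class Updater(size: Int) {
  val values: Array[Any] = new Array[Any](size)
}

object WriterShapes {
  // Counts how many writer closures are constructed.
  var writersBuilt = 0

  // Old shape: the updater is captured, so the struct writer must create its
  // child writers inside the per-value lambda, i.e. once per struct value.
  def oldWriter(dt: DType, updater: Updater): (Int, Any) => Unit = {
    writersBuilt += 1
    dt match {
      case IntT => (ordinal, v) => updater.values(ordinal) = v
      case StructT(fields) => (ordinal, v) =>
        val row = new Updater(fields.size)
        val children = fields.map(f => oldWriter(f, row)) // rebuilt per value
        val in = v.asInstanceOf[Seq[Any]]
        children.indices.foreach(i => children(i)(i, in(i)))
        updater.values(ordinal) = row.values.toSeq
    }
  }

  // New shape: the updater is a parameter, so child writers are built once,
  // when the writer tree is created, and reused for every value.
  def newWriter(dt: DType): (Updater, Int, Any) => Unit = {
    writersBuilt += 1
    dt match {
      case IntT => (updater, ordinal, v) => updater.values(ordinal) = v
      case StructT(fields) =>
        val children = fields.map(f => newWriter(f)) // built once
        (updater, ordinal, v) =>
          val row = new Updater(fields.size)
          val in = v.asInstanceOf[Seq[Any]]
          children.indices.foreach(i => children(i)(row, i, in(i)))
          updater.values(ordinal) = row.values.toSeq
    }
  }
}
```

Both shapes produce the same rows; the difference is that the new shape's writer count is fixed by the schema, while the old shape's count grows with the number of struct values read.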
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
AmplabJenkins removed a comment on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-668968321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127073/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
AmplabJenkins removed a comment on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-668968315 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
AmplabJenkins commented on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-668968315