[GitHub] AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450802529 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
AmplabJenkins removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450802533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100632/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450802533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100632/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
AmplabJenkins commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450802529 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
SparkQA removed a comment on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450784176 **[Test build #100632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100632/testReport)** for PR 23392 at commit [`a25b59c`](https://github.com/apache/spark/commit/a25b59ca756958370dd7ba14d6c1e33dec424ea8).
[GitHub] SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450802341 **[Test build #100632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100632/testReport)** for PR 23392 at commit [`a25b59c`](https://github.com/apache/spark/commit/a25b59ca756958370dd7ba14d6c1e33dec424ea8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450800928 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450800930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100631/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450800928 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450800930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100631/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
SparkQA removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450782084 **[Test build #100631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100631/testReport)** for PR 23349 at commit [`d91ade6`](https://github.com/apache/spark/commit/d91ade60e14dbb7327351de5c59f50ba7d66e26a).
[GitHub] SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450800738 **[Test build #100631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100631/testReport)** for PR 23349 at commit [`d91ade6`](https://github.com/apache/spark/commit/d91ade60e14dbb7327351de5c59f50ba7d66e26a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] gatorsmile commented on a change in pull request #21826: [SPARK-24872] Replace the symbol '||' of Or operator with 'or'
gatorsmile commented on a change in pull request #21826: [SPARK-24872] Replace the symbol '||' of Or operator with 'or' URL: https://github.com/apache/spark/pull/21826#discussion_r244672255
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
## @@ -442,7 +442,7 @@ case class Or(left: Expression, right: Expression) extends BinaryOperator with P
   override def inputType: AbstractDataType = BooleanType
-  override def symbol: String = "||"
+  override def symbol: String = "or"
Review comment: So far, yes.
[GitHub] gatorsmile commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter
gatorsmile commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter URL: https://github.com/apache/spark/pull/23391#discussion_r244671504
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
## @@ -230,7 +235,7 @@ object PartitioningUtils {
   // Once we get the string, we try to parse it and find the partition column and value.
   val maybeColumn = parsePartitionColumn(currentPath.getName, typeInference, userSpecifiedDataTypes,
-    validatePartitionColumns, timeZone)
+    validatePartitionColumns, timeZone, dateFormatter, timestampFormatter)
Review comment: When the partition/bucket column is of Date type, our parquet writer converts the date value to a string and records it as the directory name. Is it possible that Spark could return a wrong result? For example:
- Join two partitioned tables on the partition key column whose data type is Date.
- One table was written by Spark 2.4 or earlier, and the other table was written by Spark 3.0 or later.
- The partition columns contain values before October 1582.
[GitHub] gatorsmile commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter
gatorsmile commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter URL: https://github.com/apache/spark/pull/23391#discussion_r244671504
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
## @@ -230,7 +235,7 @@ object PartitioningUtils {
   // Once we get the string, we try to parse it and find the partition column and value.
   val maybeColumn = parsePartitionColumn(currentPath.getName, typeInference, userSpecifiedDataTypes,
-    validatePartitionColumns, timeZone)
+    validatePartitionColumns, timeZone, dateFormatter, timestampFormatter)
Review comment: When the partition column is of Date type, our parquet writer stores the date value as the directory name. Is it possible that Spark could return a wrong result? For example:
- Join two partitioned tables on the partition key column whose data type is Date.
- One table was written by Spark 2.4 or earlier, and the other table was written by Spark 3.0 or later.
- The partition columns contain values before October 1582.
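To make the concern above concrete, here is a small, self-contained sketch (not code from this PR): Spark stores DateType as days since 1970-01-01, the legacy SimpleDateFormat path uses a hybrid Julian/Gregorian calendar, and java.time uses the proleptic Gregorian calendar, so the same day count can render to different directory names for pre-October-1582 dates. The epoch-day constant below is assumed to be one day before the Gregorian cutover.

```scala
// Illustrative only: the same day count renders to different strings under the two calendars.
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.util.{Date, TimeZone}

val daysSinceEpoch = -141428L                        // assumed: one day before 1582-10-15

// java.time (proleptic Gregorian), roughly what the new Date/TimestampFormatter path uses
println(LocalDate.ofEpochDay(daysSinceEpoch))        // 1582-10-14

// Legacy path: java.util.Date + SimpleDateFormat use a hybrid Julian/Gregorian calendar
val legacyFmt = new SimpleDateFormat("yyyy-MM-dd")
legacyFmt.setTimeZone(TimeZone.getTimeZone("UTC"))
println(legacyFmt.format(new Date(daysSinceEpoch * 86400000L)))  // 1582-10-04
```

If one table's partition directories were written with the first rendering and the other's with the second, a join on that partition key could indeed fail to line up the partitions, which is the scenario the review comment describes.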
[GitHub] felixcheung commented on a change in pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
felixcheung commented on a change in pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#discussion_r244670802
## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
## @@ -209,21 +209,25 @@ public static long reallocateMemory(long address, long oldSize, long newSize) {
   }
   /**
-   * Uses internal JDK APIs to allocate a DirectByteBuffer while ignoring the JVM's
-   * MaxDirectMemorySize limit (the default limit is too low and we do not want to require users
-   * to increase it).
+   * Allocate a DirectByteBuffer, potentially bypassing the JVM's MaxDirectMemorySize limit.
    */
   public static ByteBuffer allocateDirectBuffer(int size) {
     try {
+      if (CLEANER_CREATE_METHOD == null) {
+        // Can't set a Cleaner (see comments on field), so need to allocate via normal Java APIs
+        return ByteBuffer.allocateDirect(size);
Review comment: try/catch OOM and log a message about setting MaxDirectMemorySize?
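A rough sketch of the suggested pattern (written in Scala here for illustration; the real method lives in Java in Platform.java, and the helper name below is hypothetical): catch the OutOfMemoryError from the size-limited allocation and surface a hint about -XX:MaxDirectMemorySize before rethrowing.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not Spark code: fall back to the normal, limit-checked allocation
// and tell the user which JVM flag to raise if it fails.
def allocateDirectWithHint(size: Int): ByteBuffer = {
  try {
    ByteBuffer.allocateDirect(size)
  } catch {
    case oom: OutOfMemoryError =>
      // Log a hint, then rethrow, per the review suggestion.
      System.err.println(
        s"Failed to allocate a $size-byte direct buffer; consider increasing -XX:MaxDirectMemorySize")
      throw oom
  }
}
```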
[GitHub] wangyum commented on a change in pull request #22999: [SPARK-20319][SQL] Already quoted identifiers are getting wrapped with additional quotes
wangyum commented on a change in pull request #22999: [SPARK-20319][SQL] Already quoted identifiers are getting wrapped with additional quotes URL: https://github.com/apache/spark/pull/22999#discussion_r244670287
## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala
## @@ -81,6 +81,10 @@ private case object OracleDialect extends JdbcDialect {
     case _ => None
   }
+  override def quoteIdentifier(colName: String): String = {
+    s${colName.stripPrefix("\"").stripSuffix("\"")}
Review comment: If so, we need to verify both `getInsertStatement` and `createTable`? https://github.com/apache/spark/blob/5f0ddd2d6e2fdebf549207bbc4b13ca709eee3c4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L729-L746
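For context, a minimal sketch of the idempotent quoting the diff above is aiming for (an illustration, not the PR's exact text; the interpolator in the quoted diff is reproduced as received from the archive): strip any surrounding double quotes before re-adding them, so an already-quoted identifier is not wrapped a second time when JdbcUtils builds INSERT or CREATE TABLE statements.

```scala
// Illustrative Oracle-style quoting: wrapping is a no-op for an already-quoted identifier.
def quoteIdentifier(colName: String): String = {
  val unquoted = colName.stripPrefix("\"").stripSuffix("\"")
  "\"" + unquoted + "\""
}

quoteIdentifier("salary")       // returns "salary" wrapped in double quotes
quoteIdentifier("\"salary\"")   // still a single pair of quotes, not doubled
```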
[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450796525 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450796525 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450796528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100630/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450796528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100630/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
SparkQA removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450778855 **[Test build #100630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100630/testReport)** for PR 23388 at commit [`c228ad9`](https://github.com/apache/spark/commit/c228ad97fcbed7e93940d120f177817f7ad55c27).
[GitHub] SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450796348 **[Test build #100630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100630/testReport)** for PR 23388 at commit [`c228ad9`](https://github.com/apache/spark/commit/c228ad97fcbed7e93940d120f177817f7ad55c27). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450795529 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450795530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100629/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
AmplabJenkins removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450795530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100629/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
AmplabJenkins commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450795529 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
SparkQA removed a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450777104 **[Test build #100629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100629/testReport)** for PR 23419 at commit [`e4551bd`](https://github.com/apache/spark/commit/e4551bd63bba3578824f235f37cf8aded490805f).
[GitHub] SparkQA commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
SparkQA commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450795355 **[Test build #100629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100629/testReport)** for PR 23419 at commit [`e4551bd`](https://github.com/apache/spark/commit/e4551bd63bba3578824f235f37cf8aded490805f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] HyukjinKwon commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest
HyukjinKwon commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest URL: https://github.com/apache/spark/pull/23417#discussion_r244668273
## File path: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala
## @@ -138,7 +137,8 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
     logInfo(s"Testing $dataType data type$extraMessage")
     val extraOptions = Map[String, String](
-      "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString
+      "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString,
+      "timestampFormat" -> "yyyy-MM-dd'T'HH:mm:ss.SSSX"
Review comment: A similar question was raised at https://github.com/apache/spark/pull/23417#discussion_r244549254. Looks like this is going to be investigated separately soon. It's going to introduce at least some behaviour changes:

```scala
scala> val fomatter = new org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", java.util.TimeZone.getDefault(), java.util.Locale.US)
fomatter: org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter = org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter@2df019b8

scala> fomatter.format(fomatter.parse("0015-03-10T08:53:43.591+07:30"))
res0: String = 0015-03-10T08:19:08.591+06:55

scala> val fomatter = new org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter("yyyy-MM-dd'T'HH:mm:ss.SSSX", java.util.TimeZone.getDefault(), java.util.Locale.US)
fomatter: org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter = org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter@763ff91

scala> fomatter.format(fomatter.parse("0015-03-10T08:53:43.591+07:30"))
res1: String = 0015-03-10T08:19:08.591+06:55:25
```
[GitHub] cloud-fan commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest
cloud-fan commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest URL: https://github.com/apache/spark/pull/23417#discussion_r244665820
## File path: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala
## @@ -138,7 +137,8 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
     logInfo(s"Testing $dataType data type$extraMessage")
     val extraOptions = Map[String, String](
-      "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString
+      "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString,
+      "timestampFormat" -> "yyyy-MM-dd'T'HH:mm:ss.SSSX"
Review comment: With the new parser and the default timestamp format, Spark can't write and read back timestamp data before 1582? What's the consequence if we make this the default format?
[GitHub] srowen commented on issue #23422: [SPARK-26514][CORE] Support running multi tasks per cpu core
srowen commented on issue #23422: [SPARK-26514][CORE] Support running multi tasks per cpu core URL: https://github.com/apache/spark/pull/23422#issuecomment-450786857
I don't think we can do this. First, the new config name is pretty confusing; I understand you're reversing the order of cpu and tasks, but it really is just going to confuse people. This doesn't resolve what happens if both are set.
If anything, it's more reasonable to let spark.task.cpus take on fractional values. Or just let the resource manager over-commit cores for your machines: let it say there are 96 cores on a 64-core machine, and let Spark use them as usual. This was possible on YARN, but I am actually not sure about other resource managers.
What's the use case? This and the JIRA don't give any argument for it. An I/O-bound job that can nevertheless do more I/O if it's parallelized further? You can just increase the parallelism already without this change; it'll cause you to use more executor slots than otherwise, but those won't matter unless the use case is also that there are other concurrent Spark jobs that could use the slots.
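As a concrete, hypothetical illustration of the alternative mentioned above (paths and the partition count below are placeholders, not anything from the PR): an I/O-bound stage can usually be given more concurrent work simply by splitting it into more partitions, with no scheduler change.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("io-bound-example").getOrCreate()

// 192 is an arbitrary partition count: more, smaller tasks let the stage spread across more
// executor slots and overlap I/O waits, without touching spark.task.cpus.
val df = spark.read.parquet("/path/to/input")
df.repartition(192).write.parquet("/path/to/output")
```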
[GitHub] AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.
AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions. URL: https://github.com/apache/spark/pull/22141#issuecomment-450786776 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.
AmplabJenkins removed a comment on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions. URL: https://github.com/apache/spark/pull/22141#issuecomment-450786780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6553/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.
AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions. URL: https://github.com/apache/spark/pull/22141#issuecomment-450786780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6553/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions.
AmplabJenkins commented on issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nested OR conditions. URL: https://github.com/apache/spark/pull/22141#issuecomment-450786776 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite URL: https://github.com/apache/spark/pull/23425#issuecomment-450786442 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
AmplabJenkins removed a comment on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite URL: https://github.com/apache/spark/pull/23425#issuecomment-450786443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6552/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite URL: https://github.com/apache/spark/pull/23425#issuecomment-450786442 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
AmplabJenkins commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite URL: https://github.com/apache/spark/pull/23425#issuecomment-450786443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6552/ Test PASSed.
[GitHub] srowen closed pull request #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
srowen closed pull request #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420
This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:
As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):
diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java b/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
index 984575acaf511..6f7925c26094d 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java
@@ -18,11 +18,11 @@
 public enum ByteUnit {
   BYTE(1),
-  KiB(1024L),
-  MiB((long) Math.pow(1024L, 2L)),
-  GiB((long) Math.pow(1024L, 3L)),
-  TiB((long) Math.pow(1024L, 4L)),
-  PiB((long) Math.pow(1024L, 5L));
+  KiB(1L << 10),
+  MiB(1L << 20),
+  GiB(1L << 30),
+  TiB(1L << 40),
+  PiB(1L << 50);
   ByteUnit(long multiplier) {
     this.multiplier = multiplier;
@@ -50,7 +50,7 @@ public long convertTo(long d, ByteUnit u) {
     }
   }
-  public double toBytes(long d) {
+  public long toBytes(long d) {
     if (d < 0) {
       throw new IllegalArgumentException("Negative size value. Size must be positive: " + d);
     }
diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java b/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
index 43a6bc7dc3d06..201628b04fbef 100644
--- a/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
+++ b/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
@@ -309,8 +309,8 @@ public int chunkFetchHandlerThreads() {
     }
     int chunkFetchHandlerThreadsPercent =
       conf.getInt("spark.shuffle.server.chunkFetchHandlerThreadsPercent", 100);
-    return (int)Math.ceil(
-      (this.serverThreads() > 0 ? this.serverThreads() : 2 * NettyRuntime.availableProcessors()) *
-      chunkFetchHandlerThreadsPercent/(double)100);
+    int threads =
+      this.serverThreads() > 0 ? this.serverThreads() : 2 * NettyRuntime.availableProcessors();
+    return (int) Math.ceil(threads * (chunkFetchHandlerThreadsPercent / 100.0));
   }
 }
diff --git a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
index 7df8aafb2b674..2ff98a69ee1f4 100644
--- a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
+++ b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
@@ -712,7 +712,7 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
       final long recordOffset = offset;
       UnsafeAlignedOffset.putSize(base, offset, klen + vlen + uaoSize);
       UnsafeAlignedOffset.putSize(base, offset + uaoSize, klen);
-      offset += (2 * uaoSize);
+      offset += (2L * uaoSize);
       Platform.copyMemory(kbase, koff, base, offset, klen);
       offset += klen;
       Platform.copyMemory(vbase, voff, base, offset, vlen);
@@ -780,7 +780,7 @@ private void allocate(int capacity) {
     assert (capacity >= 0);
     capacity = Math.max((int) Math.min(MAX_CAPACITY, ByteArrayMethods.nextPowerOf2(capacity)), 64);
     assert (capacity <= MAX_CAPACITY);
-    longArray = allocateArray(capacity * 2);
+    longArray = allocateArray(capacity * 2L);
     longArray.zeroOut();
     this.growthThreshold = (int) (capacity * loadFactor);
diff --git a/examples/src/main/java/org/apache/spark/examples/JavaTC.java b/examples/src/main/java/org/apache/spark/examples/JavaTC.java
index c9ca9c9b3a412..7e8df69e7e8da 100644
--- a/examples/src/main/java/org/apache/spark/examples/JavaTC.java
+++ b/examples/src/main/java/org/apache/spark/examples/JavaTC.java
@@ -71,7 +71,7 @@ public static void main(String[] args) {
     JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
-    Integer slices = (args.length > 0) ? Integer.parseInt(args[0]): 2;
+    int slices = (args.length > 0) ? Integer.parseInt(args[0]): 2;
     JavaPairRDD tc = jsc.parallelizePairs(generateGraph(), slices).cache();
     // Linear transitive closure: each round grows paths by one edge,
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
index 27052be87b82e..b8d2c9f6a6584 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
@@ -111,7 +111,7 @@ public static void main(String[] args) {
       .setMetricName("rmse")
       .setLabelCol("rating")
       .setPredictionCol("prediction");
-    Double rmse =
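A side note on two of the hunks above (`offset += (2L * uaoSize)` and `allocateArray(capacity * 2L)`): the point of the `L` suffix is that multiplying two 32-bit ints is evaluated in 32-bit arithmetic and can overflow before any widening to long. A tiny illustration (in Scala, with values chosen purely for demonstration; the same rule applies in Java):

```scala
val capacity = 1 << 30                 // 1073741824

// Int * Int wraps around in 32-bit arithmetic before being widened to Long...
val wrapped: Long = capacity * 2       // -2147483648

// ...whereas multiplying by a Long literal forces 64-bit arithmetic, as in the patch.
val widened: Long = capacity * 2L      // 2147483648
```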
[GitHub] srowen commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
srowen commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450786128 Merged to master
[GitHub] SparkQA commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
SparkQA commented on issue #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite URL: https://github.com/apache/spark/pull/23425#issuecomment-450786104 **[Test build #100634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100634/testReport)** for PR 23425 at commit [`1aa7ad7`](https://github.com/apache/spark/commit/1aa7ad7aee0e10fbefd78638c8b896e60e3715b5).
[GitHub] srowen opened a new pull request #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite
srowen opened a new pull request #23425: [SPARK-26306][TEST][BUILD] More memory to de-flake SorterSuite URL: https://github.com/apache/spark/pull/23425
## What changes were proposed in this pull request?
Increase test memory to avoid OOM in TimSort-related tests.
## How was this patch tested?
Existing tests.
[GitHub] AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#issuecomment-450785846 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
AmplabJenkins removed a comment on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#issuecomment-450785847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6551/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#issuecomment-450785847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6551/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
AmplabJenkins commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#issuecomment-450785846 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
SparkQA commented on issue #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#issuecomment-450785770 **[Test build #100633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100633/testReport)** for PR 23424 at commit [`2f267a5`](https://github.com/apache/spark/commit/2f267a5d63e10d3d1e986a346a4385a93a27ce7c).
[GitHub] SparkQA commented on issue #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite
SparkQA commented on issue #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite URL: https://github.com/apache/spark/pull/23404#issuecomment-450785748 **[Test build #4492 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4492/testReport)** for PR 23404 at commit [`66a9d5d`](https://github.com/apache/spark/commit/66a9d5d333271eae76c18d4e33076724371bbe6a).
[GitHub] AmplabJenkins removed a comment on issue #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite
AmplabJenkins removed a comment on issue #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite URL: https://github.com/apache/spark/pull/23404#issuecomment-450474110 Can one of the admins verify this patch?
[GitHub] srowen commented on a change in pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
srowen commented on a change in pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424#discussion_r244661628
## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
## @@ -209,21 +209,25 @@ public static long reallocateMemory(long address, long oldSize, long newSize) {
   }
   /**
-   * Uses internal JDK APIs to allocate a DirectByteBuffer while ignoring the JVM's
-   * MaxDirectMemorySize limit (the default limit is too low and we do not want to require users
-   * to increase it).
+   * Allocate a DirectByteBuffer, potentially bypassing the JVM's MaxDirectMemorySize limit.
    */
   public static ByteBuffer allocateDirectBuffer(int size) {
     try {
+      if (CLEANER_CREATE_METHOD == null) {
+        // Can't set a Cleaner (see comments on field), so need to allocate via normal Java APIs
+        return ByteBuffer.allocateDirect(size);
+      }
+      // Otherwise, use internal JDK APIs to allocate a DirectByteBuffer while ignoring the JVM's
+      // MaxDirectMemorySize limit (the default limit is too low and we do not want to
+      // require users to increase it).
       long memory = allocateMemory(size);
       ByteBuffer buffer = (ByteBuffer) DBB_CONSTRUCTOR.newInstance(memory, size);
-      if (CLEANER_CREATE_METHOD != null) {
-        try {
-          DBB_CLEANER_FIELD.set(buffer,
-            CLEANER_CREATE_METHOD.invoke(null, buffer, (Runnable) () -> freeMemory(memory)));
-        } catch (IllegalAccessException | InvocationTargetException e) {
-          throw new IllegalStateException(e);
-        }
+      try {
+        DBB_CLEANER_FIELD.set(buffer,
+          CLEANER_CREATE_METHOD.invoke(null, buffer, (Runnable) () -> freeMemory(memory)));
+      } catch (IllegalAccessException | InvocationTargetException e) {
+        freeMemory(memory);
Review comment: Just to be totally safe, free the memory that was allocated but can't be used now in this case.
[GitHub] srowen commented on a change in pull request #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in JDK11
srowen commented on a change in pull request #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in JDK11 URL: https://github.com/apache/spark/pull/22993#discussion_r244661604
## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
## @@ -159,18 +213,18 @@ public static long reallocateMemory(long address, long oldSize, long newSize) {
    * MaxDirectMemorySize limit (the default limit is too low and we do not want to require users
    * to increase it).
    */
-  @SuppressWarnings("unchecked")
   public static ByteBuffer allocateDirectBuffer(int size) {
     try {
-      Class cls = Class.forName("java.nio.DirectByteBuffer");
-      Constructor constructor = cls.getDeclaredConstructor(Long.TYPE, Integer.TYPE);
-      constructor.setAccessible(true);
-      Field cleanerField = cls.getDeclaredField("cleaner");
-      cleanerField.setAccessible(true);
       long memory = allocateMemory(size);
-      ByteBuffer buffer = (ByteBuffer) constructor.newInstance(memory, size);
-      Cleaner cleaner = Cleaner.create(buffer, () -> freeMemory(memory));
-      cleanerField.set(buffer, cleaner);
+      ByteBuffer buffer = (ByteBuffer) DBB_CONSTRUCTOR.newInstance(memory, size);
+      if (CLEANER_CREATE_METHOD != null) {
Review comment: See https://github.com/apache/spark/pull/23424 ; I now think this was an error.
[GitHub] srowen opened a new pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set
srowen opened a new pull request #23424: [SPARK-24421][CORE][FOLLOWUP] Use normal direct ByteBuffer allocation if Cleaner can't be set URL: https://github.com/apache/spark/pull/23424 ## What changes were proposed in this pull request? In Java 9+ we can't use sun.misc.Cleaner by default anymore, and this was largely handled in https://github.com/apache/spark/pull/22993 However I think the change there left a significant problem. If a DirectByteBuffer is allocated using the reflective hack in Platform, now, we by default can't set a Cleaner. But I believe this means the memory isn't freed promptly or possibly at all. If a Cleaner can't be set, I think we need to use normal APIs to allocate the direct ByteBuffer. According to comments in the code, the downside is simply that the normal APIs will check and impose limits on how much off-heap memory can be allocated. Per the original review on https://github.com/apache/spark/pull/22993 this much seems fine, as either way in this case the user would have to add a JVM setting (increase max, or allow the reflective access). ## How was this patch tested? Existing tests. This resolved an OutOfMemoryError in Java 11 from TimSort tests without increasing test heap size. (See https://github.com/apache/spark/pull/23419#issuecomment-450772125 ) This suggests there is a problem and that this resolves it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] beliefer commented on a change in pull request #23409: [SPARK-26502][SQL] Move hiveResultString() from QueryExecution to HiveResult
beliefer commented on a change in pull request #23409: [SPARK-26502][SQL] Move hiveResultString() from QueryExecution to HiveResult URL: https://github.com/apache/spark/pull/23409#discussion_r244661148 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import java.nio.charset.StandardCharsets +import java.sql.{Date, Timestamp} + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.util.DateTimeUtils +import org.apache.spark.sql.execution.command.{DescribeTableCommand, ExecutedCommandExec, ShowTablesCommand} +import org.apache.spark.sql.types._ + +object HiveResult { Review comment: HiveResult.hiveResultString seems like it could be used for other dialects (e.g. MySQL) as well, so I suggest renaming it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450784759 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450784759 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
AmplabJenkins commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450784760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100628/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
AmplabJenkins removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450784760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100628/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
SparkQA removed a comment on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450765699 **[Test build #100628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100628/testReport)** for PR 23420 at commit [`3df0a0a`](https://github.com/apache/spark/commit/3df0a0ab27b2c841da4c7b3da6ecf8b7f48d7e6d). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com
SparkQA commented on issue #23420: [SPARK-26508][Core][SQL] Address warning messages in Java reported at lgtm.com URL: https://github.com/apache/spark/pull/23420#issuecomment-450784630 **[Test build #100628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100628/testReport)** for PR 23420 at commit [`3df0a0a`](https://github.com/apache/spark/commit/3df0a0ab27b2c841da4c7b3da6ecf8b7f48d7e6d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] LantaoJin commented on issue #22874: [SPARK-25865][CORE] Add GC information to ExecutorMetrics
LantaoJin commented on issue #22874: [SPARK-25865][CORE] Add GC information to ExecutorMetrics URL: https://github.com/apache/spark/pull/22874#issuecomment-450784430 > They make sense over the entire lifetime of the executor, but not when viewed within one stage -- you'd want to subtract out the value at the beginning of the stage. You are right. They only make sense over the entire lifetime; I don't intend to break this metric down per stage. I will check the current implementation. One purpose of adding this is to determine the frequency of major and minor GC. Memory usage alone cannot tell us whether the memory allocation is reasonable. For example, take two executors that are both configured with 10GB of memory and both use close to 10GB: should we increase or decrease their configured memory? This metric can help. We can increase the configured memory for the first one if it has very frequent major GCs, and decrease it for the second one if it only has occasional minor GCs and no major GCs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
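As a rough illustration of the signal being discussed (this is not the ExecutorMetrics change itself, just a sketch of what the JVM already exposes), collector counts and times are available from `GarbageCollectorMXBean` and only ever grow over the executor's lifetime, which is why per-stage numbers would need the value at stage start subtracted out.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

object GcCounters {
  // Cumulative (collector name -> (collection count, collection time in ms)).
  def snapshot(): Map[String, (Long, Long)] =
    ManagementFactory.getGarbageCollectorMXBeans.asScala.map { bean =>
      (bean.getName, (bean.getCollectionCount, bean.getCollectionTime))
    }.toMap

  def main(args: Array[String]): Unit = {
    val before = snapshot()
    System.gc() // just to make something happen in this toy example
    val after = snapshot()
    for ((name, (count, time)) <- after) {
      val (c0, t0) = before.getOrElse(name, (0L, 0L))
      println(s"$name: +${count - c0} collections, +${time - t0} ms")
    }
  }
}
```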
[GitHub] SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection
SparkQA commented on issue #23392: [SPARK-26450][SQL] Avoid rebuilding map of schema for every column in projection URL: https://github.com/apache/spark/pull/23392#issuecomment-450784176 **[Test build #100632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100632/testReport)** for PR 23392 at commit [`a25b59c`](https://github.com/apache/spark/commit/a25b59ca756958370dd7ba14d6c1e33dec424ea8). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244657377 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ## @@ -18,13 +18,13 @@ package org.apache.spark.sql.hive.thriftserver import java.io._ -import java.util.{ArrayList => JArrayList, Locale} +import java.util.{ArrayList, Locale} Review comment: keep this unchanged? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244659512 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -87,4 +92,93 @@ object StringUtils { } funcNames.toSeq } + + /** + * Split the text into one or more SQLs with bracketed comments reserved + * + * Highlighted Corner Cases: semicolon in double quotes, single quotes or inline comments. + * Expected Behavior: The blanks will be trimed and a blank line will be omitted. + * + * @param text One or more SQLs separated by semicolons + * @return the trimmed SQL array (Array is for Java introp) + */ + def split(text: String): Array[String] = { +val D_QUOTE: Char = '"' +val S_QUOTE: Char = '\'' +val Q_QUOTE: Char = '`' +val SEMICOLON: Char = ';' +val ESCAPE: Char = '\\' +val DOT = '.' +val SINGLE_COMMENT = "--" +val BRACKETED_COMMENT_START = "/*" +val BRACKETED_COMMENT_END = "*/" +val FORWARD_SLASH = '/' + +// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT +// * D_QUOTE: the cursor stands on a doulbe quoted string +// * S_QUOTE: the cursor stands on a single quoted string +// * DASH: the cursor stands in the SINGLE_COMMENT +// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT +// * DOT: default value for other cases +var quoteFlag: Char = DOT +var cursor: Int = 0 +val ret: mutable.ArrayBuffer[String] = mutable.ArrayBuffer() +var currentSQL: mutable.StringBuilder = mutable.StringBuilder.newBuilder + +while (cursor < text.length) { + val current: Char = text(cursor) + + text.substring(cursor) match { Review comment: Based on the current impl, there are many cases we need to consider. It is easy to miss one of it. Could we simplify it? first handling the special cases, e.g., in quotes, in comments, and then entering the regular mode. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
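In the spirit of the suggestion above (handle the special states first, then the regular mode), here is a hedged, simplified sketch of such a splitter. It is not the PR's implementation and deliberately ignores some corner cases (for example nested bracketed comments and escaped backticks); it only illustrates the structure being proposed.

```scala
import scala.collection.mutable

// Split a string of semicolon-separated SQL statements while honoring quotes
// and comments. Special states are handled first; the Default case is the
// "regular mode" that looks for statement boundaries.
def splitSql(text: String): Seq[String] = {
  sealed trait State
  case object Default extends State
  case object SingleQuote extends State
  case object DoubleQuote extends State
  case object Backtick extends State
  case object LineComment extends State
  case object BracketedComment extends State

  val statements = mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var state: State = Default
  var i = 0
  while (i < text.length) {
    val c = text(i)
    state match {
      case SingleQuote =>
        current.append(c)
        if (c == '\\' && i + 1 < text.length) { current.append(text(i + 1)); i += 1 }
        else if (c == '\'') state = Default
      case DoubleQuote =>
        current.append(c)
        if (c == '\\' && i + 1 < text.length) { current.append(text(i + 1)); i += 1 }
        else if (c == '"') state = Default
      case Backtick =>
        current.append(c)
        if (c == '`') state = Default
      case LineComment =>
        current.append(c)
        if (c == '\n') state = Default
      case BracketedComment =>
        current.append(c)
        if (c == '*' && i + 1 < text.length && text(i + 1) == '/') {
          current.append('/'); i += 1; state = Default
        }
      case Default =>
        if (c == ';') {
          val sql = current.toString.trim
          if (sql.nonEmpty) statements += sql   // blank statements are omitted
          current.clear()
        } else {
          current.append(c)
          c match {
            case '\'' => state = SingleQuote
            case '"'  => state = DoubleQuote
            case '`'  => state = Backtick
            case '-' if i + 1 < text.length && text(i + 1) == '-' => state = LineComment
            case '/' if i + 1 < text.length && text(i + 1) == '*' => state = BracketedComment
            case _ =>
          }
        }
    }
    i += 1
  }
  val last = current.toString.trim
  if (last.nonEmpty) statements += last
  statements.toSeq
}
```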
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244658838 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -87,4 +92,93 @@ object StringUtils { } funcNames.toSeq } + + /** + * Split the text into one or more SQLs with bracketed comments reserved + * + * Highlighted Corner Cases: semicolon in double quotes, single quotes or inline comments. + * Expected Behavior: The blanks will be trimed and a blank line will be omitted. + * + * @param text One or more SQLs separated by semicolons + * @return the trimmed SQL array (Array is for Java introp) + */ + def split(text: String): Array[String] = { +val D_QUOTE: Char = '"' +val S_QUOTE: Char = '\'' +val Q_QUOTE: Char = '`' +val SEMICOLON: Char = ';' +val ESCAPE: Char = '\\' +val DOT = '.' +val SINGLE_COMMENT = "--" +val BRACKETED_COMMENT_START = "/*" +val BRACKETED_COMMENT_END = "*/" +val FORWARD_SLASH = '/' + +// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT +// * D_QUOTE: the cursor stands on a doulbe quoted string +// * S_QUOTE: the cursor stands on a single quoted string +// * DASH: the cursor stands in the SINGLE_COMMENT +// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT +// * DOT: default value for other cases +var quoteFlag: Char = DOT +var cursor: Int = 0 +val ret: mutable.ArrayBuffer[String] = mutable.ArrayBuffer() +var currentSQL: mutable.StringBuilder = mutable.StringBuilder.newBuilder + +while (cursor < text.length) { + val current: Char = text(cursor) + + text.substring(cursor) match { +// if it stands on the opening of a bracketed comment, consume 2 characters +case remaining if quoteFlag == DOT + && current == '/' + && remaining.startsWith(BRACKETED_COMMENT_START) => + quoteFlag = current + currentSQL.append("/*") Review comment: Use `BRACKETED_COMMENT_START` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244656958 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ## @@ -331,6 +337,64 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { console.printInfo(s"Spark master: $master, Application Id: $appId") } + override def processLine(line: String, allowInterrupting: Boolean): Int = { Review comment: Could you write a comment above this line and explain the code is from org.apache.hadoop.hive.cli.CliDriver? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244659180 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -87,4 +92,93 @@ object StringUtils { } funcNames.toSeq } + + /** + * Split the text into one or more SQLs with bracketed comments reserved + * + * Highlighted Corner Cases: semicolon in double quotes, single quotes or inline comments. + * Expected Behavior: The blanks will be trimed and a blank line will be omitted. + * + * @param text One or more SQLs separated by semicolons + * @return the trimmed SQL array (Array is for Java introp) + */ + def split(text: String): Array[String] = { +val D_QUOTE: Char = '"' +val S_QUOTE: Char = '\'' +val Q_QUOTE: Char = '`' +val SEMICOLON: Char = ';' +val ESCAPE: Char = '\\' +val DOT = '.' +val SINGLE_COMMENT = "--" +val BRACKETED_COMMENT_START = "/*" +val BRACKETED_COMMENT_END = "*/" +val FORWARD_SLASH = '/' + +// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT +// * D_QUOTE: the cursor stands on a doulbe quoted string +// * S_QUOTE: the cursor stands on a single quoted string +// * DASH: the cursor stands in the SINGLE_COMMENT +// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT +// * DOT: default value for other cases +var quoteFlag: Char = DOT Review comment: can we use enum? The current way mixed the actual flag with the symbol. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244658802 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -87,4 +92,93 @@ object StringUtils { } funcNames.toSeq } + + /** + * Split the text into one or more SQLs with bracketed comments reserved + * + * Highlighted Corner Cases: semicolon in double quotes, single quotes or inline comments. + * Expected Behavior: The blanks will be trimed and a blank line will be omitted. + * + * @param text One or more SQLs separated by semicolons + * @return the trimmed SQL array (Array is for Java introp) + */ + def split(text: String): Array[String] = { +val D_QUOTE: Char = '"' +val S_QUOTE: Char = '\'' +val Q_QUOTE: Char = '`' +val SEMICOLON: Char = ';' +val ESCAPE: Char = '\\' +val DOT = '.' +val SINGLE_COMMENT = "--" +val BRACKETED_COMMENT_START = "/*" +val BRACKETED_COMMENT_END = "*/" +val FORWARD_SLASH = '/' + +// quoteFlag acts as an enum of D_QUOTE, S_QUOTE, DOT +// * D_QUOTE: the cursor stands on a doulbe quoted string +// * S_QUOTE: the cursor stands on a single quoted string +// * DASH: the cursor stands in the SINGLE_COMMENT +// * FORWARD_SLASH: the cursor stands in the BRACKETED_COMMENT +// * DOT: default value for other cases +var quoteFlag: Char = DOT +var cursor: Int = 0 +val ret: mutable.ArrayBuffer[String] = mutable.ArrayBuffer() +var currentSQL: mutable.StringBuilder = mutable.StringBuilder.newBuilder + +while (cursor < text.length) { + val current: Char = text(cursor) + + text.substring(cursor) match { +// if it stands on the opening of a bracketed comment, consume 2 characters +case remaining if quoteFlag == DOT + && current == '/' Review comment: Could you follow the indentation convention in our code base? https://github.com/databricks/scala-style-guide#indent You can find many examples in the code base This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244657396 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ## @@ -384,7 +448,7 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { return ret } - val res = new JArrayList[String]() + val res = new ArrayList[String]() Review comment: keep this unchanged? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way
gatorsmile commented on a change in pull request #23276: [SPARK-26321][SQL] Split a SQL in correct way URL: https://github.com/apache/spark/pull/23276#discussion_r244659631 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala ## @@ -43,4 +43,70 @@ class StringUtilsSuite extends SparkFunSuite { assert(filterPattern(names, " a. ") === Seq("a1", "a2")) assert(filterPattern(names, " d* ") === Nil) } + + test("split a SQL") { Review comment: The test case coverage is still not enough. For example, adding some test cases from https://github.com/apache/hive/commit/65a65826a0d351a3d918bdb98595bdd106d37adb#diff-6182a9d1d63c707dff0ecd4e6a025fd2 ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450782512 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450782512 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins removed a comment on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450782513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6550/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
AmplabJenkins commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450782513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6550/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
SparkQA commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450782084 **[Test build #100631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100631/testReport)** for PR 23349 at commit [`d91ade6`](https://github.com/apache/spark/commit/d91ade60e14dbb7327351de5c59f50ba7d66e26a). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
HyukjinKwon commented on a change in pull request #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#discussion_r244658731 ## File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ## @@ -422,10 +422,14 @@ class RelationalGroupedDataset protected[sql]( def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = { groupType match { case RelationalGroupedDataset.GroupByType => -val valueExprs = values.map(_ match { +val valueExprs = values.map { case c: Column => c.expr + // ArrayType returns a `WrappedArray` but currently `Literal.apply` + // does not support this type although it supports a normal array. + // Here manually unwrap to make it an array. See also SPARK-26403. + case v: collection.mutable.WrappedArray[_] => Literal.apply(v.array) Review comment: Yup. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource URL: https://github.com/apache/spark/pull/23325#issuecomment-450780863 then how about having `from_json` always return null for corrupted records if the mode is `PERMISSIVE`? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
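To make the question concrete, a small hedged sketch (the column name and schema are made up, and it assumes an existing `SparkSession` named `spark`):

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._ // assumes an existing SparkSession named `spark`

val schema = new StructType().add("a", IntegerType)
val df = Seq("""{"a": 1}""", """not a json record""").toDF("value")

// The question above is what the second (corrupted) row should yield when the
// parse mode is PERMISSIVE: a row of nulls, or simply null for the whole struct.
df.select(from_json($"value", schema, Map("mode" -> "PERMISSIVE")).as("parsed")).show()
```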
[GitHub] ConcurrencyPractitioner closed pull request #21651: [SPARK-18258] Sink need access to offset representation
ConcurrencyPractitioner closed pull request #21651: [SPARK-18258] Sink need access to offset representation URL: https://github.com/apache/spark/pull/21651 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala index 08914d82fffdd..8014b6e733bb8 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala @@ -21,6 +21,7 @@ import java.{util => ju} import org.apache.spark.internal.Logging import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.execution.streaming.OffsetSeq import org.apache.spark.sql.execution.streaming.Sink private[kafka010] class KafkaSink( @@ -31,12 +32,12 @@ private[kafka010] class KafkaSink( override def toString(): String = "KafkaSink" - override def addBatch(batchId: Long, data: DataFrame): Unit = { + override def addBatch(batchId: Long, data: DataFrame, start: OffsetSeq, end: OffsetSeq): Unit = { if (batchId <= latestBatchId) { logInfo(s"Skipping already committed batch $batchId") } else { KafkaWriter.write(sqlContext.sparkSession, -data.queryExecution, executorKafkaParams, topic) +data.queryExecution, executorKafkaParams, topic, start, end) latestBatchId = batchId } } diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala index d225c1ea6b7f1..1a5857a591499 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala @@ -252,7 +252,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister val topic = parameters.get(TOPIC_OPTION_KEY).map(_.trim) val specifiedKafkaParams = kafkaParamsForProducer(parameters) KafkaWriter.write(outerSQLContext.sparkSession, data.queryExecution, - new ju.HashMap[String, Object](specifiedKafkaParams.asJava), topic) + new ju.HashMap[String, Object](specifiedKafkaParams.asJava), topic, null, null) /* This method is suppose to return a relation that reads the data that was written. * We cannot support this for Kafka. 
Therefore, in order to make things consistent, diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala index d90630a8adc93..112a48a718335 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala @@ -23,6 +23,7 @@ import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecor import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Literal, UnsafeProjection} +import org.apache.spark.sql.execution.streaming.OffsetSeq import org.apache.spark.sql.types.{BinaryType, StringType} /** @@ -33,7 +34,9 @@ import org.apache.spark.sql.types.{BinaryType, StringType} private[kafka010] class KafkaWriteTask( producerConfiguration: ju.Map[String, Object], inputSchema: Seq[Attribute], -topic: Option[String]) extends KafkaRowWriter(inputSchema, topic) { +topic: Option[String], +start: OffsetSeq, // not done +end: OffsetSeq) extends KafkaRowWriter(inputSchema, topic) { // used to synchronize with Kafka callbacks private var producer: KafkaProducer[Array[Byte], Array[Byte]] = _ diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala index 15cd44812cb0c..dc433edc10faa 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala @@ -23,6 +23,7 @@ import org.apache.spark.internal.Logging import org.apache.spark.sql.{AnalysisException, SparkSession} import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.execution.{QueryExecution, SQLExecution} +import org.apache.spark.sql.execution.streaming.OffsetSeq import org.apache.spark.sql.types.{BinaryType, StringType} import org.apache.spark.util.Utils
[GitHub] liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite
liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite URL: https://github.com/apache/spark/pull/23404#discussion_r244657632 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ## @@ -72,26 +72,32 @@ trait TestPrematureExit { mainObject.printStream = printStream @volatile var exitedCleanly = false -mainObject.exitFn = (_) => exitedCleanly = true - -@volatile var exception: Exception = null -val thread = new Thread { - override def run() = try { -mainObject.main(input) - } catch { -// Capture the exception to check whether the exception contains searchString or not -case e: Exception => exception = e - } +def withFakeExit(body: => Unit): Unit = { Review comment: I agree that a try-finally block is necessary, because the function body can throw an exception. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite
liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite URL: https://github.com/apache/spark/pull/23404#discussion_r244656931 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ## @@ -72,26 +72,32 @@ trait TestPrematureExit { mainObject.printStream = printStream @volatile var exitedCleanly = false -mainObject.exitFn = (_) => exitedCleanly = true - -@volatile var exception: Exception = null -val thread = new Thread { - override def run() = try { -mainObject.main(input) - } catch { -// Capture the exception to check whether the exception contains searchString or not -case e: Exception => exception = e - } +def withFakeExit(body: => Unit): Unit = { Review comment: It's OK to just modify and restore exitFn directly, but wouldn't a helper method here make the modification of exitFn clearer? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite
liupc commented on a change in pull request #23404: [SPARK-26501]Fix unexpected overriden of exitFn in SparkSubmitSuite URL: https://github.com/apache/spark/pull/23404#discussion_r244656931 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ## @@ -72,26 +72,32 @@ trait TestPrematureExit { mainObject.printStream = printStream @volatile var exitedCleanly = false -mainObject.exitFn = (_) => exitedCleanly = true - -@volatile var exception: Exception = null -val thread = new Thread { - override def run() = try { -mainObject.main(input) - } catch { -// Capture the exception to check whether the exception contains searchString or not -case e: Exception => exception = e - } +def withFakeExit(body: => Unit): Unit = { Review comment: It's OK to just modify and restore exitFn directly, but wouldn't a helper method here make the modification of exitFn clearer? Moreover, I was wondering whether a try-finally block is necessary, since there is already a try-catch block in the function body at the only call site. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
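A hedged sketch of the helper being discussed, with a stand-in `FakeExitSketch` object in place of the real SparkSubmit test hook; the point of the try-finally is that the original function is restored even when the body throws:

```scala
object FakeExitSketch {
  // Stand-in for SparkSubmit's exitFn test hook discussed above.
  @volatile var exitFn: Int => Unit = code => sys.exit(code)

  def withFakeExit(body: => Unit): Unit = {
    val original = exitFn
    exitFn = _ => ()    // swallow exit calls during the test
    try {
      body              // may throw; the caller inspects the exception elsewhere
    } finally {
      exitFn = original // always restored, even if body throws
    }
  }
}
```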
[GitHub] cloud-fan commented on a change in pull request #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
cloud-fan commented on a change in pull request #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#discussion_r244656526 ## File path: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ## @@ -422,10 +422,14 @@ class RelationalGroupedDataset protected[sql]( def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = { groupType match { case RelationalGroupedDataset.GroupByType => -val valueExprs = values.map(_ match { +val valueExprs = values.map { case c: Column => c.expr + // ArrayType returns a `WrappedArray` but currently `Literal.apply` + // does not support this type although it supports a normal array. + // Here manually unwrap to make it an array. See also SPARK-26403. + case v: collection.mutable.WrappedArray[_] => Literal.apply(v.array) Review comment: ah I see. Then I think it's better to put it in `Literal.apply`, as it can help more cases. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
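To make the unwrapping concrete, a hedged sketch that is independent of where the handling eventually lives (`pivot` or `Literal.apply` itself, which is exactly the question above):

```scala
import scala.collection.mutable.WrappedArray
import org.apache.spark.sql.catalyst.expressions.Literal

// A Scala varargs/Seq value often arrives as a WrappedArray, which
// Literal.apply does not recognise, so hand it the underlying array instead.
def toLiteral(value: Any): Literal = value match {
  case w: WrappedArray[_] => Literal(w.array)
  case other => Literal(other)
}
```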
[GitHub] SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450778855 **[Test build #100630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100630/testReport)** for PR 23388 at commit [`c228ad9`](https://github.com/apache/spark/commit/c228ad97fcbed7e93940d120f177817f7ad55c27). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450778815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6549/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450778815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6549/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450778812 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450778812 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244656211 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, LambdaFunction, NamedLambdaVariable, UnaryExpression} +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, Window} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types._ + +/** + * We need to take care of special floating numbers (NaN and -0.0) in several places: + * 1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 0.0 should belong + * to the same group. + * 3. In join keys, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 4. In window partition keys, different NaNs should be treated as same, `-0.0` and `0.0` + * should be treated as same. + * + * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we + * recursively compare the fields/elements, so it's also fine. + * + * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different + * NaNs have different binary representation, and the same thing happens for -0.0 and 0.0. + * + * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and Aggregate grouping + * expressions. + * + * Note that, this rule should be an analyzer rule, as it must be applied to make the query result + * corrected. Currently it's executed as an optimizer rule, because the optimizer may create new + * joins(for subquery) and reorder joins(may change the join condition), and this rule needs to be + * executed at the end. + */ +object NormalizeFloatingNumbers extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan match { +// A subquery will be rewritten into join later, and will go through this rule Review comment: This is same as `ExtractPythonUDFs` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
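A value-level illustration of why the rule is needed (a hedged sketch; the real rule rewrites Catalyst expressions, not plain Scala values): grouping, join, and window partition keys are compared as raw UnsafeRow bytes, and 0.0 / -0.0 (as well as different NaN payloads) have different bit patterns even though they compare as equal values.

```scala
object NormalizeSketch {
  def main(args: Array[String]): Unit = {
    val plusZeroBits = java.lang.Double.doubleToRawLongBits(0.0)   // 0x0000000000000000
    val minusZeroBits = java.lang.Double.doubleToRawLongBits(-0.0) // 0x8000000000000000
    assert(0.0 == -0.0 && plusZeroBits != minusZeroBits)

    // Minimal normalization in the spirit of the rule above.
    def normalize(d: Double): Double =
      if (d.isNaN) Double.NaN // collapse all NaN payloads to the canonical NaN
      else if (d == -0.0) 0.0 // 0.0 == -0.0 is true, so plain 0.0 maps to itself too
      else d

    assert(java.lang.Double.doubleToRawLongBits(normalize(-0.0)) ==
      java.lang.Double.doubleToRawLongBits(0.0))
    println("normalized -0.0 and 0.0 now have identical bits")
  }
}
```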
[GitHub] srowen commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter
srowen commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter URL: https://github.com/apache/spark/pull/23391#discussion_r244656099 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ## @@ -230,7 +235,7 @@ object PartitioningUtils { // Once we get the string, we try to parse it and find the partition column and value. val maybeColumn = parsePartitionColumn(currentPath.getName, typeInference, userSpecifiedDataTypes, -validatePartitionColumns, timeZone) +validatePartitionColumns, timeZone, dateFormatter, timestampFormatter) Review comment: @MaxGekk you probably have a better summary than I do, but is the problem fundamentally about writing formatted dates incorrectly? older dates would have slightly the wrong hour/minute, and was it because the timezone wasn't fully and correctly specified? and this would manifest if writing to JSON or CSV? yeah, a brief summary of the type of bug that was fixed by the new parser would be helpful in the release notes (just "Docs text" of the JIRA) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244655875 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ## @@ -102,8 +102,8 @@ object ExtractEquiJoinKeys extends Logging with PredicateHelper { type ReturnType = (JoinType, Seq[Expression], Seq[Expression], Option[Expression], LogicalPlan, LogicalPlan) - def unapply(plan: LogicalPlan): Option[ReturnType] = plan match { -case join @ Join(left, right, joinType, condition) => + def unapply(join: Join): Option[ReturnType] = join match { +case Join(left, right, joinType, condition) => Review comment: we can, but that will introduce a lot of code diff, because of the indentation... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244655849 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, LambdaFunction, NamedLambdaVariable, UnaryExpression} +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, Window} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types._ + +/** + * We need to take care of special floating numbers (NaN and -0.0) in several places: + * 1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 0.0 should belong + * to the same group. + * 3. In join keys, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 4. In window partition keys, different NaNs should be treated as same, `-0.0` and `0.0` + * should be treated as same. + * + * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we + * recursively compare the fields/elements, so it's also fine. + * + * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different + * NaNs have different binary representation, and the same thing happens for -0.0 and 0.0. + * + * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and Aggregate grouping + * expressions. + * + * Note that, this rule should be an analyzer rule, as it must be applied to make the query result + * corrected. Currently it's executed as an optimizer rule, because the optimizer may create new + * joins(for subquery) and reorder joins(may change the join condition), and this rule needs to be + * executed at the end. + */ +object NormalizeFloatingNumbers extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan match { +// A subquery will be rewritten into join later, and will go through this rule Review comment: `OptimizeSubqueries` will apply the entire optimizer and triggers this rule. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
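A rough Python sketch (illustrative only, not the Catalyst rule itself, which rewrites expressions) of why the normalization discussed above matters: grouping, join, and window keys are compared as raw `UnsafeRow` bytes, and `0.0`/`-0.0` (as well as the many NaN payloads) have different byte patterns even though they compare as equal, so keys must be mapped to one canonical form first. The `normalize` helper below is a hypothetical stand-in for that idea.

```python
import struct

def bits(x):
    """Big-endian IEEE-754 bytes of a 64-bit float, as hex."""
    return struct.pack('>d', x).hex()

print(bits(0.0))    # 0000000000000000
print(bits(-0.0))   # 8000000000000000 -- equal to 0.0, but different bytes

nan1 = float('nan')
nan2 = struct.unpack('>d', bytes.fromhex('7ff8000000000001'))[0]  # another NaN payload
print(bits(nan1), bits(nan2))  # both are NaN, yet their bytes differ

def normalize(x):
    """Map -0.0 to 0.0 and every NaN to one canonical NaN before using x as a key."""
    if x != x:          # only NaN is unequal to itself
        return float('nan')
    return x + 0.0      # IEEE 754: -0.0 + 0.0 == +0.0; other values are unchanged
```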
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244655827 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, LambdaFunction, NamedLambdaVariable, UnaryExpression} +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, Window} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types._ + +/** + * We need to take care of special floating numbers (NaN and -0.0) in several places: + * 1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 0.0 should belong + * to the same group. + * 3. In join keys, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 4. In window partition keys, different NaNs should be treated as same, `-0.0` and `0.0` + * should be treated as same. + * + * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we + * recursively compare the fields/elements, so it's also fine. + * + * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different + * NaNs have different binary representation, and the same thing happens for -0.0 and 0.0. + * + * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and Aggregate grouping + * expressions. + */ +object NormalizeFloatingNumbers extends Rule[LogicalPlan] { Review comment: ah good catch! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter
cloud-fan commented on a change in pull request #23391: [SPARK-26456][SQL] Cast date/timestamp to string by Date/TimestampFormatter URL: https://github.com/apache/spark/pull/23391#discussion_r244655731 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ## @@ -230,7 +235,7 @@ object PartitioningUtils { // Once we get the string, we try to parse it and find the partition column and value. val maybeColumn = parsePartitionColumn(currentPath.getName, typeInference, userSpecifiedDataTypes, -validatePartitionColumns, timeZone) +validatePartitionColumns, timeZone, dateFormatter, timestampFormatter) Review comment: Then shall we update the migration guide about the difference? I think the change here is better, as the SQL standard uses the Gregorian calendar. IIUC, the behavior difference only happens when reading files? If users write a timestamp literal and display it, we should be fine. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
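A hypothetical PySpark snippet (session setup and example values assumed) illustrating the distinction the comment draws: displaying a timestamp literal only exercises formatting of an in-memory value, whereas parsing strings back into dates or timestamps, such as partition values recovered from file paths, is where a hybrid Julian/Gregorian parser and a proleptic-Gregorian `java.time` formatter can disagree, mainly for dates before the 1582 cutover.

```python
from pyspark.sql import SparkSession

# Hypothetical local session, just to run the two statements below.
spark = SparkSession.builder.master("local[1]").appName("calendar-demo").getOrCreate()

# Displaying a timestamp literal only formats an in-memory value, so the
# calendar switch is not visible here -- the reviewer's "we should be fine" case.
spark.sql("SELECT CAST(TIMESTAMP '2018-12-31 10:11:12' AS STRING) AS ts_str").show(truncate=False)

# Parsing strings into dates (e.g. partition values read from file paths) is
# where parsers based on different calendars can give results several days
# apart for dates before the 1582 Gregorian cutover.
spark.sql("SELECT CAST('1500-01-10' AS DATE) AS d").show()
```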
[GitHub] HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class
HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class URL: https://github.com/apache/spark/pull/23414#discussion_r244655655 ## File path: python/pyspark/sql/dataframe.py ## @@ -2046,6 +2046,40 @@ def toDF(self, *cols): jdf = self._jdf.toDF(self._jseq(cols)) return DataFrame(jdf, self.sql_ctx) +@since(3.0) +def transform(self, func): +"""Returns a new class:`DataFrame` according to a user-defined custom transform method. +This allows chaining transformations rather than using nested or temporary variables. + +:param func: a user-defined custom transform function +This is equiavalent to a nested call: +actual_df = with_something(with_greeting(source_df), "crazy")) + +credit to: https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55 + +A more concrete example:: +>>> sc = pyspark.SparkContext(master='local') +>>> spark = pyspark.sql.SparkSession(sparkContext=sc) +>>> from pyspark.sql.functions import lit +>>> def with_greeting(df): +... return df.withColumn("greeting", lit("hi")) +>>> def with_something(df, something): +... return df.withColumn("something", lit(something)) +>>> data = [("jose", 1), ("li", 2), ("liz", 3)] +>>> source_df = spark.createDataFrame(data, ["name", "age"]) +>>> actual_df = source_df.transform(with_greeting).transform(lambda x: with_something(x, "crazy")) Review comment: I think we don't necessarily have to demonstrate the chaining of multiple `transform`. We can chain other APIs as well, for instance, `df.transform(...).select(...).transform(...)` in that sense. `show()` is already DataFrame API. I think `df.transform(...).show()` is simple and good enough. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
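For context on the chaining style being discussed, here is a self-contained sketch, assuming a Spark build where `DataFrame.transform` exists (i.e. with this PR applied; it targets Spark 3.0). The helper names come from the docstring quoted above; the local session setup is assumed.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.master("local[1]").getOrCreate()
source_df = spark.createDataFrame([("jose", 1), ("li", 2), ("liz", 3)], ["name", "age"])

def with_greeting(df):
    return df.withColumn("greeting", lit("hi"))

def with_something(df, something):
    return df.withColumn("something", lit(something))

# Nested helper calls read inside-out:
nested = with_something(with_greeting(source_df), "crazy")

# transform lets the same pipeline read left to right and mix freely with
# other DataFrame APIs, which is the point made in the review:
chained = (source_df
           .transform(with_greeting)
           .transform(lambda df: with_something(df, "crazy"))
           .select("name", "greeting", "something"))
chained.show()
```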
[GitHub] srowen commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
srowen commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450777350 OK, so you are suggesting increasing the heap size there just because it currently fails sometimes? That's fine too; I can also make that change separately. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class
HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class URL: https://github.com/apache/spark/pull/23414#discussion_r244655567 ## File path: python/pyspark/sql/dataframe.py ## @@ -2046,6 +2046,36 @@ def toDF(self, *cols): jdf = self._jdf.toDF(self._jseq(cols)) return DataFrame(jdf, self.sql_ctx) +@since(3.0) +def transform(self, func): +"""Returns a new class:`DataFrame` according to a custom transform function. +This allows chaining transformations rather than using nested or temporary variables. + +:param func: a custom transform function which returns a DataFrame Review comment: nit: `DataFrame` -> `` class:`DataFrame` `` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] dongjoon-hyun edited a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
dongjoon-hyun edited a comment on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450777144 That's great! Yep. +1 for handling them separately. BTW, I found that the `SorterSuite` flakiness issue was filed as https://issues.apache.org/jira/browse/SPARK-26306 . I also added two recent Jenkins failure URLs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] dongjoon-hyun commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11
dongjoon-hyun commented on issue #23419: [SPARK-26507][CORE] Fix core tests for Java 11 URL: https://github.com/apache/spark/pull/23419#issuecomment-450777144 That's great! Yep. +1 for handling them separately. The `SorterSuite` flakiness issue was filed as https://issues.apache.org/jira/browse/SPARK-26306 . I added two recent Jenkins failure URLs, too. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class
HyukjinKwon commented on a change in pull request #23414: [SPARK-26449][PYTHON] add a transform method to the Dataframe class URL: https://github.com/apache/spark/pull/23414#discussion_r244655505 ## File path: python/pyspark/sql/dataframe.py ## @@ -2046,6 +2046,36 @@ def toDF(self, *cols): jdf = self._jdf.toDF(self._jseq(cols)) return DataFrame(jdf, self.sql_ctx) +@since(3.0) +def transform(self, func): +"""Returns a new class:`DataFrame` according to a custom transform function. +This allows chaining transformations rather than using nested or temporary variables. + +:param func: a custom transform function which returns a DataFrame + +>>> from pyspark.sql.functions import lit +>>> def with_greeting(df): Review comment: Can we make the example more concise and meaningful? I think we should focus only on a simple example about the API itself rather than using `lambda`. For instance, ```python >>> df = spark.range(10) >>> def cast_to_str(input_df): ... return input_df.select([col(c).cast("string") for c in input_df.columns]) >>> df.transform(cast_to_str).show() ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
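For readers trying the suggested snippet as-is: it relies on a `col` import and an active session that the doctest context would normally provide. A self-contained version under those assumptions (again requiring a build where `DataFrame.transform` exists, i.e. with this PR applied):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col  # used by cast_to_str but not imported in the snippet

spark = SparkSession.builder.master("local[1]").getOrCreate()  # assumed local session
df = spark.range(10)  # a single bigint column named "id"

def cast_to_str(input_df):
    # Cast every column to string, as in the reviewer's suggested example.
    return input_df.select([col(c).cast("string") for c in input_df.columns])

df.transform(cast_to_str).show()  # requires the transform method this PR adds (Spark 3.0+)
```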