[GitHub] [spark] cloud-fan commented on a change in pull request #28756: [SPARK-8981][CORE][FOLLOW-UP] Clean up MDC properties after running a task
cloud-fan commented on a change in pull request #28756:
URL: https://github.com/apache/spark/pull/28756#discussion_r438580524

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
## @@ -322,11 +322,15 @@ private[spark] class Executor(
     val taskId = taskDescription.taskId
     val threadName = s"Executor task launch worker for task $taskId"
     val taskName = taskDescription.name
-    val mdcProperties = taskDescription.properties.asScala
-      .filter(_._1.startsWith("mdc.")).map { item =>
+    val mdcProperties = (taskDescription.properties.asScala ++
+      Seq((Executor.TASK_MDC_KEY, taskName)))
+      .filter(_._1.startsWith(Executor.MDC_KEY)).map { item =>
         val key = item._1.substring(4)
+        if (key == Executor.TASK_MDC_KEY && item._2 != taskName) {
+          logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}")

Review comment:
What's the benefit of letting users override the task name? It just seems confusing to me. Let's not support a non-existent use case.
[GitHub] [spark] MaxGekk commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on a change in pull request #28787:
URL: https://github.com/apache/spark/pull/28787#discussion_r438579601

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
## @@ -409,4 +409,31 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
     }
   }
 }
+
+  test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") {
+    // The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9)
+    // to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM,
+    // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM.
+    // In this way, the overlap happened w/o Daylight Saving Time.
+    val hkZid = getZoneId("Asia/Hong_Kong")
+    withDefaultTimeZone(hkZid) {
+      val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+      val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+      val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+      assert(earlierMicros + MICROS_PER_HOUR === laterMicros)
+      val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros)
+      val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros)
+      def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString
+      val expected = "1945-11-18 01:30:00.0"
+      assert(toTsStr(rebasedEarlierMicros) === expected)
+      assert(toTsStr(rebasedLaterMicros) === expected)
+      assert(rebasedEarlierMicros + MICROS_PER_HOUR === rebasedLaterMicros)
+      // Check optimized rebasing

Review comment:
Yes, it relies on that, but I set the default JVM time zone to `Asia/Hong_Kong` in the test.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
AmplabJenkins removed a comment on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642450214
[GitHub] [spark] AmplabJenkins commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
AmplabJenkins commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642450214
[GitHub] [spark] SparkQA commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
SparkQA commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642449582 **[Test build #123828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123828/testReport)** for PR 28715 at commit [`bdbaa6b`](https://github.com/apache/spark/commit/bdbaa6b2becc311bde4a342668e4501325b6aa28).
[GitHub] [spark] cloud-fan commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
cloud-fan commented on a change in pull request #28787:
URL: https://github.com/apache/spark/pull/28787#discussion_r438577268

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
## @@ -409,4 +409,31 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
     }
   }
 }
+
+  test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") {
+    // The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9)
+    // to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM,
+    // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM.
+    // In this way, the overlap happened w/o Daylight Saving Time.
+    val hkZid = getZoneId("Asia/Hong_Kong")
+    withDefaultTimeZone(hkZid) {
+      val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+      val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+      val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+      assert(earlierMicros + MICROS_PER_HOUR === laterMicros)
+      val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros)
+      val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros)
+      def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString
+      val expected = "1945-11-18 01:30:00.0"
+      assert(toTsStr(rebasedEarlierMicros) === expected)
+      assert(toTsStr(rebasedLaterMicros) === expected)
+      assert(rebasedEarlierMicros + MICROS_PER_HOUR === rebasedLaterMicros)
+      // Check optimized rebasing

Review comment:
So this relies on the JVM system timezone not being `Asia/Hong_Kong`?
[GitHub] [spark] igreenfield commented on a change in pull request #28756: [SPARK-8981][CORE][FOLLOW-UP] Clean up MDC properties after running a task
igreenfield commented on a change in pull request #28756:
URL: https://github.com/apache/spark/pull/28756#discussion_r438577011

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
## @@ -322,11 +322,15 @@ private[spark] class Executor(
     val taskId = taskDescription.taskId
     val threadName = s"Executor task launch worker for task $taskId"
     val taskName = taskDescription.name
-    val mdcProperties = taskDescription.properties.asScala
-      .filter(_._1.startsWith("mdc.")).map { item =>
+    val mdcProperties = (taskDescription.properties.asScala ++
+      Seq((Executor.TASK_MDC_KEY, taskName)))
+      .filter(_._1.startsWith(Executor.MDC_KEY)).map { item =>
         val key = item._1.substring(4)
+        if (key == Executor.TASK_MDC_KEY && item._2 != taskName) {
+          logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}")

Review comment:
I think we should let the user override it and write to the log that it was overridden. (The Windows way is to not let you do things; the Linux way: with great power comes great responsibility.)
[GitHub] [spark] cloud-fan commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
cloud-fan commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642446974 retest this please
[GitHub] [spark] cloud-fan commented on a change in pull request #28756: [SPARK-8981][CORE][FOLLOW-UP] Clean up MDC properties after running a task
cloud-fan commented on a change in pull request #28756:
URL: https://github.com/apache/spark/pull/28756#discussion_r438575268

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
## @@ -322,11 +322,15 @@ private[spark] class Executor(
     val taskId = taskDescription.taskId
     val threadName = s"Executor task launch worker for task $taskId"
     val taskName = taskDescription.name
-    val mdcProperties = taskDescription.properties.asScala
-      .filter(_._1.startsWith("mdc.")).map { item =>
+    val mdcProperties = (taskDescription.properties.asScala ++
+      Seq((Executor.TASK_MDC_KEY, taskName)))
+      .filter(_._1.startsWith(Executor.MDC_KEY)).map { item =>
         val key = item._1.substring(4)
+        if (key == Executor.TASK_MDC_KEY && item._2 != taskName) {
+          logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}")

Review comment:
Then our documentation is wrong. We must make sure `taskName` always represents the value as we documented.
[GitHub] [spark] MaxGekk commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on a change in pull request #28787:
URL: https://github.com/apache/spark/pull/28787#discussion_r438556136

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
## @@ -326,20 +326,34 @@ object RebaseDateTime {
    */
   private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = {
     val instant = microsToInstant(micros)
-    var ldt = instant.atZone(zoneId).toLocalDateTime
+    val zonedDateTime = instant.atZone(zoneId)
+    var ldt = zonedDateTime.toLocalDateTime
     if (ldt.isAfter(julianEndTs) && ldt.isBefore(gregorianStartTs)) {
       ldt = LocalDateTime.of(gregorianStartDate, ldt.toLocalTime)
     }
     val cal = new Calendar.Builder()
-      // `gregory` is a hybrid calendar that supports both
-      // the Julian and Gregorian calendar systems
+      // `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems
       .setCalendarType("gregory")
       .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth)
       .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond)
-      // Local time-line can overlaps, such as at an autumn daylight savings cutover.
-      // This setting selects the original local timestamp mapped to the given `micros`.
-      .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt)
       .build()
+    // A local timestamp can have 2 instants in the cases of switching from:
+    // 1. Summer to winter time.
+    // 2. One standard time zone to another one. For example, Asia/Hong_Kong switched from JST
+    //    to HKT on 18 November, 1945 01:59:59 AM.
+    // Below we check that the original `instant` is earlier or later instant. If it is an earlier
+    // instant, we take the standard and DST offsets of the previous day otherwise of the next one.
+    val trans = zoneId.getRules.getTransition(ldt)
+    if (trans != null && trans.isOverlap) {

Review comment:
Only when an overlap happens, which shouldn't be very often. Usually it happens once per year. Please see my comment in the code above.
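The overlap this fix has to resolve can be reproduced with nothing but the JDK's `java.time` API. The following is a minimal, self-contained sketch (not Spark code, and the object name is made up for illustration); it assumes only the standard library and a JDK whose tzdata includes the 1945 Hong Kong transition:

```scala
import java.time.{Duration, LocalDateTime, ZoneId}

object HongKongOverlapSketch extends App {
  val hkZid = ZoneId.of("Asia/Hong_Kong")
  // 01:30 local time on 1945-11-18 falls inside the JST -> HKT overlap.
  val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)

  val trans = hkZid.getRules.getTransition(ldt)
  assert(trans != null && trans.isOverlap) // a spring-forward change would report isGap instead

  // The same local timestamp maps to two instants, one hour apart.
  val earlier = ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant // interpreted as UTC+9 (JST)
  val later = ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant     // interpreted as UTC+8 (HKT)
  assert(Duration.between(earlier, later).toHours == 1)
}
```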
[GitHub] [spark] cloud-fan closed pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
cloud-fan closed pull request #28785: URL: https://github.com/apache/spark/pull/28785
[GitHub] [spark] cloud-fan commented on pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
cloud-fan commented on pull request #28785: URL: https://github.com/apache/spark/pull/28785#issuecomment-642443401 thanks for the review, merging to master/3.0!
[GitHub] [spark] AmplabJenkins commented on pull request #26901: [SPARK-29152][CORE][2.4] Executor Plugin shutdown when dynamic allocation is enabled
AmplabJenkins commented on pull request #26901: URL: https://github.com/apache/spark/pull/26901#issuecomment-642443415
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26901: [SPARK-29152][CORE][2.4] Executor Plugin shutdown when dynamic allocation is enabled
AmplabJenkins removed a comment on pull request #26901: URL: https://github.com/apache/spark/pull/26901#issuecomment-642443415
[GitHub] [spark] SparkQA commented on pull request #26901: [SPARK-29152][CORE][2.4] Executor Plugin shutdown when dynamic allocation is enabled
SparkQA commented on pull request #26901: URL: https://github.com/apache/spark/pull/26901#issuecomment-642442833 **[Test build #123827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123827/testReport)** for PR 26901 at commit [`123f429`](https://github.com/apache/spark/commit/123f4297ff1858506e9611d556491b47c57e5419).
[GitHub] [spark] viirya commented on a change in pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
viirya commented on a change in pull request #28785:
URL: https://github.com/apache/spark/pull/28785#discussion_r438571966

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {

Review comment:
I see. Makes sense.
[GitHub] [spark] igreenfield commented on pull request #28629: [SPARK-31769] Add MDC support for driver threads
igreenfield commented on pull request #28629: URL: https://github.com/apache/spark/pull/28629#issuecomment-642441141 Ping @cloud-fan, what do you think?
[GitHub] [spark] igreenfield commented on a change in pull request #28756: [SPARK-8981][CORE][FOLLOW-UP] Clean up MDC properties after running a task
igreenfield commented on a change in pull request #28756:
URL: https://github.com/apache/spark/pull/28756#discussion_r438570609

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
## @@ -322,11 +322,15 @@ private[spark] class Executor(
     val taskId = taskDescription.taskId
     val threadName = s"Executor task launch worker for task $taskId"
     val taskName = taskDescription.name
-    val mdcProperties = taskDescription.properties.asScala
-      .filter(_._1.startsWith("mdc.")).map { item =>
+    val mdcProperties = (taskDescription.properties.asScala ++
+      Seq((Executor.TASK_MDC_KEY, taskName)))
+      .filter(_._1.startsWith(Executor.MDC_KEY)).map { item =>
         val key = item._1.substring(4)
+        if (key == Executor.TASK_MDC_KEY && item._2 != taskName) {
+          logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}")

Review comment:
Why don't we let users override the task name in the MDC?

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
## @@ -969,4 +992,7 @@ private[spark] object Executor {
   // task is fully deserialized. When possible, the TaskContext.getLocalProperty call should be
   // used instead.
   val taskDeserializationProps: ThreadLocal[Properties] = new ThreadLocal[Properties]
+
+  val MDC_KEY = "mdc."
+  val TASK_MDC_KEY = s"${MDC_KEY}taskName"

Review comment:
If you change this key, you also need to update the docs.
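For readers following this thread, here is a rough, self-contained sketch of the behaviour under discussion, simplified from the diff above. It assumes only `slf4j` on the classpath; the helper names `setupTaskMDC` and `cleanupTaskMDC` are made up for illustration and are not Spark's actual method names:

```scala
import scala.collection.JavaConverters._
import org.slf4j.MDC

object TaskMdcSketch {
  // Constant names mirror the diff above.
  val MDC_KEY = "mdc."
  val TASK_MDC_KEY = s"${MDC_KEY}taskName"

  /** Copy "mdc."-prefixed task properties into the logging MDC. Spark's own task name is
   *  appended last, so it wins over any user-supplied "mdc.taskName". Returns the pairs set. */
  def setupTaskMDC(props: java.util.Properties, taskName: String): Seq[(String, String)] = {
    val mdcProperties = (props.asScala.toSeq :+ (TASK_MDC_KEY -> taskName))
      .filter { case (key, _) => key.startsWith(MDC_KEY) }
      .map { case (key, value) => key.substring(MDC_KEY.length) -> value }
    mdcProperties.foreach { case (key, value) => MDC.put(key, value) }
    mdcProperties
  }

  /** The point of this follow-up PR: drop the keys again once the task finishes, so the worker
   *  thread does not leak MDC values into the next task it runs. */
  def cleanupTaskMDC(mdcProperties: Seq[(String, String)]): Unit =
    mdcProperties.foreach { case (key, _) => MDC.remove(key) }
}
```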
[GitHub] [spark] karuppayya commented on a change in pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
karuppayya commented on a change in pull request #28715:
URL: https://github.com/apache/spark/pull/28715#discussion_r438569824

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
## @@ -226,18 +226,33 @@ case class GenerateExec(
       "0"
     }
     val numOutput = metricTerm(ctx, "numOutputRows")
+    val requiredInput =
+      evaluateRequiredVariablesExpr(child.output,
+        input, parent.inputSet) ++ position ++ values
     s"""
        |${data.code}
        |$initMapData
        |int $numElements = ${data.isNull} ? 0 : ${data.value}.numElements();
        |for (int $index = $init; $index < $numElements; $index++) {
        |  $numOutput.add(1);
        |  $updateRowData
-       |  ${consume(ctx, input ++ position ++ values)}
+       |  ${consume(ctx, requiredInput)}
        |}
      """.stripMargin
   }

+  /**
+   * Returns [ExprCode] for required attributes
+   */
+  private def evaluateRequiredVariablesExpr(

Review comment:
@cloud-fan Should this method be introduced in WholeStageCodeGenExec.scala? Please advise.
[GitHub] [spark] cloud-fan commented on a change in pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
cloud-fan commented on a change in pull request #28785:
URL: https://github.com/apache/spark/pull/28785#discussion_r438569515

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {

Review comment:
It's still true: the correlated subquery becomes a join, and may have new join keys.
[GitHub] [spark] karuppayya commented on a change in pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
karuppayya commented on a change in pull request #28715:
URL: https://github.com/apache/spark/pull/28715#discussion_r438569824

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
## @@ -226,18 +226,33 @@ case class GenerateExec(
       "0"
     }
     val numOutput = metricTerm(ctx, "numOutputRows")
+    val requiredInput =
+      evaluateRequiredVariablesExpr(child.output,
+        input, parent.inputSet) ++ position ++ values
    s"""
       |${data.code}
       |$initMapData
       |int $numElements = ${data.isNull} ? 0 : ${data.value}.numElements();
       |for (int $index = $init; $index < $numElements; $index++) {
       |  $numOutput.add(1);
       |  $updateRowData
-      |  ${consume(ctx, input ++ position ++ values)}
+      |  ${consume(ctx, requiredInput)}
       |}
     """.stripMargin
   }

+  /**
+   * Returns [ExprCode] for required attributes
+   */
+  private def evaluateRequiredVariablesExpr(

Review comment:
@cloud-fan Should this method be introduced in WholeStageCodeGenExec.scala?
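To make the intent of `requiredInput` above concrete, here is a toy, standalone illustration of the pruning idea, deliberately written with plain Scala collections rather than Catalyst's `Attribute`/`ExprCode` types (all names here are made up): of the variables generated for the child's output, only those the parent operator actually references need to be passed to `consume`.

```scala
object RequiredVariablesSketch extends App {
  // Stand-in for a generated variable: the name used in the Java code plus the code that computes it.
  final case class GeneratedVar(javaName: String, code: String)

  // Keep only the generated variables whose output attribute is referenced by the parent operator
  // (a simplified analogue of filtering child.output against parent.inputSet).
  def requiredVariables(
      childOutput: Seq[String],
      input: Seq[GeneratedVar],
      parentInputSet: Set[String]): Seq[GeneratedVar] =
    childOutput.zip(input).collect { case (attr, v) if parentInputSet.contains(attr) => v }

  val childOutput = Seq("id", "payload")
  val input = Seq(
    GeneratedVar("value_0", "long value_0 = ..."),
    GeneratedVar("value_1", "UTF8String value_1 = ..."))

  // If the parent only reads "id", the variable for "payload" never needs to be consumed.
  assert(requiredVariables(childOutput, input, Set("id")).map(_.javaName) == Seq("value_0"))
}
```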
[GitHub] [spark] karuppayya commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
karuppayya commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642439230 The test failure doesn't seem related to this change.
[GitHub] [spark] cloud-fan commented on a change in pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
cloud-fan commented on a change in pull request #28785:
URL: https://github.com/apache/spark/pull/28785#discussion_r438569515

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {

Review comment:
Ah yeah, actually it's better to execute it before the `RewriteSubquery` batch.
[GitHub] [spark] cloud-fan commented on a change in pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
cloud-fan commented on a change in pull request #28785:
URL: https://github.com/apache/spark/pull/28785#discussion_r438568381

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan match {
-    // A subquery will be rewritten into join later, and will go through this rule
-    // eventually. Here we skip subquery, as we only need to run this rule once.
-    case _: Subquery => plan

Review comment:
No, we can't. This fix relies on the rule `OptimizeSubqueries`, which is an inner object of the `Optimizer` class because it needs to rerun the entire optimizer for the subquery. So we can't use `OptimizeSubqueries` in `NormalizeFloatingPointNumbersSuite`.
[GitHub] [spark] viirya commented on a change in pull request #28785: [SPARK-31958][SQL] normalize special floating numbers in subquery
viirya commented on a change in pull request #28785:
URL: https://github.com/apache/spark/pull/28785#discussion_r438568623

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
## @@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {

Review comment:
Does it also mean that "This batch must be executed after the `RewriteSubquery` batch, which creates joins." is no longer necessarily true?
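As background for why this rule exists at all (independent of where it runs relative to subquery rewriting): `0.0`/`-0.0` and the various NaN bit patterns compare equal at the SQL level but have different binary representations, so join and group-by keys have to be normalized before they are hashed. A quick JVM-level check, assuming nothing beyond the standard library:

```scala
object FloatNormalizationSketch extends App {
  val posZero = java.lang.Double.doubleToRawLongBits(0.0)  // 0x0000000000000000L
  val negZero = java.lang.Double.doubleToRawLongBits(-0.0) // 0x8000000000000000L
  assert(0.0 == -0.0 && posZero != negZero)            // equal values, different bits => different hashes
  assert(java.lang.Double.NaN != java.lang.Double.NaN) // NaN is not even equal to itself
}
```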
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
AmplabJenkins removed a comment on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642436049 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123812/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
AmplabJenkins removed a comment on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642436045 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642436024 **[Test build #123826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123826/testReport)** for PR 28593 at commit [`b4f4d53`](https://github.com/apache/spark/commit/b4f4d537d95ded13e80e8171036e0d36410e9732).
[GitHub] [spark] AmplabJenkins commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
AmplabJenkins commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642436045
[GitHub] [spark] SparkQA removed a comment on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
SparkQA removed a comment on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642382760 **[Test build #123812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123812/testReport)** for PR 28715 at commit [`bdbaa6b`](https://github.com/apache/spark/commit/bdbaa6b2becc311bde4a342668e4501325b6aa28).
[GitHub] [spark] SparkQA commented on pull request #28715: [SPARK-31897][SQL]Enable codegen for GenerateExec
SparkQA commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-642435548 **[Test build #123812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123812/testReport)** for PR 28715 at commit [`bdbaa6b`](https://github.com/apache/spark/commit/bdbaa6b2becc311bde4a342668e4501325b6aa28).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642433149
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642433149
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
AmplabJenkins removed a comment on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-642430815 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123809/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28750: [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
AmplabJenkins removed a comment on pull request #28750: URL: https://github.com/apache/spark/pull/28750#issuecomment-642430970
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
AmplabJenkins removed a comment on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-642430811 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28750: [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
AmplabJenkins commented on pull request #28750: URL: https://github.com/apache/spark/pull/28750#issuecomment-642430970
[GitHub] [spark] SparkQA removed a comment on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
SparkQA removed a comment on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-642378743 **[Test build #123809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123809/testReport)** for PR 28641 at commit [`8d16061`](https://github.com/apache/spark/commit/8d1606131162c1be0b1363120186a90b9f8c4914).
[GitHub] [spark] AmplabJenkins commented on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
AmplabJenkins commented on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-642430811
[GitHub] [spark] SparkQA commented on pull request #28641: [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
SparkQA commented on pull request #28641: URL: https://github.com/apache/spark/pull/28641#issuecomment-642430256 **[Test build #123809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123809/testReport)** for PR 28641 at commit [`8d16061`](https://github.com/apache/spark/commit/8d1606131162c1be0b1363120186a90b9f8c4914).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r438561649

## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
## @@ -993,6 +993,30 @@ object functions {
     Lead(e.expr, Literal(offset), Literal(defaultValue))
   }
 
+  /**
+   * Window function: returns the value that is the `offset`th row of the window frame
+   * (counting from 1), and `null` if the size of window frame is less than `offset` rows.
+   *
+   * This is equivalent to the nth_value function in SQL.
+   *
+   * @group window_funcs
+   * @since 3.0.0

Review comment:
This should be changed here too. You could wait for some feedback from @hvanhovell before making more changes.
[GitHub] [spark] SparkQA removed a comment on pull request #28750: [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
SparkQA removed a comment on pull request #28750: URL: https://github.com/apache/spark/pull/28750#issuecomment-642325127 **[Test build #123797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123797/testReport)** for PR 28750 at commit [`c7e780b`](https://github.com/apache/spark/commit/c7e780b5081c811c9410df6069ca6016ff7f0b90).
[GitHub] [spark] SparkQA commented on pull request #28750: [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
SparkQA commented on pull request #28750: URL: https://github.com/apache/spark/pull/28750#issuecomment-642428805 **[Test build #123797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123797/testReport)** for PR 28750 at commit [`c7e780b`](https://github.com/apache/spark/commit/c7e780b5081c811c9410df6069ca6016ff7f0b90).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] viirya commented on pull request #28779: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the generated parser class
viirya commented on pull request #28779: URL: https://github.com/apache/spark/pull/28779#issuecomment-642428072 Text-based solution sounds good.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins removed a comment on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-642427640
[GitHub] [spark] AmplabJenkins commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-642427640
[GitHub] [spark] SparkQA commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
SparkQA commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-642427079 **[Test build #123825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123825/testReport)** for PR 28685 at commit [`4d08364`](https://github.com/apache/spark/commit/4d083643f600ac0b14db7d312461f84c2c5029de).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins removed a comment on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642426075 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123796/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins removed a comment on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642426069 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425772 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123795/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
AmplabJenkins commented on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642426069
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425765 Merged build finished. Test FAILed.
[GitHub] [spark] gengliangwang commented on pull request #28733: [SPARK-31705][SQL] Push more possible predicates through Join via CNF conversion
gengliangwang commented on pull request #28733: URL: https://github.com/apache/spark/pull/28733#issuecomment-642425715 @wangyum @maropu @viirya @dilipbiswal @AngersZh @cloud-fan Thanks for the review. I think this PR is ready to be merged once the tests pass. Let me know if you still have more comments.
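For context on what the PR's CNF conversion buys: a disjunction of conjunctions such as `(a.id = 1 AND b.v > 0) OR (a.id = 2 AND b.v < 0)` cannot be pushed through a join as-is, but its CNF form contains the conjunct `(a.id = 1 OR a.id = 2)`, which references only one side and can be pushed below the join. The sketch below is a toy, standalone illustration of that expansion under made-up names, not the optimizer rule added in this PR:

```scala
object CnfSketch extends App {
  sealed trait Pred
  final case class Leaf(sql: String, side: String) extends Pred // e.g. "a.id = 1" referencing table "a"
  final case class And(l: Pred, r: Pred) extends Pred
  final case class Or(l: Pred, r: Pred) extends Pred

  // Conjuncts of the CNF form: (A AND B) OR (C AND D) => (A OR C) AND (A OR D) AND (B OR C) AND (B OR D).
  def toCnf(p: Pred): Seq[Pred] = p match {
    case And(l, r) => toCnf(l) ++ toCnf(r)
    case Or(l, r)  => for (lc <- toCnf(l); rc <- toCnf(r)) yield Or(lc, rc)
    case leaf      => Seq(leaf)
  }

  // Which tables a predicate references.
  def sides(p: Pred): Set[String] = p match {
    case Leaf(_, s) => Set(s)
    case And(l, r)  => sides(l) ++ sides(r)
    case Or(l, r)   => sides(l) ++ sides(r)
  }

  val joinCondition = Or(
    And(Leaf("a.id = 1", "a"), Leaf("b.v > 0", "b")),
    And(Leaf("a.id = 2", "a"), Leaf("b.v < 0", "b")))

  // Conjuncts touching only side "a" are candidates for pushdown below the join.
  val pushable = toCnf(joinCondition).filter(sides(_) == Set("a"))
  assert(pushable == Seq(Or(Leaf("a.id = 1", "a"), Leaf("a.id = 2", "a"))))
}
```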
[GitHub] [spark] AmplabJenkins commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425765
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425089 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123824/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
SparkQA commented on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642425419 **[Test build #123796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123796/testReport)** for PR 28791 at commit [`dd4ab2e`](https://github.com/apache/spark/commit/dd4ab2e75db2e3a3ea136d907e71aa24f3f0bab9).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28791: [SPARK-31935][SQL][TESTS][FOLLOWUP] Fix the test case for Hadoop2/3
SparkQA removed a comment on pull request #28791: URL: https://github.com/apache/spark/pull/28791#issuecomment-642325096 **[Test build #123796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123796/testReport)** for PR 28791 at commit [`dd4ab2e`](https://github.com/apache/spark/commit/dd4ab2e75db2e3a3ea136d907e71aa24f3f0bab9).
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642424761 **[Test build #123824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123824/testReport)** for PR 28593 at commit [`c8d5aa5`](https://github.com/apache/spark/commit/c8d5aa5cf1c0c5eaf85ad6e01b008f025e468d55).
[GitHub] [spark] SparkQA removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642323162 **[Test build #123795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123795/testReport)** for PR 28764 at commit [`cace933`](https://github.com/apache/spark/commit/cace933e23bf3af44a12e64150687bcfee350c01).
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425084
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425084 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642425022 **[Test build #123795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123795/testReport)** for PR 28764 at commit [`cace933`](https://github.com/apache/spark/commit/cace933e23bf3af44a12e64150687bcfee350c01). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642425069 **[Test build #123824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123824/testReport)** for PR 28593 at commit [`c8d5aa5`](https://github.com/apache/spark/commit/c8d5aa5cf1c0c5eaf85ad6e01b008f025e468d55). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642424761 **[Test build #123824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123824/testReport)** for PR 28593 at commit [`c8d5aa5`](https://github.com/apache/spark/commit/c8d5aa5cf1c0c5eaf85ad6e01b008f025e468d55). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r438556665 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is + null, null is returned. If there is no such an offset row (e.g., when the offset is 10, + size of the window frame less than 10), null is returned. + """, + since = "3.0.0") Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
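To make the usage string above concrete, here is a small SQL-level sketch of the proposed function (assuming it lands as described in this PR; the view `t` and the data are purely illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: nth_value is the window function proposed in this PR,
// not one available in released Spark at the time of this review.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(1, 6).createOrReplaceTempView("t")

// With the default frame (UNBOUNDED PRECEDING .. CURRENT ROW), the first row's
// frame has fewer than 2 rows, so it gets NULL; every later row returns the
// value of the 2nd row of its frame.
spark.sql(
  """SELECT id, nth_value(id, 2) OVER (ORDER BY id) AS second_in_frame
    |FROM t
    |""".stripMargin).show()
```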
[GitHub] [spark] MaxGekk commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438556136 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala ## @@ -326,20 +326,34 @@ object RebaseDateTime { */ private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = { val instant = microsToInstant(micros) -var ldt = instant.atZone(zoneId).toLocalDateTime +val zonedDateTime = instant.atZone(zoneId) +var ldt = zonedDateTime.toLocalDateTime if (ldt.isAfter(julianEndTs) && ldt.isBefore(gregorianStartTs)) { ldt = LocalDateTime.of(gregorianStartDate, ldt.toLocalTime) } val cal = new Calendar.Builder() - // `gregory` is a hybrid calendar that supports both - // the Julian and Gregorian calendar systems + // `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems .setCalendarType("gregory") .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth) .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond) - // Local time-line can overlaps, such as at an autumn daylight savings cutover. - // This setting selects the original local timestamp mapped to the given `micros`. - .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt) .build() +// A local timestamp can have 2 instants in the cases of switching from: +// 1. Summer to winter time. +// 2. One standard time zone to another one. For example, Asia/Hong_Kong switched from JST +// to HKT on 18 November, 1945 01:59:59 AM. +// Below we check that the original `instant` is earlier or later instant. If it is an earlier +// instant, we take the standard and DST offsets of the previous day otherwise of the next one. +val trans = zoneId.getRules.getTransition(ldt) +if (trans != null && trans.isOverlap) { Review comment: I wrote the comment above This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
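The branch under discussion relies on java.time reporting an overlap for both DST cutovers and standard-offset switches. As a minimal standalone sketch (the helper name `describeOverlap` is assumed; this is not Spark's implementation), the same `ZoneRules.getTransition`/`isOverlap` calls can be used to tell whether a given instant is the earlier or the later occurrence of an overlapped local time:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// Minimal sketch of the overlap check described in the diff above: when clocks
// are set back, one LocalDateTime maps to two instants, and
// ZoneRules.getTransition/isOverlap tell us which of the two we hold.
def describeOverlap(instant: Instant, zoneId: ZoneId): String = {
  val ldt = LocalDateTime.ofInstant(instant, zoneId)
  val trans = zoneId.getRules.getTransition(ldt)
  if (trans != null && trans.isOverlap) {
    // The offset valid before the transition reproduces the earlier instant.
    if (ldt.toInstant(trans.getOffsetBefore) == instant) "earlier instant of the overlap"
    else "later instant of the overlap"
  } else {
    "not in an overlap"
  }
}

// The 1945 JST -> HKT switch at Asia/Hong_Kong discussed in this PR.
val hk = ZoneId.of("Asia/Hong_Kong")
val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
println(describeOverlap(ldt.atZone(hk).withEarlierOffsetAtOverlap().toInstant, hk))
println(describeOverlap(ldt.atZone(hk).withLaterOffsetAtOverlap().toInstant, hk))
```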
[GitHub] [spark] MaxGekk commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
MaxGekk commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438556301 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala ## @@ -409,4 +409,31 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper { } } } + + test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") { +// The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9) +// to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM, +// clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM. +// In this way, the overlap happened w/o Daylight Saving Time. +val hkZid = getZoneId("Asia/Hong_Kong") +withDefaultTimeZone(hkZid) { + val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0) + val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant) + val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant) + assert(earlierMicros + MICROS_PER_HOUR === laterMicros) + val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros) + val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros) + def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString + val expected = "1945-11-18 01:30:00.0" + assert(toTsStr(rebasedEarlierMicros) === expected) + assert(toTsStr(rebasedLaterMicros) === expected) + assert(rebasedEarlierMicros + MICROS_PER_HOUR === rebasedLaterMicros) + // Check optimized rebasing Review comment: via pre-calculated offsets and switch points. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
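For readers wondering what rebasing "via pre-calculated offsets and switch points" can look like, here is a rough, hypothetical sketch (the `RebaseTable` name and its fields are made up, not Spark's internal data structures): per time zone, the switch points and the micros difference to apply after each switch are precomputed once, so a rebase becomes a binary search plus an addition instead of full calendar arithmetic:

```scala
import java.util.Arrays

// Hypothetical lookup table: `switches` are the sorted micros at which the
// Gregorian/Julian difference changes, and `diffs(i)` is the difference to
// apply for micros at or after switches(i). Assumes a non-empty table.
final case class RebaseTable(switches: Array[Long], diffs: Array[Long])

def rebaseWithTable(table: RebaseTable, micros: Long): Long = {
  var i = Arrays.binarySearch(table.switches, micros)
  if (i < 0) i = -i - 2   // not found: index of the last switch <= micros
  if (i < 0) i = 0        // before the first switch: fall back to the first diff
  micros + table.diffs(i)
}
```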
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642422771 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642422649 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642422771 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642422649 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642422303 **[Test build #123823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123823/testReport)** for PR 28764 at commit [`9e8eab2`](https://github.com/apache/spark/commit/9e8eab25eeffba0260d1a98a199f832251809c1d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
maropu commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642421193 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
zhengruifeng commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-642420940 @mengxr OK, I will be more patient with reviewing. Actually, I did not ping Owen in some of those PRs; I will involve more ML committers/contributors in future PRs and tickets. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28751: [SPARK-31926][SQL][test-hive1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
cloud-fan commented on a change in pull request #28751: URL: https://github.com/apache/spark/pull/28751#discussion_r438553842 ## File path: project/SparkBuild.scala ## @@ -480,7 +480,8 @@ object SparkParallelTestGrouping { "org.apache.spark.sql.hive.thriftserver.SparkSQLEnvSuite", "org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite", "org.apache.spark.sql.hive.thriftserver.ui.HiveThriftServer2ListenerSuite", -"org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite", + "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite", + "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInBinarySuite", "org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite" Review comment: Can we just run these 2 test suites one by one? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
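For context, the list above feeds sbt's test grouping: each suite named there gets its own forked JVM, while everything else shares a group. A rough sketch of that mechanism (helper names are assumed; this is not the actual SparkBuild code):

```scala
import sbt._

// Suites that must not share a JVM with other suites (e.g. because they bind
// fixed ports or mutate global state) each get a dedicated forked test group.
val dedicatedJvmSuites = Set(
  "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite",
  "org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInBinarySuite"
)

def groupTests(tests: Seq[TestDefinition]): Seq[Tests.Group] = {
  val (isolated, shared) = tests.partition(t => dedicatedJvmSuites.contains(t.name))
  isolated.map { t =>
    Tests.Group(t.name, Seq(t), Tests.SubProcess(ForkOptions()))
  } :+ Tests.Group("shared", shared, Tests.SubProcess(ForkOptions()))
}
```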
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642420191 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123808/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642420188 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA removed a comment on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642374887 **[Test build #123808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123808/testReport)** for PR 28764 at commit [`9e8eab2`](https://github.com/apache/spark/commit/9e8eab25eeffba0260d1a98a199f832251809c1d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
AmplabJenkins commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642420188 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28764: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET
SparkQA commented on pull request #28764: URL: https://github.com/apache/spark/pull/28764#issuecomment-642419817 **[Test build #123808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123808/testReport)** for PR 28764 at commit [`9e8eab2`](https://github.com/apache/spark/commit/9e8eab25eeffba0260d1a98a199f832251809c1d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
cloud-fan commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438550078 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala ## @@ -326,20 +326,34 @@ object RebaseDateTime { */ private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = { val instant = microsToInstant(micros) -var ldt = instant.atZone(zoneId).toLocalDateTime +val zonedDateTime = instant.atZone(zoneId) +var ldt = zonedDateTime.toLocalDateTime if (ldt.isAfter(julianEndTs) && ldt.isBefore(gregorianStartTs)) { ldt = LocalDateTime.of(gregorianStartDate, ldt.toLocalTime) } val cal = new Calendar.Builder() - // `gregory` is a hybrid calendar that supports both - // the Julian and Gregorian calendar systems + // `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems .setCalendarType("gregory") .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth) .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond) - // Local time-line can overlaps, such as at an autumn daylight savings cutover. - // This setting selects the original local timestamp mapped to the given `micros`. - .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt) .build() +// A local timestamp can have 2 instants in the cases of switching from: +// 1. Summer to winter time. +// 2. One standard time zone to another one. For example, Asia/Hong_Kong switched from JST +// to HKT on 18 November, 1945 01:59:59 AM. +// Below we check that the original `instant` is earlier or later instant. If it is an earlier +// instant, we take the standard and DST offsets of the previous day otherwise of the next one. +val trans = zoneId.getRules.getTransition(ldt) +if (trans != null && trans.isOverlap) { Review comment: when will we go into this expensive branch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28787: [SPARK-31959][SQL] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
cloud-fan commented on a change in pull request #28787: URL: https://github.com/apache/spark/pull/28787#discussion_r438549793 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala ## @@ -409,4 +409,31 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper { } } } + + test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") { +// The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9) +// to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM, +// clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM. +// In this way, the overlap happened w/o Daylight Saving Time. +val hkZid = getZoneId("Asia/Hong_Kong") +withDefaultTimeZone(hkZid) { + val ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0) + val earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant) + val laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant) + assert(earlierMicros + MICROS_PER_HOUR === laterMicros) + val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros) + val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros) + def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString + val expected = "1945-11-18 01:30:00.0" + assert(toTsStr(rebasedEarlierMicros) === expected) + assert(toTsStr(rebasedLaterMicros) === expected) + assert(rebasedEarlierMicros + MICROS_PER_HOUR === rebasedLaterMicros) + // Check optimized rebasing Review comment: what do you mean by "optimized rebase"? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413943 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123810/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413824 **[Test build #123810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123810/testReport)** for PR 28593 at commit [`a6a9bd4`](https://github.com/apache/spark/commit/a6a9bd431fa401be36173a2866f6d56138472f2d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413936 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642413936 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Fail casting numeric to timestamp by default
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-642378726 **[Test build #123810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123810/testReport)** for PR 28593 at commit [`a6a9bd4`](https://github.com/apache/spark/commit/a6a9bd431fa401be36173a2866f6d56138472f2d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
AmplabJenkins removed a comment on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642413591 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
AmplabJenkins commented on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642413591 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28788: [SPARK-31960][Yarn][Build] Only populate Hadoop classpath for no-hadoop build
SparkQA commented on pull request #28788: URL: https://github.com/apache/spark/pull/28788#issuecomment-642413193 **[Test build #123822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123822/testReport)** for PR 28788 at commit [`7c9e1ad`](https://github.com/apache/spark/commit/7c9e1ad2bb12246d689e46794bd7851d8188a545). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r438545420 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,52 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the`offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is + null, null is returned. If there is no such an offset row (e.g., when the offset is 10, + size of the window frame less than 10), null is returned. + """, + since = "3.0.0") Review comment: Let's change it to 3.1.0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-642410273 cc @hvanhovell FYI This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
HyukjinKwon commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r437941258 ## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ## @@ -993,6 +993,30 @@ object functions { Lead(e.expr, Literal(offset), Literal(defaultValue)) } + /** + * Window function: returns the value that is the `offset`th row of the window frame + * (counting from 1), and `null` if the size of window frame is less than `offset` rows. + * + * This is equivalent to the nth_value function in SQL. + * + * @group window_funcs + * @since 3.0.0 + */ + def nth_value(columnName: String, offset: Int): Column = { Review comment: @beliefer, how is it different from `lag`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
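A quick DataFrame-level comparison may help here (assuming the `nth_value(columnName: String, offset: Int)` signature shown in the diff; the data and column names are made up): `lag` is relative to the current row, while `nth_value` always points at a fixed position of the window frame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, nth_value}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 10)).toDF("grp", "v")
val w = Window.partitionBy("grp").orderBy("v")

df.select(
  col("grp"), col("v"),
  // lag: the value 1 row before the current row (NULL for the first row).
  lag("v", 1).over(w).as("lag_1"),
  // nth_value: the value of the 2nd row of the frame, whatever the current row is.
  nth_value("v", 2).over(w).as("nth_2")
).show()
```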
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
AmplabJenkins removed a comment on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642408668 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
AmplabJenkins commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642408668 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28776: [SPARK-31935][SQL][3.0][test-hadoop3.2] Hadoop file system config should be effective in data source options
SparkQA commented on pull request #28776: URL: https://github.com/apache/spark/pull/28776#issuecomment-642408240 **[Test build #123821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123821/testReport)** for PR 28776 at commit [`da8d48d`](https://github.com/apache/spark/commit/da8d48d7984dd523f44c564ead9f7d5fb9cdd4ef). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
HyukjinKwon commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642406590 Thank you @dongjoon-hyun! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
dongjoon-hyun closed pull request #28798: URL: https://github.com/apache/spark/pull/28798 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
dongjoon-hyun commented on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642406211 Merged to master/3.0. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28798: [SPARK-31966][ML][TESTS][PYTHON] Increase the timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
AmplabJenkins removed a comment on pull request #28798: URL: https://github.com/apache/spark/pull/28798#issuecomment-642404308 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org