[spark] branch master updated (439e94c -> 82af318)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 439e94c [SPARK-35737][SQL] Parse day-time interval literals to tightest types
 add 82af318 [SPARK-35748][SS][SQL] Fix StreamingJoinHelper to be able to handle day-time interval

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/StreamingJoinHelper.scala |  3 +++
 .../sql/catalyst/analysis/StreamingJoinHelperSuite.scala  | 12
 2 files changed, 15 insertions(+)

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35771][SQL] Format year-month intervals using type fields
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 184f65e [SPARK-35771][SQL] Format year-month intervals using type fields 184f65e is described below commit 184f65e7c7359d4ff68ce5b558b186bca6e8efb1 Author: Kousuke Saruta AuthorDate: Wed Jun 16 11:08:02 2021 +0300 [SPARK-35771][SQL] Format year-month intervals using type fields ### What changes were proposed in this pull request? This PR proposes to format year-month interval to strings using the start and end fields of `YearMonthIntervalType`. ### Why are the changes needed? Currently, they are ignored, and any `YearMonthIntervalType` is formatted as `INTERVAL YEAR TO MONTH`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New test. Closes #32924 from sarutak/year-month-interval-format. Authored-by: Kousuke Saruta Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 21 - .../sql/catalyst/util/IntervalUtilsSuite.scala | 26 ++ 2 files changed, 42 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index aca3d15..e9be3de 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -29,7 +29,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToMicros import org.apache.spark.sql.catalyst.util.IntervalStringStyles.{ANSI_STYLE, HIVE_STYLE, IntervalStyle} import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.{DayTimeIntervalType, Decimal} +import org.apache.spark.sql.types.{DayTimeIntervalType, 
Decimal, YearMonthIntervalType} import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} // The style of textual representation of intervals @@ -945,7 +945,6 @@ object IntervalUtils { def toYearMonthIntervalString( months: Int, style: IntervalStyle, - // TODO(SPARK-35771): Format year-month intervals using type fields startField: Byte, endField: Byte): String = { var sign = "" @@ -954,10 +953,22 @@ object IntervalUtils { sign = "-" absMonths = -absMonths } -val payload = s"$sign${absMonths / MONTHS_PER_YEAR}-${absMonths % MONTHS_PER_YEAR}" +val year = s"$sign${absMonths / MONTHS_PER_YEAR}" +val month = s"${absMonths % MONTHS_PER_YEAR}" +val yearAndMonth = s"$year-$month" style match { - case ANSI_STYLE => s"INTERVAL '$payload' YEAR TO MONTH" - case HIVE_STYLE => payload + case ANSI_STYLE => +val formatBuilder = new StringBuilder("INTERVAL '") +if (startField == endField) { + startField match { +case YearMonthIntervalType.YEAR => formatBuilder.append(s"$year' YEAR") +case YearMonthIntervalType.MONTH => formatBuilder.append(s"$month' MONTH") + } +} else { + formatBuilder.append(s"$yearAndMonth' YEAR TO MONTH") +} +formatBuilder.toString + case HIVE_STYLE => s"$yearAndMonth" } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala index c2ece95..b6cf152 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala @@ -619,4 +619,30 @@ class IntervalUtilsSuite extends SparkFunSuite with SQLHelper { assert(toDayTimeIntervalString(micros, ANSI_STYLE, SECOND, SECOND) === sec) } } + + test("SPARK-35771: Format year-month intervals using type fields") { +import org.apache.spark.sql.types.YearMonthIntervalType._ +Seq( + 0 -> +("INTERVAL '0-0' YEAR TO MONTH", "INTERVAL '0' YEAR", "INTERVAL '0' 
MONTH"), + -11 -> ("INTERVAL '-0-11' YEAR TO MONTH", "INTERVAL '-0' YEAR", "INTERVAL '11' MONTH"), + 11 -> ("INTERVAL '0-11' YEAR TO MONTH", "INTERVAL '0' YEAR", "INTERVAL '11' MONTH"), + -13 -> ("INTERVAL '-1-1' YEAR TO MONTH", "INTERVAL '-1' YEAR", "INTERVAL '1' MONTH&
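The test expectations above follow directly from the new branch in `toYearMonthIntervalString`. As an illustration, here is a minimal Java sketch of that logic (ANSI style only; this is not Spark's actual Scala implementation). Note that the sign attaches only to the year part, which is why `-11` months renders as `INTERVAL '-0' YEAR` but `INTERVAL '11' MONTH`:

```java
public class YearMonthFormat {
    // Sketch of the committed toYearMonthIntervalString logic (ANSI style only).
    // Field codes follow YearMonthIntervalType: 0 = YEAR, 1 = MONTH.
    static String toYearMonthIntervalString(int months, int startField, int endField) {
        String sign = "";
        long absMonths = months;            // long so -Integer.MIN_VALUE cannot overflow
        if (absMonths < 0) {
            sign = "-";
            absMonths = -absMonths;
        }
        String year = sign + (absMonths / 12);          // sign attaches to the year part
        String month = String.valueOf(absMonths % 12);  // month part carries no sign
        if (startField == endField) {
            return startField == 0
                ? "INTERVAL '" + year + "' YEAR"
                : "INTERVAL '" + month + "' MONTH";
        }
        return "INTERVAL '" + year + "-" + month + "' YEAR TO MONTH";
    }
}
```

The -13 case reproduces the suite's expectation: one year and one month, both under the leading sign.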
[spark] branch master updated (aaa8a80 -> 4530760)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from aaa8a80 [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes
 add 4530760 [SPARK-35774][SQL] Parse any year-month interval types in SQL

No new revisions were added by this update.

Summary of changes:
 .../main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 2 +-
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 9 +++--
 .../test/scala/org/apache/spark/sql/types/DataTypeSuite.scala   | 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)
[spark] branch master updated (82af318 -> aab0c2b)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 82af318 [SPARK-35748][SS][SQL] Fix StreamingJoinHelper to be able to handle day-time interval
 add aab0c2b [SPARK-35736][SPARK-35737][SQL][FOLLOWUP] Move a common logic to DayTimeIntervalType

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 12
 .../org/apache/spark/sql/types/DayTimeIntervalType.scala  |  2 ++
 2 files changed, 6 insertions(+), 8 deletions(-)
[spark] branch master updated: [SPARK-35680][SQL] Add fields to `YearMonthIntervalType`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61ce8f7 [SPARK-35680][SQL] Add fields to `YearMonthIntervalType` 61ce8f7 is described below commit 61ce8f764982306f2c7a8b2b3dfe22963b49f2d5 Author: Max Gekk AuthorDate: Tue Jun 15 23:08:12 2021 +0300 [SPARK-35680][SQL] Add fields to `YearMonthIntervalType` ### What changes were proposed in this pull request? Extend `YearMonthIntervalType` to support interval fields. Valid interval field values: - 0 (YEAR) - 1 (MONTH) After the changes, the following year-month interval types are supported: 1. `YearMonthIntervalType(0, 0)` or `YearMonthIntervalType(YEAR, YEAR)` 2. `YearMonthIntervalType(0, 1)` or `YearMonthIntervalType(YEAR, MONTH)`. **It is the default one**. 3. `YearMonthIntervalType(1, 1)` or `YearMonthIntervalType(MONTH, MONTH)` Closes #32825 ### Why are the changes needed? In the current implementation, Spark supports only `interval year to month` but the SQL standard allows to specify the start and end fields. The changes will allow to follow ANSI SQL standard more precisely. ### Does this PR introduce _any_ user-facing change? Yes but `YearMonthIntervalType` has not been released yet. ### How was this patch tested? By existing test suites. Closes #32909 from MaxGekk/add-fields-to-YearMonthIntervalType. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/UnsafeRow.java | 11 ++--- .../java/org/apache/spark/sql/types/DataTypes.java | 19 +--- .../sql/catalyst/CatalystTypeConverters.scala | 3 +- .../apache/spark/sql/catalyst/InternalRow.scala| 4 +- .../spark/sql/catalyst/JavaTypeInference.scala | 2 +- .../spark/sql/catalyst/ScalaReflection.scala | 10 ++--- .../spark/sql/catalyst/SerializerBuildHelper.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 18 .../apache/spark/sql/catalyst/dsl/package.scala| 7 ++- .../spark/sql/catalyst/encoders/RowEncoder.scala | 6 +-- .../spark/sql/catalyst/expressions/Cast.scala | 35 +-- .../expressions/InterpretedUnsafeProjection.scala | 2 +- .../catalyst/expressions/SpecificInternalRow.scala | 2 +- .../catalyst/expressions/aggregate/Average.scala | 6 +-- .../sql/catalyst/expressions/aggregate/Sum.scala | 2 +- .../sql/catalyst/expressions/arithmetic.scala | 10 ++--- .../expressions/codegen/CodeGenerator.scala| 4 +- .../expressions/collectionOperations.scala | 8 ++-- .../catalyst/expressions/datetimeExpressions.scala | 2 +- .../spark/sql/catalyst/expressions/hash.scala | 2 +- .../catalyst/expressions/intervalExpressions.scala | 10 ++--- .../spark/sql/catalyst/expressions/literals.scala | 16 --- .../catalyst/expressions/windowExpressions.scala | 4 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++- .../spark/sql/catalyst/util/IntervalUtils.scala| 17 ++- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 11 + .../org/apache/spark/sql/types/DataType.scala | 4 +- .../spark/sql/types/YearMonthIntervalType.scala| 52 ++ .../org/apache/spark/sql/util/ArrowUtils.scala | 4 +- .../org/apache/spark/sql/RandomDataGenerator.scala | 2 +- .../spark/sql/RandomDataGeneratorSuite.scala | 4 +- .../sql/catalyst/CatalystTypeConvertersSuite.scala | 2 +- .../sql/catalyst/encoders/RowEncoderSuite.scala| 18 .../expressions/ArithmeticExpressionSuite.scala| 
10 ++--- .../spark/sql/catalyst/expressions/CastSuite.scala | 30 ++--- .../sql/catalyst/expressions/CastSuiteBase.scala | 8 ++-- .../expressions/DateExpressionsSuite.scala | 19 .../expressions/HashExpressionsSuite.scala | 2 +- .../expressions/IntervalExpressionsSuite.scala | 24 ++ .../expressions/LiteralExpressionSuite.scala | 6 +-- .../catalyst/expressions/LiteralGenerator.scala| 4 +- .../expressions/MutableProjectionSuite.scala | 14 +++--- .../optimizer/PushFoldableIntoBranchesSuite.scala | 16 +++ .../sql/catalyst/parser/DataTypeParserSuite.scala | 2 +- .../sql/catalyst/util/IntervalUtilsSuite.scala | 5 ++- .../org/apache/spark/sql/types/DataTypeSuite.scala | 6 +-- .../apache/spark/sql/types/DataTypeTestUtils.scala | 18 .../apache/spark/sql/util/ArrowUtilsSuite.scala| 2 +- .../apache/spark/sql/execution/HiveResult.scala| 4 +- .../sql/execution/aggregate/HashMapGenerator.scala | 3 +- .../spark/sql/execution/aggregate/udaf.scala | 4 +- .../spark/sql/execution/arrow/ArrowWriter.scala| 2 +- .../sql/execution/columnar
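The valid field combinations described above can be modeled compactly. The following is a simplified Java stand-in for the extended type, assuming the `interval year` / `interval year to month` type-name scheme; it is illustrative only and is not Spark's actual Scala class:

```java
public class YearMonthIntervalTypeSketch {
    // Simplified stand-in for Spark's YearMonthIntervalType after SPARK-35680:
    // valid field codes are 0 (YEAR) and 1 (MONTH), and the start field
    // must not come after the end field.
    static final byte YEAR = 0;
    static final byte MONTH = 1;
    final byte startField;
    final byte endField;

    YearMonthIntervalTypeSketch(byte startField, byte endField) {
        if (startField < YEAR || endField > MONTH || startField > endField) {
            throw new IllegalArgumentException(
                "invalid fields: " + startField + ", " + endField);
        }
        this.startField = startField;
        this.endField = endField;
    }

    static String fieldToString(byte field) {
        return field == YEAR ? "year" : "month";
    }

    // Derives a type name such as "interval year to month" (the default type,
    // YearMonthIntervalType(YEAR, MONTH)) or "interval month".
    String typeName() {
        if (startField == endField) {
            return "interval " + fieldToString(startField);
        }
        return "interval " + fieldToString(startField) + " to " + fieldToString(endField);
    }
}
```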
[spark] branch branch-3.1 updated: [SPARK-35679][SQL] instantToMicros overflow
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new ff83105 [SPARK-35679][SQL] instantToMicros overflow
ff83105 is described below

commit ff831054d9a3b0fbd58b532bf6c527276d7994c6
Author: dgd-contributor
AuthorDate: Thu Jun 10 08:08:51 2021 +0300

    [SPARK-35679][SQL] instantToMicros overflow

    ### Why are the changes needed?
    With `Long.MinValue` cast to an instant, `secs` is floored in `microsToInstant` and overflows when multiplied by `MICROS_PER_SECOND`:

    ```
    def microsToInstant(micros: Long): Instant = {
      val secs = Math.floorDiv(micros, MICROS_PER_SECOND)
      // Unfolded Math.floorMod(us, MICROS_PER_SECOND) to reuse the result of
      // the above calculation of `secs` via `floorDiv`.
      val mos = micros - secs * MICROS_PER_SECOND  // <- it will overflow here
      Instant.ofEpochSecond(secs, mos * NANOS_PER_MICROS)
    }
    ```

    That overflow is acceptable because it does not change the result. However, converting the instant back to a micros value raises an overflow error:

    ```
    def instantToMicros(instant: Instant): Long = {
      val us = Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND)  // <- it overflows here
      val result = Math.addExact(us, NANOSECONDS.toMicros(instant.getNano))
      result
    }
    ```

    Code to reproduce this error:

    ```
    instantToMicros(microsToInstant(Long.MinValue))
    ```

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Test added

    Closes #32839 from dgd-contributor/SPARK-35679_instantToMicro.
Authored-by: dgd-contributor Signed-off-by: Max Gekk (cherry picked from commit aa3de4077302fe7e0b23b01a338c7feab0e5974e) Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 14 +++--- .../spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 5 + 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 89cb67c..a4c34e1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -375,6 +375,9 @@ object DateTimeUtils { timestamp.get } } + // See issue SPARK-35679 + // min second cause overflow in instant to micro + private val MIN_SECONDS = Math.floorDiv(Long.MinValue, MICROS_PER_SECOND) /** * Gets the number of microseconds since the epoch of 1970-01-01 00:00:00Z from the given @@ -382,9 +385,14 @@ object DateTimeUtils { * microseconds where microsecond 0 is 1970-01-01 00:00:00Z. 
*/ def instantToMicros(instant: Instant): Long = { -val us = Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND) -val result = Math.addExact(us, NANOSECONDS.toMicros(instant.getNano)) -result +val secs = instant.getEpochSecond +if (secs == MIN_SECONDS) { + val us = Math.multiplyExact(secs + 1, MICROS_PER_SECOND) + Math.addExact(us, NANOSECONDS.toMicros(instant.getNano) - MICROS_PER_SECOND) +} else { + val us = Math.multiplyExact(secs, MICROS_PER_SECOND) + Math.addExact(us, NANOSECONDS.toMicros(instant.getNano)) +} } /** diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index fb2d511..8cc6bf2 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -688,4 +688,9 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper { assert(toDate("tomorrow CET ", zoneId).get === today + 1) } } + + test("SPARK-35679: instantToMicros should be able to return microseconds of Long.MinValue") { +assert(instantToMicros(microsToInstant(Long.MaxValue)) === Long.MaxValue) +assert(instantToMicros(microsToInstant(Long.MinValue)) === Long.MinValue) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
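The patch above can be exercised directly against `java.time`, since the Scala code only uses JVM APIs. Below is a self-contained Java sketch of the fixed round trip (constant names mirror Spark's; the silent wrap inside `microsToInstant` is deliberate and benign, exactly as the commit message explains):

```java
import java.time.Instant;

public class InstantMicros {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long NANOS_PER_MICROS = 1_000L;
    // The single epoch-second value whose direct multiplication by
    // MICROS_PER_SECOND overflows a long (see SPARK-35679).
    static final long MIN_SECONDS = Math.floorDiv(Long.MIN_VALUE, MICROS_PER_SECOND);

    static Instant microsToInstant(long micros) {
        long secs = Math.floorDiv(micros, MICROS_PER_SECOND);
        // May wrap for Long.MIN_VALUE, but modular arithmetic still yields
        // the correct non-negative remainder in [0, 999999].
        long mos = micros - secs * MICROS_PER_SECOND;
        return Instant.ofEpochSecond(secs, mos * NANOS_PER_MICROS);
    }

    static long instantToMicros(Instant instant) {
        long secs = instant.getEpochSecond();
        if (secs == MIN_SECONDS) {
            // Shift up by one second before multiplying, then compensate,
            // so neither step leaves the long range.
            long us = Math.multiplyExact(secs + 1, MICROS_PER_SECOND);
            return Math.addExact(us, instant.getNano() / NANOS_PER_MICROS - MICROS_PER_SECOND);
        } else {
            long us = Math.multiplyExact(secs, MICROS_PER_SECOND);
            return Math.addExact(us, instant.getNano() / NANOS_PER_MICROS);
        }
    }
}
```

Without the `MIN_SECONDS` branch, `instantToMicros(microsToInstant(Long.MIN_VALUE))` throws `ArithmeticException` from `Math.multiplyExact`; with it, the round trip is exact at both ends of the range.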
[spark] branch master updated (8dde20a -> aa3de40)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8dde20a [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection
 add aa3de40 [SPARK-35679][SQL] instantToMicros overflow

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 14 +++---
 .../spark/sql/catalyst/util/DateTimeUtilsSuite.scala       |  5 +
 2 files changed, 16 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-35679][SQL] instantToMicros overflow
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new fe18354 [SPARK-35679][SQL] instantToMicros overflow
fe18354 is described below

commit fe1835406ac46eca2b147e59a92a91ba04f61121
Author: dgd-contributor
AuthorDate: Thu Jun 10 08:08:51 2021 +0300

    [SPARK-35679][SQL] instantToMicros overflow

    With `Long.MinValue` cast to an instant, `secs` is floored in `microsToInstant` and overflows when multiplied by `MICROS_PER_SECOND`:

    ```
    def microsToInstant(micros: Long): Instant = {
      val secs = Math.floorDiv(micros, MICROS_PER_SECOND)
      // Unfolded Math.floorMod(us, MICROS_PER_SECOND) to reuse the result of
      // the above calculation of `secs` via `floorDiv`.
      val mos = micros - secs * MICROS_PER_SECOND  // <- it will overflow here
      Instant.ofEpochSecond(secs, mos * NANOS_PER_MICROS)
    }
    ```

    That overflow is acceptable because it does not change the result. However, converting the instant back to a micros value raises an overflow error:

    ```
    def instantToMicros(instant: Instant): Long = {
      val us = Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND)  // <- it overflows here
      val result = Math.addExact(us, NANOSECONDS.toMicros(instant.getNano))
      result
    }
    ```

    Code to reproduce this error:

    ```
    instantToMicros(microsToInstant(Long.MinValue))
    ```

    No user-facing change. Test added.

    Closes #32839 from dgd-contributor/SPARK-35679_instantToMicro.
Authored-by: dgd-contributor Signed-off-by: Max Gekk (cherry picked from commit aa3de4077302fe7e0b23b01a338c7feab0e5974e) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/DateTimeUtils.scala | 20 +--- .../spark/sql/catalyst/util/DateTimeUtilsSuite.scala | 5 + 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index 52df227..2e242f2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -413,10 +413,24 @@ object DateTimeUtils { } } + // See issue SPARK-35679 + // min second cause overflow in instant to micro + private val MIN_SECONDS = Math.floorDiv(Long.MinValue, MICROS_PER_SECOND) + + /** + * Gets the number of microseconds since the epoch of 1970-01-01 00:00:00Z from the given + * instance of `java.time.Instant`. The epoch microsecond count is a simple incrementing count of + * microseconds where microsecond 0 is 1970-01-01 00:00:00Z. 
+ */ def instantToMicros(instant: Instant): Long = { -val us = Math.multiplyExact(instant.getEpochSecond, MICROS_PER_SECOND) -val result = Math.addExact(us, NANOSECONDS.toMicros(instant.getNano)) -result +val secs = instant.getEpochSecond +if (secs == MIN_SECONDS) { + val us = Math.multiplyExact(secs + 1, MICROS_PER_SECOND) + Math.addExact(us, NANOSECONDS.toMicros(instant.getNano) - MICROS_PER_SECOND) +} else { + val us = Math.multiplyExact(secs, MICROS_PER_SECOND) + Math.addExact(us, NANOSECONDS.toMicros(instant.getNano)) +} } def microsToInstant(us: Long): Instant = { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala index d457c88..d451aff 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala @@ -686,4 +686,9 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper { assert(toDate("tomorrow CET ", zoneId).get === today + 1) } } + + test("SPARK-35679: instantToMicros should be able to return microseconds of Long.MinValue") { +assert(instantToMicros(microsToInstant(Long.MaxValue)) === Long.MaxValue) +assert(instantToMicros(microsToInstant(Long.MinValue)) === Long.MinValue) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35734][SQL] Format day-time intervals using type fields
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 80f7989 [SPARK-35734][SQL] Format day-time intervals using type fields
80f7989 is described below

commit 80f7989d9a761eff1a0b3c64ec3aabb81506953d
Author: Kousuke Saruta
AuthorDate: Sat Jun 12 21:45:12 2021 +0300

    [SPARK-35734][SQL] Format day-time intervals using type fields

    ### What changes were proposed in this pull request?
    This PR adds a feature that formats day-time intervals to strings using the start and end fields of `DayTimeIntervalType`.

    ### Why are the changes needed?
    Currently, those fields are ignored, and any `DayTimeIntervalType` is formatted as `INTERVAL DAY TO SECOND`.

    ### Does this PR introduce _any_ user-facing change?
    Yes. The format of day-time intervals is determined by the start and end fields.

    ### How was this patch tested?
    New test.

    Closes #32891 from sarutak/interval-format.
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 58 +-- .../sql/catalyst/util/IntervalUtilsSuite.scala | 84 ++ 2 files changed, 138 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index c18cca9..dda5581 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -21,6 +21,7 @@ import java.time.{Duration, Period} import java.time.temporal.ChronoUnit import java.util.concurrent.TimeUnit +import scala.collection.mutable import scala.util.control.NonFatal import org.apache.spark.sql.catalyst.util.DateTimeConstants._ @@ -28,7 +29,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToMicros import org.apache.spark.sql.catalyst.util.IntervalStringStyles.{ANSI_STYLE, HIVE_STYLE, IntervalStyle} import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.Decimal +import org.apache.spark.sql.types.{DayTimeIntervalType, Decimal} import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} // The style of textual representation of intervals @@ -960,18 +961,34 @@ object IntervalUtils { def toDayTimeIntervalString( micros: Long, style: IntervalStyle, - // TODO(SPARK-35734): Format day-time intervals using type fields startField: Byte, endField: Byte): String = { var sign = "" var rest = micros +val from = DayTimeIntervalType.fieldToString(startField).toUpperCase +val to = DayTimeIntervalType.fieldToString(endField).toUpperCase if (micros < 0) { if (micros == Long.MinValue) { // Especial handling of minimum `Long` value because negate op overflows `Long`. 
// seconds = 106751991 * (24 * 60 * 60) + 4 * 60 * 60 + 54 = 9223372036854 // microseconds = -922337203685400L-775808 == Long.MinValue val minIntervalString = style match { - case ANSI_STYLE => "INTERVAL '-106751991 04:00:54.775808' DAY TO SECOND" + case ANSI_STYLE => +val baseStr = "-106751991 04:00:54.775808" +val fromPos = startField match { + case DayTimeIntervalType.DAY => 0 + case DayTimeIntervalType.HOUR => 11 + case DayTimeIntervalType.MINUTE => 14 + case DayTimeIntervalType.SECOND => 17 +} +val toPos = endField match { + case DayTimeIntervalType.DAY => 10 + case DayTimeIntervalType.HOUR => 13 + case DayTimeIntervalType.MINUTE => 16 + case DayTimeIntervalType.SECOND => baseStr.length +} +val postfix = if (startField == endField) from else s"$from TO $to" +s"INTERVAL '${baseStr.substring(fromPos, toPos)}' $postfix" case HIVE_STYLE => "-106751991 04:00:54.775808000" } return minIntervalString @@ -992,7 +1009,40 @@ object IntervalUtils { val secStr = java.math.BigDecimal.valueOf(secondsWithFraction, 6) .stripTrailingZeros() .toPlainString() -f"INTERVAL '$sign$days $hours%02d:$minutes%02d:$leadSecZero$secStr' DAY TO SECOND" +val formatBuilder = new StringBuilder("INTERVAL '") +if (startField == endField) { + startField match { +case DayTimeIntervalType.DAY => formatBuilder.append(s
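The position tables above handle the `Long.MinValue` special case by slicing a fixed base string according to the start and end fields. A minimal Java sketch of that branch (illustrative only, not Spark's Scala code):

```java
public class MinDayTimeInterval {
    // Sketch of the Long.MinValue special case: slice the fixed base string
    // "-106751991 04:00:54.775808" by the start and end fields.
    // Field codes per DayTimeIntervalType: 0 = DAY, 1 = HOUR, 2 = MINUTE, 3 = SECOND.
    static String minIntervalString(int startField, int endField) {
        String baseStr = "-106751991 04:00:54.775808";
        String[] names = {"DAY", "HOUR", "MINUTE", "SECOND"};
        int[] fromPos = {0, 11, 14, 17};               // where each field starts in baseStr
        int[] toPos = {10, 13, 16, baseStr.length()};  // where each field ends in baseStr
        String postfix = startField == endField
            ? names[startField]
            : names[startField] + " TO " + names[endField];
        return "INTERVAL '" + baseStr.substring(fromPos[startField], toPos[endField])
            + "' " + postfix;
    }
}
```

For example, `(HOUR, MINUTE)` slices characters 11..16 of the base string, yielding only the hour and minute fields.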
[spark] branch master updated: [SPARK-35716][SQL] Support casting of timestamp without time zone to date type
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d21ff13 [SPARK-35716][SQL] Support casting of timestamp without time zone to date type d21ff13 is described below commit d21ff1318f614fc207d9cd3c485e4337faa8e878 Author: Gengliang Wang AuthorDate: Thu Jun 10 23:37:02 2021 +0300 [SPARK-35716][SQL] Support casting of timestamp without time zone to date type ### What changes were proposed in this pull request? Extend the Cast expression and support TimestampWithoutTZType in casting to DateType. ### Why are the changes needed? To conform the ANSI SQL standard which requires to support such casting. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type is not released yet. ### How was this patch tested? Unit test Closes #32869 from gengliangwang/castToDate. Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/expressions/Cast.scala | 9 + .../org/apache/spark/sql/catalyst/expressions/CastSuite.scala | 11 +-- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 8de19ba..fba17d3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -72,6 +72,7 @@ object Cast { case (StringType, DateType) => true case (TimestampType, DateType) => true +case (TimestampWithoutTZType, DateType) => true case (StringType, CalendarIntervalType) => true case (StringType, DayTimeIntervalType) => true @@ -534,6 +535,8 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit // throw valid precision more than seconds, 
according to Hive. // Timestamp.nanos is in 0 to 999,999,999, no more than a second. buildCast[Long](_, t => microsToDays(t, zoneId)) +case TimestampWithoutTZType => + buildCast[Long](_, t => microsToDays(t, ZoneOffset.UTC)) } // IntervalConverter @@ -1204,6 +1207,11 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit (c, evPrim, evNull) => code"""$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.microsToDays($c, $zid);""" + case TimestampWithoutTZType => +(c, evPrim, evNull) => + // scalastyle:off line.size.limit + code"$evPrim = org.apache.spark.sql.catalyst.util.DateTimeUtils.microsToDays($c, java.time.ZoneOffset.UTC);" + // scalastyle:on line.size.limit case _ => (c, evPrim, evNull) => code"$evNull = true;" } @@ -1953,6 +1961,7 @@ object AnsiCast { case (StringType, DateType) => true case (TimestampType, DateType) => true +case (TimestampWithoutTZType, DateType) => true case (_: NumericType, _: NumericType) => true case (StringType, _: NumericType) => true diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index a4e4257..51a7740 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import java.sql.{Date, Timestamp} -import java.time.{DateTimeException, Duration, LocalDateTime, Period} +import java.time.{DateTimeException, Duration, LocalDate, LocalDateTime, Period} import java.time.temporal.ChronoUnit import java.util.{Calendar, TimeZone} @@ -1256,10 +1256,17 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase { test("SPARK-35711: cast timestamp without time zone to timestamp with local time zone") { specialTs.foreach { s => - val dt = LocalDateTime.parse(s.replace(" ", "T")) + 
val dt = LocalDateTime.parse(s) checkEvaluation(cast(dt, TimestampType), DateTimeUtils.localDateTimeToMicros(dt)) } } + + test("SPARK-35716: cast timestamp without time zone to date type") { +specialTs.foreach { s => + val dt = LocalDateTime.parse(s) + checkEvaluation(cast(dt, DateType), LocalDate.parse(s.split("T")(0))) +} + } } /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
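Casting a timestamp-without-time-zone value to a date reduces to `microsToDays` with a fixed UTC zone, because the stored microseconds encode a local date-time as if it were UTC. A hedged Java sketch of that conversion (helper names here are illustrative, not Spark's API):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampNtzToDate {
    static final long MICROS_PER_SECOND = 1_000_000L;

    // Mirrors the effect of microsToDays(t, ZoneOffset.UTC): interpret the
    // micros as a UTC instant and return days since the epoch (1970-01-01).
    static int microsToDays(long micros) {
        long secs = Math.floorDiv(micros, MICROS_PER_SECOND);
        int nanoAdjust = (int) ((micros - secs * MICROS_PER_SECOND) * 1_000L);
        LocalDateTime ldt = LocalDateTime.ofEpochSecond(secs, nanoAdjust, ZoneOffset.UTC);
        return (int) ldt.toLocalDate().toEpochDay();
    }

    // Inverse helper: encode a local date-time as timestamp-without-tz micros.
    static long localDateTimeToMicros(LocalDateTime ldt) {
        return Math.addExact(
            Math.multiplyExact(ldt.toEpochSecond(ZoneOffset.UTC), MICROS_PER_SECOND),
            ldt.getNano() / 1_000L);
    }
}
```

Because the zone is pinned to UTC, the resulting date never shifts with the session time zone; `floorDiv` keeps pre-1970 values on the correct day.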
[spark] branch master updated (0ba1d38 -> 6272222)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0ba1d38 [SPARK-35701][SQL] Use copy-on-write semantics for SQLConf registered configurations
 add 6272222 [SPARK-35719][SQL] Support type conversion between timestamp and timestamp without time zone type

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala      | 27 ++
 .../spark/sql/catalyst/expressions/CastSuite.scala | 24 +++
 2 files changed, 43 insertions(+), 8 deletions(-)
[spark] branch master updated (69aa7ad -> 7978fdc)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 69aa7ad [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown
 add 7978fdc [SPARK-35736][SQL] Parse any day-time interval types in SQL

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                               |  2 ++
 .../antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4    |  9 -
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala     | 11 +--
 .../test/scala/org/apache/spark/sql/types/DataTypeSuite.scala |  2 +-
 4 files changed, 20 insertions(+), 4 deletions(-)
[spark] branch master updated (7fcb127 -> 45b7f76)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7fcb127 [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2
 add 45b7f76 [SPARK-35095][SS][TESTS] Use ANSI intervals in streaming join tests

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/streaming/StreamingJoinSuite.scala | 41 +++---
 1 file changed, 37 insertions(+), 4 deletions(-)
[spark] branch master updated: [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2ebad72 [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type 2ebad72 is described below commit 2ebad727587e25b8bf4a8439593e7402ea4e2827 Author: Angerszh AuthorDate: Sat Jun 19 13:51:21 2021 +0300 [SPARK-35726][SQL] Truncate java.time.Duration by fields of day-time interval type ### What changes were proposed in this pull request? Support truncate java.time.Duration by fields of day-time interval type. ### Why are the changes needed? To respect fields of the target day-time interval types. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #32950 from AngersZh/SPARK-35726. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../spark/sql/catalyst/CatalystTypeConverters.scala | 11 ++- .../spark/sql/catalyst/util/IntervalUtils.scala | 15 +-- .../org/apache/spark/sql/RandomDataGenerator.scala | 13 - .../sql/catalyst/CatalystTypeConvertersSuite.scala | 20 4 files changed, 51 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala index 08a5fd5..1742524 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala @@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.DayTimeIntervalType._ import org.apache.spark.sql.types.YearMonthIntervalType._ import org.apache.spark.unsafe.types.UTF8String 
@@ -76,8 +77,7 @@ object CatalystTypeConverters { case LongType => LongConverter case FloatType => FloatConverter case DoubleType => DoubleConverter - // TODO(SPARK-35726): Truncate java.time.Duration by fields of day-time interval type - case _: DayTimeIntervalType => DurationConverter + case DayTimeIntervalType(_, endField) => DurationConverter(endField) case YearMonthIntervalType(_, endField) => PeriodConverter(endField) case dataType: DataType => IdentityConverter(dataType) } @@ -432,9 +432,10 @@ object CatalystTypeConverters { override def toScalaImpl(row: InternalRow, column: Int): Double = row.getDouble(column) } - private object DurationConverter extends CatalystTypeConverter[Duration, Duration, Any] { + private case class DurationConverter(endField: Byte) + extends CatalystTypeConverter[Duration, Duration, Any] { override def toCatalystImpl(scalaValue: Duration): Long = { - IntervalUtils.durationToMicros(scalaValue) + IntervalUtils.durationToMicros(scalaValue, endField) } override def toScala(catalystValue: Any): Duration = { if (catalystValue == null) null @@ -523,7 +524,7 @@ object CatalystTypeConverters { map, (key: Any) => convertToCatalyst(key), (value: Any) => convertToCatalyst(value)) -case d: Duration => DurationConverter.toCatalyst(d) +case d: Duration => DurationConverter(SECOND).toCatalyst(d) case p: Period => PeriodConverter(MONTH).toCatalyst(p) case other => other } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index e87ea51..3f56d2f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -892,18 +892,29 @@ object IntervalUtils { * @throws ArithmeticException If numeric overflow occurs */ def durationToMicros(duration: Duration): Long = { +durationToMicros(duration, 
DayTimeIntervalType.SECOND) + } + + def durationToMicros(duration: Duration, endField: Byte): Long = { val seconds = duration.getSeconds -if (seconds == minDurationSeconds) { +val micros = if (seconds == minDurationSeconds) { val microsInSeconds = (minDurationSeconds + 1) * MICROS_PER_SECOND val nanoAdjustment = duration.getNano assert(0 <= nanoAdjustment && nanoAdjustment < NANOS_PER_SECOND, "Duration.getNano() must return the adjustment to the seconds field " + -"in the range from 0 to 9 nanoseconds, inclusive."
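The diff above adds an `endField` parameter to `durationToMicros` so that a `java.time.Duration` is truncated to the target day-time interval's end field. The truncation itself can be sketched outside of Spark as below — a Python sketch of the arithmetic, not Spark's actual Scala; the field codes (`DAY`..`SECOND`) follow `DayTimeIntervalType`, but the function name and exact rounding are assumptions inferred from the diff and the analogous `periodToMonths` change:

```python
# Microsecond conversion constants, as in Spark's DateTimeConstants.
MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE
MICROS_PER_DAY = 24 * MICROS_PER_HOUR

# End-field codes of DayTimeIntervalType: DAY=0, HOUR=1, MINUTE=2, SECOND=3.
DAY, HOUR, MINUTE, SECOND = 0, 1, 2, 3

def duration_to_micros(total_micros: int, end_field: int) -> int:
    """Drop precision finer than end_field; SECOND keeps microseconds,
    since the SECOND field of a day-time interval carries the fraction."""
    unit = {DAY: MICROS_PER_DAY, HOUR: MICROS_PER_HOUR,
            MINUTE: MICROS_PER_MINUTE, SECOND: 1}[end_field]
    return total_micros - total_micros % unit
```

For example, a duration of 1 day, 1 hour, 1 minute, 1.000001 seconds (90,061,000,001 µs) truncated with `end_field=DAY` keeps only the whole day, 86,400,000,000 µs.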
[spark] branch master updated: [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 86bcd1f [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType 86bcd1f is described below commit 86bcd1fba09d9b5e4d36a48824354aaae769fa21 Author: Angerszh AuthorDate: Sat Jun 19 21:43:06 2021 +0300 [SPARK-35819][SQL] Support Cast between different field YearMonthIntervalType ### What changes were proposed in this pull request? Support Cast between different field YearMonthIntervalType ### Why are the changes needed? Make user convenient to get different field YearMonthIntervalType ### Does this PR introduce _any_ user-facing change? User can call cast YearMonthIntervalType(YEAR, MONTH) to YearMonthIntervalType(YEAR, YEAR) etc ### How was this patch tested? Added UT Closes #32974 from AngersZh/SPARK-35819. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/expressions/Cast.scala | 12 .../apache/spark/sql/catalyst/expressions/CastSuite.scala| 11 +++ 2 files changed, 23 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 52801ec..cdf0753 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -82,6 +82,8 @@ object Cast { case (StringType, _: DayTimeIntervalType) => true case (StringType, _: YearMonthIntervalType) => true +case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true + case (StringType, _: NumericType) => true case (BooleanType, _: NumericType) => true case (DateType, _: NumericType) => true @@ -580,6 +582,8 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression 
wit it: YearMonthIntervalType): Any => Any = from match { case StringType => buildCast[UTF8String](_, s => IntervalUtils.castStringToYMInterval(s, it.startField, it.endField)) +case _: YearMonthIntervalType => buildCast[Int](_, s => + IntervalUtils.periodToMonths(IntervalUtils.monthsToPeriod(s), it.endField)) } // LongConverter @@ -1481,6 +1485,12 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit code""" $evPrim = $util.castStringToYMInterval($c, (byte)${it.startField}, (byte)${it.endField}); """ +case _: YearMonthIntervalType => + val util = IntervalUtils.getClass.getCanonicalName.stripSuffix("$") + (c, evPrim, _) => +code""" + $evPrim = $util.periodToMonths($util.monthsToPeriod($c), (byte)${it.endField}); +""" } private[this] def decimalToTimestampCode(d: ExprValue): Block = { @@ -2051,6 +2061,8 @@ object AnsiCast { case (StringType, _: DayTimeIntervalType) => true case (StringType, _: YearMonthIntervalType) => true +case (_: YearMonthIntervalType, _: YearMonthIntervalType) => true + case (StringType, DateType) => true case (TimestampType, DateType) => true case (TimestampWithoutTZType, DateType) => true diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index d114968..51c3681 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._ import org.apache.spark.sql.catalyst.util.DateTimeUtils._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.YearMonthIntervalType._ import org.apache.spark.unsafe.types.UTF8String /** @@ -662,4 +663,14 @@ class CastSuite extends CastSuiteBase { checkEvaluation(cast(invalidInput, 
TimestampWithoutTZType), null) } } + + test("SPARK-35819: Support cast YearMonthIntervalType in different fields") { +val ym = cast(Literal.create("1-1"), YearMonthIntervalType(YEAR, MONTH)) +Seq(YearMonthIntervalType(YEAR, YEAR) -> 12, + YearMonthIntervalType(YEAR, MONTH) -> 13, + YearMonthIntervalType(MONTH, MONTH) -> 13) + .foreach { case (dt, value) => +checkEvaluation(cast(ym, dt), value) + } + } }
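The new cast goes through `periodToMonths(monthsToPeriod(s), it.endField)`: the stored month count is re-encoded for the target end field, dropping the months that don't fit whole years when the end field is `YEAR`. A minimal Python sketch of that value-level behavior (illustrative only; `cast_ym_months` is a hypothetical name, and the field codes are assumed from `YearMonthIntervalType`):

```python
MONTHS_PER_YEAR = 12
YEAR, MONTH = 0, 1  # assumed YearMonthIntervalType field codes

def cast_ym_months(months: int, end_field: int) -> int:
    """Re-encode a year-month interval's month count for a target end field."""
    if end_field != YEAR:
        return months
    # Truncate toward zero, as Scala's % does (Python's % rounds toward
    # negative infinity, so handle negative intervals explicitly).
    rem = abs(months) % MONTHS_PER_YEAR
    return months + rem if months < 0 else months - rem
```

This matches the expectations in the CastSuite test above: `INTERVAL '1-1'` (13 months) becomes 12 when cast to `YEAR TO YEAR` and stays 13 for `YEAR TO MONTH` or `MONTH TO MONTH`.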
[spark] branch master updated (6d30991 -> 4758dc7)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6d30991 [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase add 4758dc7 [SPARK-35771][SQL][FOLLOWUP] IntervalUtils.toYearMonthIntervalString should consider the case year-month type is casted as month type No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/util/IntervalUtils.scala | 5 ++--- .../spark/sql/catalyst/util/IntervalUtilsSuite.scala | 14 +++--- 2 files changed, 9 insertions(+), 10 deletions(-)
[spark] branch master updated: [SPARK-35769][SQL] Truncate java.time.Period by fields of year-month interval type
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 071566c [SPARK-35769][SQL] Truncate java.time.Period by fields of year-month interval type 071566c is described below commit 071566caf3d1efc752006afaec974c6c4cfdc679 Author: Angerszh AuthorDate: Fri Jun 18 11:55:57 2021 +0300 [SPARK-35769][SQL] Truncate java.time.Period by fields of year-month interval type ### What changes were proposed in this pull request? Support truncate java.time.Period by fields of year-month interval type ### Why are the changes needed? To follow the SQL standard and respect the field restriction of the target year-month type. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #32945 from AngersZh/SPARK-35769. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../apache/spark/sql/catalyst/CatalystTypeConverters.scala| 11 ++- .../org/apache/spark/sql/catalyst/util/IntervalUtils.scala| 11 ++- .../test/scala/org/apache/spark/sql/RandomDataGenerator.scala | 7 +-- .../spark/sql/catalyst/CatalystTypeConvertersSuite.scala | 10 ++ 4 files changed, 31 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala index 38790e0..08a5fd5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala @@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.YearMonthIntervalType._ import 
org.apache.spark.unsafe.types.UTF8String /** @@ -77,8 +78,7 @@ object CatalystTypeConverters { case DoubleType => DoubleConverter // TODO(SPARK-35726): Truncate java.time.Duration by fields of day-time interval type case _: DayTimeIntervalType => DurationConverter - // TODO(SPARK-35769): Truncate java.time.Period by fields of year-month interval type - case _: YearMonthIntervalType => PeriodConverter + case YearMonthIntervalType(_, endField) => PeriodConverter(endField) case dataType: DataType => IdentityConverter(dataType) } converter.asInstanceOf[CatalystTypeConverter[Any, Any, Any]] @@ -444,9 +444,10 @@ object CatalystTypeConverters { IntervalUtils.microsToDuration(row.getLong(column)) } - private object PeriodConverter extends CatalystTypeConverter[Period, Period, Any] { + private case class PeriodConverter(endField: Byte) + extends CatalystTypeConverter[Period, Period, Any] { override def toCatalystImpl(scalaValue: Period): Int = { - IntervalUtils.periodToMonths(scalaValue) + IntervalUtils.periodToMonths(scalaValue, endField) } override def toScala(catalystValue: Any): Period = { if (catalystValue == null) null @@ -523,7 +524,7 @@ object CatalystTypeConverters { (key: Any) => convertToCatalyst(key), (value: Any) => convertToCatalyst(value)) case d: Duration => DurationConverter.toCatalyst(d) -case p: Period => PeriodConverter.toCatalyst(p) +case p: Period => PeriodConverter(MONTH).toCatalyst(p) case other => other } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index 9e67004..e87ea51 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -925,8 +925,17 @@ object IntervalUtils { * @throws ArithmeticException If numeric overflow occurs */ def periodToMonths(period: Period): Int = { 
+periodToMonths(period, YearMonthIntervalType.MONTH) + } + + def periodToMonths(period: Period, endField: Byte): Int = { val monthsInYears = Math.multiplyExact(period.getYears, MONTHS_PER_YEAR) -Math.addExact(monthsInYears, period.getMonths) +val months = Math.addExact(monthsInYears, period.getMonths) +if (endField == YearMonthIntervalType.YEAR) { + months - months % MONTHS_PER_YEAR +} else { + months +} } /** diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala index 603c88d..6201f12 100644 --- a/sql/catalys
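The `periodToMonths` change above is small enough to restate: the period's years and months are combined into a total month count, and when the target end field is `YEAR` the trailing months are dropped. A Python sketch mirroring the diff (the real implementation is `IntervalUtils.periodToMonths` in Scala, which also uses `Math.multiplyExact`/`addExact` for overflow checking — omitted here):

```python
MONTHS_PER_YEAR = 12
YEAR, MONTH = 0, 1  # assumed YearMonthIntervalType field codes

def period_to_months(years: int, months: int, end_field: int) -> int:
    """Total months of a year-month period, truncated to whole years
    when the end field is YEAR (mirrors the diff; no overflow checks)."""
    total = years * MONTHS_PER_YEAR + months
    if end_field == YEAR:
        return total - total % MONTHS_PER_YEAR
    return total
```

So a `Period` of 1 year 7 months converted with `end_field=YEAR` yields 12 months, while `end_field=MONTH` keeps all 19.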
[spark] branch master updated: [SPARK-35827][SQL] Show proper error message when update column types to year-month/day-time interval
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new af20474 [SPARK-35827][SQL] Show proper error message when update column types to year-month/day-time interval af20474 is described below commit af20474c67a61190829fa100a17ded58cb9a2102 Author: Kousuke Saruta AuthorDate: Sun Jun 20 23:39:46 2021 +0300 [SPARK-35827][SQL] Show proper error message when update column types to year-month/day-time interval ### What changes were proposed in this pull request? This PR fixes the error message shown when changing a column type to a year-month/day-time interval type is attempted. ### Why are the changes needed? It's for consistent behavior. Updating column types to interval types is prohibited for V2 source tables. So, if we attempt to update the type of a column to the conventional interval type, an error message like `Error in query: Cannot update field to interval type;` is shown. But, for year-month/day-time interval types, a different error message like `Error in query: Cannot update field : cannot be cast to interval year;` is shown. You can reproduce it with the following procedure. ``` $ bin/spark-sql spark-sql> SET spark.sql.catalog.mycatalog=; spark-sql> CREATE TABLE mycatalog.t1(c1 int) USING ; spark-sql> ALTER TABLE mycatalog.t1 ALTER COLUMN c1 TYPE interval year to month; ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Modified an existing test. Closes #32978 from sarutak/err-msg-interval.
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala | 3 ++- .../scala/org/apache/spark/sql/connector/AlterTableTests.scala | 9 +++-- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index a351a10..28b0f1f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -533,7 +533,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { case u: UserDefinedType[_] => alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " + s"update a UserDefinedType[${u.sql}] by updating its fields") - case _: CalendarIntervalType => + case _: CalendarIntervalType | _: YearMonthIntervalType | + _: DayTimeIntervalType => alter.failAnalysis(s"Cannot update ${table.name} field $fieldName to " + s"interval type") case _ => diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala index afc51f4..165735f 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala @@ -403,8 +403,13 @@ trait AlterTableTests extends SharedSparkSession { val t = s"${catalogAndNamespace}table_name" withTable(t) { sql(s"CREATE TABLE $t (id int) USING $v2Format") - val e = intercept[AnalysisException](sql(s"ALTER TABLE $t ALTER COLUMN id TYPE interval")) - assert(e.getMessage.contains("id to interval type")) + (DataTypeTestUtils.dayTimeIntervalTypes ++ DataTypeTestUtils.yearMonthIntervalTypes) +.foreach { + case d: DataType => d.typeName +val e = intercept[AnalysisException]( + 
sql(s"ALTER TABLE $t ALTER COLUMN id TYPE ${d.typeName}")) +assert(e.getMessage.contains("id to interval type")) +} } }
[spark] branch master updated (682e7f2 -> 974d127)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 682e7f2 [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse add 974d127 [SPARK-35545][FOLLOW-UP][TEST][SQL] Add a regression test for the SubqueryExpression refactor No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 24 ++ 1 file changed, 24 insertions(+)
[spark] branch master updated (844f10c -> 2c91672)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 844f10c [SPARK-35391] Fix memory leak in ExecutorAllocationListener add 2c91672 [SPARK-35775][SQL][TESTS] Check all year-month interval types in aggregate expressions No new revisions were added by this update. Summary of changes: .../apache/spark/sql/DataFrameAggregateSuite.scala | 84 ++ 1 file changed, 56 insertions(+), 28 deletions(-)
[spark] branch master updated: [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 37ef7bb [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType` 37ef7bb is described below commit 37ef7bb98cdb1a8eefa06677f119a4d97e242097 Author: Max Gekk AuthorDate: Mon Jun 21 14:15:33 2021 +0300 [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType` ### What changes were proposed in this pull request? In the PR, I propose to add 2 new methods that accept one field and produce either `YearMonthIntervalType` or `DayTimeIntervalType`. ### Why are the changes needed? To improve code maintenance. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By existing test suites. Closes #32997 from MaxGekk/ansi-interval-types-single-field. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../src/main/scala/org/apache/spark/sql/types/DataType.scala | 12 ++-- .../org/apache/spark/sql/types/DayTimeIntervalType.scala | 1 + .../org/apache/spark/sql/types/YearMonthIntervalType.scala | 1 + .../spark/sql/catalyst/CatalystTypeConvertersSuite.scala | 12 ++-- .../apache/spark/sql/catalyst/expressions/CastSuite.scala| 4 ++-- .../scala/org/apache/spark/sql/types/DataTypeTestUtils.scala | 12 ++-- 6 files changed, 22 insertions(+), 20 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala index d781b05..22c9428 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala @@ -173,18 +173,18 @@ object DataType { private val otherTypes = { Seq(NullType, DateType, TimestampType, BinaryType, IntegerType, BooleanType, LongType, DoubleType, FloatType, ShortType, ByteType, StringType, CalendarIntervalType, - DayTimeIntervalType(DAY, DAY), + DayTimeIntervalType(DAY), DayTimeIntervalType(DAY, HOUR), DayTimeIntervalType(DAY, MINUTE), DayTimeIntervalType(DAY, SECOND), - DayTimeIntervalType(HOUR, HOUR), + DayTimeIntervalType(HOUR), DayTimeIntervalType(HOUR, MINUTE), DayTimeIntervalType(HOUR, SECOND), - DayTimeIntervalType(MINUTE, MINUTE), + DayTimeIntervalType(MINUTE), DayTimeIntervalType(MINUTE, SECOND), - DayTimeIntervalType(SECOND, SECOND), - YearMonthIntervalType(YEAR, YEAR), - YearMonthIntervalType(MONTH, MONTH), + DayTimeIntervalType(SECOND), + YearMonthIntervalType(YEAR), + YearMonthIntervalType(MONTH), YearMonthIntervalType(YEAR, MONTH), TimestampWithoutTZType) .map(t => t.typeName -> t).toMap diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala index c6d2537..99aa5f1 100644 --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala @@ -101,6 +101,7 @@ case object DayTimeIntervalType extends AbstractDataType { val DEFAULT = DayTimeIntervalType(DAY, SECOND) def apply(): DayTimeIntervalType = DEFAULT + def apply(field: Byte): DayTimeIntervalType = DayTimeIntervalType(field, field) override private[sql] def defaultConcreteType: DataType = DEFAULT diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala index e6e2643..04902e3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala @@ -95,6 +95,7 @@ case object YearMonthIntervalType extends AbstractDataType { val DEFAULT = YearMonthIntervalType(YEAR, MONTH) def apply(): YearMonthIntervalType = DEFAULT + def apply(field: Byte): YearMonthIntervalType = YearMonthIntervalType(field, field) override private[sql] def defaultConcreteType: DataType = DEFAULT diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystTypeConvertersSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystTypeConvertersSuite.scala index f08dc5f..f850a5a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystTypeConvertersSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystTypeConvertersSuite.scala @@ -277,16 +277,16 @@
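The new single-field `apply()` is pure constructor sugar: `DayTimeIntervalType(DAY)` stands for `DayTimeIntervalType(DAY, DAY)`, and likewise for `YearMonthIntervalType`. A Python analogue of that factory (illustrative only; the class and method names here are hypothetical stand-ins for the Scala `apply` overloads):

```python
from dataclasses import dataclass

# Assumed DayTimeIntervalType field codes.
DAY, HOUR, MINUTE, SECOND = 0, 1, 2, 3

@dataclass(frozen=True)
class DayTimeIntervalTypeSketch:
    start_field: int
    end_field: int

    @classmethod
    def of(cls, field: int) -> "DayTimeIntervalTypeSketch":
        # Single-field sugar: of(f) == DayTimeIntervalTypeSketch(f, f),
        # mirroring the new apply(field: Byte) in the diff.
        return cls(field, field)
```

This is why the diff can shorten `DayTimeIntervalType(DAY, DAY)` to `DayTimeIntervalType(DAY)` throughout `DataType.otherTypes` without changing behavior.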
[spark] branch master updated (79e3d0d -> 7c1a9dd)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 79e3d0d [SPARK-35855][SQL] Unify reuse map data structures in non-AQE and AQE rules add 7c1a9dd [SPARK-35776][SQL][TESTS] Check all year-month interval types in arrow No new revisions were added by this update. Summary of changes: .../spark/sql/execution/arrow/ArrowWriterSuite.scala | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-)
[spark] branch master updated (960a7e5 -> 4416b4b)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 960a7e5 [SPARK-35856][SQL][TESTS] Move new interval type test cases from CastSuite to CastBaseSuite add 4416b4b [SPARK-35734][SQL][FOLLOWUP] IntervalUtils.toDayTimeIntervalString should consider the case a day-time type is casted as another day-time type No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/IntervalUtils.scala| 148 +++-- .../sql/catalyst/util/IntervalUtilsSuite.scala | 58 +--- 2 files changed, 119 insertions(+), 87 deletions(-)
[spark] branch master updated: [SPARK-35772][SQL][TESTS] Check all year-month interval types in `HiveInspectors` tests
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new df55945 [SPARK-35772][SQL][TESTS] Check all year-month interval types in `HiveInspectors` tests df55945 is described below commit df55945804918f4d147dcef7a9b5f18bff4cabc9 Author: Angerszh AuthorDate: Wed Jun 23 08:54:07 2021 +0300 [SPARK-35772][SQL][TESTS] Check all year-month interval types in `HiveInspectors` tests ### What changes were proposed in this pull request? Check all year-month interval types in HiveInspectors tests. ### Why are the changes needed? To improve test coverage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT. Closes #32970 from AngersZh/SPARK-35772. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../execution/HiveScriptTransformationSuite.scala | 46 -- 1 file changed, 35 insertions(+), 11 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala index 8cea781..d84a766 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala @@ -32,6 +32,7 @@ import org.apache.spark.sql.execution._ import org.apache.spark.sql.functions._ import org.apache.spark.sql.hive.test.TestHiveSingleton import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.YearMonthIntervalType._ import org.apache.spark.unsafe.types.CalendarInterval class HiveScriptTransformationSuite extends BaseScriptTransformationSuite with TestHiveSingleton { @@ -521,22 +522,20 @@ class HiveScriptTransformationSuite extends BaseScriptTransformationSuite with T } - test("SPARK-34879: 
HiveInspectors supports DayTimeIntervalType and YearMonthIntervalType") { + test("SPARK-34879: HiveInspectors supports DayTimeIntervalType") { assume(TestUtils.testCommandAvailable("/bin/bash")) withTempView("v") { val df = Seq( (Duration.ofDays(1), Duration.ofSeconds(100).plusNanos(123456), - Duration.of(Long.MaxValue, ChronoUnit.MICROS), - Period.ofMonths(10)), + Duration.of(Long.MaxValue, ChronoUnit.MICROS)), (Duration.ofDays(1), Duration.ofSeconds(100).plusNanos(1123456789), - Duration.ofSeconds(Long.MaxValue / DateTimeConstants.MICROS_PER_SECOND), - Period.ofMonths(10)) - ).toDF("a", "b", "c", "d") + Duration.ofSeconds(Long.MaxValue / DateTimeConstants.MICROS_PER_SECOND)) + ).toDF("a", "b", "c") df.createTempView("v") - // Hive serde supports DayTimeIntervalType/YearMonthIntervalType as input and output data type + // Hive serde supports DayTimeIntervalType as input and output data type checkAnswer( df, (child: SparkPlan) => createScriptTransformationExec( @@ -545,12 +544,37 @@ class HiveScriptTransformationSuite extends BaseScriptTransformationSuite with T // TODO(SPARK-35733): Check all day-time interval types in HiveInspectors tests AttributeReference("a", DayTimeIntervalType())(), AttributeReference("b", DayTimeIntervalType())(), -AttributeReference("c", DayTimeIntervalType())(), -// TODO(SPARK-35772): Check all year-month interval types in HiveInspectors tests -AttributeReference("d", YearMonthIntervalType())()), +AttributeReference("c", DayTimeIntervalType())()), + child = child, + ioschema = hiveIOSchema), +df.select($"a", $"b", $"c").collect()) +} + } + + test("SPARK-35722: HiveInspectors supports all type of YearMonthIntervalType") { +assume(TestUtils.testCommandAvailable("/bin/bash")) +withTempView("v") { + val schema = StructType(Seq( +StructField("a", YearMonthIntervalType(YEAR)), +StructField("b", YearMonthIntervalType(YEAR, MONTH)), +StructField("c", YearMonthIntervalType(MONTH)) + )) + val df = spark.createDataFrame(sparkContext.parallelize(Seq( 
+Row(Period.ofMonths(13), Period.ofMonths(13), Period.ofMonths(13)) + )), schema) + + // Hive serde supports YearMonthIntervalType as input and output data type + checkAnswer( +df, +(child: SparkPlan) =>
[spark] branch master updated (7c1a9dd -> 758b423)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7c1a9dd [SPARK-35776][SQL][TESTS] Check all year-month interval types in arrow add 758b423 [SPARK-35860][SQL] Support UpCast between different field of YearMonthIntervalType/DayTimeIntervalType No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/expressions/Cast.scala | 3 +++ .../spark/sql/catalyst/expressions/CastSuiteBase.scala | 12 2 files changed, 15 insertions(+)
[spark] branch master updated (66d5a00 -> 077cf2a)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 66d5a00 [SPARK-35817][SQL] Restore performance of queries against wide Avro tables add 077cf2a [SPARK-35733][SQL][TESTS] Check all day-time interval types in `HiveInspectors` tests No new revisions were added by this update. Summary of changes: .../execution/HiveScriptTransformationSuite.scala | 53 +++--- 1 file changed, 37 insertions(+), 16 deletions(-)
[spark] branch master updated (2d3fa04 -> ad18722)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2d3fa04 [SPARK-35729][SQL][TESTS] Check all day-time interval types in aggregate expressions add ad18722 [SPARK-35731][SQL][TESTS] Check all day-time interval types in arrow No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4761977 -> 2d3fa04)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4761977 [SPARK-34889][SS] Introduce MergingSessionsIterator merging elements directly which belong to the same session add 2d3fa04 [SPARK-35729][SQL][TESTS] Check all day-time interval types in aggregate expressions No new revisions were added by this update. Summary of changes: .../apache/spark/sql/DataFrameAggregateSuite.scala | 285 - .../org/apache/spark/sql/test/SQLTestData.scala| 106 2 files changed, 327 insertions(+), 64 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (43cd6ca -> 5a510cf)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 43cd6ca [SPARK-35378][SQL][FOLLOWUP] isLocal should consider CommandResult add 5a510cf [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/HiveResult.scala| 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4a6d90e -> 1488ea9)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4a6d90e [SPARK-35611][SS] Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source add 1488ea9 [SPARK-35820][SQL] Support Cast between different field DayTimeIntervalType No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Cast.scala | 10 ++ .../spark/sql/catalyst/expressions/CastSuite.scala | 21 + 2 files changed, 31 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (cfcfbca -> 345d3db)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cfcfbca [SPARK-35476][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.series add 345d3db Revert "[SPARK-35778][SQL][TESTS] Check multiply/divide of year month interval of any fields by numeric" No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/IntervalExpressionsSuite.scala| 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35871][SQL] Literal.create(value, dataType) should support fields
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new de35675 [SPARK-35871][SQL] Literal.create(value, dataType) should support fields de35675 is described below commit de35675c6191c05195a3c7ef4c11889469e9e192 Author: Angerszh AuthorDate: Thu Jun 24 17:36:48 2021 +0300 [SPARK-35871][SQL] Literal.create(value, dataType) should support fields ### What changes were proposed in this pull request? Currently, Literal.create(value, dataType) for Period to YearMonthIntervalType and Duration to DayTimeIntervalType is not correct: if the value is a Period/Duration, it creates a converter for the default YearMonthIntervalType/DayTimeIntervalType, so the result is not correct. This PR fixes that bug. ### Why are the changes needed? Fix a bug when using Literal.create(). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #33056 from AngersZh/SPARK-35871. 
Authored-by: Angerszh Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/literals.scala | 8 +++- .../expressions/LiteralExpressionSuite.scala | 24 ++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index d31634c..94052a2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -153,7 +153,13 @@ object Literal { def fromObject(obj: Any): Literal = new Literal(obj, ObjectType(obj.getClass)) def create(v: Any, dataType: DataType): Literal = { -Literal(CatalystTypeConverters.convertToCatalyst(v), dataType) +dataType match { + case _: YearMonthIntervalType if v.isInstanceOf[Period] => +Literal(CatalystTypeConverters.createToCatalystConverter(dataType)(v), dataType) + case _: DayTimeIntervalType if v.isInstanceOf[Duration] => +Literal(CatalystTypeConverters.createToCatalystConverter(dataType)(v), dataType) + case _ => Literal(CatalystTypeConverters.convertToCatalyst(v), dataType) +} } def create[T : TypeTag](v: T): Literal = Try { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala index 6410651..50b7263 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala @@ -32,6 +32,8 @@ import org.apache.spark.sql.catalyst.util.DateTimeConstants._ import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.DayTimeIntervalType._ 
+import org.apache.spark.sql.types.YearMonthIntervalType._ import org.apache.spark.unsafe.types.CalendarInterval class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { @@ -432,4 +434,26 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { assert(literal.toString === expected) } } + + test("SPARK-35871: Literal.create(value, dataType) should support fields") { +val period = Period.ofMonths(13) +DataTypeTestUtils.yearMonthIntervalTypes.foreach { dt => + val result = dt.endField match { +case YEAR => 12 +case MONTH => 13 + } + checkEvaluation(Literal.create(period, dt), result) +} + +val duration = Duration.ofSeconds(86400 + 3600 + 60 + 1) +DataTypeTestUtils.dayTimeIntervalTypes.foreach { dt => + val result = dt.endField match { +case DAY => 86400000000L +case HOUR => 90000000000L +case MINUTE => 90060000000L +case SECOND => 90061000000L + } + checkEvaluation(Literal.create(duration, dt), result) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
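The expected values in the new test come from truncating the interval to the type's end field: 13 months becomes 12 when the end field is YEAR, and 90061 seconds becomes whole days, hours, or minutes. A minimal plain-Scala sketch of that truncation (illustrative only; Spark's real logic lives in `IntervalUtils.periodToMonths`/`durationToMicros`, and negative-value rounding is not handled here):

```scala
import java.time.{Duration, Period}

// Field ordinals as in Spark's interval types: YEAR=0, MONTH=1; DAY=0, HOUR=1, MINUTE=2, SECOND=3.
def periodToMonths(p: Period, endField: Int): Int = {
  val months = p.toTotalMonths.toInt
  // End field YEAR: drop the partial months below a whole year.
  if (endField == 0) months - months % 12 else months
}

def durationToMicros(d: Duration, endField: Int): Long = {
  val micros = d.getSeconds * 1000000L + d.getNano / 1000L
  val unit = endField match {
    case 0 => 86400L * 1000000L // DAY
    case 1 => 3600L * 1000000L  // HOUR
    case 2 => 60L * 1000000L    // MINUTE
    case 3 => 1L                // SECOND: microseconds kept as-is
  }
  micros - micros % unit // truncate everything below the end field
}
```

With these helpers, `durationToMicros(Duration.ofSeconds(86400 + 3600 + 60 + 1), 0)` yields 86400000000L, matching the DAY case in the test above.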
[spark] branch master updated (345d3db -> d40a1a2)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 345d3db Revert "[SPARK-35778][SQL][TESTS] Check multiply/divide of year month interval of any fields by numeric" add d40a1a2 Revert "[SPARK-35728][SQL][TESTS] Check multiply/divide of day-time intervals of any fields by numeric" No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/IntervalExpressionsSuite.scala | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35736][SPARK-35774][SQL][FOLLOWUP] Prohibit to specify the same units for FROM and TO with unit-to-unit interval syntax
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 156b9b5 [SPARK-35736][SPARK-35774][SQL][FOLLOWUP] Prohibit to specify the same units for FROM and TO with unit-to-unit interval syntax 156b9b5 is described below commit 156b9b5d14d9f759ae2beef46f163a541878f53c Author: Kousuke Saruta AuthorDate: Thu Jun 24 23:13:31 2021 +0300 [SPARK-35736][SPARK-35774][SQL][FOLLOWUP] Prohibit to specify the same units for FROM and TO with unit-to-unit interval syntax ### What changes were proposed in this pull request? This PR changes the behavior of the unit-to-unit interval syntax to prohibit specifying the same unit for FROM and TO. ### Why are the changes needed? For ANSI compliance. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New test. Closes #33057 from sarutak/prohibit-unit-pattern. 
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk --- .../spark/sql/catalyst/parser/AstBuilder.scala | 30 ++ .../spark/sql/errors/QueryParsingErrors.scala | 2 +- .../apache/spark/sql/types/StructTypeSuite.scala | 24 + 3 files changed, 45 insertions(+), 11 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 6a373ab..f82b9be 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -2521,23 +2521,33 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg } override def visitYearMonthIntervalDataType(ctx: YearMonthIntervalDataTypeContext): DataType = { -val start = YearMonthIntervalType.stringToField(ctx.from.getText.toLowerCase(Locale.ROOT)) -val end = if (ctx.to != null) { - YearMonthIntervalType.stringToField(ctx.to.getText.toLowerCase(Locale.ROOT)) +val startStr = ctx.from.getText.toLowerCase(Locale.ROOT) +val start = YearMonthIntervalType.stringToField(startStr) +if (ctx.to != null) { + val endStr = ctx.to.getText.toLowerCase(Locale.ROOT) + val end = YearMonthIntervalType.stringToField(endStr) + if (end <= start) { +throw QueryParsingErrors.fromToIntervalUnsupportedError(startStr, endStr, ctx) + } + YearMonthIntervalType(start, end) } else { - start + YearMonthIntervalType(start) } -YearMonthIntervalType(start, end) } override def visitDayTimeIntervalDataType(ctx: DayTimeIntervalDataTypeContext): DataType = { -val start = DayTimeIntervalType.stringToField(ctx.from.getText.toLowerCase(Locale.ROOT)) -val end = if (ctx.to != null ) { - DayTimeIntervalType.stringToField(ctx.to.getText.toLowerCase(Locale.ROOT)) +val startStr = ctx.from.getText.toLowerCase(Locale.ROOT) +val start = DayTimeIntervalType.stringToField(startStr) +if (ctx.to != null ) { + val endStr = 
ctx.to.getText.toLowerCase(Locale.ROOT) + val end = DayTimeIntervalType.stringToField(endStr) + if (end <= start) { +throw QueryParsingErrors.fromToIntervalUnsupportedError(startStr, endStr, ctx) + } + DayTimeIntervalType(start, end) } else { - start + DayTimeIntervalType(start) } -DayTimeIntervalType(start, end) } /** diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala index 9002728..cab1c17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala @@ -206,7 +206,7 @@ object QueryParsingErrors { } def fromToIntervalUnsupportedError( - from: String, to: String, ctx: UnitToUnitIntervalContext): Throwable = { + from: String, to: String, ctx: ParserRuleContext): Throwable = { new ParseException(s"Intervals FROM $from TO $to are not supported.", ctx) } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala index 820f326..18821b8 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala @@ -18,8 +18,11 @@ package org.apache.spark.sql.types import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.parser.ParseE
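The guard added above rejects both reversed and identical unit pairs because it uses `end <= start`. A standalone sketch mirroring that check from `visitYearMonthIntervalDataType` (the field map and error string here are simplified assumptions, not Spark's parser API):

```scala
// Year-month field ordinals, as in YearMonthIntervalType: YEAR=0, MONTH=1.
val yearMonthFields = Map("year" -> 0, "month" -> 1)

// Mirrors the `if (end <= start) throw ...` guard in the AstBuilder change above.
def checkUnitToUnit(from: String, to: String): Either[String, (Int, Int)] = {
  val start = yearMonthFields(from)
  val end = yearMonthFields(to)
  if (end <= start) Left(s"Intervals FROM $from TO $to are not supported.")
  else Right((start, end))
}
```

So `YEAR TO MONTH` stays valid, while `YEAR TO YEAR` and `MONTH TO YEAR` now fail at parse time.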
[spark] branch master updated: [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 195090a [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType 195090a is described below commit 195090afcc8ed138336b353edc0a4db6f0f5f168 Author: Gengliang Wang AuthorDate: Tue Jun 15 12:15:13 2021 +0300 [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType ### What changes were proposed in this pull request? In the PR, I propose to override the typeName() method in TimestampWithoutTZType, and assign it a name according to the ANSI SQL standard ![image](https://user-images.githubusercontent.com/1097932/122013859-2cf50680-cdf1-11eb-9fcd-0ec1b59fb5c0.png) ### Why are the changes needed? To improve Spark SQL user experience, and have readable types in error messages. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type is not released yet. ### How was this patch tested? Unit test Closes #32915 from gengliangwang/typename. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../org/apache/spark/sql/types/TimestampWithoutTZType.scala | 2 ++ .../apache/spark/sql/catalyst/expressions/CastSuite.scala | 13 + 2 files changed, 15 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala index 558f5ee..856d549 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala @@ -48,6 +48,8 @@ class TimestampWithoutTZType private() extends AtomicType { */ override def defaultSize: Int = 8 + override def typeName: String = "timestamp without time zone" + private[spark] override def asNullable: TimestampWithoutTZType = this } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala index c268d52..910c757 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala @@ -1295,6 +1295,19 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase { } } } + + test("disallow type conversions between Numeric types and Timestamp without time zone type") { +import DataTypeTestUtils.numericTypes +checkInvalidCastFromNumericType(TimestampWithoutTZType) +var errorMsg = "cannot cast bigint to timestamp without time zone" +verifyCastFailure(cast(Literal(0L), TimestampWithoutTZType), Some(errorMsg)) + +val timestampWithoutTZLiteral = Literal.create(LocalDateTime.now(), TimestampWithoutTZType) +errorMsg = "cannot cast timestamp without time zone to" +numericTypes.foreach { numericType => + verifyCastFailure(cast(timestampWithoutTZLiteral, numericType), Some(errorMsg)) +} + } } /** - To unsubscribe, 
e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (195090a -> b191d72)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 195090a [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType add b191d72 [SPARK-35056][SQL] Group exception messages in execution/streaming No new revisions were added by this update. Summary of changes: .../spark/sql/errors/QueryExecutionErrors.scala| 95 +- .../streaming/CheckpointFileManager.scala | 16 ++-- .../sql/execution/streaming/FileStreamSink.scala | 21 + .../sql/execution/streaming/GroupStateImpl.scala | 15 ++-- .../sql/execution/streaming/HDFSMetadataLog.scala | 7 +- .../streaming/ManifestFileCommitProtocol.scala | 4 +- .../execution/streaming/MicroBatchExecution.scala | 4 +- .../execution/streaming/StreamingRelation.scala| 3 +- .../execution/streaming/statefulOperators.scala| 3 +- 9 files changed, 121 insertions(+), 47 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c382d40 -> 8a02f3a)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c382d40 [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite into multiple files add 8a02f3a [SPARK-35129][SQL] Construct year-month interval column from integral fields No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/intervalExpressions.scala | 53 +++ .../expressions/IntervalExpressionsSuite.scala | 28 ++ .../sql-functions/sql-expression-schema.md | 3 +- .../test/resources/sql-tests/inputs/interval.sql | 9 .../sql-tests/results/ansi/interval.sql.out| 60 +- .../resources/sql-tests/results/interval.sql.out | 60 +- 7 files changed, 211 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
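SPARK-35129 constructs a year-month interval column from integral fields; such a value is internally a single month count. A hedged sketch of the packing arithmetic (the helper name is hypothetical; the real Spark expression also handles null inputs and ANSI overflow semantics):

```scala
// Year-month intervals are stored as an Int number of months.
// addExact/multiplyExact raise ArithmeticException on Int overflow, as interval math should.
def makeYearMonthInterval(years: Int, months: Int): Int =
  Math.addExact(Math.multiplyExact(years, 12), months)
```

For example, 2 years and 3 months packs to 27 months, and an overflowing input raises an error instead of wrapping silently.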
[spark] branch master updated: [SPARK-35737][SQL] Parse day-time interval literals to tightest types
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 439e94c [SPARK-35737][SQL] Parse day-time interval literals to tightest types 439e94c is described below commit 439e94c1712366ff267183d3946f2507ebf3a98e Author: Kousuke Saruta AuthorDate: Mon Jun 14 10:06:19 2021 +0300 [SPARK-35737][SQL] Parse day-time interval literals to tightest types ### What changes were proposed in this pull request? This PR adds a feature which parses day-time interval literals to the tightest type. ### Why are the changes needed? To comply with the ANSI behavior. For example, `INTERVAL '10 20:30' DAY TO MINUTE` should be parsed as `DayTimeIntervalType(DAY, MINUTE)` but not as `DayTimeIntervalType(DAY, SECOND)`. ### Does this PR introduce _any_ user-facing change? No because `DayTimeIntervalType` will be introduced in `3.2.0`. ### How was this patch tested? New tests. Closes #32892 from sarutak/tight-daytime-interval. 
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 9 +++-- .../resources/sql-tests/results/ansi/interval.sql.out | 12 ++-- .../src/test/resources/sql-tests/results/interval.sql.out | 12 ++-- .../test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 15 +++ 4 files changed, 34 insertions(+), 14 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 9f1d665..4bbd9bd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -2357,9 +2357,14 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg Literal(calendarInterval.months, YearMonthIntervalType) } else { assert(calendarInterval.months == 0) +val strToFieldIndex = DayTimeIntervalType.dayTimeFields.map(i => + DayTimeIntervalType.fieldToString(i) -> i).toMap +val fromUnit = + ctx.errorCapturingUnitToUnitInterval.body.from.getText.toLowerCase(Locale.ROOT) val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS) -// TODO(SPARK-35737): Parse day-time interval literals to tightest types -Literal(micros, DayTimeIntervalType()) +val start = strToFieldIndex(fromUnit) +val end = strToFieldIndex(toUnit) +Literal(micros, DayTimeIntervalType(start, end)) } } else { Literal(calendarInterval, CalendarIntervalType) diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out b/sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out index 3205259..f2f5d5c 100644 --- a/sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out @@ -363,7 +363,7 @@ struct -- !query select interval '20 15' day to hour -- !query schema -struct +struct -- !query 
output 20 15:00:00.0 @@ -371,7 +371,7 @@ struct -- !query select interval '20 15:40' day to minute -- !query schema -struct +struct -- !query output 20 15:40:00.0 @@ -387,7 +387,7 @@ struct -- !query select interval '15:40' hour to minute -- !query schema -struct +struct -- !query output 0 15:40:00.0 @@ -395,7 +395,7 @@ struct -- !query select interval '15:40:32.9989' hour to second -- !query schema -struct +struct -- !query output 0 15:40:32.998999000 @@ -403,7 +403,7 @@ struct -- !query select interval '40:32.9989' minute to second -- !query schema -struct +struct -- !query output 0 00:40:32.998999000 @@ -411,7 +411,7 @@ struct -- !query select interval '40:32' minute to second -- !query schema -struct +struct -- !query output 0 00:40:32.0 diff --git a/sql/core/src/test/resources/sql-tests/results/interval.sql.out b/sql/core/src/test/resources/sql-tests/results/interval.sql.out index ef9ef8f..9b44960 100644 --- a/sql/core/src/test/resources/sql-tests/results/interval.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/interval.sql.out @@ -357,7 +357,7 @@ struct -- !query select interval '20 15' day to hour -- !query schema -struct +struct -- !query output 20 15:00:00.0 @@ -365,7 +365,7 @@ struct -- !query select interval '20 15:40' day to minute -- !query schema -struct +struct -- !query output 20 15:40:00.0 @@ -381,7 +381,7 @@ struct -- !query select interval '15:40' hour to minute -- !query schema -struct +struct -- !query output 0 15:40:00.000
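With tightest-type parsing, the literal's type fields come straight from the FROM/TO units rather than defaulting to DAY TO SECOND. A small sketch of the lookup used in the `AstBuilder` change above (ordinals mirror `DayTimeIntervalType`: DAY=0, HOUR=1, MINUTE=2, SECOND=3):

```scala
// Same mapping the parser builds via DayTimeIntervalType.dayTimeFields/fieldToString.
val strToFieldIndex = Map("day" -> 0, "hour" -> 1, "minute" -> 2, "second" -> 3)

// INTERVAL '20 15' DAY TO HOUR should now get fields (DAY, HOUR), not (DAY, SECOND).
def tightestFields(fromUnit: String, toUnit: String): (Int, Int) =
  (strToFieldIndex(fromUnit), strToFieldIndex(toUnit))
```

This is why the result schemas above change from a blanket day-to-second type to types such as day-to-hour and minute-to-second.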
[spark] branch master updated (0e554d4 -> 939ae91)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0e554d4 [SPARK-35770][SQL] Parse YearMonthIntervalType from JSON add 939ae91 [SPARK-35130][SQL] Add make_dt_interval function to construct DayTimeIntervalType value No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/intervalExpressions.scala | 77 ++ .../spark/sql/catalyst/util/IntervalUtils.scala| 13 .../expressions/IntervalExpressionsSuite.scala | 45 + .../sql-functions/sql-expression-schema.md | 5 +- .../test/resources/sql-tests/inputs/interval.sql | 8 +++ .../sql-tests/results/ansi/interval.sql.out| 49 +- .../resources/sql-tests/results/interval.sql.out | 50 +- 8 files changed, 244 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
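`make_dt_interval` builds a `DayTimeIntervalType` value, which Spark stores as a microsecond count. A simplified sketch of the arithmetic (an assumption-laden sketch: the real function also accepts fractional seconds as a decimal and reports overflow through ANSI error paths):

```scala
val MicrosPerSecond = 1000000L

// days -> hours -> minutes -> seconds, then to microseconds, with an overflow check.
def makeDayTimeIntervalMicros(days: Int, hours: Int, mins: Int, secs: Long): Long = {
  val totalSeconds = ((days * 24L + hours) * 60L + mins) * 60L + secs
  Math.multiplyExact(totalSeconds, MicrosPerSecond)
}
```

For instance, 1 day, 1 hour, 1 minute, and 1 second is 90061 seconds, i.e. 90061000000 microseconds.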
[spark] branch master updated (8aeed08 -> 0e554d4)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8aeed08 [SPARK-35757][CORE] Add bitwise AND operation and functionality for intersecting bloom filters add 0e554d4 [SPARK-35770][SQL] Parse YearMonthIntervalType from JSON No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/sql/types/DataType.scala | 7 +-- .../src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala | 3 +++ 2 files changed, 8 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35732][SQL] Parse DayTimeIntervalType from JSON
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 234163f [SPARK-35732][SQL] Parse DayTimeIntervalType from JSON 234163f is described below commit 234163fbe08c8ac51b2ce52094c38baaa4e06709 Author: Angerszh AuthorDate: Thu Jun 17 12:54:34 2021 +0300 [SPARK-35732][SQL] Parse DayTimeIntervalType from JSON ### What changes were proposed in this pull request? Support Parse DayTimeIntervalType from JSON ### Why are the changes needed? this will allow to store day-second intervals as table columns into Hive external catalog. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #32930 from AngersZh/SPARK-35732. Lead-authored-by: Angerszh Co-authored-by: AngersZh Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/types/DataType.scala| 13 +++-- .../scala/org/apache/spark/sql/types/DataTypeSuite.scala| 2 +- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala index 1c4ad88..d781b05 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala @@ -37,6 +37,7 @@ import org.apache.spark.sql.catalyst.util.StringUtils.StringConcat import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy.{ANSI, STRICT} +import org.apache.spark.sql.types.DayTimeIntervalType._ import org.apache.spark.sql.types.YearMonthIntervalType._ import org.apache.spark.util.Utils @@ -172,8 +173,16 @@ object DataType { private val otherTypes = { Seq(NullType, DateType, TimestampType, BinaryType, 
IntegerType, BooleanType, LongType, DoubleType, FloatType, ShortType, ByteType, StringType, CalendarIntervalType, - // TODO(SPARK-35732): Parse DayTimeIntervalType from JSON - DayTimeIntervalType(), + DayTimeIntervalType(DAY, DAY), + DayTimeIntervalType(DAY, HOUR), + DayTimeIntervalType(DAY, MINUTE), + DayTimeIntervalType(DAY, SECOND), + DayTimeIntervalType(HOUR, HOUR), + DayTimeIntervalType(HOUR, MINUTE), + DayTimeIntervalType(HOUR, SECOND), + DayTimeIntervalType(MINUTE, MINUTE), + DayTimeIntervalType(MINUTE, SECOND), + DayTimeIntervalType(SECOND, SECOND), YearMonthIntervalType(YEAR, YEAR), YearMonthIntervalType(MONTH, MONTH), YearMonthIntervalType(YEAR, MONTH), diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala index d620415..3761833 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala @@ -256,7 +256,7 @@ class DataTypeSuite extends SparkFunSuite { checkDataTypeFromJson(VarcharType(10)) checkDataTypeFromDDL(VarcharType(11)) - + dayTimeIntervalTypes.foreach(checkDataTypeFromJson) yearMonthIntervalTypes.foreach(checkDataTypeFromJson) yearMonthIntervalTypes.foreach(checkDataTypeFromDDL) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
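Registering one entry per field combination lets `DataType.fromJson` resolve type-name strings for every day-time variant. A sketch of the naming scheme (the exact strings, such as `interval day to second`, are an assumption modeled on the ANSI-style names used elsewhere in this series):

```scala
val dayTimeFieldNames = Vector("day", "hour", "minute", "second")

// Single-field types render as "interval day"; ranges as "interval day to second".
def dayTimeTypeName(start: Int, end: Int): String =
  if (start == end) s"interval ${dayTimeFieldNames(start)}"
  else s"interval ${dayTimeFieldNames(start)} to ${dayTimeFieldNames(end)}"

// Map every valid (start, end) pair to its name for JSON lookup, as otherTypes does above.
val nameToFields: Map[String, (Int, Int)] =
  (for (s <- 0 to 3; e <- s to 3) yield dayTimeTypeName(s, e) -> (s, e)).toMap
```

That yields exactly the ten entries added to `otherTypes` in the diff above.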
[spark] branch master updated: [SPARK-35704][SQL] Add fields to `DayTimeIntervalType`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d53831f [SPARK-35704][SQL] Add fields to `DayTimeIntervalType` d53831f is described below commit d53831ff5cef618cfbc08dcea84ca9bb61ed9903 Author: Max Gekk AuthorDate: Fri Jun 11 16:16:33 2021 +0300 [SPARK-35704][SQL] Add fields to `DayTimeIntervalType` ### What changes were proposed in this pull request? Extend DayTimeIntervalType to support interval fields. Valid interval field values: - 0 (DAY) - 1 (HOUR) - 2 (MINUTE) - 3 (SECOND) After the changes, the following day-time interval types are supported: 1. `DayTimeIntervalType(0, 0)` or `DayTimeIntervalType(DAY, DAY)` 2. `DayTimeIntervalType(0, 1)` or `DayTimeIntervalType(DAY, HOUR)` 3. `DayTimeIntervalType(0, 2)` or `DayTimeIntervalType(DAY, MINUTE)` 4. `DayTimeIntervalType(0, 3)` or `DayTimeIntervalType(DAY, SECOND)`. **It is the default one**. The second fraction precision is microseconds. 5. `DayTimeIntervalType(1, 1)` or `DayTimeIntervalType(HOUR, HOUR)` 6. `DayTimeIntervalType(1, 2)` or `DayTimeIntervalType(HOUR, MINUTE)` 7. `DayTimeIntervalType(1, 3)` or `DayTimeIntervalType(HOUR, SECOND)` 8. `DayTimeIntervalType(2, 2)` or `DayTimeIntervalType(MINUTE, MINUTE)` 9. `DayTimeIntervalType(2, 3)` or `DayTimeIntervalType(MINUTE, SECOND)` 10. `DayTimeIntervalType(3, 3)` or `DayTimeIntervalType(SECOND, SECOND)` ### Why are the changes needed? In the current implementation, Spark supports only `interval day to second` but the SQL standard allows to specify the start and end fields. The changes will allow to follow ANSI SQL standard more precisely. ### Does this PR introduce _any_ user-facing change? Yes but `DayTimeIntervalType` has not been released yet. ### How was this patch tested? By existing test suites. Closes #32849 from MaxGekk/day-time-interval-type-units. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/UnsafeRow.java | 9 ++-- .../java/org/apache/spark/sql/types/DataTypes.java | 19 ++-- .../sql/catalyst/CatalystTypeConverters.scala | 3 +- .../apache/spark/sql/catalyst/InternalRow.scala| 4 +- .../spark/sql/catalyst/JavaTypeInference.scala | 2 +- .../spark/sql/catalyst/ScalaReflection.scala | 6 +-- .../spark/sql/catalyst/SerializerBuildHelper.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 22 - .../apache/spark/sql/catalyst/dsl/package.scala| 7 ++- .../spark/sql/catalyst/encoders/RowEncoder.scala | 6 +-- .../spark/sql/catalyst/expressions/Cast.scala | 46 -- .../expressions/InterpretedUnsafeProjection.scala | 2 +- .../catalyst/expressions/SpecificInternalRow.scala | 3 +- .../catalyst/expressions/aggregate/Average.scala | 6 +-- .../sql/catalyst/expressions/aggregate/Sum.scala | 2 +- .../sql/catalyst/expressions/arithmetic.scala | 10 ++-- .../expressions/codegen/CodeGenerator.scala| 4 +- .../catalyst/expressions/codegen/javaCode.scala| 2 +- .../expressions/collectionOperations.scala | 8 ++-- .../catalyst/expressions/datetimeExpressions.scala | 14 +++--- .../spark/sql/catalyst/expressions/hash.scala | 2 +- .../catalyst/expressions/intervalExpressions.scala | 12 ++--- .../spark/sql/catalyst/expressions/literals.scala | 16 --- .../catalyst/expressions/windowExpressions.scala | 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++- .../spark/sql/catalyst/util/IntervalUtils.scala| 13 +- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 5 +- .../spark/sql/errors/QueryCompilationErrors.scala | 11 + .../org/apache/spark/sql/types/DataType.scala | 3 +- .../spark/sql/types/DayTimeIntervalType.scala | 54 ++ .../org/apache/spark/sql/util/ArrowUtils.scala | 4 +- .../org/apache/spark/sql/RandomDataGenerator.scala | 2 +- .../spark/sql/RandomDataGeneratorSuite.scala | 3 +- .../sql/catalyst/CatalystTypeConvertersSuite.scala | 3 +- .../sql/catalyst/encoders/RowEncoderSuite.scala| 
17 --- .../expressions/ArithmeticExpressionSuite.scala| 8 ++-- .../spark/sql/catalyst/expressions/CastSuite.scala | 39 +--- .../expressions/DateExpressionsSuite.scala | 12 +++-- .../expressions/HashExpressionsSuite.scala | 2 +- .../expressions/IntervalExpressionsSuite.scala | 32 - .../expressions/LiteralExpressionSuite.scala | 4 +- .../catalyst/expressions/LiteralGenerator.scala| 4 +- .../expressions/MutableProjectionSuite.scala | 9 ++-- .../sql/catalyst/parser/DataTypeParserSuite.scala | 2 +- .../sql/catalyst/util
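The commit message above enumerates ten valid `DayTimeIntervalType(start, end)` combinations out of the sixteen possible field pairs. As an illustration only (a Python sketch, not Spark's Scala implementation), the validity rule and resulting SQL type names can be modeled like this:

```python
# Sketch of the day-time interval fields added by SPARK-35704:
# field codes 0..3 stand for DAY, HOUR, MINUTE, SECOND, and a type is
# valid exactly when start <= end. Names below are illustrative, not Spark API.
DAY, HOUR, MINUTE, SECOND = 0, 1, 2, 3
FIELD_NAMES = {DAY: "DAY", HOUR: "HOUR", MINUTE: "MINUTE", SECOND: "SECOND"}

def day_time_interval_type_name(start: int, end: int) -> str:
    """Render the SQL name of a hypothetical DayTimeIntervalType(start, end)."""
    if start not in FIELD_NAMES or end not in FIELD_NAMES or start > end:
        raise ValueError(f"invalid day-time interval fields: {start}, {end}")
    if start == end:
        return f"interval {FIELD_NAMES[start].lower()}"
    return f"interval {FIELD_NAMES[start].lower()} to {FIELD_NAMES[end].lower()}"

# All valid (start, end) combinations -- the ten listed in the commit message:
valid = [(s, e) for s in range(4) for e in range(4) if s <= e]
```

Under this rule, the default `DayTimeIntervalType(0, 3)` renders as `interval day to second`.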
[spark] branch master updated (38eb5a6 -> 2c8ced9)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 38eb5a6 [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin add 2c8ced9 [SPARK-35111][SPARK-35112][SQL][FOLLOWUP] Rename ANSI interval patterns and regexps No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/IntervalUtils.scala| 31 +++--- 1 file changed, 16 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d808956 -> 7182f8c)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d808956 [MINOR][INFRA] Add python/.idea into git ignore add 7182f8c [SPARK-35360][SQL] RepairTableCommand respects `spark.sql.addPartitionInBatch.size` too No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 5 +++-- .../src/main/scala/org/apache/spark/sql/execution/command/ddl.scala | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (af1dba7 -> e3c6907)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from af1dba7 [SPARK-35440][SQL] Add function type to `ExpressionInfo` for UDF add e3c6907 [SPARK-35490][BUILD] Update json4s to 3.7.0-M11 No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/deploy/FaultToleranceTest.scala | 1 - .../org/apache/spark/deploy/history/HistoryServerSuite.scala | 1 - dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 8 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 8 pom.xml | 2 +- .../sql/streaming/StreamingQueryStatusAndProgressSuite.scala | 1 - 6 files changed, 9 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35581][SQL] Support special datetime values in typed literals only
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a59063d [SPARK-35581][SQL] Support special datetime values in typed literals only a59063d is described below commit a59063d5446a78d713c19a0d43f86c9f72ffd77d Author: Max Gekk AuthorDate: Tue Jun 1 15:29:05 2021 +0300 [SPARK-35581][SQL] Support special datetime values in typed literals only ### What changes were proposed in this pull request? In the PR, I propose to support the special datetime values introduced by #25708 and by #25716 only in typed literals, and to stop recognizing them when parsing strings to dates/timestamps. The following string values are supported only in typed timestamp literals: - `epoch [zoneId]` - `1970-01-01 00:00:00+00 (Unix system time zero)` - `today [zoneId]` - midnight today. - `yesterday [zoneId]` - midnight yesterday - `tomorrow [zoneId]` - midnight tomorrow - `now` - current query start time. For example: ```sql spark-sql> SELECT timestamp 'tomorrow'; 2019-09-07 00:00:00 ``` Similarly, the following special date values are supported only in typed date literals: - `epoch [zoneId]` - `1970-01-01` - `today [zoneId]` - the current date in the time zone specified by `spark.sql.session.timeZone`. - `yesterday [zoneId]` - the current date -1 - `tomorrow [zoneId]` - the current date + 1 - `now` - the date of running the current query. It has the same notion as `today`. For example: ```sql spark-sql> SELECT date 'tomorrow' - date 'yesterday'; 2 ``` ### Why are the changes needed? In the current implementation, Spark supports the special date/timestamp values in any input string cast to dates/timestamps, which leads to the following problems: - If executors have different system time, the result is inconsistent, and random. Column values depend on where the conversions were performed. 
- The special values play the role of distributed non-deterministic functions though users might think of the values as constants. ### Does this PR introduce _any_ user-facing change? Yes but the probability should be small. ### How was this patch tested? By running existing test suites: ``` $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z interval.sql" $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql" $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp.sql" $ build/sbt "test:testOnly *DateTimeUtilsSuite" ``` Closes #32714 from MaxGekk/remove-datetime-special-values. Lead-authored-by: Max Gekk Co-authored-by: Maxim Gekk Signed-off-by: Max Gekk --- docs/sql-migration-guide.md| 2 ++ .../src/main/scala/org/apache/spark/sql/Row.scala | 2 +- .../spark/sql/catalyst/catalog/interface.scala | 4 +-- .../sql/catalyst/csv/UnivocityGenerator.scala | 1 - .../spark/sql/catalyst/csv/UnivocityParser.scala | 3 +- .../spark/sql/catalyst/expressions/Cast.scala | 20 + .../spark/sql/catalyst/expressions/literals.scala | 2 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 1 - .../spark/sql/catalyst/json/JacksonParser.scala| 3 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 9 -- .../spark/sql/catalyst/util/DateFormatter.scala| 34 - .../spark/sql/catalyst/util/DateTimeUtils.scala| 35 +- .../sql/catalyst/util/TimestampFormatter.scala | 23 +++--- .../expressions/HashExpressionsSuite.scala | 2 +- .../sql/catalyst/util/DateFormatterSuite.scala | 34 ++--- .../sql/catalyst/util/DateTimeUtilsSuite.scala | 32 ++-- .../sql/catalyst/util/DatetimeFormatterSuite.scala | 2 +- .../catalyst/util/TimestampFormatterSuite.scala| 24 +-- .../spark/sql/catalyst/util/UnsafeArraySuite.scala | 6 ++-- .../execution/BaseScriptTransformationExec.scala | 11 +++ .../apache/spark/sql/execution/HiveResult.scala| 12 ++-- .../execution/datasources/PartitioningUtils.scala | 2 +- .../execution/datasources/jdbc/JDBCRelation.scala | 5 
++-- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 4 +-- .../sql-tests/inputs/postgreSQL/timestamp.sql | 16 +- .../sql-tests/results/postgreSQL/timestamp.sql.out | 16 +- .../org/apache/spark/sql/CsvFunctionsSuite.scala | 19 .../org/apache/spark/sql/JsonFunctionsSuite.scala | 19 .../parquet/ParquetPartitionDiscoverySuite.scala | 2 +- .../spark/sql/hive/HiveExternalCatalog.scala | 8 ++--- .../apache/spark/sql/hive/c
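The special date values listed in the commit message above all resolve relative to the query's start date. A minimal Python sketch of that resolution rule (a hypothetical illustration that ignores the optional `[zoneId]` suffix; the real logic lives in Spark's `DateTimeUtils`/`DateFormatter`):

```python
from datetime import date, timedelta

def resolve_special_date(value: str, today: date):
    """Resolve a special date string against a given 'today' (query start date).

    Returns None for non-special strings, which a cast would then try to
    parse with the ordinary date formats."""
    v = value.strip().lower()
    if v == "epoch":
        return date(1970, 1, 1)
    if v in ("today", "now"):          # 'now' has the same notion as 'today'
        return today
    if v == "yesterday":
        return today - timedelta(days=1)
    if v == "tomorrow":
        return today + timedelta(days=1)
    return None
```

This also makes the determinism concern concrete: the result depends entirely on which `today` the resolving host observes, which is why the values are now confined to typed literals resolved at query start.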
[spark] branch branch-3.2 updated: [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 26bcf02 [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals 26bcf02 is described below commit 26bcf028339c02ca75af31ab8105f7dbe58c49a9 Author: Kousuke Saruta AuthorDate: Mon Jul 5 10:35:50 2021 +0300 [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals ### What changes were proposed in this pull request? This PR fixes two issues. One is that `to_json` doesn't support `map` types where value types are `year-month` interval types like: ``` spark-sql> select to_json(map('a', interval '1-2' year to month)); 21/07/02 11:38:15 ERROR SparkSQLDriver: Failed in [select to_json(map('a', interval '1-2' year to month))] java.lang.RuntimeException: Failed to convert value 14 (class of class java.lang.Integer) with the type of YearMonthIntervalType(0,1) to JSON. ``` The other issue is that even if the issue of `to_json` is resolved, `from_json` doesn't support converting a `year-month` interval string back from JSON. So the result of the following query will be `null`. ``` spark-sql> select from_json(to_json(map('a', interval '1-2' year to month)), 'a interval year to month'); {"a":null} ``` ### Why are the changes needed? There should be no reason why year-month intervals cannot be used as map value types. `CalendarIntervalType` can do it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. Closes #33181 from sarutak/map-json-yminterval. 
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk (cherry picked from commit 647422685292cd1a46766afa9b07b6fcfc181bbd) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JacksonGenerator.scala | 9 .../spark/sql/catalyst/json/JacksonParser.scala| 7 ++ .../org/apache/spark/sql/JsonFunctionsSuite.scala | 26 ++ 3 files changed, 42 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala index 2567438..9777d56 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala @@ -147,6 +147,15 @@ private[sql] class JacksonGenerator( (row: SpecializedGetters, ordinal: Int) => gen.writeString(row.getInterval(ordinal).toString) +case YearMonthIntervalType(start, end) => + (row: SpecializedGetters, ordinal: Int) => +val ymString = IntervalUtils.toYearMonthIntervalString( + row.getInt(ordinal), + IntervalStringStyles.ANSI_STYLE, + start, + end) +gen.writeString(ymString) + case BinaryType => (row: SpecializedGetters, ordinal: Int) => gen.writeBinary(row.getBinary(ordinal)) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index 27e1411..2aa735d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -295,6 +295,13 @@ class JacksonParser( IntervalUtils.safeStringToInterval(UTF8String.fromString(parser.getText)) } +case ym: YearMonthIntervalType => (parser: JsonParser) => + parseJsonToken[Integer](parser, dataType) { +case VALUE_STRING => + val expr = Cast(Literal(parser.getText), ym) + Integer.valueOf(expr.eval(EmptyRow).asInstanceOf[Int]) + } + 
case st: StructType => val fieldConverters = st.map(_.dataType).map(makeConverter).toArray (parser: JsonParser) => parseJsonToken[InternalRow](parser, dataType) { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala index 5485cc1..c2bea8c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql import java.text.SimpleDateFormat +import java.time.Period import java.util.Locale import collection.JavaConverters._ @@ -27,6 +28,7 @@ import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.ap
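The `14` in the error message above is the interval's internal representation: a `YearMonthIntervalType` value is stored as a number of months, and `IntervalUtils.toYearMonthIntervalString` renders it in the ANSI style used by the fix. A rough Python sketch of that formatting for the full `YEAR TO MONTH` case (an illustration under that assumption, not Spark's code, which also handles narrower start/end fields and a Hive style):

```python
def to_year_month_interval_string(months: int) -> str:
    """Render a month count as an ANSI YEAR TO MONTH interval string."""
    sign = "-" if months < 0 else ""
    m = abs(months)
    # 14 months -> 1 year, 2 months -> INTERVAL '1-2' YEAR TO MONTH
    return f"INTERVAL '{sign}{m // 12}-{m % 12}' YEAR TO MONTH"
```

This is exactly the string the patched `JacksonGenerator` writes instead of failing on the raw integer.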
[spark] branch master updated: [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6474226 [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals 6474226 is described below commit 647422685292cd1a46766afa9b07b6fcfc181bbd Author: Kousuke Saruta AuthorDate: Mon Jul 5 10:35:50 2021 +0300 [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals ### What changes were proposed in this pull request? This PR fixes two issues. One is that `to_json` doesn't support `map` types where value types are `year-month` interval types like: ``` spark-sql> select to_json(map('a', interval '1-2' year to month)); 21/07/02 11:38:15 ERROR SparkSQLDriver: Failed in [select to_json(map('a', interval '1-2' year to month))] java.lang.RuntimeException: Failed to convert value 14 (class of class java.lang.Integer) with the type of YearMonthIntervalType(0,1) to JSON. ``` The other issue is that even if the issue of `to_json` is resolved, `from_json` doesn't support converting a `year-month` interval string back from JSON. So the result of the following query will be `null`. ``` spark-sql> select from_json(to_json(map('a', interval '1-2' year to month)), 'a interval year to month'); {"a":null} ``` ### Why are the changes needed? There should be no reason why year-month intervals cannot be used as map value types. `CalendarIntervalType` can do it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. Closes #33181 from sarutak/map-json-yminterval. 
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JacksonGenerator.scala | 9 .../spark/sql/catalyst/json/JacksonParser.scala| 7 ++ .../org/apache/spark/sql/JsonFunctionsSuite.scala | 26 ++ 3 files changed, 42 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala index 2567438..9777d56 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala @@ -147,6 +147,15 @@ private[sql] class JacksonGenerator( (row: SpecializedGetters, ordinal: Int) => gen.writeString(row.getInterval(ordinal).toString) +case YearMonthIntervalType(start, end) => + (row: SpecializedGetters, ordinal: Int) => +val ymString = IntervalUtils.toYearMonthIntervalString( + row.getInt(ordinal), + IntervalStringStyles.ANSI_STYLE, + start, + end) +gen.writeString(ymString) + case BinaryType => (row: SpecializedGetters, ordinal: Int) => gen.writeBinary(row.getBinary(ordinal)) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index 27e1411..2aa735d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -295,6 +295,13 @@ class JacksonParser( IntervalUtils.safeStringToInterval(UTF8String.fromString(parser.getText)) } +case ym: YearMonthIntervalType => (parser: JsonParser) => + parseJsonToken[Integer](parser, dataType) { +case VALUE_STRING => + val expr = Cast(Literal(parser.getText), ym) + Integer.valueOf(expr.eval(EmptyRow).asInstanceOf[Int]) + } + case st: StructType => val fieldConverters = st.map(_.dataType).map(makeConverter).toArray 
(parser: JsonParser) => parseJsonToken[InternalRow](parser, dataType) { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala index 5485cc1..c2bea8c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql import java.text.SimpleDateFormat +import java.time.Period import java.util.Locale import collection.JavaConverters._ @@ -27,6 +28,7 @@ import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.YearMonthIntervalType.{MO
[spark] branch master updated (2fe6c94 -> f4237af)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2fe6c94 [SPARK-33996][BUILD][FOLLOW-UP] Match SBT's plugin checkstyle version to Maven's add f4237af [SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly No new revisions were added by this update. Summary of changes: .../sql/catalyst/csv/UnivocityGenerator.scala | 7 +- .../spark/sql/catalyst/csv/UnivocityParser.scala | 7 +- .../org/apache/spark/sql/CsvFunctionsSuite.scala | 29 ++ 3 files changed, 41 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 544b7e1 [SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly 544b7e1 is described below commit 544b7e16acf51b7c2a9555fb4ebe7b19a00e Author: Kousuke Saruta AuthorDate: Mon Jul 5 13:10:50 2021 +0300 [SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly ### What changes were proposed in this pull request? This PR fixes an issue where `from_csv`/`to_csv` don't handle year-month intervals properly. `from_csv` throws an exception if year-month interval types are given. ``` spark-sql> select from_csv("interval '1-2' year to month", "a interval year to month"); 21/07/03 04:32:24 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1-2' year to month", "a interval year to month")] java.lang.Exception: Unsupported type: interval year to month at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775) at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224) at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134) ``` Also, `to_csv` doesn't handle year-month interval types properly even though no exception is thrown. The result of `to_csv` for year-month interval types is not in the ANSI-compliant interval form. ``` spark-sql> select to_csv(named_struct("a", interval '1-2' year to month)); 14 ``` The result above should be `INTERVAL '1-2' YEAR TO MONTH`. ### Why are the changes needed? Bug fix. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. Closes #33210 from sarutak/csv-yminterval. 
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk (cherry picked from commit f4237aff7ebece0b8d61e680ecbe759850f25af8) Signed-off-by: Max Gekk --- .../sql/catalyst/csv/UnivocityGenerator.scala | 7 +- .../spark/sql/catalyst/csv/UnivocityParser.scala | 7 +- .../org/apache/spark/sql/CsvFunctionsSuite.scala | 29 ++ 3 files changed, 41 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala index 11b31ce..5d70ccb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala @@ -22,7 +22,7 @@ import java.io.Writer import com.univocity.parsers.csv.CsvWriter import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.util.{DateFormatter, TimestampFormatter} +import org.apache.spark.sql.catalyst.util.{DateFormatter, IntervalStringStyles, IntervalUtils, TimestampFormatter} import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.types._ @@ -61,6 +61,11 @@ class UnivocityGenerator( case TimestampType => (row: InternalRow, ordinal: Int) => timestampFormatter.format(row.getLong(ordinal)) +case YearMonthIntervalType(start, end) => + (row: InternalRow, ordinal: Int) => +IntervalUtils.toYearMonthIntervalString( + row.getInt(ordinal), IntervalStringStyles.ANSI_STYLE, start, end) + case udt: UserDefinedType[_] => makeConverter(udt.sqlType) case dt: DataType => diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala index 672d133..3ec1ea0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala @@ -26,7 +26,7 @@ import com.univocity.parsers.csv.CsvParser import org.apache.spark.SparkUpgradeException import org.apache.spark.internal.Logging import org.apache.spark.sql.catalyst.{InternalRow, NoopFilters, OrderedFilters} -import org.apache.spark.sql.catalyst.expressions.{ExprUtils, GenericInternalRow} +import org.apache.spark.sql.catalyst.expressions.{Cast, EmptyRow, ExprUtils, GenericInternalRow, Literal} import org.apache.spark.sql.catalyst.util._ import org.apache.spark.sql.catalyst.util.LegacyDateFormats.FAST_DATE_FORMAT import org.apache.spark.sql.errors.QueryExecutionErrors @@ -217,6 +217,11 @@ class UnivocityParser( IntervalUtils.
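For the parsing direction, the diff above shows Spark reusing `Cast(Literal(parser.getText), ...)` to turn an ANSI year-month string back into its month count. A hedged Python approximation of what such a parse accepts, for both the full `INTERVAL '1-2' YEAR TO MONTH` form and a bare `1-2` (the regex and its leniency are assumptions for illustration, not Spark's cast rules):

```python
import re

def parse_year_month_interval(text: str) -> int:
    """Parse an ANSI year-month interval string into a month count."""
    m = re.fullmatch(
        r"(?i)\s*(?:interval\s+')?([+-]?)(\d+)-(\d+)'?(?:\s+year\s+to\s+month)?\s*",
        text)
    if m is None:
        raise ValueError(f"cannot parse year-month interval: {text!r}")
    sign = -1 if m.group(1) == "-" else 1
    return sign * (int(m.group(2)) * 12 + int(m.group(3)))
```

Round-tripping through such a parser is what makes `from_csv(to_csv(...))` on a year-month interval return the original value rather than `null`.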
[spark] branch branch-3.2 updated: [SPARK-36021][SQL] Parse interval literals should support more than 2 digits
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 74bfbcd [SPARK-36021][SQL] Parse interval literals should support more than 2 digits 74bfbcd is described below commit 74bfbcd6430b1a5d274c660224851ce1562813e4 Author: Angerszh AuthorDate: Wed Jul 7 20:31:29 2021 +0300 [SPARK-36021][SQL] Parse interval literals should support more than 2 digits ### What changes were proposed in this pull request? For the case ``` spark-sql> select interval '123456:12' minute to second; Error in query: requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$': 123456:12, set spark.sql.legacy.fromDayTimeString.enabled to true to restore the behavior before Spark 3.0.(line 1, pos 16) == SQL == select interval '123456:12' minute to second ^^^ ``` we should support more than 2 digits for the hour/minute/second fields when parsing interval literal strings. ### Why are the changes needed? Keep consistency. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #33231 from AngersZh/SPARK-36021. 
Authored-by: Angerszh Signed-off-by: Max Gekk (cherry picked from commit eaa200e586043e29e2f3566b95b6943b811f) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 144 +-- .../sql/catalyst/util/IntervalUtilsSuite.scala | 25 +-- .../test/resources/sql-tests/inputs/interval.sql | 21 +++ .../sql-tests/results/ansi/interval.sql.out| 198 - .../resources/sql-tests/results/interval.sql.out | 198 - .../sql-tests/results/postgreSQL/interval.sql.out | 20 +-- 6 files changed, 467 insertions(+), 139 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index ad87f2a..24bcad8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -57,6 +57,12 @@ object IntervalUtils { } import IntervalUnit._ + private val MAX_DAY = Long.MaxValue / MICROS_PER_DAY + private val MAX_HOUR = Long.MaxValue / MICROS_PER_HOUR + private val MAX_MINUTE = Long.MaxValue / MICROS_PER_MINUTE + private val MAX_SECOND = Long.MaxValue / MICROS_PER_SECOND + private val MIN_SECOND = Long.MinValue / MICROS_PER_SECOND + def getYears(months: Int): Int = months / MONTHS_PER_YEAR def getYears(interval: CalendarInterval): Int = getYears(interval.months) @@ -213,19 +219,25 @@ object IntervalUtils { } /** - * Parse YearMonth string in form: [+|-]-MM + * Parse year-month interval in form: [+|-]-MM * * adapted from HiveIntervalYearMonth.valueOf */ def fromYearMonthString(input: String): CalendarInterval = { +fromYearMonthString(input, YM.YEAR, YM.MONTH) + } + + /** + * Parse year-month interval in form: [+|-]-MM + * + * adapted from HiveIntervalYearMonth.valueOf + * Below interval conversion patterns are supported: + * - YEAR TO (YEAR|MONTH) + */ + def fromYearMonthString(input: String, startField: Byte, endField: Byte): 
CalendarInterval = { require(input != null, "Interval year-month string must be not null") -input.trim match { - case yearMonthRegex(sign, yearStr, monthStr) => -new CalendarInterval(toYMInterval(yearStr, monthStr, finalSign(sign)), 0, 0) - case _ => -throw new IllegalArgumentException( - s"Interval string does not match year-month format of 'y-m': $input") -} +val months = castStringToYMInterval(UTF8String.fromString(input), startField, endField) +new CalendarInterval(months, 0, 0) } private def safeToInterval[T](interval: String)(f: => T): T = { @@ -397,7 +409,7 @@ object IntervalUtils { secondStr: String, sign: Int): Long = { var micros = 0L -val days = toLongWithRange(DAY, dayStr, 0, Int.MaxValue).toInt +val days = toLongWithRange(DAY, dayStr, 0, MAX_DAY).toInt micros = Math.addExact(micros, sign * days * MICROS_PER_DAY) val hours = toLongWithRange(HOUR, hourStr, 0, 23) micros = Math.addExact(micros, sign * hours * MICROS_PER_HOUR) @@ -413,7 +425,7 @@ object IntervalUtils { secondStr: String, sign: Int): Long = { var micros = 0L -val hours = toLongWithRange(HOUR, hourStr, 0, 2562047788L) +val hours = toLongWithRange(HOUR, hourStr, 0, MAX_HOUR) micros = Math.addExact(micros, sign * h
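The new `MAX_DAY`/`MAX_HOUR`/`MAX_MINUTE` bounds in the diff above are simply `Long.MaxValue` divided by the number of microseconds in the corresponding unit, replacing hard-coded literals such as the old `2562047788L` for hours. A quick Python check of those Scala constants:

```python
# Reproduce the Scala bounds: Long.MaxValue integer-divided by micros per unit.
LONG_MAX = (1 << 63) - 1            # java Long.MaxValue = 9223372036854775807
MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE
MICROS_PER_DAY = 24 * MICROS_PER_HOUR

MAX_DAY = LONG_MAX // MICROS_PER_DAY       # largest day count fitting in micros
MAX_HOUR = LONG_MAX // MICROS_PER_HOUR     # matches the old 2562047788L literal
MAX_MINUTE = LONG_MAX // MICROS_PER_MINUTE
MAX_SECOND = LONG_MAX // MICROS_PER_SECOND
```

These are the largest field values whose microsecond equivalents still fit in a signed 64-bit long, which is why `toLongWithRange` now uses them as upper bounds.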
[spark] branch master updated (62ff2ad -> ea3333a)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 62ff2ad [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition add ea3333a [SPARK-36021][SQL] Parse interval literals should support more than 2 digits No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/IntervalUtils.scala| 144 +-- .../sql/catalyst/util/IntervalUtilsSuite.scala | 25 +-- .../test/resources/sql-tests/inputs/interval.sql | 21 +++ .../sql-tests/results/ansi/interval.sql.out| 198 - .../resources/sql-tests/results/interval.sql.out | 198 - .../sql-tests/results/postgreSQL/interval.sql.out | 20 +-- 6 files changed, 467 insertions(+), 139 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 62ff2ad [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition 62ff2ad is described below commit 62ff2add9444fbd54802548b3daf7cde5820feef Author: gengjiaan AuthorDate: Wed Jul 7 20:27:05 2021 +0300 [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition ### What changes were proposed in this pull request? The method `WindowSpecDefinition.isValidFrameType` doesn't consider `TimestampNTZType`. We should support it just as we do `TimestampType`. ### Why are the changes needed? Support `TimestampNTZType` in the Window spec definition. ### Does this PR introduce _any_ user-facing change? 'Yes'. This PR allows users to use `TimestampNTZType` as the sort spec in a window spec definition. ### How was this patch tested? New tests. Closes #33246 from beliefer/SPARK-36015. 
Authored-by: gengjiaan Signed-off-by: Max Gekk --- .../catalyst/expressions/windowExpressions.scala | 6 +-- .../sql/execution/window/WindowExecBase.scala | 10 ++-- .../src/test/resources/sql-tests/inputs/window.sql | 9 .../resources/sql-tests/results/window.sql.out | 56 +- 4 files changed, 73 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala index 2555c6a..fc2e449 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala @@ -102,9 +102,9 @@ case class WindowSpecDefinition( private def isValidFrameType(ft: DataType): Boolean = (orderSpec.head.dataType, ft) match { case (DateType, IntegerType) => true case (DateType, _: YearMonthIntervalType) => true -case (TimestampType, CalendarIntervalType) => true -case (TimestampType, _: YearMonthIntervalType) => true -case (TimestampType, _: DayTimeIntervalType) => true +case (TimestampType | TimestampNTZType, CalendarIntervalType) => true +case (TimestampType | TimestampNTZType, _: YearMonthIntervalType) => true +case (TimestampType | TimestampNTZType, _: DayTimeIntervalType) => true case (a, b) => a == b } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala index 2aa0b02..f3b3b34 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala @@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import 
org.apache.spark.sql.execution.UnaryExecNode -import org.apache.spark.sql.types.{CalendarIntervalType, DateType, DayTimeIntervalType, IntegerType, TimestampType, YearMonthIntervalType} +import org.apache.spark.sql.types._ trait WindowExecBase extends UnaryExecNode { def windowExpression: Seq[NamedExpression] @@ -96,10 +96,12 @@ trait WindowExecBase extends UnaryExecNode { val boundExpr = (expr.dataType, boundOffset.dataType) match { case (DateType, IntegerType) => DateAdd(expr, boundOffset) case (DateType, _: YearMonthIntervalType) => DateAddYMInterval(expr, boundOffset) - case (TimestampType, CalendarIntervalType) => TimeAdd(expr, boundOffset, Some(timeZone)) - case (TimestampType, _: YearMonthIntervalType) => + case (TimestampType | TimestampNTZType, CalendarIntervalType) => +TimeAdd(expr, boundOffset, Some(timeZone)) + case (TimestampType | TimestampNTZType, _: YearMonthIntervalType) => TimestampAddYMInterval(expr, boundOffset, Some(timeZone)) - case (TimestampType, _: DayTimeIntervalType) => TimeAdd(expr, boundOffset, Some(timeZone)) + case (TimestampType | TimestampNTZType, _: DayTimeIntervalType) => +TimeAdd(expr, boundOffset, Some(timeZone)) case (a, b) if a == b => Add(expr, boundOffset) } val bound = MutableProjection.create(boundExpr :: Nil, child.output) diff --git a/sql/core/src/test/resources/sql-tests/inputs/window.sql b/sql/core/src/test/resources/sql-tests/inputs/window.sql index 46d3629..9766aaf 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/window.sql +++ b/sql/core/src/test/resources/sql-tes
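The `isValidFrameType` change in the diff above simply lets `TimestampNTZType` accept the same frame-boundary types as `TimestampType`, with everything else falling through to exact type equality. A Python sketch of the resulting rule (type names are illustrative strings, not Spark's `DataType` objects):

```python
# Mirror of the Scala pattern match in WindowSpecDefinition.isValidFrameType:
# date order columns accept integer or year-month interval boundaries;
# timestamp and timestamp_ntz accept calendar/year-month/day-time intervals;
# anything else is valid only when order and frame types are equal.
def is_valid_frame_type(order_type: str, frame_type: str) -> bool:
    if order_type == "date" and frame_type in ("integer", "year-month interval"):
        return True
    if order_type in ("timestamp", "timestamp_ntz") and frame_type in (
            "calendar interval", "year-month interval", "day-time interval"):
        return True
    return order_type == frame_type   # the (a, b) => a == b fallthrough
```

Note the fallthrough keeps degenerate cases like a date-ordered frame with date boundaries valid, matching the Scala `case (a, b) => a == b`.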
[spark] branch branch-3.2 updated: [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 2fc57bba [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition 2fc57bba is described below commit 2fc57bba31a9bea3b97582751a693acd268158e9 Author: gengjiaan AuthorDate: Wed Jul 7 20:27:05 2021 +0300 [SPARK-36015][SQL] Support TimestampNTZType in the Window spec definition ### What changes were proposed in this pull request? The method `WindowSpecDefinition.isValidFrameType` doesn't consider `TimestampNTZType`. We should support it in the same way as `TimestampType`. ### Why are the changes needed? Support `TimestampNTZType` in the Window spec definition. ### Does this PR introduce _any_ user-facing change? 'Yes'. This PR allows users to use `TimestampNTZType` as the sort spec in a window spec definition. ### How was this patch tested? New tests. Closes #33246 from beliefer/SPARK-36015.
Authored-by: gengjiaan Signed-off-by: Max Gekk (cherry picked from commit 62ff2add9444fbd54802548b3daf7cde5820feef) Signed-off-by: Max Gekk --- .../catalyst/expressions/windowExpressions.scala | 6 +-- .../sql/execution/window/WindowExecBase.scala | 10 ++-- .../src/test/resources/sql-tests/inputs/window.sql | 9 .../resources/sql-tests/results/window.sql.out | 56 +- 4 files changed, 73 insertions(+), 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala index 2555c6a..fc2e449 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala @@ -102,9 +102,9 @@ case class WindowSpecDefinition( private def isValidFrameType(ft: DataType): Boolean = (orderSpec.head.dataType, ft) match { case (DateType, IntegerType) => true case (DateType, _: YearMonthIntervalType) => true -case (TimestampType, CalendarIntervalType) => true -case (TimestampType, _: YearMonthIntervalType) => true -case (TimestampType, _: DayTimeIntervalType) => true +case (TimestampType | TimestampNTZType, CalendarIntervalType) => true +case (TimestampType | TimestampNTZType, _: YearMonthIntervalType) => true +case (TimestampType | TimestampNTZType, _: DayTimeIntervalType) => true case (a, b) => a == b } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala index 2aa0b02..f3b3b34 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala @@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions._ 
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression import org.apache.spark.sql.execution.UnaryExecNode -import org.apache.spark.sql.types.{CalendarIntervalType, DateType, DayTimeIntervalType, IntegerType, TimestampType, YearMonthIntervalType} +import org.apache.spark.sql.types._ trait WindowExecBase extends UnaryExecNode { def windowExpression: Seq[NamedExpression] @@ -96,10 +96,12 @@ trait WindowExecBase extends UnaryExecNode { val boundExpr = (expr.dataType, boundOffset.dataType) match { case (DateType, IntegerType) => DateAdd(expr, boundOffset) case (DateType, _: YearMonthIntervalType) => DateAddYMInterval(expr, boundOffset) - case (TimestampType, CalendarIntervalType) => TimeAdd(expr, boundOffset, Some(timeZone)) - case (TimestampType, _: YearMonthIntervalType) => + case (TimestampType | TimestampNTZType, CalendarIntervalType) => +TimeAdd(expr, boundOffset, Some(timeZone)) + case (TimestampType | TimestampNTZType, _: YearMonthIntervalType) => TimestampAddYMInterval(expr, boundOffset, Some(timeZone)) - case (TimestampType, _: DayTimeIntervalType) => TimeAdd(expr, boundOffset, Some(timeZone)) + case (TimestampType | TimestampNTZType, _: DayTimeIntervalType) => +TimeAdd(expr, boundOffset, Some(timeZone)) case (a, b) if a == b => Add(expr, boundOffset) } val bound = MutableProjection.create(boundExpr :: Nil, child.output) diff --git a/sql/core/src/test/resources/sql-tests/inputs/window.sql b/sql/core/src/test/resources/sql-tests/inputs/window.sql index 46d3629..9766aaf 100
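The change to `isValidFrameType` in the diffs above can be sketched in a minimal, self-contained way. This is an illustrative Java model (hypothetical enum and method names, not the real Spark classes): `TIMESTAMP` and `TIMESTAMP_NTZ` now share one branch, so a `TimestampNTZType` sort column accepts exactly the same frame-boundary types as `TimestampType`.

```java
// Minimal sketch of the rule changed in WindowSpecDefinition.isValidFrameType.
public class FrameTypeCheck {
    enum DataType { DATE, INTEGER, TIMESTAMP, TIMESTAMP_NTZ,
                    CALENDAR_INTERVAL, YEAR_MONTH_INTERVAL, DAY_TIME_INTERVAL }

    static boolean isValidFrameType(DataType order, DataType ft) {
        if (order == DataType.DATE && ft == DataType.INTEGER) return true;
        if (order == DataType.DATE && ft == DataType.YEAR_MONTH_INTERVAL) return true;
        // The change: TIMESTAMP and TIMESTAMP_NTZ are handled by the same branch.
        boolean ts = order == DataType.TIMESTAMP || order == DataType.TIMESTAMP_NTZ;
        if (ts && (ft == DataType.CALENDAR_INTERVAL
                || ft == DataType.YEAR_MONTH_INTERVAL
                || ft == DataType.DAY_TIME_INTERVAL)) return true;
        return order == ft;  // fallback: frame type must equal the sort column type
    }

    public static void main(String[] args) {
        // A TIMESTAMP_NTZ ORDER BY with a day-time interval frame is now valid.
        System.out.println(isValidFrameType(DataType.TIMESTAMP_NTZ, DataType.DAY_TIME_INTERVAL)); // true
        System.out.println(isValidFrameType(DataType.DATE, DataType.CALENDAR_INTERVAL)); // false
    }
}
```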
[spark] branch branch-3.2 updated: [SPARK-36016][SQL] Support TimestampNTZType in expression ApproxCountDistinctForIntervals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 0c7972b [SPARK-36016][SQL] Support TimestampNTZType in expression ApproxCountDistinctForIntervals 0c7972b is described below commit 0c7972ba5f9c4ab991b8ade00df409fb4392788a Author: gengjiaan AuthorDate: Wed Jul 7 20:22:46 2021 +0300 [SPARK-36016][SQL] Support TimestampNTZType in expression ApproxCountDistinctForIntervals ### What changes were proposed in this pull request? The current `ApproxCountDistinctForIntervals` supports `TimestampType` but does not yet support timestamp without time zone. This PR adds that support. ### Why are the changes needed? `ApproxCountDistinctForIntervals` needs to support `TimestampNTZType`. ### Does this PR introduce _any_ user-facing change? 'Yes'. `ApproxCountDistinctForIntervals` accepts `TimestampNTZType`. ### How was this patch tested? New tests. Closes #33243 from beliefer/SPARK-36016.
Authored-by: gengjiaan Signed-off-by: Max Gekk (cherry picked from commit be382a6285fcbf374faeec1298371952a7bf1193) Signed-off-by: Max Gekk --- .../expressions/aggregate/ApproxCountDistinctForIntervals.scala | 6 +++--- .../aggregate/ApproxCountDistinctForIntervalsSuite.scala | 8 ++-- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala index 19e212d..a7e9a22 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala @@ -61,7 +61,7 @@ case class ApproxCountDistinctForIntervals( } override def inputTypes: Seq[AbstractDataType] = { -Seq(TypeCollection(NumericType, TimestampType, DateType), ArrayType) +Seq(TypeCollection(NumericType, TimestampType, DateType, TimestampNTZType), ArrayType) } // Mark as lazy so that endpointsExpression is not evaluated during tree transformation. 
@@ -79,7 +79,7 @@ case class ApproxCountDistinctForIntervals( TypeCheckFailure("The endpoints provided must be constant literals") } else { endpointsExpression.dataType match { -case ArrayType(_: NumericType | DateType | TimestampType, _) => +case ArrayType(_: NumericType | DateType | TimestampType | TimestampNTZType, _) => if (endpoints.length < 2) { TypeCheckFailure("The number of endpoints must be >= 2 to construct intervals") } else { @@ -122,7 +122,7 @@ case class ApproxCountDistinctForIntervals( n.numeric.toDouble(value.asInstanceOf[n.InternalType]) case _: DateType => value.asInstanceOf[Int].toDouble -case _: TimestampType => +case TimestampType | TimestampNTZType => value.asInstanceOf[Long].toDouble } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervalsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervalsSuite.scala index 73f18d4..9d53673 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervalsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervalsSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate import java.sql.{Date, Timestamp} +import java.time.LocalDateTime import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.InternalRow @@ -38,7 +39,7 @@ class ApproxCountDistinctForIntervalsSuite extends SparkFunSuite { assert( wrongColumn.checkInputDataTypes() match { case TypeCheckFailure(msg) -if msg.contains("requires (numeric or timestamp or date) type") => true +if msg.contains("requires (numeric or timestamp or date or timestamp_ntz) type") => true case _ => false }) } @@ -199,7 +200,9 @@ class ApproxCountDistinctForIntervalsSuite extends SparkFunSuite { (intRecords.map(DateTimeUtils.toJavaDate), 
intEndpoints.map(DateTimeUtils.toJavaDate), DateType), (intRecords.map(DateTimeUtils.toJavaTimestamp(_)), - intEndpoints.map(DateTimeUtils.toJavaTimestamp(_)), TimestampType) + intEndpoints.map(Dat
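The diff above is small because `TimestampType` and `TimestampNTZType` share the same physical representation: a long of microseconds since the epoch. A hedged Java sketch (hypothetical names, not the real Spark code) of the value-to-double conversion that `ApproxCountDistinctForIntervals` performs on endpoints:

```java
// Sketch: dates are int days since 1970-01-01; both timestamp types are a
// long of microseconds, so they map onto the same double endpoint space.
public class EndpointValue {
    enum DataType { DATE, TIMESTAMP, TIMESTAMP_NTZ }

    static double toDoubleValue(Object value, DataType dt) {
        switch (dt) {
            case DATE:          return (Integer) value;  // days since the epoch
            case TIMESTAMP:
            case TIMESTAMP_NTZ: return (Long) value;     // microseconds since the epoch
            default: throw new IllegalArgumentException(dt.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(toDoubleValue(86_400_000_000L, DataType.TIMESTAMP_NTZ)); // 8.64E10
        System.out.println(toDoubleValue(18_628, DataType.DATE)); // 18628.0
    }
}
```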
[spark] branch master updated (55373b1 -> be382a6)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 55373b1 [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected add be382a6 [SPARK-36016][SQL] Support TimestampNTZType in expression ApproxCountDistinctForIntervals No new revisions were added by this update. Summary of changes: .../expressions/aggregate/ApproxCountDistinctForIntervals.scala | 6 +++--- .../aggregate/ApproxCountDistinctForIntervalsSuite.scala | 8 ++-- 2 files changed, 9 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-36022][SQL] Respect interval fields in extract
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 429d178 [SPARK-36022][SQL] Respect interval fields in extract 429d178 is described below commit 429d1780b3ab3267a5e22a857cf51458713bc208 Author: Kousuke Saruta AuthorDate: Thu Jul 8 09:40:57 2021 +0300 [SPARK-36022][SQL] Respect interval fields in extract ### What changes were proposed in this pull request? This PR fixes an issue with `extract`: `Extract` should process only the fields that exist in the interval type. For example: ``` spark-sql> SELECT EXTRACT(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH); 11 spark-sql> SELECT EXTRACT(MONTH FROM INTERVAL '2021' YEAR); 0 ``` The last command should fail because the month field isn't present in INTERVAL YEAR. ### Why are the changes needed? Bug fix. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. Closes #33247 from sarutak/fix-extract-interval.
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk (cherry picked from commit 39002cb99514010f6d6cc2e575b9eab1694f04ef) Signed-off-by: Max Gekk --- .../catalyst/expressions/intervalExpressions.scala | 24 ++-- .../apache/spark/sql/IntervalFunctionsSuite.scala | 64 ++ 2 files changed, 82 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala index 5d49007..5b111d1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala @@ -29,6 +29,8 @@ import org.apache.spark.sql.catalyst.util.IntervalUtils._ import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.DayTimeIntervalType.{DAY, HOUR, MINUTE, SECOND} +import org.apache.spark.sql.types.YearMonthIntervalType.{MONTH, YEAR} import org.apache.spark.unsafe.types.CalendarInterval abstract class ExtractIntervalPart[T]( @@ -125,33 +127,43 @@ object ExtractIntervalPart { source: Expression, errorHandleFunc: => Nothing): Expression = { (extractField.toUpperCase(Locale.ROOT), source.dataType) match { - case ("YEAR" | "Y" | "YEARS" | "YR" | "YRS", _: YearMonthIntervalType) => + case ("YEAR" | "Y" | "YEARS" | "YR" | "YRS", YearMonthIntervalType(start, end)) +if isUnitInIntervalRange(YEAR, start, end) => ExtractANSIIntervalYears(source) case ("YEAR" | "Y" | "YEARS" | "YR" | "YRS", CalendarIntervalType) => ExtractIntervalYears(source) - case ("MONTH" | "MON" | "MONS" | "MONTHS", _: YearMonthIntervalType) => + case ("MONTH" | "MON" | "MONS" | "MONTHS", YearMonthIntervalType(start, end)) +if isUnitInIntervalRange(MONTH, start, end) => ExtractANSIIntervalMonths(source) case ("MONTH" | "MON" | 
"MONS" | "MONTHS", CalendarIntervalType) => ExtractIntervalMonths(source) - case ("DAY" | "D" | "DAYS", _: DayTimeIntervalType) => + case ("DAY" | "D" | "DAYS", DayTimeIntervalType(start, end)) +if isUnitInIntervalRange(DAY, start, end) => ExtractANSIIntervalDays(source) case ("DAY" | "D" | "DAYS", CalendarIntervalType) => ExtractIntervalDays(source) - case ("HOUR" | "H" | "HOURS" | "HR" | "HRS", _: DayTimeIntervalType) => + case ("HOUR" | "H" | "HOURS" | "HR" | "HRS", DayTimeIntervalType(start, end)) +if isUnitInIntervalRange(HOUR, start, end) => ExtractANSIIntervalHours(source) case ("HOUR" | "H" | "HOURS" | "HR" | "HRS", CalendarIntervalType) => ExtractIntervalHours(source) - case ("MINUTE" | "M" | "MIN" | "MINS" | "MINUTES", _: DayTimeIntervalType) => + case ("MINUTE" | "M" | "MIN" | "MINS" | "MINUTES", DayTimeInterva
[spark] branch master updated (23943e5 -> 39002cb)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 23943e5 [SPARK-32577][SQL][TEST][FOLLOWUP] Fix the config value of shuffled hash join for all other test queries add 39002cb [SPARK-36022][SQL] Respect interval fields in extract No new revisions were added by this update. Summary of changes: .../catalyst/expressions/intervalExpressions.scala | 24 ++-- .../apache/spark/sql/IntervalFunctionsSuite.scala | 64 ++ 2 files changed, 82 insertions(+), 6 deletions(-) create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/IntervalFunctionsSuite.scala
[spark] branch master updated (39002cb -> 89aa16b)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39002cb [SPARK-36022][SQL] Respect interval fields in extract add 89aa16b [SPARK-36021][SQL][FOLLOWUP] DT/YM func use field byte to keep consistence No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/parser/AstBuilder.scala | 17 --- .../spark/sql/catalyst/util/IntervalUtils.scala| 24 ++ .../catalyst/parser/ExpressionParserSuite.scala| 3 ++- .../sql/catalyst/util/IntervalUtilsSuite.scala | 13 4 files changed, 21 insertions(+), 36 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36021][SQL][FOLLOWUP] DT/YM func use field byte to keep consistence
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 2776e8a [SPARK-36021][SQL][FOLLOWUP] DT/YM func use field byte to keep consistence 2776e8a is described below commit 2776e8aa4792a7b2e95d5bbbd76cea0f9554503c Author: Angerszh AuthorDate: Thu Jul 8 12:22:04 2021 +0300 [SPARK-36021][SQL][FOLLOWUP] DT/YM func use field byte to keep consistence ### What changes were proposed in this pull request? On further thought, it is better for all DT/YM functions to use field bytes, for consistency. ### Why are the changes needed? Keep the code consistent. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not needed. Closes #33252 from AngersZh/SPARK-36021-FOLLOWUP. Authored-by: Angerszh Signed-off-by: Max Gekk (cherry picked from commit 89aa16b4a838246cfb7bdc9318461485016f6252) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/parser/AstBuilder.scala | 17 --- .../spark/sql/catalyst/util/IntervalUtils.scala| 24 ++ .../catalyst/parser/ExpressionParserSuite.scala| 3 ++- .../sql/catalyst/util/IntervalUtilsSuite.scala | 13 4 files changed, 21 insertions(+), 36 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index d6363b5..4f1e53f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -40,7 +40,6 @@ import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.trees.CurrentOrigin import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, IntervalUtils} import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp,
convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone} -import org.apache.spark.sql.catalyst.util.IntervalUtils.IntervalUnit import org.apache.spark.sql.connector.catalog.{SupportsNamespaces, TableCatalog} import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition import org.apache.spark.sql.connector.expressions.{ApplyTransform, BucketTransform, DaysTransform, Expression => V2Expression, FieldReference, HoursTransform, IdentityTransform, LiteralValue, MonthsTransform, Transform, YearsTransform} @@ -2487,18 +2486,10 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg (from, to) match { case ("year", "month") => IntervalUtils.fromYearMonthString(value) - case ("day", "hour") => -IntervalUtils.fromDayTimeString(value, IntervalUnit.DAY, IntervalUnit.HOUR) - case ("day", "minute") => -IntervalUtils.fromDayTimeString(value, IntervalUnit.DAY, IntervalUnit.MINUTE) - case ("day", "second") => -IntervalUtils.fromDayTimeString(value, IntervalUnit.DAY, IntervalUnit.SECOND) - case ("hour", "minute") => -IntervalUtils.fromDayTimeString(value, IntervalUnit.HOUR, IntervalUnit.MINUTE) - case ("hour", "second") => -IntervalUtils.fromDayTimeString(value, IntervalUnit.HOUR, IntervalUnit.SECOND) - case ("minute", "second") => -IntervalUtils.fromDayTimeString(value, IntervalUnit.MINUTE, IntervalUnit.SECOND) + case ("day", "hour") | ("day", "minute") | ("day", "second") | ("hour", "minute") | + ("hour", "second") | ("minute", "second") => +IntervalUtils.fromDayTimeString(value, + DayTimeIntervalType.stringToField(from), DayTimeIntervalType.stringToField(to)) case _ => throw QueryParsingErrors.fromToIntervalUnsupportedError(from, to, ctx) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index 24bcad8..7579a28 100644 --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -458,7 +458,7 @@ object IntervalUtils { * adapted from HiveIntervalDayTime.valueOf */ def fromDayTimeS
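The consolidation in the AstBuilder diff above replaces six hand-written case arms with one lookup from the textual from/to unit of an `INTERVAL ... TO ...` literal to a field byte. A hedged Java sketch modeled on `DayTimeIntervalType.stringToField` (the real Spark helper also raises a proper parse error with error-class support):

```java
// Sketch: one unit-name-to-field-byte mapping instead of six case arms.
import java.util.Map;

public class UnitToField {
    static final Map<String, Byte> FIELDS =
        Map.of("day", (byte) 0, "hour", (byte) 1, "minute", (byte) 2, "second", (byte) 3);

    static byte stringToField(String unit) {
        Byte f = FIELDS.get(unit);
        if (f == null) throw new IllegalArgumentException("unknown unit: " + unit);
        return f;
    }

    public static void main(String[] args) {
        // INTERVAL '10 11:22:33' DAY TO SECOND -> fromDayTimeString(value, DAY, SECOND)
        System.out.println(stringToField("day"));    // 0
        System.out.println(stringToField("second")); // 3
    }
}
```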
[spark] branch branch-3.2 updated: [SPARK-36055][SQL] Assign pretty SQL string to TimestampNTZ literals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 9103c1f [SPARK-36055][SQL] Assign pretty SQL string to TimestampNTZ literals 9103c1f is described below commit 9103c1fe2332a60424077ca9ecffb24afa440c55 Author: Gengliang Wang AuthorDate: Thu Jul 8 21:42:50 2021 +0300 [SPARK-36055][SQL] Assign pretty SQL string to TimestampNTZ literals ### What changes were proposed in this pull request? Currently, TimestampNTZ literals show only the long value instead of a timestamp string in their SQL string and toString results. Before changes (with default timestamp type as TIMESTAMP_NTZ) ``` -- !query select timestamp '2019-01-01\t' -- !query schema struct<15463008:timestamp_ntz> ``` After changes: ``` -- !query select timestamp '2019-01-01\t' -- !query schema struct ``` ### Why are the changes needed? Make the schema of TimestampNTZ literals readable. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test Closes #33269 from gengliangwang/ntzLiteralString.
Authored-by: Gengliang Wang Signed-off-by: Max Gekk (cherry picked from commit ee945e99cc1d3979a2c24077a9ae786ce50bbe81) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/literals.scala | 6 +- .../catalyst/expressions/LiteralExpressionSuite.scala | 9 + .../sql-tests/results/timestampNTZ/datetime.sql.out| 18 +- 3 files changed, 23 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala index a2270eb..ee40909 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala @@ -28,7 +28,7 @@ import java.lang.{Short => JavaShort} import java.math.{BigDecimal => JavaBigDecimal} import java.nio.charset.StandardCharsets import java.sql.{Date, Timestamp} -import java.time.{Duration, Instant, LocalDate, LocalDateTime, Period} +import java.time.{Duration, Instant, LocalDate, LocalDateTime, Period, ZoneOffset} import java.util import java.util.Objects import javax.xml.bind.DatatypeConverter @@ -352,6 +352,8 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression { DateFormatter().format(value.asInstanceOf[Int]) case TimestampType => TimestampFormatter.getFractionFormatter(timeZoneId).format(value.asInstanceOf[Long]) +case TimestampNTZType => + TimestampFormatter.getFractionFormatter(ZoneOffset.UTC).format(value.asInstanceOf[Long]) case DayTimeIntervalType(startField, endField) => toDayTimeIntervalString(value.asInstanceOf[Long], ANSI_STYLE, startField, endField) case YearMonthIntervalType(startField, endField) => @@ -473,6 +475,8 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression { s"DATE '$toString'" case (v: Long, TimestampType) => s"TIMESTAMP '$toString'" +case (v: Long, TimestampNTZType) => + s"TIMESTAMP_NTZ '$toString'" case (i: 
CalendarInterval, CalendarIntervalType) => s"INTERVAL '${i.toString}'" case (v: Array[Byte], BinaryType) => s"X'${DatatypeConverter.printHexBinary(v)}'" diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala index 50b7263..4081e13 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala @@ -362,6 +362,15 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { } } + test("SPARK-36055: TimestampNTZ toString") { +assert(Literal.default(TimestampNTZType).toString === "1970-01-01 00:00:00") +withTimeZones(sessionTimeZone = "GMT+01:00", systemTimeZone = "GMT-08:00") { + val timestamp = LocalDateTime.of(2021, 2, 3, 16, 50, 3, 45600) + val literalStr = Literal.create(timestamp).toString + assert(literalStr === "2021-02-03 16:50:03.456") +} + } + test("SPARK-35664: construct literals from java.time.LocalDateTime") { Seq( LocalDateTime.of(1, 1, 1, 0, 0, 0, 0), diff
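The diff above formats TimestampNTZ literals through `ZoneOffset.UTC`. The reason this is correct: the stored long is microseconds interpreted as a zone-less wall-clock time, so rendering it through UTC reconstructs the original `LocalDateTime` regardless of the session or JVM time zone. A self-contained java.time sketch (helper names here are hypothetical, not Spark's):

```java
// Sketch of the UTC round trip behind TimestampNTZ literal formatting.
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class NtzFormat {
    // Encode a zone-less wall-clock time as microseconds, via UTC.
    static long toMicros(LocalDateTime ldt) {
        return ldt.toEpochSecond(ZoneOffset.UTC) * 1_000_000L + ldt.getNano() / 1_000L;
    }

    // Decode through UTC again: the original wall-clock time comes back.
    static String format(long micros) {
        long secs = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) (Math.floorMod(micros, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(secs, nanos, ZoneOffset.UTC).toString();
    }

    public static void main(String[] args) {
        long micros = toMicros(LocalDateTime.of(2021, 2, 3, 16, 50, 3, 456_000_000));
        System.out.println(format(micros)); // 2021-02-03T16:50:03.456
    }
}
```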
[spark] branch master updated (e071721 -> ee945e9)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e071721 [SPARK-36012][SQL] Add null flag in SHOW CREATE TABLE add ee945e9 [SPARK-36055][SQL] Assign pretty SQL string to TimestampNTZ literals No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/literals.scala | 6 +- .../catalyst/expressions/LiteralExpressionSuite.scala | 9 + .../sql-tests/results/timestampNTZ/datetime.sql.out| 18 +- 3 files changed, 23 insertions(+), 10 deletions(-)
[spark] branch master updated: [SPARK-35777][SQL][TESTS] Check all year-month interval types in UDF
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 490ae8f [SPARK-35777][SQL][TESTS] Check all year-month interval types in UDF 490ae8f is described below commit 490ae8f4d6c32274144e22789fc184150302c7f7 Author: Angerszh AuthorDate: Thu Jun 24 08:56:08 2021 +0300 [SPARK-35777][SQL][TESTS] Check all year-month interval types in UDF ### What changes were proposed in this pull request? Check all year-month interval types in UDF. ### Why are the changes needed? New checks should improve test coverage. ### Does this PR introduce _any_ user-facing change? Yes but `YearMonthIntervalType` has not been released yet. ### How was this patch tested? Existing UTs. Closes #32985 from AngersZh/SPARK-35777. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../test/scala/org/apache/spark/sql/UDFSuite.scala | 47 +++--- 1 file changed, 33 insertions(+), 14 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala index 65cbaf5..a341c87 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala @@ -42,6 +42,7 @@ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.test.SQLTestData._ import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.YearMonthIntervalType._ import org.apache.spark.sql.util.QueryExecutionListener private case class FunctionResult(f1: String, f2: String) @@ -902,27 +903,45 @@ class UDFSuite extends QueryTest with SharedSparkSession { assert(e.isInstanceOf[java.lang.ArithmeticException]) } - test("SPARK-34663: using java.time.Period in UDF") { + test("SPARK-34663, SPARK-35777: using java.time.Period in UDF") { // Regular case
-val input = Seq(java.time.Period.ofMonths(11)).toDF("p") +val input = Seq(java.time.Period.ofMonths(13)).toDF("p") + .select($"p", $"p".cast(YearMonthIntervalType(YEAR)).as("p_y"), +$"p".cast(YearMonthIntervalType(MONTH)).as("p_m")) val incMonth = udf((p: java.time.Period) => p.plusMonths(1)) -val result = input.select(incMonth($"p").as("new_p")) -checkAnswer(result, Row(java.time.Period.ofYears(1)) :: Nil) -// TODO(SPARK-35777): Check all year-month interval types in UDF -assert(result.schema === new StructType().add("new_p", YearMonthIntervalType())) +val result = input.select(incMonth($"p").as("new_p"), + incMonth($"p_y").as("new_p_y"), + incMonth($"p_m").as("new_p_m")) +checkAnswer(result, Row(java.time.Period.ofMonths(14).normalized(), + java.time.Period.ofMonths(13).normalized(), + java.time.Period.ofMonths(14).normalized()) :: Nil) +assert(result.schema === new StructType() + .add("new_p", YearMonthIntervalType()) + .add("new_p_y", YearMonthIntervalType()) + .add("new_p_m", YearMonthIntervalType())) // UDF produces `null` val nullFunc = udf((_: java.time.Period) => null.asInstanceOf[java.time.Period]) -val nullResult = input.select(nullFunc($"p").as("null_p")) -checkAnswer(nullResult, Row(null) :: Nil) -// TODO(SPARK-35777): Check all year-month interval types in UDF -assert(nullResult.schema === new StructType().add("null_p", YearMonthIntervalType())) +val nullResult = input.select(nullFunc($"p").as("null_p"), + nullFunc($"p_y").as("null_p_y"), nullFunc($"p_m").as("null_p_m")) +checkAnswer(nullResult, Row(null, null, null) :: Nil) +assert(nullResult.schema === new StructType() + .add("null_p", YearMonthIntervalType()) + .add("null_p_y", YearMonthIntervalType()) + .add("null_p_m", YearMonthIntervalType())) // Input parameter of UDF is null val nullInput = Seq(null.asInstanceOf[java.time.Period]).toDF("null_p") + .select($"null_p", +$"null_p".cast(YearMonthIntervalType(YEAR)).as("null_p_y"), +$"null_p".cast(YearMonthIntervalType(MONTH)).as("null_p_m")) val 
constPeriod = udf((_: java.time.Period) => java.time.Period.ofYears(10)) -val constResult = nullInput.select(constPeriod($"null_p").as("10_years")) -checkAnswer(constResult, Row(java.time.Period.ofYears(10)) :: Nil) -// TODO(SPARK-3
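The expanded test above compares UDF results via `java.time.Period.ofMonths(13).normalized()`. A small standalone illustration of what `normalized()` does (year-month interval values carry whole years out of the month count, keeping results comparable):

```java
// Period.normalized() splits a month count into years and leftover months.
import java.time.Period;

public class PeriodNormalize {
    public static void main(String[] args) {
        Period p = Period.ofMonths(13).plusMonths(1).normalized();
        System.out.println(p); // P1Y2M
        System.out.println(Period.ofMonths(13).normalized()); // P1Y1M
    }
}
```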
[spark] branch master updated: [SPARK-35852][SQL] Use DateAdd instead of TimeAdd for DateType +/- INTERVAL DAY
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61bd036 [SPARK-35852][SQL] Use DateAdd instead of TimeAdd for DateType +/- INTERVAL DAY 61bd036 is described below commit 61bd036cb96fd3a81fc4d131a29d956407956c94 Author: PengLei <18066542...@189.cn> AuthorDate: Thu Jun 24 08:47:29 2021 +0300 [SPARK-35852][SQL] Use DateAdd instead of TimeAdd for DateType +/- INTERVAL DAY ### What changes were proposed in this pull request? We use `DateAdd` to implement `DateType` `+`/`-` `INTERVAL DAY` ### Why are the changes needed? To improve the implementation of `DateType` `+`/`-` `INTERVAL DAY` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a unit test. Closes #33033 from Peng-Lei/SPARK-35852. Authored-by: PengLei <18066542...@189.cn> Signed-off-by: Max Gekk --- .../spark/sql/catalyst/analysis/Analyzer.scala | 5 + .../apache/spark/sql/ColumnExpressionSuite.scala | 23 ++ 2 files changed, 28 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 2f142bf..7cb270c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -53,6 +53,7 @@ import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.internal.SQLConf.{PartitionOverwriteMode, StoreAssignmentPolicy} import org.apache.spark.sql.types._ +import org.apache.spark.sql.types.DayTimeIntervalType.DAY import org.apache.spark.sql.util.{CaseInsensitiveStringMap, SchemaUtils} import org.apache.spark.unsafe.types.UTF8String import org.apache.spark.util.Utils @@ -349,7
+350,9 @@ class Analyzer(override val catalogManager: CatalogManager) case p: LogicalPlan => p.transformExpressionsUpWithPruning( _.containsPattern(BINARY_ARITHMETIC), ruleId) { case a @ Add(l, r, f) if a.childrenResolved => (l.dataType, r.dataType) match { + case (DateType, DayTimeIntervalType(DAY, DAY)) => DateAdd(l, ExtractANSIIntervalDays(r)) case (DateType, _: DayTimeIntervalType) => TimeAdd(Cast(l, TimestampType), r) + case (DayTimeIntervalType(DAY, DAY), DateType) => DateAdd(r, ExtractANSIIntervalDays(l)) case (_: DayTimeIntervalType, DateType) => TimeAdd(Cast(r, TimestampType), l) case (DateType, _: YearMonthIntervalType) => DateAddYMInterval(l, r) case (_: YearMonthIntervalType, DateType) => DateAddYMInterval(r, l) @@ -366,6 +369,8 @@ class Analyzer(override val catalogManager: CatalogManager) case _ => a } case s @ Subtract(l, r, f) if s.childrenResolved => (l.dataType, r.dataType) match { + case (DateType, DayTimeIntervalType(DAY, DAY)) => +DateAdd(l, UnaryMinus(ExtractANSIIntervalDays(r), f)) case (DateType, _: DayTimeIntervalType) => DatetimeSub(l, r, TimeAdd(Cast(l, TimestampType), UnaryMinus(r, f))) case (DateType, _: YearMonthIntervalType) => diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala index 13fdbf2..b0cd613 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala @@ -2881,4 +2881,27 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession { } } } + + test("SPARK-35852: add/subtract a interval day to/from a date") { +withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> "true") { + Seq( +(LocalDate.of(1, 1, 1), Duration.ofDays(31)), +(LocalDate.of(1582, 9, 15), Duration.ofDays(30)), +(LocalDate.of(1900, 1, 1), Duration.ofDays(0)), +(LocalDate.of(1970, 1, 1), Duration.ofDays(-1)), +(LocalDate.of(2021, 3, 14), 
+        Duration.ofDays(1)),
+        (LocalDate.of(2020, 12, 31), Duration.ofDays(4 * 30)),
+        (LocalDate.of(2020, 2, 29), Duration.ofDays(365)),
+        (LocalDate.of(1, 1, 1), Duration.ofDays(-2))
+      ).foreach { case (date, duration) =>
+        val days = duration.toDays
+        val add = date.plusDays(days)
+        val sub = date.minusDays(days)
+        val df = Seq((date, duration)).toDF("start", "diff")
+          .select($"start", $"diff" cast DayTimeIntervalType(DAY) as "diff")
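The analyzer rewrite above maps `DateType` plus or minus a day-only interval (`DayTimeIntervalType(DAY, DAY)`) to plain date arithmetic on a whole-day count, instead of the previous round-trip through `TimestampType`. A minimal Python sketch of that logic (illustrative helper names, not the Catalyst implementation):

```python
from datetime import date, timedelta

def date_plus_day_interval(d: date, interval_days: int) -> date:
    # Analogous to DateAdd(l, ExtractANSIIntervalDays(r)): the whole-day
    # count is extracted from the interval and added directly to the date,
    # so the result stays a date (no timestamp conversion involved).
    return d + timedelta(days=interval_days)

def date_minus_day_interval(d: date, interval_days: int) -> date:
    # Subtraction is rewritten as addition of the negated day count,
    # mirroring DateAdd(l, UnaryMinus(ExtractANSIIntervalDays(r), f)).
    return d + timedelta(days=-interval_days)

print(date_plus_day_interval(date(2021, 3, 14), 1))   # 2021-03-15
print(date_minus_day_interval(date(1970, 1, 1), -1))  # 1970-01-02
```

Keeping the operation in the date domain also matches the new test cases above, which check the result against `LocalDate.plusDays`/`minusDays` directly.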
[spark] branch master updated (1295e88 -> 8a1995f)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1295e88 [SPARK-35786][SQL] Add a new operator to distingush if AQE can optimize safely add 8a1995f [SPARK-35728][SQL][TESTS] Check multiply/divide of day-time intervals of any fields by numeric No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/IntervalExpressionsSuite.scala | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35778][SQL][TESTS] Check multiply/divide of year month interval of any fields by numeric
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3904c0e [SPARK-35778][SQL][TESTS] Check multiply/divide of year month interval of any fields by numeric 3904c0e is described below commit 3904c0edbad1838d70cbde0aec0318dea1ecdf4a Author: PengLei <18066542...@189.cn> AuthorDate: Thu Jun 24 12:25:06 2021 +0300 [SPARK-35778][SQL][TESTS] Check multiply/divide of year month interval of any fields by numeric ### What changes were proposed in this pull request? Check multiply/divide of year-month intervals of any fields by numeric. ### Why are the changes needed? To improve test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Expanded existed test cases. Closes #33051 from Peng-Lei/SPARK-35778. Authored-by: PengLei <18066542...@189.cn> Signed-off-by: Max Gekk --- .../sql/catalyst/expressions/IntervalExpressionsSuite.scala| 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala index 3d423dd..b83aeab 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala @@ -325,7 +325,6 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkException(days = Int.MaxValue) } - // TODO(SPARK-35778): Check multiply/divide of year-month intervals of any fields by numeric test("SPARK-34824: multiply year-month interval by numeric") { Seq( (Period.ofYears(-123), Literal(null, DecimalType.USER_DEFAULT)) -> null, @@ -337,7 +336,9 @@ class 
IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       (Period.ofYears(10000), 0.0001d) -> Period.ofYears(1),
       (Period.ofYears(10000), BigDecimal(0.0001)) -> Period.ofYears(1)
     ).foreach { case ((period, num), expected) =>
-      checkEvaluation(MultiplyYMInterval(Literal(period), Literal(num)), expected)
+      DataTypeTestUtils.yearMonthIntervalTypes.foreach { dt =>
+        checkEvaluation(MultiplyYMInterval(Literal.create(period, dt), Literal(num)), expected)
+      }
     }
 
     Seq(
@@ -398,7 +399,6 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     }
   }
 
-  // TODO(SPARK-35778): Check multiply/divide of year-month intervals of any fields by numeric
   test("SPARK-34868: divide year-month interval by numeric") {
     Seq(
       (Period.ofYears(-123), Literal(null, DecimalType.USER_DEFAULT)) -> null,
@@ -412,7 +412,9 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       (Period.ofYears(1000), 100d) -> Period.ofYears(10),
       (Period.ofMonths(2), BigDecimal(0.1)) -> Period.ofMonths(20)
     ).foreach { case ((period, num), expected) =>
-      checkEvaluation(DivideYMInterval(Literal(period), Literal(num)), expected)
+      DataTypeTestUtils.yearMonthIntervalTypes.foreach { dt =>
+        checkEvaluation(DivideYMInterval(Literal.create(period, dt), Literal(num)), expected)
+      }
     }
 
     Seq(
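The expanded tests lean on the fact that a year-month interval is physically a single month count, whatever fields (`YEAR`, `YEAR TO MONTH`, `MONTH`) its type declares — which is why the same expected results hold for every field combination. A hedged Python sketch of the multiply/divide arithmetic (Spark's exact rounding and overflow semantics may differ):

```python
def multiply_ym_interval(months: int, num: float) -> int:
    # A year-month interval is stored as one int count of months;
    # multiplication scales that count and rounds to the nearest month.
    return round(months * num)

def divide_ym_interval(months: int, num: float) -> int:
    return round(months / num)

# 10000 years * 0.0001 == 1 year (12 months), as in the test data above
print(multiply_ym_interval(10000 * 12, 0.0001))  # 12
# 1000 years / 100 == 10 years (120 months)
print(divide_ym_interval(1000 * 12, 100))        # 120
```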
[spark] branch master updated (92ddef7 -> 7d0786f)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 92ddef7  [SPARK-35696][PYTHON][DOCS][FOLLOW-UP] Fix underline for title in FAQ to remove warnings
  add 7d0786f  [SPARK-35730][SQL][TESTS] Check all day-time interval types in UDF

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/UDFSuite.scala | 112 ++---
 1 file changed, 99 insertions(+), 13 deletions(-)
[spark] branch master updated: [SPARK-35975][SQL] New configuration `spark.sql.timestampType` for the default timestamp type
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a643076 [SPARK-35975][SQL] New configuration `spark.sql.timestampType` for the default timestamp type a643076 is described below commit a643076d4ef622eac505ebf22c9aa2cc909320ac Author: Gengliang Wang AuthorDate: Thu Jul 1 23:25:18 2021 +0300 [SPARK-35975][SQL] New configuration `spark.sql.timestampType` for the default timestamp type ### What changes were proposed in this pull request? Add a new configuration `spark.sql.timestampType`, which configures the default timestamp type of Spark SQL, including SQL DDL and Cast clause. Setting the configuration as `TIMESTAMP_NTZ` will use `TIMESTAMP WITHOUT TIME ZONE` as the default type while putting it as `TIMESTAMP_LTZ` will use `TIMESTAMP WITH LOCAL TIME ZONE`. The default value of the new configuration is TIMESTAMP_LTZ, which is consistent with previous Spark releases. ### Why are the changes needed? A new configuration for switching the default timestamp type as timestamp without time zone. ### Does this PR introduce _any_ user-facing change? No, it's a new feature. ### How was this patch tested? Unit test Closes #33176 from gengliangwang/newTsTypeConf. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../org/apache/spark/sql/internal/SQLConf.scala| 28 ++ .../sql/catalyst/parser/DataTypeParserSuite.scala | 14 ++- 3 files changed, 42 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 224c2d0..361ecc1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -2502,7 +2502,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case ("float" | "real", Nil) => FloatType case ("double", Nil) => DoubleType case ("date", Nil) => DateType - case ("timestamp", Nil) => TimestampType + case ("timestamp", Nil) => SQLConf.get.timestampType case ("string", Nil) => StringType case ("character" | "char", length :: Nil) => CharType(length.getText.toInt) case ("varchar", length :: Nil) => VarcharType(length.getText.toInt) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 30e5a16..3aed3c2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -44,6 +44,7 @@ import org.apache.spark.sql.catalyst.plans.logical.HintErrorHandler import org.apache.spark.sql.catalyst.util.DateTimeUtils import org.apache.spark.sql.connector.catalog.CatalogManager.SESSION_CATALOG_NAME import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors} +import org.apache.spark.sql.types.{AtomicType, TimestampNTZType, TimestampType} import org.apache.spark.unsafe.array.ByteArrayMethods import org.apache.spark.util.Utils @@ -2820,6 +2821,24 @@ object 
SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  object TimestampTypes extends Enumeration {
+    val TIMESTAMP_NTZ, TIMESTAMP_LTZ = Value
+  }
+
+  val TIMESTAMP_TYPE =
+    buildConf("spark.sql.timestampType")
+      .doc("Configures the default timestamp type of Spark SQL, including SQL DDL and Cast " +
+        s"clause. Setting the configuration as ${TimestampTypes.TIMESTAMP_NTZ.toString} will " +
+        "use TIMESTAMP WITHOUT TIME ZONE as the default type while putting it as " +
+        s"${TimestampTypes.TIMESTAMP_LTZ.toString} will use TIMESTAMP WITH LOCAL TIME ZONE. " +
+        "Before the 3.2.0 release, Spark only supports the TIMESTAMP WITH " +
+        "LOCAL TIME ZONE type.")
+      .version("3.2.0")
+      .stringConf
+      .transform(_.toUpperCase(Locale.ROOT))
+      .checkValues(TimestampTypes.values.map(_.toString))
+      .createWithDefault(TimestampTypes.TIMESTAMP_LTZ.toString)
+
   val DATETIME_JAVA8API_ENABLED = buildConf("spark.sql.datetime.java8API.enabled")
     .doc("If the configuration property is set to true, java.time.Instant and " +
       "java.time.LocalDate classes of Java 8 API
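The new conf entry normalizes its value to upper case before validating it against the enumeration, so `timestamp_ntz` and `TIMESTAMP_NTZ` are both accepted. A small Python sketch of the same transform-then-check pattern (`set_timestamp_type` is a hypothetical helper, not a Spark API):

```python
ALLOWED_TIMESTAMP_TYPES = {"TIMESTAMP_NTZ", "TIMESTAMP_LTZ"}
DEFAULT_TIMESTAMP_TYPE = "TIMESTAMP_LTZ"  # matches .createWithDefault(...)

def set_timestamp_type(value: str) -> str:
    # Mirrors .transform(_.toUpperCase(Locale.ROOT)) followed by
    # .checkValues(...): normalize case first, then validate.
    normalized = value.upper()
    if normalized not in ALLOWED_TIMESTAMP_TYPES:
        raise ValueError(f"invalid value for spark.sql.timestampType: {value}")
    return normalized

print(set_timestamp_type("timestamp_ntz"))  # TIMESTAMP_NTZ
```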
[spark] branch master updated: [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d28ca9c [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing d28ca9c is described below commit d28ca9cc9808828118be64a545c3478160fdc170 Author: Max Gekk AuthorDate: Wed Jun 30 09:44:52 2021 +0300 [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing ### What changes were proposed in this pull request? In the PR, I propose to catch all non-fatal exceptions coming `refreshTable()` at the final stage of table repairing, and output an error message instead of failing with an exception. ### Why are the changes needed? 1. The uncaught exceptions from table refreshing might be considered as regression comparing to previous Spark versions. Table refreshing was introduced by https://github.com/apache/spark/pull/31066. 2. This should improve user experience with Spark SQL. For instance, when the `MSCK REPAIR TABLE` is performed in a chain of command in SQL where catching exception is difficult or even impossible. ### Does this PR introduce _any_ user-facing change? Yes. Before the changes the `MSCK REPAIR TABLE` command can fail with the exception portrayed in SPARK-35935. After the changes, the same command outputs error message, and completes successfully. ### How was this patch tested? By existing test suites. Closes #33137 from MaxGekk/msck-repair-catch-except. 
Authored-by: Max Gekk
Signed-off-by: Max Gekk
---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 0876b5f..06c6847 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -675,7 +675,15 @@ case class RepairTableCommand(
     // This is always the case for Hive format tables, but is not true for Datasource tables created
     // before Spark 2.1 unless they are converted via `msck repair table`.
     spark.sessionState.catalog.alterTable(table.copy(tracksPartitionsInCatalog = true))
-    spark.catalog.refreshTable(tableIdentWithDB)
+    try {
+      spark.catalog.refreshTable(tableIdentWithDB)
+    } catch {
+      case NonFatal(e) =>
+        logError(s"Cannot refresh the table '$tableIdentWithDB'. A query of the table " +
+          "might return wrong result if the table was cached. To avoid such issue, you should " +
+          "uncache the table manually via the UNCACHE TABLE command after table recovering will " +
+          "complete fully.", e)
+    }
     logInfo(s"Recovered all partitions: added ($addedAmount), dropped ($droppedAmount).")
     Seq.empty[Row]
   }
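The fix wraps only the final cache-refresh step: the repair work has already succeeded by then, so a failure there is logged instead of propagated and the whole `MSCK REPAIR TABLE` command still completes. The same pattern as a Python sketch (Python's broad `Exception` stands in loosely for Scala's `NonFatal`, which still lets fatal errors propagate):

```python
import logging

def finish_repair(refresh_table, table_ident: str) -> None:
    # The partitions have already been recovered at this point; a failure
    # in the best-effort refresh must not fail the whole command.
    try:
        refresh_table()
    except Exception as e:  # rough analogue of Scala's NonFatal(e)
        logging.error(
            "Cannot refresh the table '%s'. A query of the table might return "
            "wrong result if the table was cached.", table_ident, exc_info=e)
```

The design choice is deliberate: in a chained SQL workload where catching exceptions is hard, a degraded-but-successful command plus an actionable error message beats a hard failure.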
[spark] branch master updated: Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function"
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7668226 Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function" 7668226 is described below commit 76682268d746e72f0e8aa4cc64860e0bfd90f1ed Author: Max Gekk AuthorDate: Wed Jun 30 09:26:35 2021 +0300 Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function" ### What changes were proposed in this pull request? This reverts commit e6753c9402b5c40d9e2af662f28bd4f07a0bae17. ### Why are the changes needed? The `make_interval` function aims to construct values of the legacy interval type `CalendarIntervalType` which will be substituted by ANSI interval types (see SPARK-27790). Since the function has not been released yet, it would be better to don't expose it via public API at all. ### Does this PR introduce _any_ user-facing change? Should not since the `make_interval` function has not been released yet. ### How was this patch tested? By existing test suites, and GA/jenkins builds. Closes #33143 from MaxGekk/revert-make_interval. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/functions.scala | 25 .../apache/spark/sql/JavaDateFunctionsSuite.java | 68 -- .../org/apache/spark/sql/DateFunctionsSuite.scala | 40 - 3 files changed, 133 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index c446d6b..ecd60ff 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -2929,31 +2929,6 @@ object functions { // /** - * (Scala-specific) Creates a datetime interval - * - * @param years Number of years - * @param months Number of months - * @param weeks Number of weeks - * @param days Number of days - * @param hours Number of hours - * @param mins Number of mins - * @param secs Number of secs - * @return A datetime interval - * @group datetime_funcs - * @since 3.2.0 - */ - def make_interval( - years: Column = lit(0), - months: Column = lit(0), - weeks: Column = lit(0), - days: Column = lit(0), - hours: Column = lit(0), - mins: Column = lit(0), - secs: Column = lit(0)): Column = withExpr { -MakeInterval(years.expr, months.expr, weeks.expr, days.expr, hours.expr, mins.expr, secs.expr) - } - - /** * Returns the date that is `numMonths` after `startDate`. * * @param startDate A date, timestamp or string. If a string, the data must be in a format that diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java deleted file mode 100644 index 2d1de77..000 --- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java +++ /dev/null @@ -1,68 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. 
- * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package test.org.apache.spark.sql; - -import org.apache.spark.sql.Column; -import org.apache.spark.sql.Dataset; -import org.apache.spark.sql.Row; -import org.apache.spark.sql.RowFactory; -import org.apache.spark.sql.test.TestSparkSession; -import org.apache.spark.sql.types.StructType; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -import java.sql.Date; -import java.util.*; - -import static org.apache.spark.sql.types.DataTypes.*; -import static org.apache.spark.sql.functions.*; - -public class JavaDateFunctionsSuite { - private transient TestSparkSession spark; - - @Before - public void setUp() { -spark = new TestSparkSession(); -} - - @After - public void tearDown() { -spark.stop(); -spark = null; - } - - @Test - public void m
[spark] branch branch-3.0 updated (ab46045 -> 6a1361c)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from ab46045  [SPARK-35886][SQL][3.0] PromotePrecision should not overwrite genCode
  add 6a1361c  [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR TABLE` on table refreshing

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)
[spark] branch branch-3.1 updated (fe412b6 -> b6e8fab)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from fe412b6  [SPARK-35898][SQL] Fix arrays and maps in RowToColumnConverter
  add b6e8fab  [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR TABLE` on table refreshing

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-35735][SQL] Take into account day-time interval fields in cast
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2febd5c [SPARK-35735][SQL] Take into account day-time interval fields in cast 2febd5c is described below commit 2febd5c3f0c3a0c6660cfb340eb65316a1ca4acd Author: Angerszh AuthorDate: Wed Jun 30 16:05:04 2021 +0300 [SPARK-35735][SQL] Take into account day-time interval fields in cast ### What changes were proposed in this pull request? Support take into account day-time interval field in cast. ### Why are the changes needed? To conform to the SQL standard. ### Does this PR introduce _any_ user-facing change? An user can use `cast(str, DayTimeInterval(DAY, HOUR))`, for instance. ### How was this patch tested? Added UT. Closes #32943 from AngersZh/SPARK-35735. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 203 +++-- .../sql/catalyst/expressions/CastSuiteBase.scala | 145 ++- 2 files changed, 324 insertions(+), 24 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index 7a6de7f..30a2fa5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -119,18 +119,27 @@ object IntervalUtils { } } + val supportedFormat = Map( +(YM.YEAR, YM.MONTH) -> Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"), +(YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"), +(YM.MONTH, YM.MONTH) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MONTH"), +(DT.DAY, DT.DAY) -> Seq("[+|-]d", "INTERVAL [+|-]'[+|-]d' DAY"), +(DT.DAY, DT.HOUR) -> Seq("[+|-]d h", "INTERVAL [+|-]'[+|-]d h' DAY TO HOUR"), +(DT.DAY, DT.MINUTE) -> Seq("[+|-]d 
h:m", "INTERVAL [+|-]'[+|-]d h:m' DAY TO MINUTE"), +(DT.DAY, DT.SECOND) -> Seq("[+|-]d h:m:s.n", "INTERVAL [+|-]'[+|-]d h:m:s.n' DAY TO SECOND"), +(DT.HOUR, DT.HOUR) -> Seq("[+|-]h", "INTERVAL [+|-]'[+|-]h' HOUR"), +(DT.HOUR, DT.MINUTE) -> Seq("[+|-]h:m", "INTERVAL [+|-]'[+|-]h:m' HOUR TO MINUTE"), +(DT.HOUR, DT.SECOND) -> Seq("[+|-]h:m:s.n", "INTERVAL [+|-]'[+|-]h:m:s.n' HOUR TO SECOND"), +(DT.MINUTE, DT.MINUTE) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MINUTE"), +(DT.MINUTE, DT.SECOND) -> Seq("[+|-]m:s.n", "INTERVAL [+|-]'[+|-]m:s.n' MINUTE TO SECOND"), +(DT.SECOND, DT.SECOND) -> Seq("[+|-]s.n", "INTERVAL [+|-]'[+|-]s.n' SECOND") + ) + def castStringToYMInterval( input: UTF8String, startField: Byte, endField: Byte): Int = { -val supportedFormat = Map( - (YM.YEAR, YM.MONTH) -> -Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"), - (YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"), - (YM.MONTH, YM.MONTH) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MONTH") -) - def checkStringIntervalType(targetStartField: Byte, targetEndField: Byte): Unit = { if (startField != targetStartField || endField != targetEndField) { throw new IllegalArgumentException(s"Interval string does not match year-month format of " + @@ -151,7 +160,7 @@ object IntervalUtils { checkStringIntervalType(YM.YEAR, YM.MONTH) toYMInterval(year, month, getSign(firstSign, secondSign)) case yearMonthIndividualRegex(secondSign, value) => -safeToYMInterval { +safeToInterval { val sign = getSign("+", secondSign) if (endField == YM.YEAR) { sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR) @@ -166,7 +175,7 @@ object IntervalUtils { } } case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, suffix) => -safeToYMInterval { +safeToInterval { val sign = getSign(firstSign, secondSign) if ("YEAR".equalsIgnoreCase(suffix)) { checkStringIntervalType(YM.YEAR, YM.YEAR) @@ -202,7 +211,7 @@ object IntervalUtils { } } - private def safeToYMInterval(f: => Int): Int = { + private def 
safeToInterval[T](f: => T): T = { try { f } catch { @@ -213,24 +222,72 @@ object IntervalUtils { }
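Each entry in the new `supportedFormat` table pairs a field range with the string shapes it accepts. A Python sketch of parsing one of them — the `[+|-]d h:m:s.n` DAY TO SECOND form — into microseconds, the physical representation of a day-time interval (a simplification of Spark's actual regexes, which also handle the `INTERVAL '...' DAY TO SECOND` literal form and overflow checks):

```python
import re

# One pattern from the supportedFormat table: "[+|-]d h:m:s.n"
DAY_TO_SECOND = re.compile(r"^([+-])?(\d+) (\d+):(\d+):(\d+)(?:\.(\d{1,9}))?$")

def parse_day_to_second(s: str) -> int:
    # Returns the interval as a total count of microseconds.
    m = DAY_TO_SECOND.match(s.strip())
    if m is None:
        raise ValueError(f"Interval string does not match day-time format: {s!r}")
    sign = -1 if m.group(1) == "-" else 1
    d, h, mi, sec = (int(m.group(i)) for i in range(2, 6))
    # Fractional part is up to 9 digits (nanos); keep microsecond precision.
    frac = int((m.group(6) or "0").ljust(6, "0")[:6])
    micros = ((d * 24 + h) * 60 + mi) * 60 * 1_000_000 + sec * 1_000_000 + frac
    return sign * micros

print(parse_day_to_second("1 0:0:1.5"))  # 86401500000
```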
[spark] branch master updated (108635a -> 356aef4)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 108635a  Revert "[SPARK-35904][SQL] Collapse above RebalancePartitions"
  add 356aef4  [SPARK-35728][SPARK-35778][SQL][TESTS] Check multiply/divide of day-time and year-month interval of any fields by a numeric

No new revisions were added by this update.

Summary of changes:
 .../expressions/IntervalExpressionsSuite.scala | 136 -
 1 file changed, 131 insertions(+), 5 deletions(-)
[spark] branch master updated: [SPARK-35895][SQL] Support subtracting Intervals from TimestampWithoutTZ
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 645fb59 [SPARK-35895][SQL] Support subtracting Intervals from TimestampWithoutTZ 645fb59 is described below commit 645fb59652fad5a7b84a691e2446b396cf81048a Author: Gengliang Wang AuthorDate: Sat Jun 26 13:19:00 2021 +0300 [SPARK-35895][SQL] Support subtracting Intervals from TimestampWithoutTZ ### What changes were proposed in this pull request? Support the following operation: - TimestampWithoutTZ - Year-Month interval The following operation is actually supported in https://github.com/apache/spark/pull/33076/. This PR is to add end-to-end tests for them: - TimestampWithoutTZ - Calendar interval - TimestampWithoutTZ - Daytime interval ### Why are the changes needed? Support subtracting all 3 interval types from a timestamp without time zone ### Does this PR introduce _any_ user-facing change? No, the timestamp without time zone type is not release yet. ### How was this patch tested? Unit tests Closes #33086 from gengliangwang/subtract. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../test/resources/sql-tests/inputs/datetime.sql | 11 .../sql-tests/results/ansi/datetime.sql.out| 74 +- .../sql-tests/results/datetime-legacy.sql.out | 74 +- .../resources/sql-tests/results/datetime.sql.out | 74 +- 5 files changed, 231 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 6737ed5..5de228b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -378,7 +378,7 @@ class Analyzer(override val catalogManager: CatalogManager) DatetimeSub(l, r, TimeAdd(Cast(l, TimestampType), UnaryMinus(r, f))) case (DateType, _: YearMonthIntervalType) => DatetimeSub(l, r, DateAddYMInterval(l, UnaryMinus(r, f))) - case (TimestampType, _: YearMonthIntervalType) => + case (TimestampType | TimestampWithoutTZType, _: YearMonthIntervalType) => DatetimeSub(l, r, TimestampAddYMInterval(l, UnaryMinus(r, f))) case (CalendarIntervalType, CalendarIntervalType) | (_: DayTimeIntervalType, _: DayTimeIntervalType) => s diff --git a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql index 819bf4b..d68c9ff 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/datetime.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/datetime.sql @@ -236,3 +236,14 @@ select to_timestamp_ntz('2021-06-25 10:11:12') + interval '10-9' year to month; select to_timestamp_ntz('2021-06-25 10:11:12') + interval '20 15' day to hour; select to_timestamp_ntz('2021-06-25 10:11:12') + interval '20 15:40' day to minute; select to_timestamp_ntz('2021-06-25 10:11:12') + interval '20 15:40:32.9989' day to second; + +-- TimestampWithoutTZ - Intervals 
+select to_timestamp_ntz('2021-06-25 10:11:12') - interval 2 day; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '0-0' year to month; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '1-2' year to month; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '0 0:0:0' day to second; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '0 0:0:0.1' day to second; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '10-9' year to month; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '20 15' day to hour; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '20 15:40' day to minute; +select to_timestamp_ntz('2021-06-25 10:11:12') - interval '20 15:40:32.9989' day to second; diff --git a/sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out index 33d041b..08b01ca 100644 --- a/sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 176 +-- Number of queries: 185 -- !query @@ -1506,3 +1506,75 @@ select to_timestamp_ntz('2021-06-25 10:11:12') + interval '20 15:40:32.9989' struct -- !query output 2021-07-16 01:51:44.998999 + + +-- !query +select to_timestamp_ntz('2021-06-25 10:11:12') - interval 2 day +-- !
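The subtraction cases above are rewritten as addition of the negated interval (`TimestampAddYMInterval` over `UnaryMinus`). A Python sketch of the year-month case against a naive `datetime`, which plays the role of TIMESTAMP WITHOUT TIME ZONE — no time zone is consulted at any point (this sketch does not clamp days at month ends, so behavior may differ from Spark for dates like the 31st):

```python
from datetime import datetime

def subtract_ym_interval(ts: datetime, months: int) -> datetime:
    # Subtracting INTERVAL '1-2' YEAR TO MONTH means months = 14.
    # Work in a flat month index so year borrows fall out naturally.
    total = ts.year * 12 + (ts.month - 1) - months
    return ts.replace(year=total // 12, month=total % 12 + 1)

print(subtract_ym_interval(datetime(2021, 6, 25, 10, 11, 12), 14))
# 2020-04-25 10:11:12
```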
[spark] branch branch-3.2 updated: [SPARK-36083][SQL] make_timestamp: return different result based on the default timestamp type
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 09e5bbd [SPARK-36083][SQL] make_timestamp: return different result based on the default timestamp type 09e5bbd is described below commit 09e5bbdfbe53ccb4efd85f5774b7bee9e731a14f Author: Gengliang Wang AuthorDate: Sun Jul 11 20:47:49 2021 +0300 [SPARK-36083][SQL] make_timestamp: return different result based on the default timestamp type ### What changes were proposed in this pull request? The SQL function MAKE_TIMESTAMP should return different results based on the default timestamp type: * when "spark.sql.timestampType" is TIMESTAMP_NTZ, return TimestampNTZType literal * when "spark.sql.timestampType" is TIMESTAMP_LTZ, return TimestampType literal ### Why are the changes needed? As "spark.sql.timestampType" sets the default timestamp type, the make_timestamp function should behave consistently with it. ### Does this PR introduce _any_ user-facing change? Yes, when the value of "spark.sql.timestampType" is TIMESTAMP_NTZ, the result type of `MAKE_TIMESTAMP` is of TIMESTAMP_NTZ type. ### How was this patch tested? Unit test Closes #33290 from gengliangwang/mkTS. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk (cherry picked from commit 17ddcc9e8273a098b63984b950bfa6cd36b58b99) Signed-off-by: Max Gekk --- .../catalyst/expressions/datetimeExpressions.scala | 30 -- .../expressions/DateExpressionsSuite.scala | 120 +++-- .../test/resources/sql-tests/inputs/datetime.sql | 4 + .../sql-tests/results/ansi/datetime.sql.out| 19 +++- .../sql-tests/results/datetime-legacy.sql.out | 18 +++- .../resources/sql-tests/results/datetime.sql.out | 18 +++- .../results/timestampNTZ/datetime.sql.out | 18 +++- 7 files changed, 161 insertions(+), 66 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index be527ce..979eeba 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -2286,7 +2286,8 @@ case class MakeDate( // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(year, month, day, hour, min, sec[, timezone]) - Create timestamp from year, month, day, hour, min, sec and timezone fields.", + usage = "_FUNC_(year, month, day, hour, min, sec[, timezone]) - Create timestamp from year, month, day, hour, min, sec and timezone fields. 
" + +"The result data type is consistent with the value of configuration `spark.sql.timestampType`", arguments = """ Arguments: * year - the year to represent, from 1 to @@ -2324,7 +2325,8 @@ case class MakeTimestamp( sec: Expression, timezone: Option[Expression] = None, timeZoneId: Option[String] = None, -failOnError: Boolean = SQLConf.get.ansiEnabled) +failOnError: Boolean = SQLConf.get.ansiEnabled, +override val dataType: DataType = SQLConf.get.timestampType) extends SeptenaryExpression with TimeZoneAwareExpression with ImplicitCastInputTypes with NullIntolerant { @@ -2335,7 +2337,8 @@ case class MakeTimestamp( hour: Expression, min: Expression, sec: Expression) = { -this(year, month, day, hour, min, sec, None, None, SQLConf.get.ansiEnabled) +this(year, month, day, hour, min, sec, None, None, SQLConf.get.ansiEnabled, + SQLConf.get.timestampType) } def this( @@ -2346,7 +2349,8 @@ case class MakeTimestamp( min: Expression, sec: Expression, timezone: Expression) = { -this(year, month, day, hour, min, sec, Some(timezone), None, SQLConf.get.ansiEnabled) +this(year, month, day, hour, min, sec, Some(timezone), None, SQLConf.get.ansiEnabled, + SQLConf.get.timestampType) } override def children: Seq[Expression] = Seq(year, month, day, hour, min, sec) ++ timezone @@ -2355,7 +2359,6 @@ case class MakeTimestamp( override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType, DecimalType(8, 6)) ++ timezone.map(_ => StringType) - override def dataType: DataType = TimestampType override def nullable: Boolean = if (failOnError) children.exists(_.nullable) else true override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = @@ -2388,7 +2391,11 @@ case class MakeTimestam
[spark] branch master updated (cfcd094 -> 17ddcc9)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cfcd094 [SPARK-36036][CORE] Fix cleanup of DownloadFile resources add 17ddcc9 [SPARK-36083][SQL] make_timestamp: return different result based on the default timestamp type No new revisions were added by this update. Summary of changes: .../catalyst/expressions/datetimeExpressions.scala | 30 -- .../expressions/DateExpressionsSuite.scala | 120 +++-- .../test/resources/sql-tests/inputs/datetime.sql | 4 + .../sql-tests/results/ansi/datetime.sql.out| 19 +++- .../sql-tests/results/datetime-legacy.sql.out | 18 +++- .../resources/sql-tests/results/datetime.sql.out | 18 +++- .../results/timestampNTZ/datetime.sql.out | 18 +++- 7 files changed, 161 insertions(+), 66 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
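The change above makes `make_timestamp`'s result type follow the session default (`spark.sql.timestampType`) instead of being hard-wired to `TimestampType`. A minimal plain-Python sketch of that dispatch idea (illustrative only — names and the use of UTC as the stand-in "local" zone are assumptions, not Spark's Scala implementation):

```python
from datetime import datetime, timezone

# Illustrative sketch: a make_timestamp-like constructor whose result kind
# follows a session default, mirroring how MakeTimestamp now takes its
# dataType from spark.sql.timestampType instead of always TimestampType.
def make_timestamp(year, month, day, hour, minute, sec, *, default_type="TIMESTAMP_LTZ"):
    ts = datetime(year, month, day, hour, minute, sec)
    if default_type == "TIMESTAMP_NTZ":
        return ts  # naive datetime: no zone attached, like TIMESTAMP_NTZ
    # UTC stands in for the session time zone in this sketch.
    return ts.replace(tzinfo=timezone.utc)

ltz = make_timestamp(2021, 7, 11, 6, 30, 45)
ntz = make_timestamp(2021, 7, 11, 6, 30, 45, default_type="TIMESTAMP_NTZ")
```

The same wall-clock fields are produced either way; only whether a zone is attached depends on the configured default.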
[spark] branch branch-3.2 updated: [SPARK-36044][SQL] Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 5816482 [SPARK-36044][SQL] Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp 5816482 is described below commit 58164828683db50fb0e1698b679ffc0a773da847 Author: gengjiaan AuthorDate: Mon Jul 12 09:55:43 2021 +0300 [SPARK-36044][SQL] Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp ### What changes were proposed in this pull request? The functions `unix_timestamp`/`to_unix_timestamp` should be able to accept input of `TimestampNTZType`. ### Why are the changes needed? The functions `unix_timestamp`/`to_unix_timestamp` should be able to accept input of `TimestampNTZType`. ### Does this PR introduce _any_ user-facing change? 'Yes'. ### How was this patch tested? New tests. Closes #33278 from beliefer/SPARK-36044. 
Authored-by: gengjiaan Signed-off-by: Max Gekk (cherry picked from commit 8738682f6a36436da0e9fc332d58b2f41309e2c2) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 8 +--- .../spark/sql/catalyst/expressions/DateExpressionsSuite.scala | 9 + .../test/scala/org/apache/spark/sql/DateFunctionsSuite.scala | 11 ++- 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 979eeba..f0ed89e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -1091,8 +1091,10 @@ abstract class ToTimestamp override protected def formatString: Expression = right override protected def isParsing = true + override def forTimestampNTZ: Boolean = left.dataType == TimestampNTZType + override def inputTypes: Seq[AbstractDataType] = -Seq(TypeCollection(StringType, DateType, TimestampType), StringType) +Seq(TypeCollection(StringType, DateType, TimestampType, TimestampNTZType), StringType) override def dataType: DataType = LongType override def nullable: Boolean = if (failOnError) children.exists(_.nullable) else true @@ -1112,7 +1114,7 @@ abstract class ToTimestamp left.dataType match { case DateType => daysToMicros(t.asInstanceOf[Int], zoneId) / downScaleFactor -case TimestampType => +case TimestampType | TimestampNTZType => t.asInstanceOf[Long] / downScaleFactor case StringType => val fmt = right.eval(input) @@ -1192,7 +1194,7 @@ abstract class ToTimestamp |} |""".stripMargin) } - case TimestampType => + case TimestampType | TimestampNTZType => val eval1 = left.genCode(ctx) ev.copy(code = code""" ${eval1.code} diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala index 5f071c3..02d6d95 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala @@ -916,6 +916,11 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { Literal(new Timestamp(100)), Literal("-MM-dd HH:mm:ss"), timeZoneId), 1000L) checkEvaluation( + UnixTimestamp( + Literal(DateTimeUtils.microsToLocalDateTime(DateTimeUtils.millisToMicros(100))), +Literal("-MM-dd HH:mm:ss"), timeZoneId), + 1000L) +checkEvaluation( UnixTimestamp(Literal(date1), Literal("-MM-dd HH:mm:ss"), timeZoneId), MICROSECONDS.toSeconds( DateTimeUtils.daysToMicros(DateTimeUtils.fromJavaDate(date1), tz.toZoneId))) @@ -981,6 +986,10 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(ToUnixTimestamp( Literal(new Timestamp(100)), Literal(fmt1)), 1000L) +checkEvaluation(ToUnixTimestamp( + Literal(DateTimeUtils.microsToLocalDateTime(DateTimeUtils.millisToMicros(100))), + Literal(fmt1)), + 1000L) checkEvaluation( ToUnixTimestamp(Literal(date1), Literal(fmt
[spark] branch master updated (badb039 -> 8738682)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from badb039 [SPARK-36003][PYTHON] Implement unary operator `invert` of integral ps.Series/Index add 8738682 [SPARK-36044][SQL] Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 8 +--- .../spark/sql/catalyst/expressions/DateExpressionsSuite.scala | 9 + .../test/scala/org/apache/spark/sql/DateFunctionsSuite.scala | 11 ++- 3 files changed, 24 insertions(+), 4 deletions(-)
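The diff above handles `TimestampType | TimestampNTZType` with the same branch, because internally both are microsecond counts and `unix_timestamp`/`to_unix_timestamp` only need to down-scale them. A small sketch of that down-scaling (illustrative Python; `to_unix_seconds` is a made-up name, not Spark API):

```python
MICROS_PER_SECOND = 1_000_000  # ToTimestamp's downScaleFactor for seconds

# Both TIMESTAMP and TIMESTAMP_NTZ values are stored as microseconds, so the
# conversion to unix seconds is the same integer division for either type.
def to_unix_seconds(micros: int) -> int:
    # Floor division; for non-negative microsecond counts this matches
    # Scala's Long division used in the expression above.
    return micros // MICROS_PER_SECOND

seconds = to_unix_seconds(1_625_000_000_123_456)
```

This is why supporting `TimestampNTZType` only required widening the accepted input types rather than adding new evaluation logic.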
[spark] branch master updated (8738682 -> 32720dd)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8738682 [SPARK-36044][SQL] Support TimestampNTZ in functions unix_timestamp/to_unix_timestamp add 32720dd [SPARK-36072][SQL] TO_TIMESTAMP: return different results based on the default timestamp type No new revisions were added by this update. Summary of changes: .../catalyst/expressions/datetimeExpressions.scala | 54 +- .../expressions/DateExpressionsSuite.scala | 23 +++--- .../results/timestampNTZ/datetime.sql.out | 84 +++--- 3 files changed, 76 insertions(+), 85 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36072][SQL] TO_TIMESTAMP: return different results based on the default timestamp type
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 4e9e2f3 [SPARK-36072][SQL] TO_TIMESTAMP: return different results based on the default timestamp type 4e9e2f3 is described below commit 4e9e2f32e84cd1d317e25b85eb18f2c662bc641f Author: Gengliang Wang AuthorDate: Mon Jul 12 10:12:30 2021 +0300 [SPARK-36072][SQL] TO_TIMESTAMP: return different results based on the default timestamp type ### What changes were proposed in this pull request? The SQL function TO_TIMESTAMP should return different results based on the default timestamp type: * when "spark.sql.timestampType" is TIMESTAMP_NTZ, return TimestampNTZType literal * when "spark.sql.timestampType" is TIMESTAMP_LTZ, return TimestampType literal This PR also refactor the class GetTimestamp and GetTimestampNTZ to reduce duplicated code. ### Why are the changes needed? As "spark.sql.timestampType" sets the default timestamp type, the to_timestamp function should behave consistently with it. ### Does this PR introduce _any_ user-facing change? Yes, when the value of "spark.sql.timestampType" is TIMESTAMP_NTZ, the result type of `TO_TIMESTAMP` is of TIMESTAMP_NTZ type. ### How was this patch tested? Unit test Closes #33280 from gengliangwang/to_timestamp. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk (cherry picked from commit 32720dd3e18ea43c6d88125a52356f40f808b300) Signed-off-by: Max Gekk --- .../catalyst/expressions/datetimeExpressions.scala | 54 +- .../expressions/DateExpressionsSuite.scala | 23 +++--- .../results/timestampNTZ/datetime.sql.out | 84 +++--- 3 files changed, 76 insertions(+), 85 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index f0ed89e..0ebeacb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -1008,17 +1008,17 @@ case class UnixTimestamp( copy(timeExp = newLeft, format = newRight) } -case class GetTimestampNTZ( +/** + * Gets a timestamp from a string or a date. + */ +case class GetTimestamp( left: Expression, right: Expression, +override val dataType: DataType, timeZoneId: Option[String] = None, failOnError: Boolean = SQLConf.get.ansiEnabled) extends ToTimestamp { - override val forTimestampNTZ: Boolean = true - - override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType) - - override def dataType: DataType = TimestampNTZType + override val forTimestampNTZ: Boolean = dataType == TimestampNTZType override protected def downScaleFactor: Long = 1 @@ -1064,7 +1064,7 @@ case class ParseToTimestampNTZ( child: Expression) extends RuntimeReplaceable { def this(left: Expression, format: Expression) = { -this(left, Option(format), GetTimestampNTZ(left, format)) +this(left, Option(format), GetTimestamp(left, format, TimestampNTZType)) } def this(left: Expression) = this(left, None, Cast(left, TimestampNTZType)) @@ -1886,7 +1886,7 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr extends RuntimeReplaceable { def 
this(left: Expression, format: Expression) = { -this(left, Option(format), Cast(GetTimestamp(left, format), DateType)) +this(left, Option(format), Cast(GetTimestamp(left, format, TimestampType), DateType)) } def this(left: Expression) = { @@ -1911,7 +1911,8 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr usage = """ _FUNC_(timestamp_str[, fmt]) - Parses the `timestamp_str` expression with the `fmt` expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to - a timestamp if the `fmt` is omitted. + a timestamp if the `fmt` is omitted. The result data type is consistent with the value of + configuration `spark.sql.timestampType`. """, arguments = """ Arguments: @@ -1929,20 +1930,24 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr group = "datetime_funcs", since = "2.2.0") // scalastyle:on line.size.limit -case class ParseToTimestamp(left: Expression, format: Option[Expression], child: Expression) - extends RuntimeReplaceable { +case class ParseToTimestamp( +left: Expression, +format: Opt
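The refactor in the diff above collapses `GetTimestamp` and `GetTimestampNTZ` into one class that carries its result `dataType` and derives the NTZ flag from it. A sketch of the shape of that refactor (illustrative Python with string stand-ins for expressions and types; not Spark's Scala API):

```python
from dataclasses import dataclass

# One class parameterized by its result data type, instead of two near-duplicate
# classes; the NTZ behavior flag is derived from the type rather than hard-coded.
@dataclass
class GetTimestamp:
    left: str        # stand-in for the input expression
    fmt: str         # stand-in for the format expression
    data_type: str   # "TIMESTAMP" (with local time zone) or "TIMESTAMP_NTZ"

    @property
    def for_timestamp_ntz(self) -> bool:
        return self.data_type == "TIMESTAMP_NTZ"

ntz = GetTimestamp("col", "yyyy-MM-dd", "TIMESTAMP_NTZ")
ltz = GetTimestamp("col", "yyyy-MM-dd", "TIMESTAMP")
```

Callers such as `ParseToDate` then pass the desired type explicitly (`GetTimestamp(left, format, TimestampType)`), which is exactly the duplicated-code reduction the PR description mentions.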
[spark] branch master updated: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 92bf83e [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz 92bf83e is described below commit 92bf83ed0aba2c399debb1db5fb88bad3961ab06 Author: Gengliang Wang AuthorDate: Mon Jul 12 22:44:26 2021 +0300 [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz ### What changes were proposed in this pull request? Support new functions make_timestamp_ntz and make_timestamp_ltz Syntax: * `make_timestamp_ntz(year, month, day, hour, min, sec)`: Create local date-time from year, month, day, hour, min, sec fields * `make_timestamp_ltz(year, month, day, hour, min, sec[, timezone])`: Create current timestamp with local time zone from year, month, day, hour, min, sec and timezone fields ### Why are the changes needed? As the result of `make_timestamp` becomes consistent with the SQL configuration `spark.sql.timestampType`, we need these two new functions to construct timestamp literals. They align to the functions [`make_timestamp` and `make_timestamptz`](https://www.postgresql.org/docs/9.4/functions-datetime.html) in PostgreSQL ### Does this PR introduce _any_ user-facing change? Yes, two new datetime functions: make_timestamp_ntz and make_timestamp_ltz. ### How was this patch tested? End-to-end tests. Closes #33299 from gengliangwang/make_timestamp_ntz_ltz. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/FunctionRegistry.scala | 2 + .../catalyst/expressions/datetimeExpressions.scala | 122 + .../sql-functions/sql-expression-schema.md | 6 +- .../test/resources/sql-tests/inputs/datetime.sql | 12 ++ .../sql-tests/results/ansi/datetime.sql.out| 61 ++- .../sql-tests/results/datetime-legacy.sql.out | 59 +- .../resources/sql-tests/results/datetime.sql.out | 59 +- .../results/timestampNTZ/datetime.sql.out | 59 +- 8 files changed, 374 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index fcc0220..4fd871d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -552,6 +552,8 @@ object FunctionRegistry { expression[TimeWindow]("window"), expression[MakeDate]("make_date"), expression[MakeTimestamp]("make_timestamp"), +expression[MakeTimestampNTZ]("make_timestamp_ntz", true), +expression[MakeTimestampLTZ]("make_timestamp_ltz", true), expression[MakeInterval]("make_interval"), expression[MakeDTInterval]("make_dt_interval"), expression[MakeYMInterval]("make_ym_interval"), diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 0ebeacb..2840b18 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -2272,6 +2272,128 @@ case class MakeDate( // scalastyle:off line.size.limit @ExpressionDescription( + usage = "_FUNC_(year, month, day, hour, min, sec) - Create local date-time 
from year, month, day, hour, min, sec fields. ", + arguments = """ +Arguments: + * year - the year to represent, from 1 to + * month - the month-of-year to represent, from 1 (January) to 12 (December) + * day - the day-of-month to represent, from 1 to 31 + * hour - the hour-of-day to represent, from 0 to 23 + * min - the minute-of-hour to represent, from 0 to 59 + * sec - the second-of-minute and its micro-fraction to represent, from + 0 to 60. If the sec argument equals to 60, the seconds field is set + to 0 and 1 minute is added to the final timestamp. + """, + examples = """ +Examples: + > SELECT _FUNC_(2014, 12, 28, 6, 30, 45.887); + 2014-12-28 06:30:45.887 + > SELECT _FUNC_(2019, 6, 30, 23, 59, 60); + 2019-07-01 00:00:00 + > SELECT _FUNC_(null, 7, 22, 15, 30, 0); + NULL + """, + group = "datetime_funcs", + since = "3.2.0") +// sca
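The argument documentation above specifies an unusual edge case: `sec` may be 60, in which case the seconds field is set to 0 and one minute is added, as the `make_timestamp_ntz(2019, 6, 30, 23, 59, 60)` → `2019-07-01 00:00:00` example shows. A sketch of that documented behavior (illustrative Python, not Spark's implementation):

```python
from datetime import datetime, timedelta

# Sketch of the documented sec-argument semantics: fractional seconds become
# microseconds, and sec == 60 rolls over into the next minute.
def make_timestamp_ntz(year, month, day, hour, minute, sec):
    if sec == 60:
        # "the seconds field is set to 0 and 1 minute is added"
        return datetime(year, month, day, hour, minute, 0) + timedelta(minutes=1)
    whole = int(sec)
    micros = round((sec - whole) * 1_000_000)
    return datetime(year, month, day, hour, minute, whole, micros)

print(make_timestamp_ntz(2019, 6, 30, 23, 59, 60))  # 2019-07-01 00:00:00
```

Note the rollover can cross a day, month, or year boundary, which is why the example lands on July 1st.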
[spark] branch branch-3.2 updated: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new fba3e90 [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz fba3e90 is described below commit fba3e90863e584361b2f61828a500c96d89c35de Author: Gengliang Wang AuthorDate: Mon Jul 12 22:44:26 2021 +0300 [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz ### What changes were proposed in this pull request? Support new functions make_timestamp_ntz and make_timestamp_ltz Syntax: * `make_timestamp_ntz(year, month, day, hour, min, sec)`: Create local date-time from year, month, day, hour, min, sec fields * `make_timestamp_ltz(year, month, day, hour, min, sec[, timezone])`: Create current timestamp with local time zone from year, month, day, hour, min, sec and timezone fields ### Why are the changes needed? As the result of `make_timestamp` becomes consistent with the SQL configuration `spark.sql.timestampType`, we need these two new functions to construct timestamp literals. They align to the functions [`make_timestamp` and `make_timestamptz`](https://www.postgresql.org/docs/9.4/functions-datetime.html) in PostgreSQL ### Does this PR introduce _any_ user-facing change? Yes, two new datetime functions: make_timestamp_ntz and make_timestamp_ltz. ### How was this patch tested? End-to-end tests. Closes #33299 from gengliangwang/make_timestamp_ntz_ltz. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk (cherry picked from commit 92bf83ed0aba2c399debb1db5fb88bad3961ab06) Signed-off-by: Max Gekk --- .../sql/catalyst/analysis/FunctionRegistry.scala | 2 + .../catalyst/expressions/datetimeExpressions.scala | 122 + .../sql-functions/sql-expression-schema.md | 6 +- .../test/resources/sql-tests/inputs/datetime.sql | 12 ++ .../sql-tests/results/ansi/datetime.sql.out| 61 ++- .../sql-tests/results/datetime-legacy.sql.out | 59 +- .../resources/sql-tests/results/datetime.sql.out | 59 +- .../results/timestampNTZ/datetime.sql.out | 59 +- 8 files changed, 374 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index fcc0220..4fd871d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -552,6 +552,8 @@ object FunctionRegistry { expression[TimeWindow]("window"), expression[MakeDate]("make_date"), expression[MakeTimestamp]("make_timestamp"), +expression[MakeTimestampNTZ]("make_timestamp_ntz", true), +expression[MakeTimestampLTZ]("make_timestamp_ltz", true), expression[MakeInterval]("make_interval"), expression[MakeDTInterval]("make_dt_interval"), expression[MakeYMInterval]("make_ym_interval"), diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 0ebeacb..2840b18 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -2272,6 +2272,128 @@ case class MakeDate( // scalastyle:off line.size.limit 
@ExpressionDescription( + usage = "_FUNC_(year, month, day, hour, min, sec) - Create local date-time from year, month, day, hour, min, sec fields. ", + arguments = """ +Arguments: + * year - the year to represent, from 1 to + * month - the month-of-year to represent, from 1 (January) to 12 (December) + * day - the day-of-month to represent, from 1 to 31 + * hour - the hour-of-day to represent, from 0 to 23 + * min - the minute-of-hour to represent, from 0 to 59 + * sec - the second-of-minute and its micro-fraction to represent, from + 0 to 60. If the sec argument equals to 60, the seconds field is set + to 0 and 1 minute is added to the final timestamp. + """, + examples = """ +Examples: + > SELECT _FUNC_(2014, 12, 28, 6, 30, 45.887); + 2014-12-28 06:30:45.887 + > SELECT _FUNC_(2019, 6, 30, 23, 59, 60); + 2019-07-01 00:00:00 + > SELECT _FUNC_(null, 7, 22, 15, 30, 0);
[spark] branch master updated: [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5f44acf [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ 5f44acf is described below commit 5f44acff3df51721fe891ea50c0a5bcf3a37a719 Author: Gengliang Wang AuthorDate: Mon Jul 5 22:30:44 2021 +0300 [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ ### What changes were proposed in this pull request? Support new keyword TIMESTAMP_NTZ, which can be used for: - timestamp without time zone data type in DDL - timestamp without time zone data type in Cast clause. - timestamp without time zone data type literal ### Why are the changes needed? Users can use `TIMESTAMP_NTZ` in DDL/Cast/Literals for the timestamp without time zone type directly. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type is not released yet. ### How was this patch tested? Unit test Closes #33221 from gengliangwang/timstamp_ntz. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 4 .../org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala| 1 + .../org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala | 3 +++ 3 files changed, 8 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 5b9107f..c650cf0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -2125,6 +2125,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg val zoneId = getZoneId(conf.sessionLocalTimeZone) val specialDate = convertSpecialDate(value, zoneId).map(Literal(_, DateType)) specialDate.getOrElse(toLiteral(stringToDate, DateType)) +case "TIMESTAMP_NTZ" => + val specialTs = convertSpecialTimestampNTZ(value).map(Literal(_, TimestampNTZType)) + specialTs.getOrElse(toLiteral(stringToTimestampWithoutTimeZone, TimestampNTZType)) case "TIMESTAMP" => def constructTimestampLTZLiteral(value: String): Literal = { val zoneId = getZoneId(conf.sessionLocalTimeZone) @@ -2525,6 +2528,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case ("double", Nil) => DoubleType case ("date", Nil) => DateType case ("timestamp", Nil) => SQLConf.get.timestampType + case ("timestamp_ntz", Nil) => TimestampNTZType case ("string", Nil) => StringType case ("character" | "char", length :: Nil) => CharType(length.getText.toInt) case ("varchar", length :: Nil) => VarcharType(length.getText.toInt) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala index a6b78e0..d34 100644 --- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala @@ -58,6 +58,7 @@ class DataTypeParserSuite extends SparkFunSuite with SQLHelper { checkDataType("deC", DecimalType.USER_DEFAULT) checkDataType("DATE", DateType) checkDataType("timestamp", TimestampType) + checkDataType("timestamp_ntz", TimestampNTZType) checkDataType("string", StringType) checkDataType("ChaR(5)", CharType(5)) checkDataType("ChaRacter(5)", CharType(5)) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala index 37e2d9b..7b13fa9 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala @@ -465,6 +465,9 @@ class ExpressionParserSuite extends AnalysisTest { intercept("timestamP '2016-33-11 20:54:00.000'", "Cannot parse the TIMESTAMP value") // Timestamp without time zone +assertEqual("tImEstAmp_Ntz '2016-03-11 20:54:00.000'", + Literal(LocalDateTime.parse("2016-03-11T20:54:00.000"))) +intercept("tImE
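The parser test above (`tImEstAmp_Ntz '2016-03-11 20:54:00.000'` → `Literal(LocalDateTime.parse(...))`) shows what a `TIMESTAMP_NTZ` literal denotes: a wall-clock date-time with no zone attached, so parsing never consults the session time zone. A sketch of that semantics (illustrative Python; Spark's parser actually builds a `LocalDateTime`-backed literal in Scala):

```python
from datetime import datetime

# A TIMESTAMP_NTZ literal is just the wall-clock fields; the result carries
# no tzinfo, matching java.time.LocalDateTime on the Scala side.
def parse_timestamp_ntz_literal(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")

ntz = parse_timestamp_ntz_literal("2016-03-11 20:54:00.000")
```

Because the keyword is non-reserved, `timestamp_ntz` also remains usable as an identifier elsewhere in SQL.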
[spark] branch branch-3.2 updated: [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 1ec37dd [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ 1ec37dd is described below commit 1ec37dd164ae64c78bdccd8c0604b4013a692015 Author: Gengliang Wang AuthorDate: Mon Jul 5 22:30:44 2021 +0300 [SPARK-35977][SQL] Support non-reserved keyword TIMESTAMP_NTZ ### What changes were proposed in this pull request? Support new keyword TIMESTAMP_NTZ, which can be used for: - timestamp without time zone data type in DDL - timestamp without time zone data type in Cast clause. - timestamp without time zone data type literal ### Why are the changes needed? Users can use `TIMESTAMP_NTZ` in DDL/Cast/Literals for the timestamp without time zone type directly. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type is not released yet. ### How was this patch tested? Unit test Closes #33221 from gengliangwang/timstamp_ntz. 
Authored-by: Gengliang Wang Signed-off-by: Max Gekk (cherry picked from commit 5f44acff3df51721fe891ea50c0a5bcf3a37a719) Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 4 .../org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala| 1 + .../org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala | 3 +++ 3 files changed, 8 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 5b9107f..c650cf0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -2125,6 +2125,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg val zoneId = getZoneId(conf.sessionLocalTimeZone) val specialDate = convertSpecialDate(value, zoneId).map(Literal(_, DateType)) specialDate.getOrElse(toLiteral(stringToDate, DateType)) +case "TIMESTAMP_NTZ" => + val specialTs = convertSpecialTimestampNTZ(value).map(Literal(_, TimestampNTZType)) + specialTs.getOrElse(toLiteral(stringToTimestampWithoutTimeZone, TimestampNTZType)) case "TIMESTAMP" => def constructTimestampLTZLiteral(value: String): Literal = { val zoneId = getZoneId(conf.sessionLocalTimeZone) @@ -2525,6 +2528,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case ("double", Nil) => DoubleType case ("date", Nil) => DateType case ("timestamp", Nil) => SQLConf.get.timestampType + case ("timestamp_ntz", Nil) => TimestampNTZType case ("string", Nil) => StringType case ("character" | "char", length :: Nil) => CharType(length.getText.toInt) case ("varchar", length :: Nil) => VarcharType(length.getText.toInt) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala index a6b78e0..d34 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala @@ -58,6 +58,7 @@ class DataTypeParserSuite extends SparkFunSuite with SQLHelper { checkDataType("deC", DecimalType.USER_DEFAULT) checkDataType("DATE", DateType) checkDataType("timestamp", TimestampType) + checkDataType("timestamp_ntz", TimestampNTZType) checkDataType("string", StringType) checkDataType("ChaR(5)", CharType(5)) checkDataType("ChaRacter(5)", CharType(5)) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala index 37e2d9b..7b13fa9 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala @@ -465,6 +465,9 @@ class ExpressionParserSuite extends AnalysisTest { intercept("timestamP '2016-33-11 20:54:00.000'", "Cannot parse the TIMESTAMP value") // Timestamp without time zone +assertEqual("tImEstAmp_Ntz '2016-03-11
[spark] branch branch-3.2 updated: [SPARK-35735][SQL][FOLLOWUP] Fix case minute to second regex can cover by hour to minute and unit case-sensitive issue
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new dd038aa [SPARK-35735][SQL][FOLLOWUP] Fix case minute to second regex can cover by hour to minute and unit case-sensitive issue dd038aa is described below commit dd038aacd454e1a70ef2b2598fb504d8ad04b82a Author: Angerszh AuthorDate: Wed Jul 7 12:37:19 2021 +0300 [SPARK-35735][SQL][FOLLOWUP] Fix case minute to second regex can cover by hour to minute and unit case-sensitive issue ### What changes were proposed in this pull request? When cast `10:10` to interval minute to second, it can be catch by hour to minute regex, here to fix this. ### Why are the changes needed? Fix bug ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #33242 from AngersZh/SPARK-35735-FOLLOWUP. Authored-by: Angerszh Signed-off-by: Max Gekk (cherry picked from commit 3953754f36656e1a0bee16b89fae0142f172a91a) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 92 +++--- .../sql/catalyst/expressions/CastSuiteBase.scala | 4 + 2 files changed, 48 insertions(+), 48 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index b174165..ad87f2a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.util import java.time.{Duration, Period} import java.time.temporal.ChronoUnit +import java.util.Locale import java.util.concurrent.TimeUnit import scala.collection.mutable @@ -173,15 +174,14 @@ object IntervalUtils { startField: Byte, endField: Byte): Int = { -def 
checkYMIntervalStringDataType(ym: YM): Unit = - checkIntervalStringDataType(input, startField, endField, ym) +def checkTargetType(targetStartField: Byte, targetEndField: Byte): Boolean = + startField == targetStartField && endField == targetEndField input.trimAll().toString match { - case yearMonthRegex(sign, year, month) => -checkYMIntervalStringDataType(YM(YM.YEAR, YM.MONTH)) + case yearMonthRegex(sign, year, month) if checkTargetType(YM.YEAR, YM.MONTH) => toYMInterval(year, month, finalSign(sign)) - case yearMonthLiteralRegex(firstSign, secondSign, year, month) => -checkYMIntervalStringDataType(YM(YM.YEAR, YM.MONTH)) + case yearMonthLiteralRegex(firstSign, secondSign, year, month) +if checkTargetType(YM.YEAR, YM.MONTH) => toYMInterval(year, month, finalSign(firstSign, secondSign)) case yearMonthIndividualRegex(firstSign, value) => safeToInterval("year-month") { @@ -195,15 +195,16 @@ object IntervalUtils { input, startField, endField, "year-month", YM(startField, endField).typeName) } } - case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, suffix) => + case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, unit) => safeToInterval("year-month") { val sign = finalSign(firstSign, secondSign) - if ("YEAR".equalsIgnoreCase(suffix)) { -checkYMIntervalStringDataType(YM(YM.YEAR, YM.YEAR)) -sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR) - } else { -checkYMIntervalStringDataType(YM(YM.MONTH, YM.MONTH)) -Math.toIntExact(sign * value.toLong) + unit.toUpperCase(Locale.ROOT) match { +case "YEAR" if checkTargetType(YM.YEAR, YM.YEAR) => + sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR) +case "MONTH" if checkTargetType(YM.MONTH, YM.MONTH) => + Math.toIntExact(sign * value.toLong) +case _ => throwIllegalIntervalFormatException(input, startField, endField, + "year-month", YM(startField, endField).typeName) } } case _ => throwIllegalIntervalFormatException(input, startField, endField, @@ -303,48 +304,45 @@ object IntervalUtils { } } -def 
checkDTIntervalStringDataType(dt: DT): Unit = - checkIntervalStringDataType(input, startField, endField, dt, Some(fallbackNotice)) +def checkTargetType(targetStartField: Byte, targetEndField: Byte): Boolean = + startField == targetStartField && endField == targetEndField input.trimAll().toString match { - case day
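The shape of the bug above is easy to reproduce outside Spark: an `n:n` input matches both the hour-to-minute and the minute-to-second patterns, so correctness requires dispatching on the requested target fields rather than on whichever regex is tried first. A hypothetical Python miniature of the guarded dispatch (all names invented for illustration, not Spark's API):

```python
import re

# Both HOUR TO MINUTE and MINUTE TO SECOND inputs look like "n:n",
# so the regex alone cannot decide which interval was meant.
PAIR = re.compile(r"([+-])?(\d+):(\d+)$")

def parse_interval(text, start_field, end_field):
    """Return total seconds for an 'n:n' input, guarded by the target fields."""
    m = PAIR.match(text.strip())
    if not m:
        raise ValueError(f"cannot parse {text!r}")
    sign = -1 if m.group(1) == "-" else 1
    a, b = int(m.group(2)), int(m.group(3))
    # The fix: dispatch on the requested target type, not on match order.
    if (start_field, end_field) == ("HOUR", "MINUTE"):
        return sign * (a * 3600 + b * 60)
    if (start_field, end_field) == ("MINUTE", "SECOND"):
        return sign * (a * 60 + b)
    raise ValueError(f"interval {text!r} does not match {start_field} TO {end_field}")

# "10:10" as MINUTE TO SECOND must not be interpreted as hours:minutes.
print(parse_interval("10:10", "MINUTE", "SECOND"))  # 610
print(parse_interval("10:10", "HOUR", "MINUTE"))    # 36600
```

This mirrors the `checkTargetType` guard the patch adds to each `case` arm.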
[spark] branch branch-3.2 updated: [SPARK-36017][SQL] Support TimestampNTZType in expression ApproximatePercentile
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 25ea296 [SPARK-36017][SQL] Support TimestampNTZType in expression ApproximatePercentile 25ea296 is described below commit 25ea296c3c4295d78a573d98edbdd8f1de0e0447 Author: gengjiaan AuthorDate: Wed Jul 7 12:41:11 2021 +0300 [SPARK-36017][SQL] Support TimestampNTZType in expression ApproximatePercentile ### What changes were proposed in this pull request? The current `ApproximatePercentile` supports `TimestampType`, but does not yet support timestamp without time zone. This PR adds that support. ### Why are the changes needed? `ApproximatePercentile` needs to support `TimestampNTZType`. ### Does this PR introduce _any_ user-facing change? Yes. `ApproximatePercentile` accepts `TimestampNTZType`. ### How was this patch tested? New tests. Closes #33241 from beliefer/SPARK-36017.
Authored-by: gengjiaan Signed-off-by: Max Gekk (cherry picked from commit cc4463e818749faaf648ec71699d1e2fd3828c3f) Signed-off-by: Max Gekk --- .../expressions/aggregate/ApproximatePercentile.scala | 10 +- .../apache/spark/sql/ApproximatePercentileQuerySuite.scala | 14 +- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 78e64bf..8cce79c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -92,9 +92,9 @@ case class ApproximatePercentile( private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue override def inputTypes: Seq[AbstractDataType] = { -// Support NumericType, DateType and TimestampType since their internal types are all numeric, -// and can be easily cast to double for processing. -Seq(TypeCollection(NumericType, DateType, TimestampType), +// Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types +// are all numeric, and can be easily cast to double for processing. 
+Seq(TypeCollection(NumericType, DateType, TimestampType, TimestampNTZType), TypeCollection(DoubleType, ArrayType(DoubleType, containsNull = false)), IntegralType) } @@ -139,7 +139,7 @@ case class ApproximatePercentile( // Convert the value to a double value val doubleValue = child.dataType match { case DateType => value.asInstanceOf[Int].toDouble -case TimestampType => value.asInstanceOf[Long].toDouble +case TimestampType | TimestampNTZType => value.asInstanceOf[Long].toDouble case n: NumericType => n.numeric.toDouble(value.asInstanceOf[n.InternalType]) case other: DataType => throw QueryExecutionErrors.dataTypeUnexpectedError(other) @@ -158,7 +158,7 @@ case class ApproximatePercentile( val doubleResult = buffer.getPercentiles(percentages) val result = child.dataType match { case DateType => doubleResult.map(_.toInt) - case TimestampType => doubleResult.map(_.toLong) + case TimestampType | TimestampNTZType => doubleResult.map(_.toLong) case ByteType => doubleResult.map(_.toByte) case ShortType => doubleResult.map(_.toShort) case IntegerType => doubleResult.map(_.toInt) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 4991e39..5ff15c9 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql import java.sql.{Date, Timestamp} +import java.time.LocalDateTime import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY @@ -89,23 +90,26 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession test("percentile_approx, different column types") { withTempView(table) { val intSeq = 1 to 1000 - val data: 
Seq[(java.math.BigDecimal, Date, Timestamp)] = intSeq.map { i => -(new java.math.BigDecimal(i), DateTimeUtils.toJavaDate(i), DateTimeUtils.toJavaTimestamp(i)) + val data: Seq[(java.m
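Conceptually, the change works because `TimestampNTZType`, like `TimestampType`, is stored internally as a long count of microseconds, so values can be fed to the percentile digest as doubles and the double result cast back at the end. The sketch below computes an exact percentile over epoch microseconds; Spark actually uses an approximate QuantileSummaries sketch, and all names here are illustrative:

```python
from datetime import datetime, timezone

MICROS = 1_000_000

def percentile_over_timestamps(timestamps, p):
    """Illustrative exact percentile; Spark's version is approximate."""
    # Both timestamp types are internally a long count of microseconds,
    # so they can be fed to the digest as doubles ...
    micros = sorted(int(ts.timestamp() * MICROS) for ts in timestamps)
    k = int(round(p * (len(micros) - 1)))
    # ... and the double result is cast back to a long / timestamp at the end.
    return datetime.fromtimestamp(micros[k] / MICROS, tz=timezone.utc)

data = [datetime(2021, 7, 7, h, tzinfo=timezone.utc) for h in range(24)]
print(percentile_over_timestamps(data, 0.5))  # 2021-07-07 12:00:00+00:00
```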
[spark] branch master updated: [SPARK-35735][SQL][FOLLOWUP] Fix case minute to second regex can cover by hour to minute and unit case-sensitive issue
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3953754 [SPARK-35735][SQL][FOLLOWUP] Fix case minute to second regex can cover by hour to minute and unit case-sensitive issue 3953754 is described below commit 3953754f36656e1a0bee16b89fae0142f172a91a Author: Angerszh AuthorDate: Wed Jul 7 12:37:19 2021 +0300 [SPARK-35735][SQL][FOLLOWUP] Fix case minute to second regex can cover by hour to minute and unit case-sensitive issue ### What changes were proposed in this pull request? When casting `10:10` to an interval minute to second, it can be caught by the hour-to-minute regex; this PR fixes that. ### Why are the changes needed? Fixes a bug. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added UT Closes #33242 from AngersZh/SPARK-35735-FOLLOWUP. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 92 +++--- .../sql/catalyst/expressions/CastSuiteBase.scala | 4 + 2 files changed, 48 insertions(+), 48 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index b174165..ad87f2a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.util import java.time.{Duration, Period} import java.time.temporal.ChronoUnit +import java.util.Locale import java.util.concurrent.TimeUnit import scala.collection.mutable @@ -173,15 +174,14 @@ object IntervalUtils { startField: Byte, endField: Byte): Int = { -def checkYMIntervalStringDataType(ym: YM): Unit = - checkIntervalStringDataType(input, startField, endField,
ym) +def checkTargetType(targetStartField: Byte, targetEndField: Byte): Boolean = + startField == targetStartField && endField == targetEndField input.trimAll().toString match { - case yearMonthRegex(sign, year, month) => -checkYMIntervalStringDataType(YM(YM.YEAR, YM.MONTH)) + case yearMonthRegex(sign, year, month) if checkTargetType(YM.YEAR, YM.MONTH) => toYMInterval(year, month, finalSign(sign)) - case yearMonthLiteralRegex(firstSign, secondSign, year, month) => -checkYMIntervalStringDataType(YM(YM.YEAR, YM.MONTH)) + case yearMonthLiteralRegex(firstSign, secondSign, year, month) +if checkTargetType(YM.YEAR, YM.MONTH) => toYMInterval(year, month, finalSign(firstSign, secondSign)) case yearMonthIndividualRegex(firstSign, value) => safeToInterval("year-month") { @@ -195,15 +195,16 @@ object IntervalUtils { input, startField, endField, "year-month", YM(startField, endField).typeName) } } - case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, suffix) => + case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, unit) => safeToInterval("year-month") { val sign = finalSign(firstSign, secondSign) - if ("YEAR".equalsIgnoreCase(suffix)) { -checkYMIntervalStringDataType(YM(YM.YEAR, YM.YEAR)) -sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR) - } else { -checkYMIntervalStringDataType(YM(YM.MONTH, YM.MONTH)) -Math.toIntExact(sign * value.toLong) + unit.toUpperCase(Locale.ROOT) match { +case "YEAR" if checkTargetType(YM.YEAR, YM.YEAR) => + sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR) +case "MONTH" if checkTargetType(YM.MONTH, YM.MONTH) => + Math.toIntExact(sign * value.toLong) +case _ => throwIllegalIntervalFormatException(input, startField, endField, + "year-month", YM(startField, endField).typeName) } } case _ => throwIllegalIntervalFormatException(input, startField, endField, @@ -303,48 +304,45 @@ object IntervalUtils { } } -def checkDTIntervalStringDataType(dt: DT): Unit = - checkIntervalStringDataType(input, startField, endField, dt, 
Some(fallbackNotice)) +def checkTargetType(targetStartField: Byte, targetEndField: Byte): Boolean = + startField == targetStartField && endField == targetEndField input.trimAll().toString match { - case dayHourRegex(sign, day, hour) => -checkDTIntervalStringDataType(DT(DT.DAY, DT.HOUR)) + case dayH
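The second half of the fix normalizes the unit string with `toUpperCase(Locale.ROOT)` before matching, again guarded by the target type. A small Python analogue of the year/month unit branch (hypothetical names, for illustration only):

```python
MONTHS_PER_YEAR = 12

def parse_unit_interval(value, unit, start_field, end_field):
    """Case-insensitive unit match, guarded by the target (start, end) fields."""
    u = unit.upper()  # analogue of unit.toUpperCase(Locale.ROOT)
    if u == "YEAR" and (start_field, end_field) == ("YEAR", "YEAR"):
        return int(value) * MONTHS_PER_YEAR   # total months
    if u == "MONTH" and (start_field, end_field) == ("MONTH", "MONTH"):
        return int(value)
    raise ValueError(
        f"interval '{value} {unit}' does not match {start_field} TO {end_field}")

print(parse_unit_interval("3", "year", "YEAR", "YEAR"))  # 36
```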
[spark] branch master updated: [SPARK-36017][SQL] Support TimestampNTZType in expression ApproximatePercentile
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cc4463e [SPARK-36017][SQL] Support TimestampNTZType in expression ApproximatePercentile cc4463e is described below commit cc4463e818749faaf648ec71699d1e2fd3828c3f Author: gengjiaan AuthorDate: Wed Jul 7 12:41:11 2021 +0300 [SPARK-36017][SQL] Support TimestampNTZType in expression ApproximatePercentile ### What changes were proposed in this pull request? The current `ApproximatePercentile` supports `TimestampType`, but does not yet support timestamp without time zone. This PR adds that support. ### Why are the changes needed? `ApproximatePercentile` needs to support `TimestampNTZType`. ### Does this PR introduce _any_ user-facing change? Yes. `ApproximatePercentile` accepts `TimestampNTZType`. ### How was this patch tested? New tests. Closes #33241 from beliefer/SPARK-36017.
Authored-by: gengjiaan Signed-off-by: Max Gekk --- .../expressions/aggregate/ApproximatePercentile.scala | 10 +- .../apache/spark/sql/ApproximatePercentileQuerySuite.scala | 14 +- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala index 78e64bf..8cce79c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala @@ -92,9 +92,9 @@ case class ApproximatePercentile( private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue override def inputTypes: Seq[AbstractDataType] = { -// Support NumericType, DateType and TimestampType since their internal types are all numeric, -// and can be easily cast to double for processing. -Seq(TypeCollection(NumericType, DateType, TimestampType), +// Support NumericType, DateType, TimestampType and TimestampNTZType since their internal types +// are all numeric, and can be easily cast to double for processing. 
+Seq(TypeCollection(NumericType, DateType, TimestampType, TimestampNTZType), TypeCollection(DoubleType, ArrayType(DoubleType, containsNull = false)), IntegralType) } @@ -139,7 +139,7 @@ case class ApproximatePercentile( // Convert the value to a double value val doubleValue = child.dataType match { case DateType => value.asInstanceOf[Int].toDouble -case TimestampType => value.asInstanceOf[Long].toDouble +case TimestampType | TimestampNTZType => value.asInstanceOf[Long].toDouble case n: NumericType => n.numeric.toDouble(value.asInstanceOf[n.InternalType]) case other: DataType => throw QueryExecutionErrors.dataTypeUnexpectedError(other) @@ -158,7 +158,7 @@ case class ApproximatePercentile( val doubleResult = buffer.getPercentiles(percentages) val result = child.dataType match { case DateType => doubleResult.map(_.toInt) - case TimestampType => doubleResult.map(_.toLong) + case TimestampType | TimestampNTZType => doubleResult.map(_.toLong) case ByteType => doubleResult.map(_.toByte) case ShortType => doubleResult.map(_.toShort) case IntegerType => doubleResult.map(_.toInt) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala index 4991e39..5ff15c9 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql import java.sql.{Date, Timestamp} +import java.time.LocalDateTime import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY @@ -89,23 +90,26 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession test("percentile_approx, different column types") { withTempView(table) { val intSeq = 1 to 1000 - val data: 
Seq[(java.math.BigDecimal, Date, Timestamp)] = intSeq.map { i => -(new java.math.BigDecimal(i), DateTimeUtils.toJavaDate(i), DateTimeUtils.toJavaTimestamp(i)) + val data: Seq[(java.math.BigDecimal, Date, Timestamp, LocalDateTime)] = intSeq.map { i => +(new ja
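The tail of the diff above casts the digest's double output back to the column's type (`toLong` for both timestamp types). A toy Python dispatcher showing that cast-back step (illustrative only, not Spark code):

```python
def cast_result(double_result, data_type):
    # Sketch of the eval tail: the digest yields doubles, which are
    # cast back per column type; both timestamp types go back to a long.
    casts = {
        "DateType": int,
        "TimestampType": int,      # back to epoch microseconds (a long in Spark)
        "TimestampNTZType": int,   # same internal long representation
        "DoubleType": float,
    }
    return [casts[data_type](v) for v in double_result]

print(cast_result([1.0, 2.5], "TimestampNTZType"))  # [1, 2]
```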
[spark] branch branch-3.2 updated: [SPARK-36054][SQL] Support group by TimestampNTZ type column
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new ae62c9d [SPARK-36054][SQL] Support group by TimestampNTZ type column ae62c9d is described below commit ae62c9d7726e2b05897f7e807bb9cdcc2748e3fa Author: Gengliang Wang AuthorDate: Thu Jul 8 22:33:25 2021 +0300 [SPARK-36054][SQL] Support group by TimestampNTZ type column ### What changes were proposed in this pull request? Support group by TimestampNTZ type column ### Why are the changes needed? It's a basic SQL operation. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type is not released yet. ### How was this patch tested? Unit test Closes #33268 from gengliangwang/agg. Authored-by: Gengliang Wang Signed-off-by: Max Gekk (cherry picked from commit 382b66e26725e4667607303ebb9803b05e8076bc) Signed-off-by: Max Gekk --- .../org/apache/spark/sql/catalyst/expressions/hash.scala| 2 +- .../spark/sql/execution/vectorized/OffHeapColumnVector.java | 3 ++- .../spark/sql/execution/vectorized/OnHeapColumnVector.java | 3 ++- .../spark/sql/execution/aggregate/HashMapGenerator.scala| 2 +- .../org/apache/spark/sql/DataFrameAggregateSuite.scala | 13 - 5 files changed, 18 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala index d730586..3785262 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala @@ -490,7 +490,7 @@ abstract class HashExpression[E] extends Expression { case BooleanType => genHashBoolean(input, result) case ByteType | ShortType | IntegerType | DateType => genHashInt(input, result) case LongType => genHashLong(input,
result) -case TimestampType => genHashTimestamp(input, result) +case TimestampType | TimestampNTZType => genHashTimestamp(input, result) case FloatType => genHashFloat(input, result) case DoubleType => genHashDouble(input, result) case d: DecimalType => genHashDecimal(ctx, d, input, result) diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java index 7da5a28..b4b6903 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java @@ -553,7 +553,8 @@ public final class OffHeapColumnVector extends WritableColumnVector { type instanceof DateType || DecimalType.is32BitDecimalType(type)) { this.data = Platform.reallocateMemory(data, oldCapacity * 4L, newCapacity * 4L); } else if (type instanceof LongType || type instanceof DoubleType || -DecimalType.is64BitDecimalType(type) || type instanceof TimestampType) { +DecimalType.is64BitDecimalType(type) || type instanceof TimestampType || +type instanceof TimestampNTZType) { this.data = Platform.reallocateMemory(data, oldCapacity * 8L, newCapacity * 8L); } else if (childColumns != null) { // Nothing to store. 
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 5942c5f..3fb96d8 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -547,7 +547,8 @@ public final class OnHeapColumnVector extends WritableColumnVector { if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity); intData = newData; } -} else if (type instanceof LongType || type instanceof TimestampType || +} else if (type instanceof LongType || +type instanceof TimestampType ||type instanceof TimestampNTZType || DecimalType.is64BitDecimalType(type) || type instanceof DayTimeIntervalType) { if (longData == null || longData.length < newCapacity) { long[] newData = new long[newCapacity]; diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashMapGenerator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashMapGenerator.scala index b3f5e34..713e7db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashMapGenerator.scala +++ b/
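The hash.scala hunk above is the heart of the change: `TimestampNTZType` shares `TimestampType`'s 8-byte long representation, so grouping can reuse the existing long hash. A toy Python dispatcher (the hash itself is a made-up stand-in, not Spark's Murmur3/xxHash64) shows the shape of the dispatch:

```python
def hash_long(v, seed=42):
    # Toy stand-in for a 64-bit hash; Spark actually uses Murmur3 or xxHash64.
    return (v * 0x9E3779B97F4A7C15 + seed) & 0xFFFFFFFFFFFFFFFF

def hash_value(value, data_type):
    # TimestampType and TimestampNTZType share the long (microseconds)
    # representation, so both route to the same long hash -- the gist of
    # the genHashTimestamp dispatch change above.
    if data_type in ("LongType", "TimestampType", "TimestampNTZType"):
        return hash_long(value)
    if data_type in ("IntegerType", "DateType"):
        return hash_long(value)  # widened to long in this toy version
    raise TypeError(f"unsupported type: {data_type}")

print(hash_value(42, "TimestampNTZType") == hash_value(42, "TimestampType"))  # True
```

Equal long values therefore land in the same group regardless of which timestamp type produced them.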
[spark] branch master updated (819c482 -> 382b66e)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 819c482 [SPARK-35340][PYTHON] Standardize TypeError messages for unsupported basic operations add 382b66e [SPARK-36054][SQL] Support group by TimestampNTZ type column No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/expressions/hash.scala| 2 +- .../spark/sql/execution/vectorized/OffHeapColumnVector.java | 3 ++- .../spark/sql/execution/vectorized/OnHeapColumnVector.java | 3 ++- .../spark/sql/execution/aggregate/HashMapGenerator.scala| 2 +- .../org/apache/spark/sql/DataFrameAggregateSuite.scala | 13 - 5 files changed, 18 insertions(+), 5 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36049][SQL] Remove IntervalUnit
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 2f54d9e [SPARK-36049][SQL] Remove IntervalUnit 2f54d9e is described below commit 2f54d9eed6dce5e9fc853ef4dceb1abb2b338a34 Author: Angerszh AuthorDate: Thu Jul 8 23:02:21 2021 +0300 [SPARK-36049][SQL] Remove IntervalUnit ### What changes were proposed in this pull request? Remove IntervalUnit ### Why are the changes needed? Clean code ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not need Closes #33265 from AngersZh/SPARK-36049. Lead-authored-by: Angerszh Co-authored-by: Maxim Gekk Signed-off-by: Max Gekk (cherry picked from commit fef7e1703c342165000f89b01112a8a28a936436) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 120 - .../catalyst/parser/ExpressionParserSuite.scala| 33 ++ 2 files changed, 58 insertions(+), 95 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index 7579a28..e026266 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -41,22 +41,6 @@ object IntervalStringStyles extends Enumeration { object IntervalUtils { - object IntervalUnit extends Enumeration { -type IntervalUnit = Value - -val NANOSECOND = Value(0, "nanosecond") -val MICROSECOND = Value(1, "microsecond") -val MILLISECOND = Value(2, "millisecond") -val SECOND = Value(3, "second") -val MINUTE = Value(4, "minute") -val HOUR = Value(5, "hour") -val DAY = Value(6, "day") -val WEEK = Value(7, "week") -val MONTH = Value(8, "month") -val YEAR = Value(9, "year") - } - import IntervalUnit._ - private val 
MAX_DAY = Long.MaxValue / MICROS_PER_DAY private val MAX_HOUR = Long.MaxValue / MICROS_PER_HOUR private val MAX_MINUTE = Long.MaxValue / MICROS_PER_MINUTE @@ -97,7 +81,7 @@ object IntervalUtils { def getSeconds(interval: CalendarInterval): Decimal = getSeconds(interval.microseconds) private def toLongWithRange( - fieldName: IntervalUnit, + fieldName: UTF8String, s: String, minValue: Long, maxValue: Long): Long = { @@ -250,10 +234,11 @@ object IntervalUtils { } } - private def toYMInterval(yearStr: String, monthStr: String, sign: Int): Int = { + private def toYMInterval(year: String, month: String, sign: Int): Int = { safeToInterval("year-month") { - val years = toLongWithRange(YEAR, yearStr, 0, Integer.MAX_VALUE / MONTHS_PER_YEAR) - val totalMonths = sign * (years * MONTHS_PER_YEAR + toLongWithRange(MONTH, monthStr, 0, 11)) + val years = toLongWithRange(yearStr, year, 0, Integer.MAX_VALUE / MONTHS_PER_YEAR) + val totalMonths = +sign * (years * MONTHS_PER_YEAR + toLongWithRange(monthStr, month, 0, 11)) Math.toIntExact(totalMonths) } } @@ -402,45 +387,33 @@ object IntervalUtils { } } - def toDTInterval( - dayStr: String, - hourStr: String, - minuteStr: String, - secondStr: String, - sign: Int): Long = { + def toDTInterval(day: String, hour: String, minute: String, second: String, sign: Int): Long = { var micros = 0L -val days = toLongWithRange(DAY, dayStr, 0, MAX_DAY).toInt +val days = toLongWithRange(dayStr, day, 0, MAX_DAY).toInt micros = Math.addExact(micros, sign * days * MICROS_PER_DAY) -val hours = toLongWithRange(HOUR, hourStr, 0, 23) +val hours = toLongWithRange(hourStr, hour, 0, 23) micros = Math.addExact(micros, sign * hours * MICROS_PER_HOUR) -val minutes = toLongWithRange(MINUTE, minuteStr, 0, 59) +val minutes = toLongWithRange(minuteStr, minute, 0, 59) micros = Math.addExact(micros, sign * minutes * MICROS_PER_MINUTE) -micros = Math.addExact(micros, sign * parseSecondNano(secondStr)) +micros = Math.addExact(micros, sign * parseSecondNano(second)) micros 
} - def toDTInterval( - hourStr: String, - minuteStr: String, - secondStr: String, - sign: Int): Long = { + def toDTInterval(hour: String, minute: String, second: String, sign: Int): Long = { var micros = 0L -val hours = toLongWithRange(HOUR, hourStr, 0, MAX_HOUR) +val hours = toLongWithRange(hourStr, hour, 0, MAX_HOUR) micros = Math.addExact(micros, sign * hours * MICROS_PER_HOUR) -val minutes = toLongWithRange(MINU
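As a rough illustration of what the refactored `toDTInterval` does after the enum removal: each field is parsed with an explicit range check (days up to `Long.MaxValue / MICROS_PER_DAY`, hours 0–23, minutes and seconds 0–59) and accumulated as microseconds. This Python sketch simplifies Spark's version, which also handles fractional seconds via `parseSecondNano`:

```python
MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE
MICROS_PER_DAY = 24 * MICROS_PER_HOUR

def to_long_with_range(field_name, s, min_value, max_value):
    """Analogue of toLongWithRange: parse and enforce the field's bounds."""
    v = int(s)
    if not (min_value <= v <= max_value):
        raise ValueError(f"{field_name} {v} outside range [{min_value}, {max_value}]")
    return v

def to_dt_interval(day, hour, minute, second, sign=1):
    """Simplified day-hour-minute-second overload; whole seconds only."""
    micros = sign * to_long_with_range("day", day, 0, 106_751_991) * MICROS_PER_DAY
    micros += sign * to_long_with_range("hour", hour, 0, 23) * MICROS_PER_HOUR
    micros += sign * to_long_with_range("minute", minute, 0, 59) * MICROS_PER_MINUTE
    micros += sign * to_long_with_range("second", second, 0, 59) * MICROS_PER_SECOND
    return micros

print(to_dt_interval("1", "2", "3", "4"))  # 93784000000
```

(The day bound 106751991 is `Long.MaxValue / MICROS_PER_DAY`, matching `MAX_DAY` in the diff.)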
[spark] branch master updated (382b66e -> fef7e170)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 382b66e [SPARK-36054][SQL] Support group by TimestampNTZ type column add fef7e170 [SPARK-36049][SQL] Remove IntervalUnit No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/IntervalUtils.scala| 120 - .../catalyst/parser/ExpressionParserSuite.scala| 33 ++ 2 files changed, 58 insertions(+), 95 deletions(-)
[spark] branch master updated (87282f0 -> c8ff613)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 87282f0 [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals add c8ff613 [SPARK-35983][SQL] Allow from_json/to_json for map types where value types are day-time intervals No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/json/JacksonGenerator.scala | 9 + .../spark/sql/catalyst/json/JacksonParser.scala| 7 .../org/apache/spark/sql/JsonFunctionsSuite.scala | 44 +- 3 files changed, 59 insertions(+), 1 deletion(-)
[spark] branch branch-3.2 updated: [SPARK-35983][SQL] Allow from_json/to_json for map types where value types are day-time intervals
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 634b2e2 [SPARK-35983][SQL] Allow from_json/to_json for map types where value types are day-time intervals 634b2e2 is described below commit 634b2e265c1a95af2f2df35f8ecb9d05d4fe752f Author: Kousuke Saruta AuthorDate: Tue Jul 6 11:06:56 2021 +0300 [SPARK-35983][SQL] Allow from_json/to_json for map types where value types are day-time intervals ### What changes were proposed in this pull request? This PR fixes two issues. One is that `to_json` doesn't support `map` types where value types are `day-time` interval types like: ``` spark-sql> select to_json(map('a', interval '1 2:3:4' day to second)); 21/07/06 14:53:58 ERROR SparkSQLDriver: Failed in [select to_json(map('a', interval '1 2:3:4' day to second))] java.lang.RuntimeException: Failed to convert value 9378400 (class of class java.lang.Long) with the type of DayTimeIntervalType(0,3) to JSON. ``` The other issue is that, even once the `to_json` issue is resolved, `from_json` doesn't support converting a `day-time` interval string back from JSON. So the result of the following query will be `null`. ``` spark-sql> select from_json(to_json(map('a', interval '1 2:3:4' day to second)), 'a interval day to second'); {"a":null} ``` ### Why are the changes needed? There should be no reason why day-time intervals cannot be used as map value types. `CalendarIntervalTypes` can do it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New tests. Closes #33225 from sarutak/json-dtinterval.
Authored-by: Kousuke Saruta Signed-off-by: Max Gekk (cherry picked from commit c8ff613c3cd0d04ebfaf57feeb67e21b3c8410a2) Signed-off-by: Max Gekk --- .../spark/sql/catalyst/json/JacksonGenerator.scala | 9 + .../spark/sql/catalyst/json/JacksonParser.scala| 7 .../org/apache/spark/sql/JsonFunctionsSuite.scala | 44 +- 3 files changed, 59 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala index 9777d56..9bd0546 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala @@ -156,6 +156,15 @@ private[sql] class JacksonGenerator( end) gen.writeString(ymString) +case DayTimeIntervalType(start, end) => + (row: SpecializedGetters, ordinal: Int) => +val dtString = IntervalUtils.toDayTimeIntervalString( + row.getLong(ordinal), + IntervalStringStyles.ANSI_STYLE, + start, + end) +gen.writeString(dtString) + case BinaryType => (row: SpecializedGetters, ordinal: Int) => gen.writeBinary(row.getBinary(ordinal)) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala index 2aa735d..8a1191c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala @@ -302,6 +302,13 @@ class JacksonParser( Integer.valueOf(expr.eval(EmptyRow).asInstanceOf[Int]) } +case dt: DayTimeIntervalType => (parser: JsonParser) => + parseJsonToken[java.lang.Long](parser, dataType) { +case VALUE_STRING => + val expr = Cast(Literal(parser.getText), dt) + java.lang.Long.valueOf(expr.eval(EmptyRow).asInstanceOf[Long]) + } + case st: StructType => val fieldConverters = 
st.map(_.dataType).map(makeConverter).toArray (parser: JsonParser) => parseJsonToken[InternalRow](parser, dataType) { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala index c2bea8c..82cca2b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql import java.text.SimpleDateFormat -import java.time.Period +import java.time.{Duration, Period} import java.util.Locale import collection.JavaConverters._ @@ -28,6 +28,7 @@ import org.apache.spark.sql.functions._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSparkSession
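A minimal model of the round trip the new code enables: `to_json` formats the interval's long microsecond value as an ANSI-style `d hh:mm:ss` string, and `from_json` casts that string back to microseconds. The helpers below are illustrative Python, simplified to whole seconds (Spark's formatter also emits fractional seconds when present):

```python
import json

MICROS_PER_SECOND = 1_000_000

def to_day_time_string(micros):
    """Format microseconds as a 'd hh:mm:ss' day-time interval string."""
    sign = "-" if micros < 0 else ""
    total_seconds, _ = divmod(abs(micros), MICROS_PER_SECOND)
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{sign}{days} {hours:02d}:{minutes:02d}:{seconds:02d}"

def from_day_time_string(s):
    """Parse the string form back to microseconds (whole seconds only)."""
    sign = -1 if s.startswith("-") else 1
    days, hms = s.lstrip("+-").split(" ")
    h, m, sec = (int(x) for x in hms.split(":"))
    return sign * (int(days) * 86400 + h * 3600 + m * 60 + sec) * MICROS_PER_SECOND

# Round-trip through JSON, as to_json/from_json now do for map values.
payload = json.dumps({"a": to_day_time_string(93_784_000_000)})
print(payload)  # {"a": "1 02:03:04"}
assert from_day_time_string(json.loads(payload)["a"]) == 93_784_000_000
```

Note that 93784000000 microseconds is exactly `interval '1 2:3:4' day to second`, the value used in the PR description's example.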
[spark] branch branch-3.2 updated: [SPARK-36023][SPARK-35735][SPARK-35768][SQL] Refactor code about parse string to DT/YM
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new b53d285  [SPARK-36023][SPARK-35735][SPARK-35768][SQL] Refactor code about parse string to DT/YM
b53d285 is described below

commit b53d285f72a918abeafaf7517281d08cf57beb64
Author: Angerszh
AuthorDate: Tue Jul 6 13:51:06 2021 +0300

    [SPARK-36023][SPARK-35735][SPARK-35768][SQL] Refactor code about parse string to DT/YM

    ### What changes were proposed in this pull request?
    Refactor code about parse string to DT/YM intervals.

    ### Why are the changes needed?
    Extracting the common code about parse string to DT/YM should improve code maintenance.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existed UT.

    Closes #33217 from AngersZh/SPARK-35735-35768.

    Authored-by: Angerszh
    Signed-off-by: Max Gekk
    (cherry picked from commit 26d1bb16bc565dbcb1a3f536dc78cd87be6c2468)
    Signed-off-by: Max Gekk
---
 .../spark/sql/catalyst/util/IntervalUtils.scala  | 201 ++---
 .../sql/catalyst/expressions/CastSuiteBase.scala |  28 ++-
 2 files changed, 123 insertions(+), 106 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index 30a2fa5..b174165 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -29,7 +29,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToMicros
 import org.apache.spark.sql.catalyst.util.IntervalStringStyles.{ANSI_STYLE, HIVE_STYLE, IntervalStyle}
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.{DayTimeIntervalType => DT, Decimal, YearMonthIntervalType => YM}
+import org.apache.spark.sql.types.{DataType, DayTimeIntervalType => DT, Decimal, YearMonthIntervalType => YM}
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

 // The style of textual representation of intervals
@@ -110,7 +110,7 @@ object IntervalUtils {
   private val yearMonthIndividualLiteralRegex =
     (s"(?i)^INTERVAL\\s+([+|-])?'$yearMonthIndividualPatternString'\\s+(YEAR|MONTH)$$").r

-  private def getSign(firstSign: String, secondSign: String): Int = {
+  private def finalSign(firstSign: String, secondSign: String = null): Int = {
     (firstSign, secondSign) match {
       case ("-", "-") => 1
       case ("-", _) => -1
@@ -119,6 +119,39 @@ object IntervalUtils {
     }
   }

+  private def throwIllegalIntervalFormatException(
+      input: UTF8String,
+      startFiled: Byte,
+      endField: Byte,
+      intervalStr: String,
+      typeName: String,
+      fallBackNotice: Option[String] = None) = {
+    throw new IllegalArgumentException(
+      s"Interval string does not match $intervalStr format of " +
+        s"${supportedFormat((startFiled, endField)).map(format => s"`$format`").mkString(", ")} " +
+        s"when cast to $typeName: ${input.toString}" +
+        s"${fallBackNotice.map(s => s", $s").getOrElse("")}")
+  }
+
+  private def checkIntervalStringDataType(
+      input: UTF8String,
+      targetStartField: Byte,
+      targetEndField: Byte,
+      inputIntervalType: DataType,
+      fallBackNotice: Option[String] = None): Unit = {
+    val (intervalStr, typeName, inputStartField, inputEndField) = inputIntervalType match {
+      case DT(startField, endField) =>
+        ("day-time", DT(targetStartField, targetEndField).typeName, startField, endField)
+      case YM(startField, endField) =>
+        ("year-month", YM(targetStartField, targetEndField).typeName, startField, endField)
+    }
+    if (targetStartField != inputStartField || targetEndField != inputEndField) {
+      throwIllegalIntervalFormatException(
+        input, targetStartField, targetEndField, intervalStr, typeName, fallBackNotice)
+    }
+  }
+
+  val supportedFormat = Map(
+    (YM.YEAR, YM.MONTH) -> Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"),
+    (YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"),
@@ -140,56 +173,41 @@ object IntervalUtils {
       startField: Byte,
       endField: Byte): Int = {
-    def checkStringIntervalType(targetStartField: Byte, targetEndField: Byte): Unit = {
-      if (startField != targetStartField || endField != targetEndField) {
-
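The renamed `finalSign` helper above collapses the two possible sign markers of an interval literal (the optional sign outside the quotes and the one inside, as in `INTERVAL -'-1-2' YEAR TO MONTH`) into a single multiplier. The diff is truncated after the first two cases; the remaining two below are filled in as an assumption about the full truth table:

```scala
// Combine the outer and inner signs of an interval literal:
// two minuses cancel, a single minus negates, otherwise positive.
def finalSign(firstSign: String, secondSign: String = null): Int =
  (firstSign, secondSign) match {
    case ("-", "-") => 1   // both negative: positive overall
    case ("-", _)   => -1  // only the outer sign is negative
    case (_, "-")   => -1  // only the inner sign is negative (assumed case)
    case _          => 1   // no sign, or explicit '+' (assumed case)
  }

// finalSign("-", "-") == 1; finalSign("-") == -1; finalSign(null) == 1
```

The default `secondSign = null` lets the same helper serve the individual-field literals (e.g. `INTERVAL '1' YEAR`), which carry only one sign position.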
[spark] branch master updated: [SPARK-36023][SPARK-35735][SPARK-35768][SQL] Refactor code about parse string to DT/YM
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 26d1bb1  [SPARK-36023][SPARK-35735][SPARK-35768][SQL] Refactor code about parse string to DT/YM
26d1bb1 is described below

commit 26d1bb16bc565dbcb1a3f536dc78cd87be6c2468
Author: Angerszh
AuthorDate: Tue Jul 6 13:51:06 2021 +0300

    [SPARK-36023][SPARK-35735][SPARK-35768][SQL] Refactor code about parse string to DT/YM

    ### What changes were proposed in this pull request?
    Refactor code about parse string to DT/YM intervals.

    ### Why are the changes needed?
    Extracting the common code about parse string to DT/YM should improve code maintenance.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existed UT.

    Closes #33217 from AngersZh/SPARK-35735-35768.

    Authored-by: Angerszh
    Signed-off-by: Max Gekk
---
 .../spark/sql/catalyst/util/IntervalUtils.scala  | 201 ++---
 .../sql/catalyst/expressions/CastSuiteBase.scala |  28 ++-
 2 files changed, 123 insertions(+), 106 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index 30a2fa5..b174165 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -29,7 +29,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils.millisToMicros
 import org.apache.spark.sql.catalyst.util.IntervalStringStyles.{ANSI_STYLE, HIVE_STYLE, IntervalStyle}
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.{DayTimeIntervalType => DT, Decimal, YearMonthIntervalType => YM}
+import org.apache.spark.sql.types.{DataType, DayTimeIntervalType => DT, Decimal, YearMonthIntervalType => YM}
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

 // The style of textual representation of intervals
@@ -110,7 +110,7 @@ object IntervalUtils {
   private val yearMonthIndividualLiteralRegex =
     (s"(?i)^INTERVAL\\s+([+|-])?'$yearMonthIndividualPatternString'\\s+(YEAR|MONTH)$$").r

-  private def getSign(firstSign: String, secondSign: String): Int = {
+  private def finalSign(firstSign: String, secondSign: String = null): Int = {
     (firstSign, secondSign) match {
       case ("-", "-") => 1
       case ("-", _) => -1
@@ -119,6 +119,39 @@ object IntervalUtils {
     }
   }

+  private def throwIllegalIntervalFormatException(
+      input: UTF8String,
+      startFiled: Byte,
+      endField: Byte,
+      intervalStr: String,
+      typeName: String,
+      fallBackNotice: Option[String] = None) = {
+    throw new IllegalArgumentException(
+      s"Interval string does not match $intervalStr format of " +
+        s"${supportedFormat((startFiled, endField)).map(format => s"`$format`").mkString(", ")} " +
+        s"when cast to $typeName: ${input.toString}" +
+        s"${fallBackNotice.map(s => s", $s").getOrElse("")}")
+  }
+
+  private def checkIntervalStringDataType(
+      input: UTF8String,
+      targetStartField: Byte,
+      targetEndField: Byte,
+      inputIntervalType: DataType,
+      fallBackNotice: Option[String] = None): Unit = {
+    val (intervalStr, typeName, inputStartField, inputEndField) = inputIntervalType match {
+      case DT(startField, endField) =>
+        ("day-time", DT(targetStartField, targetEndField).typeName, startField, endField)
+      case YM(startField, endField) =>
+        ("year-month", YM(targetStartField, targetEndField).typeName, startField, endField)
+    }
+    if (targetStartField != inputStartField || targetEndField != inputEndField) {
+      throwIllegalIntervalFormatException(
+        input, targetStartField, targetEndField, intervalStr, typeName, fallBackNotice)
+    }
+  }
+
+  val supportedFormat = Map(
+    (YM.YEAR, YM.MONTH) -> Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"),
+    (YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"),
@@ -140,56 +173,41 @@ object IntervalUtils {
       startField: Byte,
       endField: Byte): Int = {
-    def checkStringIntervalType(targetStartField: Byte, targetEndField: Byte): Unit = {
-      if (startField != targetStartField || endField != targetEndField) {
-        throw new IllegalArgumentException(s"Interval string does not match year-month format of " +
-          s"
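The `supportedFormat` map introduced by this commit keys the accepted textual patterns by a `(startField, endField)` pair, and `throwIllegalIntervalFormatException` joins them into the error text. A self-contained sketch of that lookup-and-format step (only the two year-month entries visible in the truncated diff are reproduced; the literal bytes 0 and 1 are an assumption standing in for `YM.YEAR` and `YM.MONTH`, and `formatError` hardcodes the "year-month" label that the real helper receives as a parameter):

```scala
// Accepted textual formats per (startField, endField), as in the diff;
// 0 and 1 mirror YearMonthIntervalType's YEAR and MONTH field bytes.
val supportedFormat: Map[(Byte, Byte), Seq[String]] = Map(
  (0: Byte, 1: Byte) -> Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"),
  (0: Byte, 0: Byte) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"))

// Build the message the way throwIllegalIntervalFormatException does:
// backtick each supported pattern and join with ", ".
def formatError(input: String, start: Byte, end: Byte, typeName: String): String =
  s"Interval string does not match year-month format of " +
    s"${supportedFormat((start, end)).map(f => s"`$f`").mkString(", ")} " +
    s"when cast to $typeName: $input"

// formatError("1-2-3", 0, 1, "interval year to month") produces:
//   Interval string does not match year-month format of `[+|-]y-m`,
//   `INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH` when cast to
//   interval year to month: 1-2-3
```

Keying the formats by field pair is what lets a single error path cover every `YEAR`/`MONTH`/`DAY`/.../`SECOND` combination instead of the per-type `checkStringIntervalType` checks the commit removes.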
[spark] branch master updated (38ef477 -> d572a85)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 38ef477  [SPARK-32921][SHUFFLE] MapOutputTracker extensions to support push-based shuffle
     add d572a85  [SPARK-35224][SQL][TESTS] Fix buffer overflow in `MutableProjectionSuite`

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/MutableProjectionSuite.scala | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (74afc68 -> c0a3c0c)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 74afc68  [SPARK-35213][SQL] Keep the correct ordering of nested structs in chained withField operations
     add c0a3c0c  [SPARK-35088][SQL] Accept ANSI intervals by the Sequence expression

No new revisions were added by this update.

Summary of changes:
 .../expressions/collectionOperations.scala       | 194 -
 .../expressions/CollectionExpressionsSuite.scala | 176 ++-
 2 files changed, 326 insertions(+), 44 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
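SPARK-35088 lets the `Sequence` expression step dates and timestamps by ANSI interval values, e.g. `sequence(date'2021-01-01', date'2021-04-01', interval '1' month)` in SQL. The year-month semantics can be sketched outside Spark with `java.time` (`dateSequence` is a hypothetical helper, not part of the Spark API, and assumes a positive step with start on or before stop):

```scala
import java.time.{LocalDate, Period}

// Enumerate dates from start to stop (inclusive), advancing by a
// year-month interval, as sequence(start, stop, step) does for DateType.
def dateSequence(start: LocalDate, stop: LocalDate, step: Period): Seq[LocalDate] =
  Iterator.iterate(start)(_.plus(step)).takeWhile(!_.isAfter(stop)).toSeq

// dateSequence(LocalDate.of(2021, 1, 1), LocalDate.of(2021, 4, 1), Period.ofMonths(1))
//   => 2021-01-01, 2021-02-01, 2021-03-01, 2021-04-01
```

Day-time intervals would step by a fixed `java.time.Duration` instead; the real expression also handles negative steps and descending sequences, which this sketch omits.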
[spark] branch branch-3.1 updated (eaceb40 -> c59db3d)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from eaceb40  [SPARK-35213][SQL] Keep the correct ordering of nested structs in chained withField operations
     add c59db3d  [SPARK-35224][SQL][TESTS][3.1][3.0] Fix buffer overflow in `MutableProjectionSuite`

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/MutableProjectionSuite.scala | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org