[jira] [Created] (SPARK-34727) Difference in results of casting float to timestamp
Maxim Gekk created SPARK-34727: -- Summary: Difference in results of casting float to timestamp Key: SPARK-34727 URL: https://issues.apache.org/jira/browse/SPARK-34727 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The code below portrays the issue: {code:sql} spark-sql> CREATE TEMP VIEW v1 AS SELECT 16777215.0f AS f; spark-sql> SELECT * FROM v1; 1.6777215E7 spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; 1970-07-14 07:20:15 spark-sql> CACHE TABLE v1; spark-sql> SELECT * FROM v1; 1.6777215E7 spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; 1970-07-14 07:20:14.951424 {code} The result from the cached view *1970-07-14 07:20:14.951424* is different from the un-cached view *1970-07-14 07:20:15*. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
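The two results differ in whether the multiplication by 1,000,000 (microseconds per second) happens in float or in double precision. A minimal plain-Java sketch of the rounding effect — an illustration of the likely cause, not Spark's actual cast code path:

```java
public class FloatTimestampCast {
    public static void main(String[] args) {
        float f = 16777215.0f;  // 2^24 - 1: the largest odd integer exactly representable as float

        // Multiplying in float precision: the exact product 1.6777215E13 needs
        // more than 24 significand bits, so it rounds to the nearest float.
        long microsFloat = (long) (f * 1_000_000L);        // long is promoted to float here

        // Widening to double first keeps the product exact (double has 53 bits).
        long microsDouble = (long) ((double) f * 1_000_000L);

        System.out.println(microsFloat);   // 16777214951424 -> 1970-07-14 07:20:14.951424
        System.out.println(microsDouble);  // 16777215000000 -> 1970-07-14 07:20:15
    }
}
```

The lost 48,576 microseconds are exactly the `.951424` vs `.000000` fraction in the report, so one of the two code paths (cached vs un-cached) plausibly widens to double before scaling while the other does not.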
[jira] [Commented] (SPARK-34727) Difference in results of casting float to timestamp
[ https://issues.apache.org/jira/browse/SPARK-34727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17300263#comment-17300263 ] Maxim Gekk commented on SPARK-34727: I am working on a fix. > Difference in results of casting float to timestamp > --- > > Key: SPARK-34727 > URL: https://issues.apache.org/jira/browse/SPARK-34727 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The code below portrays the issue: > {code:sql} > spark-sql> CREATE TEMP VIEW v1 AS SELECT 16777215.0f AS f; > spark-sql> SELECT * FROM v1; > 1.6777215E7 > spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; > 1970-07-14 07:20:15 > spark-sql> CACHE TABLE v1; > spark-sql> SELECT * FROM v1; > 1.6777215E7 > spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; > 1970-07-14 07:20:14.951424 > {code} > The result from the cached view *1970-07-14 07:20:14.951424* is different > from the un-cached view *1970-07-14 07:20:15*.
[jira] [Commented] (SPARK-34721) Add an year-month interval to a date
[ https://issues.apache.org/jira/browse/SPARK-34721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299834#comment-17299834 ] Maxim Gekk commented on SPARK-34721: I am working on this. > Add an year-month interval to a date > > > Key: SPARK-34721 > URL: https://issues.apache.org/jira/browse/SPARK-34721 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Support adding YearMonthIntervalType values to DATE values.
[jira] [Created] (SPARK-34721) Add an year-month interval to a date
Maxim Gekk created SPARK-34721: -- Summary: Add an year-month interval to a date Key: SPARK-34721 URL: https://issues.apache.org/jira/browse/SPARK-34721 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Support adding YearMonthIntervalType values to DATE values.
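The intended semantics correspond to java.time arithmetic on the external types: adding a Period (the external form of a year-month interval) to a LocalDate shifts calendar fields and clamps the day-of-month. A small sketch of that behavior in plain Java, not Spark code:

```java
import java.time.LocalDate;
import java.time.Period;

public class DatePlusYearMonthInterval {
    public static void main(String[] args) {
        // A year-month interval is a number of months; adding it to a date
        // shifts the calendar fields rather than a fixed number of days.
        LocalDate date = LocalDate.of(2021, 1, 31);
        Period interval = Period.of(1, 1, 0);  // 1 year 1 month

        LocalDate result = date.plus(interval);
        System.out.println(result);  // 2022-02-28 (day-of-month clamped to the month length)
    }
}
```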
[jira] [Created] (SPARK-34718) Assign pretty names to YearMonthIntervalType and DayTimeIntervalType
Maxim Gekk created SPARK-34718: -- Summary: Assign pretty names to YearMonthIntervalType and DayTimeIntervalType Key: SPARK-34718 URL: https://issues.apache.org/jira/browse/SPARK-34718 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Override the typeName() method in YearMonthIntervalType and DayTimeIntervalType, and assign names according to the SQL standard.
[jira] [Created] (SPARK-34716) Support ANSI SQL intervals by the aggregate function `sum`
Maxim Gekk created SPARK-34716: -- Summary: Support ANSI SQL intervals by the aggregate function `sum` Key: SPARK-34716 URL: https://issues.apache.org/jira/browse/SPARK-34716 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extend org.apache.spark.sql.catalyst.expressions.aggregate.Sum to support DayTimeIntervalType and YearMonthIntervalType.
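Extending Sum to interval types essentially means summing the underlying primitive values (months for year-month intervals, microseconds for day-time intervals) with overflow checks. A hedged plain-Java sketch of that idea — the `durationToMicros` helper is a simplified stand-in, not Spark's implementation:

```java
import java.time.Duration;
import java.util.List;

public class SumDayTimeIntervals {
    // Simplified stand-in: convert a Duration to microseconds with
    // overflow-checked arithmetic (day-time intervals are micros-valued).
    static long durationToMicros(Duration d) {
        return Math.addExact(Math.multiplyExact(d.getSeconds(), 1_000_000L),
                             d.getNano() / 1_000);
    }

    public static void main(String[] args) {
        List<Duration> values = List.of(
            Duration.ofHours(1), Duration.ofMinutes(30), Duration.ofSeconds(15));

        // Sum over the microsecond representation, failing fast on overflow.
        long totalMicros = 0;
        for (Duration d : values) {
            totalMicros = Math.addExact(totalMicros, durationToMicros(d));
        }
        System.out.println(totalMicros);  // 5415000000 micros = 1h 30m 15s
    }
}
```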
[jira] [Updated] (SPARK-34715) Add round trip tests for period <-> month and duration <-> micros
[ https://issues.apache.org/jira/browse/SPARK-34715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34715: --- Description: Similarly to the test from the PR https://github.com/apache/spark/pull/31799, add tests: 1. Months -> Period -> Months 2. Period -> Months -> Period 3. Duration -> micros -> Duration > Add round trip tests for period <-> month and duration <-> micros > - > > Key: SPARK-34715 > URL: https://issues.apache.org/jira/browse/SPARK-34715 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Similarly to the test from the PR https://github.com/apache/spark/pull/31799, > add tests: > 1. Months -> Period -> Months > 2. Period -> Months -> Period > 3. Duration -> micros -> Duration >
[jira] [Created] (SPARK-34715) Add round trip tests for period <-> month and duration <-> micros
Maxim Gekk created SPARK-34715: -- Summary: Add round trip tests for period <-> month and duration <-> micros Key: SPARK-34715 URL: https://issues.apache.org/jira/browse/SPARK-34715 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk
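The round-trip properties under test can be sketched outside Spark with java.time; the micros conversion below is a simplified stand-in for the IntervalUtils helpers, shown only to illustrate what "round trip" means here:

```java
import java.time.Duration;
import java.time.Period;

public class RoundTripConversions {
    public static void main(String[] args) {
        // Months -> Period -> Months: total month count must survive the trip.
        int months = 14;
        Period p = Period.ofMonths(months).normalized();  // P1Y2M
        long monthsBack = p.toTotalMonths();              // 14

        // Duration -> micros -> Duration: microsecond precision must survive.
        Duration d = Duration.ofSeconds(123, 456_000);    // 123.000456 s
        long micros = Math.addExact(
            Math.multiplyExact(d.getSeconds(), 1_000_000L), d.getNano() / 1_000);
        Duration back = Duration.ofSeconds(
            Math.floorDiv(micros, 1_000_000L),
            Math.floorMod(micros, 1_000_000L) * 1_000);

        System.out.println(monthsBack == months);  // true
        System.out.println(back.equals(d));        // true
    }
}
```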
[jira] [Commented] (SPARK-34675) TimeZone inconsistencies when JVM and session timezones are different
[ https://issues.apache.org/jira/browse/SPARK-34675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299345#comment-17299345 ] Maxim Gekk commented on SPARK-34675: > Could you link the original related patch and close this issue, [~maxgekk]? I think the issue has been fixed by multiple commits for sub-tasks of https://issues.apache.org/jira/browse/SPARK-26651, https://issues.apache.org/jira/browse/SPARK-31404 & https://issues.apache.org/jira/browse/SPARK-30951 . It is hard to identify particular patches that fix the issue. > TimeZone inconsistencies when JVM and session timezones are different > - > > Key: SPARK-34675 > URL: https://issues.apache.org/jira/browse/SPARK-34675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: Shubham Chaurasia >Priority: Major > > Inserted following data with UTC as both JVM and session timezone. > Spark-shell launch command > {code} > bin/spark-shell --conf spark.hadoop.metastore.catalog.default=hive --conf > spark.sql.catalogImplementation=hive --conf > spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf > spark.driver.extraJavaOptions=' -Duser.timezone=UTC' --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC' > {code} > Table creation > {code:scala} > sql("use ts").show > sql("create table spark_parquet(type string, t timestamp) stored as > parquet").show > sql("create table spark_orc(type string, t timestamp) stored as orc").show > sql("create table spark_avro(type string, t timestamp) stored as avro").show > sql("create table spark_text(type string, t timestamp) stored as > textfile").show > sql("insert into spark_parquet values ('FROM SPARK-EXT PARQUET', '1989-01-05 > 01:02:03')").show > sql("insert into spark_orc values ('FROM SPARK-EXT ORC', '1989-01-05 > 01:02:03')").show > sql("insert into spark_avro values ('FROM SPARK-EXT AVRO', '1989-01-05 > 01:02:03')").show > sql("insert into spark_text values ('FROM SPARK-EXT TEXT', '1989-01-05 > 
01:02:03')").show > {code} > Used following function to check and verify the returned timestamps > {code:scala} > scala> :paste > // Entering paste mode (ctrl-D to finish) > def showTs( > db: String, > tables: String* > ): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = { > sql("use " + db).show > import scala.collection.mutable.ListBuffer > var results = new ListBuffer[org.apache.spark.sql.DataFrame]() > for (tbl <- tables) { > val query = "select * from " + tbl > println("Executing - " + query); > results += sql(query) > } > println("user.timezone - " + System.getProperty("user.timezone")) > println("TimeZone.getDefault - " + java.util.TimeZone.getDefault.getID) > println("spark.sql.session.timeZone - " + > spark.conf.get("spark.sql.session.timeZone")) > var unionDf = results(0) > for (i <- 1 until results.length) { > unionDf = unionDf.unionAll(results(i)) > } > val augmented = unionDf.map(r => (r.getString(0), r.getTimestamp(1), > r.getTimestamp(1).getTime)) > val renamed = augmented.withColumnRenamed("_1", > "type").withColumnRenamed("_2", "ts").withColumnRenamed("_3", "millis") > renamed.show(false) > return renamed > } > // Exiting paste mode, now interpreting. > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > Hive Session ID = daa82b83-b50d-4038-97ee-1ecb2d01b368 > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - UTC > +--+---++ > > |type |ts |millis | > +--+---++ > |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT ORC|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT AVRO |1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT TEXT |1989-01-05 01:02:03|599965323000| > +--+---++ > {code} > 1. 
Set session timezone to America/Los_Angeles > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - America/Los_Angeles > +--+---++ > |type |ts
[jira] [Commented] (SPARK-34675) TimeZone inconsistencies when JVM and session timezones are different
[ https://issues.apache.org/jira/browse/SPARK-34675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299086#comment-17299086 ] Maxim Gekk commented on SPARK-34675: Here is the output on the current master (the same result for all datasources): {code} scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") scala> showTs("default", "spark_parquet", "spark_orc", "spark_avro", "spark_text") ++ || ++ ++ Executing - select * from spark_parquet Executing - select * from spark_orc Executing - select * from spark_avro Executing - select * from spark_text user.timezone - America/Los_Angeles TimeZone.getDefault - America/Los_Angeles spark.sql.session.timeZone - America/Los_Angeles +--+---++ |type |ts |millis | +--+---++ |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|54123000| |FROM SPARK-EXT ORC|1989-01-05 01:02:03|54123000| |FROM SPARK-EXT AVRO |1989-01-05 01:02:03|54123000| |FROM SPARK-EXT TEXT |1989-01-05 01:02:03|54123000| +--+---++ res18: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [type: string, ts: timestamp ... 1 more field] {code} > TimeZone inconsistencies when JVM and session timezones are different > - > > Key: SPARK-34675 > URL: https://issues.apache.org/jira/browse/SPARK-34675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: Shubham Chaurasia >Priority: Major > > Inserted following data with UTC as both JVM and session timezone. 
> Spark-shell launch command > {code} > bin/spark-shell --conf spark.hadoop.metastore.catalog.default=hive --conf > spark.sql.catalogImplementation=hive --conf > spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf > spark.driver.extraJavaOptions=' -Duser.timezone=UTC' --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC' > {code} > Table creation > {code:scala} > sql("use ts").show > sql("create table spark_parquet(type string, t timestamp) stored as > parquet").show > sql("create table spark_orc(type string, t timestamp) stored as orc").show > sql("create table spark_avro(type string, t timestamp) stored as avro").show > sql("create table spark_text(type string, t timestamp) stored as > textfile").show > sql("insert into spark_parquet values ('FROM SPARK-EXT PARQUET', '1989-01-05 > 01:02:03')").show > sql("insert into spark_orc values ('FROM SPARK-EXT ORC', '1989-01-05 > 01:02:03')").show > sql("insert into spark_avro values ('FROM SPARK-EXT AVRO', '1989-01-05 > 01:02:03')").show > sql("insert into spark_text values ('FROM SPARK-EXT TEXT', '1989-01-05 > 01:02:03')").show > {code} > Used following function to check and verify the returned timestamps > {code:scala} > scala> :paste > // Entering paste mode (ctrl-D to finish) > def showTs( > db: String, > tables: String* > ): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = { > sql("use " + db).show > import scala.collection.mutable.ListBuffer > var results = new ListBuffer[org.apache.spark.sql.DataFrame]() > for (tbl <- tables) { > val query = "select * from " + tbl > println("Executing - " + query); > results += sql(query) > } > println("user.timezone - " + System.getProperty("user.timezone")) > println("TimeZone.getDefault - " + java.util.TimeZone.getDefault.getID) > println("spark.sql.session.timeZone - " + > spark.conf.get("spark.sql.session.timeZone")) > var unionDf = results(0) > for (i <- 1 until results.length) { > unionDf = unionDf.unionAll(results(i)) > } > val augmented = 
unionDf.map(r => (r.getString(0), r.getTimestamp(1), > r.getTimestamp(1).getTime)) > val renamed = augmented.withColumnRenamed("_1", > "type").withColumnRenamed("_2", "ts").withColumnRenamed("_3", "millis") > renamed.show(false) > return renamed > } > // Exiting paste mode, now interpreting. > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > Hive Session ID = daa82b83-b50d-4038-97ee-1ecb2d01b368 > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - UTC > +--+---++ > > |type |ts |millis | > +--+---++ > |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT ORC|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT AVRO |1989-01-05
[jira] [Created] (SPARK-34695) Overflow in round trip conversion from micros to duration
Maxim Gekk created SPARK-34695: -- Summary: Overflow in round trip conversion from micros to duration Key: SPARK-34695 URL: https://issues.apache.org/jira/browse/SPARK-34695 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The code below fails with long overflow: {code:scala} scala> import org.apache.spark.sql.catalyst.util.IntervalUtils._ import org.apache.spark.sql.catalyst.util.IntervalUtils._ scala> val minDuration = microsToDuration(Long.MinValue) minDuration: java.time.Duration = PT-2562047788H-54.775808S scala> durationToMicros(minDuration) java.lang.ArithmeticException: long overflow at java.lang.Math.multiplyExact(Math.java:892) at org.apache.spark.sql.catalyst.util.IntervalUtils$.durationToMicros(IntervalUtils.scala:782) ... 49 elided {code}
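A simplified reproduction in plain Java, with stand-ins for IntervalUtils.microsToDuration/durationToMicros (the floorDiv/floorMod split mirrors the idea, not Spark's exact code). For `Long.MIN_VALUE` micros, the seconds component floors to one below the exact quotient and the remainder is added back as positive nanoseconds, so `seconds * 1_000_000` alone undershoots the `long` range:

```java
import java.time.Duration;

public class MicrosDurationOverflow {
    // Stand-in: split micros into (seconds, positive nano remainder).
    static Duration microsToDuration(long micros) {
        return Duration.ofSeconds(Math.floorDiv(micros, 1_000_000L),
                                  Math.floorMod(micros, 1_000_000L) * 1_000);
    }

    // Stand-in: seconds * 1e6 overflows before the positive nanosecond
    // part can bring the total back into range.
    static long durationToMicros(Duration d) {
        return Math.addExact(Math.multiplyExact(d.getSeconds(), 1_000_000L),
                             d.getNano() / 1_000);
    }

    public static void main(String[] args) {
        Duration min = microsToDuration(Long.MIN_VALUE);
        System.out.println(min);  // PT-2562047788H-54.775808S, as in the report
        try {
            durationToMicros(min);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());  // long overflow
        }
    }
}
```

A fix therefore has to recombine seconds and nanos without materializing `seconds * 1_000_000` on its own for boundary values.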
[jira] [Created] (SPARK-34677) Support add and subtract of ANSI SQL intervals
Maxim Gekk created SPARK-34677: -- Summary: Support add and subtract of ANSI SQL intervals Key: SPARK-34677 URL: https://issues.apache.org/jira/browse/SPARK-34677 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Support unary minus, and addition/subtraction (+/-) of two ANSI SQL intervals of the same type: year-month and day-time intervals.
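With java.time external types, the intended semantics map directly onto Period and Duration arithmetic: year-month operations act on the month count, day-time operations on the microsecond value. A small illustrative sketch (plain Java, not Spark code):

```java
import java.time.Duration;
import java.time.Period;

public class IntervalArithmetic {
    public static void main(String[] args) {
        // Year-month intervals: arithmetic on the total-month value.
        Period ym1 = Period.of(1, 8, 0);              // 1 year 8 months
        Period ym2 = Period.ofMonths(5);
        Period sum = ym1.plus(ym2).normalized();      // P2Y1M
        Period negated = ym1.negated();               // P-1Y-8M (unary minus)

        // Day-time intervals: arithmetic on the time-based value.
        Duration dt1 = Duration.ofHours(26);
        Duration dt2 = Duration.ofMinutes(90);
        Duration diff = dt1.minus(dt2);               // PT24H30M

        System.out.println(sum + " " + negated + " " + diff);
    }
}
```

Mixing the two classes (e.g. year-month + day-time) stays disallowed, consistent with the ANSI separation of the two interval classes.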
[jira] [Commented] (SPARK-34675) TimeZone inconsistencies when JVM and session timezones are different
[ https://issues.apache.org/jira/browse/SPARK-34675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298233#comment-17298233 ] Maxim Gekk commented on SPARK-34675: > Set session timezone to America/Los_Angeles > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") Processing of dates/timestamps in Spark 2.4.x is based on Java 7 time APIs, where the JVM time zone is "hard coded" in the classes java.sql.Date/java.sql.Timestamp. So, Spark 2.4.x cannot apply the session time zone in some cases. In Spark 3.x, most of these problems have been solved. I would recommend trying the same there. > TimeZone inconsistencies when JVM and session timezones are different > - > > Key: SPARK-34675 > URL: https://issues.apache.org/jira/browse/SPARK-34675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: Shubham Chaurasia >Priority: Major > > Inserted following data with UTC as both JVM and session timezone. > Spark-shell launch command > {code} > bin/spark-shell --conf spark.hadoop.metastore.catalog.default=hive --conf > spark.sql.catalogImplementation=hive --conf > spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf > spark.driver.extraJavaOptions=' -Duser.timezone=UTC' --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC' > {code} > Table creation > {code:scala} > sql("use ts").show > sql("create table spark_parquet(type string, t timestamp) stored as > parquet").show > sql("create table spark_orc(type string, t timestamp) stored as orc").show > sql("create table spark_avro(type string, t timestamp) stored as avro").show > sql("create table spark_text(type string, t timestamp) stored as > textfile").show > sql("insert into spark_parquet values ('FROM SPARK-EXT PARQUET', '1989-01-05 > 01:02:03')").show > sql("insert into spark_orc values ('FROM SPARK-EXT ORC', '1989-01-05 > 01:02:03')").show > sql("insert into spark_avro values ('FROM SPARK-EXT AVRO', '1989-01-05 > 01:02:03')").show >
sql("insert into spark_text values ('FROM SPARK-EXT TEXT', '1989-01-05 > 01:02:03')").show > {code} > Used following function to check and verify the returned timestamps > {code:scala} > scala> :paste > // Entering paste mode (ctrl-D to finish) > def showTs( > db: String, > tables: String* > ): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = { > sql("use " + db).show > import scala.collection.mutable.ListBuffer > var results = new ListBuffer[org.apache.spark.sql.DataFrame]() > for (tbl <- tables) { > val query = "select * from " + tbl > println("Executing - " + query); > results += sql(query) > } > println("user.timezone - " + System.getProperty("user.timezone")) > println("TimeZone.getDefault - " + java.util.TimeZone.getDefault.getID) > println("spark.sql.session.timeZone - " + > spark.conf.get("spark.sql.session.timeZone")) > var unionDf = results(0) > for (i <- 1 until results.length) { > unionDf = unionDf.unionAll(results(i)) > } > val augmented = unionDf.map(r => (r.getString(0), r.getTimestamp(1), > r.getTimestamp(1).getTime)) > val renamed = augmented.withColumnRenamed("_1", > "type").withColumnRenamed("_2", "ts").withColumnRenamed("_3", "millis") > renamed.show(false) > return renamed > } > // Exiting paste mode, now interpreting. > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > Hive Session ID = daa82b83-b50d-4038-97ee-1ecb2d01b368 > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - UTC > +--+---++ > > |type |ts |millis | > +--+---++ > |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT ORC|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT AVRO |1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT TEXT |1989-01-05 01:02:03|599965323000| > +--+---++ > {code} > 1. 
Set session timezone to America/Los_Angeles > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - America/Los_Angeles >
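The mechanics behind the thread above can be seen with plain java.time: a single epoch value renders to different wall-clock strings depending on the zone, which is why the `ts` and `millis` columns cannot both stay fixed when the session time zone changes. A sketch using the epoch value from the report (this illustrates the zone arithmetic only, not Spark's reader code paths):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SessionTimeZoneDemo {
    public static void main(String[] args) {
        // 599965323000 ms is the epoch value shown in the report for
        // '1989-01-05 01:02:03' interpreted in UTC.
        Instant instant = Instant.ofEpochMilli(599965323000L);
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        // The same instant under two "session" zones: the wall-clock string
        // shifts even though the stored epoch value is identical.
        String utc = fmt.format(instant.atZone(ZoneId.of("UTC")));
        String la  = fmt.format(instant.atZone(ZoneId.of("America/Los_Angeles")));

        System.out.println(utc);  // 1989-01-05 01:02:03
        System.out.println(la);   // 1989-01-04 17:02:03
    }
}
```

Conversely, keeping the wall-clock string fixed while changing the zone (the Spark 3.x master output above) requires shifting the stored epoch value, which is exactly the difference in the `millis` column.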
[jira] [Updated] (SPARK-34668) Support casting of day-time intervals to strings
[ https://issues.apache.org/jira/browse/SPARK-34668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34668: --- Description: Extend the Cast expression and support DayTimeIntervalType in casting to StringType. (was: Extend the Cast expression and support YearMonthIntervalType in casting to StringType.) > Support casting of day-time intervals to strings > > > Key: SPARK-34668 > URL: https://issues.apache.org/jira/browse/SPARK-34668 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extend the Cast expression and support DayTimeIntervalType in casting to > StringType.
[jira] [Created] (SPARK-34668) Support casting of day-time intervals to strings
Maxim Gekk created SPARK-34668: -- Summary: Support casting of day-time intervals to strings Key: SPARK-34668 URL: https://issues.apache.org/jira/browse/SPARK-34668 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extend the Cast expression and support YearMonthIntervalType in casting to StringType.
[jira] [Created] (SPARK-34667) Support casting of year-month intervals to strings
Maxim Gekk created SPARK-34667: -- Summary: Support casting of year-month intervals to strings Key: SPARK-34667 URL: https://issues.apache.org/jira/browse/SPARK-34667 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extend the Cast expression and support YearMonthIntervalType in casting to StringType.
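For the textual form itself, the SQL standard writes a year-month interval as a signed `'years-months'` literal. A hypothetical formatter sketching that layout — the function name and exact quoting are illustrative, not Spark's Cast output:

```java
public class IntervalToString {
    // Hypothetical helper: render a total month count in the ANSI
    // INTERVAL YEAR TO MONTH textual style.
    static String yearMonthToString(int totalMonths) {
        String sign = totalMonths < 0 ? "-" : "";
        int m = Math.abs(totalMonths);
        return String.format("INTERVAL '%s%d-%d' YEAR TO MONTH",
                             sign, m / 12, m % 12);
    }

    public static void main(String[] args) {
        System.out.println(yearMonthToString(14));  // INTERVAL '1-2' YEAR TO MONTH
        System.out.println(yearMonthToString(-7));  // INTERVAL '-0-7' YEAR TO MONTH
    }
}
```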
[jira] [Created] (SPARK-34666) Test DayTimeIntervalType/YearMonthIntervalType as ordered and atomic types
Maxim Gekk created SPARK-34666: -- Summary: Test DayTimeIntervalType/YearMonthIntervalType as ordered and atomic types Key: SPARK-34666 URL: https://issues.apache.org/jira/browse/SPARK-34666 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add DayTimeIntervalType and YearMonthIntervalType to DataTypeTestUtils.ordered and DataTypeTestUtils.atomicTypes.
[jira] [Created] (SPARK-34663) Test year-month and day-time intervals in UDF
Maxim Gekk created SPARK-34663: -- Summary: Test year-month and day-time intervals in UDF Key: SPARK-34663 URL: https://issues.apache.org/jira/browse/SPARK-34663 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Write tests for year-month and day-time intervals in UDF as input parameters and results.
[jira] [Commented] (SPARK-34615) Support java.time.Period as an external type of the year-month interval type
[ https://issues.apache.org/jira/browse/SPARK-34615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296540#comment-17296540 ] Maxim Gekk commented on SPARK-34615: I am working on this sub-task. > Support java.time.Period as an external type of the year-month interval type > > > Key: SPARK-34615 > URL: https://issues.apache.org/jira/browse/SPARK-34615 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Period values, and convert the > values to interval values of YearMonthIntervalType.
[jira] [Created] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types
Maxim Gekk created SPARK-34619: -- Summary: Update the Spark SQL guide about day-time and year-month interval types Key: SPARK-34619 URL: https://issues.apache.org/jira/browse/SPARK-34619 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Describe new types at http://spark.apache.org/docs/latest/sql-ref-datatypes.html
[jira] [Updated] (SPARK-34615) Support java.time.Period as an external type of the year-month interval type
[ https://issues.apache.org/jira/browse/SPARK-34615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34615: --- Description: Allow parallelization/collection of java.time.Period values, and convert the values to interval values of YearMonthIntervalType. (was: Allow parallelization/collection of java.time.Duration values, and convert the values to interval values of DayTimeIntervalType.) > Support java.time.Period as an external type of the year-month interval type > > > Key: SPARK-34615 > URL: https://issues.apache.org/jira/browse/SPARK-34615 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Period values, and convert the > values to interval values of YearMonthIntervalType.
[jira] [Created] (SPARK-34615) Support java.time.Period as an external type of the year-month interval type
Maxim Gekk created SPARK-34615: -- Summary: Support java.time.Period as an external type of the year-month interval type Key: SPARK-34615 URL: https://issues.apache.org/jira/browse/SPARK-34615 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Allow parallelization/collection of java.time.Duration values, and convert the values to interval values of DayTimeIntervalType.
[jira] [Updated] (SPARK-34605) Support java.time.Duration as an external type of the day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-34605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34605: --- Summary: Support java.time.Duration as an external type of the day-time interval type (was: Support java.time.Duration as an external type for the day-time interval type) > Support java.time.Duration as an external type of the day-time interval type > > > Key: SPARK-34605 > URL: https://issues.apache.org/jira/browse/SPARK-34605 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Duration values, and convert > the values to interval values of DayTimeIntervalType.
[jira] [Commented] (SPARK-34605) Support java.time.Duration as an external type for the day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-34605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294395#comment-17294395 ] Maxim Gekk commented on SPARK-34605: I am working on the sub-task. > Support java.time.Duration as an external type for the day-time interval type > - > > Key: SPARK-34605 > URL: https://issues.apache.org/jira/browse/SPARK-34605 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Duration values, and convert > the values to interval values of DayTimeIntervalType.
[jira] [Created] (SPARK-34605) Support java.time.Duration as an external type for the day-time interval type
Maxim Gekk created SPARK-34605: -- Summary: Support java.time.Duration as an external type for the day-time interval type Key: SPARK-34605 URL: https://issues.apache.org/jira/browse/SPARK-34605 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Allow parallelization/collection of java.time.Duration values, and convert the values to interval values of DayTimeIntervalType.
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Description: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate Spark's interval type. * ANSI describes two non-overlapping “classes”: ** YEAR-MONTH, ** DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. *Milestone 1* -- Spark Interval equivalency (the new interval types meet or exceed all functions of the existing SQL Interval): * Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DDL string. * Infra support: check the caller sides of DateType/TimestampType * Support the two new interval types in Dataset/UDF. * Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values) * Interval arithmetic (interval * num, interval / num, interval +/- interval) * Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval * Cast to and from the two new interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types * Support sorting intervals. *Milestone 2* -- Persistence: * Ability to create tables of type interval * Ability to write to common file formats such as Parquet and JSON. 
* INSERT, SELECT, UPDATE, MERGE * Discovery *Milestone 3* -- Client support * JDBC support * Hive Thrift server *Milestone 4* -- PySpark and Spark R integration * Python UDF can take and return intervals * DataFrame support was: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. *Milestone 1* -- Spark Interval equivalency ( The new interval types meet or exceed all function of the existing SQL Interval): * Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DLL string. * Infra support: check the caller sides of DateType/TimestampType * Support the two new interval types in Dataset/UDF. * Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values) * Interval arithmetic(interval * num, interval / num, interval +/- interval) * Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval * Cast to and from the new two interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types * Support sorting intervals. *Milestone 2* -- Persistence: * Ability to create tables of type interval * Ability to write to common file formats such as Parquet and JSON. 
* INSERT, SELECT, UPDATE, MERGE * Discovery *Milestone 3* -- Client support * JDBC support * Hive Thrift server *Milestone 4* -- PySpark and Spark R integration * Python UDF can take and return intervals * DataFrame support > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in
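The incomparability argument above maps directly onto java.time, which mirrors the ANSI split: Duration (day-time) has a fixed length and a total order, while Period (year-month) does not implement Comparable, precisely because a month has no fixed length in days. An illustrative sketch (not Spark code):

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.Period;

public class IntervalClasses {
    public static void main(String[] args) {
        // Day-time intervals have a fixed length, so they are totally ordered.
        System.out.println(Duration.ofDays(31).compareTo(Duration.ofDays(30)) > 0); // true

        // Year-month intervals are comparable to each other as month counts...
        System.out.println(Period.of(1, 0, 0).toTotalMonths() > Period.ofMonths(11).toTotalMonths()); // true

        // ...but "1 month" versus "30 days" depends on the reference date,
        // which is why ANSI SQL keeps the two classes separate.
        LocalDate feb = LocalDate.of(2021, 2, 1);
        LocalDate mar = LocalDate.of(2021, 3, 1);
        System.out.println(feb.plus(Period.ofMonths(1)).equals(feb.plusDays(30))); // false (Feb 2021 has 28 days)
        System.out.println(mar.plus(Period.ofMonths(1)).equals(mar.plusDays(31))); // true (March has 31 days)
    }
}
```

Within either class every member has a well-defined order, which is what makes the proposed sorting and persistence of the new types possible.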
[jira] [Updated] (SPARK-27793) Add ANSI SQL day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Summary: Add ANSI SQL day-time and year-month interval types (was: Add day-time and year-month interval types) > Add ANSI SQL day-time and year-month interval types > --- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extend Catalyst's type system by two new types that conform to the SQL > standard (see SQL:2016, section 4.6.3): > # DayTimeIntervalType represents the day-time interval type, > # YearMonthIntervalType for SQL year-month interval type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27791) Support SQL year-month INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-27791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27791: --- Parent: (was: SPARK-27790) Issue Type: Improvement (was: Sub-task) > Support SQL year-month INTERVAL type > > > Key: SPARK-27791 > URL: https://issues.apache.org/jira/browse/SPARK-27791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Major > > The INTERVAL type must conform to SQL year-month INTERVAL type, has 2 logical > types: > # YEAR - Unconstrained except by the leading field precision > # MONTH - Months (within years) (0-11) > And support arithmetic operations involving values of type datetime or > interval. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27791) Support SQL year-month INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-27791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-27791. Resolution: Won't Fix > Support SQL year-month INTERVAL type > > > Key: SPARK-27791 > URL: https://issues.apache.org/jira/browse/SPARK-27791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Major > > The INTERVAL type must conform to SQL year-month INTERVAL type, has 2 logical > types: > # YEAR - Unconstrained except by the leading field precision > # MONTH - Months (within years) (0-11) > And support arithmetic operations involving values of type datetime or > interval. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27793) Add day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Description: Extend Catalyst's type system by two new types that conform to the SQL standard (see SQL:2016, section 4.6.3): # DayTimeIntervalType represents the day-time interval type, # YearMonthIntervalType for SQL year-month interval type. was: The day-time INTERVAL type contains the following fields: # DAY - Unconstrained except by the leading field precision # HOUR - Hours (within days) (0-23) # MINUTE - Minutes (within hours) (0-59) # SECOND - Seconds and possibly fractions of a second (0-59.999...) The interval type should support all operations defined by SQL standard > Add day-time and year-month interval types > -- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extend Catalyst's type system by two new types that conform to the SQL > standard (see SQL:2016, section 4.6.3): > # DayTimeIntervalType represents the day-time interval type, > # YearMonthIntervalType for SQL year-month interval type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27793) Add day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Summary: Add day-time and year-month interval types (was: Support SQL day-time INTERVAL type) > Add day-time and year-month interval types > -- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The day-time INTERVAL type contains the following fields: > # DAY - Unconstrained except by the leading field precision > # HOUR - Hours (within days) (0-23) > # MINUTE - Minutes (within hours) (0-59) > # SECOND - Seconds and possibly fractions of a second (0-59.999...) > The interval type should support all operations defined by SQL standard -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27793) Support SQL day-time INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Affects Version/s: (was: 2.4.3) 3.2.0 > Support SQL day-time INTERVAL type > -- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The day-time INTERVAL type contains the following fields: > # DAY - Unconstrained except by the leading field precision > # HOUR - Hours (within days) (0-23) > # MINUTE - Minutes (within hours) (0-59) > # SECOND - Seconds and possibly fractions of a second (0-59.999...) > The interval type should support all operations defined by SQL standard -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
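The DAY/HOUR/MINUTE/SECOND fields listed above can all be recovered from a single microsecond count, which is how a day-time interval with one internal long representation can still expose the ANSI field structure. A hypothetical standalone sketch (the `format` helper is illustrative, not Spark's):

```java
public class DayTimeFields {
    // Decompose a microsecond count into the DAY, HOUR (0-23), MINUTE (0-59)
    // and fractional SECOND fields of an ANSI day-time interval.
    static String format(long micros) {
        boolean negative = micros < 0;
        long us = Math.abs(micros);
        long days    = us / 86_400_000_000L;
        long hours   = us / 3_600_000_000L % 24;
        long minutes = us / 60_000_000L % 60;
        long seconds = us / 1_000_000L % 60;
        long frac    = us % 1_000_000L;
        return String.format("%s%d %02d:%02d:%02d.%06d",
                negative ? "-" : "", days, hours, minutes, seconds, frac);
    }

    public static void main(String[] args) {
        // 1 day, 2 hours, 3 minutes, 4.000005 seconds
        long micros = ((26 * 60 + 3) * 60 + 4) * 1_000_000L + 5;
        System.out.println(format(micros)); // 1 02:03:04.000005
    }
}
```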
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Description: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. *Milestone 1* -- Spark Interval equivalency ( The new interval types meet or exceed all function of the existing SQL Interval): * Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DLL string. * Infra support: check the caller sides of DateType/TimestampType * Support the two new interval types in Dataset/UDF. * Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values) * Interval arithmetic(interval * num, interval / num, interval +/- interval) * Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval * Cast to and from the new two interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types * Support sorting intervals. *Milestone 2* -- Persistence: * Ability to create tables of type interval * Ability to write to common file formats such as Parquet and JSON. 
* INSERT, SELECT, UPDATE, MERGE * Discovery *Milestone 3* -- Client support * JDBC support * Hive Thrift server *Milestone 4* -- PySpark and Spark R integration * Python UDF can take and return intervals * DataFrame support was: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. 
Also any semantic “breakage” can be controlled via legacy > config settings. > *Milestone 1* -- Spark Interval equivalency ( The new interval types meet > or exceed all function of the existing SQL Interval): > * Add two new DataType implementations for interval year-month and > day-second. Includes the JSON format and DLL string. > * Infra support: check the caller sides of DateType/TimestampType > * Support the two new interval types in Dataset/UDF. > * Interval literals (with a legacy config to still allow mixed year-month > day-seconds fields and return legacy interval values) > * Interval arithmetic(interval * num, interval / num, interval +/- interval) > * Datetime functions/operators: Datetime - Datetime (to days or day second), > Datetime +/- interval > * Cast to and from the new two interval types, cast string to
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Description: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. was: SQL standard defines 2 interval types: # year-month interval contains a YEAR field or a MONTH field or both # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction of seconds) Need to add 2 new internal types YearMonthIntervalType and DayTimeIntervalType, support operations defined by SQL standard as well as INTERVAL literals. The java.time.Period and java.time.Duration can be supported as external type for YearMonthIntervalType and DayTimeIntervalType. > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. 
> I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. Also any semantic “breakage” can be controlled via legacy > config settings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Comment: was deleted (was: I am working on it.) > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. Also any semantic “breakage” can be controlled via legacy > config settings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Issue Type: Improvement (was: New Feature) > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. Also any semantic “breakage” can be controlled via legacy > config settings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Summary: Support ANSI SQL INTERVAL types (was: Support SQL INTERVAL types) > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34564) DateTimeUtils.fromJavaDate fails for very late dates during casting to Int
[ https://issues.apache.org/jira/browse/SPARK-34564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292427#comment-17292427 ] Maxim Gekk commented on SPARK-34564: We changed the behavior intentionally because we do believe that it is better to return an error instead of an incorrect result silently. > However, the question is even if such late dates are not supported, could it >fail in more gentle way? How? What would you like to see? > DateTimeUtils.fromJavaDate fails for very late dates during casting to Int > -- > > Key: SPARK-34564 > URL: https://issues.apache.org/jira/browse/SPARK-34564 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.2.0, 3.1.2 >Reporter: kondziolka9ld >Priority: Major > > Please consider a following scenario on *spark-3.0.1*: > {code:java} > scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", > new Date(Long.MaxValue))).toDF > java.lang.RuntimeException: Error while encoding: > java.lang.ArithmeticException: integer overflow > staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, > fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, > true, false) AS _1#0 > staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, > DateType, fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, > true]))._2, true, false) AS _2#1 > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215) > at > org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at 
org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466) > at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353) > at > org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231) > ... 51 elided > Caused by: java.lang.ArithmeticException: integer overflow > at java.lang.Math.toIntExact(Math.java:1011) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(DateTimeUtils.scala) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211) > ... 60 more > {code} > In opposition to *spark-2.4.7* where it is possible to create dataframe with > such values: > {code:java} > scala> val df = List(("some date", new Date(Int.MaxValue)), ("some corner > case date", new Date(Long.MaxValue))).toDF > df: org.apache.spark.sql.DataFrame = [_1: string, _2: date]scala> df.show > ++-+ > | _1| _2| > ++-+ > | some date| 1970-01-25| > |some corner case ...|1701498-03-18| > ++-+ > {code} > Anyway, I am aware of the fact that during collecting these data I will got > another result: > {code:java} > scala> df.collect > res10: Array[org.apache.spark.sql.Row] = Array([some date,1970-01-25], [some > corner case date,?498-03-18]) > {code} > what seems to be natural because of behaviour of *java.sql.Date*: > {code:java} > scala> new java.sql.Date(Long.MaxValue) > res1: java.sql.Date = ?994-08-17 > {code} > > > When it comes to easier reproduction, please consider: > {code:java} > scala> org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(new > java.sql.Date(Long.MaxValue)) > java.lang.ArithmeticException: integer overflow > at java.lang.Math.toIntExact(Math.java:1011) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111) > ... 
47 elided > {code} > However, the question is even if such late dates are not supported, could it > fail in more gentle way?
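The failing step is the narrowing of an epoch-day count to an int via Math.toIntExact. It can be reproduced without Spark; the following sketch ignores Spark's Julian/Gregorian calendar rebasing and only re-creates the overflowing conversion:

```java
public class EpochDayOverflow {
    // Millis-since-epoch to epoch days, narrowed to int as in the stack
    // trace above; dates whose day count exceeds Int.MaxValue cannot fit.
    static int toEpochDay(long epochMillis) {
        long days = Math.floorDiv(epochMillis, 86_400_000L);
        return Math.toIntExact(days); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        // Int.MaxValue millis is ~24.8 days -> day 24, i.e. 1970-01-25,
        // matching the df.show output in the report.
        System.out.println(toEpochDay(Integer.MAX_VALUE)); // 24
        try {
            toEpochDay(Long.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // integer overflow
        }
    }
}
```

Spark 2.4 skipped the exact narrowing and silently wrapped instead, which explains the nonsensical `1701498-03-18` it produced for the same input.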
[jira] [Created] (SPARK-34561) Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`
Maxim Gekk created SPARK-34561: -- Summary: Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE` Key: SPARK-34561 URL: https://issues.apache.org/jira/browse/SPARK-34561 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Dropping a column from a dataset of v2 `DESCRIBE TABLE` fails with:
{code:java}
Resolved attribute(s) col_name#102,data_type#103 missing from col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, data_type#103]. Attribute(s) with the same name appear in the operation: col_name,data_type. Please check if the right attribute(s) are used.;
!Project [col_name#102, data_type#103]
+- LocalRelation [col_name#29, data_type#30, comment#31]
{code}
The code below demonstrates the issue:
{code:scala}
val tbl = s"${catalogAndNamespace}tbl"
withTable(tbl) {
  sql(s"CREATE TABLE $tbl (c0 INT) USING $v2Format")
  val description = sql(s"DESCRIBE TABLE $tbl")
  val noComment = description.drop("comment")
}
{code}
[jira] [Created] (SPARK-34560) Cannot join datasets of SHOW TABLES
Maxim Gekk created SPARK-34560: -- Summary: Cannot join datasets of SHOW TABLES Key: SPARK-34560 URL: https://issues.apache.org/jira/browse/SPARK-34560 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The example below illustrates the issue:
{code:scala}
scala> sql("CREATE NAMESPACE ns1")
res8: org.apache.spark.sql.DataFrame = []
scala> sql("CREATE NAMESPACE ns2")
res9: org.apache.spark.sql.DataFrame = []
scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
res10: org.apache.spark.sql.DataFrame = []
scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
res11: org.apache.spark.sql.DataFrame = []
scala> val show1 = sql("SHOW TABLES IN ns1")
show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]
scala> val show2 = sql("SHOW TABLES IN ns2")
show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]
scala> show1.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns1|     tbl1|      false|
+---------+---------+-----------+
scala> show2.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns2|     tbl2|      false|
+---------+---------+-----------+
scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
at org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
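The ambiguity check can be sketched as follows (an illustrative toy in plain Python, not Spark's actual DetectAmbiguousSelfJoin rule): a column reference carries the attribute ID of its origin plan, and when both join sides expose that same ID the reference cannot be attributed to either side. Two SHOW TABLES datasets end up with identical output attribute IDs, which is why the join above trips the check.

```python
# Toy model: each side of a join is represented by the set of attribute IDs
# it exposes; a reference is ambiguous if both sides contain its ID.
def detect_ambiguous(left_ids, right_ids, referenced_id):
    return referenced_id in left_ids and referenced_id in right_ids

# SHOW TABLES produces the same output attributes each time (same IDs),
# so show1("tableName") matches both sides of show1.join(show2).
show1_ids = {16, 17, 18}   # namespace, tableName, isTemporary (IDs illustrative)
show2_ids = {16, 17, 18}   # the same command -> the same attribute IDs
assert detect_ambiguous(show1_ids, show2_ids, referenced_id=17)
```

With distinct ID sets, as produced by ordinary independent plans, the same reference resolves to exactly one side and no error is raised.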
[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests
[ https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34447: --- Description: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be respected when a table name is altered" 4. Reset default namespace in ShowTablesSuiteBase."change current catalog and namespace with USE statements" using spark.sessionState.catalogManager.reset() was: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be respected when a table name is altered" > Refactor the unified v1 and v2 command tests > > > Key: SPARK-34447 > URL: https://issues.apache.org/jira/browse/SPARK-34447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to gather potential improvements for the unified tests. > 1. Remove SharedSparkSession from *ParserSuite > 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite > 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be > respected when a table name is altered" > 4. Reset default namespace in ShowTablesSuiteBase."change current catalog > and namespace with USE statements" using > spark.sessionState.catalogManager.reset() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34554) Implement the copy() method in ColumnarMap
Maxim Gekk created SPARK-34554: -- Summary: Implement the copy() method in ColumnarMap Key: SPARK-34554 URL: https://issues.apache.org/jira/browse/SPARK-34554 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Implement ColumnarMap.copy() using ColumnarArray.copy() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34543: --- Fix Version/s: (was: 3.0.2) (was: 2.4.8) (was: 3.1.0) > Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION > -- > > Key: SPARK-34543 > URL: https://issues.apache.org/jira/browse/SPARK-34543 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); > spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); > Location: > file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); > spark-sql> SELECT * FROM tbl; > 0 0 > {code} > Create new partition folder in the file system: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa > {code} > Set new location for the partition part=1: > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2); > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > {code} > Set location for a partition in the upper case: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb > {code} > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; > 
Error in query: Partition spec is invalid. The spec (PART) must match the > partition spec (part) defined in table '`default`.`tbl`' > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34543: --- Affects Version/s: (was: 3.0.1) (was: 3.1.0) 3.1.1 3.0.2 > Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION > -- > > Key: SPARK-34543 > URL: https://issues.apache.org/jira/browse/SPARK-34543 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); > spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); > Location: > file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); > spark-sql> SELECT * FROM tbl; > 0 0 > {code} > Create new partition folder in the file system: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa > {code} > Set new location for the partition part=1: > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2); > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > {code} > Set location for a partition in the upper case: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb > {code} > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION > 
'/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; > Error in query: Partition spec is invalid. The spec (PART) must match the > partition spec (part) defined in table '`default`.`tbl`' > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34543: --- Description: SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); Location: file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); spark-sql> SELECT * FROM tbl; 0 0 {code} Create new partition folder in the file system: {code} $ cp -r /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa {code} Set new location for the partition part=1: {code:sql} spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; spark-sql> SELECT * FROM tbl; 0 0 0 1 spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2); spark-sql> SELECT * FROM tbl; 0 0 0 1 {code} Set location for a partition in the upper case: {code} $ cp -r /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb {code} {code:sql} spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; Error in query: Partition spec is invalid. 
The spec (PART) must match the partition spec (part) defined in table '`default`.`tbl`' {code} was: SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} > Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION > -- > > Key: SPARK-34543 > URL: https://issues.apache.org/jira/browse/SPARK-34543 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); > spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); > Location: > file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); > spark-sql> SELECT * FROM tbl; > 0 0 > {code} > Create new partition folder in the file system: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa > {code} > Set new location for the partition part=1: > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > spark-sql> ALTER TABLE tbl ADD PARTITION 
(PART=2); > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > {code} > Set location for a partition in the upper case: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb > {code} > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; > Error in query: Partition spec is invalid. The spec (PART) must match the > partition spec (part) defined in table '`default`.`tbl`' > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
Maxim Gekk created SPARK-34543: -- Summary: Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION Key: SPARK-34543 URL: https://issues.apache.org/jira/browse/SPARK-34543 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.7, 3.0.1, 3.1.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 2.4.8, 3.0.2, 3.1.0 SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
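The expected behavior under spark.sql.caseSensitive=false can be sketched as below (a hedged stdlib sketch; the function name and logic are illustrative, not Spark's implementation): user-supplied partition keys are matched against the table's partition columns case-insensitively, and only rejected when no column matches.

```python
def normalize_spec(spec, partition_cols, case_sensitive=False):
    """Map user-supplied partition keys onto the table's partition columns.

    With case_sensitive=False (the default for spark.sql.caseSensitive),
    PART=2 should resolve to the column `part`; with case_sensitive=True
    it is rejected, matching the error shown in the ticket.
    """
    if case_sensitive:
        lookup = {c: c for c in partition_cols}
    else:
        lookup = {c.lower(): c for c in partition_cols}
    normalized = {}
    for key, value in spec.items():
        k = key if case_sensitive else key.lower()
        if k not in lookup:
            raise ValueError(
                f"Partition spec ({key}) must match the partition spec "
                f"({', '.join(partition_cols)}) defined in the table")
        normalized[lookup[k]] = value
    return normalized

# PART=2 resolves to the `part` column when case sensitivity is off:
assert normalize_spec({"PART": 2}, ["part"]) == {"part": 2}
```

The same normalization would make the SHOW PARTITIONS example from the description (YEAR/Month vs. year/month) succeed instead of reporting non-partitioning columns.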
[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests
[ https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34447: --- Description: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be respected when a table name is altered" was: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite > Refactor the unified v1 and v2 command tests > > > Key: SPARK-34447 > URL: https://issues.apache.org/jira/browse/SPARK-34447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to gather potential improvements for the unified tests. > 1. Remove SharedSparkSession from *ParserSuite > 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite > 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be > respected when a table name is altered" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34518) Rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand`
Maxim Gekk created SPARK-34518: -- Summary: Rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand` Key: SPARK-34518 URL: https://issues.apache.org/jira/browse/SPARK-34518 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk `AlterTableRecoverPartitionsCommand` is the execution node for the `ALTER TABLE .. RECOVER PARTITIONS` command, which cannot drop/sync partitions. Since `ALTER TABLE .. RECOVER PARTITIONS` is a special case of `MSCK REPAIR TABLE` and does not support any options, it makes sense to rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The SQL standard defines 2 interval types: > # a year-month interval contains a YEAR field or a MONTH field or both > # a day-time interval contains DAY, HOUR, MINUTE, and SECOND fields (possibly > a fraction of a second) > Need to add 2 new internal types, YearMonthIntervalType and > DayTimeIntervalType, and support operations defined by the SQL standard as well as > INTERVAL literals. > java.time.Period and java.time.Duration can be supported as external types > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
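The split into two interval kinds can be illustrated with Python's stdlib analogues of the external types named in the ticket (java.time.Period for year-month, java.time.Duration for day-time); this is an illustrative sketch, not Spark's internal representation:

```python
from datetime import timedelta

def year_month_interval(years=0, months=0):
    # A year-month interval reduces to a month count; years fold into months.
    return years * 12 + months

def day_time_interval(days=0, hours=0, minutes=0, seconds=0.0):
    # A day-time interval is an exact duration, down to fractional seconds.
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

# INTERVAL '1-2' YEAR TO MONTH -> 14 months
assert year_month_interval(1, 2) == 14
```

The key design point is that the two kinds do not mix: a month has no fixed length in days, so year-month intervals stay as month counts while day-time intervals stay as exact durations.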
[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests
[ https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34447: --- Description: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite was: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite > Refactor the unified v1 and v2 command tests > > > Key: SPARK-34447 > URL: https://issues.apache.org/jira/browse/SPARK-34447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to gather potential improvements for the unified tests. > 1. Remove SharedSparkSession from *ParserSuite > 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34475: --- Description: Rename v2 logical nodes for simplicity in the form: + (was: To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: + -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34475) Rename v2 logical nodes
Maxim Gekk created SPARK-34475: -- Summary: Rename v2 logical nodes Key: SPARK-34475 URL: https://issues.apache.org/jira/browse/SPARK-34475 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286842#comment-17286842 ] Maxim Gekk commented on SPARK-34302: [~imback82] I don't plan to work on this in the near future. Please, feel free to take this. > Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework > > > Key: SPARK-34302 > URL: https://issues.apache.org/jira/browse/SPARK-34302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN > # Remove AlterTableAlterColumnStatement > # Remove the check verifyAlterTableType() from run() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
Maxim Gekk created SPARK-34468: -- Summary: Fix v2 ALTER TABLE .. RENAME TO Key: SPARK-34468 URL: https://issues.apache.org/jira/browse/SPARK-34468 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place instead of moving it to the "root" namespace: {code:scala}
sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl")
sql(s"SHOW TABLES IN $catalog").show(false)
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|         |dst_tbl  |false      |
+---------+---------+-----------+
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
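The intended in-place behavior can be sketched as follows (a hypothetical stdlib sketch; the function and identifier representation are illustrative, not Spark's catalog API): keep the namespace parts of the source identifier and replace only the last part.

```python
def rename_table(identifier, new_name):
    """Rename in place: preserve the namespace, replace the table name.

    The buggy behavior effectively dropped the namespace, so the renamed
    table appeared under the catalog's root namespace instead.
    """
    return identifier[:-1] + [new_name]

# ns1.ns2.ns3.src_tbl -> ns1.ns2.ns3.dst_tbl, not just dst_tbl
assert rename_table(["ns1", "ns2", "ns3", "src_tbl"], "dst_tbl") == \
    ["ns1", "ns2", "ns3", "dst_tbl"]
```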
[jira] [Created] (SPARK-34466) Improve docs for ALTER TABLE .. RENAME TO
Maxim Gekk created SPARK-34466: -- Summary: Improve docs for ALTER TABLE .. RENAME TO Key: SPARK-34466 URL: https://issues.apache.org/jira/browse/SPARK-34466 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The v1 ALTER TABLE .. RENAME TO command can only rename a table in a database but it cannot be used to move the table to another database. We should explicitly document the behaviour. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34439) Recognize `spark_catalog` in new identifier while view/table renaming
[ https://issues.apache.org/jira/browse/SPARK-34439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-34439. Resolution: Won't Fix > Recognize `spark_catalog` in new identifier while view/table renaming > - > > Key: SPARK-34439 > URL: https://issues.apache.org/jira/browse/SPARK-34439 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, v1 ALTER TABLE .. RENAME TO doesn't recognize spark_catalog in new > view/table identifiers. The example below demonstrates the issue: > {code:scala} > spark-sql> CREATE DATABASE db; > spark-sql> CREATE TABLE spark_catalog.db.tbl (c0 INT) USING parquet; > spark-sql> INSERT INTO spark_catalog.db.tbl SELECT 0; > spark-sql> SELECT * FROM spark_catalog.db.tbl; > 0 > spark-sql> ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2; > Error in query: spark_catalog.db.tbl2 is not a valid TableIdentifier as it > has more than 2 name parts. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34465) Rename alter table exec nodes
[ https://issues.apache.org/jira/browse/SPARK-34465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34465: --- Description: To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec was: To be consistent with other exec nodes, rename: AlterTableAddPartitionExec -> AddPartitionExec AlterTableRenamePartitionExec -> RenamePartitionExec AlterTableDropPartitionExec -> DropPartitionExec > Rename alter table exec nodes > - > > Key: SPARK-34465 > URL: https://issues.apache.org/jira/browse/SPARK-34465 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > To be consistent with other exec nodes, rename: > * AlterTableAddPartitionExec -> AddPartitionExec > * AlterTableRenamePartitionExec -> RenamePartitionExec > * AlterTableDropPartitionExec -> DropPartitionExec -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34465) Rename alter table exec nodes
Maxim Gekk created SPARK-34465: -- Summary: Rename alter table exec nodes Key: SPARK-34465 URL: https://issues.apache.org/jira/browse/SPARK-34465 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk To be consistent with other exec nodes, rename: AlterTableAddPartitionExec -> AddPartitionExec AlterTableRenamePartitionExec -> RenamePartitionExec AlterTableDropPartitionExec -> DropPartitionExec -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34455) Deprecate spark.sql.legacy.replaceDatabricksSparkAvro.enabled
Maxim Gekk created SPARK-34455: -- Summary: Deprecate spark.sql.legacy.replaceDatabricksSparkAvro.enabled Key: SPARK-34455 URL: https://issues.apache.org/jira/browse/SPARK-34455 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Mark spark.sql.legacy.replaceDatabricksSparkAvro.enabled as deprecated, and recommend to use `.format("avro")` in `DataFrameWriter` or `DataFrameReader` instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34454) SQL configs from the legacy namespace must be internal
Maxim Gekk created SPARK-34454: -- Summary: SQL configs from the legacy namespace must be internal Key: SPARK-34454 URL: https://issues.apache.org/jira/browse/SPARK-34454 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.0.0 It is assumed that legacy SQL configs shouldn't be set by users in common cases. The purpose of the configs is to allow switching to the old behavior in corner cases, so the configs can be marked as internal. The ticket aims to inspect existing SQL configs in SQLConf and add the internal() call to config entry builders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34454) SQL configs from the legacy namespace must be internal
[ https://issues.apache.org/jira/browse/SPARK-34454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34454: --- Fix Version/s: (was: 3.0.0) > SQL configs from the legacy namespace must be internal > -- > > Key: SPARK-34454 > URL: https://issues.apache.org/jira/browse/SPARK-34454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > > It is assumed that legacy SQL configs shouldn't be set by users in common cases. > The purpose of the configs is to allow switching to the old behavior in corner > cases, so the configs can be marked as internal. The ticket aims to inspect > existing SQL configs in SQLConf and add the internal() call to config entry > builders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34451) Add alternatives for datetime rebasing SQL configs and deprecate legacy configs
Maxim Gekk created SPARK-34451: -- Summary: Add alternatives for datetime rebasing SQL configs and deprecate legacy configs Key: SPARK-34451 URL: https://issues.apache.org/jira/browse/SPARK-34451 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The rebasing SQL configs like spark.sql.legacy.parquet.datetimeRebaseModeInRead can be used not only for migration from previous Spark versions but also to read/write datetime columns saved by other systems/frameworks/libs. The ticket aims to move the configs from the legacy namespace by introducing alternatives (like spark.sql.parquet.datetimeRebaseModeInRead) and deprecating the legacy configs (spark.sql.legacy.parquet.datetimeRebaseModeInRead). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
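The intended config-alternative mechanism can be sketched as below (a hedged sketch: the two key names come from the ticket, but the resolution logic and default value are illustrative, not Spark's ConfigBuilder): the new key wins when both are set, and a value under the deprecated legacy key is still honored.

```python
LEGACY = "spark.sql.legacy.parquet.datetimeRebaseModeInRead"
NEW = "spark.sql.parquet.datetimeRebaseModeInRead"

def resolve(settings, key=NEW, alternative=LEGACY, default="EXCEPTION"):
    # Prefer the new key; fall back to the deprecated legacy alternative,
    # then to a default (the default value here is illustrative).
    if key in settings:
        return settings[key]
    if alternative in settings:
        return settings[alternative]   # legacy setting still takes effect
    return default

# Users who only ever set the legacy config keep their behavior:
assert resolve({LEGACY: "LEGACY"}) == "LEGACY"
```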
[jira] [Updated] (SPARK-34450) Unify v1 and v2 ALTER TABLE .. RENAME tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34450: --- Summary: Unify v1 and v2 ALTER TABLE .. RENAME tests (was: Unify v1 and v2 `ALTER TABLE .. RENAME` tests) > Unify v1 and v2 ALTER TABLE .. RENAME tests > --- > > Key: SPARK-34450 > URL: https://issues.apache.org/jira/browse/SPARK-34450 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. RENAME tests to a common place to run them for V1 > and V2 datasources. Some tests can be placed in V1- and V2-specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34450) Unify v1 and v2 `ALTER TABLE .. RENAME` tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34450: --- Description: Extract ALTER TABLE .. RENAME tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites. > Unify v1 and v2 `ALTER TABLE .. RENAME` tests > - > > Key: SPARK-34450 > URL: https://issues.apache.org/jira/browse/SPARK-34450 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. RENAME tests to a common place to run them for V1 > and V2 datasources. Some tests can be placed in V1- and V2-specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34450) Unify v1 and v2 `ALTER TABLE .. RENAME` tests
Maxim Gekk created SPARK-34450: -- Summary: Unify v1 and v2 `ALTER TABLE .. RENAME` tests Key: SPARK-34450 URL: https://issues.apache.org/jira/browse/SPARK-34450 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33381) Unify DSv1 and DSv2 command tests
[ https://issues.apache.org/jira/browse/SPARK-33381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33381: --- Affects Version/s: 3.2.0 > Unify DSv1 and DSv2 command tests > - > > Key: SPARK-33381 > URL: https://issues.apache.org/jira/browse/SPARK-33381 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Create unified test suites for DSv1 and DSv2 commands such as CREATE TABLE, SHOW > TABLES, etc. Put datasource-specific tests into separate test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34447) Refactor the unified v1 and v2 command tests
Maxim Gekk created SPARK-34447: -- Summary: Refactor the unified v1 and v2 command tests Key: SPARK-34447 URL: https://issues.apache.org/jira/browse/SPARK-34447 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite
[jira] [Created] (SPARK-34445) Make `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` as non-internal
Maxim Gekk created SPARK-34445: -- Summary: Make `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` as non-internal Key: SPARK-34445 URL: https://issues.apache.org/jira/browse/SPARK-34445 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The SQL config spark.sql.legacy.replaceDatabricksSparkAvro.enabled is already documented in the Spark SQL guide. It should be made non-internal since it is documented publicly.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, Spark always performs rebasing of date/timestamp columns in the ORC > datasource but this is not required by the ORC spec. This ticket aims to > allow users to turn off rebasing via SQL configs or DS options.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Fix Version/s: (was: 3.2.0) > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, Spark always performs rebasing of date/timestamp columns in the ORC > datasource but this is not required by the ORC spec. This ticket aims to > allow users to turn off rebasing via SQL configs or DS options.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Fix Version/s: (was: 3.1.0) 3.2.0 > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Currently, Spark always performs rebasing of INT96 columns in the Parquet > datasource but this is not required by the Parquet spec. This ticket aims to > allow users to turn off rebasing via a SQL config.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Description: Currently, Spark always performs rebasing of date/timestamp columns in the ORC datasource but this is not required by the ORC spec. This ticket aims to allow users to turn off rebasing via SQL configs or DS options. (was: Currently, Spark always performs rebasing of INT96 columns in the Parquet datasource but this is not required by the Parquet spec. This ticket aims to allow users to turn off rebasing via a SQL config.) > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Currently, Spark always performs rebasing of date/timestamp columns in the ORC > datasource but this is not required by the ORC spec. This ticket aims to > allow users to turn off rebasing via SQL configs or DS options.
[jira] [Created] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
Maxim Gekk created SPARK-34440: -- Summary: Allow saving/loading datetime in ORC w/o rebasing Key: SPARK-34440 URL: https://issues.apache.org/jira/browse/SPARK-34440 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.1.0 Currently, Spark always performs rebasing of INT96 columns in the Parquet datasource but this is not required by the Parquet spec. This ticket aims to allow users to turn off rebasing via a SQL config.
[jira] [Created] (SPARK-34439) Recognize `spark_catalog` in new identifier while view/table renaming
Maxim Gekk created SPARK-34439: -- Summary: Recognize `spark_catalog` in new identifier while view/table renaming Key: SPARK-34439 URL: https://issues.apache.org/jira/browse/SPARK-34439 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Currently, v1 ALTER TABLE .. RENAME TO doesn't recognize spark_catalog in new view/table identifiers. The example below demonstrates the issue: {code:sql} spark-sql> CREATE DATABASE db; spark-sql> CREATE TABLE spark_catalog.db.tbl (c0 INT) USING parquet; spark-sql> INSERT INTO spark_catalog.db.tbl SELECT 0; spark-sql> SELECT * FROM spark_catalog.db.tbl; 0 spark-sql> ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2; Error in query: spark_catalog.db.tbl2 is not a valid TableIdentifier as it has more than 2 name parts. {code}
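The error above comes from converting the multi-part name into a two-part v1 TableIdentifier without first recognizing the session catalog. A minimal sketch of the intended fix — in Python for illustration; the names and logic are hypothetical, not Spark's implementation — drops a recognized `spark_catalog` prefix before applying the two-part limit:

```python
# Illustrative sketch (not Spark's code) of v1 rename-target validation:
# strip a recognized session-catalog name from the new identifier before
# enforcing the "at most 2 name parts" rule.

CATALOG_NAME = "spark_catalog"  # assumed session catalog name

def to_v1_table_identifier(ident: str) -> tuple:
    """Split a dotted identifier, dropping a leading session-catalog part."""
    parts = ident.split(".")
    if parts and parts[0] == CATALOG_NAME:
        parts = parts[1:]  # the fix: recognize and drop the catalog prefix
    if len(parts) > 2:
        raise ValueError(
            f"{ident} is not a valid TableIdentifier as it has more than 2 name parts."
        )
    return tuple(parts)

print(to_v1_table_identifier("spark_catalog.db.tbl2"))  # ('db', 'tbl2')
print(to_v1_table_identifier("db.tbl"))                 # ('db', 'tbl')
```

With the prefix stripped, `spark_catalog.db.tbl2` resolves to the same two-part identifier as `db.tbl2`, so the rename succeeds.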
[jira] [Created] (SPARK-34437) Update Spark SQL guide about rebase DS options and SQL configs
Maxim Gekk created SPARK-34437: -- Summary: Update Spark SQL guide about rebase DS options and SQL configs Key: SPARK-34437 URL: https://issues.apache.org/jira/browse/SPARK-34437 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Describe the following SQL configs: * spark.sql.legacy.parquet.int96RebaseModeInWrite * spark.sql.legacy.parquet.datetimeRebaseModeInWrite * spark.sql.legacy.parquet.int96RebaseModeInRead * spark.sql.legacy.parquet.datetimeRebaseModeInRead * spark.sql.legacy.avro.datetimeRebaseModeInWrite * spark.sql.legacy.avro.datetimeRebaseModeInRead And Avro/Parquet options datetimeRebaseMode and int96RebaseMode.
[jira] [Created] (SPARK-34434) Mention DS rebase options in SparkUpgradeException
Maxim Gekk created SPARK-34434: -- Summary: Mention DS rebase options in SparkUpgradeException Key: SPARK-34434 URL: https://issues.apache.org/jira/browse/SPARK-34434 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Mention the DS options added by SPARK-34404 and SPARK-34377 in SparkUpgradeException.
[jira] [Created] (SPARK-34431) Only load hive-site.xml once
Maxim Gekk created SPARK-34431: -- Summary: Only load hive-site.xml once Key: SPARK-34431 URL: https://issues.apache.org/jira/browse/SPARK-34431 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Hive configs from hive-site.xml are parsed over and over again. We can optimize this and parse the file only once.
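The parse-once idea can be sketched outside Spark as simple memoization. This Python illustration (not Spark's code; the property layout mimics hive-site.xml) caches the parsed properties per file path, so repeated lookups skip the XML parse:

```python
# Sketch: parse a hive-site.xml-style file once per path and reuse the
# result; functools.lru_cache memoizes the parse keyed by the path.
import functools
import os
import tempfile
import xml.etree.ElementTree as ET

PARSE_COUNT = 0  # instrumentation to show the file is parsed only once

@functools.lru_cache(maxsize=None)
def load_hive_conf(path):
    """Parse <configuration><property><name>/<value> pairs into a dict."""
    global PARSE_COUNT
    PARSE_COUNT += 1
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

# demo with a minimal hive-site.xml
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write("<configuration><property>"
            "<name>hive.metastore.uris</name><value>thrift://localhost:9083</value>"
            "</property></configuration>")
    path = f.name

conf1 = load_hive_conf(path)
conf2 = load_hive_conf(path)   # served from the cache, no second parse
os.unlink(path)
```

After the second call, `PARSE_COUNT` is still 1 and both lookups share the same parsed object.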
[jira] [Created] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
Maxim Gekk created SPARK-34424: -- Summary: HiveOrcHadoopFsRelationSuite fails with seed 610710213676 Key: SPARK-34424 URL: https://issues.apache.org/jira/browse/SPARK-34424 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.2, 3.2.0, 3.1.1 Reporter: Maxim Gekk The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: {code:java} == Results == !== Correct Answer - 20 == == Spark Answer - 20 == struct struct [1,1582-10-15] [1,1582-10-15] [2,null] [2,null] [3,1970-01-01] [3,1970-01-01] [4,1681-08-06] [4,1681-08-06] [5,1582-10-15] [5,1582-10-15] [6,-12-31] [6,-12-31] [7,0583-01-04] [7,0583-01-04] [8,6077-03-04] [8,6077-03-04] ![9,1582-10-06] [9,1582-10-15] [10,1582-10-15] [10,1582-10-15] [11,-12-31] [11,-12-31] [12,9722-10-04] [12,9722-10-04] [13,0243-12-19] [13,0243-12-19] [14,-12-31] [14,-12-31] [15,8743-01-24] [15,8743-01-24] [16,1039-10-31] [16,1039-10-31] [17,-12-31] [17,-12-31] [18,1582-10-15] [18,1582-10-15] [19,1582-10-15] [19,1582-10-15] [20,1582-10-15] [20,1582-10-15] {code}
[jira] [Created] (SPARK-34418) Check v1 TRUNCATE TABLE preserves partitions
Maxim Gekk created SPARK-34418: -- Summary: Check v1 TRUNCATE TABLE preserves partitions Key: SPARK-34418 URL: https://issues.apache.org/jira/browse/SPARK-34418 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add a test which checks TRUNCATE TABLE only removes rows and preserves existing partitions.
[jira] [Comment Edited] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281838#comment-17281838 ] Maxim Gekk edited comment on SPARK-34392 at 2/9/21, 3:26 PM: - The "GMT+8:00" string is an unsupported format in 3.0, see docs for the to_utc_timestamp() function (https://github.com/apache/spark/blob/30468a901577e82c855fbc4cb78e1b869facb44c/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3397-L3402): {code:scala} @param tz A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous. {code} was (Author: maxgekk): The "GMT+8:00" string is an unsupported format in 3.0, see docs for the to_utc_timestamp() function: {code:scala} * @param tz A string detailing the time zone ID that the input should be adjusted to. It should * be in the format of either region-based zone IDs or zone offsets. Region IDs must * have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in * the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are * supported as aliases of '+00:00'. Other short names are not recommended to use * because they can be ambiguous. 
{code} > Invalid ID for offset-based ZoneId since Spark 3.0 > -- > > Key: SPARK-34392 > URL: https://issues.apache.org/jira/browse/SPARK-34392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > {code} > Spark 2.4: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 2020-02-07 08:00:00 > Time taken: 0.089 seconds, Fetched 1 row(s) > {noformat} > Spark 3.x: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select > to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")] > java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00 > at java.time.ZoneId.ofWithPrefix(ZoneId.java:437) > at java.time.ZoneId.of(ZoneId.java:407) > at java.time.ZoneId.of(ZoneId.java:359) > at java.time.ZoneId.of(ZoneId.java:315) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814) > {noformat}
[jira] [Commented] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281838#comment-17281838 ] Maxim Gekk commented on SPARK-34392: The "GMT+8:00" string is an unsupported format in 3.0, see docs for the to_utc_timestamp() function: {code:scala} * @param tz A string detailing the time zone ID that the input should be adjusted to. It should * be in the format of either region-based zone IDs or zone offsets. Region IDs must * have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in * the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are * supported as aliases of '+00:00'. Other short names are not recommended to use * because they can be ambiguous. {code} > Invalid ID for offset-based ZoneId since Spark 3.0 > -- > > Key: SPARK-34392 > URL: https://issues.apache.org/jira/browse/SPARK-34392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > {code} > Spark 2.4: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 2020-02-07 08:00:00 > Time taken: 0.089 seconds, Fetched 1 row(s) > {noformat} > Spark 3.x: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select > to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")] > java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00 > at java.time.ZoneId.ofWithPrefix(ZoneId.java:437) > at java.time.ZoneId.of(ZoneId.java:407) > at java.time.ZoneId.of(ZoneId.java:359) > at java.time.ZoneId.of(ZoneId.java:315) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814) > {noformat}
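As the comments above note, only region IDs and `(+|-)HH:mm` offsets are accepted, so a legacy ID like `GMT+8:00` must be rewritten by the caller. A hedged Python sketch of such a normalization (illustrative only; this helper is not part of Spark):

```python
# Sketch: rewrite legacy "GMT+8:00"-style zone strings into the supported
# "(+|-)HH:mm" form; leave region IDs and already-valid offsets untouched.
import re

def normalize_zone_id(tz: str) -> str:
    """Rewrite e.g. 'GMT+8:00' as '+08:00'; pass other IDs through."""
    m = re.fullmatch(r"(?:GMT|UTC)?([+-])(\d{1,2}):(\d{2})", tz)
    if m:
        sign, hours, minutes = m.groups()
        return f"{sign}{int(hours):02d}:{minutes}"  # zero-pad the hour field
    return tz

print(normalize_zone_id("GMT+8:00"))             # +08:00
print(normalize_zone_id("America/Los_Angeles"))  # America/Los_Angeles
```

The normalized string then satisfies the `(+|-)HH:mm` format that java.time's offset-based zone IDs require.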
[jira] [Updated] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read
[ https://issues.apache.org/jira/browse/SPARK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34404: --- Description: Add a new Avro option similar to the SQL config {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}. (was: Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}.) > Support Avro datasource options to control datetime rebasing in read > > > Key: SPARK-34404 > URL: https://issues.apache.org/jira/browse/SPARK-34404 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add a new Avro option similar to the SQL config > {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}.
[jira] [Created] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read
Maxim Gekk created SPARK-34404: -- Summary: Support Avro datasource options to control datetime rebasing in read Key: SPARK-34404 URL: https://issues.apache.org/jira/browse/SPARK-34404 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}.
[jira] [Created] (SPARK-34401) Update public docs about altering cached tables/views
Maxim Gekk created SPARK-34401: -- Summary: Update public docs about altering cached tables/views Key: SPARK-34401 URL: https://issues.apache.org/jira/browse/SPARK-34401 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk
[jira] [Created] (SPARK-34397) Support v2 `MSCK REPAIR TABLE`
Maxim Gekk created SPARK-34397: -- Summary: Support v2 `MSCK REPAIR TABLE` Key: SPARK-34397 URL: https://issues.apache.org/jira/browse/SPARK-34397 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Implement the `MSCK REPAIR TABLE` command for tables from v2 catalogs.
[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList
[ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280269#comment-17280269 ] Maxim Gekk commented on SPARK-34386: [~bysza] You can find more details in the blog post: https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html > "Proleptic" date off by 10 days when returned by .collectAsList > --- > > Key: SPARK-34386 > URL: https://issues.apache.org/jira/browse/SPARK-34386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: Windows 10 >Reporter: Marek Byszewski >Priority: Major > > Run the following commands using Spark 3.0.1: > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console").show(false)}} > {{+---+}} > {{|data_console |}} > {{+---+}} > {{|*1582-10-05 02:12:34.997*|}} > {{+---+}} > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console")}} > {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}} > {{scala> res3.collectAsList}} > {{res4: java.util.List[org.apache.spark.sql.Row] = > [[*1582-10-{color:#FF}15{color} 02:12:34.997*]]}} > Notice that the returned date is off by 10 days compared to the date returned > by the first command.
[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList
[ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280268#comment-17280268 ] Maxim Gekk commented on SPARK-34386: [~bysza] Thanks for the ping. This is expected behavior, actually. The collectAsList() method converts internal timestamp values (in the Proleptic Gregorian calendar) to java.sql.Timestamp which is based on the hybrid calendar (Julian + Proleptic Gregorian calendars). The timestamp from your example doesn't exist in the hybrid calendar, so Spark shifts it to the closest valid date, which is 1582-10-15. If you want to receive timestamps AS IS from collectAsList(), please switch to Java 8 types via *spark.sql.datetime.java8API.enabled*. > "Proleptic" date off by 10 days when returned by .collectAsList > --- > > Key: SPARK-34386 > URL: https://issues.apache.org/jira/browse/SPARK-34386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: Windows 10 >Reporter: Marek Byszewski >Priority: Major > > Run the following commands using Spark 3.0.1: > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console").show(false)}} > {{+---+}} > {{|data_console |}} > {{+---+}} > {{|*1582-10-05 02:12:34.997*|}} > {{+---+}} > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console")}} > {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}} > {{scala> res3.collectAsList}} > {{res4: java.util.List[org.apache.spark.sql.Row] = > [[*1582-10-{color:#FF}15{color} 02:12:34.997*]]}} > Notice that the returned date is off by 10 days compared to the date returned > by the first command.
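The shift described in the comment above can be mimicked with a small Python sketch (an illustration of the observed behavior, not Spark's rebasing code): the ten days 1582-10-05 through 1582-10-14 were skipped when the Gregorian calendar was adopted, so mapping onto the hybrid calendar moves them to the closest valid day, 1582-10-15.

```python
# Sketch: dates falling inside the Julian->Gregorian cutover gap do not
# exist in the hybrid calendar; shift them to the first valid Gregorian day.
from datetime import date

GAP_START = date(1582, 10, 5)   # first skipped day in the hybrid calendar
GAP_END = date(1582, 10, 14)    # last skipped day

def rebase_to_hybrid(d: date) -> date:
    """Map a proleptic Gregorian date onto the hybrid calendar's valid days."""
    if GAP_START <= d <= GAP_END:
        return date(1582, 10, 15)  # closest valid date after the gap
    return d

print(rebase_to_hybrid(date(1582, 10, 5)))   # 1582-10-15
print(rebase_to_hybrid(date(1582, 10, 20)))  # 1582-10-20
```

This is exactly why `1582-10-05 02:12:34.997` comes back as `1582-10-15 02:12:34.997` from collectAsList() unless Java 8 types are enabled.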
[jira] [Updated] (SPARK-34385) Unwrap SparkUpgradeException in v2 Parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-34385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34385: --- Summary: Unwrap SparkUpgradeException in v2 Parquet datasource (was: Unwrap SparkUpgradeException in v1 Parquet datasource) > Unwrap SparkUpgradeException in v2 Parquet datasource > - > > Key: SPARK-34385 > URL: https://issues.apache.org/jira/browse/SPARK-34385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the > cause of a SparkException.
[jira] [Created] (SPARK-34385) Unwrap SparkUpgradeException in v1 Parquet datasource
Maxim Gekk created SPARK-34385: -- Summary: Unwrap SparkUpgradeException in v1 Parquet datasource Key: SPARK-34385 URL: https://issues.apache.org/jira/browse/SPARK-34385 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the cause of a SparkException.
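The intended exception chaining — catch the inner upgrade error in the partition reader and rethrow it as the cause of a generic task failure — can be sketched in Python (class names mirror Spark's, but the code and messages are illustrative):

```python
# Sketch: surface SparkUpgradeException as the *cause* of a SparkException
# instead of leaving it buried inside a wrapper's message text.

class SparkUpgradeException(Exception):
    pass

class SparkException(Exception):
    pass

def read_partition(read):
    """Run one partition read, chaining any upgrade error as the cause."""
    try:
        return read()
    except SparkUpgradeException as e:
        # re-raise with explicit chaining so callers can inspect __cause__
        raise SparkException("Task failed while reading the file") from e

def bad_read():
    raise SparkUpgradeException("You may get a different result due to the upgrading")

try:
    read_partition(bad_read)
except SparkException as e:
    assert isinstance(e.__cause__, SparkUpgradeException)
```

Chaining via `raise ... from` preserves the original error for programmatic inspection, which is the point of the unwrap-and-rethrow change.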
[jira] [Created] (SPARK-34377) Support parquet datasource options to control datetime rebasing in read
Maxim Gekk created SPARK-34377: -- Summary: Support parquet datasource options to control datetime rebasing in read Key: SPARK-34377 URL: https://issues.apache.org/jira/browse/SPARK-34377 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}.
[jira] [Created] (SPARK-34371) Run datetime rebasing tests for parquet DSv1 and DSv2
Maxim Gekk created SPARK-34371: -- Summary: Run datetime rebasing tests for parquet DSv1 and DSv2 Key: SPARK-34371 URL: https://issues.apache.org/jira/browse/SPARK-34371 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extract datetime rebasing tests from ParquetIOSuite and place them in a separate test suite to run them for both the DSv1 and DSv2 implementations.
[jira] [Updated] (SPARK-34360) Support table truncation by v2 Table Catalogs
[ https://issues.apache.org/jira/browse/SPARK-34360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34360: --- Description: Add a new method `truncateTable` to the TableCatalog interface with a default implementation, and implement this method in InMemoryTableCatalog. (was: Add new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`.) > Support table truncation by v2 Table Catalogs > - > > Key: SPARK-34360 > URL: https://issues.apache.org/jira/browse/SPARK-34360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add a new method `truncateTable` to the TableCatalog interface with a default > implementation, and implement this method in InMemoryTableCatalog.
[jira] [Created] (SPARK-34360) Support table truncation by v2 Table Catalogs
Maxim Gekk created SPARK-34360: -- Summary: Support table truncation by v2 Table Catalogs Key: SPARK-34360 URL: https://issues.apache.org/jira/browse/SPARK-34360 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add a new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`.
[jira] [Updated] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
[ https://issues.apache.org/jira/browse/SPARK-34332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34332: --- Description: Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites. (was: Extract ALTER TABLE .. SET SERDE tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.) > Unify v1 and v2 ALTER TABLE .. SET LOCATION tests > - > > Key: SPARK-34332 > URL: https://issues.apache.org/jira/browse/SPARK-34332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for > V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test > suites.
[jira] [Created] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
Maxim Gekk created SPARK-34332: -- Summary: Unify v1 and v2 ALTER TABLE .. SET LOCATION tests Key: SPARK-34332 URL: https://issues.apache.org/jira/browse/SPARK-34332 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Extract ALTER TABLE .. SET SERDE tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.
[jira] [Updated] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34314: --- Affects Version/s: 3.1.0 3.0.2 2.4.8 > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The example below demonstrates the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It writes the partition values as strings: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA", > but when Spark reads the data back, it transforms "-0" into "0": > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code}
[jira] [Created] (SPARK-34314) Wrong discovered partition value
Maxim Gekk created SPARK-34314: -- Summary: Wrong discovered partition value Key: SPARK-34314 URL: https://issues.apache.org/jira/browse/SPARK-34314 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The example below demonstrates the issue: {code:scala} val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") df.write .partitionBy("part") .format("parquet") .save(path) val readback = spark.read.parquet(path) readback.printSchema() readback.show(false) {code} It writes the partition values as strings: {code} /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d ├── _SUCCESS ├── part=-0 │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet └── part=AA └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet {code} *"-0"* and "AA", but when Spark reads the data back, it transforms "-0" into "0": {code} root |-- id: integer (nullable = true) |-- part: string (nullable = true) +---++ |id |part| +---++ |0 |AA | |1 |0 | +---++ {code}
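The root cause is the round trip through partition-path strings plus numeric type inference: once `-0` is parsed as a number, the sign of the original string is gone. A simplified, hypothetical inference function (not Spark's partition-discovery code) shows the loss:

```python
# Sketch: naive type inference over a partition-path value. Casting the
# string "-0" to an integer yields plain 0, so the original text form
# cannot be recovered when the value is rendered back.

def infer_partition_value(raw: str):
    """Illustrative inference: try integer first, fall back to string."""
    try:
        return int(raw)          # "-0" -> 0: the minus sign disappears
    except ValueError:
        return raw               # non-numeric values stay strings

print(infer_partition_value("-0"))  # 0
print(infer_partition_value("AA"))  # AA
```

Because the column's discovered type here is string, inference should have kept `"-0"` verbatim instead of normalizing it through a numeric parse.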
[jira] [Updated] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`
[ https://issues.apache.org/jira/browse/SPARK-34312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34312: --- Description: Add a new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`. (was: Add new method `purgePartition` in `SupportsPartitionManagement` and `purgePartitions` in `SupportsAtomicPartitionManagement`.) > Support partition truncation by `SupportsPartitionManagement` > - > > Key: SPARK-34312 > URL: https://issues.apache.org/jira/browse/SPARK-34312 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add a new method `truncatePartition` in `SupportsPartitionManagement` and > `truncatePartitions` in `SupportsAtomicPartitionManagement`.
[jira] [Created] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`
Maxim Gekk created SPARK-34312: -- Summary: Support partition truncation by `SupportsPartitionManagement` Key: SPARK-34312 URL: https://issues.apache.org/jira/browse/SPARK-34312 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add a new method `purgePartition` in `SupportsPartitionManagement` and `purgePartitions` in `SupportsAtomicPartitionManagement`.
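The API shape proposed in SPARK-34312 can be rendered as a hypothetical Python sketch: the partition-management interface gains a `truncate_partition` method, and the atomic variant truncates several partitions only if all of them exist. Method names mirror the ticket; the in-memory table and its semantics are illustrative, not Spark's implementation.

```python
# Sketch: partition truncation removes a partition's rows but keeps the
# partition itself, unlike dropping the partition.
from abc import ABC, abstractmethod

class SupportsPartitionManagement(ABC):
    @abstractmethod
    def truncate_partition(self, ident) -> bool:
        """Remove all rows from one partition, preserving the partition."""

class InMemoryPartitionTable(SupportsPartitionManagement):
    def __init__(self):
        self.partitions = {("part=0",): [1, 2, 3], ("part=1",): [4]}

    def truncate_partition(self, ident) -> bool:
        if ident not in self.partitions:
            return False
        self.partitions[ident] = []   # rows gone, partition preserved
        return True

    def truncate_partitions(self, idents) -> bool:
        # "atomic" variant: validate every ident first, then truncate all
        if not all(i in self.partitions for i in idents):
            return False
        for i in idents:
            self.truncate_partition(i)
        return True

t = InMemoryPartitionTable()
assert t.truncate_partition(("part=0",))
assert t.partitions[("part=0",)] == []        # partition still listed, empty
```

The validate-then-mutate step in `truncate_partitions` is what makes the batch variant all-or-nothing for missing partitions.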