[jira] [Created] (SPARK-34727) Difference in results of casting float to timestamp
Maxim Gekk created SPARK-34727: -- Summary: Difference in results of casting float to timestamp Key: SPARK-34727 URL: https://issues.apache.org/jira/browse/SPARK-34727 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The code below portrays the issue: {code:sql} spark-sql> CREATE TEMP VIEW v1 AS SELECT 16777215.0f AS f; spark-sql> SELECT * FROM v1; 1.6777215E7 spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; 1970-07-14 07:20:15 spark-sql> CACHE TABLE v1; spark-sql> SELECT * FROM v1; 1.6777215E7 spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; 1970-07-14 07:20:14.951424 {code} The result from the cached view *1970-07-14 07:20:14.951424* is different from the un-cached view *1970-07-14 07:20:15*. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
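The two results differ in whether the multiplication by 1,000,000 (microseconds per second) happens in float or in double precision. A minimal plain-Java sketch of the rounding effect — an illustration of the likely cause, not Spark's actual cast code path:

```java
public class FloatTimestampCast {
    public static void main(String[] args) {
        float f = 16777215.0f;  // 2^24 - 1: the largest odd integer exactly representable as float

        // Multiplying in float precision: the exact product 1.6777215E13 needs
        // more than 24 significand bits, so it rounds to the nearest float.
        long microsFloat = (long) (f * 1_000_000L);        // long is promoted to float here

        // Widening to double first keeps the product exact (double has 53 bits).
        long microsDouble = (long) ((double) f * 1_000_000L);

        System.out.println(microsFloat);   // 16777214951424 -> 1970-07-14 07:20:14.951424
        System.out.println(microsDouble);  // 16777215000000 -> 1970-07-14 07:20:15
    }
}
```

The lost 48,576 microseconds are exactly the `.951424` vs `.000000` fraction in the report, so one of the two code paths (cached vs un-cached) plausibly widens to double before scaling while the other does not.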
[jira] [Commented] (SPARK-34727) Difference in results of casting float to timestamp
[ https://issues.apache.org/jira/browse/SPARK-34727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17300263#comment-17300263 ] Maxim Gekk commented on SPARK-34727: I am working on a fix. > Difference in results of casting float to timestamp > --- > > Key: SPARK-34727 > URL: https://issues.apache.org/jira/browse/SPARK-34727 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The code below portrays the issue: > {code:sql} > spark-sql> CREATE TEMP VIEW v1 AS SELECT 16777215.0f AS f; > spark-sql> SELECT * FROM v1; > 1.6777215E7 > spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; > 1970-07-14 07:20:15 > spark-sql> CACHE TABLE v1; > spark-sql> SELECT * FROM v1; > 1.6777215E7 > spark-sql> SELECT CAST(f AS TIMESTAMP) FROM v1; > 1970-07-14 07:20:14.951424 > {code} > The result from the cached view *1970-07-14 07:20:14.951424* is different > from the un-cached view *1970-07-14 07:20:15*.
[jira] [Commented] (SPARK-34721) Add an year-month interval to a date
[ https://issues.apache.org/jira/browse/SPARK-34721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299834#comment-17299834 ] Maxim Gekk commented on SPARK-34721: I am working on this. > Add an year-month interval to a date > > > Key: SPARK-34721 > URL: https://issues.apache.org/jira/browse/SPARK-34721 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Support adding YearMonthIntervalType values to DATE values.
[jira] [Created] (SPARK-34721) Add an year-month interval to a date
Maxim Gekk created SPARK-34721: -- Summary: Add an year-month interval to a date Key: SPARK-34721 URL: https://issues.apache.org/jira/browse/SPARK-34721 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Support adding YearMonthIntervalType values to DATE values.
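The intended semantics correspond to java.time arithmetic on the external types: adding a Period (the external form of a year-month interval) to a LocalDate shifts calendar fields and clamps the day-of-month. A small sketch of that behavior in plain Java, not Spark code:

```java
import java.time.LocalDate;
import java.time.Period;

public class DatePlusYearMonthInterval {
    public static void main(String[] args) {
        // A year-month interval is a number of months; adding it to a date
        // shifts the calendar fields rather than a fixed number of days.
        LocalDate date = LocalDate.of(2021, 1, 31);
        Period interval = Period.of(1, 1, 0);  // 1 year 1 month

        LocalDate result = date.plus(interval);
        System.out.println(result);  // 2022-02-28 (day-of-month clamped to the month length)
    }
}
```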
[jira] [Created] (SPARK-34718) Assign pretty names to YearMonthIntervalType and DayTimeIntervalType
Maxim Gekk created SPARK-34718: -- Summary: Assign pretty names to YearMonthIntervalType and DayTimeIntervalType Key: SPARK-34718 URL: https://issues.apache.org/jira/browse/SPARK-34718 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Override the typeName() method in YearMonthIntervalType and DayTimeIntervalType, and assign names according to the SQL standard.
[jira] [Created] (SPARK-34716) Support ANSI SQL intervals by the aggregate function `sum`
Maxim Gekk created SPARK-34716: -- Summary: Support ANSI SQL intervals by the aggregate function `sum` Key: SPARK-34716 URL: https://issues.apache.org/jira/browse/SPARK-34716 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extend org.apache.spark.sql.catalyst.expressions.aggregate.Sum to support DayTimeIntervalType and YearMonthIntervalType.
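Extending Sum to interval types essentially means summing the underlying primitive values (months for year-month intervals, microseconds for day-time intervals) with overflow checks. A hedged plain-Java sketch of that idea — the `durationToMicros` helper is a simplified stand-in, not Spark's implementation:

```java
import java.time.Duration;
import java.util.List;

public class SumDayTimeIntervals {
    // Simplified stand-in: convert a Duration to microseconds with
    // overflow-checked arithmetic (day-time intervals are micros-valued).
    static long durationToMicros(Duration d) {
        return Math.addExact(Math.multiplyExact(d.getSeconds(), 1_000_000L),
                             d.getNano() / 1_000);
    }

    public static void main(String[] args) {
        List<Duration> values = List.of(
            Duration.ofHours(1), Duration.ofMinutes(30), Duration.ofSeconds(15));

        // Sum over the microsecond representation, failing fast on overflow.
        long totalMicros = 0;
        for (Duration d : values) {
            totalMicros = Math.addExact(totalMicros, durationToMicros(d));
        }
        System.out.println(totalMicros);  // 5415000000 micros = 1h 30m 15s
    }
}
```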
[jira] [Updated] (SPARK-34715) Add round trip tests for period <-> month and duration <-> micros
[ https://issues.apache.org/jira/browse/SPARK-34715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34715: --- Description: Similarly to the test from the PR https://github.com/apache/spark/pull/31799, add tests: 1. Months -> Period -> Months 2. Period -> Months -> Period 3. Duration -> micros -> Duration > Add round trip tests for period <-> month and duration <-> micros > - > > Key: SPARK-34715 > URL: https://issues.apache.org/jira/browse/SPARK-34715 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Similarly to the test from the PR https://github.com/apache/spark/pull/31799, > add tests: > 1. Months -> Period -> Months > 2. Period -> Months -> Period > 3. Duration -> micros -> Duration >
[jira] [Created] (SPARK-34715) Add round trip tests for period <-> month and duration <-> micros
Maxim Gekk created SPARK-34715: -- Summary: Add round trip tests for period <-> month and duration <-> micros Key: SPARK-34715 URL: https://issues.apache.org/jira/browse/SPARK-34715 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk
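The round-trip properties under test can be sketched outside Spark with java.time; the micros conversion below is a simplified stand-in for the IntervalUtils helpers, shown only to illustrate what "round trip" means here:

```java
import java.time.Duration;
import java.time.Period;

public class RoundTripConversions {
    public static void main(String[] args) {
        // Months -> Period -> Months: total month count must survive the trip.
        int months = 14;
        Period p = Period.ofMonths(months).normalized();  // P1Y2M
        long monthsBack = p.toTotalMonths();              // 14

        // Duration -> micros -> Duration: microsecond precision must survive.
        Duration d = Duration.ofSeconds(123, 456_000);    // 123.000456 s
        long micros = Math.addExact(
            Math.multiplyExact(d.getSeconds(), 1_000_000L), d.getNano() / 1_000);
        Duration back = Duration.ofSeconds(
            Math.floorDiv(micros, 1_000_000L),
            Math.floorMod(micros, 1_000_000L) * 1_000);

        System.out.println(monthsBack == months);  // true
        System.out.println(back.equals(d));        // true
    }
}
```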
[jira] [Commented] (SPARK-34675) TimeZone inconsistencies when JVM and session timezones are different
[ https://issues.apache.org/jira/browse/SPARK-34675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299345#comment-17299345 ] Maxim Gekk commented on SPARK-34675: > Could you link the original related patch and close this issue, [~maxgekk]? I think the issue has been fixed by multiple commits for sub-tasks of https://issues.apache.org/jira/browse/SPARK-26651, https://issues.apache.org/jira/browse/SPARK-31404 & https://issues.apache.org/jira/browse/SPARK-30951 . It is hard to identify particular patches that fix the issue. > TimeZone inconsistencies when JVM and session timezones are different > - > > Key: SPARK-34675 > URL: https://issues.apache.org/jira/browse/SPARK-34675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: Shubham Chaurasia >Priority: Major > > Inserted following data with UTC as both JVM and session timezone. > Spark-shell launch command > {code} > bin/spark-shell --conf spark.hadoop.metastore.catalog.default=hive --conf > spark.sql.catalogImplementation=hive --conf > spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf > spark.driver.extraJavaOptions=' -Duser.timezone=UTC' --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC' > {code} > Table creation > {code:scala} > sql("use ts").show > sql("create table spark_parquet(type string, t timestamp) stored as > parquet").show > sql("create table spark_orc(type string, t timestamp) stored as orc").show > sql("create table spark_avro(type string, t timestamp) stored as avro").show > sql("create table spark_text(type string, t timestamp) stored as > textfile").show > sql("insert into spark_parquet values ('FROM SPARK-EXT PARQUET', '1989-01-05 > 01:02:03')").show > sql("insert into spark_orc values ('FROM SPARK-EXT ORC', '1989-01-05 > 01:02:03')").show > sql("insert into spark_avro values ('FROM SPARK-EXT AVRO', '1989-01-05 > 01:02:03')").show > sql("insert into spark_text values ('FROM SPARK-EXT TEXT', '1989-01-05 > 
01:02:03')").show > {code} > Used following function to check and verify the returned timestamps > {code:scala} > scala> :paste > // Entering paste mode (ctrl-D to finish) > def showTs( > db: String, > tables: String* > ): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = { > sql("use " + db).show > import scala.collection.mutable.ListBuffer > var results = new ListBuffer[org.apache.spark.sql.DataFrame]() > for (tbl <- tables) { > val query = "select * from " + tbl > println("Executing - " + query); > results += sql(query) > } > println("user.timezone - " + System.getProperty("user.timezone")) > println("TimeZone.getDefault - " + java.util.TimeZone.getDefault.getID) > println("spark.sql.session.timeZone - " + > spark.conf.get("spark.sql.session.timeZone")) > var unionDf = results(0) > for (i <- 1 until results.length) { > unionDf = unionDf.unionAll(results(i)) > } > val augmented = unionDf.map(r => (r.getString(0), r.getTimestamp(1), > r.getTimestamp(1).getTime)) > val renamed = augmented.withColumnRenamed("_1", > "type").withColumnRenamed("_2", "ts").withColumnRenamed("_3", "millis") > renamed.show(false) > return renamed > } > // Exiting paste mode, now interpreting. > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > Hive Session ID = daa82b83-b50d-4038-97ee-1ecb2d01b368 > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - UTC > +--+---++ > > |type |ts |millis | > +--+---++ > |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT ORC|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT AVRO |1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT TEXT |1989-01-05 01:02:03|599965323000| > +--+---++ > {code} > 1. 
Set session timezone to America/Los_Angeles > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - America/Los_Angeles > +--+---++ > |type |ts
[jira] [Commented] (SPARK-34675) TimeZone inconsistencies when JVM and session timezones are different
[ https://issues.apache.org/jira/browse/SPARK-34675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299086#comment-17299086 ] Maxim Gekk commented on SPARK-34675: Here is the output on the current master (the same result for all datasources): {code} scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") scala> showTs("default", "spark_parquet", "spark_orc", "spark_avro", "spark_text") ++ || ++ ++ Executing - select * from spark_parquet Executing - select * from spark_orc Executing - select * from spark_avro Executing - select * from spark_text user.timezone - America/Los_Angeles TimeZone.getDefault - America/Los_Angeles spark.sql.session.timeZone - America/Los_Angeles +--+---++ |type |ts |millis | +--+---++ |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|54123000| |FROM SPARK-EXT ORC|1989-01-05 01:02:03|54123000| |FROM SPARK-EXT AVRO |1989-01-05 01:02:03|54123000| |FROM SPARK-EXT TEXT |1989-01-05 01:02:03|54123000| +--+---++ res18: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [type: string, ts: timestamp ... 1 more field] {code} > TimeZone inconsistencies when JVM and session timezones are different > - > > Key: SPARK-34675 > URL: https://issues.apache.org/jira/browse/SPARK-34675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: Shubham Chaurasia >Priority: Major > > Inserted following data with UTC as both JVM and session timezone. 
> Spark-shell launch command > {code} > bin/spark-shell --conf spark.hadoop.metastore.catalog.default=hive --conf > spark.sql.catalogImplementation=hive --conf > spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf > spark.driver.extraJavaOptions=' -Duser.timezone=UTC' --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC' > {code} > Table creation > {code:scala} > sql("use ts").show > sql("create table spark_parquet(type string, t timestamp) stored as > parquet").show > sql("create table spark_orc(type string, t timestamp) stored as orc").show > sql("create table spark_avro(type string, t timestamp) stored as avro").show > sql("create table spark_text(type string, t timestamp) stored as > textfile").show > sql("insert into spark_parquet values ('FROM SPARK-EXT PARQUET', '1989-01-05 > 01:02:03')").show > sql("insert into spark_orc values ('FROM SPARK-EXT ORC', '1989-01-05 > 01:02:03')").show > sql("insert into spark_avro values ('FROM SPARK-EXT AVRO', '1989-01-05 > 01:02:03')").show > sql("insert into spark_text values ('FROM SPARK-EXT TEXT', '1989-01-05 > 01:02:03')").show > {code} > Used following function to check and verify the returned timestamps > {code:scala} > scala> :paste > // Entering paste mode (ctrl-D to finish) > def showTs( > db: String, > tables: String* > ): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = { > sql("use " + db).show > import scala.collection.mutable.ListBuffer > var results = new ListBuffer[org.apache.spark.sql.DataFrame]() > for (tbl <- tables) { > val query = "select * from " + tbl > println("Executing - " + query); > results += sql(query) > } > println("user.timezone - " + System.getProperty("user.timezone")) > println("TimeZone.getDefault - " + java.util.TimeZone.getDefault.getID) > println("spark.sql.session.timeZone - " + > spark.conf.get("spark.sql.session.timeZone")) > var unionDf = results(0) > for (i <- 1 until results.length) { > unionDf = unionDf.unionAll(results(i)) > } > val augmented = 
unionDf.map(r => (r.getString(0), r.getTimestamp(1), > r.getTimestamp(1).getTime)) > val renamed = augmented.withColumnRenamed("_1", > "type").withColumnRenamed("_2", "ts").withColumnRenamed("_3", "millis") > renamed.show(false) > return renamed > } > // Exiting paste mode, now interpreting. > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > Hive Session ID = daa82b83-b50d-4038-97ee-1ecb2d01b368 > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - UTC > +--+---++ > > |type |ts |millis | > +--+---++ > |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT ORC|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT AVRO |1989-01-05
[jira] [Created] (SPARK-34695) Overflow in round trip conversion from micros to duration
Maxim Gekk created SPARK-34695: -- Summary: Overflow in round trip conversion from micros to duration Key: SPARK-34695 URL: https://issues.apache.org/jira/browse/SPARK-34695 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The code below fails with long overflow: {code:scala} scala> import org.apache.spark.sql.catalyst.util.IntervalUtils._ import org.apache.spark.sql.catalyst.util.IntervalUtils._ scala> val minDuration = microsToDuration(Long.MinValue) minDuration: java.time.Duration = PT-2562047788H-54.775808S scala> durationToMicros(minDuration) java.lang.ArithmeticException: long overflow at java.lang.Math.multiplyExact(Math.java:892) at org.apache.spark.sql.catalyst.util.IntervalUtils$.durationToMicros(IntervalUtils.scala:782) ... 49 elided {code}
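A simplified reproduction in plain Java, with stand-ins for IntervalUtils.microsToDuration/durationToMicros (the floorDiv/floorMod split mirrors the idea, not Spark's exact code). For `Long.MIN_VALUE` micros, the seconds component floors to one below the exact quotient and the remainder is added back as positive nanoseconds, so `seconds * 1_000_000` alone undershoots the `long` range:

```java
import java.time.Duration;

public class MicrosDurationOverflow {
    // Stand-in: split micros into (seconds, positive nano remainder).
    static Duration microsToDuration(long micros) {
        return Duration.ofSeconds(Math.floorDiv(micros, 1_000_000L),
                                  Math.floorMod(micros, 1_000_000L) * 1_000);
    }

    // Stand-in: seconds * 1e6 overflows before the positive nanosecond
    // part can bring the total back into range.
    static long durationToMicros(Duration d) {
        return Math.addExact(Math.multiplyExact(d.getSeconds(), 1_000_000L),
                             d.getNano() / 1_000);
    }

    public static void main(String[] args) {
        Duration min = microsToDuration(Long.MIN_VALUE);
        System.out.println(min);  // PT-2562047788H-54.775808S, as in the report
        try {
            durationToMicros(min);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());  // long overflow
        }
    }
}
```

A fix therefore has to recombine seconds and nanos without materializing `seconds * 1_000_000` on its own for boundary values.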
[jira] [Created] (SPARK-34677) Support add and subtract of ANSI SQL intervals
Maxim Gekk created SPARK-34677: -- Summary: Support add and subtract of ANSI SQL intervals Key: SPARK-34677 URL: https://issues.apache.org/jira/browse/SPARK-34677 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Support unary minus, and addition/subtraction (+/-) of two ANSI SQL intervals of the same type: year-month and day-time intervals.
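With java.time external types, the intended semantics map directly onto Period and Duration arithmetic: year-month operations act on the month count, day-time operations on the microsecond value. A small illustrative sketch (plain Java, not Spark code):

```java
import java.time.Duration;
import java.time.Period;

public class IntervalArithmetic {
    public static void main(String[] args) {
        // Year-month intervals: arithmetic on the total-month value.
        Period ym1 = Period.of(1, 8, 0);              // 1 year 8 months
        Period ym2 = Period.ofMonths(5);
        Period sum = ym1.plus(ym2).normalized();      // P2Y1M
        Period negated = ym1.negated();               // P-1Y-8M (unary minus)

        // Day-time intervals: arithmetic on the time-based value.
        Duration dt1 = Duration.ofHours(26);
        Duration dt2 = Duration.ofMinutes(90);
        Duration diff = dt1.minus(dt2);               // PT24H30M

        System.out.println(sum + " " + negated + " " + diff);
    }
}
```

Mixing the two classes (e.g. year-month + day-time) stays disallowed, consistent with the ANSI separation of the two interval classes.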
[jira] [Commented] (SPARK-34675) TimeZone inconsistencies when JVM and session timezones are different
[ https://issues.apache.org/jira/browse/SPARK-34675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298233#comment-17298233 ] Maxim Gekk commented on SPARK-34675: > Set session timezone to America/Los_Angeles > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") Processing of dates/timestamps in Spark 2.4.x is based on Java 7 time APIs, where the JVM time zone is "hard coded" in the classes java.sql.Date/java.sql.Timestamp. So, Spark 2.4.x cannot apply the session time zone in some cases. In Spark 3.x, most of these problems have been solved. I would recommend trying the same there. > TimeZone inconsistencies when JVM and session timezones are different > - > > Key: SPARK-34675 > URL: https://issues.apache.org/jira/browse/SPARK-34675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: Shubham Chaurasia >Priority: Major > > Inserted following data with UTC as both JVM and session timezone. > Spark-shell launch command > {code} > bin/spark-shell --conf spark.hadoop.metastore.catalog.default=hive --conf > spark.sql.catalogImplementation=hive --conf > spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf > spark.driver.extraJavaOptions=' -Duser.timezone=UTC' --conf > spark.executor.extraJavaOptions='-Duser.timezone=UTC' > {code} > Table creation > {code:scala} > sql("use ts").show > sql("create table spark_parquet(type string, t timestamp) stored as > parquet").show > sql("create table spark_orc(type string, t timestamp) stored as orc").show > sql("create table spark_avro(type string, t timestamp) stored as avro").show > sql("create table spark_text(type string, t timestamp) stored as > textfile").show > sql("insert into spark_parquet values ('FROM SPARK-EXT PARQUET', '1989-01-05 > 01:02:03')").show > sql("insert into spark_orc values ('FROM SPARK-EXT ORC', '1989-01-05 > 01:02:03')").show > sql("insert into spark_avro values ('FROM SPARK-EXT AVRO', '1989-01-05 > 01:02:03')").show >
sql("insert into spark_text values ('FROM SPARK-EXT TEXT', '1989-01-05 > 01:02:03')").show > {code} > Used following function to check and verify the returned timestamps > {code:scala} > scala> :paste > // Entering paste mode (ctrl-D to finish) > def showTs( > db: String, > tables: String* > ): org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = { > sql("use " + db).show > import scala.collection.mutable.ListBuffer > var results = new ListBuffer[org.apache.spark.sql.DataFrame]() > for (tbl <- tables) { > val query = "select * from " + tbl > println("Executing - " + query); > results += sql(query) > } > println("user.timezone - " + System.getProperty("user.timezone")) > println("TimeZone.getDefault - " + java.util.TimeZone.getDefault.getID) > println("spark.sql.session.timeZone - " + > spark.conf.get("spark.sql.session.timeZone")) > var unionDf = results(0) > for (i <- 1 until results.length) { > unionDf = unionDf.unionAll(results(i)) > } > val augmented = unionDf.map(r => (r.getString(0), r.getTimestamp(1), > r.getTimestamp(1).getTime)) > val renamed = augmented.withColumnRenamed("_1", > "type").withColumnRenamed("_2", "ts").withColumnRenamed("_3", "millis") > renamed.show(false) > return renamed > } > // Exiting paste mode, now interpreting. > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > Hive Session ID = daa82b83-b50d-4038-97ee-1ecb2d01b368 > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - UTC > +--+---++ > > |type |ts |millis | > +--+---++ > |FROM SPARK-EXT PARQUET|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT ORC|1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT AVRO |1989-01-05 01:02:03|599965323000| > |FROM SPARK-EXT TEXT |1989-01-05 01:02:03|599965323000| > +--+---++ > {code} > 1. 
Set session timezone to America/Los_Angeles > {code:scala} > scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") > scala> showTs("ts", "spark_parquet", "spark_orc", "spark_avro", "spark_text") > ++ > || > ++ > ++ > Executing - select * from spark_parquet > Executing - select * from spark_orc > Executing - select * from spark_avro > Executing - select * from spark_text > user.timezone - UTC > TimeZone.getDefault - UTC > spark.sql.session.timeZone - America/Los_Angeles >
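The mechanics behind the thread above can be seen with plain java.time: a single epoch value renders to different wall-clock strings depending on the zone, which is why the `ts` and `millis` columns cannot both stay fixed when the session time zone changes. A sketch using the epoch value from the report (this illustrates the zone arithmetic only, not Spark's reader code paths):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SessionTimeZoneDemo {
    public static void main(String[] args) {
        // 599965323000 ms is the epoch value shown in the report for
        // '1989-01-05 01:02:03' interpreted in UTC.
        Instant instant = Instant.ofEpochMilli(599965323000L);
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        // The same instant under two "session" zones: the wall-clock string
        // shifts even though the stored epoch value is identical.
        String utc = fmt.format(instant.atZone(ZoneId.of("UTC")));
        String la  = fmt.format(instant.atZone(ZoneId.of("America/Los_Angeles")));

        System.out.println(utc);  // 1989-01-05 01:02:03
        System.out.println(la);   // 1989-01-04 17:02:03
    }
}
```

Conversely, keeping the wall-clock string fixed while changing the zone (the Spark 3.x master output above) requires shifting the stored epoch value, which is exactly the difference in the `millis` column.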
[jira] [Updated] (SPARK-34668) Support casting of day-time intervals to strings
[ https://issues.apache.org/jira/browse/SPARK-34668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34668: --- Description: Extend the Cast expression and support DayTimeIntervalType in casting to StringType. (was: Extend the Cast expression and support YearMonthIntervalType in casting to StringType.) > Support casting of day-time intervals to strings > > > Key: SPARK-34668 > URL: https://issues.apache.org/jira/browse/SPARK-34668 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extend the Cast expression and support DayTimeIntervalType in casting to > StringType.
[jira] [Created] (SPARK-34668) Support casting of day-time intervals to strings
Maxim Gekk created SPARK-34668: -- Summary: Support casting of day-time intervals to strings Key: SPARK-34668 URL: https://issues.apache.org/jira/browse/SPARK-34668 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extend the Cast expression and support YearMonthIntervalType in casting to StringType.
[jira] [Created] (SPARK-34667) Support casting of year-month intervals to strings
Maxim Gekk created SPARK-34667: -- Summary: Support casting of year-month intervals to strings Key: SPARK-34667 URL: https://issues.apache.org/jira/browse/SPARK-34667 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extend the Cast expression and support YearMonthIntervalType in casting to StringType.
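For the textual form itself, the SQL standard writes a year-month interval as a signed `'years-months'` literal. A hypothetical formatter sketching that layout — the function name and exact quoting are illustrative, not Spark's Cast output:

```java
public class IntervalToString {
    // Hypothetical helper: render a total month count in the ANSI
    // INTERVAL YEAR TO MONTH textual style.
    static String yearMonthToString(int totalMonths) {
        String sign = totalMonths < 0 ? "-" : "";
        int m = Math.abs(totalMonths);
        return String.format("INTERVAL '%s%d-%d' YEAR TO MONTH",
                             sign, m / 12, m % 12);
    }

    public static void main(String[] args) {
        System.out.println(yearMonthToString(14));  // INTERVAL '1-2' YEAR TO MONTH
        System.out.println(yearMonthToString(-7));  // INTERVAL '-0-7' YEAR TO MONTH
    }
}
```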
[jira] [Created] (SPARK-34666) Test DayTimeIntervalType/YearMonthIntervalType as ordered and atomic types
Maxim Gekk created SPARK-34666: -- Summary: Test DayTimeIntervalType/YearMonthIntervalType as ordered and atomic types Key: SPARK-34666 URL: https://issues.apache.org/jira/browse/SPARK-34666 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add DayTimeIntervalType and YearMonthIntervalType to DataTypeTestUtils.ordered and DataTypeTestUtils.atomicTypes.
[jira] [Created] (SPARK-34663) Test year-month and day-time intervals in UDF
Maxim Gekk created SPARK-34663: -- Summary: Test year-month and day-time intervals in UDF Key: SPARK-34663 URL: https://issues.apache.org/jira/browse/SPARK-34663 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Write tests for year-month and day-time intervals in UDF as input parameters and results.
[jira] [Commented] (SPARK-34615) Support java.time.Period as an external type of the year-month interval type
[ https://issues.apache.org/jira/browse/SPARK-34615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296540#comment-17296540 ] Maxim Gekk commented on SPARK-34615: I am working on this sub-task. > Support java.time.Period as an external type of the year-month interval type > > > Key: SPARK-34615 > URL: https://issues.apache.org/jira/browse/SPARK-34615 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Period values, and convert the > values to interval values of YearMonthIntervalType.
[jira] [Created] (SPARK-34619) Update the Spark SQL guide about day-time and year-month interval types
Maxim Gekk created SPARK-34619: -- Summary: Update the Spark SQL guide about day-time and year-month interval types Key: SPARK-34619 URL: https://issues.apache.org/jira/browse/SPARK-34619 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Describe new types at http://spark.apache.org/docs/latest/sql-ref-datatypes.html
[jira] [Updated] (SPARK-34615) Support java.time.Period as an external type of the year-month interval type
[ https://issues.apache.org/jira/browse/SPARK-34615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34615: --- Description: Allow parallelization/collection of java.time.Period values, and convert the values to interval values of YearMonthIntervalType. (was: Allow parallelization/collection of java.time.Duration values, and convert the values to interval values of DayTimeIntervalType.) > Support java.time.Period as an external type of the year-month interval type > > > Key: SPARK-34615 > URL: https://issues.apache.org/jira/browse/SPARK-34615 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Period values, and convert the > values to interval values of YearMonthIntervalType.
[jira] [Created] (SPARK-34615) Support java.time.Period as an external type of the year-month interval type
Maxim Gekk created SPARK-34615: -- Summary: Support java.time.Period as an external type of the year-month interval type Key: SPARK-34615 URL: https://issues.apache.org/jira/browse/SPARK-34615 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Allow parallelization/collection of java.time.Duration values, and convert the values to interval values of DayTimeIntervalType.
[jira] [Updated] (SPARK-34605) Support java.time.Duration as an external type of the day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-34605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34605: --- Summary: Support java.time.Duration as an external type of the day-time interval type (was: Support java.time.Duration as an external type for the day-time interval type) > Support java.time.Duration as an external type of the day-time interval type > > > Key: SPARK-34605 > URL: https://issues.apache.org/jira/browse/SPARK-34605 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Duration values, and convert > the values to interval values of DayTimeIntervalType.
[jira] [Commented] (SPARK-34605) Support java.time.Duration as an external type for the day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-34605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294395#comment-17294395 ] Maxim Gekk commented on SPARK-34605: I am working on the sub-task. > Support java.time.Duration as an external type for the day-time interval type > - > > Key: SPARK-34605 > URL: https://issues.apache.org/jira/browse/SPARK-34605 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Allow parallelization/collection of java.time.Duration values, and convert > the values to interval values of DayTimeIntervalType.
[jira] [Created] (SPARK-34605) Support java.time.Duration as an external type for the day-time interval type
Maxim Gekk created SPARK-34605: -- Summary: Support java.time.Duration as an external type for the day-time interval type Key: SPARK-34605 URL: https://issues.apache.org/jira/browse/SPARK-34605 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Allow parallelization/collection of java.time.Duration values, and convert the values to interval values of DayTimeIntervalType.
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Description: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate Spark's interval type. * ANSI describes two non-overlapping “classes”: ** YEAR-MONTH, ** DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. *Milestone 1* -- Spark Interval equivalency (the new interval types meet or exceed all functions of the existing SQL Interval): * Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DDL string. * Infra support: check the caller sides of DateType/TimestampType * Support the two new interval types in Dataset/UDF. * Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values) * Interval arithmetic (interval * num, interval / num, interval +/- interval) * Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval * Cast to and from the two new interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types * Support sorting intervals. *Milestone 2* -- Persistence: * Ability to create tables of type interval * Ability to write to common file formats such as Parquet and JSON. 
* INSERT, SELECT, UPDATE, MERGE * Discovery *Milestone 3* -- Client support * JDBC support * Hive Thrift server *Milestone 4* -- PySpark and Spark R integration * Python UDF can take and return intervals * DataFrame support was: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. *Milestone 1* -- Spark Interval equivalency ( The new interval types meet or exceed all function of the existing SQL Interval): * Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DLL string. * Infra support: check the caller sides of DateType/TimestampType * Support the two new interval types in Dataset/UDF. * Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values) * Interval arithmetic(interval * num, interval / num, interval +/- interval) * Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval * Cast to and from the new two interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types * Support sorting intervals. *Milestone 2* -- Persistence: * Ability to create tables of type interval * Ability to write to common file formats such as Parquet and JSON. 
* INSERT, SELECT, UPDATE, MERGE * Discovery *Milestone 3* -- Client support * JDBC support * Hive Thrift server *Milestone 4* -- PySpark and Spark R integration * Python UDF can take and return intervals * DataFrame support > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in
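The incomparability argument above maps directly onto java.time, which mirrors the ANSI split: Duration (day-time) has a fixed length and a total order, while Period (year-month) does not implement Comparable, precisely because a month has no fixed length in days. An illustrative sketch (not Spark code):

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.Period;

public class IntervalClasses {
    public static void main(String[] args) {
        // Day-time intervals have a fixed length, so they are totally ordered.
        System.out.println(Duration.ofDays(31).compareTo(Duration.ofDays(30)) > 0); // true

        // Year-month intervals are comparable to each other as month counts...
        System.out.println(Period.of(1, 0, 0).toTotalMonths() > Period.ofMonths(11).toTotalMonths()); // true

        // ...but "1 month" versus "30 days" depends on the reference date,
        // which is why ANSI SQL keeps the two classes separate.
        LocalDate feb = LocalDate.of(2021, 2, 1);
        LocalDate mar = LocalDate.of(2021, 3, 1);
        System.out.println(feb.plus(Period.ofMonths(1)).equals(feb.plusDays(30))); // false (Feb 2021 has 28 days)
        System.out.println(mar.plus(Period.ofMonths(1)).equals(mar.plusDays(31))); // true (March has 31 days)
    }
}
```

Within either class every member has a well-defined order, which is what makes the proposed sorting and persistence of the new types possible.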
[jira] [Updated] (SPARK-27793) Add ANSI SQL day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Summary: Add ANSI SQL day-time and year-month interval types (was: Add day-time and year-month interval types) > Add ANSI SQL day-time and year-month interval types > --- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extend Catalyst's type system by two new types that conform to the SQL > standard (see SQL:2016, section 4.6.3): > # DayTimeIntervalType represents the day-time interval type, > # YearMonthIntervalType for SQL year-month interval type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27791) Support SQL year-month INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-27791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27791: --- Parent: (was: SPARK-27790) Issue Type: Improvement (was: Sub-task) > Support SQL year-month INTERVAL type > > > Key: SPARK-27791 > URL: https://issues.apache.org/jira/browse/SPARK-27791 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Major > > The INTERVAL type must conform to SQL year-month INTERVAL type, has 2 logical > types: > # YEAR - Unconstrained except by the leading field precision > # MONTH - Months (within years) (0-11) > And support arithmetic operations involving values of type datetime or > interval. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27791) Support SQL year-month INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-27791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-27791. Resolution: Won't Fix > Support SQL year-month INTERVAL type > > > Key: SPARK-27791 > URL: https://issues.apache.org/jira/browse/SPARK-27791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Major > > The INTERVAL type must conform to SQL year-month INTERVAL type, has 2 logical > types: > # YEAR - Unconstrained except by the leading field precision > # MONTH - Months (within years) (0-11) > And support arithmetic operations involving values of type datetime or > interval. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27793) Add day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Description: Extend Catalyst's type system by two new types that conform to the SQL standard (see SQL:2016, section 4.6.3): # DayTimeIntervalType represents the day-time interval type, # YearMonthIntervalType for SQL year-month interval type. was: The day-time INTERVAL type contains the following fields: # DAY - Unconstrained except by the leading field precision # HOUR - Hours (within days) (0-23) # MINUTE - Minutes (within hours) (0-59) # SECOND - Seconds and possibly fractions of a second (0-59.999...) The interval type should support all operations defined by SQL standard > Add day-time and year-month interval types > -- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extend Catalyst's type system by two new types that conform to the SQL > standard (see SQL:2016, section 4.6.3): > # DayTimeIntervalType represents the day-time interval type, > # YearMonthIntervalType for SQL year-month interval type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27793) Add day-time and year-month interval types
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Summary: Add day-time and year-month interval types (was: Support SQL day-time INTERVAL type) > Add day-time and year-month interval types > -- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The day-time INTERVAL type contains the following fields: > # DAY - Unconstrained except by the leading field precision > # HOUR - Hours (within days) (0-23) > # MINUTE - Minutes (within hours) (0-59) > # SECOND - Seconds and possibly fractions of a second (0-59.999...) > The interval type should support all operations defined by SQL standard -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27793) Support SQL day-time INTERVAL type
[ https://issues.apache.org/jira/browse/SPARK-27793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27793: --- Affects Version/s: (was: 2.4.3) 3.2.0 > Support SQL day-time INTERVAL type > -- > > Key: SPARK-27793 > URL: https://issues.apache.org/jira/browse/SPARK-27793 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The day-time INTERVAL type contains the following fields: > # DAY - Unconstrained except by the leading field precision > # HOUR - Hours (within days) (0-23) > # MINUTE - Minutes (within hours) (0-59) > # SECOND - Seconds and possibly fractions of a second (0-59.999...) > The interval type should support all operations defined by SQL standard -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
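The DAY/HOUR/MINUTE/SECOND fields listed above can all be recovered from a single microsecond count, which is how a day-time interval with one internal long representation can still expose the ANSI field structure. A hypothetical standalone sketch (the `format` helper is illustrative, not Spark's):

```java
public class DayTimeFields {
    // Decompose a microsecond count into the DAY, HOUR (0-23), MINUTE (0-59)
    // and fractional SECOND fields of an ANSI day-time interval.
    static String format(long micros) {
        boolean negative = micros < 0;
        long us = Math.abs(micros);
        long days    = us / 86_400_000_000L;
        long hours   = us / 3_600_000_000L % 24;
        long minutes = us / 60_000_000L % 60;
        long seconds = us / 1_000_000L % 60;
        long frac    = us % 1_000_000L;
        return String.format("%s%d %02d:%02d:%02d.%06d",
                negative ? "-" : "", days, hours, minutes, seconds, frac);
    }

    public static void main(String[] args) {
        // 1 day, 2 hours, 3 minutes, 4.000005 seconds
        long micros = ((26 * 60 + 3) * 60 + 4) * 1_000_000L + 5;
        System.out.println(format(micros)); // 1 02:03:04.000005
    }
}
```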
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Description: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. *Milestone 1* -- Spark Interval equivalency ( The new interval types meet or exceed all function of the existing SQL Interval): * Add two new DataType implementations for interval year-month and day-second. Includes the JSON format and DLL string. * Infra support: check the caller sides of DateType/TimestampType * Support the two new interval types in Dataset/UDF. * Interval literals (with a legacy config to still allow mixed year-month day-seconds fields and return legacy interval values) * Interval arithmetic(interval * num, interval / num, interval +/- interval) * Datetime functions/operators: Datetime - Datetime (to days or day second), Datetime +/- interval * Cast to and from the new two interval types, cast string to interval, cast interval to string (pretty printing), with the SQL syntax to specify the types * Support sorting intervals. *Milestone 2* -- Persistence: * Ability to create tables of type interval * Ability to write to common file formats such as Parquet and JSON. 
* INSERT, SELECT, UPDATE, MERGE * Discovery *Milestone 3* -- Client support * JDBC support * Hive Thrift server *Milestone 4* -- PySpark and Spark R integration * Python UDF can take and return intervals * DataFrame support was: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. 
Also any semantic “breakage” can be controlled via legacy > config settings. > *Milestone 1* -- Spark Interval equivalency ( The new interval types meet > or exceed all function of the existing SQL Interval): > * Add two new DataType implementations for interval year-month and > day-second. Includes the JSON format and DLL string. > * Infra support: check the caller sides of DateType/TimestampType > * Support the two new interval types in Dataset/UDF. > * Interval literals (with a legacy config to still allow mixed year-month > day-seconds fields and return legacy interval values) > * Interval arithmetic(interval * num, interval / num, interval +/- interval) > * Datetime functions/operators: Datetime - Datetime (to days or day second), > Datetime +/- interval > * Cast to and from the new two interval types, cast string to
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Description: Spark has an INTERVAL data type, but it is “broken”: # It cannot be persisted # It is not comparable because it crosses the month day line. That is there is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not all months have the same number of days. I propose here to introduce the two flavours of INTERVAL as described in the ANSI SQL Standard and deprecate the Sparks interval type. * ANSI describes two non overlapping “classes”: * YEAR-MONTH, * DAY-SECOND ranges * Members within each class can be compared and sorted. * Supports datetime arithmetic * Can be persisted. The old and new flavors of INTERVAL can coexist until Spark INTERVAL is eventually retired. Also any semantic “breakage” can be controlled via legacy config settings. was: SQL standard defines 2 interval types: # year-month interval contains a YEAR field or a MONTH field or both # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction of seconds) Need to add 2 new internal types YearMonthIntervalType and DayTimeIntervalType, support operations defined by SQL standard as well as INTERVAL literals. The java.time.Period and java.time.Duration can be supported as external type for YearMonthIntervalType and DayTimeIntervalType. > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. 
> I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. Also any semantic “breakage” can be controlled via legacy > config settings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Comment: was deleted (was: I am working on it.) > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. Also any semantic “breakage” can be controlled via legacy > config settings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Issue Type: Improvement (was: New Feature) > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Spark has an INTERVAL data type, but it is “broken”: > # It cannot be persisted > # It is not comparable because it crosses the month day line. That is there > is no telling whether “1 Month 1 Day” is equal to “1 Month 1 Day” since not > all months have the same number of days. > I propose here to introduce the two flavours of INTERVAL as described in the > ANSI SQL Standard and deprecate the Sparks interval type. > * ANSI describes two non overlapping “classes”: > * YEAR-MONTH, > * DAY-SECOND ranges > * Members within each class can be compared and sorted. > * Supports datetime arithmetic > * Can be persisted. > The old and new flavors of INTERVAL can coexist until Spark INTERVAL is > eventually retired. Also any semantic “breakage” can be controlled via legacy > config settings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support ANSI SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Summary: Support ANSI SQL INTERVAL types (was: Support SQL INTERVAL types) > Support ANSI SQL INTERVAL types > --- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34564) DateTimeUtils.fromJavaDate fails for very late dates during casting to Int
[ https://issues.apache.org/jira/browse/SPARK-34564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292427#comment-17292427 ] Maxim Gekk commented on SPARK-34564: We changed the behavior intentionally because we do believe that it is better to return an error instead of an incorrect result silently. > However, the question is even if such late dates are not supported, could it >fail in more gentle way? How? What would you like to see? > DateTimeUtils.fromJavaDate fails for very late dates during casting to Int > -- > > Key: SPARK-34564 > URL: https://issues.apache.org/jira/browse/SPARK-34564 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.2.0, 3.1.2 >Reporter: kondziolka9ld >Priority: Major > > Please consider a following scenario on *spark-3.0.1*: > {code:java} > scala> List(("some date", new Date(Int.MaxValue)), ("some corner case date", > new Date(Long.MaxValue))).toDF > java.lang.RuntimeException: Error while encoding: > java.lang.ArithmeticException: integer overflow > staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, > fromString, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1, > true, false) AS _1#0 > staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, > DateType, fromJavaDate, knownnotnull(assertnotnull(input[0, scala.Tuple2, > true]))._2, true, false) AS _2#1 > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215) > at > org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:466) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at 
org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:466) > at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:353) > at > org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:231) > ... 51 elided > Caused by: java.lang.ArithmeticException: integer overflow > at java.lang.Math.toIntExact(Math.java:1011) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(DateTimeUtils.scala) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211) > ... 60 more > {code} > In opposition to *spark-2.4.7* where it is possible to create dataframe with > such values: > {code:java} > scala> val df = List(("some date", new Date(Int.MaxValue)), ("some corner > case date", new Date(Long.MaxValue))).toDF > df: org.apache.spark.sql.DataFrame = [_1: string, _2: date]scala> df.show > ++-+ > | _1| _2| > ++-+ > | some date| 1970-01-25| > |some corner case ...|1701498-03-18| > ++-+ > {code} > Anyway, I am aware of the fact that during collecting these data I will got > another result: > {code:java} > scala> df.collect > res10: Array[org.apache.spark.sql.Row] = Array([some date,1970-01-25], [some > corner case date,?498-03-18]) > {code} > what seems to be natural because of behaviour of *java.sql.Date*: > {code:java} > scala> new java.sql.Date(Long.MaxValue) > res1: java.sql.Date = ?994-08-17 > {code} > > > When it comes to easier reproduction, please consider: > {code:java} > scala> org.apache.spark.sql.catalyst.util.DateTimeUtils.fromJavaDate(new > java.sql.Date(Long.MaxValue)) > java.lang.ArithmeticException: integer overflow > at java.lang.Math.toIntExact(Math.java:1011) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.fromJavaDate(DateTimeUtils.scala:111) > ... 
47 elided > {code} > However, the question is even if such late dates are not supported, could it > fail in more gentle way?
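The failing step is the narrowing of an epoch-day count to an int via Math.toIntExact. It can be reproduced without Spark; the following sketch ignores Spark's Julian/Gregorian calendar rebasing and only re-creates the overflowing conversion:

```java
public class EpochDayOverflow {
    // Millis-since-epoch to epoch days, narrowed to int as in the stack
    // trace above; dates whose day count exceeds Int.MaxValue cannot fit.
    static int toEpochDay(long epochMillis) {
        long days = Math.floorDiv(epochMillis, 86_400_000L);
        return Math.toIntExact(days); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        // Int.MaxValue millis is ~24.8 days -> day 24, i.e. 1970-01-25,
        // matching the df.show output in the report.
        System.out.println(toEpochDay(Integer.MAX_VALUE)); // 24
        try {
            toEpochDay(Long.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // integer overflow
        }
    }
}
```

Spark 2.4 skipped the exact narrowing and silently wrapped instead, which explains the nonsensical `1701498-03-18` it produced for the same input.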
[jira] [Created] (SPARK-34561) Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE`
Maxim Gekk created SPARK-34561: -- Summary: Cannot drop/add columns from/to a dataset of v2 `DESCRIBE TABLE` Key: SPARK-34561 URL: https://issues.apache.org/jira/browse/SPARK-34561 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Dropping a column from a dataset of v2 `DESCRIBE TABLE` fails with:
{code:java}
Resolved attribute(s) col_name#102,data_type#103 missing from col_name#29,data_type#30,comment#31 in operator !Project [col_name#102, data_type#103]. Attribute(s) with the same name appear in the operation: col_name,data_type. Please check if the right attribute(s) are used.;
!Project [col_name#102, data_type#103]
+- LocalRelation [col_name#29, data_type#30, comment#31]
{code}
The code below demonstrates the issue:
{code:scala}
val tbl = s"${catalogAndNamespace}tbl"
withTable(tbl) {
  sql(s"CREATE TABLE $tbl (c0 INT) USING $v2Format")
  val description = sql(s"DESCRIBE TABLE $tbl")
  val noComment = description.drop("comment")
}
{code}
[jira] [Created] (SPARK-34560) Cannot join datasets of SHOW TABLES
Maxim Gekk created SPARK-34560: -- Summary: Cannot join datasets of SHOW TABLES Key: SPARK-34560 URL: https://issues.apache.org/jira/browse/SPARK-34560 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The example below illustrates the issue:
{code:scala}
scala> sql("CREATE NAMESPACE ns1")
res8: org.apache.spark.sql.DataFrame = []
scala> sql("CREATE NAMESPACE ns2")
res9: org.apache.spark.sql.DataFrame = []
scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
res10: org.apache.spark.sql.DataFrame = []
scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
res11: org.apache.spark.sql.DataFrame = []
scala> val show1 = sql("SHOW TABLES IN ns1")
show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]
scala> val show2 = sql("SHOW TABLES IN ns2")
show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]
scala> show1.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns1|     tbl1|      false|
+---------+---------+-----------+
scala> show2.show
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|      ns2|     tbl2|      false|
+---------+---------+-----------+
scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
at org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
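The ambiguity check can be sketched as follows (an illustrative toy in plain Python, not Spark's actual DetectAmbiguousSelfJoin rule): a column reference carries the attribute ID of its origin plan, and when both join sides expose that same ID the reference cannot be attributed to either side. Two SHOW TABLES datasets end up with identical output attribute IDs, which is why the join above trips the check.

```python
# Toy model: each side of a join is represented by the set of attribute IDs
# it exposes; a reference is ambiguous if both sides contain its ID.
def detect_ambiguous(left_ids, right_ids, referenced_id):
    return referenced_id in left_ids and referenced_id in right_ids

# SHOW TABLES produces the same output attributes each time (same IDs),
# so show1("tableName") matches both sides of show1.join(show2).
show1_ids = {16, 17, 18}   # namespace, tableName, isTemporary (IDs illustrative)
show2_ids = {16, 17, 18}   # the same command -> the same attribute IDs
assert detect_ambiguous(show1_ids, show2_ids, referenced_id=17)
```

With distinct ID sets, as produced by ordinary independent plans, the same reference resolves to exactly one side and no error is raised.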
[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests
[ https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34447: --- Description: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be respected when a table name is altered" 4. Reset default namespace in ShowTablesSuiteBase."change current catalog and namespace with USE statements" using spark.sessionState.catalogManager.reset() was: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be respected when a table name is altered" > Refactor the unified v1 and v2 command tests > > > Key: SPARK-34447 > URL: https://issues.apache.org/jira/browse/SPARK-34447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to gather potential improvements for the unified tests. > 1. Remove SharedSparkSession from *ParserSuite > 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite > 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be > respected when a table name is altered" > 4. Reset default namespace in ShowTablesSuiteBase."change current catalog > and namespace with USE statements" using > spark.sessionState.catalogManager.reset() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34554) Implement the copy() method in ColumnarMap
Maxim Gekk created SPARK-34554: -- Summary: Implement the copy() method in ColumnarMap Key: SPARK-34554 URL: https://issues.apache.org/jira/browse/SPARK-34554 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Implement ColumnarMap.copy() using ColumnarArray.copy() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34543: --- Fix Version/s: (was: 3.0.2) (was: 2.4.8) (was: 3.1.0) > Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION > -- > > Key: SPARK-34543 > URL: https://issues.apache.org/jira/browse/SPARK-34543 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); > spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); > Location: > file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); > spark-sql> SELECT * FROM tbl; > 0 0 > {code} > Create new partition folder in the file system: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa > {code} > Set new location for the partition part=1: > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2); > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > {code} > Set location for a partition in the upper case: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb > {code} > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; > 
Error in query: Partition spec is invalid. The spec (PART) must match the > partition spec (part) defined in table '`default`.`tbl`' > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34543: --- Affects Version/s: (was: 3.0.1) (was: 3.1.0) 3.1.1 3.0.2 > Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION > -- > > Key: SPARK-34543 > URL: https://issues.apache.org/jira/browse/SPARK-34543 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.7, 3.0.2, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); > spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); > Location: > file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); > spark-sql> SELECT * FROM tbl; > 0 0 > {code} > Create new partition folder in the file system: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa > {code} > Set new location for the partition part=1: > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2); > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > {code} > Set location for a partition in the upper case: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb > {code} > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION > 
'/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; > Error in query: Partition spec is invalid. The spec (PART) must match the > partition spec (part) defined in table '`default`.`tbl`' > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
[ https://issues.apache.org/jira/browse/SPARK-34543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34543: --- Description: SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); Location: file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); spark-sql> SELECT * FROM tbl; 0 0 {code} Create new partition folder in the file system: {code} $ cp -r /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa {code} Set new location for the partition part=1: {code:sql} spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; spark-sql> SELECT * FROM tbl; 0 0 0 1 spark-sql> ALTER TABLE tbl ADD PARTITION (PART=2); spark-sql> SELECT * FROM tbl; 0 0 0 1 {code} Set location for a partition in the upper case: {code} $ cp -r /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb {code} {code:sql} spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; Error in query: Partition spec is invalid. 
The spec (PART) must match the partition spec (part) defined in table '`default`.`tbl`' {code} was: SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} > Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION > -- > > Key: SPARK-34543 > URL: https://issues.apache.org/jira/browse/SPARK-34543 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl (id INT, part INT) PARTITIONED BY (part); > spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'tbl' PARTITION (part=0); > Location: > file:/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > spark-sql> ALTER TABLE tbl ADD PARTITION (part=1); > spark-sql> SELECT * FROM tbl; > 0 0 > {code} > Create new partition folder in the file system: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa > {code} > Set new location for the partition part=1: > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (part=1) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/aaa'; > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > spark-sql> ALTER TABLE tbl ADD PARTITION 
(PART=2); > spark-sql> SELECT * FROM tbl; > 0 0 > 0 1 > {code} > Set location for a partition in the upper case: > {code} > $ cp -r > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/part=0 > /Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb > {code} > {code:sql} > spark-sql> ALTER TABLE tbl PARTITION (PART=2) SET LOCATION > '/Users/maximgekk/proj/set-location-case-sense/spark-warehouse/tbl/bbb'; > Error in query: Partition spec is invalid. The spec (PART) must match the > partition spec (part) defined in table '`default`.`tbl`' > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34543) Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION
Maxim Gekk created SPARK-34543: -- Summary: Respect case sensitivity in V1 ALTER TABLE .. SET LOCATION Key: SPARK-34543 URL: https://issues.apache.org/jira/browse/SPARK-34543 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.7, 3.0.1, 3.1.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 2.4.8, 3.0.2, 3.1.0 SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
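The expected behavior under spark.sql.caseSensitive=false can be sketched as below (a hedged stdlib sketch; the function name and logic are illustrative, not Spark's implementation): user-supplied partition keys are matched against the table's partition columns case-insensitively, and only rejected when no column matches.

```python
def normalize_spec(spec, partition_cols, case_sensitive=False):
    """Map user-supplied partition keys onto the table's partition columns.

    With case_sensitive=False (the default for spark.sql.caseSensitive),
    PART=2 should resolve to the column `part`; with case_sensitive=True
    it is rejected, matching the error shown in the ticket.
    """
    if case_sensitive:
        lookup = {c: c for c in partition_cols}
    else:
        lookup = {c.lower(): c for c in partition_cols}
    normalized = {}
    for key, value in spec.items():
        k = key if case_sensitive else key.lower()
        if k not in lookup:
            raise ValueError(
                f"Partition spec ({key}) must match the partition spec "
                f"({', '.join(partition_cols)}) defined in the table")
        normalized[lookup[k]] = value
    return normalized

# PART=2 resolves to the `part` column when case sensitivity is off:
assert normalize_spec({"PART": 2}, ["part"]) == {"part": 2}
```

The same normalization would make the SHOW PARTITIONS example from the description (YEAR/Month vs. year/month) succeed instead of reporting non-partitioning columns.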
[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests
[ https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34447: --- Description: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be respected when a table name is altered" was: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite > Refactor the unified v1 and v2 command tests > > > Key: SPARK-34447 > URL: https://issues.apache.org/jira/browse/SPARK-34447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to gather potential improvements for the unified tests. > 1. Remove SharedSparkSession from *ParserSuite > 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite > 3. Add JIRA ID SPARK-33829 to "SPARK-33786: Cache's storage level should be > respected when a table name is altered" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34518) Rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand`
Maxim Gekk created SPARK-34518: -- Summary: Rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand` Key: SPARK-34518 URL: https://issues.apache.org/jira/browse/SPARK-34518 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk `AlterTableRecoverPartitionsCommand` is the execution node for the `ALTER TABLE .. RECOVER PARTITIONS` command, which cannot drop/sync partitions. Since `ALTER TABLE .. RECOVER PARTITIONS` is a special case of `MSCK REPAIR TABLE` and does not support any options, it makes sense to rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The SQL standard defines 2 interval types: > # a year-month interval contains a YEAR field or a MONTH field or both > # a day-time interval contains DAY, HOUR, MINUTE, and SECOND fields (possibly > a fraction of a second) > Need to add 2 new internal types, YearMonthIntervalType and > DayTimeIntervalType, and support operations defined by the SQL standard as well as > INTERVAL literals. > java.time.Period and java.time.Duration can be supported as external types > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
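The split into two interval kinds can be illustrated with Python's stdlib analogues of the external types named in the ticket (java.time.Period for year-month, java.time.Duration for day-time); this is an illustrative sketch, not Spark's internal representation:

```python
from datetime import timedelta

def year_month_interval(years=0, months=0):
    # A year-month interval reduces to a month count; years fold into months.
    return years * 12 + months

def day_time_interval(days=0, hours=0, minutes=0, seconds=0.0):
    # A day-time interval is an exact duration, down to fractional seconds.
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

# INTERVAL '1-2' YEAR TO MONTH -> 14 months
assert year_month_interval(1, 2) == 14
```

The key design point is that the two kinds do not mix: a month has no fixed length in days, so year-month intervals stay as month counts while day-time intervals stay as exact durations.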
[jira] [Updated] (SPARK-34447) Refactor the unified v1 and v2 command tests
[ https://issues.apache.org/jira/browse/SPARK-34447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34447: --- Description: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite was: The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite > Refactor the unified v1 and v2 command tests > > > Key: SPARK-34447 > URL: https://issues.apache.org/jira/browse/SPARK-34447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to gather potential improvements for the unified tests. > 1. Remove SharedSparkSession from *ParserSuite > 2. Rename tests like AlterTableAddPartitionSuite -> AddPartitionsSuite -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34475) Rename v2 logical nodes
[ https://issues.apache.org/jira/browse/SPARK-34475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34475: --- Description: Rename v2 logical nodes for simplicity in the form: + (was: To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec) > Rename v2 logical nodes > --- > > Key: SPARK-34475 > URL: https://issues.apache.org/jira/browse/SPARK-34475 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Rename v2 logical nodes for simplicity in the form: + -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34475) Rename v2 logical nodes
Maxim Gekk created SPARK-34475: -- Summary: Rename v2 logical nodes Key: SPARK-34475 URL: https://issues.apache.org/jira/browse/SPARK-34475 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286842#comment-17286842 ] Maxim Gekk commented on SPARK-34302: [~imback82] I don't plan to work on this in the near future. Please, feel free to take this. > Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework > > > Key: SPARK-34302 > URL: https://issues.apache.org/jira/browse/SPARK-34302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN > # Remove AlterTableAlterColumnStatement > # Remove the check verifyAlterTableType() from run() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34468) Fix v2 ALTER TABLE .. RENAME TO
Maxim Gekk created SPARK-34468: -- Summary: Fix v2 ALTER TABLE .. RENAME TO Key: SPARK-34468 URL: https://issues.apache.org/jira/browse/SPARK-34468 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The v2 `ALTER TABLE .. RENAME TO` command should rename a table in-place instead of moving it to the "root" namespace: {code:scala}
sql("ALTER TABLE ns1.ns2.ns3.src_tbl RENAME TO dst_tbl")
sql(s"SHOW TABLES IN $catalog").show(false)
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|         |dst_tbl  |false      |
+---------+---------+-----------+
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
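The intended in-place behavior can be sketched as follows (a hypothetical stdlib sketch; the function and identifier representation are illustrative, not Spark's catalog API): keep the namespace parts of the source identifier and replace only the last part.

```python
def rename_table(identifier, new_name):
    """Rename in place: preserve the namespace, replace the table name.

    The buggy behavior effectively dropped the namespace, so the renamed
    table appeared under the catalog's root namespace instead.
    """
    return identifier[:-1] + [new_name]

# ns1.ns2.ns3.src_tbl -> ns1.ns2.ns3.dst_tbl, not just dst_tbl
assert rename_table(["ns1", "ns2", "ns3", "src_tbl"], "dst_tbl") == \
    ["ns1", "ns2", "ns3", "dst_tbl"]
```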
[jira] [Created] (SPARK-34466) Improve docs for ALTER TABLE .. RENAME TO
Maxim Gekk created SPARK-34466: -- Summary: Improve docs for ALTER TABLE .. RENAME TO Key: SPARK-34466 URL: https://issues.apache.org/jira/browse/SPARK-34466 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The v1 ALTER TABLE .. RENAME TO command can only rename a table in a database but it cannot be used to move the table to another database. We should explicitly document the behaviour. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34439) Recognize `spark_catalog` in new identifier while view/table renaming
[ https://issues.apache.org/jira/browse/SPARK-34439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-34439. Resolution: Won't Fix > Recognize `spark_catalog` in new identifier while view/table renaming > - > > Key: SPARK-34439 > URL: https://issues.apache.org/jira/browse/SPARK-34439 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, v1 ALTER TABLE .. RENAME TO doesn't recognize spark_catalog in new > view/table identifiers. The example below demonstrates the issue: > {code:scala} > spark-sql> CREATE DATABASE db; > spark-sql> CREATE TABLE spark_catalog.db.tbl (c0 INT) USING parquet; > spark-sql> INSERT INTO spark_catalog.db.tbl SELECT 0; > spark-sql> SELECT * FROM spark_catalog.db.tbl; > 0 > spark-sql> ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2; > Error in query: spark_catalog.db.tbl2 is not a valid TableIdentifier as it > has more than 2 name parts. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34465) Rename alter table exec nodes
[ https://issues.apache.org/jira/browse/SPARK-34465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34465: --- Description: To be consistent with other exec nodes, rename: * AlterTableAddPartitionExec -> AddPartitionExec * AlterTableRenamePartitionExec -> RenamePartitionExec * AlterTableDropPartitionExec -> DropPartitionExec was: To be consistent with other exec nodes, rename: AlterTableAddPartitionExec -> AddPartitionExec AlterTableRenamePartitionExec -> RenamePartitionExec AlterTableDropPartitionExec -> DropPartitionExec > Rename alter table exec nodes > - > > Key: SPARK-34465 > URL: https://issues.apache.org/jira/browse/SPARK-34465 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > To be consistent with other exec nodes, rename: > * AlterTableAddPartitionExec -> AddPartitionExec > * AlterTableRenamePartitionExec -> RenamePartitionExec > * AlterTableDropPartitionExec -> DropPartitionExec -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34465) Rename alter table exec nodes
Maxim Gekk created SPARK-34465: -- Summary: Rename alter table exec nodes Key: SPARK-34465 URL: https://issues.apache.org/jira/browse/SPARK-34465 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk To be consistent with other exec nodes, rename: AlterTableAddPartitionExec -> AddPartitionExec AlterTableRenamePartitionExec -> RenamePartitionExec AlterTableDropPartitionExec -> DropPartitionExec -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34455) Deprecate spark.sql.legacy.replaceDatabricksSparkAvro.enabled
Maxim Gekk created SPARK-34455: -- Summary: Deprecate spark.sql.legacy.replaceDatabricksSparkAvro.enabled Key: SPARK-34455 URL: https://issues.apache.org/jira/browse/SPARK-34455 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Mark spark.sql.legacy.replaceDatabricksSparkAvro.enabled as deprecated, and recommend to use `.format("avro")` in `DataFrameWriter` or `DataFrameReader` instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34454) SQL configs from the legacy namespace must be internal
Maxim Gekk created SPARK-34454: -- Summary: SQL configs from the legacy namespace must be internal Key: SPARK-34454 URL: https://issues.apache.org/jira/browse/SPARK-34454 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.0.0 It is assumed that legacy SQL configs shouldn't be set by users in common cases. The purpose of the configs is to allow switching to the old behavior in corner cases, so the configs can be marked as internal. The ticket aims to inspect existing SQL configs in SQLConf and add the internal() call to config entry builders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34454) SQL configs from the legacy namespace must be internal
[ https://issues.apache.org/jira/browse/SPARK-34454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34454: --- Fix Version/s: (was: 3.0.0) > SQL configs from the legacy namespace must be internal > -- > > Key: SPARK-34454 > URL: https://issues.apache.org/jira/browse/SPARK-34454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > > It is assumed that legacy SQL configs shouldn't be set by users in common cases. > The purpose of the configs is to allow switching to the old behavior in corner > cases, so the configs can be marked as internal. The ticket aims to inspect > existing SQL configs in SQLConf and add the internal() call to config entry > builders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34451) Add alternatives for datetime rebasing SQL configs and deprecate legacy configs
Maxim Gekk created SPARK-34451: -- Summary: Add alternatives for datetime rebasing SQL configs and deprecate legacy configs Key: SPARK-34451 URL: https://issues.apache.org/jira/browse/SPARK-34451 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The rebasing SQL configs like spark.sql.legacy.parquet.datetimeRebaseModeInRead can be used not only for migration from previous Spark versions but also to read/write datetime columns saved by other systems/frameworks/libs. The ticket aims to move the configs from the legacy namespace by introducing alternatives (like spark.sql.parquet.datetimeRebaseModeInRead) and deprecating the legacy configs (spark.sql.legacy.parquet.datetimeRebaseModeInRead). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
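The intended config-alternative mechanism can be sketched as below (a hedged sketch: the two key names come from the ticket, but the resolution logic and default value are illustrative, not Spark's ConfigBuilder): the new key wins when both are set, and a value under the deprecated legacy key is still honored.

```python
LEGACY = "spark.sql.legacy.parquet.datetimeRebaseModeInRead"
NEW = "spark.sql.parquet.datetimeRebaseModeInRead"

def resolve(settings, key=NEW, alternative=LEGACY, default="EXCEPTION"):
    # Prefer the new key; fall back to the deprecated legacy alternative,
    # then to a default (the default value here is illustrative).
    if key in settings:
        return settings[key]
    if alternative in settings:
        return settings[alternative]   # legacy setting still takes effect
    return default

# Users who only ever set the legacy config keep their behavior:
assert resolve({LEGACY: "LEGACY"}) == "LEGACY"
```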
[jira] [Updated] (SPARK-34450) Unify v1 and v2 ALTER TABLE .. RENAME tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34450: --- Summary: Unify v1 and v2 ALTER TABLE .. RENAME tests (was: Unify v1 and v2 `ALTER TABLE .. RENAME` tests) > Unify v1 and v2 ALTER TABLE .. RENAME tests > --- > > Key: SPARK-34450 > URL: https://issues.apache.org/jira/browse/SPARK-34450 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. RENAME tests to a common place to run them for V1 > and V2 datasources. Some tests can be placed in V1- and V2-specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34450) Unify v1 and v2 `ALTER TABLE .. RENAME` tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34450: --- Description: Extract ALTER TABLE .. RENAME tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites. > Unify v1 and v2 `ALTER TABLE .. RENAME` tests > - > > Key: SPARK-34450 > URL: https://issues.apache.org/jira/browse/SPARK-34450 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. RENAME tests to a common place to run them for V1 > and V2 datasources. Some tests can be placed in V1- and V2-specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34450) Unify v1 and v2 `ALTER TABLE .. RENAME` tests
Maxim Gekk created SPARK-34450: -- Summary: Unify v1 and v2 `ALTER TABLE .. RENAME` tests Key: SPARK-34450 URL: https://issues.apache.org/jira/browse/SPARK-34450 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33381) Unify DSv1 and DSv2 command tests
[ https://issues.apache.org/jira/browse/SPARK-33381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33381: --- Affects Version/s: 3.2.0 > Unify DSv1 and DSv2 command tests > - > > Key: SPARK-33381 > URL: https://issues.apache.org/jira/browse/SPARK-33381 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Create unified test suites for DSv1 and DSv2 commands such as CREATE TABLE, SHOW > TABLES, etc. Put datasource-specific tests into separate test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34447) Refactor the unified v1 and v2 command tests
Maxim Gekk created SPARK-34447: -- Summary: Refactor the unified v1 and v2 command tests Key: SPARK-34447 URL: https://issues.apache.org/jira/browse/SPARK-34447 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The ticket aims to gather potential improvements for the unified tests. 1. Remove SharedSparkSession from *ParserSuite
[jira] [Created] (SPARK-34445) Make `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` as non-internal
Maxim Gekk created SPARK-34445: -- Summary: Make `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` as non-internal Key: SPARK-34445 URL: https://issues.apache.org/jira/browse/SPARK-34445 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The SQL config spark.sql.legacy.replaceDatabricksSparkAvro.enabled is already documented in the Spark SQL guide. It should be made non-internal since it is documented publicly.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, Spark always performs rebasing of date/timestamp columns in the ORC > datasource but this is not required by the ORC spec. This ticket aims to > allow users to turn off rebasing via SQL configs or DS options.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Fix Version/s: (was: 3.2.0) > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, Spark always performs rebasing of date/timestamp columns in the ORC > datasource but this is not required by the ORC spec. This ticket aims to > allow users to turn off rebasing via SQL configs or DS options.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Fix Version/s: (was: 3.1.0) 3.2.0 > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Currently, Spark always performs rebasing of INT96 columns in the Parquet > datasource but this is not required by the Parquet spec. This ticket aims to > allow users to turn off rebasing via a SQL config.
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Description: Currently, Spark always performs rebasing of date/timestamp columns in the ORC datasource but this is not required by the ORC spec. This ticket aims to allow users to turn off rebasing via SQL configs or DS options. (was: Currently, Spark always performs rebasing of INT96 columns in the Parquet datasource but this is not required by the Parquet spec. This ticket aims to allow users to turn off rebasing via a SQL config.) > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Currently, Spark always performs rebasing of date/timestamp columns in the ORC > datasource but this is not required by the ORC spec. This ticket aims to > allow users to turn off rebasing via SQL configs or DS options.
[jira] [Created] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
Maxim Gekk created SPARK-34440: -- Summary: Allow saving/loading datetime in ORC w/o rebasing Key: SPARK-34440 URL: https://issues.apache.org/jira/browse/SPARK-34440 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.1.0 Currently, Spark always performs rebasing of INT96 columns in the Parquet datasource but this is not required by the Parquet spec. This ticket aims to allow users to turn off rebasing via a SQL config.
[jira] [Created] (SPARK-34439) Recognize `spark_catalog` in new identifier while view/table renaming
Maxim Gekk created SPARK-34439: -- Summary: Recognize `spark_catalog` in new identifier while view/table renaming Key: SPARK-34439 URL: https://issues.apache.org/jira/browse/SPARK-34439 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Currently, v1 ALTER TABLE .. RENAME TO doesn't recognize spark_catalog in new view/table identifiers. The example below demonstrates the issue: {code:sql} spark-sql> CREATE DATABASE db; spark-sql> CREATE TABLE spark_catalog.db.tbl (c0 INT) USING parquet; spark-sql> INSERT INTO spark_catalog.db.tbl SELECT 0; spark-sql> SELECT * FROM spark_catalog.db.tbl; 0 spark-sql> ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2; Error in query: spark_catalog.db.tbl2 is not a valid TableIdentifier as it has more than 2 name parts. {code}
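The error above comes from converting the multi-part name into a two-part v1 TableIdentifier without first recognizing the session catalog. A minimal sketch of the intended fix — in Python for illustration; the names and logic are hypothetical, not Spark's implementation — drops a recognized `spark_catalog` prefix before applying the two-part limit:

```python
# Illustrative sketch (not Spark's code) of v1 rename-target validation:
# strip a recognized session-catalog name from the new identifier before
# enforcing the "at most 2 name parts" rule.

CATALOG_NAME = "spark_catalog"  # assumed session catalog name

def to_v1_table_identifier(ident: str) -> tuple:
    """Split a dotted identifier, dropping a leading session-catalog part."""
    parts = ident.split(".")
    if parts and parts[0] == CATALOG_NAME:
        parts = parts[1:]  # the fix: recognize and drop the catalog prefix
    if len(parts) > 2:
        raise ValueError(
            f"{ident} is not a valid TableIdentifier as it has more than 2 name parts."
        )
    return tuple(parts)

print(to_v1_table_identifier("spark_catalog.db.tbl2"))  # ('db', 'tbl2')
print(to_v1_table_identifier("db.tbl"))                 # ('db', 'tbl')
```

With the prefix stripped, `spark_catalog.db.tbl2` resolves to the same two-part identifier as `db.tbl2`, so the rename succeeds.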
[jira] [Created] (SPARK-34437) Update Spark SQL guide about rebase DS options and SQL configs
Maxim Gekk created SPARK-34437: -- Summary: Update Spark SQL guide about rebase DS options and SQL configs Key: SPARK-34437 URL: https://issues.apache.org/jira/browse/SPARK-34437 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Describe the following SQL configs: * spark.sql.legacy.parquet.int96RebaseModeInWrite * spark.sql.legacy.parquet.datetimeRebaseModeInWrite * spark.sql.legacy.parquet.int96RebaseModeInRead * spark.sql.legacy.parquet.datetimeRebaseModeInRead * spark.sql.legacy.avro.datetimeRebaseModeInWrite * spark.sql.legacy.avro.datetimeRebaseModeInRead And Avro/Parquet options datetimeRebaseMode and int96RebaseMode.
[jira] [Created] (SPARK-34434) Mention DS rebase options in SparkUpgradeException
Maxim Gekk created SPARK-34434: -- Summary: Mention DS rebase options in SparkUpgradeException Key: SPARK-34434 URL: https://issues.apache.org/jira/browse/SPARK-34434 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Mention the DS options added by SPARK-34404 and SPARK-34377 in SparkUpgradeException.
[jira] [Created] (SPARK-34431) Only load hive-site.xml once
Maxim Gekk created SPARK-34431: -- Summary: Only load hive-site.xml once Key: SPARK-34431 URL: https://issues.apache.org/jira/browse/SPARK-34431 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Hive configs from hive-site.xml are parsed over and over again. We can optimize this and parse the file only once.
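The parse-once idea can be sketched outside Spark as simple memoization. This Python illustration (not Spark's code; the property layout mimics hive-site.xml) caches the parsed properties per file path, so repeated lookups skip the XML parse:

```python
# Sketch: parse a hive-site.xml-style file once per path and reuse the
# result; functools.lru_cache memoizes the parse keyed by the path.
import functools
import os
import tempfile
import xml.etree.ElementTree as ET

PARSE_COUNT = 0  # instrumentation to show the file is parsed only once

@functools.lru_cache(maxsize=None)
def load_hive_conf(path):
    """Parse <configuration><property><name>/<value> pairs into a dict."""
    global PARSE_COUNT
    PARSE_COUNT += 1
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

# demo with a minimal hive-site.xml
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write("<configuration><property>"
            "<name>hive.metastore.uris</name><value>thrift://localhost:9083</value>"
            "</property></configuration>")
    path = f.name

conf1 = load_hive_conf(path)
conf2 = load_hive_conf(path)   # served from the cache, no second parse
os.unlink(path)
```

After the second call, `PARSE_COUNT` is still 1 and both lookups share the same parsed object.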
[jira] [Created] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
Maxim Gekk created SPARK-34424: -- Summary: HiveOrcHadoopFsRelationSuite fails with seed 610710213676 Key: SPARK-34424 URL: https://issues.apache.org/jira/browse/SPARK-34424 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.2, 3.2.0, 3.1.1 Reporter: Maxim Gekk The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with: {code:java} == Results == !== Correct Answer - 20 == == Spark Answer - 20 == struct struct [1,1582-10-15] [1,1582-10-15] [2,null] [2,null] [3,1970-01-01] [3,1970-01-01] [4,1681-08-06] [4,1681-08-06] [5,1582-10-15] [5,1582-10-15] [6,-12-31] [6,-12-31] [7,0583-01-04] [7,0583-01-04] [8,6077-03-04] [8,6077-03-04] ![9,1582-10-06] [9,1582-10-15] [10,1582-10-15] [10,1582-10-15] [11,-12-31] [11,-12-31] [12,9722-10-04] [12,9722-10-04] [13,0243-12-19] [13,0243-12-19] [14,-12-31] [14,-12-31] [15,8743-01-24] [15,8743-01-24] [16,1039-10-31] [16,1039-10-31] [17,-12-31] [17,-12-31] [18,1582-10-15] [18,1582-10-15] [19,1582-10-15] [19,1582-10-15] [20,1582-10-15] [20,1582-10-15] {code}
[jira] [Created] (SPARK-34418) Check v1 TRUNCATE TABLE preserves partitions
Maxim Gekk created SPARK-34418: -- Summary: Check v1 TRUNCATE TABLE preserves partitions Key: SPARK-34418 URL: https://issues.apache.org/jira/browse/SPARK-34418 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add a test which checks TRUNCATE TABLE only removes rows and preserves existing partitions.
[jira] [Comment Edited] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281838#comment-17281838 ] Maxim Gekk edited comment on SPARK-34392 at 2/9/21, 3:26 PM: - The "GMT+8:00" string is an unsupported format in 3.0, see docs for the to_utc_timestamp() function (https://github.com/apache/spark/blob/30468a901577e82c855fbc4cb78e1b869facb44c/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3397-L3402): {code:scala} @param tz A string detailing the time zone ID that the input should be adjusted to. It should be in the format of either region-based zone IDs or zone offsets. Region IDs must have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'. Other short names are not recommended to use because they can be ambiguous. {code} was (Author: maxgekk): The "GMT+8:00" string is an unsupported format in 3.0, see docs for the to_utc_timestamp() function: {code:scala} * @param tz A string detailing the time zone ID that the input should be adjusted to. It should * be in the format of either region-based zone IDs or zone offsets. Region IDs must * have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in * the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are * supported as aliases of '+00:00'. Other short names are not recommended to use * because they can be ambiguous. 
{code} > Invalid ID for offset-based ZoneId since Spark 3.0 > -- > > Key: SPARK-34392 > URL: https://issues.apache.org/jira/browse/SPARK-34392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > {code} > Spark 2.4: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 2020-02-07 08:00:00 > Time taken: 0.089 seconds, Fetched 1 row(s) > {noformat} > Spark 3.x: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select > to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")] > java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00 > at java.time.ZoneId.ofWithPrefix(ZoneId.java:437) > at java.time.ZoneId.of(ZoneId.java:407) > at java.time.ZoneId.of(ZoneId.java:359) > at java.time.ZoneId.of(ZoneId.java:315) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814) > {noformat}
[jira] [Commented] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281838#comment-17281838 ] Maxim Gekk commented on SPARK-34392: The "GMT+8:00" string is an unsupported format in 3.0, see docs for the to_utc_timestamp() function: {code:scala} * @param tz A string detailing the time zone ID that the input should be adjusted to. It should * be in the format of either region-based zone IDs or zone offsets. Region IDs must * have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in * the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are * supported as aliases of '+00:00'. Other short names are not recommended to use * because they can be ambiguous. {code} > Invalid ID for offset-based ZoneId since Spark 3.0 > -- > > Key: SPARK-34392 > URL: https://issues.apache.org/jira/browse/SPARK-34392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sql} > select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > {code} > Spark 2.4: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 2020-02-07 08:00:00 > Time taken: 0.089 seconds, Fetched 1 row(s) > {noformat} > Spark 3.x: > {noformat} > spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00"); > 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select > to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")] > java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00 > at java.time.ZoneId.ofWithPrefix(ZoneId.java:437) > at java.time.ZoneId.of(ZoneId.java:407) > at java.time.ZoneId.of(ZoneId.java:359) > at java.time.ZoneId.of(ZoneId.java:315) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53) > at > org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814) > {noformat}
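As the comments above note, only region IDs and `(+|-)HH:mm` offsets are accepted, so a legacy ID like `GMT+8:00` must be rewritten by the caller. A hedged Python sketch of such a normalization (illustrative only; this helper is not part of Spark):

```python
# Sketch: rewrite legacy "GMT+8:00"-style zone strings into the supported
# "(+|-)HH:mm" form; leave region IDs and already-valid offsets untouched.
import re

def normalize_zone_id(tz: str) -> str:
    """Rewrite e.g. 'GMT+8:00' as '+08:00'; pass other IDs through."""
    m = re.fullmatch(r"(?:GMT|UTC)?([+-])(\d{1,2}):(\d{2})", tz)
    if m:
        sign, hours, minutes = m.groups()
        return f"{sign}{int(hours):02d}:{minutes}"  # zero-pad the hour field
    return tz

print(normalize_zone_id("GMT+8:00"))             # +08:00
print(normalize_zone_id("America/Los_Angeles"))  # America/Los_Angeles
```

The normalized string then satisfies the `(+|-)HH:mm` format that java.time's offset-based zone IDs require.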
[jira] [Updated] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read
[ https://issues.apache.org/jira/browse/SPARK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34404: --- Description: Add a new Avro option similar to the SQL config {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}. (was: Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}.) > Support Avro datasource options to control datetime rebasing in read > > > Key: SPARK-34404 > URL: https://issues.apache.org/jira/browse/SPARK-34404 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add a new Avro option similar to the SQL config > {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}.
[jira] [Created] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read
Maxim Gekk created SPARK-34404: -- Summary: Support Avro datasource options to control datetime rebasing in read Key: SPARK-34404 URL: https://issues.apache.org/jira/browse/SPARK-34404 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}.
[jira] [Created] (SPARK-34401) Update public docs about altering cached tables/views
Maxim Gekk created SPARK-34401: -- Summary: Update public docs about altering cached tables/views Key: SPARK-34401 URL: https://issues.apache.org/jira/browse/SPARK-34401 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk
[jira] [Created] (SPARK-34397) Support v2 `MSCK REPAIR TABLE`
Maxim Gekk created SPARK-34397: -- Summary: Support v2 `MSCK REPAIR TABLE` Key: SPARK-34397 URL: https://issues.apache.org/jira/browse/SPARK-34397 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Implement the `MSCK REPAIR TABLE` command for tables from v2 catalogs.
[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList
[ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280269#comment-17280269 ] Maxim Gekk commented on SPARK-34386: [~bysza] You can find more details in the blog post: https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html > "Proleptic" date off by 10 days when returned by .collectAsList > --- > > Key: SPARK-34386 > URL: https://issues.apache.org/jira/browse/SPARK-34386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: Windows 10 >Reporter: Marek Byszewski >Priority: Major > > Run the following commands using Spark 3.0.1: > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console").show(false)}} > {{+---+}} > {{|data_console |}} > {{+---+}} > {{|*1582-10-05 02:12:34.997*|}} > {{+---+}} > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console")}} > {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}} > {{scala> res3.collectAsList}} > {{res4: java.util.List[org.apache.spark.sql.Row] = > [[*1582-10-{color:#FF}15{color} 02:12:34.997*]]}} > Notice that the returned date is off by 10 days compared to the date returned > by the first command.
[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList
[ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280268#comment-17280268 ] Maxim Gekk commented on SPARK-34386: [~bysza] Thanks for the ping. This is expected behavior, actually. The collectAsList() method converts internal timestamp values (in the Proleptic Gregorian calendar) to java.sql.Timestamp which is based on the hybrid calendar (Julian + Proleptic Gregorian calendars). The timestamp from your example doesn't exist in the hybrid calendar, so Spark shifts it to the closest valid date, which is 1582-10-15. If you want to receive timestamps AS IS from collectAsList(), please switch to Java 8 types via *spark.sql.datetime.java8API.enabled*. > "Proleptic" date off by 10 days when returned by .collectAsList > --- > > Key: SPARK-34386 > URL: https://issues.apache.org/jira/browse/SPARK-34386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: Windows 10 >Reporter: Marek Byszewski >Priority: Major > > Run the following commands using Spark 3.0.1: > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console").show(false)}} > {{+---+}} > {{|data_console |}} > {{+---+}} > {{|*1582-10-05 02:12:34.997*|}} > {{+---+}} > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console")}} > {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}} > {{scala> res3.collectAsList}} > {{res4: java.util.List[org.apache.spark.sql.Row] = > [[*1582-10-{color:#FF}15{color} 02:12:34.997*]]}} > Notice that the returned date is off by 10 days compared to the date returned > by the first command.
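The shift described in the comment above can be mimicked with a small Python sketch (an illustration of the observed behavior, not Spark's rebasing code): the ten days 1582-10-05 through 1582-10-14 were skipped when the Gregorian calendar was adopted, so mapping onto the hybrid calendar moves them to the closest valid day, 1582-10-15.

```python
# Sketch: dates falling inside the Julian->Gregorian cutover gap do not
# exist in the hybrid calendar; shift them to the first valid Gregorian day.
from datetime import date

GAP_START = date(1582, 10, 5)   # first skipped day in the hybrid calendar
GAP_END = date(1582, 10, 14)    # last skipped day

def rebase_to_hybrid(d: date) -> date:
    """Map a proleptic Gregorian date onto the hybrid calendar's valid days."""
    if GAP_START <= d <= GAP_END:
        return date(1582, 10, 15)  # closest valid date after the gap
    return d

print(rebase_to_hybrid(date(1582, 10, 5)))   # 1582-10-15
print(rebase_to_hybrid(date(1582, 10, 20)))  # 1582-10-20
```

This is exactly why `1582-10-05 02:12:34.997` comes back as `1582-10-15 02:12:34.997` from collectAsList() unless Java 8 types are enabled.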
[jira] [Updated] (SPARK-34385) Unwrap SparkUpgradeException in v2 Parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-34385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34385: --- Summary: Unwrap SparkUpgradeException in v2 Parquet datasource (was: Unwrap SparkUpgradeException in v1 Parquet datasource) > Unwrap SparkUpgradeException in v2 Parquet datasource > - > > Key: SPARK-34385 > URL: https://issues.apache.org/jira/browse/SPARK-34385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the > cause of a SparkException.
[jira] [Created] (SPARK-34385) Unwrap SparkUpgradeException in v1 Parquet datasource
Maxim Gekk created SPARK-34385: -- Summary: Unwrap SparkUpgradeException in v1 Parquet datasource Key: SPARK-34385 URL: https://issues.apache.org/jira/browse/SPARK-34385 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the cause of a SparkException.
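The intended exception chaining — catch the inner upgrade error in the partition reader and rethrow it as the cause of a generic task failure — can be sketched in Python (class names mirror Spark's, but the code and messages are illustrative):

```python
# Sketch: surface SparkUpgradeException as the *cause* of a SparkException
# instead of leaving it buried inside a wrapper's message text.

class SparkUpgradeException(Exception):
    pass

class SparkException(Exception):
    pass

def read_partition(read):
    """Run one partition read, chaining any upgrade error as the cause."""
    try:
        return read()
    except SparkUpgradeException as e:
        # re-raise with explicit chaining so callers can inspect __cause__
        raise SparkException("Task failed while reading the file") from e

def bad_read():
    raise SparkUpgradeException("You may get a different result due to the upgrading")

try:
    read_partition(bad_read)
except SparkException as e:
    assert isinstance(e.__cause__, SparkUpgradeException)
```

Chaining via `raise ... from` preserves the original error for programmatic inspection, which is the point of the unwrap-and-rethrow change.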
[jira] [Created] (SPARK-34377) Support parquet datasource options to control datetime rebasing in read
Maxim Gekk created SPARK-34377: -- Summary: Support parquet datasource options to control datetime rebasing in read Key: SPARK-34377 URL: https://issues.apache.org/jira/browse/SPARK-34377 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}.
[jira] [Created] (SPARK-34371) Run datetime rebasing tests for parquet DSv1 and DSv2
Maxim Gekk created SPARK-34371: -- Summary: Run datetime rebasing tests for parquet DSv1 and DSv2 Key: SPARK-34371 URL: https://issues.apache.org/jira/browse/SPARK-34371 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extract datetime rebasing tests from ParquetIOSuite and place them in a separate test suite to run them for both the DSv1 and DSv2 implementations.
[jira] [Updated] (SPARK-34360) Support table truncation by v2 Table Catalogs
[ https://issues.apache.org/jira/browse/SPARK-34360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34360: --- Description: Add a new method `truncateTable` to the TableCatalog interface with a default implementation, and implement this method in InMemoryTableCatalog. (was: Add new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`.) > Support table truncation by v2 Table Catalogs > - > > Key: SPARK-34360 > URL: https://issues.apache.org/jira/browse/SPARK-34360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add a new method `truncateTable` to the TableCatalog interface with a default > implementation, and implement this method in InMemoryTableCatalog.
[jira] [Created] (SPARK-34360) Support table truncation by v2 Table Catalogs
Maxim Gekk created SPARK-34360: -- Summary: Support table truncation by v2 Table Catalogs Key: SPARK-34360 URL: https://issues.apache.org/jira/browse/SPARK-34360 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add a new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`.
[jira] [Updated] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
[ https://issues.apache.org/jira/browse/SPARK-34332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34332: --- Description: Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites. (was: Extract ALTER TABLE .. SET SERDE tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.) > Unify v1 and v2 ALTER TABLE .. SET LOCATION tests > - > > Key: SPARK-34332 > URL: https://issues.apache.org/jira/browse/SPARK-34332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for > V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test > suites.
[jira] [Created] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
Maxim Gekk created SPARK-34332: -- Summary: Unify v1 and v2 ALTER TABLE .. SET LOCATION tests Key: SPARK-34332 URL: https://issues.apache.org/jira/browse/SPARK-34332 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Extract ALTER TABLE .. SET SERDE tests to a common place to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.
[jira] [Updated] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34314: --- Affects Version/s: 3.1.0 3.0.2 2.4.8 > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The example below demonstrates the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It writes the partition values as strings: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA", > but when Spark reads the data back, it transforms "-0" into "0": > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code}
[jira] [Created] (SPARK-34314) Wrong discovered partition value
Maxim Gekk created SPARK-34314: -- Summary: Wrong discovered partition value Key: SPARK-34314 URL: https://issues.apache.org/jira/browse/SPARK-34314 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The example below demonstrates the issue: {code:scala} val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") df.write .partitionBy("part") .format("parquet") .save(path) val readback = spark.read.parquet(path) readback.printSchema() readback.show(false) {code} It writes the partition values as strings: {code} /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d ├── _SUCCESS ├── part=-0 │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet └── part=AA └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet {code} *"-0"* and "AA", but when Spark reads the data back, it transforms "-0" into "0": {code} root |-- id: integer (nullable = true) |-- part: string (nullable = true) +---++ |id |part| +---++ |0 |AA | |1 |0 | +---++ {code}
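The root cause is the round trip through partition-path strings plus numeric type inference: once `-0` is parsed as a number, the sign of the original string is gone. A simplified, hypothetical inference function (not Spark's partition-discovery code) shows the loss:

```python
# Sketch: naive type inference over a partition-path value. Casting the
# string "-0" to an integer yields plain 0, so the original text form
# cannot be recovered when the value is rendered back.

def infer_partition_value(raw: str):
    """Illustrative inference: try integer first, fall back to string."""
    try:
        return int(raw)          # "-0" -> 0: the minus sign disappears
    except ValueError:
        return raw               # non-numeric values stay strings

print(infer_partition_value("-0"))  # 0
print(infer_partition_value("AA"))  # AA
```

Because the column's discovered type here is string, inference should have kept `"-0"` verbatim instead of normalizing it through a numeric parse.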
[jira] [Updated] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`
[ https://issues.apache.org/jira/browse/SPARK-34312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34312: --- Description: Add a new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`. (was: Add new method `purgePartition` in `SupportsPartitionManagement` and `purgePartitions` in `SupportsAtomicPartitionManagement`.) > Support partition truncation by `SupportsPartitionManagement` > - > > Key: SPARK-34312 > URL: https://issues.apache.org/jira/browse/SPARK-34312 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add a new method `truncatePartition` in `SupportsPartitionManagement` and > `truncatePartitions` in `SupportsAtomicPartitionManagement`.
[jira] [Created] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`
Maxim Gekk created SPARK-34312: -- Summary: Support partition truncation by `SupportsPartitionManagement` Key: SPARK-34312 URL: https://issues.apache.org/jira/browse/SPARK-34312 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add a new method `purgePartition` in `SupportsPartitionManagement` and `purgePartitions` in `SupportsAtomicPartitionManagement`.
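The API shape proposed in SPARK-34312 can be rendered as a hypothetical Python sketch: the partition-management interface gains a `truncate_partition` method, and the atomic variant truncates several partitions only if all of them exist. Method names mirror the ticket; the in-memory table and its semantics are illustrative, not Spark's implementation.

```python
# Sketch: partition truncation removes a partition's rows but keeps the
# partition itself, unlike dropping the partition.
from abc import ABC, abstractmethod

class SupportsPartitionManagement(ABC):
    @abstractmethod
    def truncate_partition(self, ident) -> bool:
        """Remove all rows from one partition, preserving the partition."""

class InMemoryPartitionTable(SupportsPartitionManagement):
    def __init__(self):
        self.partitions = {("part=0",): [1, 2, 3], ("part=1",): [4]}

    def truncate_partition(self, ident) -> bool:
        if ident not in self.partitions:
            return False
        self.partitions[ident] = []   # rows gone, partition preserved
        return True

    def truncate_partitions(self, idents) -> bool:
        # "atomic" variant: validate every ident first, then truncate all
        if not all(i in self.partitions for i in idents):
            return False
        for i in idents:
            self.truncate_partition(i)
        return True

t = InMemoryPartitionTable()
assert t.truncate_partition(("part=0",))
assert t.partitions[("part=0",)] == []        # partition still listed, empty
```

The validate-then-mutate step in `truncate_partitions` is what makes the batch variant all-or-nothing for missing partitions.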