[jira] [Resolved] (SPARK-24778) DateTimeUtils.getTimeZone method returns GMT time if timezone cannot be parsed

2019-03-03 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-24778.

   Resolution: Fixed
Fix Version/s: 3.0.0

The issue has already been fixed by using ZoneId.of to parse time zone IDs.
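For illustration, the behavioral difference between the two APIs (a standalone sketch, not Spark code; the invalid zone ID is just an example):
{code:scala}
import java.time.ZoneId
import java.util.TimeZone

// TimeZone.getTimeZone silently falls back to GMT for IDs it cannot parse:
TimeZone.getTimeZone("+05:00").getID   // "GMT" - the misleading fallback
// ZoneId.of parses offset-style IDs and throws on genuinely invalid input:
ZoneId.of("+05:00")                    // ZoneOffset +05:00
// ZoneId.of("Mars/Phobos")            // would throw ZoneRulesException
{code}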

> DateTimeUtils.getTimeZone method returns GMT time if timezone cannot be parsed
> --
>
> Key: SPARK-24778
> URL: https://issues.apache.org/jira/browse/SPARK-24778
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Vinitha Reddy Gankidi
>Priority: Major
> Fix For: 3.0.0
>
>
> {{DateTimeUtils.getTimeZone}} calls Java's {{TimeZone.getTimeZone}} method, 
> which defaults to GMT if the time zone cannot be parsed. This can be misleading 
> for users, and it's better to return NULL instead of an incorrect value.
> To reproduce: {{from_utc_timestamp}} is one of the functions that calls 
> {{DateTimeUtils.getTimeZone}}. The session time zone is GMT for the following 
> queries.
> {code:java}
> SELECT from_utc_timestamp('2018-07-10 12:00:00', 'GMT+05:00') -> 2018-07-10 17:00:00
> SELECT from_utc_timestamp('2018-07-10 12:00:00', '+05:00') -> 2018-07-10 12:00:00 (defaults to GMT as the time zone is not recognized){code}
> We could fix it by using the workaround mentioned here: 
> [https://bugs.openjdk.java.net/browse/JDK-4412864].






[jira] [Commented] (SPARK-26016) Encoding not working when using a map / mapPartitions call

2019-03-03 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782778#comment-16782778
 ] 

Maxim Gekk commented on SPARK-26016:


> nothing reinterprets the bytes according to a different encoding?

Correct

> The underlying Hadoop impl does interpret the bytes as UTF-8 (skipping of 
> BOMs, etc) ...

Hadoop's LineReader does not decode input bytes. It just copies the bytes 
between line delimiters in 
https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L257
 by using Text.append 
(https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L335):
{code:java}
  // Copies raw bytes into Text's internal buffer; no charset decoding
  // or UTF-8 validation happens here.
  public void append(byte[] utf8, int start, int len) {
    setCapacity(length + len, true);
    System.arraycopy(utf8, start, bytes, length, len);
    length += len;
  }
{code}

Spark actually never checks the correctness of UTF8String input. I even created 
a few tickets for that:
https://issues.apache.org/jira/browse/SPARK-23741
https://issues.apache.org/jira/browse/SPARK-23649

Adding such checks would most likely bring some performance degradation.
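A small demonstration of the gap (UTF8String.fromBytes is the real Spark API; the byte values are just an example):
{code:scala}
import org.apache.spark.unsafe.types.UTF8String

// 0xE9 is "é" in ISO-8859-1 but is an invalid standalone byte in UTF-8;
// fromBytes still accepts it because no validation is performed:
val latin1Bytes = "café".getBytes("ISO-8859-1")
val s = UTF8String.fromBytes(latin1Bytes)  // no error, bytes taken as-is
println(s.numBytes())                      // 4
{code}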

> Encoding not working when using a map / mapPartitions call
> --
>
> Key: SPARK-26016
> URL: https://issues.apache.org/jira/browse/SPARK-26016
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.4.0
>Reporter: Chris Caspanello
>Priority: Major
> Attachments: spark-sandbox.zip
>
>
> Attached you will find a project with unit tests showing the issue at hand.
> If I read in an ISO-8859-1 encoded file and simply write out what was read, 
> the contents of the part file match what was read.  Which is great.
> However, the second I use a map / mapPartitions function, it looks like the 
> encoding is not correct.  In addition, a simple collectAsList and writing that 
> list of strings to a file does not work either.  I don't think I'm doing 
> anything wrong.  Can someone please investigate?  I think this is a bug.






[jira] [Created] (SPARK-27057) Common trait for limit exec operators

2019-03-05 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27057:
--

 Summary: Common trait for limit exec operators
 Key: SPARK-27057
 URL: https://issues.apache.org/jira/browse/SPARK-27057
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


Currently, CollectLimitExec, LocalLimitExec and GlobalLimitExec share only the 
UnaryExecNode trait, which makes it inconvenient to distinguish the limit 
operators from other unary operators. The ticket aims to introduce a new common 
trait for all three operators, as sketched below.
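A minimal sketch of what such a trait could look like (simplified stand-ins, not the actual Spark classes):
{code:scala}
trait SparkPlan
trait UnaryExecNode extends SparkPlan { def child: SparkPlan }

// The proposed common trait: one type to match all limit operators.
trait LimitExec extends UnaryExecNode { def limit: Int }

case class LocalLimitExec(limit: Int, child: SparkPlan) extends LimitExec
case class GlobalLimitExec(limit: Int, child: SparkPlan) extends LimitExec

// Callers can now distinguish limits with a single pattern:
def isLimit(plan: SparkPlan): Boolean = plan match {
  case _: LimitExec => true
  case _            => false
}
{code}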






[jira] [Created] (SPARK-27109) Refactoring of TimestampFormatter and DateFormatter

2019-03-08 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27109:
--

 Summary: Refactoring of TimestampFormatter and DateFormatter
 Key: SPARK-27109
 URL: https://issues.apache.org/jira/browse/SPARK-27109
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


* Date/TimestampFormatter converts parsed input to an Instant before converting 
it to days/micros. This conversion is unnecessary because the seconds and the 
fraction of a second can be extracted (calculated) from ZonedDateTime directly, 
as sketched below.
 * Avoid the additional extraction of TemporalQueries.localTime from the 
temporalAccessor.
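A sketch of the direct extraction (the method name is illustrative):
{code:scala}
import java.time.ZonedDateTime

// Micros since epoch straight from ZonedDateTime, skipping the
// intermediate Instant allocation:
def zonedDateTimeToMicros(zdt: ZonedDateTime): Long =
  zdt.toEpochSecond * 1000000L + zdt.getNano / 1000L
{code}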






[jira] [Created] (SPARK-27199) Replace TimeZone by ZoneId in TimestampFormatter API

2019-03-19 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27199:
--

 Summary: Replace TimeZone by ZoneId in TimestampFormatter API
 Key: SPARK-27199
 URL: https://issues.apache.org/jira/browse/SPARK-27199
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


Internally, TimestampFormatter implementations use ZoneId, not the TimeZone that 
comes via the API. The conversion from TimeZone to ZoneId is not free: the 
TimeZone is converted to a String, and that String is parsed to a ZoneId. The 
conversion to String can be eliminated if TimestampFormatter accepts ZoneId 
directly. Moreover, the TimeZone is itself converted from a String in some cases 
(JSON options), so in the worst case the chain is String -> TimeZone -> String 
-> ZoneId -> ZoneOffset. The ticket aims to use ZoneId in the TimestampFormatter 
API. We could require ZoneOffset instead, but that is inconvenient in most cases 
because converting a ZoneId to a ZoneOffset requires an Instant.
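The round trip in code (a sketch; the zone ID is just an example):
{code:scala}
import java.time.ZoneId
import java.util.TimeZone

// Today: the zone arrives as a TimeZone, and getting a ZoneId re-parses
// the ID string (String -> TimeZone -> String -> ZoneId):
val tz: TimeZone  = TimeZone.getTimeZone("Europe/Berlin")
val viaTz: ZoneId = ZoneId.of(tz.getID)

// With a ZoneId-based API the intermediate TimeZone disappears:
val direct: ZoneId = ZoneId.of("Europe/Berlin")
{code}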






[jira] [Created] (SPARK-27212) Eliminate TimeZone to ZoneId conversion in stringToTimestamp

2019-03-20 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27212:
--

 Summary: Eliminate TimeZone to ZoneId conversion in 
stringToTimestamp
 Key: SPARK-27212
 URL: https://issues.apache.org/jira/browse/SPARK-27212
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The stringToTimestamp method of DateTimeUtils (and stringToDate as well) can be 
called per row, and the method converts TimeZone to ZoneId each time. The 
operation is relatively expensive because it does an intermediate conversion to 
a string: 
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/f940e7a48b72/src/share/classes/java/util/TimeZone.java#l547

The conversion is unnecessary and can be avoided. The ticket aims to change the 
signature of stringToTimestamp to require ZoneId as a parameter.






[jira] [Created] (SPARK-27222) Support Instant and LocalDate in Literal.apply

2019-03-20 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27222:
--

 Summary: Support Instant and LocalDate in Literal.apply
 Key: SPARK-27222
 URL: https://issues.apache.org/jira/browse/SPARK-27222
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


SPARK-26902 and SPARK-27008 added support for java.time.Instant and 
java.time.LocalDate as external types for TimestampType and DateType. The ticket 
aims to support literals of these types. In particular, Literal.apply needs to 
be extended with new cases for java.time.Instant/LocalDate.
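A simplified sketch of what the new cases could look like (the Literal/DataType definitions below are illustrative stand-ins, not the actual Catalyst classes):
{code:scala}
import java.time.{Instant, LocalDate}

sealed trait DataType
case object TimestampType extends DataType
case object DateType extends DataType
final case class Literal(value: Any, dataType: DataType)

// Instant -> microseconds since epoch, LocalDate -> days since epoch:
def toLiteral(v: Any): Literal = v match {
  case i: Instant =>
    Literal(i.getEpochSecond * 1000000L + i.getNano / 1000L, TimestampType)
  case d: LocalDate =>
    Literal(d.toEpochDay.toInt, DateType)
  case other =>
    throw new IllegalArgumentException(s"Unsupported literal: $other")
}
{code}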






[jira] [Created] (SPARK-27242) Avoid using default time zone in formatting TIMESTAMP/DATE literals

2019-03-22 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27242:
--

 Summary: Avoid using default time zone in formatting 
TIMESTAMP/DATE literals
 Key: SPARK-27242
 URL: https://issues.apache.org/jira/browse/SPARK-27242
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


Spark calls the toString() methods of java.sql.Timestamp/java.sql.Date when 
formatting TIMESTAMP/DATE literals in Literal.sql: 
https://github.com/apache/spark/blob/0f4f8160e6d01d2e263adcf39d53bd0a03fc1b73/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L373-L374
 . This is inconsistent with the parsing of TIMESTAMP/DATE literals in 
AstBuilder: 
https://github.com/apache/spark/blob/a529be2930b1d69015f1ac8f85e590f197cf53cf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L1594-L1597
 where *spark.sql.session.timeZone* is used for parsing TIMESTAMP literals, and 
DATE literals are parsed independently of the time zone (actually in the UTC 
time zone). The ticket aims to make parsing and formatting of date/timestamp 
literals consistent, and to use the SQL config for TIMESTAMP literals.
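Why toString is problematic, in a standalone sketch (not Spark code): the same internal value prints differently depending on the JVM default time zone:
{code:scala}
import java.sql.Timestamp
import java.util.TimeZone

TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
println(new Timestamp(0L))  // 1969-12-31 16:00:00.0
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
println(new Timestamp(0L))  // 1970-01-01 00:00:00.0
{code}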






[jira] [Created] (SPARK-27252) Make current_date() independent from time zones

2019-03-22 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27252:
--

 Summary: Make current_date() independent from time zones
 Key: SPARK-27252
 URL: https://issues.apache.org/jira/browse/SPARK-27252
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The CurrentDate expression produces a result of DateType, which is by definition 
the number of days since the epoch (in the UTC time zone). The current 
implementation shifts the number of days according to the session time zone 
`spark.sql.session.timeZone`. The result of the shift cannot be considered the 
number of days since the epoch in the UTC time zone, and so cannot have the 
type `DateType`. There are several reasons why the result is invalid. For 
example:
# the zone offset depends on an instant in the UTC time zone, and the zone 
offset of `spark.sql.session.timeZone` at the shifted date may have a different 
value;
# the result of the shift cannot be considered the number of days since the 
epoch anymore.

The ticket aims to make `current_date` independent of time zones and to return 
the current date in the UTC time zone, as sketched below.
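A sketch of the intended semantics (standalone; 86400 is the number of seconds per day):
{code:scala}
import java.time.Instant

// current_date as a time-zone-independent value: the number of whole
// days since the epoch in UTC.
def currentDateUtcDays(): Int =
  Math.floorDiv(Instant.now().getEpochSecond, 86400L).toInt
{code}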






[jira] [Commented] (SPARK-26325) Interpret timestamp fields in Spark while reading json (timestampFormat)

2019-03-25 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801011#comment-16801011
 ] 

Maxim Gekk commented on SPARK-26325:


Can you try Z instead of 'Z'?

> Interpret timestamp fields in Spark while reading json (timestampFormat)
> 
>
> Key: SPARK-26325
> URL: https://issues.apache.org/jira/browse/SPARK-26325
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Veenit Shah
>Priority: Major
>
> I am trying to read a pretty-printed JSON file which has time fields in it. I 
> want to interpret the timestamp columns as timestamp fields while reading the 
> JSON itself. However, it still reads them as string when I {{printSchema}}.
> E.g. input JSON file -
> {code:java}
> [{
> "time_field" : "2017-09-30 04:53:39.412496Z"
> }]
> {code}
> Code -
> {code:java}
> df = spark.read.option("multiLine", "true").option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SS'Z'").json('path_to_json_file')
> {code}
> Output of df.printSchema() -
> {code:java}
> root
>  |-- time_field: string (nullable = true)
> {code}






[jira] [Created] (SPARK-27325) Support implicit encoders for LocalDate and Instant

2019-03-30 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27325:
--

 Summary: Support implicit encoders for LocalDate and Instant
 Key: SPARK-27325
 URL: https://issues.apache.org/jira/browse/SPARK-27325
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


Currently, Spark supports java.time.LocalDate and java.time.Instant as external 
types for DateType and TimestampType, but doesn't allow constructing datasets 
from these external types because there are no implicit encoders for them. The 
ticket aims to add such encoders.
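Once such encoders exist, dataset construction should work directly. A usage sketch, assuming a SparkSession named `spark` and that the new encoders arrive via spark.implicits:
{code:scala}
import java.time.{Instant, LocalDate}
import spark.implicits._  // assumes a SparkSession named `spark`

val dates    = Seq(LocalDate.of(2019, 3, 30)).toDS()  // Dataset[LocalDate]
val instants = Seq(Instant.now()).toDS()              // Dataset[Instant]
{code}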






[jira] [Created] (SPARK-27327) New JSON benchmarks: functions, dataset parsing

2019-03-30 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27327:
--

 Summary: New JSON benchmarks: functions, dataset parsing
 Key: SPARK-27327
 URL: https://issues.apache.org/jira/browse/SPARK-27327
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The existing JSONBenchmark doesn't contain benchmarks for:
# JSON functions like from_json
# parsing Dataset[String]

The ticket aims to update JSONBenchmark and add the new benchmarks.






[jira] [Created] (SPARK-27344) Support the LocalDate and Instant classes in Java Bean encoders

2019-04-01 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27344:
--

 Summary: Support the LocalDate and Instant classes in Java Bean 
encoders
 Key: SPARK-27344
 URL: https://issues.apache.org/jira/browse/SPARK-27344
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


- Check that Java Bean encoders support java.time.LocalDate and 
java.time.Instant. Write a test for that.
- Update the comment: 
https://github.com/apache/spark/pull/24249/files#diff-3e88c21c9270fef6eaf6f0e64ed81f27R152






[jira] [Created] (SPARK-27357) Convert timestamps to/from dates independently from time zones

2019-04-03 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27357:
--

 Summary: Convert timestamps to/from dates independently from time 
zones
 Key: SPARK-27357
 URL: https://issues.apache.org/jira/browse/SPARK-27357
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


Both Catalyst types TIMESTAMP and DATE internally represent time intervals since 
the epoch in the UTC time zone. The TIMESTAMP type stores the number of 
microseconds since the epoch, and DATE stores the number of days since the epoch 
(00:00:00 on 1 January 1970). As a consequence, the conversion between them 
should be independent of the session or local time zone. The ticket aims to fix 
the current behavior and make the conversion independent of time zones, as 
sketched below.
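The intended time-zone-independent arithmetic in a sketch (helper names are illustrative; Math.floorDiv keeps pre-epoch values correct):
{code:scala}
val MICROS_PER_DAY: Long = 24L * 60 * 60 * 1000 * 1000

// TIMESTAMP (micros since epoch) -> DATE (days since epoch):
def timestampToDate(micros: Long): Int =
  Math.floorDiv(micros, MICROS_PER_DAY).toInt

// DATE -> TIMESTAMP at 00:00:00 UTC:
def dateToTimestamp(days: Int): Long = days * MICROS_PER_DAY
{code}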






[jira] [Updated] (SPARK-27357) Cast timestamps to/from dates independently from time zones

2019-04-03 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-27357:
---
Summary: Cast timestamps to/from dates independently from time zones  (was: 
Convert timestamps to/from dates independently from time zones)

> Cast timestamps to/from dates independently from time zones
> ---
>
> Key: SPARK-27357
> URL: https://issues.apache.org/jira/browse/SPARK-27357
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Both Catalyst types TIMESTAMP and DATE internally represent time intervals 
> since the epoch in the UTC time zone. The TIMESTAMP type stores the number of 
> microseconds since the epoch, and DATE stores the number of days since the 
> epoch (00:00:00 on 1 January 1970). As a consequence, the conversion between 
> them should be independent of the session or local time zone. The ticket aims 
> to fix the current behavior and make the conversion independent of time zones.






[jira] [Created] (SPARK-27398) Get rid of sun.nio.cs.StreamDecoder in CreateJacksonParser

2019-04-06 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27398:
--

 Summary: Get rid of sun.nio.cs.StreamDecoder in CreateJacksonParser
 Key: SPARK-27398
 URL: https://issues.apache.org/jira/browse/SPARK-27398
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The CreateJacksonParser.getStreamDecoder method creates an instance of 
ReadableByteChannel and returns the result as a sun.nio.cs.StreamDecoder. This 
is unnecessary and overcomplicates the method. The code can be replaced by:
{code:scala}
// `in` (Array[Byte]), `length` (Int) and `enc` (String) are the
// method's existing inputs.
import java.io.{ByteArrayInputStream, InputStreamReader}

val bais = new ByteArrayInputStream(in, 0, length)
new InputStreamReader(bais, enc)
{code}






[jira] [Created] (SPARK-27401) Refactoring conversion of Date/Timestamp to/from java.sql.Date/Timestamp

2019-04-06 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27401:
--

 Summary: Refactoring conversion of Date/Timestamp to/from 
java.sql.Date/Timestamp
 Key: SPARK-27401
 URL: https://issues.apache.org/jira/browse/SPARK-27401
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The fromJavaTimestamp/toJavaTimestamp and toJavaDate/fromJavaDate methods can be 
implemented using existing DateTimeUtils methods like 
instantToMicros/microsToInstant and daysToLocalDate/localDateToDays. This should 
allow:
 # To avoid the invocation of millisToDays and the time zone offset calculation
 # To simplify the implementation of toJavaTimestamp, and to properly handle 
negative inputs
 # To detect arithmetic overflow of Long
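A sketch of the refactoring under these assumptions: instantToMicros/microsToInstant below are simplified stand-ins for the DateTimeUtils helpers (Math.multiplyExact/addExact surface Long overflow as an exception):
{code:scala}
import java.sql.Timestamp
import java.time.Instant

def instantToMicros(i: Instant): Long =
  Math.addExact(Math.multiplyExact(i.getEpochSecond, 1000000L), i.getNano / 1000L)

def microsToInstant(micros: Long): Instant = {
  val secs = Math.floorDiv(micros, 1000000L)
  Instant.ofEpochSecond(secs, (micros - secs * 1000000L) * 1000L)
}

// The conversions go through Instant, with no time zone involved:
def fromJavaTimestamp(t: Timestamp): Long = instantToMicros(t.toInstant)
def toJavaTimestamp(micros: Long): Timestamp = Timestamp.from(microsToInstant(micros))
{code}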






[jira] [Created] (SPARK-27405) Restrict the range of generated random timestamps

2019-04-07 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27405:
--

 Summary: Restrict the range of generated random timestamps
 Key: SPARK-27405
 URL: https://issues.apache.org/jira/browse/SPARK-27405
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The timestampLiteralGen of LiteralGenerator can produce instances of 
java.sql.Timestamp that cause Long arithmetic overflow in the conversion of 
milliseconds to microseconds. That conversion is performed because Catalyst's 
Timestamp type internally stores microseconds since the epoch. The ticket aims 
to restrict the range of generated random timestamps to 
[Long.MinValue / 1000, Long.MaxValue / 1000] milliseconds.
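The overflow in one line (a standalone sketch; Math.multiplyExact makes it visible):
{code:scala}
// A millisecond value just above Long.MaxValue / 1000 cannot be
// represented in microseconds:
val millis = Long.MaxValue / 1000 + 1
// Math.multiplyExact(millis, 1000L)  // throws ArithmeticException: long overflow
{code}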






[jira] [Created] (SPARK-27422) CurrentDate should return local date

2019-04-09 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27422:
--

 Summary: CurrentDate should return local date
 Key: SPARK-27422
 URL: https://issues.apache.org/jira/browse/SPARK-27422
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


According to the SQL standard, the DATE type is a union of (year, month, day), 
and current date should return a (year, month, day) triple in the session's 
local time zone. The ticket aims to follow that requirement and calculate the 
local date for the session time zone. The local date should be converted to an 
epoch day and stored internally as the DATE value, as sketched below.
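A sketch of the proposed computation (the zone ID is an example; in Spark it would come from spark.sql.session.timeZone):
{code:scala}
import java.time.{LocalDate, ZoneId}

// Local date in the session time zone, stored as the epoch day:
val sessionZone = ZoneId.of("America/Los_Angeles")
val currentDateValue: Int = LocalDate.now(sessionZone).toEpochDay.toInt
{code}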






[jira] [Resolved] (SPARK-27357) Cast timestamps to/from dates independently from time zones

2019-04-09 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-27357.

Resolution: Not A Problem

> Cast timestamps to/from dates independently from time zones
> ---
>
> Key: SPARK-27357
> URL: https://issues.apache.org/jira/browse/SPARK-27357
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Both Catalyst types TIMESTAMP and DATE internally represent time intervals 
> since the epoch in the UTC time zone. The TIMESTAMP type stores the number of 
> microseconds since the epoch, and DATE stores the number of days since the 
> epoch (00:00:00 on 1 January 1970). As a consequence, the conversion between 
> them should be independent of the session or local time zone. The ticket aims 
> to fix the current behavior and make the conversion independent of time zones.






[jira] [Created] (SPARK-27423) Cast DATE to/from TIMESTAMP according to SQL standard

2019-04-09 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27423:
--

 Summary: Cast DATE to/from TIMESTAMP according to SQL standard
 Key: SPARK-27423
 URL: https://issues.apache.org/jira/browse/SPARK-27423
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


According to the SQL standard, DATE is a union of (year, month, day). To convert 
it to Spark's TIMESTAMP, which is a TIMESTAMP WITH TIME ZONE, the date should be 
extended by the time at midnight: (year, month, day, hour = 0, minute = 0, 
second = 0). The resulting timestamp should be considered a timestamp in the 
session time zone and transformed to microseconds since the epoch in UTC.
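A sketch of the standard-compliant conversion (dateToTimestampMicros is a hypothetical helper name):
{code:scala}
import java.time.{LocalDate, ZoneId}

// DATE -> TIMESTAMP: midnight in the session time zone, then micros
// since the epoch in UTC.
def dateToTimestampMicros(date: LocalDate, sessionZone: ZoneId): Long = {
  val instant = date.atStartOfDay(sessionZone).toInstant
  instant.getEpochSecond * 1000000L + instant.getNano / 1000L
}
{code}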






[jira] [Created] (SPARK-27438) Increase precision of to_timestamp

2019-04-10 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27438:
--

 Summary: Increase precision of to_timestamp
 Key: SPARK-27438
 URL: https://issues.apache.org/jira/browse/SPARK-27438
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk


The to_timestamp() function parses input strings only up to second precision, 
even if the specified pattern contains a second-fraction sub-pattern. The ticket 
aims to improve the precision of to_timestamp() up to microseconds.






[jira] [Created] (SPARK-27522) Test migration from INT96 to TIMESTAMP_MICROS in parquet

2019-04-19 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27522:
--

 Summary: Test migration from INT96 to TIMESTAMP_MICROS in parquet
 Key: SPARK-27522
 URL: https://issues.apache.org/jira/browse/SPARK-27522
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk


Write tests to check:
* Append timestamps of TIMESTAMP_MICROS to existing parquets with INT96 for 
timestamps
* Append timestamps of TIMESTAMP_MICROS to a table with INT96 for timestamps
* Append INT96 timestamps to parquet files with TIMESTAMP_MICROS timestamps
* Append INT96 timestamps to a table with TIMESTAMP_MICROS timestamps






[jira] [Created] (SPARK-27527) Improve description of Timestamp and Date types

2019-04-20 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27527:
--

 Summary: Improve description of Timestamp and Date types
 Key: SPARK-27527
 URL: https://issues.apache.org/jira/browse/SPARK-27527
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk


Describe precisely the semantics of TimestampType and DateType, and how they 
represent dates and timestamps internally.






[jira] [Created] (SPARK-27528) Use Parquet logical type TIMESTAMP_MICROS by default

2019-04-20 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27528:
--

 Summary: Use Parquet logical type TIMESTAMP_MICROS by default
 Key: SPARK-27528
 URL: https://issues.apache.org/jira/browse/SPARK-27528
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk


Currently, Spark uses the INT96 type for timestamps written to parquet files. To 
store Catalyst's Timestamp values as INT96, Spark converts microseconds since 
the epoch to nanoseconds in the Julian calendar. This conversion is unnecessary 
if Spark saves timestamps as the Parquet TIMESTAMP_MICROS logical type. The 
ticket aims to switch the default for writes from INT96 to TIMESTAMP_MICROS.
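For reference, the SQL config that controls this behavior already exists; the ticket is about its default. A usage sketch, assuming a SparkSession named `spark`:
{code:scala}
// Write parquet TIMESTAMP_MICROS instead of INT96:
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
{code}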






[jira] [Updated] (SPARK-27533) Date/timestamps CSV benchmarks

2019-04-21 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-27533:
---
Summary: Date/timestamps CSV benchmarks  (was: CSV benchmarks 
date/timestamp ops )

> Date/timestamps CSV benchmarks
> --
>
> Key: SPARK-27533
> URL: https://issues.apache.org/jira/browse/SPARK-27533
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Maxim Gekk
>Priority: Minor
>
> Extend CSVBenchmark by new benchmarks:
> - Write dates/timestamps to files
> - Read/infer dates/timestamp from files
> - Read/infer dates/timestamps from Dataset[String]
> - to_csv/from_csv for dates/timestamps






[jira] [Created] (SPARK-27533) CSV benchmarks date/timestamp ops

2019-04-21 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27533:
--

 Summary: CSV benchmarks date/timestamp ops 
 Key: SPARK-27533
 URL: https://issues.apache.org/jira/browse/SPARK-27533
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk


Extend CSVBenchmark by new benchmarks:
- Write dates/timestamps to files
- Read/infer dates/timestamp from files
- Read/infer dates/timestamps from Dataset[String]
- to_csv/from_csv for dates/timestamps







[jira] [Updated] (SPARK-27533) Date and timestamp CSV benchmarks

2019-04-21 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-27533:
---
Summary: Date and timestamp CSV benchmarks  (was: Date/timestamps CSV 
benchmarks)

> Date and timestamp CSV benchmarks
> -
>
> Key: SPARK-27533
> URL: https://issues.apache.org/jira/browse/SPARK-27533
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Maxim Gekk
>Priority: Minor
>
> Extend CSVBenchmark by new benchmarks:
> - Write dates/timestamps to files
> - Read/infer dates/timestamp from files
> - Read/infer dates/timestamps from Dataset[String]
> - to_csv/from_csv for dates/timestamps






[jira] [Created] (SPARK-27535) Date and timestamp JSON benchmarks

2019-04-21 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-27535:
--

 Summary: Date and timestamp JSON benchmarks
 Key: SPARK-27535
 URL: https://issues.apache.org/jira/browse/SPARK-27535
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk


Extend JSONBenchmark by new benchmarks:
* Write dates/timestamps to files
* Read/infer dates/timestamp from files
* Read/infer dates/timestamps from Dataset[String]
* to_json/from_json for dates/timestamps







[jira] [Commented] (SPARK-27450) Timestamp cast fails when the ISO8601 string omits minutes, seconds or milliseconds

2019-05-04 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833132#comment-16833132
 ] 

Maxim Gekk commented on SPARK-27450:


The cast function supports a limited number of timestamp patterns, see 
https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L189-L208
 . Please use to_timestamp with a custom pattern like:
{code}
scala> val new_df3 = df3.withColumn("eventTimeTS", to_timestamp($"eventTimeString", "yyyy-MM-dd'T'HH:mmXXX"))
new_df3: org.apache.spark.sql.DataFrame = [eventTimeString: string, eventTimeTS: timestamp]

scala> new_df3.show(false)
+----------------------+-------------------+
|eventTimeString       |eventTimeTS        |
+----------------------+-------------------+
|2017-08-01T02:33-03:00|2017-08-01 07:33:00|
+----------------------+-------------------+
{code}

> Timestamp cast fails when the ISO8601 string omits minutes, seconds or 
> milliseconds
> ---
>
> Key: SPARK-27450
> URL: https://issues.apache.org/jira/browse/SPARK-27450
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: Spark 2.3.x
>Reporter: Leandro Rosa
>Priority: Major
>
> ISO 8601 allows omitting minutes, seconds and milliseconds.
> {quote}
> |hh:mm:ss.sss|_or_|hhmmss.sss|
> |hh:mm:ss|_or_|hhmmss|
> |hh:mm|_or_|hhmm|
> | |hh|
> {quote}
> {quote}Either the seconds, or the minutes and seconds, may be omitted from 
> the basic or extended time formats for greater brevity but decreased 
> accuracy: [hh]:[mm], [hh][mm] and [hh] are the resulting reduced accuracy 
> time formats
> {quote}
> Source: [Wikipedia ISO8601|https://en.wikipedia.org/wiki/ISO_8601]
> Popular libs, such as 
> [ZonedDateTime|https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html],
>  respect that. However, Timestamp cast fails silently.
>  
> {code:java}
> import org.apache.spark.sql.types._
> val df1 = Seq(("2017-08-01T02:33")).toDF("eventTimeString") // NON-ISO8601 (missing TZ offset) [OK]
> val new_df1 = df1.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
> new_df1.show(false)
> +----------------+-------------------+
> |eventTimeString |eventTimeTS        |
> +----------------+-------------------+
> |2017-08-01T02:33|2017-08-01 02:33:00|
> +----------------+-------------------+
> {code}
> {code:java}
> val df2 = Seq(("2017-08-01T02:33Z")).toDF("eventTimeString") // ISO8601 [FAIL]
> val new_df2 = df2.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
> new_df2.show(false)
> +-----------------+-----------+
> |eventTimeString  |eventTimeTS|
> +-----------------+-----------+
> |2017-08-01T02:33Z|null       |
> +-----------------+-----------+
> {code}
> 
> {code:java}
> val df3 = Seq(("2017-08-01T02:33-03:00")).toDF("eventTimeString") // ISO8601 [FAIL]
> val new_df3 = df3.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
> new_df3.show(false)
> +----------------------+-----------+
> |eventTimeString       |eventTimeTS|
> +----------------------+-----------+
> |2017-08-01T02:33-03:00|null       |
> +----------------------+-----------+
> {code}
>  
>  






[jira] [Commented] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly

2019-05-06 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833949#comment-16833949
 ] 

Maxim Gekk commented on SPARK-27638:


[~srowen] The date literal should be cast to the date type by 
[stringToDate|[https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L376]]
 which is able to parse the date by default, see the supported patterns:
{code}
`yyyy`
`yyyy-[m]m`
`yyyy-[m]m-[d]d`
`yyyy-[m]m-[d]d `
`yyyy-[m]m-[d]d *`
`yyyy-[m]m-[d]dT*`
{code}

 

> date format yyyy-M-dd string comparison not handled properly 
> -
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: peng bo
>Priority: Major
>
> The example below works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and 
> string comparison for partial date support. Please find more details in 
> https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and 
> MySQL:
> Hive: casts to Date; partial dates are not supported.
> MySQL: casts to Date; certain "partial dates" are supported by defining 
> certain date string parse rules. Check out {{str_to_datetime}} in 
> https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are 2 proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases 
> won't be supported either. 
> b. Cast the String value to Date; if the cast succeeds, use date.toString, 
> otherwise the original string.






[jira] [Comment Edited] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly

2019-05-06 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833949#comment-16833949
 ] 

Maxim Gekk edited comment on SPARK-27638 at 5/6/19 3:57 PM:


[~srowen] The date literal should be cast to the date type by 
[stringToDate|https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L376]
 which is able to parse the date by default, see the supported patterns:
{code}
`yyyy`
`yyyy-[m]m`
`yyyy-[m]m-[d]d`
`yyyy-[m]m-[d]d `
`yyyy-[m]m-[d]d *`
`yyyy-[m]m-[d]dT*`
{code}

 


was (Author: maxgekk):
[~srowen] The date literal should be cast to the date type by 
[stringToDate|[https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L376]]
 which is able to parse the date by default, see the supported patterns:
{code}
`yyyy`
`yyyy-[m]m`
`yyyy-[m]m-[d]d`
`yyyy-[m]m-[d]d `
`yyyy-[m]m-[d]d *`
`yyyy-[m]m-[d]dT*`
{code}

 

> date format yyyy-M-dd string comparison not handled properly 
> -
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: peng bo
>Priority: Major
>
> The example below works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and 
> string comparison for partial date support. Please find more details in 
> https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and 
> MySQL:
> Hive: casts to Date; partial dates are not supported.
> MySQL: casts to Date; certain "partial dates" are supported by defining 
> certain date string parse rules. Check out {{str_to_datetime}} in 
> https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are 2 proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases 
> won't be supported either. 
> b. Cast the String value to Date; if the cast succeeds, use date.toString, 
> otherwise the original string.






[jira] [Commented] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly

2019-05-06 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833957#comment-16833957
 ] 

Maxim Gekk commented on SPARK-27638:


It works with explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation , [d#51]

== Physical Plan ==
LocalTableScan , [d#51]
{code}

> date format yyyy-M-dd string comparison not handled properly 
> -
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: peng bo
>Priority: Major
>
> The example below works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and 
> string comparison for partial date support. Please find more details in 
> https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and 
> MySQL:
> Hive: casts to Date; partial dates are not supported.
> MySQL: casts to Date; certain "partial dates" are supported by defining 
> certain date string parse rules. Check out {{str_to_datetime}} in 
> https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are 2 proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases 
> won't be supported either. 
> b. Cast the String value to Date; if the cast succeeds, use date.toString, 
> otherwise the original string.






[jira] [Comment Edited] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly

2019-05-06 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833957#comment-16833957
 ] 

Maxim Gekk edited comment on SPARK-27638 at 5/6/19 4:10 PM:


It works with explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation , [d#51]

== Physical Plan ==
LocalTableScan , [d#51]
{code}

The same happens for '2000-01-01': the date column is cast to string. 


was (Author: maxgekk):
It works with explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation , [d#51]

== Physical Plan ==
LocalTableScan , [d#51]
{code}

> date format yyyy-M-dd string comparison not handled properly 
> -
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: peng bo
>Priority: Major
>
> The example below works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and 
> string comparison for partial date support. Please find more details in 
> https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and 
> MySQL:
> Hive: casts to Date; partial dates are not supported.
> MySQL: casts to Date; certain "partial dates" are supported by defining 
> certain date string parse rules. Check out {{str_to_datetime}} in 
> https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are 2 proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases 
> won't be supported either. 
> b. Cast the String value to Date; if the cast succeeds, use date.toString, 
> otherwise the original string.






[jira] [Commented] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly

2019-05-06 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834214#comment-16834214
 ] 

Maxim Gekk commented on SPARK-27638:


[~pengbo] Are you going to propose a PR for that? If not, I can fix the issue.

> date format yyyy-M-dd string comparison not handled properly 
> -
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.2
>Reporter: peng bo
>Priority: Major
>
> The example below works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and 
> string comparison for partial date support. Please find more details in 
> https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and 
> MySQL:
> Hive: casts to Date; partial dates are not supported.
> MySQL: casts to Date; certain "partial dates" are supported by defining 
> certain date string parse rules. Check out {{str_to_datetime}} in 
> https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are 2 proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases 
> won't be supported either. 
> b. Cast the String value to Date; if the cast succeeds, use date.toString, 
> otherwise the original string.






[jira] [Created] (SPARK-34138) Keep dependants cached while refreshing v1 tables

2021-01-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34138:
--

 Summary: Keep dependants cached while refreshing v1 tables
 Key: SPARK-34138
 URL: https://issues.apache.org/jira/browse/SPARK-34138
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Keeping dependants cached while refreshing v1 tables should improve the user 
experience with table/view caching. For example, imagine that a user has a 
cached v1 table and a cached view based on that table, and the user passes the 
table to an external library which drops/renames/adds partitions in the v1 
table. Unfortunately, the view becomes uncached after that even though the user 
hasn't uncached it explicitly.






[jira] [Created] (SPARK-34143) Adding partitions to fully partitioned v2 table

2021-01-17 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34143:
--

 Summary: Adding partitions to fully partitioned v2 table
 Key: SPARK-34143
 URL: https://issues.apache.org/jira/browse/SPARK-34143
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The test below fails:
{code:scala}
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY (p0, p1)")
  sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
  checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
  checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
}
{code}

 






[jira] [Created] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache

2021-01-17 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34149:
--

 Summary: DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh 
table cache
 Key: SPARK-34149
 URL: https://issues.apache.org/jira/browse/SPARK-34149
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk









[jira] [Updated] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache

2021-01-17 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34149:
---
Description: 
For example, the test below:
{code:scala}
  test("SPARK-X: refresh cache in partition adding") {
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)")
  sql(s"ALTER TABLE $t ADD PARTITION (part=0)")
  assert(!spark.catalog.isCached(t))
  sql(s"CACHE TABLE $t")
  assert(spark.catalog.isCached(t))
  checkAnswer(sql(s"SELECT * FROM $t"), Row(0))

  sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
  assert(spark.catalog.isCached(t))
  checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
}
  }
{code}
fails with:
{code}
!== Correct Answer - 2 ==   == Spark Answer - 1 ==
!struct<>   struct
 [0][0]
![1]

   
ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at 
(QueryTest.scala:243)
{code}
because the command doesn't refresh the cache.

> DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
> -
>
> Key: SPARK-34149
> URL: https://issues.apache.org/jira/browse/SPARK-34149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example, the test below:
> {code:scala}
>   test("SPARK-X: refresh cache in partition adding") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)")
>   sql(s"ALTER TABLE $t ADD PARTITION (part=0)")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   checkAnswer(sql(s"SELECT * FROM $t"), Row(0))
>   sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
>   assert(spark.catalog.isCached(t))
>   checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
> }
>   }
> {code}
> fails with:
> {code}
> !== Correct Answer - 2 ==   == Spark Answer - 1 ==
> !struct<>   struct
>  [0][0]
> ![1]
> 
>
> ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at 
> (QueryTest.scala:243)
> {code}
> because the command doesn't refresh the cache.






[jira] [Created] (SPARK-34153) Remove unused `getRawTable()` from `HiveExternalCatalog.alterPartitions()`

2021-01-18 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34153:
--

 Summary: Remove unused `getRawTable()` from 
`HiveExternalCatalog.alterPartitions()`
 Key: SPARK-34153
 URL: https://issues.apache.org/jira/browse/SPARK-34153
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The getRawTable() call at 
https://github.com/apache/spark/blob/157b72ac9fa0057d5fd6d7ed52a6c4b22ebd1dfc/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L1148
 can be removed because its result is unused.






[jira] [Created] (SPARK-34161) Check re-caching of v2 table dependents after table altering

2021-01-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34161:
--

 Summary: Check re-caching of v2 table dependents after table 
altering
 Key: SPARK-34161
 URL: https://issues.apache.org/jira/browse/SPARK-34161
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Add tests to unified tests and check that dependants of v2 table are still 
cached after table altering.






[jira] [Updated] (SPARK-34161) Check re-caching of v2 table dependents after table altering

2021-01-19 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34161:
---
Description: Add tests to unified test suites and check that dependants of 
v2 table are still cached after table altering.  (was: Add tests to unified 
tests and check that dependants of v2 table are still cached after table 
altering.)

> Check re-caching of v2 table dependents after table altering
> 
>
> Key: SPARK-34161
> URL: https://issues.apache.org/jira/browse/SPARK-34161
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Add tests to unified test suites and check that dependants of v2 table are 
> still cached after table altering.






[jira] [Created] (SPARK-34197) refreshTable() should not invalidate the relation cache for temporary views

2021-01-21 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34197:
--

 Summary: refreshTable() should not invalidate the relation cache 
for temporary views
 Key: SPARK-34197
 URL: https://issues.apache.org/jira/browse/SPARK-34197
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


SessionCatalog.refreshTable() should not invalidate the relation cache entry for 
a table when it refreshes a temp view.






[jira] [Created] (SPARK-34207) Rename `isTemporaryTable` to `isTempView` in `SessionCatalog`

2021-01-22 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34207:
--

 Summary: Rename `isTemporaryTable` to `isTempView` in 
`SessionCatalog`
 Key: SPARK-34207
 URL: https://issues.apache.org/jira/browse/SPARK-34207
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Currently, there are two methods that do the same but have different names:

{code:java}
def isTempView(nameParts: Seq[String]): Boolean
{code}

and

{code:java}
def isTemporaryTable(name: TableIdentifier): Boolean
{code}
 It would be nice to rename `SessionCatalog.isTemporaryTable()` to 
`SessionCatalog.isTempView()`.







[jira] [Updated] (SPARK-34207) Rename `isTemporaryTable` to `isTempView` in `SessionCatalog`

2021-01-22 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34207:
---
Priority: Trivial  (was: Major)

> Rename `isTemporaryTable` to `isTempView` in `SessionCatalog`
> -
>
> Key: SPARK-34207
> URL: https://issues.apache.org/jira/browse/SPARK-34207
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Trivial
>
> Currently, there are two methods that do the same but have different names:
> {code:java}
> def isTempView(nameParts: Seq[String]): Boolean
> {code}
> and
> {code:java}
> def isTemporaryTable(name: TableIdentifier): Boolean
> {code}
>  It would be nice to rename `SessionCatalog.isTemporaryTable()` to 
> `SessionCatalog.isTempView()`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34213) LOAD DATA doesn't refresh v1 table cache

2021-01-23 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270722#comment-17270722
 ] 

Maxim Gekk commented on SPARK-34213:


I am working on the issue.

> LOAD DATA doesn't refresh v1 table cache
> 
>
> Key: SPARK-34213
> URL: https://issues.apache.org/jira/browse/SPARK-34213
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Maxim Gekk
>Priority: Major
>
> The example below illustrates the issue:
> 1. Create a source table:
> {code:sql}
> spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY 
> (part);
> spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
> spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
> default   src_tbl false   Partition Values: [part=0]
> Location: 
> file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
> ...
> {code}
> 2. Load data from the source table to a cached destination table:
> {code:sql}
> spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY 
> (part);
> spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
> spark-sql> CACHE TABLE dst_tbl;
> spark-sql> SELECT * FROM dst_tbl;
> 1 1
> spark-sql> LOAD DATA LOCAL INPATH 
> '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0'
>  INTO TABLE dst_tbl PARTITION (part=0);
> spark-sql> SELECT * FROM dst_tbl;
> 1 1
> {code}
> The last query does not show recently loaded data from the source table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34213) LOAD DATA doesn't refresh v1 table cache

2021-01-23 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34213:
--

 Summary: LOAD DATA doesn't refresh v1 table cache
 Key: SPARK-34213
 URL: https://issues.apache.org/jira/browse/SPARK-34213
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.2, 3.2.0, 3.1.1
Reporter: Maxim Gekk


The example below illustrates the issue:
1. Create a source table:
{code:sql}
spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
default src_tbl false   Partition Values: [part=0]
Location: 
file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
...
{code}
2. Load data from the source table to a cached destination table:
{code:sql}
spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
spark-sql> CACHE TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
1   1
spark-sql> LOAD DATA LOCAL INPATH 
'/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' 
INTO TABLE dst_tbl PARTITION (part=0);
spark-sql> SELECT * FROM dst_tbl;
1   1
{code}
The last query does not show recently loaded data from the source table.
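
A possible manual workaround until the command refreshes the cache itself (a 
sketch, assuming the tables above in a spark-shell session):

{code:scala}
// REFRESH TABLE invalidates the stale cached plan, so the next scan
// picks up the rows copied in by LOAD DATA.
spark.sql("REFRESH TABLE dst_tbl")
spark.sql("SELECT * FROM dst_tbl").show()  // should now include part=0
{code}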



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34215) Keep table cached after truncation

2021-01-24 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34215:
--

 Summary: Keep table cached after truncation
 Key: SPARK-34215
 URL: https://issues.apache.org/jira/browse/SPARK-34215
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Currently, TRUNCATE TABLE uncaches the table. It should keep tables cached to 
be consistent with other commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34251) TRUNCATE TABLE resets stats for non-empty v1 table

2021-01-26 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272231#comment-17272231
 ] 

Maxim Gekk commented on SPARK-34251:


I am working on a bug fix.

> TRUNCATE TABLE resets stats for non-empty v1 table
> --
>
> Key: SPARK-34251
> URL: https://issues.apache.org/jira/browse/SPARK-34251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Maxim Gekk
>Priority: Major
>
> The example below illustrates the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl (c0 int, part int) PARTITIONED BY (part);
> spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
> spark-sql> INSERT INTO tbl PARTITION (part=1) SELECT 1;
> spark-sql> ANALYZE TABLE tbl COMPUTE STATISTICS;
> spark-sql> DESCRIBE TABLE EXTENDED tbl;
> ...
> Statistics  4 bytes, 2 rows
> ...
> {code}
> Let's truncate one partition:
> {code:sql}
> spark-sql> TRUNCATE TABLE tbl PARTITION (part=1);
> spark-sql> DESCRIBE TABLE EXTENDED tbl;
> ...
> Statistics  0 bytes, 0 rows
> ...
> spark-sql> SELECT * FROM tbl;
> 0 0
> {code}
> *The last query returns a row, but the stats show 0 rows.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34251) TRUNCATE TABLE resets stats for non-empty v1 table

2021-01-26 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34251:
--

 Summary: TRUNCATE TABLE resets stats for non-empty v1 table
 Key: SPARK-34251
 URL: https://issues.apache.org/jira/browse/SPARK-34251
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.2, 3.2.0, 3.1.1
Reporter: Maxim Gekk


The example below illustrates the issue:
{code:sql}
spark-sql> CREATE TABLE tbl (c0 int, part int) PARTITIONED BY (part);
spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
spark-sql> INSERT INTO tbl PARTITION (part=1) SELECT 1;
spark-sql> ANALYZE TABLE tbl COMPUTE STATISTICS;
spark-sql> DESCRIBE TABLE EXTENDED tbl;
...
Statistics  4 bytes, 2 rows
...
{code}
Let's truncate one partition:
{code:sql}
spark-sql> TRUNCATE TABLE tbl PARTITION (part=1);
spark-sql> DESCRIBE TABLE EXTENDED tbl;
...
Statistics  0 bytes, 0 rows
...
spark-sql> SELECT * FROM tbl;
0   0
{code}
*The last query returns a row, but the stats show 0 rows.*
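
Until the command is fixed, re-running ANALYZE restores correct statistics (a 
sketch, assuming the table above in a spark-shell session):

{code:scala}
// Recompute the stats after the partial truncation so DESCRIBE EXTENDED
// reports the remaining row again instead of "0 bytes, 0 rows".
spark.sql("ANALYZE TABLE tbl COMPUTE STATISTICS")
spark.sql("DESCRIBE TABLE EXTENDED tbl").show(false)
{code}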



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34262) ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache

2021-01-27 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34262:
--

 Summary: ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache
 Key: SPARK-34262
 URL: https://issues.apache.org/jira/browse/SPARK-34262
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.2, 3.2.0, 3.1.1
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.0.2, 3.1.1


The example below illustrates the issue:
1. Create a source table:
{code:sql}
spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
default src_tbl false   Partition Values: [part=0]
Location: 
file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
...
{code}
2. Load data from the source table to a cached destination table:
{code:sql}
spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
spark-sql> CACHE TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
1   1
spark-sql> LOAD DATA LOCAL INPATH 
'/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' 
INTO TABLE dst_tbl PARTITION (part=0);
spark-sql> SELECT * FROM dst_tbl;
1   1
{code}
The last query does not show recently loaded data from the source table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34262) ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache

2021-01-27 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34262:
---
Description: 
The example below illustrates the issue:
1. Create a source table:
{code:sql}
spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
default src_tbl false   Partition Values: [part=0]
Location: 
file:/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0
...
{code}
2. Point a partition of a cached destination table at the source location:
{code:sql}
spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> ALTER TABLE dst_tbl ADD PARTITION (part=0);
spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
spark-sql> CACHE TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
1   1
spark-sql> ALTER TABLE dst_tbl PARTITION (part=0) SET LOCATION 
'/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0';
spark-sql> SELECT * FROM dst_tbl;
1   1
{code}
The last query does not show recently loaded data from the source table.

  was:
The example below illustrates the issue:
1. Create a source table:
{code:sql}
spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
default src_tbl false   Partition Values: [part=0]
Location: 
file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
...
{code}
2. Load data from the source table to a cached destination table:
{code:sql}
spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY 
(part);
spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
spark-sql> CACHE TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
1   1
spark-sql> LOAD DATA LOCAL INPATH 
'/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' 
INTO TABLE dst_tbl PARTITION (part=0);
spark-sql> SELECT * FROM dst_tbl;
1   1
{code}
The last query does not show recently loaded data from the source table.


> ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache
> --
>
> Key: SPARK-34262
> URL: https://issues.apache.org/jira/browse/SPARK-34262
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.2, 3.2.0, 3.1.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.1
>
>
> The example below illustrates the issue:
> 1. Create a source table:
> {code:sql}
> spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY 
> (part);
> spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
> spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
> default   src_tbl false   Partition Values: [part=0]
> Location: 
> file:/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0
> ...
> {code}
> 2. Point a partition of a cached destination table at the source location:
> {code:sql}
> spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY 
> (part);
> spark-sql> ALTER TABLE dst_tbl ADD PARTITION (part=0);
> spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
> spark-sql> CACHE TABLE dst_tbl;
> spark-sql> SELECT * FROM dst_tbl;
> 1 1
> spark-sql> ALTER TABLE dst_tbl PARTITION (part=0) SET LOCATION 
> '/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0';
> spark-sql> SELECT * FROM dst_tbl;
> 1 1
> {code}
> The last query does not show recently loaded data from the source table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34266) Update comments for `SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`

2021-01-27 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34266:
--

 Summary: Update comments for `SessionCatalog.refreshTable()` and 
`CatalogImpl.refreshTable()`
 Key: SPARK-34266
 URL: https://issues.apache.org/jira/browse/SPARK-34266
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34267) Remove `refreshTable()` from `SessionState`

2021-01-27 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34267:
--

 Summary: Remove `refreshTable()` from `SessionState`
 Key: SPARK-34267
 URL: https://issues.apache.org/jira/browse/SPARK-34267
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34267) Remove `refreshTable()` from `SessionState`

2021-01-27 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34267:
---
Description: Spark already has `SessionCatalog.refreshTable` and 
`CatalogImpl.refreshTable`. One more method in `SessionState` might be confusing.

> Remove `refreshTable()` from `SessionState`
> ---
>
> Key: SPARK-34267
> URL: https://issues.apache.org/jira/browse/SPARK-34267
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Spark already has `SessionCatalog.refreshTable` and 
> `CatalogImpl.refreshTable`. One more method in `SessionState` might be 
> confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34282) Unify v1 and v2 TRUNCATE TABLE tests

2021-01-28 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34282:
--

 Summary: Unify v1 and v2 TRUNCATE TABLE tests
 Key: SPARK-34282
 URL: https://issues.apache.org/jira/browse/SPARK-34282
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


Extract ALTER TABLE .. RECOVER PARTITIONS tests to a common place to run them 
for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test 
suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34282) Unify v1 and v2 TRUNCATE TABLE tests

2021-01-28 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34282:
---
Description: Extract TRUNCATE TABLE tests to a common place to run them 
for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test 
suites.  (was: Extract ALTER TABLE .. RECOVER PARTITIONS tests to a common 
place to run them for V1 and V2 datasources. Some tests can be placed in V1- 
and V2-specific test suites.)

> Unify v1 and v2 TRUNCATE TABLE tests
> 
>
> Key: SPARK-34282
> URL: https://issues.apache.org/jira/browse/SPARK-34282
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Extract TRUNCATE TABLE tests to a common place to run them for V1 and V2 
> datasources. Some tests can be placed in V1- and V2-specific test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34290) Support v2 TRUNCATE TABLE

2021-01-29 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34290:
--

 Summary: Support v2 TRUNCATE TABLE
 Key: SPARK-34290
 URL: https://issues.apache.org/jira/browse/SPARK-34290
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Need to implement TRUNCATE TABLE for DSv2 tables similarly to v1 implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34290) Support v2 TRUNCATE TABLE

2021-01-29 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274247#comment-17274247
 ] 

Maxim Gekk commented on SPARK-34290:


I am working on this.

> Support v2 TRUNCATE TABLE
> -
>
> Key: SPARK-34290
> URL: https://issues.apache.org/jira/browse/SPARK-34290
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Need to implement TRUNCATE TABLE for DSv2 tables similarly to v1 
> implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34301) Use logical plan of alter table in `CatalogImpl.recoverPartitions()`

2021-01-31 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34301:
--

 Summary: Use logical plan of alter table in 
`CatalogImpl.recoverPartitions()`
 Key: SPARK-34301
 URL: https://issues.apache.org/jira/browse/SPARK-34301
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The logical node will allow us to:
1. Print a nicer error message.
2. Avoid binding to v1 tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-01-31 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34302:
--

 Summary: Migrate ALTER TABLE .. CHANGE COLUMN to new resolution 
framework
 Key: SPARK-34302
 URL: https://issues.apache.org/jira/browse/SPARK-34302
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


# Create the Command logical node for SHOW TABLE EXTENDED
# Remove ShowTableStatement



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-01-31 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34302:
---
Description: 
# Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
# Remove AlterTableAlterColumnStatement

  was:
# Create the Command logical node for SHOW TABLE EXTENDED
# Remove ShowTableStatement


> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> 
>
> Key: SPARK-34302
> URL: https://issues.apache.org/jira/browse/SPARK-34302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34303) Migrate ALTER TABLE ... SET LOCATION to new resolution framework

2021-01-31 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34303:
--

 Summary: Migrate ALTER TABLE ... SET LOCATION to new resolution 
framework
 Key: SPARK-34303
 URL: https://issues.apache.org/jira/browse/SPARK-34303
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


# Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
# Remove AlterTableAlterColumnStatement



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34303) Migrate ALTER TABLE ... SET LOCATION to new resolution framework

2021-01-31 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34303:
---
Description: 
# Create the Command logical node for ALTER TABLE ... SET LOCATION
# Remove AlterTableSetLocationStatement

  was:
# Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
# Remove AlterTableAlterColumnStatement


> Migrate ALTER TABLE ... SET LOCATION to new resolution framework
> 
>
> Key: SPARK-34303
> URL: https://issues.apache.org/jira/browse/SPARK-34303
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE ... SET LOCATION
> # Remove AlterTableSetLocationStatement



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34304) Remove view checks in v1 alter table commands

2021-01-31 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34304:
--

 Summary: Remove view checks in v1 alter table commands
 Key: SPARK-34304
 URL: https://issues.apache.org/jira/browse/SPARK-34304
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


After the migration to the new resolution framework in SPARK-29900, view checks 
are no longer needed in the following commands:
  - AlterTableAddPartitionCommand
  - AlterTableDropPartitionCommand
  - AlterTableRenamePartitionCommand
  - AlterTableRecoverPartitionsCommand
  - AlterTableSerDePropertiesCommand

So, the DDLUtils.verifyAlterTableType checks can be removed from these v1 
commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34305) Unify v1 and v2 ALTER TABLE .. SET SERDE tests

2021-01-31 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34305:
--

 Summary: Unify v1 and v2 ALTER TABLE .. SET SERDE tests
 Key: SPARK-34305
 URL: https://issues.apache.org/jira/browse/SPARK-34305
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


Extract TRUNCATE TABLE tests to a common place to run them for V1 and V2 
datasources. Some tests can be placed in V1- and V2-specific test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34305) Unify v1 and v2 ALTER TABLE .. SET SERDE tests

2021-01-31 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34305:
---
Description: Extract ALTER TABLE .. SET SERDE tests to a common place to 
run them for V1 and V2 datasources. Some tests can be placed in V1- and 
V2-specific test suites.  (was: Extract TRUNCATE TABLE tests to a common place 
to run them for V1 and V2 datasources. Some tests can be placed in V1- and 
V2-specific test suites.)

> Unify v1 and v2 ALTER TABLE .. SET SERDE tests
> --
>
> Key: SPARK-34305
> URL: https://issues.apache.org/jira/browse/SPARK-34305
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Extract ALTER TABLE .. SET SERDE tests to a common place to run them for V1 
> and V2 datasources. Some tests can be placed in V1- and V2-specific test 
> suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework

2021-01-31 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34302:
---
Description: 
# Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
# Remove AlterTableAlterColumnStatement
# Remove the check verifyAlterTableType() from run()

  was:
# Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
# Remove AlterTableAlterColumnStatement


> Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
> 
>
> Key: SPARK-34302
> URL: https://issues.apache.org/jira/browse/SPARK-34302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN
> # Remove AlterTableAlterColumnStatement
> # Remove the check verifyAlterTableType() from run()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34303) Migrate ALTER TABLE ... SET LOCATION to new resolution framework

2021-01-31 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34303:
---
Description: 
# Create the Command logical node for ALTER TABLE ... SET LOCATION
# Remove AlterTableSetLocationStatement
# Remove the check verifyAlterTableType() from run()

  was:
# Create the Command logical node for ALTER TABLE ... SET LOCATION
# Remove AlterTableSetLocationStatement


> Migrate ALTER TABLE ... SET LOCATION to new resolution framework
> 
>
> Key: SPARK-34303
> URL: https://issues.apache.org/jira/browse/SPARK-34303
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> # Create the Command logical node for ALTER TABLE ... SET LOCATION
> # Remove AlterTableSetLocationStatement
> # Remove the check verifyAlterTableType() from run()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29391) Default year-month units

2021-01-31 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275933#comment-17275933
 ] 

Maxim Gekk commented on SPARK-29391:


For now, I am not sure that we are still going to be compatible with 
PostgreSQL's interval formats. cc [~cloud_fan] [~hyukjin.kwon] 

> Default year-month units
> 
>
> Key: SPARK-29391
> URL: https://issues.apache.org/jira/browse/SPARK-29391
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> PostgreSQL can assume default year-month units by default:
> {code}
> maxim=# SELECT interval '1-2'; 
>interval
> ---
>  1 year 2 mons
> {code}
> but the same produces NULL in Spark:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34161) Check re-caching of v2 table dependents after table altering

2021-01-31 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275958#comment-17275958
 ] 

Maxim Gekk commented on SPARK-34161:


This was resolved by https://github.com/apache/spark/pull/31250. [~cloud_fan], 
please close it.

> Check re-caching of v2 table dependents after table altering
> 
>
> Key: SPARK-34161
> URL: https://issues.apache.org/jira/browse/SPARK-34161
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Add tests to unified test suites and check that dependents of v2 tables are 
> still cached after table altering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`

2021-02-01 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34312:
--

 Summary: Support partition truncation by 
`SupportsPartitionManagement`
 Key: SPARK-34312
 URL: https://issues.apache.org/jira/browse/SPARK-34312
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


Add new method `purgePartition` in `SupportsPartitionManagement` and 
`purgePartitions` in `SupportsAtomicPartitionManagement`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`

2021-02-01 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34312:
---
Description: Add new method `truncatePartition` in 
`SupportsPartitionManagement` and `truncatePartitions` in 
`SupportsAtomicPartitionManagement`.  (was: Add new method `purgePartition` in 
`SupportsPartitionManagement` and `purgePartitions` in 
`SupportsAtomicPartitionManagement`.)

> Support partition truncation by `SupportsPartitionManagement`
> -
>
> Key: SPARK-34312
> URL: https://issues.apache.org/jira/browse/SPARK-34312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Add new method `truncatePartition` in `SupportsPartitionManagement` and 
> `truncatePartitions` in `SupportsAtomicPartitionManagement`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34314) Wrong discovered partition value

2021-02-01 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34314:
--

 Summary: Wrong discovered partition value
 Key: SPARK-34314
 URL: https://issues.apache.org/jira/browse/SPARK-34314
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The example below illustrates the issue:
{code:scala}
  val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
  df.write
.partitionBy("part")
.format("parquet")
.save(path)
  val readback = spark.read.parquet(path)
  readback.printSchema()
  readback.show(false)
{code}

It writes the partition values as strings:
{code}
/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
├── _SUCCESS
├── part=-0
│   └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
└── part=AA
└── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
{code}
*"-0"* and "AA".

but when Spark reads the data back, it transforms "-0" into "0":
{code}
root
 |-- id: integer (nullable = true)
 |-- part: string (nullable = true)

+---++
|id |part|
+---++
|0  |AA  |
|1  |0   |
+---++
{code}
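
A possible workaround sketch, assuming the snippet above in a spark-shell 
session: disabling partition column type inference makes Spark keep discovered 
values as raw strings.

{code:scala}
// With inference off, the "part" column keeps the literal directory
// value, so "-0" is preserved instead of being parsed and re-cast to "0".
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
spark.read.parquet(path).show(false)
{code}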




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34314) Wrong discovered partition value

2021-02-01 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34314:
---
Affects Version/s: 3.1.0
   3.0.2
   2.4.8

> Wrong discovered partition value
> 
>
> Key: SPARK-34314
> URL: https://issues.apache.org/jira/browse/SPARK-34314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The example below illustrates the issue:
> {code:scala}
>   val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
>   df.write
> .partitionBy("part")
> .format("parquet")
> .save(path)
>   val readback = spark.read.parquet(path)
>   readback.printSchema()
>   readback.show(false)
> {code}
> It writes the partition values as strings:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
> └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> *"-0"* and "AA".
> but when Spark reads the data back, it transforms "-0" into "0":
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
> +---++
> |id |part|
> +---++
> |0  |AA  |
> |1  |0   |
> +---++
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests

2021-02-02 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34332:
--

 Summary: Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
 Key: SPARK-34332
 URL: https://issues.apache.org/jira/browse/SPARK-34332
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


Extract ALTER TABLE .. SET SERDE tests to a common place to run them for V1 
and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests

2021-02-02 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34332:
---
Description: Extract ALTER TABLE .. SET LOCATION tests to a common place 
to run them for V1 and V2 datasources. Some tests can be placed in V1- and 
V2-specific test suites.  (was: Extract ALTER TABLE .. SET SERDE tests to a 
common place to run them for V1 and V2 datasources. Some tests can be placed in 
V1- and V2-specific test suites.)

> Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
> -
>
> Key: SPARK-34332
> URL: https://issues.apache.org/jira/browse/SPARK-34332
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for 
> V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test 
> suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34360) Support table truncation by v2 Table Catalogs

2021-02-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34360:
--

 Summary: Support table truncation by v2 Table Catalogs
 Key: SPARK-34360
 URL: https://issues.apache.org/jira/browse/SPARK-34360
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


Add new method `truncatePartition` in `SupportsPartitionManagement` and 
`truncatePartitions` in `SupportsAtomicPartitionManagement`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34360) Support table truncation by v2 Table Catalogs

2021-02-04 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34360:
---
Description: Add a new method `truncateTable` to the TableCatalog interface 
with a default implementation, and implement this method in 
InMemoryTableCatalog.  (was: Add new method `truncatePartition` in 
`SupportsPartitionManagement` and `truncatePartitions` in 
`SupportsAtomicPartitionManagement`.)

> Support table truncation by v2 Table Catalogs
> -
>
> Key: SPARK-34360
> URL: https://issues.apache.org/jira/browse/SPARK-34360
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Add a new method `truncateTable` to the TableCatalog interface with a default 
> implementation, and implement this method in InMemoryTableCatalog.
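
A hypothetical sketch of the proposed API shape (the real TableCatalog 
interface lives in Java; the trait and error message below are illustrative):

{code:scala}
import org.apache.spark.sql.connector.catalog.Identifier

// Sketch only: a default implementation that concrete catalogs such as
// InMemoryTableCatalog can override with an actual truncation.
trait TruncatableTableCatalog {
  def truncateTable(ident: Identifier): Boolean =
    throw new UnsupportedOperationException(
      s"Table ${ident.name} does not support truncation")
}
{code}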



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34371) Run datetime rebasing tests for parquet DSv1 and DSv2

2021-02-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34371:
--

 Summary: Run datetime rebasing tests for parquet DSv1 and DSv2
 Key: SPARK-34371
 URL: https://issues.apache.org/jira/browse/SPARK-34371
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Extract the datetime rebasing tests from ParquetIOSuite and place them in a 
separate test suite, to run them for both the DS v1 and v2 implementations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34377) Support parquet datasource options to control datetime rebasing in read

2021-02-05 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34377:
--

 Summary: Support parquet datasource options to control datetime 
rebasing in read
 Key: SPARK-34377
 URL: https://issues.apache.org/jira/browse/SPARK-34377
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Add new parquet options similar to the SQL configs 
{{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and 
{{spark.sql.legacy.parquet.int96RebaseModeInRead}}.
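
A hedged sketch of how such per-read options could look (the option names 
datetimeRebaseMode and int96RebaseMode match the ones listed in SPARK-34437 
below; the path is illustrative):

{code:scala}
// Per-read rebase modes, overriding the session-wide SQL configs.
val df = spark.read
  .option("datetimeRebaseMode", "CORRECTED")
  .option("int96RebaseMode", "CORRECTED")
  .parquet("/path/to/legacy-parquet")
{code}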



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34385) Unwrap SparkUpgradeException in v1 Parquet datasource

2021-02-06 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34385:
--

 Summary: Unwrap SparkUpgradeException in v1 Parquet datasource
 Key: SPARK-34385
 URL: https://issues.apache.org/jira/browse/SPARK-34385
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the cause 
of a SparkException.
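
A minimal sketch of the intent, assuming a spark-shell session (readNextRow is 
a hypothetical stand-in for the actual reader call):

{code:scala}
import org.apache.spark.{SparkException, SparkUpgradeException}

def readNextRow(): Unit = ???  // hypothetical placeholder

try {
  readNextRow()
} catch {
  // Keep SparkUpgradeException visible as the cause so the user still
  // sees the datetime rebase hint in the error chain.
  case e: SparkUpgradeException =>
    throw new SparkException("Reading the file failed", e)
}
{code}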



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34385) Unwrap SparkUpgradeException in v2 Parquet datasource

2021-02-06 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34385:
---
Summary: Unwrap SparkUpgradeException in v2 Parquet datasource  (was: 
Unwrap SparkUpgradeException in v1 Parquet datasource)

> Unwrap SparkUpgradeException in v2 Parquet datasource
> -
>
> Key: SPARK-34385
> URL: https://issues.apache.org/jira/browse/SPARK-34385
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the 
> cause of a SparkException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList

2021-02-06 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280268#comment-17280268
 ] 

Maxim Gekk commented on SPARK-34386:


[~bysza] Thanks for the ping. This is expected behavior, actually. The 
collectAsList() method converts internal timestamp values (in the Proleptic 
Gregorian calendar) to java.sql.Timestamp, which is based on the hybrid 
calendar (Julian + Gregorian). The timestamp from your example doesn't exist 
in the hybrid calendar, so Spark shifts it to the closest valid date, which is 
1582-10-15. If you want to receive timestamps AS IS from collectAsList(), 
please switch to Java 8 types via *spark.sql.datetime.java8API.enabled*.
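
For illustration, a spark-shell sketch of the suggested switch:

{code:scala}
// With Java 8 types, collect() returns java.time.Instant values, which
// can represent 1582-10-05 exactly instead of shifting it to 1582-10-15.
spark.conf.set("spark.sql.datetime.java8API.enabled", "true")
spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as ts").collect()
{code}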

> "Proleptic" date off by 10 days when returned by .collectAsList
> ---
>
> Key: SPARK-34386
> URL: https://issues.apache.org/jira/browse/SPARK-34386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: Windows 10
>Reporter: Marek Byszewski
>Priority: Major
>
> Run the following commands using Spark 3.0.1:
> {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as 
> data_console").show(false)}}
> {{+---+}}
> {{|data_console           |}}
> {{+---+}}
> {{|*1582-10-05 02:12:34.997*|}}
> {{+---+}}
> {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as 
> data_console")}}
> {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}}
> {{scala> res3.collectAsList}}
> {{res4: java.util.List[org.apache.spark.sql.Row] = 
> [[*1582-10-{color:#FF0000}15{color} 02:12:34.997*]]
> Notice that the returned date is off by 10 days compared to the date returned 
> by the first command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList

2021-02-06 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280269#comment-17280269
 ] 

Maxim Gekk commented on SPARK-34386:


[~bysza] You can find more details in the blog post: 
https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html

> "Proleptic" date off by 10 days when returned by .collectAsList
> ---
>
> Key: SPARK-34386
> URL: https://issues.apache.org/jira/browse/SPARK-34386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: Windows 10
>Reporter: Marek Byszewski
>Priority: Major
>
> Run the following commands using Spark 3.0.1:
> {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as 
> data_console").show(false)}}
> {{+---+}}
> {{|data_console           |}}
> {{+---+}}
> {{|*1582-10-05 02:12:34.997*|}}
> {{+---+}}
> {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as 
> data_console")}}
> {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}}
> {{scala> res3.collectAsList}}
> {{res4: java.util.List[org.apache.spark.sql.Row] = 
> [[*1582-10-{color:#FF0000}15{color} 02:12:34.997*]]
> Notice that the returned date is off by 10 days compared to the date returned 
> by the first command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34397) Support v2 `MSCK REPAIR TABLE`

2021-02-07 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34397:
--

 Summary: Support v2 `MSCK REPAIR TABLE`
 Key: SPARK-34397
 URL: https://issues.apache.org/jira/browse/SPARK-34397
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Implement the `MSCK REPAIR TABLE` command for tables from v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34401) Update public docs about altering cached tables/views

2021-02-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34401:
--

 Summary: Update public docs about altering cached tables/views
 Key: SPARK-34401
 URL: https://issues.apache.org/jira/browse/SPARK-34401
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read

2021-02-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34404:
--

 Summary: Support Avro datasource options to control datetime 
rebasing in read
 Key: SPARK-34404
 URL: https://issues.apache.org/jira/browse/SPARK-34404
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.2.0


Add new parquet options similar to the SQL configs 
{{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and 
{{spark.sql.legacy.parquet.int96RebaseModeInRead}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read

2021-02-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34404:
---
Description: Add a new Avro option similar to the SQL config 
{{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}.  (was: Add new 
parquet options similar to the SQL configs 
{{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and 
{{spark.sql.legacy.parquet.int96RebaseModeInRead}}.)

> Support Avro datasource options to control datetime rebasing in read
> 
>
> Key: SPARK-34404
> URL: https://issues.apache.org/jira/browse/SPARK-34404
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Add a new Avro option similar to the SQL config 
> {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0

2021-02-09 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281838#comment-17281838
 ] 

Maxim Gekk commented on SPARK-34392:


The "GMT+8:00" string is unsupported format in 3.0, see docs for the 
to_utc_timestamp() function:
{code:scala}
   * @param tz A string detailing the time zone ID that the input should be 
adjusted to. It should
   *   be in the format of either region-based zone IDs or zone 
offsets. Region IDs must
   *   have the form 'area/city', such as 'America/Los_Angeles'. Zone 
offsets must be in
   *   the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 
'UTC' and 'Z' are
   *   supported as aliases of '+00:00'. Other short names are not 
recommended to use
   *   because they can be ambiguous.
{code}
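
For illustration, both documented forms work in 3.x (a spark-shell sketch):

{code:scala}
// A zone offset in '(+|-)HH:mm' form, or a region-based 'area/city' ID.
spark.sql("""select to_utc_timestamp("2020-02-07 16:00:00", "+08:00")""").show()
spark.sql("""select to_utc_timestamp("2020-02-07 16:00:00", "Asia/Shanghai")""").show()
{code}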

> Invalid ID for offset-based ZoneId since Spark 3.0
> --
>
> Key: SPARK-34392
> URL: https://issues.apache.org/jira/browse/SPARK-34392
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> {code}
> Spark 2.4:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 2020-02-07 08:00:00
> Time taken: 0.089 seconds, Fetched 1 row(s)
> {noformat}
> Spark 3.x:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select 
> to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")]
> java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00
>   at java.time.ZoneId.ofWithPrefix(ZoneId.java:437)
>   at java.time.ZoneId.of(ZoneId.java:407)
>   at java.time.ZoneId.of(ZoneId.java:359)
>   at java.time.ZoneId.of(ZoneId.java:315)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0

2021-02-09 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281838#comment-17281838
 ] 

Maxim Gekk edited comment on SPARK-34392 at 2/9/21, 3:26 PM:
-

The "GMT+8:00" string is unsupported format in 3.0, see docs for the 
to_utc_timestamp() function 
(https://github.com/apache/spark/blob/30468a901577e82c855fbc4cb78e1b869facb44c/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3397-L3402):
{code:scala}
@param tz A string detailing the time zone ID that the input should be adjusted 
to. It should
  be in the format of either region-based zone IDs or zone offsets. Region IDs 
must
  have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must 
be in
  the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' 
are
  supported as aliases of '+00:00'. Other short names are not recommended to use
  because they can be ambiguous.
{code}


was (Author: maxgekk):
The "GMT+8:00" string is unsupported format in 3.0, see docs for the 
to_utc_timestamp() function:
{code:scala}
   * @param tz A string detailing the time zone ID that the input should be 
adjusted to. It should
   *   be in the format of either region-based zone IDs or zone 
offsets. Region IDs must
   *   have the form 'area/city', such as 'America/Los_Angeles'. Zone 
offsets must be in
   *   the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 
'UTC' and 'Z' are
   *   supported as aliases of '+00:00'. Other short names are not 
recommended to use
   *   because they can be ambiguous.
{code}

> Invalid ID for offset-based ZoneId since Spark 3.0
> --
>
> Key: SPARK-34392
> URL: https://issues.apache.org/jira/browse/SPARK-34392
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> {code}
> Spark 2.4:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 2020-02-07 08:00:00
> Time taken: 0.089 seconds, Fetched 1 row(s)
> {noformat}
> Spark 3.x:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select 
> to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")]
> java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00
>   at java.time.ZoneId.ofWithPrefix(ZoneId.java:437)
>   at java.time.ZoneId.of(ZoneId.java:407)
>   at java.time.ZoneId.of(ZoneId.java:359)
>   at java.time.ZoneId.of(ZoneId.java:315)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34418) Check v1 TRUNCATE TABLE preserves partitions

2021-02-10 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34418:
--

 Summary: Check v1 TRUNCATE TABLE preserves partitions
 Key: SPARK-34418
 URL: https://issues.apache.org/jira/browse/SPARK-34418
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Add a test which checks that TRUNCATE TABLE only removes rows and preserves 
existing partitions.
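
A minimal sketch of such a check, assuming a partitioned table like the ones in 
the related tickets:

{code:scala}
// After a full truncation the rows must be gone, but SHOW PARTITIONS
// should still list the previously created partitions.
spark.sql("TRUNCATE TABLE tbl")
assert(spark.table("tbl").count() == 0)
assert(spark.sql("SHOW PARTITIONS tbl").count() == 2)
{code}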



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676

2021-02-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34424:
--

 Summary: HiveOrcHadoopFsRelationSuite fails with seed 610710213676
 Key: SPARK-34424
 URL: https://issues.apache.org/jira/browse/SPARK-34424
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.2, 3.2.0, 3.1.1
Reporter: Maxim Gekk


The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with:
{code:java}
== Results ==
!== Correct Answer - 20 ==   == Spark Answer - 20 ==
 struct   struct
 [1,1582-10-15]   [1,1582-10-15]
 [2,null] [2,null]
 [3,1970-01-01]   [3,1970-01-01]
 [4,1681-08-06]   [4,1681-08-06]
 [5,1582-10-15]   [5,1582-10-15]
 [6,-12-31]   [6,-12-31]
 [7,0583-01-04]   [7,0583-01-04]
 [8,6077-03-04]   [8,6077-03-04]
![9,1582-10-06]   [9,1582-10-15]
 [10,1582-10-15]  [10,1582-10-15]
 [11,-12-31]  [11,-12-31]
 [12,9722-10-04]  [12,9722-10-04]
 [13,0243-12-19]  [13,0243-12-19]
 [14,-12-31]  [14,-12-31]
 [15,8743-01-24]  [15,8743-01-24]
 [16,1039-10-31]  [16,1039-10-31]
 [17,-12-31]  [17,-12-31]
 [18,1582-10-15]  [18,1582-10-15]
 [19,1582-10-15]  [19,1582-10-15]
 [20,1582-10-15]  [20,1582-10-15]
{code}
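
Note (not part of the original failure output): the only mismatched row, 1582-10-06 vs 1582-10-15, falls inside the Julian-to-Gregorian cutover gap, which points at date rebasing around the cutover as the likely cause. A quick way to see the gap of the legacy hybrid calendar:
{code:scala}
// Sketch: the default hybrid GregorianCalendar switches to Gregorian rules at
// 1582-10-15, so the dates 1582-10-05 through 1582-10-14 do not exist in it.
val cal = new java.util.GregorianCalendar()
println(cal.getGregorianChange)  // prints the default cutover date in October 1582
{code}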



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34431) Only load hive-site.xml once

2021-02-13 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34431:
--

 Summary: Only load hive-site.xml once
 Key: SPARK-34431
 URL: https://issues.apache.org/jira/browse/SPARK-34431
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Hive configs from hive-site.xml are parsed over and over again. We can optimize 
this and parse the file only once, as sketched below.
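
A minimal sketch of the idea, assuming simple memoization via a lazy val (the names are illustrative, not the actual Spark internals):
{code:scala}
import scala.collection.JavaConverters._

object HiveSiteCache {
  // Parse hive-site.xml from the classpath at most once; every later access
  // reuses the cached result instead of re-parsing the XML.
  lazy val hiveSiteProps: Map[String, String] = {
    val conf = new org.apache.hadoop.conf.Configuration(false)
    conf.addResource("hive-site.xml")
    conf.iterator().asScala.map(e => e.getKey -> e.getValue).toMap
  }
}
{code}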



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34434) Mention DS rebase options in SparkUpgradeException

2021-02-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34434:
--

 Summary: Mention DS rebase options in SparkUpgradeException 
 Key: SPARK-34434
 URL: https://issues.apache.org/jira/browse/SPARK-34434
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Mention the DS options added by SPARK-34404 and SPARK-34377 in 
SparkUpgradeException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34437) Update Spark SQL guide about rebase DS options and SQL configs

2021-02-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34437:
--

 Summary: Update Spark SQL guide about rebase DS options and SQL 
configs
 Key: SPARK-34437
 URL: https://issues.apache.org/jira/browse/SPARK-34437
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Describe the following SQL configs:
* spark.sql.legacy.parquet.int96RebaseModeInWrite
* spark.sql.legacy.parquet.datetimeRebaseModeInWrite
* spark.sql.legacy.parquet.int96RebaseModeInRead
* spark.sql.legacy.parquet.datetimeRebaseModeInRead
* spark.sql.legacy.avro.datetimeRebaseModeInWrite
* spark.sql.legacy.avro.datetimeRebaseModeInRead

Also describe the Avro/Parquet options datetimeRebaseMode and int96RebaseMode; see 
the usage sketch below.
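
A usage sketch of the configs and options to be documented (illustrative; EXCEPTION, CORRECTED, and LEGACY are the supported modes, and "/path/to/avro" is a placeholder):
{code:scala}
// Session-wide: write datetime values to Parquet using the legacy hybrid calendar.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")

// Per-read: the DS option overrides the session config for a single Avro read.
val df = spark.read.format("avro").option("datetimeRebaseMode", "CORRECTED").load("/path/to/avro")
{code}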



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34439) Recognize `spark_catalog` in new identifier while view/table renaming

2021-02-15 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34439:
--

 Summary: Recognize `spark_catalog` in new identifier while 
view/table renaming
 Key: SPARK-34439
 URL: https://issues.apache.org/jira/browse/SPARK-34439
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Currently, v1 ALTER TABLE .. RENAME TO doesn't recognize spark_catalog in new 
view/table identifiers. The example below demonstrates the issue:
{code:scala}
spark-sql> CREATE DATABASE db;
spark-sql> CREATE TABLE spark_catalog.db.tbl (c0 INT) USING parquet;
spark-sql> INSERT INTO spark_catalog.db.tbl SELECT 0;
spark-sql> SELECT * FROM spark_catalog.db.tbl;
0
spark-sql> ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2;
Error in query: spark_catalog.db.tbl2 is not a valid TableIdentifier as it has more than 2 name parts.
{code}
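
After the fix, the new identifier should resolve against the default session catalog, so the rename above is expected to succeed. A minimal check (illustrative):
{code:sql}
ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2;
-- expected: 0
SELECT * FROM spark_catalog.db.tbl2;
{code}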
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing

2021-02-15 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34440:
--

 Summary: Allow saving/loading datetime in ORC w/o rebasing
 Key: SPARK-34440
 URL: https://issues.apache.org/jira/browse/SPARK-34440
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.1.0


Currently, Spark always performs rebasing of INT96 columns in the Parquet 
datasource, but this is not required by the Parquet spec. This ticket aims to allow 
users to turn off rebasing via a SQL config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing

2021-02-15 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34440:
---
Fix Version/s: (was: 3.1.0)
   3.2.0

> Allow saving/loading datetime in ORC w/o rebasing
> -
>
> Key: SPARK-34440
> URL: https://issues.apache.org/jira/browse/SPARK-34440
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, Spark always performs rebasing of INT96 columns in the Parquet 
> datasource, but this is not required by the Parquet spec. This ticket aims to 
> allow users to turn off rebasing via a SQL config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing

2021-02-15 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34440:
---
Description: Currently, Spark always performs rebasing of date/timestamp 
columns in the ORC datasource, but this is not required by the ORC spec. This 
ticket aims to allow users to turn off rebasing via SQL configs or DS options. 
 (was: Currently, Spark always performs rebasing of INT96 columns in the Parquet 
datasource, but this is not required by the Parquet spec. This ticket aims to allow 
users to turn off rebasing via a SQL config.)
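
A usage sketch of the intended knobs (hypothetical names mirroring the Parquet/Avro analogues; the final config and option names may differ):
{code:scala}
// Hypothetical session-wide config, by analogy with the Parquet rebase configs.
spark.conf.set("spark.sql.legacy.orc.datetimeRebaseModeInRead", "CORRECTED")

// Hypothetical per-read DS option ("/path/to/orc" is a placeholder).
val df = spark.read.format("orc").option("datetimeRebaseMode", "LEGACY").load("/path/to/orc")
{code}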

> Allow saving/loading datetime in ORC w/o rebasing
> -
>
> Key: SPARK-34440
> URL: https://issues.apache.org/jira/browse/SPARK-34440
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, Spark always performs rebasing of date/timestamp columns in the ORC 
> datasource, but this is not required by the ORC spec. This ticket aims to allow 
> users to turn off rebasing via SQL configs or DS options.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


