[jira] [Resolved] (SPARK-24778) DateTimeUtils.getTimeZone method returns GMT time if timezone cannot be parsed
[ https://issues.apache.org/jira/browse/SPARK-24778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk resolved SPARK-24778.
--------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0

The issue has already been fixed by using ZoneId.of for parsing time zone ids.

> DateTimeUtils.getTimeZone method returns GMT time if timezone cannot be parsed
> ------------------------------------------------------------------------------
>
> Key: SPARK-24778
> URL: https://issues.apache.org/jira/browse/SPARK-24778
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Vinitha Reddy Gankidi
> Priority: Major
> Fix For: 3.0.0
>
> {{DateTimeUtils.getTimeZone}} calls Java's {{TimeZone.getTimeZone}} method,
> which defaults to GMT if the time zone cannot be parsed. This can be misleading
> to users, and it's better to return NULL instead of an incorrect value.
> To reproduce: {{from_utc_timestamp}} is one of the functions that calls
> {{DateTimeUtils.getTimeZone}}. The session time zone is GMT for the following
> queries.
> {code:java}
> SELECT from_utc_timestamp('2018-07-10 12:00:00', 'GMT+05:00') -> 2018-07-10 17:00:00
> SELECT from_utc_timestamp('2018-07-10 12:00:00', '+05:00') -> 2018-07-10 12:00:00 (defaults to GMT as the time zone is not recognized)
> {code}
> We could fix it by using the workaround mentioned here:
> https://bugs.openjdk.java.net/browse/JDK-4412864.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
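The silent fallback is easy to reproduce with the JDK alone. A minimal sketch contrasting the legacy java.util.TimeZone behavior with the java.time.ZoneId.of parsing mentioned in the resolution:

```java
import java.util.TimeZone;
import java.time.ZoneId;

// The legacy API silently falls back to GMT for ids it cannot parse:
// "+05:00" lacks the "GMT" prefix required by the custom-id syntax.
TimeZone legacy = TimeZone.getTimeZone("+05:00");
System.out.println(legacy.getID()); // GMT

// java.time parses the same id as a proper zone offset, and throws
// DateTimeException on genuinely malformed input instead of guessing.
ZoneId zone = ZoneId.of("+05:00");
System.out.println(zone); // +05:00
```

This is why switching the parsing to ZoneId.of fixes the misleading GMT result: malformed ids fail loudly rather than being replaced by a default.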
[jira] [Commented] (SPARK-26016) Encoding not working when using a map / mapPartitions call
[ https://issues.apache.org/jira/browse/SPARK-26016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782778#comment-16782778 ]

Maxim Gekk commented on SPARK-26016:
------------------------------------

> nothing reinterprets the bytes according to a different encoding?

Correct.

> The underlying Hadoop impl does interpret the bytes as UTF-8 (skipping of BOMs, etc) ...

Hadoop's LineReader does not decode input bytes. It just copies bytes between line delimiters in
https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L257
by using Text.append
(https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L335):

{code:java}
public void append(byte[] utf8, int start, int len) {
  setCapacity(length + len, true);
  System.arraycopy(utf8, start, bytes, length, len);
  length += len;
}
{code}

Spark actually never checks the correctness of UTF8String input. I even created a few tickets for that:
https://issues.apache.org/jira/browse/SPARK-23741
https://issues.apache.org/jira/browse/SPARK-23649

Most likely, adding such checks would cause some performance degradation.

> Encoding not working when using a map / mapPartitions call
> ----------------------------------------------------------
>
> Key: SPARK-26016
> URL: https://issues.apache.org/jira/browse/SPARK-26016
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.4.0
> Reporter: Chris Caspanello
> Priority: Major
> Attachments: spark-sandbox.zip
>
> Attached you will find a project with unit tests showing the issue at hand.
> If I read in an ISO-8859-1 encoded file and simply write out what was read,
> the contents of the part file match what was read, which is great. However,
> the second I use a map / mapPartitions function, it looks like the encoding
> is not correct. In addition, a simple collectAsList and writing that list of
> strings to a file does not work either. I don't think I'm doing anything
> wrong. Can someone please investigate? I think this is a bug.
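The mismatch is reproducible without Spark or Hadoop at all; a small sketch of why ISO-8859-1 bytes mislabelled as UTF-8 corrupt non-ASCII characters:

```java
import java.nio.charset.StandardCharsets;

// "é" in ISO-8859-1 is the single byte 0xE9, which is not a valid UTF-8 sequence
byte[] latin1Bytes = "é".getBytes(StandardCharsets.ISO_8859_1);
System.out.println(latin1Bytes.length); // 1

// Decoding those bytes as UTF-8 yields the U+FFFD replacement character
String decodedAsUtf8 = new String(latin1Bytes, StandardCharsets.UTF_8);
System.out.println(decodedAsUtf8.equals("\uFFFD")); // true

// Decoding with the original charset round-trips correctly
String decodedAsLatin1 = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
System.out.println(decodedAsLatin1.equals("é")); // true
```

Since LineReader only copies bytes, the corruption surfaces at whatever point the bytes are finally interpreted as UTF-8 strings.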
[jira] [Created] (SPARK-27057) Common trait for limit exec operators
Maxim Gekk created SPARK-27057:
-------------------------------

Summary: Common trait for limit exec operators
Key: SPARK-27057
URL: https://issues.apache.org/jira/browse/SPARK-27057
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

Currently, CollectLimitExec, LocalLimitExec and GlobalLimitExec share only the UnaryExecNode trait, which makes it slightly inconvenient to distinguish those operators from others. The ticket aims to introduce a new common trait for all three operators.
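As a standalone illustration of the idea (the class names below are made up and do not match Spark's actual operators), a shared supertype lets one type test pick out all limit operators at once:

```java
import java.util.List;
import java.util.ArrayList;

// Illustrative sketch only; the names do not match Spark's actual exec operators.
interface UnaryNode { String child(); }

// The proposed common trait: every limit operator exposes its limit uniformly.
interface LimitLike extends UnaryNode { int limit(); }

record LocalLimit(int limit, String child) implements LimitLike {}
record GlobalLimit(int limit, String child) implements LimitLike {}
record Project(String child) implements UnaryNode {}

List<UnaryNode> plan = List.of(new LocalLimit(10, "scan"), new Project("scan"), new GlobalLimit(5, "scan"));

// A single type test distinguishes limit operators from all others.
List<Integer> limits = new ArrayList<>();
for (UnaryNode op : plan) {
    if (op instanceof LimitLike l) limits.add(l.limit());
}
System.out.println(limits); // [10, 5]
```

Without the common supertype, the same check would need one `instanceof` per concrete operator class.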
[jira] [Created] (SPARK-27109) Refactoring of TimestampFormatter and DateFormatter
Maxim Gekk created SPARK-27109:
-------------------------------

Summary: Refactoring of TimestampFormatter and DateFormatter
Key: SPARK-27109
URL: https://issues.apache.org/jira/browse/SPARK-27109
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

* Date/TimestampFormatter converts parsed input to Instant before converting it to days/micros. This conversion is unnecessary because seconds and the fraction of a second can be extracted (calculated) from ZonedDateTime directly.
* Avoid the additional extraction of TemporalQueries.localTime from temporalAccessor.
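The first point can be shown with plain java.time: the epoch second and nanosecond fields are available on ZonedDateTime itself, so the intermediate Instant adds nothing. A sketch assuming microseconds-since-epoch as the target representation:

```java
import java.time.ZonedDateTime;
import java.time.ZoneOffset;
import java.time.Instant;

ZonedDateTime zdt = ZonedDateTime.of(2019, 3, 10, 12, 30, 45, 123456000, ZoneOffset.UTC);

// Round-about route: ZonedDateTime -> Instant -> microseconds
Instant instant = zdt.toInstant();
long viaInstant = instant.getEpochSecond() * 1_000_000L + instant.getNano() / 1_000;

// Direct route: the same fields are available on ZonedDateTime itself
long direct = zdt.toEpochSecond() * 1_000_000L + zdt.getNano() / 1_000;

System.out.println(viaInstant == direct); // true
```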
[jira] [Created] (SPARK-27199) Replace TimeZone by ZoneId in TimestampFormatter API
Maxim Gekk created SPARK-27199:
-------------------------------

Summary: Replace TimeZone by ZoneId in TimestampFormatter API
Key: SPARK-27199
URL: https://issues.apache.org/jira/browse/SPARK-27199
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

Internally, TimestampFormatter implementations use ZoneId, not the TimeZone that comes in via the API. Conversion from TimeZone to ZoneId is not free: the TimeZone is converted to a String, which is then parsed to a ZoneId. The conversion to String can be eliminated if TimestampFormatter accepts a ZoneId. Moreover, the TimeZone itself is converted from a String in some cases (JSON options), so in the worst case the chain is String -> TimeZone -> String -> ZoneId -> ZoneOffset. The ticket aims to use ZoneId in the TimestampFormatter API. We could require ZoneOffset instead, but that is inconvenient in most cases because converting a ZoneId to a ZoneOffset requires an Instant.
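The round trip is visible in the JDK itself: TimeZone.toZoneId goes back through the zone's string id, so passing a ZoneId through the API skips that step. A small demonstration:

```java
import java.util.TimeZone;
import java.time.ZoneId;

// Starting from a string option (as with JSON options): String -> TimeZone
TimeZone tz = TimeZone.getTimeZone("America/Los_Angeles");

// TimeZone.toZoneId re-parses the string id under the hood;
// for a region id it is equivalent to re-parsing tz.getID()
ZoneId viaString = ZoneId.of(tz.getID());
System.out.println(viaString.equals(tz.toZoneId())); // true

// Accepting ZoneId in the API avoids the TimeZone hop entirely
ZoneId direct = ZoneId.of("America/Los_Angeles");
System.out.println(direct.equals(viaString)); // true
```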
[jira] [Created] (SPARK-27212) Eliminate TimeZone to ZoneId conversion in stringToTimestamp
Maxim Gekk created SPARK-27212:
-------------------------------

Summary: Eliminate TimeZone to ZoneId conversion in stringToTimestamp
Key: SPARK-27212
URL: https://issues.apache.org/jira/browse/SPARK-27212
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The stringToTimestamp method of DateTimeUtils (and stringToDate as well) can be called per row, and the method converts TimeZone to ZoneId each time. The operation is relatively expensive because it does an intermediate conversion to a string: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/f940e7a48b72/src/share/classes/java/util/TimeZone.java#l547
The conversion is unnecessary and can be avoided. The ticket aims to change the signature of stringToTimestamp to take a ZoneId parameter.
[jira] [Created] (SPARK-27222) Support Instant and LocalDate in Literal.apply
Maxim Gekk created SPARK-27222:
-------------------------------

Summary: Support Instant and LocalDate in Literal.apply
Key: SPARK-27222
URL: https://issues.apache.org/jira/browse/SPARK-27222
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

SPARK-26902 and SPARK-27008 added support for java.time.Instant and java.time.LocalDate as external types for TimestampType and DateType. The ticket aims to support literals of those types. In particular, Literal.apply needs to be extended with new cases for java.time.Instant/LocalDate.
[jira] [Created] (SPARK-27242) Avoid using default time zone in formatting TIMESTAMP/DATE literals
Maxim Gekk created SPARK-27242:
-------------------------------

Summary: Avoid using default time zone in formatting TIMESTAMP/DATE literals
Key: SPARK-27242
URL: https://issues.apache.org/jira/browse/SPARK-27242
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

Spark calls the toString() methods of java.sql.Timestamp/java.sql.Date when formatting TIMESTAMP/DATE literals in Literal.sql: https://github.com/apache/spark/blob/0f4f8160e6d01d2e263adcf39d53bd0a03fc1b73/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L373-L374
This is inconsistent with parsing TIMESTAMP/DATE literals in AstBuilder: https://github.com/apache/spark/blob/a529be2930b1d69015f1ac8f85e590f197cf53cf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L1594-L1597
where *spark.sql.session.timeZone* is used in parsing TIMESTAMP literals, and DATE literals are parsed independently of time zone (actually in the UTC time zone). The ticket aims to make parsing and formatting of date/timestamp literals consistent, and to use the SQL config for TIMESTAMP literals.
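The inconsistency stems from java.sql.Timestamp.toString rendering the value in the JVM default time zone, which may differ from spark.sql.session.timeZone. A demonstration of that dependence:

```java
import java.sql.Timestamp;
import java.util.TimeZone;

TimeZone saved = TimeZone.getDefault();
Timestamp ts = new Timestamp(0L); // the epoch: 1970-01-01 00:00:00 UTC

TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
String inUtc = ts.toString();

TimeZone.setDefault(TimeZone.getTimeZone("GMT+05:00"));
String shifted = ts.toString();

TimeZone.setDefault(saved); // restore the JVM default

// The same instant renders differently depending on the JVM default zone
System.out.println(inUtc);   // 1970-01-01 00:00:00.0
System.out.println(shifted); // 1970-01-01 05:00:00.0
```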
[jira] [Created] (SPARK-27252) Make current_date() independent from time zones
Maxim Gekk created SPARK-27252:
-------------------------------

Summary: Make current_date() independent from time zones
Key: SPARK-27252
URL: https://issues.apache.org/jira/browse/SPARK-27252
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The CurrentDate expression produces a result of DateType, which is by definition the number of days since the epoch (in the UTC time zone). The current implementation shifts the number of days according to the session time zone `spark.sql.session.timeZone`. The result of the shift cannot be considered the number of days since the epoch in the UTC time zone, and cannot have type `DateType`. There are several reasons why the result is invalid. For example:
# The zone offset depends on an instant in the UTC time zone, and the zone offset of `spark.sql.session.timeZone` at the shifted date may have a different value.
# The result of the shift cannot be considered the number of days since the epoch anymore.
The ticket aims to make `current_date` independent from time zones, and to return the current date in the UTC time zone.
[jira] [Commented] (SPARK-26325) Interpret timestamp fields in Spark while reading json (timestampFormat)
[ https://issues.apache.org/jira/browse/SPARK-26325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801011#comment-16801011 ]

Maxim Gekk commented on SPARK-26325:
------------------------------------

Can you try Z instead of 'Z'?

> Interpret timestamp fields in Spark while reading json (timestampFormat)
> -------------------------------------------------------------------------
>
> Key: SPARK-26325
> URL: https://issues.apache.org/jira/browse/SPARK-26325
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Veenit Shah
> Priority: Major
>
> I am trying to read a pretty-printed json which has time fields in it. I want
> to interpret the timestamp columns as timestamp fields while reading the json
> itself. However, it still reads them as string when I {{printSchema}}.
> E.g. Input json file -
> {code:java}
> [{
>     "time_field" : "2017-09-30 04:53:39.412496Z"
> }]
> {code}
> Code -
> {code:java}
> df = spark.read.option("multiLine", "true") \
>     .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS'Z'") \
>     .json('path_to_json_file')
> {code}
> Output of df.printSchema() -
> {code:java}
> root
>  |-- time_field: string (nullable = true)
> {code}
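The suggestion matters because a quoted 'Z' is matched as a literal character and contributes no zone information, while an unquoted offset pattern letter parses it as the UTC designator. A java.time illustration of the difference (using X as the offset letter; Spark's own parser stack differs, but the pattern semantics are analogous):

```java
import java.time.format.DateTimeFormatter;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;

// With 'Z' quoted, the trailing Z is matched as a literal character and
// carries no zone information: the text parses as a plain local date-time.
DateTimeFormatter quoted = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS'Z'");
LocalDateTime local = LocalDateTime.parse("2017-09-30 04:53:39.412496Z", quoted);
System.out.println(local); // 2017-09-30T04:53:39.412496

// With the X pattern letter, the Z is parsed as the UTC offset designator.
DateTimeFormatter withOffset = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSX");
OffsetDateTime odt = OffsetDateTime.parse("2017-09-30 04:53:39.412496Z", withOffset);
System.out.println(odt.getOffset()); // Z
```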
[jira] [Created] (SPARK-27325) Support implicit encoders for LocalDate and Instant
Maxim Gekk created SPARK-27325:
-------------------------------

Summary: Support implicit encoders for LocalDate and Instant
Key: SPARK-27325
URL: https://issues.apache.org/jira/browse/SPARK-27325
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

Currently, Spark supports java.time.LocalDate and java.time.Instant as external types for DateType and TimestampType, but doesn't allow constructing datasets of the external types because there are no implicit encoders for them. The ticket aims to add such encoders.
[jira] [Created] (SPARK-27327) New JSON benchmarks: functions, dataset parsing
Maxim Gekk created SPARK-27327:
-------------------------------

Summary: New JSON benchmarks: functions, dataset parsing
Key: SPARK-27327
URL: https://issues.apache.org/jira/browse/SPARK-27327
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The existing JSONBenchmark doesn't contain benchmarks for:
# JSON functions like from_json
# Parsing Dataset[String]
The ticket aims to update JSONBenchmark and add the new benchmarks.
[jira] [Created] (SPARK-27344) Support the LocalDate and Instant classes in Java Bean encoders
Maxim Gekk created SPARK-27344:
-------------------------------

Summary: Support the LocalDate and Instant classes in Java Bean encoders
Key: SPARK-27344
URL: https://issues.apache.org/jira/browse/SPARK-27344
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

- Check that Java Bean encoders support java.time.LocalDate and java.time.Instant. Write a test for that.
- Update the comment: https://github.com/apache/spark/pull/24249/files#diff-3e88c21c9270fef6eaf6f0e64ed81f27R152
[jira] [Created] (SPARK-27357) Convert timestamps to/from dates independently from time zones
Maxim Gekk created SPARK-27357:
-------------------------------

Summary: Convert timestamps to/from dates independently from time zones
Key: SPARK-27357
URL: https://issues.apache.org/jira/browse/SPARK-27357
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

Both of Catalyst's types TIMESTAMP and DATE internally represent time intervals since the epoch in the UTC time zone. The TIMESTAMP type contains the number of microseconds since the epoch, and DATE is the number of days since the epoch (00:00:00 1 January 1970). As a consequence, the conversion between them should be independent from the session or local time zone. The ticket aims to fix the current behavior and make the conversion independent from time zones.
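Under this representation the timestamp-to-date conversion is pure integer arithmetic, with floor division so that instants before the epoch land on the correct day:

```java
// DATE = days since epoch, TIMESTAMP = microseconds since epoch, both in UTC
long MICROS_PER_DAY = 24L * 60 * 60 * 1_000_000; // 86_400_000_000

// Floor division so instants before the epoch land on the correct day
long day0 = Math.floorDiv(0L, MICROS_PER_DAY);
long dayBefore = Math.floorDiv(-1L, MICROS_PER_DAY);

System.out.println(day0);      // 0  (1970-01-01)
System.out.println(dayBefore); // -1 (one microsecond before the epoch is 1969-12-31)
```

No time zone appears anywhere in the calculation, which is the point of the ticket.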
[jira] [Updated] (SPARK-27357) Cast timestamps to/from dates independently from time zones
[ https://issues.apache.org/jira/browse/SPARK-27357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-27357:
-------------------------------
Summary: Cast timestamps to/from dates independently from time zones (was: Convert timestamps to/from dates independently from time zones)

> Cast timestamps to/from dates independently from time zones
> ------------------------------------------------------------
>
> Key: SPARK-27357
> URL: https://issues.apache.org/jira/browse/SPARK-27357
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Both Catalyst's types TIMESTAMP and DATE internally represent time intervals
> since the epoch in the UTC time zone. The TIMESTAMP type contains the number of
> microseconds since the epoch, and DATE is the number of days since the epoch
> (00:00:00 1 January 1970). As a consequence, the conversion should be
> independent from the session or local time zone. The ticket aims to fix the
> current behavior and make the conversion independent from time zones.
[jira] [Created] (SPARK-27398) Get rid of sun.nio.cs.StreamDecoder in CreateJacksonParser
Maxim Gekk created SPARK-27398:
-------------------------------

Summary: Get rid of sun.nio.cs.StreamDecoder in CreateJacksonParser
Key: SPARK-27398
URL: https://issues.apache.org/jira/browse/SPARK-27398
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The CreateJacksonParser.getStreamDecoder method creates an instance of ReadableByteChannel and returns the result as a sun.nio.cs.StreamDecoder. This is unnecessary and overcomplicates the method. The code can be replaced by:
{code:scala}
import java.io.{ByteArrayInputStream, InputStreamReader}

val bais = new ByteArrayInputStream(in, 0, length)
new InputStreamReader(bais, enc)
{code}
[jira] [Created] (SPARK-27401) Refactoring conversion of Date/Timestamp to/from java.sql.Date/Timestamp
Maxim Gekk created SPARK-27401:
-------------------------------

Summary: Refactoring conversion of Date/Timestamp to/from java.sql.Date/Timestamp
Key: SPARK-27401
URL: https://issues.apache.org/jira/browse/SPARK-27401
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The fromJavaTimestamp/toJavaTimestamp and toJavaDate/fromJavaDate methods can be implemented using existing DateTimeUtils methods like instantToMicros/microsToInstant and daysToLocalDate/localDateToDays. This should allow:
# Avoiding the invocation of millisToDays and the time zone offset calculation
# Simplifying the implementation of toJavaTimestamp, and properly handling negative inputs
# Detecting arithmetic overflow of Long
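The third point can be sketched with java.time alone: Math.multiplyExact/addExact raise ArithmeticException instead of silently wrapping. An illustrative instantToMicros (mirroring the idea, not Spark's exact code):

```java
import java.time.Instant;

// Sketch of an Instant -> microseconds conversion that detects Long overflow
// (illustrative; mirrors the idea rather than Spark's exact implementation)
long instantToMicros(Instant instant) {
    long micros = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
    return Math.addExact(micros, instant.getNano() / 1_000L);
}

System.out.println(instantToMicros(Instant.EPOCH)); // 0

// Instant.MAX does not fit into microseconds-since-epoch: the overflow is reported
boolean overflowDetected;
try {
    instantToMicros(Instant.MAX);
    overflowDetected = false;
} catch (ArithmeticException e) {
    overflowDetected = true;
}
System.out.println(overflowDetected); // true
```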
[jira] [Created] (SPARK-27405) Restrict the range of generated random timestamps
Maxim Gekk created SPARK-27405:
-------------------------------

Summary: Restrict the range of generated random timestamps
Key: SPARK-27405
URL: https://issues.apache.org/jira/browse/SPARK-27405
Project: Spark
Issue Type: Improvement
Components: Tests
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The timestampLiteralGen of LiteralGenerator can produce instances of java.sql.Timestamp that cause Long arithmetic overflow in the conversion of milliseconds to microseconds. The conversion is performed because Catalyst's Timestamp type internally stores microseconds since the epoch. The ticket aims to restrict the range of generated random timestamps to [Long.MinValue / 1000, Long.MaxValue / 1000].
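The proposed boundary is exact: any millisecond value above Long.MaxValue / 1000 overflows when scaled to microseconds. A quick check:

```java
// Largest millisecond value whose microsecond equivalent still fits in a Long
long maxSafeMillis = Long.MAX_VALUE / 1000;

boolean fits;
try { Math.multiplyExact(maxSafeMillis, 1000L); fits = true; }
catch (ArithmeticException e) { fits = false; }

boolean overflows;
try { Math.multiplyExact(maxSafeMillis + 1, 1000L); overflows = false; }
catch (ArithmeticException e) { overflows = true; }

System.out.println(fits);      // true
System.out.println(overflows); // true
```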
[jira] [Created] (SPARK-27422) CurrentDate should return local date
Maxim Gekk created SPARK-27422:
-------------------------------

Summary: CurrentDate should return local date
Key: SPARK-27422
URL: https://issues.apache.org/jira/browse/SPARK-27422
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

According to the SQL standard, the DATE type is a union of (year, month, day), and current date should return such a triple in the session's local time zone. The ticket aims to follow the requirement and calculate the local date for the session time zone. The local date should be converted to an epoch day and stored internally as a DATE value.
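The effect is visible near midnight: the same instant falls on different calendar days in different zones. A sketch using a fixed clock, returning the epoch day mentioned in the ticket:

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// A fixed instant late in the UTC day
Instant instant = Instant.parse("2019-04-10T23:30:00Z");

long utcDay = LocalDate.now(Clock.fixed(instant, ZoneId.of("UTC"))).toEpochDay();
long tokyoDay = LocalDate.now(Clock.fixed(instant, ZoneId.of("Asia/Tokyo"))).toEpochDay();

// In Tokyo (UTC+9) the calendar has already rolled over to 2019-04-11
System.out.println(tokyoDay - utcDay); // 1
```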
[jira] [Resolved] (SPARK-27357) Cast timestamps to/from dates independently from time zones
[ https://issues.apache.org/jira/browse/SPARK-27357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk resolved SPARK-27357.
--------------------------------
Resolution: Not A Problem

> Cast timestamps to/from dates independently from time zones
> ------------------------------------------------------------
>
> Key: SPARK-27357
> URL: https://issues.apache.org/jira/browse/SPARK-27357
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Both Catalyst's types TIMESTAMP and DATE internally represent time intervals
> since the epoch in the UTC time zone. The TIMESTAMP type contains the number of
> microseconds since the epoch, and DATE is the number of days since the epoch
> (00:00:00 1 January 1970). As a consequence, the conversion should be
> independent from the session or local time zone. The ticket aims to fix the
> current behavior and make the conversion independent from time zones.
[jira] [Created] (SPARK-27423) Cast DATE to/from TIMESTAMP according to SQL standard
Maxim Gekk created SPARK-27423:
-------------------------------

Summary: Cast DATE to/from TIMESTAMP according to SQL standard
Key: SPARK-27423
URL: https://issues.apache.org/jira/browse/SPARK-27423
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

According to the SQL standard, DATE is a union of (year, month, day). To convert it to Spark's TIMESTAMP, which is a TIMESTAMP WITH TIME ZONE, the date should be extended by the time at midnight - (year, month, day, hour = 0, minute = 0, second = 0). The resulting timestamp should be considered a timestamp in the session time zone, and transformed to microseconds since the epoch in UTC.
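The described cast can be sketched with java.time (illustrative only, not Spark's actual implementation):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Sketch of the described cast: extend a date with midnight in the session
// time zone, then convert the resulting instant to microseconds since the
// epoch in UTC (illustrative, not Spark's actual code)
long dateToMicros(LocalDate date, ZoneId sessionZone) {
    Instant instant = date.atStartOfDay(sessionZone).toInstant();
    return instant.getEpochSecond() * 1_000_000L + instant.getNano() / 1_000;
}

System.out.println(dateToMicros(LocalDate.of(1970, 1, 1), ZoneId.of("UTC")));    // 0
// Midnight 1970-01-01 at +05:00 is 1969-12-31 19:00:00 UTC
System.out.println(dateToMicros(LocalDate.of(1970, 1, 1), ZoneId.of("+05:00"))); // -18000000000
```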
[jira] [Created] (SPARK-27438) Increase precision of to_timestamp
Maxim Gekk created SPARK-27438:
-------------------------------

Summary: Increase precision of to_timestamp
Key: SPARK-27438
URL: https://issues.apache.org/jira/browse/SPARK-27438
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Maxim Gekk

The to_timestamp() function parses input strings only up to second precision, even if the specified pattern contains a second-fraction sub-pattern. The ticket aims to improve the precision of to_timestamp() up to microsecond precision.
[jira] [Created] (SPARK-27522) Test migration from INT96 to TIMESTAMP_MICROS in parquet
Maxim Gekk created SPARK-27522:
-------------------------------

Summary: Test migration from INT96 to TIMESTAMP_MICROS in parquet
Key: SPARK-27522
URL: https://issues.apache.org/jira/browse/SPARK-27522
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk

Write tests to check:
* Append timestamps of TIMESTAMP_MICROS to existing parquet files with INT96 timestamps
* Append timestamps of TIMESTAMP_MICROS to a table with INT96 timestamps
* Append INT96 timestamps to parquet files with TIMESTAMP_MICROS timestamps
* Append INT96 timestamps to a table with TIMESTAMP_MICROS timestamps
[jira] [Created] (SPARK-27527) Improve description of Timestamp and Date types
Maxim Gekk created SPARK-27527:
-------------------------------

Summary: Improve description of Timestamp and Date types
Key: SPARK-27527
URL: https://issues.apache.org/jira/browse/SPARK-27527
Project: Spark
Issue Type: Documentation
Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk

Describe precisely the semantics of TimestampType and DateType, and how they represent dates and timestamps internally.
[jira] [Created] (SPARK-27528) Use Parquet logical type TIMESTAMP_MICROS by default
Maxim Gekk created SPARK-27528:
-------------------------------

Summary: Use Parquet logical type TIMESTAMP_MICROS by default
Key: SPARK-27528
URL: https://issues.apache.org/jira/browse/SPARK-27528
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk

Currently, Spark uses the INT96 type for timestamps written to parquet files. To store Catalyst's Timestamp values as INT96, Spark converts microseconds since the epoch to nanoseconds in the Julian calendar. This conversion is not necessary if Spark saves timestamps using the Parquet TIMESTAMP_MICROS logical type. The ticket aims to switch the default type for writes from INT96 to TIMESTAMP_MICROS.
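The INT96 layout stores a Julian day plus nanoseconds within the day, which is exactly the conversion the ticket wants to avoid. A sketch of the general scheme (the constants are standard, but the code is illustrative, not Spark's):

```java
// INT96 parquet timestamps hold a Julian day plus nanoseconds within the day;
// converting from microseconds since the Unix epoch needs this extra arithmetic.
long JULIAN_DAY_OF_EPOCH = 2440588L; // Julian day number of 1970-01-01
long MICROS_PER_DAY = 86_400_000_000L;

long julianDay(long micros) {
    return JULIAN_DAY_OF_EPOCH + Math.floorDiv(micros, MICROS_PER_DAY);
}
long nanosInDay(long micros) {
    return Math.floorMod(micros, MICROS_PER_DAY) * 1_000L;
}

System.out.println(julianDay(0L));  // 2440588
System.out.println(nanosInDay(0L)); // 0
// TIMESTAMP_MICROS stores the microsecond value directly, with no conversion
```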
[jira] [Updated] (SPARK-27533) Date/timestamps CSV benchmarks
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-27533:
-------------------------------
Summary: Date/timestamps CSV benchmarks (was: CSV benchmarks date/timestamp ops)

> Date/timestamps CSV benchmarks
> ------------------------------
>
> Key: SPARK-27533
> URL: https://issues.apache.org/jira/browse/SPARK-27533
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 2.4.1
> Reporter: Maxim Gekk
> Priority: Minor
>
> Extend CSVBenchmark with new benchmarks:
> - Write dates/timestamps to files
> - Read/infer dates/timestamps from files
> - Read/infer dates/timestamps from Dataset[String]
> - to_csv/from_csv for dates/timestamps
[jira] [Created] (SPARK-27533) CSV benchmarks date/timestamp ops
Maxim Gekk created SPARK-27533:
-------------------------------

Summary: CSV benchmarks date/timestamp ops
Key: SPARK-27533
URL: https://issues.apache.org/jira/browse/SPARK-27533
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk

Extend CSVBenchmark with new benchmarks:
- Write dates/timestamps to files
- Read/infer dates/timestamps from files
- Read/infer dates/timestamps from Dataset[String]
- to_csv/from_csv for dates/timestamps
[jira] [Updated] (SPARK-27533) Date and timestamp CSV benchmarks
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-27533:
-------------------------------
Summary: Date and timestamp CSV benchmarks (was: Date/timestamps CSV benchmarks)

> Date and timestamp CSV benchmarks
> ---------------------------------
>
> Key: SPARK-27533
> URL: https://issues.apache.org/jira/browse/SPARK-27533
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 2.4.1
> Reporter: Maxim Gekk
> Priority: Minor
>
> Extend CSVBenchmark with new benchmarks:
> - Write dates/timestamps to files
> - Read/infer dates/timestamps from files
> - Read/infer dates/timestamps from Dataset[String]
> - to_csv/from_csv for dates/timestamps
[jira] [Created] (SPARK-27535) Date and timestamp JSON benchmarks
Maxim Gekk created SPARK-27535:
-------------------------------

Summary: Date and timestamp JSON benchmarks
Key: SPARK-27535
URL: https://issues.apache.org/jira/browse/SPARK-27535
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 2.4.1
Reporter: Maxim Gekk

Extend JSONBenchmark with new benchmarks:
* Write dates/timestamps to files
* Read/infer dates/timestamps from files
* Read/infer dates/timestamps from Dataset[String]
* to_json/from_json for dates/timestamps
[jira] [Commented] (SPARK-27450) Timestamp cast fails when the ISO8601 string omits minutes, seconds or milliseconds
[ https://issues.apache.org/jira/browse/SPARK-27450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833132#comment-16833132 ]

Maxim Gekk commented on SPARK-27450:
------------------------------------

The cast function supports a limited number of timestamp patterns, see https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L189-L208
Please use to_timestamp with a custom pattern like:
{code}
scala> val new_df3 = df3.withColumn("eventTimeTS", to_timestamp($"eventTimeString", "yyyy-MM-dd'T'HH:mmXXX"))
new_df3: org.apache.spark.sql.DataFrame = [eventTimeString: string, eventTimeTS: timestamp]

scala> new_df3.show(false)
+----------------------+-------------------+
|eventTimeString       |eventTimeTS        |
+----------------------+-------------------+
|2017-08-01T02:33-03:00|2017-08-01 07:33:00|
+----------------------+-------------------+
{code}

> Timestamp cast fails when the ISO8601 string omits minutes, seconds or
> milliseconds
> -----------------------------------------------------------------------
>
> Key: SPARK-27450
> URL: https://issues.apache.org/jira/browse/SPARK-27450
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: Spark 2.3.x
> Reporter: Leandro Rosa
> Priority: Major
>
> ISO 8601 allows one to omit minutes, seconds and milliseconds.
> {quote}
> hh:mm:ss.sss or hhmmss.sss
> hh:mm:ss or hhmmss
> hh:mm or hhmm
> hh
> {quote}
> {quote}Either the seconds, or the minutes and seconds, may be omitted from
> the basic or extended time formats for greater brevity but decreased
> accuracy: [hh]:[mm], [hh][mm] and [hh] are the resulting reduced accuracy
> time formats.
> {quote}
> Source: [Wikipedia ISO8601|https://en.wikipedia.org/wiki/ISO_8601]
> Popular libs, such as
> [ZonedDateTime|https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html],
> respect that. However, the Timestamp cast fails silently.
>
> {code:java}
> import org.apache.spark.sql.types._
> val df1 = Seq(("2017-08-01T02:33")).toDF("eventTimeString") // NON-ISO8601 (missing TZ offset) [OK]
> val new_df1 = df1.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
> new_df1.show(false)
> +----------------+-------------------+
> |eventTimeString |eventTimeTS        |
> +----------------+-------------------+
> |2017-08-01T02:33|2017-08-01 02:33:00|
> +----------------+-------------------+
> {code}
> {code:java}
> val df2 = Seq(("2017-08-01T02:33Z")).toDF("eventTimeString") // ISO8601 [FAIL]
> val new_df2 = df2.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
> new_df2.show(false)
> +-----------------+-----------+
> |eventTimeString  |eventTimeTS|
> +-----------------+-----------+
> |2017-08-01T02:33Z|null       |
> +-----------------+-----------+
> {code}
> {code:java}
> val df3 = Seq(("2017-08-01T02:33-03:00")).toDF("eventTimeString") // ISO8601 [FAIL]
> val new_df3 = df3.withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))
> new_df3.show(false)
> +----------------------+-----------+
> |eventTimeString       |eventTimeTS|
> +----------------------+-----------+
> |2017-08-01T02:33-03:00|null       |
> +----------------------+-----------+
> {code}
[jira] [Commented] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly
[ https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833949#comment-16833949 ] Maxim Gekk commented on SPARK-27638: [~srowen] The date literal should be cast to the date type by [stringToDate|https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L376] which is able to parse the date by default, see supported patterns:
{code}
`yyyy`
`yyyy-[m]m`
`yyyy-[m]m-[d]d`
`yyyy-[m]m-[d]d `
`yyyy-[m]m-[d]d *`
`yyyy-[m]m-[d]dT*`
{code}
> date format yyyy-M-dd string comparison not handled properly
> ------------------------------------------------------------
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.2
> Reporter: peng bo
> Priority: Major
>
> The below example works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and string comparison for partial date support. Please find more details in https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and MySQL:
> Hive: Cast to Date, partial date is not supported
> MySQL: Cast to Date, certain "partial date" is supported by defining certain date string parse rules. Check out {{str_to_datetime}} in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are two proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases won't be supported either.
> b. Cast the String value to Date; if it succeeds, use date.toString, otherwise the original string.
[jira] [Comment Edited] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly
[ https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833949#comment-16833949 ] Maxim Gekk edited comment on SPARK-27638 at 5/6/19 3:57 PM: [~srowen] The date literal should be cast to the date type by [stringToDate|https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L376] which is able to parse the date by default, see supported patterns:
{code}
`yyyy`
`yyyy-[m]m`
`yyyy-[m]m-[d]d`
`yyyy-[m]m-[d]d `
`yyyy-[m]m-[d]d *`
`yyyy-[m]m-[d]dT*`
{code}
was (Author: maxgekk): [~srowen] The date literal should be casted to the date type by [stringToDate|[https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L376]] that is able to parse the date by default, see supported patterns:
{code}
`yyyy`
`yyyy-[m]m`
`yyyy-[m]m-[d]d`
`yyyy-[m]m-[d]d `
`yyyy-[m]m-[d]d *`
`yyyy-[m]m-[d]dT*`
{code}
> date format yyyy-M-dd string comparison not handled properly
> ------------------------------------------------------------
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.2
> Reporter: peng bo
> Priority: Major
>
> The below example works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and string comparison for partial date support. Please find more details in https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and MySQL:
> Hive: Cast to Date, partial date is not supported
> MySQL: Cast to Date, certain "partial date" is supported by defining certain date string parse rules. Check out {{str_to_datetime}} in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are two proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases won't be supported either.
> b. Cast the String value to Date; if it succeeds, use date.toString, otherwise the original string.
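The default patterns listed above can be checked from spark-sql directly. These probe queries are illustrative, not from the comment; they rely only on the default stringToDate parsing used by the cast:
{code:sql}
-- Each of these should parse with the default patterns listed above,
-- with missing month/day components defaulting to 1.
spark-sql> SELECT CAST('2000' AS DATE);
spark-sql> SELECT CAST('2000-1' AS DATE);
spark-sql> SELECT CAST('2000-1-1' AS DATE);
{code}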
[jira] [Commented] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly
[ https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833957#comment-16833957 ] Maxim Gekk commented on SPARK-27638: It works with explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation <empty>, [d#51]

== Physical Plan ==
LocalTableScan <empty>, [d#51]
{code}
> date format yyyy-M-dd string comparison not handled properly
> ------------------------------------------------------------
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.2
> Reporter: peng bo
> Priority: Major
>
> The below example works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and string comparison for partial date support. Please find more details in https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and MySQL:
> Hive: Cast to Date, partial date is not supported
> MySQL: Cast to Date, certain "partial date" is supported by defining certain date string parse rules. Check out {{str_to_datetime}} in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are two proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases won't be supported either.
> b. Cast the String value to Date; if it succeeds, use date.toString, otherwise the original string.
[jira] [Comment Edited] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly
[ https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833957#comment-16833957 ] Maxim Gekk edited comment on SPARK-27638 at 5/6/19 4:10 PM: It works with explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation <empty>, [d#51]

== Physical Plan ==
LocalTableScan <empty>, [d#51]
{code}
The same happens for '2000-01-01': the date column is cast to string.
was (Author: maxgekk): It works with explicit to_date:
{code:scala}
scala> val ds = spark.range(1).selectExpr("date '2000-01-01' as d")
ds: org.apache.spark.sql.DataFrame = [d: date]

scala> ds.where("d >= to_date('2000-1-1')").show
+----------+
|         d|
+----------+
|2000-01-01|
+----------+
{code}
but without to_date, it compares strings:
{code}
scala> ds.where("d >= '2000-1-1'").explain(true)
== Parsed Logical Plan ==
'Filter ('d >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
d: date
Filter (cast(d#51 as string) >= 2000-1-1)
+- Project [10957 AS d#51]
   +- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
LocalRelation <empty>, [d#51]

== Physical Plan ==
LocalTableScan <empty>, [d#51]
{code}
> date format yyyy-M-dd string comparison not handled properly
> ------------------------------------------------------------
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.2
> Reporter: peng bo
> Priority: Major
>
> The below example works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and string comparison for partial date support. Please find more details in https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and MySQL:
> Hive: Cast to Date, partial date is not supported
> MySQL: Cast to Date, certain "partial date" is supported by defining certain date string parse rules. Check out {{str_to_datetime}} in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are two proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases won't be supported either.
> b. Cast the String value to Date; if it succeeds, use date.toString, otherwise the original string.
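Until the comparison rule changes, a query can sidestep the string comparison shown above by making the right-hand side a date explicitly, so neither side is cast back to string. A sketch under that assumption, reusing the dataset from the comment:
{code:scala}
// A typed date literal keeps the comparison in the date domain:
scala> ds.where("d >= date '2000-01-01'").show
// ... as does an explicit to_date() on the string side:
scala> ds.where("d >= to_date('2000-1-1')").show
{code}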
[jira] [Commented] (SPARK-27638) date format yyyy-M-dd string comparison not handled properly
[ https://issues.apache.org/jira/browse/SPARK-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834214#comment-16834214 ] Maxim Gekk commented on SPARK-27638: [~pengbo] Are you going to propose a PR for that? If not, I can fix the issue.
> date format yyyy-M-dd string comparison not handled properly
> ------------------------------------------------------------
>
> Key: SPARK-27638
> URL: https://issues.apache.org/jira/browse/SPARK-27638
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.2
> Reporter: peng bo
> Priority: Major
>
> The below example works with both MySQL and Hive, however not with Spark.
> {code:java}
> mysql> select * from date_test where date_col >= '2000-1-1';
> +------------+
> | date_col   |
> +------------+
> | 2000-01-01 |
> +------------+
> {code}
> The reason is that Spark casts both sides to String type during date and string comparison for partial date support. Please find more details in https://issues.apache.org/jira/browse/SPARK-8420.
> Based on some tests, the behavior of Date and String comparison in Hive and MySQL:
> Hive: Cast to Date, partial date is not supported
> MySQL: Cast to Date, certain "partial date" is supported by defining certain date string parse rules. Check out {{str_to_datetime}} in https://github.com/mysql/mysql-server/blob/5.5/sql-common/my_time.c
> Here are two proposals:
> a. Follow the MySQL parse rule, but some partial date string comparison cases won't be supported either.
> b. Cast the String value to Date; if it succeeds, use date.toString, otherwise the original string.
[jira] [Created] (SPARK-34138) Keep dependants cached while refreshing v1 tables
Maxim Gekk created SPARK-34138: -- Summary: Keep dependants cached while refreshing v1 tables Key: SPARK-34138 URL: https://issues.apache.org/jira/browse/SPARK-34138 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Keeping dependants cached while refreshing v1 tables should improve the user experience with table/view caching. For example, imagine that a user has cached a v1 table and a cached view based on that table, and then passes the table to an external library which drops/renames/adds partitions in the v1 table. Unfortunately, the view becomes uncached after that even though the user hasn't uncached it explicitly.
[jira] [Created] (SPARK-34143) Adding partitions to fully partitioned v2 table
Maxim Gekk created SPARK-34143: -- Summary: Adding partitions to fully partitioned v2 table Key: SPARK-34143 URL: https://issues.apache.org/jira/browse/SPARK-34143 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The test below fails:
{code:scala}
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY (p0, p1)")
  sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
  checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
  checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
}
{code}
[jira] [Created] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
Maxim Gekk created SPARK-34149: -- Summary: DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache Key: SPARK-34149 URL: https://issues.apache.org/jira/browse/SPARK-34149 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk
[jira] [Updated] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
[ https://issues.apache.org/jira/browse/SPARK-34149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34149: --- Description: For example, the test below:
{code:scala}
test("SPARK-X: refresh cache in partition adding") {
  withNamespaceAndTable("ns", "tbl") { t =>
    sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)")
    sql(s"ALTER TABLE $t ADD PARTITION (part=0)")
    assert(!spark.catalog.isCached(t))
    sql(s"CACHE TABLE $t")
    assert(spark.catalog.isCached(t))
    checkAnswer(sql(s"SELECT * FROM $t"), Row(0))
    sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
    assert(spark.catalog.isCached(t))
    checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
  }
}
{code}
fails with:
{code}
!== Correct Answer - 2 ==   == Spark Answer - 1 ==
!struct<>                   struct
 [0]                        [0]
![1]
ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at (QueryTest.scala:243)
{code}
because the command doesn't refresh the cache.
> DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
> -----------------------------------------------------------------
>
> Key: SPARK-34149
> URL: https://issues.apache.org/jira/browse/SPARK-34149
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Maxim Gekk
> Priority: Major
>
> For example, the test below:
> {code:scala}
> test("SPARK-X: refresh cache in partition adding") {
>   withNamespaceAndTable("ns", "tbl") { t =>
>     sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)")
>     sql(s"ALTER TABLE $t ADD PARTITION (part=0)")
>     assert(!spark.catalog.isCached(t))
>     sql(s"CACHE TABLE $t")
>     assert(spark.catalog.isCached(t))
>     checkAnswer(sql(s"SELECT * FROM $t"), Row(0))
>     sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
>     assert(spark.catalog.isCached(t))
>     checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
>   }
> }
> {code}
> fails with:
> {code}
> !== Correct Answer - 2 ==   == Spark Answer - 1 ==
> !struct<>                   struct
>  [0]                        [0]
> ![1]
>
> ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at (QueryTest.scala:243)
> {code}
> because the command doesn't refresh the cache.
[jira] [Created] (SPARK-34153) Remove unused `getRawTable()` from `HiveExternalCatalog.alterPartitions()`
Maxim Gekk created SPARK-34153: -- Summary: Remove unused `getRawTable()` from `HiveExternalCatalog.alterPartitions()` Key: SPARK-34153 URL: https://issues.apache.org/jira/browse/SPARK-34153 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The getRawTable() call at https://github.com/apache/spark/blob/157b72ac9fa0057d5fd6d7ed52a6c4b22ebd1dfc/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L1148 can be removed.
[jira] [Created] (SPARK-34161) Check re-caching of v2 table dependents after table altering
Maxim Gekk created SPARK-34161: -- Summary: Check re-caching of v2 table dependents after table altering Key: SPARK-34161 URL: https://issues.apache.org/jira/browse/SPARK-34161 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add tests to unified tests and check that dependants of v2 table are still cached after table altering.
[jira] [Updated] (SPARK-34161) Check re-caching of v2 table dependents after table altering
[ https://issues.apache.org/jira/browse/SPARK-34161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34161: --- Description: Add tests to unified test suites and check that dependants of v2 table are still cached after table altering. (was: Add tests to unified tests and check that dependants of v2 table are still cached after table altering.) > Check re-caching of v2 table dependents after table altering > > > Key: SPARK-34161 > URL: https://issues.apache.org/jira/browse/SPARK-34161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Add tests to unified test suites and check that dependants of v2 table are > still cached after table altering.
[jira] [Created] (SPARK-34197) refreshTable() should not invalidate the relation cache for temporary views
Maxim Gekk created SPARK-34197: -- Summary: refreshTable() should not invalidate the relation cache for temporary views Key: SPARK-34197 URL: https://issues.apache.org/jira/browse/SPARK-34197 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The SessionCatalog.refreshTable() should not invalidate the entry in the relation cache for a table when refreshTable() refreshes a temp view.
[jira] [Created] (SPARK-34207) Rename `isTemporaryTable` to `isTempView` in `SessionCatalog`
Maxim Gekk created SPARK-34207: -- Summary: Rename `isTemporaryTable` to `isTempView` in `SessionCatalog` Key: SPARK-34207 URL: https://issues.apache.org/jira/browse/SPARK-34207 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Currently, there are two methods that do the same thing but have different names:
{code:java}
def isTempView(nameParts: Seq[String]): Boolean
{code}
and
{code:java}
def isTemporaryTable(name: TableIdentifier): Boolean
{code}
It would be nice to rename `SessionCatalog.isTemporaryTable()` to `SessionCatalog.isTempView()`.
[jira] [Updated] (SPARK-34207) Rename `isTemporaryTable` to `isTempView` in `SessionCatalog`
[ https://issues.apache.org/jira/browse/SPARK-34207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34207: --- Priority: Trivial (was: Major) > Rename `isTemporaryTable` to `isTempView` in `SessionCatalog` > - > > Key: SPARK-34207 > URL: https://issues.apache.org/jira/browse/SPARK-34207 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Trivial > > Currently, there are two methods that do the same but have different names: > {code:java} > def isTempView(nameParts: Seq[String]): Boolean > {code} > and > {code:java} > def isTemporaryTable(name: TableIdentifier): Boolean > {code} > It would be nice to rename `SessionCatalog.isTemporaryTable()` to > `SessionCatalog.isTempView()`.
[jira] [Commented] (SPARK-34213) LOAD DATA doesn't refresh v1 table cache
[ https://issues.apache.org/jira/browse/SPARK-34213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270722#comment-17270722 ] Maxim Gekk commented on SPARK-34213: I am working on the issue.
> LOAD DATA doesn't refresh v1 table cache
> ----------------------------------------
>
> Key: SPARK-34213
> URL: https://issues.apache.org/jira/browse/SPARK-34213
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.2, 3.2.0, 3.1.1
> Reporter: Maxim Gekk
> Priority: Major
>
> The example below portrays the issue:
> 1. Create a source table:
> {code:sql}
> spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
> spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
> spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
> default src_tbl false Partition Values: [part=0]
> Location: file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
> ...
> {code}
> 2. Load data from the source table to a cached destination table:
> {code:sql}
> spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
> spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
> spark-sql> CACHE TABLE dst_tbl;
> spark-sql> SELECT * FROM dst_tbl;
> 1 1
> spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0);
> spark-sql> SELECT * FROM dst_tbl;
> 1 1
> {code}
> The last query does not show recently loaded data from the source table.
[jira] [Created] (SPARK-34213) LOAD DATA doesn't refresh v1 table cache
Maxim Gekk created SPARK-34213: -- Summary: LOAD DATA doesn't refresh v1 table cache Key: SPARK-34213 URL: https://issues.apache.org/jira/browse/SPARK-34213 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.2, 3.2.0, 3.1.1 Reporter: Maxim Gekk The example below portrays the issue: 1. Create a source table:
{code:sql}
spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
default src_tbl false Partition Values: [part=0]
Location: file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
...
{code}
2. Load data from the source table to a cached destination table:
{code:sql}
spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
spark-sql> CACHE TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
1 1
spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0);
spark-sql> SELECT * FROM dst_tbl;
1 1
{code}
The last query does not show recently loaded data from the source table.
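Until the command refreshes the cache itself, an explicit REFRESH TABLE after LOAD DATA works around the stale result. This is a manual workaround sketch, not the proposed fix:
{code:sql}
spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0);
-- Invalidate the cached entry so the next scan re-reads the new files:
spark-sql> REFRESH TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
{code}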
[jira] [Created] (SPARK-34215) Keep table cached after truncation
Maxim Gekk created SPARK-34215: -- Summary: Keep table cached after truncation Key: SPARK-34215 URL: https://issues.apache.org/jira/browse/SPARK-34215 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Currently, TRUNCATE TABLE uncaches the table. It should keep the table cached to be consistent with other commands.
[jira] [Commented] (SPARK-34251) TRUNCATE TABLE resets stats for non-empty v1 table
[ https://issues.apache.org/jira/browse/SPARK-34251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272231#comment-17272231 ] Maxim Gekk commented on SPARK-34251: I am working on a bug fix.
> TRUNCATE TABLE resets stats for non-empty v1 table
> --------------------------------------------------
>
> Key: SPARK-34251
> URL: https://issues.apache.org/jira/browse/SPARK-34251
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.2, 3.2.0, 3.1.1
> Reporter: Maxim Gekk
> Priority: Major
>
> The example below portrays the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl (c0 int, part int) PARTITIONED BY (part);
> spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
> spark-sql> INSERT INTO tbl PARTITION (part=1) SELECT 1;
> spark-sql> ANALYZE TABLE tbl COMPUTE STATISTICS;
> spark-sql> DESCRIBE TABLE EXTENDED tbl;
> ...
> Statistics 4 bytes, 2 rows
> ...
> {code}
> Let's truncate one partition:
> {code:sql}
> spark-sql> TRUNCATE TABLE tbl PARTITION (part=1);
> spark-sql> DESCRIBE TABLE EXTENDED tbl;
> ...
> Statistics 0 bytes, 0 rows
> ...
> spark-sql> SELECT * FROM tbl;
> 0 0
> {code}
> *The last query returns a row but stats show 0 rows.*
[jira] [Created] (SPARK-34251) TRUNCATE TABLE resets stats for non-empty v1 table
Maxim Gekk created SPARK-34251: -- Summary: TRUNCATE TABLE resets stats for non-empty v1 table Key: SPARK-34251 URL: https://issues.apache.org/jira/browse/SPARK-34251 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.2, 3.2.0, 3.1.1 Reporter: Maxim Gekk The example below portrays the issue:
{code:sql}
spark-sql> CREATE TABLE tbl (c0 int, part int) PARTITIONED BY (part);
spark-sql> INSERT INTO tbl PARTITION (part=0) SELECT 0;
spark-sql> INSERT INTO tbl PARTITION (part=1) SELECT 1;
spark-sql> ANALYZE TABLE tbl COMPUTE STATISTICS;
spark-sql> DESCRIBE TABLE EXTENDED tbl;
...
Statistics 4 bytes, 2 rows
...
{code}
Let's truncate one partition:
{code:sql}
spark-sql> TRUNCATE TABLE tbl PARTITION (part=1);
spark-sql> DESCRIBE TABLE EXTENDED tbl;
...
Statistics 0 bytes, 0 rows
...
spark-sql> SELECT * FROM tbl;
0 0
{code}
*The last query returns a row but stats show 0 rows.*
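Until the fix lands, re-running ANALYZE after the partial truncation brings the stats back in sync with the remaining rows. This is a workaround sketch, not the proposed fix:
{code:sql}
spark-sql> TRUNCATE TABLE tbl PARTITION (part=1);
-- Recompute table-level stats over the rows that are left:
spark-sql> ANALYZE TABLE tbl COMPUTE STATISTICS;
spark-sql> DESCRIBE TABLE EXTENDED tbl;
{code}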
[jira] [Created] (SPARK-34262) ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache
Maxim Gekk created SPARK-34262: -- Summary: ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache Key: SPARK-34262 URL: https://issues.apache.org/jira/browse/SPARK-34262 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.2, 3.2.0, 3.1.1 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.0.2, 3.1.1 The example below portrays the issue: 1. Create a source table:
{code:sql}
spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0;
spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0);
default src_tbl false Partition Values: [part=0]
Location: file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0
...
{code}
2. Load data from the source table to a cached destination table:
{code:sql}
spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part);
spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1;
spark-sql> CACHE TABLE dst_tbl;
spark-sql> SELECT * FROM dst_tbl;
1 1
spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0);
spark-sql> SELECT * FROM dst_tbl;
1 1
{code}
The last query does not show recently loaded data from the source table.
[jira] [Updated] (SPARK-34262) ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache
[ https://issues.apache.org/jira/browse/SPARK-34262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34262: --- Description: The example below portraits the issue: 1. Create a source table: {code:sql} spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part); spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0; spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0); default src_tbl false Partition Values: [part=0] Location: file:/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0 ... {code} 2. Load data from the source table to a cached destination table: {code:sql} spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part); spark-sql> ALTER TABLE dst_tbl ADD PARTITION (part=0); spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1; spark-sql> CACHE TABLE dst_tbl; spark-sql> SELECT * FROM dst_tbl; 1 1 spark-sql> ALTER TABLE dst_tbl PARTITION (part=0) SET LOCATION '/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0'; spark-sql> SELECT * FROM dst_tbl; 1 1 {code} The last query does not show recently loaded data from the source table. was: The example below portraits the issue: 1. Create a source table: {code:sql} spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY (part); spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0; spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0); default src_tbl false Partition Values: [part=0] Location: file:/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0 ... {code} 2. 
Load data from the source table to a cached destination table: {code:sql} spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY (part); spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1; spark-sql> CACHE TABLE dst_tbl; spark-sql> SELECT * FROM dst_tbl; 1 1 spark-sql> LOAD DATA LOCAL INPATH '/Users/maximgekk/proj/load-data-refresh-cache/spark-warehouse/src_tbl/part=0' INTO TABLE dst_tbl PARTITION (part=0); spark-sql> SELECT * FROM dst_tbl; 1 1 {code} The last query does not show recently loaded data from the source table. > ALTER TABLE .. SET LOCATION doesn't refresh v1 table cache > -- > > Key: SPARK-34262 > URL: https://issues.apache.org/jira/browse/SPARK-34262 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Labels: correctness > Fix For: 3.0.2, 3.1.1 > > > The example below portraits the issue: > 1. Create a source table: > {code:sql} > spark-sql> CREATE TABLE src_tbl (c0 int, part int) USING hive PARTITIONED BY > (part); > spark-sql> INSERT INTO src_tbl PARTITION (part=0) SELECT 0; > spark-sql> SHOW TABLE EXTENDED LIKE 'src_tbl' PARTITION (part=0); > default src_tbl false Partition Values: [part=0] > Location: > file:/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0 > ... > {code} > 2. 
Load data from the source table to a cached destination table: > {code:sql} > spark-sql> CREATE TABLE dst_tbl (c0 int, part int) USING hive PARTITIONED BY > (part); > spark-sql> ALTER TABLE dst_tbl ADD PARTITION (part=0); > spark-sql> INSERT INTO dst_tbl PARTITION (part=1) SELECT 1; > spark-sql> CACHE TABLE dst_tbl; > spark-sql> SELECT * FROM dst_tbl; > 1 1 > spark-sql> ALTER TABLE dst_tbl PARTITION (part=0) SET LOCATION > '/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0'; > spark-sql> SELECT * FROM dst_tbl; > 1 1 > {code} > The last query does not show recently loaded data from the source table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
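On affected versions (before the fix), a possible manual workaround is to refresh the cached table explicitly after changing the partition location. This sketch continues the session from the description and assumes {{REFRESH TABLE}} invalidates and reloads the cache entry, which is its documented purpose:

```sql
-- Hypothetical continuation of the reproduction session above:
ALTER TABLE dst_tbl PARTITION (part=0)
  SET LOCATION '/Users/maximgekk/proj/refresh-cache-set-location/spark-warehouse/src_tbl/part=0';
REFRESH TABLE dst_tbl;   -- drop the stale cache entry; it is lazily re-cached
SELECT * FROM dst_tbl;   -- should now also return the row from part=0
```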
[jira] [Created] (SPARK-34266) Update comments for `SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`
Maxim Gekk created SPARK-34266: -- Summary: Update comments for `SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()` Key: SPARK-34266 URL: https://issues.apache.org/jira/browse/SPARK-34266 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34267) Remove `refreshTable()` from `SessionState`
Maxim Gekk created SPARK-34267: -- Summary: Remove `refreshTable()` from `SessionState` Key: SPARK-34267 URL: https://issues.apache.org/jira/browse/SPARK-34267 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34267) Remove `refreshTable()` from `SessionState`
[ https://issues.apache.org/jira/browse/SPARK-34267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34267: --- Description: Spark already has `SessionCatalog.refreshTable` and `CatalogImpl.refreshTable`. One more method in `SessionState` might confuse users. > Remove `refreshTable()` from `SessionState` > --- > > Key: SPARK-34267 > URL: https://issues.apache.org/jira/browse/SPARK-34267 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Spark already has `SessionCatalog.refreshTable` and > `CatalogImpl.refreshTable`. One more method in `SessionState` might confuse > users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34282) Unify v1 and v2 TRUNCATE TABLE tests
Maxim Gekk created SPARK-34282: -- Summary: Unify v1 and v2 TRUNCATE TABLE tests Key: SPARK-34282 URL: https://issues.apache.org/jira/browse/SPARK-34282 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Extract ALTER TABLE .. RECOVER PARTITIONS tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34282) Unify v1 and v2 TRUNCATE TABLE tests
[ https://issues.apache.org/jira/browse/SPARK-34282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34282: --- Description: Extract TRUNCATE TABLE tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites. (was: Extract ALTER TABLE .. RECOVER PARTITIONS tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.) > Unify v1 and v2 TRUNCATE TABLE tests > > > Key: SPARK-34282 > URL: https://issues.apache.org/jira/browse/SPARK-34282 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Extract TRUNCATE TABLE tests to a common place to run them for v1 and v2 > datasources. Some tests can be placed in v1- and v2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34290) Support v2 TRUNCATE TABLE
Maxim Gekk created SPARK-34290: -- Summary: Support v2 TRUNCATE TABLE Key: SPARK-34290 URL: https://issues.apache.org/jira/browse/SPARK-34290 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Need to implement TRUNCATE TABLE for DSv2 tables similarly to the v1 implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34290) Support v2 TRUNCATE TABLE
[ https://issues.apache.org/jira/browse/SPARK-34290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274247#comment-17274247 ] Maxim Gekk commented on SPARK-34290: I am working on this. > Support v2 TRUNCATE TABLE > - > > Key: SPARK-34290 > URL: https://issues.apache.org/jira/browse/SPARK-34290 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Need to implement TRUNCATE TABLE for DSv2 tables similarly to the v1 > implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34301) Use logical plan of alter table in `CatalogImpl.recoverPartitions()`
Maxim Gekk created SPARK-34301: -- Summary: Use logical plan of alter table in `CatalogImpl.recoverPartitions()` Key: SPARK-34301 URL: https://issues.apache.org/jira/browse/SPARK-34301 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The logical node will allow us to: 1. Print a nicer error message 2. Avoid being bound to v1 tables. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
Maxim Gekk created SPARK-34302: -- Summary: Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework Key: SPARK-34302 URL: https://issues.apache.org/jira/browse/SPARK-34302 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 # Create the Command logical node for SHOW TABLE EXTENDED # Remove ShowTableStatement -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34302: --- Description: # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN # Remove AlterTableAlterColumnStatement was: # Create the Command logical node for SHOW TABLE EXTENDED # Remove ShowTableStatement > Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework > > > Key: SPARK-34302 > URL: https://issues.apache.org/jira/browse/SPARK-34302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN > # Remove AlterTableAlterColumnStatement -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34303) Migrate ALTER TABLE ... SET LOCATION to new resolution framework
Maxim Gekk created SPARK-34303: -- Summary: Migrate ALTER TABLE ... SET LOCATION to new resolution framework Key: SPARK-34303 URL: https://issues.apache.org/jira/browse/SPARK-34303 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN # Remove AlterTableAlterColumnStatement -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34303) Migrate ALTER TABLE ... SET LOCATION to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34303: --- Description: # Create the Command logical node for ALTER TABLE ... SET LOCATION # Remove AlterTableSetLocationStatement was: # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN # Remove AlterTableAlterColumnStatement > Migrate ALTER TABLE ... SET LOCATION to new resolution framework > > > Key: SPARK-34303 > URL: https://issues.apache.org/jira/browse/SPARK-34303 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE ... SET LOCATION > # Remove AlterTableSetLocationStatement -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34304) Remove view checks in v1 alter table commands
Maxim Gekk created SPARK-34304: -- Summary: Remove view checks in v1 alter table commands Key: SPARK-34304 URL: https://issues.apache.org/jira/browse/SPARK-34304 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk After migration on new resolution framework in SPARK-29900, view checks are not needed in the commands: - AlterTableAddPartitionCommand - AlterTableDropPartitionCommand - AlterTableRenamePartitionCommand - AlterTableRecoverPartitionsCommand - AlterTableSerDePropertiesCommand So, the checks DDLUtils.verifyAlterTableType can be removed from the v1 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34305) Unify v1 and v2 ALTER TABLE .. SET SERDE tests
Maxim Gekk created SPARK-34305: -- Summary: Unify v1 and v2 ALTER TABLE .. SET SERDE tests Key: SPARK-34305 URL: https://issues.apache.org/jira/browse/SPARK-34305 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Extract TRUNCATE TABLE tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34305) Unify v1 and v2 ALTER TABLE .. SET SERDE tests
[ https://issues.apache.org/jira/browse/SPARK-34305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34305: --- Description: Extract ALTER TABLE .. SET SERDE tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites. (was: Extract TRUNCATE TABLE tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.) > Unify v1 and v2 ALTER TABLE .. SET SERDE tests > -- > > Key: SPARK-34305 > URL: https://issues.apache.org/jira/browse/SPARK-34305 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Extract ALTER TABLE .. SET SERDE tests to a common place to run them for v1 > and v2 datasources. Some tests can be placed in v1- and v2-specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34302) Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34302: --- Description: # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN # Remove AlterTableAlterColumnStatement # Remove the check verifyAlterTableType() from run() was: # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN # Remove AlterTableAlterColumnStatement > Migrate ALTER TABLE .. CHANGE COLUMN to new resolution framework > > > Key: SPARK-34302 > URL: https://issues.apache.org/jira/browse/SPARK-34302 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE .. CHANGE COLUMN > # Remove AlterTableAlterColumnStatement > # Remove the check verifyAlterTableType() from run() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34303) Migrate ALTER TABLE ... SET LOCATION to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-34303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34303: --- Description: # Create the Command logical node for ALTER TABLE ... SET LOCATION # Remove AlterTableSetLocationStatement # Remove the check verifyAlterTableType() from run() was: # Create the Command logical node for ALTER TABLE ... SET LOCATION # Remove AlterTableSetLocationStatement > Migrate ALTER TABLE ... SET LOCATION to new resolution framework > > > Key: SPARK-34303 > URL: https://issues.apache.org/jira/browse/SPARK-34303 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > # Create the Command logical node for ALTER TABLE ... SET LOCATION > # Remove AlterTableSetLocationStatement > # Remove the check verifyAlterTableType() from run() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29391) Default year-month units
[ https://issues.apache.org/jira/browse/SPARK-29391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275933#comment-17275933 ] Maxim Gekk commented on SPARK-29391: For now, I am not sure that we are still going to be compatible with PostgreSQL in interval formats. cc [~cloud_fan] [~hyukjin.kwon] > Default year-month units > > > Key: SPARK-29391 > URL: https://issues.apache.org/jira/browse/SPARK-29391 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > PostgreSQL can assume default year-month units by defaults: > {code} > maxim=# SELECT interval '1-2'; >interval > --- > 1 year 2 mons > {code} > but the same produces NULL in Spark: -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34161) Check re-caching of v2 table dependents after table altering
[ https://issues.apache.org/jira/browse/SPARK-34161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275958#comment-17275958 ] Maxim Gekk commented on SPARK-34161: This was resolved by https://github.com/apache/spark/pull/31250, [~cloud_fan] please close it. > Check re-caching of v2 table dependents after table altering > > > Key: SPARK-34161 > URL: https://issues.apache.org/jira/browse/SPARK-34161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Add tests to unified test suites and check that dependents of a v2 table are > still cached after table altering. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`
Maxim Gekk created SPARK-34312: -- Summary: Support partition truncation by `SupportsPartitionManagement` Key: SPARK-34312 URL: https://issues.apache.org/jira/browse/SPARK-34312 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add new method `purgePartition` in `SupportsPartitionManagement` and `purgePartitions` in `SupportsAtomicPartitionManagement`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34312) Support partition truncation by `SupportsPartitionManagement`
[ https://issues.apache.org/jira/browse/SPARK-34312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34312: --- Description: Add new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`. (was: Add new method `purgePartition` in `SupportsPartitionManagement` and `purgePartitions` in `SupportsAtomicPartitionManagement`.) > Support partition truncation by `SupportsPartitionManagement` > - > > Key: SPARK-34312 > URL: https://issues.apache.org/jira/browse/SPARK-34312 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add new method `truncatePartition` in `SupportsPartitionManagement` and > `truncatePartitions` in `SupportsAtomicPartitionManagement`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
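The shape of the proposed API can be sketched outside Spark. The interface and signatures below are illustrative stand-ins, not Spark's actual `SupportsPartitionManagement`; the point is the contract: truncation clears a partition's rows but keeps the partition itself registered, unlike dropping it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TruncateDemo {
    // Illustrative stand-in for the partition-management mixin.
    interface PartitionManagement {
        // Proposed method: remove all rows of a partition, keep its metadata.
        default boolean truncatePartition(String ident) {
            throw new UnsupportedOperationException("truncatePartition");
        }
    }

    // Toy in-memory table, in the spirit of InMemoryTableCatalog test tables.
    static class InMemoryTable implements PartitionManagement {
        final Map<String, List<Integer>> partitions = new HashMap<>();

        @Override
        public boolean truncatePartition(String ident) {
            List<Integer> rows = partitions.get(ident);
            if (rows == null) return false; // unknown partition
            rows.clear();                   // drop rows, keep the partition entry
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryTable t = new InMemoryTable();
        t.partitions.put("part=0", new ArrayList<>(List.of(0, 1, 2)));
        System.out.println(t.truncatePartition("part=0")); // true
        System.out.println(t.partitions.containsKey("part=0")); // true: metadata kept
        System.out.println(t.partitions.get("part=0").size()); // 0: rows gone
    }
}
```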
[jira] [Created] (SPARK-34314) Wrong discovered partition value
Maxim Gekk created SPARK-34314: -- Summary: Wrong discovered partition value Key: SPARK-34314 URL: https://issues.apache.org/jira/browse/SPARK-34314 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The example below portrays the issue: {code:scala} val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") df.write .partitionBy("part") .format("parquet") .save(path) val readback = spark.read.parquet(path) readback.printSchema() readback.show(false) {code} It writes the partition value as a string: {code} /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d ├── _SUCCESS ├── part=-0 │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet └── part=AA └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet {code} *"-0"* and "AA". But when Spark reads the data back, it transforms "-0" to "0": {code} root |-- id: integer (nullable = true) |-- part: string (nullable = true) +---++ |id |part| +---++ |0 |AA | |1 |0 | +---++ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
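The loss of "-0" is consistent with partition-value type inference: at read time the discovered string is tentatively parsed as a number, and "-0" round-trips through that parse as plain 0 before being rendered back as a string. A minimal standalone mimic of such a try-cast chain (an illustration of the mechanism, not Spark's actual inference code):

```java
public class PartitionInferenceDemo {
    // Illustrative stand-in for partition-value type inference:
    // try numeric parses in order, fall back to the raw string.
    static Object infer(String raw) {
        try { return Long.parseLong(raw); } catch (NumberFormatException ignored) { }
        try { return Double.parseDouble(raw); } catch (NumberFormatException ignored) { }
        return raw;
    }

    public static void main(String[] args) {
        // "-0" is a valid long literal equal to 0, so the original spelling is lost:
        System.out.println(infer("-0"));  // prints 0
        // non-numeric values keep their exact string form:
        System.out.println(infer("AA"));  // prints AA
    }
}
```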
[jira] [Updated] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34314: --- Affects Version/s: 3.1.0 3.0.2 2.4.8 > Wrong discovered partition value > > > Key: SPARK-34314 > URL: https://issues.apache.org/jira/browse/SPARK-34314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The example below portrays the issue: > {code:scala} > val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part") > df.write > .partitionBy("part") > .format("parquet") > .save(path) > val readback = spark.read.parquet(path) > readback.printSchema() > readback.show(false) > {code} > It writes the partition value as a string: > {code} > /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tcgn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d > ├── _SUCCESS > ├── part=-0 > │ └── part-1-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > └── part=AA > └── part-0-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet > {code} > *"-0"* and "AA". > But when Spark reads the data back, it transforms "-0" to "0": > {code} > root > |-- id: integer (nullable = true) > |-- part: string (nullable = true) > +---++ > |id |part| > +---++ > |0 |AA | > |1 |0 | > +---++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
Maxim Gekk created SPARK-34332: -- Summary: Unify v1 and v2 ALTER TABLE .. SET LOCATION tests Key: SPARK-34332 URL: https://issues.apache.org/jira/browse/SPARK-34332 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Extract ALTER TABLE .. SET SERDE tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34332) Unify v1 and v2 ALTER TABLE .. SET LOCATION tests
[ https://issues.apache.org/jira/browse/SPARK-34332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34332: --- Description: Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites. (was: Extract ALTER TABLE .. SET SERDE tests to a common place to run them for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.) > Unify v1 and v2 ALTER TABLE .. SET LOCATION tests > - > > Key: SPARK-34332 > URL: https://issues.apache.org/jira/browse/SPARK-34332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Extract ALTER TABLE .. SET LOCATION tests to a common place to run them for > v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34360) Support table truncation by v2 Table Catalogs
Maxim Gekk created SPARK-34360: -- Summary: Support table truncation by v2 Table Catalogs Key: SPARK-34360 URL: https://issues.apache.org/jira/browse/SPARK-34360 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34360) Support table truncation by v2 Table Catalogs
[ https://issues.apache.org/jira/browse/SPARK-34360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34360: --- Description: Add new method `truncateTable` to the TableCatalog interface with default implementation. And implement this method in InMemoryTableCatalog. (was: Add new method `truncatePartition` in `SupportsPartitionManagement` and `truncatePartitions` in `SupportsAtomicPartitionManagement`.) > Support table truncation by v2 Table Catalogs > - > > Key: SPARK-34360 > URL: https://issues.apache.org/jira/browse/SPARK-34360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add new method `truncateTable` to the TableCatalog interface with default > implementation. And implement this method in InMemoryTableCatalog. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34371) Run datetime rebasing tests for parquet DSv1 and DSv2
Maxim Gekk created SPARK-34371: -- Summary: Run datetime rebasing tests for parquet DSv1 and DSv2 Key: SPARK-34371 URL: https://issues.apache.org/jira/browse/SPARK-34371 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Extract datetime rebasing tests from ParquetIOSuite and place them in a separate test suite to run them for both the DSv1 and DSv2 implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34377) Support parquet datasource options to control datetime rebasing in read
Maxim Gekk created SPARK-34377: -- Summary: Support parquet datasource options to control datetime rebasing in read Key: SPARK-34377 URL: https://issues.apache.org/jira/browse/SPARK-34377 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34385) Unwrap SparkUpgradeException in v1 Parquet datasource
Maxim Gekk created SPARK-34385: -- Summary: Unwrap SparkUpgradeException in v1 Parquet datasource Key: SPARK-34385 URL: https://issues.apache.org/jira/browse/SPARK-34385 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the cause of a SparkException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34385) Unwrap SparkUpgradeException in v2 Parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-34385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34385: --- Summary: Unwrap SparkUpgradeException in v2 Parquet datasource (was: Unwrap SparkUpgradeException in v1 Parquet datasource) > Unwrap SparkUpgradeException in v2 Parquet datasource > - > > Key: SPARK-34385 > URL: https://issues.apache.org/jira/browse/SPARK-34385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Unwrap SparkUpgradeException in FilePartitionReader, and throw it as the > cause of a SparkException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList
[ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280268#comment-17280268 ] Maxim Gekk commented on SPARK-34386: [~bysza] Thanks for the ping. This is expected behavior, actually. The collectAsList() method converts internal timestamp values (in the Proleptic Gregorian calendar) to java.sql.Timestamp, which is based on the hybrid calendar (Julian + Gregorian calendars). The timestamp from your example doesn't exist in the hybrid calendar, so Spark shifts it to the closest valid date, which is 1582-10-15. If you want to receive timestamps AS IS from collectAsList(), please switch to Java 8 types via *spark.sql.datetime.java8API.enabled*. > "Proleptic" date off by 10 days when returned by .collectAsList > --- > > Key: SPARK-34386 > URL: https://issues.apache.org/jira/browse/SPARK-34386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: Windows 10 >Reporter: Marek Byszewski >Priority: Major > > Run the following commands using Spark 3.0.1: > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console").show(false)}} > {{+---+}} > {{|data_console |}} > {{+---+}} > {{|*1582-10-05 02:12:34.997*|}} > {{+---+}} > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console")}} > {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}} > {{scala> res3.collectAsList}} > {{res4: java.util.List[org.apache.spark.sql.Row] = > [[*1582-10-{color:#FF}15{color} 02:12:34.997*]]}} > Notice that the returned date is off by 10 days compared to the date returned > by the first command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
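The 10-day shift described in the comment above can be reproduced without Spark: java.util.GregorianCalendar (the basis of java.sql.Timestamp) implements the hybrid Julian/Gregorian calendar, in which the dates 1582-10-05 through 1582-10-14 do not exist, while java.time uses the proleptic Gregorian calendar. A small standalone sketch:

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class HybridCalendarDemo {
    public static void main(String[] args) {
        // Hybrid calendar: Oct 5-14, 1582 were skipped at the Gregorian cutover,
        // so a (lenient, by default) GregorianCalendar normalizes
        // 1582-10-05 forward by 10 days to 1582-10-15.
        GregorianCalendar hybrid = new GregorianCalendar(1582, Calendar.OCTOBER, 5);
        System.out.println(hybrid.get(Calendar.DAY_OF_MONTH)); // prints 15

        // Proleptic Gregorian calendar (java.time, and Spark 3.x internally):
        // the same date is perfectly valid.
        System.out.println(LocalDate.of(1582, 10, 5)); // prints 1582-10-05
    }
}
```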
[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList
[ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280269#comment-17280269 ] Maxim Gekk commented on SPARK-34386: [~bysza] You can find more details in the blog post: https://databricks.com/blog/2020/07/22/a-comprehensive-look-at-dates-and-timestamps-in-apache-spark-3-0.html > "Proleptic" date off by 10 days when returned by .collectAsList > --- > > Key: SPARK-34386 > URL: https://issues.apache.org/jira/browse/SPARK-34386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: Windows 10 >Reporter: Marek Byszewski >Priority: Major > > Run the following commands using Spark 3.0.1: > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console").show(false)}} > {{+---+}} > {{|data_console |}} > {{+---+}} > {{|*1582-10-05 02:12:34.997*|}} > {{+---+}} > {{scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as > data_console")}} > {{res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]}} > {{scala> res3.collectAsList}} > {{res4: java.util.List[org.apache.spark.sql.Row] = > [[*1582-10-{color:#FF}15{color} 02:12:34.997*]]}} > Notice that the returned date is off by 10 days compared to the date returned > by the first command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34397) Support v2 `MSCK REPAIR TABLE`
Maxim Gekk created SPARK-34397: -- Summary: Support v2 `MSCK REPAIR TABLE` Key: SPARK-34397 URL: https://issues.apache.org/jira/browse/SPARK-34397 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Implement the `MSCK REPAIR TABLE` command for tables from v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34401) Update public docs about altering cached tables/views
Maxim Gekk created SPARK-34401: -- Summary: Update public docs about altering cached tables/views Key: SPARK-34401 URL: https://issues.apache.org/jira/browse/SPARK-34401 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read
Maxim Gekk created SPARK-34404: -- Summary: Support Avro datasource options to control datetime rebasing in read Key: SPARK-34404 URL: https://issues.apache.org/jira/browse/SPARK-34404 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.2.0 Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead.}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34404) Support Avro datasource options to control datetime rebasing in read
[ https://issues.apache.org/jira/browse/SPARK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34404: --- Description: Add new Avro option similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}{{.}} (was: Add new parquet options similar to the SQL configs {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}} and {{spark.sql.legacy.parquet.int96RebaseModeInRead.}}) > Support Avro datasource options to control datetime rebasing in read > > > Key: SPARK-34404 > URL: https://issues.apache.org/jira/browse/SPARK-34404 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add new Avro option similar to the SQL configs > {{spark.sql.legacy.parquet.datetimeRebaseModeInRead}}{{.}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281838#comment-17281838 ] Maxim Gekk commented on SPARK-34392:

The "GMT+8:00" string is an unsupported format in 3.0; see the docs for the to_utc_timestamp() function:
{code:scala}
 * @param tz A string detailing the time zone ID that the input should be adjusted to. It should
 *           be in the format of either region-based zone IDs or zone offsets. Region IDs must
 *           have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
 *           the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
 *           supported as aliases of '+00:00'. Other short names are not recommended to use
 *           because they can be ambiguous.
{code}

> Invalid ID for offset-based ZoneId since Spark 3.0
>
> Key: SPARK-34392
> URL: https://issues.apache.org/jira/browse/SPARK-34392
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.0.1
> Reporter: Yuming Wang
> Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> {code}
> Spark 2.4:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 2020-02-07 08:00:00
> Time taken: 0.089 seconds, Fetched 1 row(s)
> {noformat}
> Spark 3.x:
> {noformat}
> spark-sql> select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00");
> 21/02/07 01:24:32 ERROR SparkSQLDriver: Failed in [select to_utc_timestamp("2020-02-07 16:00:00", "GMT+8:00")]
> java.time.DateTimeException: Invalid ID for offset-based ZoneId: GMT+8:00
>   at java.time.ZoneId.ofWithPrefix(ZoneId.java:437)
>   at java.time.ZoneId.of(ZoneId.java:407)
>   at java.time.ZoneId.of(ZoneId.java:359)
>   at java.time.ZoneId.of(ZoneId.java:315)
>   at org.apache.spark.sql.catalyst.util.DateTimeUtils$.getZoneId(DateTimeUtils.scala:53)
>   at org.apache.spark.sql.catalyst.util.DateTimeUtils$.toUTCTime(DateTimeUtils.scala:814)
> {noformat}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
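The accepted zone-ID shapes from the scaladoc above can be illustrated with a small validator. This is a simplified Python sketch mirroring the documented formats only; `is_valid_spark_tz` is a hypothetical helper, not a Spark API, and the real java.time parser additionally accepts some prefixed forms such as "GMT+08:00" (with a two-digit hour):

```python
import re

# Offsets must be '(+|-)HH:mm' (two-digit hour and minute);
# 'UTC' and 'Z' are aliases of '+00:00';
# region IDs have the form 'area/city'.
OFFSET = re.compile(r"^[+-]\d{2}:\d{2}$")
REGION = re.compile(r"^[A-Za-z_]+/[A-Za-z_]+")

def is_valid_spark_tz(tz: str) -> bool:
    return tz in ("UTC", "Z") or bool(OFFSET.match(tz)) or bool(REGION.match(tz))

assert is_valid_spark_tz("+08:00")
assert is_valid_spark_tz("America/Los_Angeles")
# "GMT+8:00" matches none of the documented shapes, hence the DateTimeException in 3.x:
assert not is_valid_spark_tz("GMT+8:00")
```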
[jira] [Comment Edited] (SPARK-34392) Invalid ID for offset-based ZoneId since Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-34392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281838#comment-17281838 ] Maxim Gekk edited comment on SPARK-34392 at 2/9/21, 3:26 PM:

The "GMT+8:00" string is an unsupported format in 3.0; see the docs for the to_utc_timestamp() function (https://github.com/apache/spark/blob/30468a901577e82c855fbc4cb78e1b869facb44c/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3397-L3402):
{code:scala}
@param tz A string detailing the time zone ID that the input should be adjusted to. It should
          be in the format of either region-based zone IDs or zone offsets. Region IDs must
          have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
          the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
          supported as aliases of '+00:00'. Other short names are not recommended to use
          because they can be ambiguous.
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (SPARK-34418) Check v1 TRUNCATE TABLE preserves partitions
Maxim Gekk created SPARK-34418: -- Summary: Check v1 TRUNCATE TABLE preserves partitions Key: SPARK-34418 URL: https://issues.apache.org/jira/browse/SPARK-34418 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Add a test which checks TRUNCATE TABLE only removes rows and preserves existing partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
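The semantics this test should pin down can be modeled in a few lines. The following is a toy in-memory model (plain Python, not Spark code): a partitioned table is a mapping from partition spec to rows, and TRUNCATE TABLE empties the rows while keeping every partition entry:

```python
# Toy model of a partitioned table: partition spec -> list of rows.
table = {("part=0",): [1, 2], ("part=1",): [3]}

def truncate(t: dict) -> dict:
    # TRUNCATE TABLE removes all rows but preserves the existing partitions.
    return {spec: [] for spec in t}

truncated = truncate(table)
assert set(truncated) == set(table)                     # partitions preserved
assert all(rows == [] for rows in truncated.values())   # rows removed
```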
[jira] [Created] (SPARK-34424) HiveOrcHadoopFsRelationSuite fails with seed 610710213676
Maxim Gekk created SPARK-34424:

Summary: HiveOrcHadoopFsRelationSuite fails with seed 610710213676
Key: SPARK-34424
URL: https://issues.apache.org/jira/browse/SPARK-34424
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 3.0.2, 3.2.0, 3.1.1
Reporter: Maxim Gekk

The test "test all data types" in HiveOrcHadoopFsRelationSuite fails with:
{code:java}
== Results ==
!== Correct Answer - 20 ==   == Spark Answer - 20 ==
 struct                       struct
 [1,1582-10-15]               [1,1582-10-15]
 [2,null]                     [2,null]
 [3,1970-01-01]               [3,1970-01-01]
 [4,1681-08-06]               [4,1681-08-06]
 [5,1582-10-15]               [5,1582-10-15]
 [6,-12-31]                   [6,-12-31]
 [7,0583-01-04]               [7,0583-01-04]
 [8,6077-03-04]               [8,6077-03-04]
![9,1582-10-06]               [9,1582-10-15]
 [10,1582-10-15]              [10,1582-10-15]
 [11,-12-31]                  [11,-12-31]
 [12,9722-10-04]              [12,9722-10-04]
 [13,0243-12-19]              [13,0243-12-19]
 [14,-12-31]                  [14,-12-31]
 [15,8743-01-24]              [15,8743-01-24]
 [16,1039-10-31]              [16,1039-10-31]
 [17,-12-31]                  [17,-12-31]
 [18,1582-10-15]              [18,1582-10-15]
 [19,1582-10-15]              [19,1582-10-15]
 [20,1582-10-15]              [20,1582-10-15]
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
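The single mismatching row ([9,1582-10-06] expected vs [9,1582-10-15] returned) is the hybrid-calendar gap again: dates from 1582-10-05 through 1582-10-14 do not exist in the Julian+Gregorian calendar, so rebasing moves them to the first valid day, 1582-10-15. A minimal Python sketch of that gap handling (illustrative only, not Spark's actual rebasing code):

```python
from datetime import date

GAP_START = date(1582, 10, 5)
GAP_END = date(1582, 10, 14)
FIRST_VALID = date(1582, 10, 15)

def shift_out_of_gap(d: date) -> date:
    # Proleptic Gregorian dates inside the calendar-switchover gap have no
    # hybrid-calendar counterpart; shift them to the first valid hybrid date.
    return FIRST_VALID if GAP_START <= d <= GAP_END else d

assert shift_out_of_gap(date(1582, 10, 6)) == date(1582, 10, 15)   # the failing row
assert shift_out_of_gap(date(1582, 10, 15)) == date(1582, 10, 15)  # already valid
assert shift_out_of_gap(date(1970, 1, 1)) == date(1970, 1, 1)      # unaffected
```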
[jira] [Created] (SPARK-34431) Only load hive-site.xml once
Maxim Gekk created SPARK-34431:

Summary: Only load hive-site.xml once
Key: SPARK-34431
URL: https://issues.apache.org/jira/browse/SPARK-34431
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk

Hive configs from hive-site.xml are currently parsed over and over again. We can optimize this and parse the file only once.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
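The parse-once idea can be sketched with memoization (Python stdlib; `load_hive_site` is an illustrative stand-in, not the actual Spark or Hive API):

```python
from functools import lru_cache

parse_count = 0

@lru_cache(maxsize=1)
def load_hive_site(path: str) -> dict:
    # Parse hive-site.xml once; subsequent calls reuse the cached result.
    global parse_count
    parse_count += 1
    return {"source": path}  # stand-in for the parsed key/value config

for _ in range(3):
    conf = load_hive_site("conf/hive-site.xml")

assert parse_count == 1  # parsed only once despite three lookups
```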
[jira] [Created] (SPARK-34434) Mention DS rebase options in SparkUpgradeException
Maxim Gekk created SPARK-34434: -- Summary: Mention DS rebase options in SparkUpgradeException Key: SPARK-34434 URL: https://issues.apache.org/jira/browse/SPARK-34434 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Mention the DS options added by SPARK-34404 and SPARK-34377 in SparkUpgradeException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34437) Update Spark SQL guide about rebase DS options and SQL configs
Maxim Gekk created SPARK-34437: -- Summary: Update Spark SQL guide about rebase DS options and SQL configs Key: SPARK-34437 URL: https://issues.apache.org/jira/browse/SPARK-34437 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Describe the following SQL configs: * spark.sql.legacy.parquet.int96RebaseModeInWrite * spark.sql.legacy.parquet.datetimeRebaseModeInWrite * spark.sql.legacy.parquet.int96RebaseModeInRead * spark.sql.legacy.parquet.datetimeRebaseModeInRead * spark.sql.legacy.avro.datetimeRebaseModeInWrite * spark.sql.legacy.avro.datetimeRebaseModeInRead And Avro/Parquet options datetimeRebaseMode and int96RebaseMode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34439) Recognize `spark_catalog` in new identifier while view/table renaming
Maxim Gekk created SPARK-34439: -- Summary: Recognize `spark_catalog` in new identifier while view/table renaming Key: SPARK-34439 URL: https://issues.apache.org/jira/browse/SPARK-34439 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Currently, v1 ALTER TABLE .. RENAME TO doesn't recognize spark_catalog in new view/table identifiers. The example below demonstrates the issue: {code:scala} spark-sql> CREATE DATABASE db; spark-sql> CREATE TABLE spark_catalog.db.tbl (c0 INT) USING parquet; spark-sql> INSERT INTO spark_catalog.db.tbl SELECT 0; spark-sql> SELECT * FROM spark_catalog.db.tbl; 0 spark-sql> ALTER TABLE spark_catalog.db.tbl RENAME TO spark_catalog.db.tbl2; Error in query: spark_catalog.db.tbl2 is not a valid TableIdentifier as it has more than 2 name parts. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
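The fix amounts to stripping the session catalog name before validating the remaining name parts. A hedged Python sketch of that resolution (`to_table_identifier` is a hypothetical helper, not Spark's actual resolver):

```python
SESSION_CATALOG = "spark_catalog"

def to_table_identifier(name: str) -> tuple:
    # Drop a leading `spark_catalog.` prefix, then require at most 2 name parts
    # (database and table), mirroring the v1 TableIdentifier constraint.
    parts = name.split(".")
    if parts[0] == SESSION_CATALOG and len(parts) > 1:
        parts = parts[1:]
    if len(parts) > 2:
        raise ValueError(f"{name} is not a valid TableIdentifier: too many name parts")
    return tuple(parts)

# With the prefix recognized, the RENAME target from the example resolves fine:
assert to_table_identifier("spark_catalog.db.tbl2") == ("db", "tbl2")
assert to_table_identifier("db.tbl2") == ("db", "tbl2")
```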
[jira] [Created] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
Maxim Gekk created SPARK-34440:

Summary: Allow saving/loading datetime in ORC w/o rebasing
Key: SPARK-34440
URL: https://issues.apache.org/jira/browse/SPARK-34440
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
Fix For: 3.1.0

Currently, Spark always performs rebasing of INT96 columns in Parquet datasource but this is not required by parquet spec. This ticket aims to allow users to turn off rebasing via SQL config.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440: --- Fix Version/s: (was: 3.1.0) 3.2.0 > Allow saving/loading datetime in ORC w/o rebasing > - > > Key: SPARK-34440 > URL: https://issues.apache.org/jira/browse/SPARK-34440 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Currently, Spark always performs rebasing of INT96 columns in Parquet > datasource but this is not required by parquet spec. This tickets aims to > allow users to turn off rebasing via SQL config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34440) Allow saving/loading datetime in ORC w/o rebasing
[ https://issues.apache.org/jira/browse/SPARK-34440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34440:

Description: Currently, Spark always performs rebasing of date/timestamp columns in the ORC datasource, but this is not required by the ORC specification. This ticket aims to allow users to turn off rebasing via SQL configs or DS options. (was: Currently, Spark always performs rebasing of INT96 columns in Parquet datasource but this is not required by parquet spec. This tickets aims to allow users to turn off rebasing via SQL config.)

> Allow saving/loading datetime in ORC w/o rebasing
>
> Key: SPARK-34440
> URL: https://issues.apache.org/jira/browse/SPARK-34440
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.2.0
>
> Currently, Spark always performs rebasing of date/timestamp columns in the ORC datasource, but this is not required by the ORC specification. This ticket aims to allow users to turn off rebasing via SQL configs or DS options.

-- This message was sent by Atlassian Jira (v8.3.4#803005)