[jira] [Created] (SPARK-35662) Support Timestamp without time zone data type

2021-06-07 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35662:
--

 Summary: Support Timestamp without time zone data type
 Key: SPARK-35662
 URL: https://issues.apache.org/jira/browse/SPARK-35662
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Apache Spark


Spark SQL today supports the TIMESTAMP data type. However, the semantics 
provided actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
the session local time zone and are converted to UTC before being processed.
These are desirable semantics in many cases, such as when dealing with 
calendars.
In many (more) other cases, such as when dealing with log files, it is desirable 
that the provided timestamps not be altered.
SQL users expect that they can model either behavior, and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, and their 
users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that 
does not exist in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
the standard semantics.
Using these two types will provide clarity.
We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
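
For illustration only (this sketch is not part of the original proposal), the 
existing behavior described above can be seen in spark-shell: a TIMESTAMP value 
is an instant, and its rendering follows spark.sql.session.timeZone. The path 
and zone names below are arbitrary.
{code:scala}
// Write a timestamp that is parsed in one session time zone...
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
Seq("2021-06-07 00:00:00").toDF("s")
  .selectExpr("CAST(s AS TIMESTAMP) AS ts")          // interpreted as LA wall-clock time
  .write.mode("overwrite").parquet("/tmp/ts_demo")

// ...and read it back under another: the stored instant is unchanged, but the
// displayed wall-clock value shifts (2021-06-07 00:00 PDT -> 2021-06-07 16:00 KST).
spark.conf.set("spark.sql.session.timeZone", "Asia/Seoul")
spark.read.parquet("/tmp/ts_demo").show(false)
{code}
A TIMESTAMP WITHOUT TIME ZONE column would instead keep the written wall-clock 
reading no matter which session time zone is used to read it.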



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35662) Support Timestamp without time zone data type

2021-06-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35662:
---
Description: 
Spark SQL today supports the TIMESTAMP data type. However, the semantics 
provided actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
the session local time zone and are converted to UTC before being processed.
These are desirable semantics in many cases, such as when dealing with 
calendars.
In many (more) other cases, such as when dealing with log files, it is desirable 
that the provided timestamps not be altered.
SQL users expect that they can model either behavior, and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, and their 
users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that 
does not exist in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
the standard semantics.
Using these two types will provide clarity.
We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.

h3. Milestone 1 – Spark Timestamp equivalency (the new Timestamp type 
TimestampNTZ meets or exceeds all functionality of the existing SQL Timestamp):

* Add a new DataType implementation for TimestampNTZ.
* Support TimestampNTZ in Dataset/UDF.
* TimestampNTZ literals
* TimestampNTZ arithmetic (e.g. TimestampNTZ - TimestampNTZ, TimestampNTZ - Date)
* Datetime functions/operators: dayofweek, weekofyear, year, etc.
* Cast to and from TimestampNTZ, cast String/Timestamp to TimestampNTZ, cast 
TimestampNTZ to string (pretty printing)/Timestamp, with the SQL syntax to 
specify the types
* Support sorting TimestampNTZ.

h3. Milestone 2 – Persistence:

* Ability to create tables of type TimestampNTZ
* Ability to write to common file formats such as Parquet and JSON.
* INSERT, SELECT, UPDATE, MERGE
* Discovery

h3. Milestone 3 – Client support

* JDBC support
* Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration

* Python UDF can take and return intervals
* DataFrame support

  was:
Spark SQL today supports the TIMESTAMP data type. However the semantics 
provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
session local timezone and cast to UTC before being processed.
These are desirable semantics in many cases, such as when dealing with 
calendars.
In many (more) other cases, such as when dealing with log files it is desirable 
that the provided timestamps not be altered.
SQL users expect that they can model either behavior and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will be 
surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not exist 
in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIMEZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
standard semantic.
Using these two types will provide clarity.
We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.


> Support Timestamp without time zone data type
> -
>
> Key: SPARK-35662
> URL: https://issues.apache.org/jira/browse/SPARK-35662
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Spark SQL today supports the TIMESTAMP data type. However the semantics 
> provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle. 
> Timestamps embedded in a SQL query or passed through JDBC are presumed to be 
> in session local timezone and cast to UTC before being processed.
> These are desirable semantics in many cases, such as when dealing with 
> calendars.
> In many (more) other cases, such as when dealing with log files it is 
> desirable that the provided timestamps not be altered.
> SQL users expect that they can model either behavior and do so by using 
> TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
> LOCAL TIME ZONE for time zone sensitive data.
> Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will 
> be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a featu

[jira] [Updated] (SPARK-35662) Support Timestamp without time zone data type

2021-06-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35662:
---
Description: 
Spark SQL today supports the TIMESTAMP data type. However, the semantics 
provided actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
the session local time zone and are converted to UTC before being processed.
 These are desirable semantics in many cases, such as when dealing with 
calendars.
 In many (more) other cases, such as when dealing with log files, it is 
desirable that the provided timestamps not be altered.
 SQL users expect that they can model either behavior, and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
 Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, and their 
users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that 
does not exist in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
the standard semantics.
 Using these two types will provide clarity.
 We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
h3. Milestone 1 – Spark Timestamp equivalency (the new Timestamp type 
TimestampNTZ meets or exceeds all functionality of the existing SQL Timestamp):
 * Add a new DataType implementation for TimestampNTZ.
 * Support TimestampNTZ in Dataset/UDF.
 * TimestampNTZ literals
 * TimestampNTZ arithmetic (e.g. TimestampNTZ - TimestampNTZ, TimestampNTZ - Date)
 * Datetime functions/operators: dayofweek, weekofyear, year, etc.
 * Cast to and from TimestampNTZ, cast String/Timestamp to TimestampNTZ, cast 
TimestampNTZ to string (pretty printing)/Timestamp, with the SQL syntax to 
specify the types
 * Support sorting TimestampNTZ.

h3. Milestone 2 – Persistence:
 * Ability to create tables of type TimestampNTZ
 * Ability to write to common file formats such as Parquet and JSON.
 * INSERT, SELECT, UPDATE, MERGE
 * Discovery

h3. Milestone 3 – Client support
 * JDBC support
 * Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration
 * Python UDF can take and return TimestampNTZ
 * DataFrame support

  was:
Spark SQL today supports the TIMESTAMP data type. However the semantics 
provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
session local timezone and cast to UTC before being processed.
These are desirable semantics in many cases, such as when dealing with 
calendars.
In many (more) other cases, such as when dealing with log files it is desirable 
that the provided timestamps not be altered.
SQL users expect that they can model either behavior and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will be 
surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not exist 
in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIMEZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
standard semantic.
Using these two types will provide clarity.
We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.

h3. Milestone 1 – Spark Timestamp equivalency ( The new Timestamp type 
TimestampNTZ meets or exceeds all function of the existing SQL Timestamp):

* Add a new DataType implementation for TimestampNTZ.
* Support TimestampNTZ in Dataset/UDF.
* TimestampNTZ literals 
* TimestampNTZ arithmetic(e.g. TimestampNTZ - TimestampNTZ, TimestampNTZ - Date)
* Datetime functions/operators: dayofweek, weekofyear, year, etc
* Cast to and from TimestampNTZ, cast String/Timestamp to TimestampNTZ, cast 
TimestampNTZ to string (pretty printing)/Timestamp, with the SQL syntax to 
specify the types
* Support sorting TimestampNTZ.

h3. Milestone 2 – Persistence:

* Ability to create tables of type TimestampNTZ
* Ability to write to common file formats such as Parquet and JSON.
* INSERT, SELECT, UPDATE, MERGE
* Discovery

h3. Milestone 3 – Client support

* JDBC support
* Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration

* Python UDF can take and return intervals
* DataFrame support


> Support Timestamp without time zone data type
> -
>
> Key: SPARK-35662
> URL: https://issues.apache.org/jira/browse/SPARK-35662
> Project: Spark
>  Issue Type: New Feature
> 

[jira] [Created] (SPARK-35663) Add Timestamp without time zone type

2021-06-07 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35663:
--

 Summary: Add Timestamp without time zone type
 Key: SPARK-35663
 URL: https://issues.apache.org/jira/browse/SPARK-35663
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Extend Catalyst's type system with a new type that conforms to the SQL standard 
(see SQL:2016, section 4.6.2):

* TimestampNTZType represents the timestamp without time zone type
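
As a plain JVM analogy (added for illustration, not part of the issue itself): 
java.time.Instant corresponds to the existing TIMESTAMP WITH LOCAL TIME ZONE 
semantics (a point on the time line), while java.time.LocalDateTime matches the 
SQL:2016 timestamp without time zone (a wall-clock reading with no zone 
attached):
{code:scala}
import java.time.{LocalDateTime, ZoneId}

val wallClock = LocalDateTime.parse("2021-06-07T00:00:00")  // no zone attached
// The same wall-clock reading maps to different instants in different zones:
val inSeoul = wallClock.atZone(ZoneId.of("Asia/Seoul")).toInstant
val inLA    = wallClock.atZone(ZoneId.of("America/Los_Angeles")).toInstant
println(inSeoul == inLA)  // false, which is why a dedicated type is needed
{code}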



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35646) Merge contents and remove obsolete pages in API reference section

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35646:


Assignee: Hyukjin Kwon

> Merge contents and remove obsolete pages in API reference section
> -
>
> Key: SPARK-35646
> URL: https://issues.apache.org/jira/browse/SPARK-35646
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> The Koalas documentation is now in the PySpark documentation. We should 
> probably now remove obsolete pages such as blog posts and talks. Also, we 
> should refine and merge the contents properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35646) Merge contents and remove obsolete pages in API reference section

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35646.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32799
[https://github.com/apache/spark/pull/32799]

> Merge contents and remove obsolete pages in API reference section
> -
>
> Key: SPARK-35646
> URL: https://issues.apache.org/jira/browse/SPARK-35646
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.0
>
>
> The Koalas documentation is now in the PySpark documentation. We should 
> probably now remove obsolete pages such as blog posts and talks. Also, we 
> should refine and merge the contents properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35664) Support java.time.LocalDateTime as an external type of TimestampNTZ type

2021-06-07 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35664:
--

 Summary: Support java.time.LocalDateTime as an external type of 
TimestampNTZ type
 Key: SPARK-35664
 URL: https://issues.apache.org/jira/browse/SPARK-35664
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Allow parallelization/collection of java.time.LocalDateTime values, and convert 
the values to timestamp values of TimestampNTZType.
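
A hypothetical usage sketch (added for illustration; the final API and type 
rendering may differ) of what collecting java.time.LocalDateTime through a 
Dataset is expected to look like once this sub-task lands:
{code:scala}
import java.time.LocalDateTime
import spark.implicits._  // pre-imported in spark-shell

val ds = Seq(LocalDateTime.parse("2021-06-07T10:15:30")).toDS()
ds.printSchema()  // expected: a timestamp-without-time-zone column
ds.collect()      // expected: Array(2021-06-07T10:15:30), not shifted by the session zone
{code}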



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35663) Add Timestamp without time zone type

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35663:


Assignee: Gengliang Wang  (was: Apache Spark)

> Add Timestamp without time zone type
> 
>
> Key: SPARK-35663
> URL: https://issues.apache.org/jira/browse/SPARK-35663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Extend Catalyst's type system with a new type that conforms to the SQL standard 
> (see SQL:2016, section 4.6.2):
> * TimestampNTZType represents the timestamp without time zone type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35663) Add Timestamp without time zone type

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358415#comment-17358415
 ] 

Apache Spark commented on SPARK-35663:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32802

> Add Timestamp without time zone type
> 
>
> Key: SPARK-35663
> URL: https://issues.apache.org/jira/browse/SPARK-35663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Extend Catalyst's type system with a new type that conforms to the SQL standard 
> (see SQL:2016, section 4.6.2):
> * TimestampNTZType represents the timestamp without time zone type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35663) Add Timestamp without time zone type

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35663:


Assignee: Apache Spark  (was: Gengliang Wang)

> Add Timestamp without time zone type
> 
>
> Key: SPARK-35663
> URL: https://issues.apache.org/jira/browse/SPARK-35663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Extend Catalyst's type system with a new type that conforms to the SQL standard 
> (see SQL:2016, section 4.6.2):
> * TimestampNTZType represents the timestamp without time zone type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35664) Support java.time.LocalDateTime as an external type of TimestampNTZ type

2021-06-07 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358416#comment-17358416
 ] 

Gengliang Wang commented on SPARK-35664:


I am working on this.

> Support java.time.LocalDateTime as an external type of TimestampNTZ type
> -
>
> Key: SPARK-35664
> URL: https://issues.apache.org/jira/browse/SPARK-35664
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Allow parallelization/collection of java.time.LocalDateTime values, and 
> convert the values to timestamp values of TimestampNTZType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35665) resolve UnresolvedAlias in CollectMetrics

2021-06-07 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-35665:
---

 Summary: resolve UnresolvedAlias in CollectMetrics
 Key: SPARK-35665
 URL: https://issues.apache.org/jira/browse/SPARK-35665
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35665) resolve UnresolvedAlias in CollectMetrics

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358456#comment-17358456
 ] 

Apache Spark commented on SPARK-35665:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/32803

> resolve UnresolvedAlias in CollectMetrics
> -
>
> Key: SPARK-35665
> URL: https://issues.apache.org/jira/browse/SPARK-35665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35665) resolve UnresolvedAlias in CollectMetrics

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35665:


Assignee: Apache Spark

> resolve UnresolvedAlias in CollectMetrics
> -
>
> Key: SPARK-35665
> URL: https://issues.apache.org/jira/browse/SPARK-35665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35665) resolve UnresolvedAlias in CollectMetrics

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35665:


Assignee: (was: Apache Spark)

> resolve UnresolvedAlias in CollectMetrics
> -
>
> Key: SPARK-35665
> URL: https://issues.apache.org/jira/browse/SPARK-35665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35507) Move Python 3.9 installation to the docker image for GitHub Actions

2021-06-07 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35507:
-

Assignee: Dongjoon Hyun  (was: Apache Spark)

> Move Python 3.9 installation to the docker image for GitHub Actions
> 
>
> Key: SPARK-35507
> URL: https://issues.apache.org/jira/browse/SPARK-35507
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>
> SPARK-35506 added Python 3.9 support, but the workflow had to install it manually.
> The installed packages and Python versions should go into the docker image.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26867) Spark Support of YARN Placement Constraint

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358461#comment-17358461
 ] 

Apache Spark commented on SPARK-26867:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32804

> Spark Support of YARN Placement Constraint
> --
>
> Key: SPARK-26867
> URL: https://issues.apache.org/jira/browse/SPARK-26867
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Priority: Major
>
> YARN provides Placement Constraint features, where an application can request 
> containers based on affinity / anti-affinity / cardinality to services, other 
> application containers, or node attributes. This would be a useful feature for 
> Spark jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26867) Spark Support of YARN Placement Constraint

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26867:


Assignee: Apache Spark

> Spark Support of YARN Placement Constraint
> --
>
> Key: SPARK-26867
> URL: https://issues.apache.org/jira/browse/SPARK-26867
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Apache Spark
>Priority: Major
>
> YARN provides Placement Constraint features, where an application can request 
> containers based on affinity / anti-affinity / cardinality to services, other 
> application containers, or node attributes. This would be a useful feature for 
> Spark jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26867) Spark Support of YARN Placement Constraint

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26867:


Assignee: (was: Apache Spark)

> Spark Support of YARN Placement Constraint
> --
>
> Key: SPARK-26867
> URL: https://issues.apache.org/jira/browse/SPARK-26867
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Priority: Major
>
> YARN provides Placement Constraint features, where an application can request 
> containers based on affinity / anti-affinity / cardinality to services, other 
> application containers, or node attributes. This would be a useful feature for 
> Spark jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26867) Spark Support of YARN Placement Constraint

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358463#comment-17358463
 ] 

Apache Spark commented on SPARK-26867:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32804

> Spark Support of YARN Placement Constraint
> --
>
> Key: SPARK-26867
> URL: https://issues.apache.org/jira/browse/SPARK-26867
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Priority: Major
>
> YARN provides Placement Constraint features, where an application can request 
> containers based on affinity / anti-affinity / cardinality to services, other 
> application containers, or node attributes. This would be a useful feature for 
> Spark jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35591) Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-07 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358538#comment-17358538
 ] 

Haejoon Lee commented on SPARK-35591:
-

[~pingsutw] Thanks for the concern, but I'm almost done with this work. Let me 
finish this! :)

> Rename "Koalas" to "pandas API on Spark" in the documents
> -
>
> Key: SPARK-35591
> URL: https://issues.apache.org/jira/browse/SPARK-35591
> Project: Spark
>  Issue Type: Sub-task
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should fix "Koalas" to "pandas on Spark" after initial porting of 
> documentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35666) add new gemv to skip array shape checking

2021-06-07 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-35666:


 Summary: add new gemv to skip array shape checking
 Key: SPARK-35666
 URL: https://issues.apache.org/jira/browse/SPARK-35666
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.2.0
Reporter: zhengruifeng


In existing impls, it is a common case that the vector/matrix needs to be 
sliced/copied just to make the shapes match.

We can enhance the gemv function to avoid this.
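
For context, here is a rough sketch of the idea (an assumed helper, not the 
actual MLlib change): a gemv variant that takes explicit shape/offset arguments 
so that y := beta * y + alpha * A * x can be computed against a column-major 
sub-block of `a` without slicing or copying it first.
{code:scala}
// Column-major layout, as in MLlib's DenseMatrix: A(i, j) = a(aOffset + j * numRows + i)
def gemv(alpha: Double, a: Array[Double], aOffset: Int, numRows: Int, numCols: Int,
         x: Array[Double], beta: Double, y: Array[Double]): Unit = {
  var i = 0
  while (i < numRows) { y(i) *= beta; i += 1 }  // y := beta * y
  var j = 0
  while (j < numCols) {
    val xj = alpha * x(j)
    var k = 0
    while (k < numRows) {
      y(k) += a(aOffset + j * numRows + k) * xj
      k += 1
    }
    j += 1
  }
}
{code}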



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35665) resolve UnresolvedAlias in CollectMetrics

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35665:


Assignee: Wenchen Fan

> resolve UnresolvedAlias in CollectMetrics
> -
>
> Key: SPARK-35665
> URL: https://issues.apache.org/jira/browse/SPARK-35665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35665) resolve UnresolvedAlias in CollectMetrics

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35665.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32803
[https://github.com/apache/spark/pull/32803]

> resolve UnresolvedAlias in CollectMetrics
> -
>
> Key: SPARK-35665
> URL: https://issues.apache.org/jira/browse/SPARK-35665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35666) add new gemv to skip array shape checking

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358557#comment-17358557
 ] 

Apache Spark commented on SPARK-35666:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/32805

> add new gemv to skip array shape checking
> -
>
> Key: SPARK-35666
> URL: https://issues.apache.org/jira/browse/SPARK-35666
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Priority: Major
>
> In existing impls, it is a common case that the vector/matrix needs to be 
> sliced/copied just to make the shapes match.
> We can enhance the gemv function to avoid this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35666) add new gemv to skip array shape checking

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35666:


Assignee: Apache Spark

> add new gemv to skip array shape checking
> -
>
> Key: SPARK-35666
> URL: https://issues.apache.org/jira/browse/SPARK-35666
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Major
>
> In existing impls, it is a common case that the vector/matrix needs to be 
> sliced/copied just to make the shapes match.
> We can enhance the gemv function to avoid this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35666) add new gemv to skip array shape checking

2021-06-07 Thread Apache Spark (Jira)


[jira] [Commented] (SPARK-35666) add new gemv to skip array shape checking

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358558#comment-17358558
 ] 

Apache Spark commented on SPARK-35666:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/32805

> add new gemv to skip array shape checking
> -
>
> Key: SPARK-35666
> URL: https://issues.apache.org/jira/browse/SPARK-35666
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.2.0
>Reporter: zhengruifeng
>Priority: Major
>
> In existing impls, it is a common case that the vector/matrix needs to be 
> sliced/copied just to make the shapes match.
> We can enhance the gemv function to avoid this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)
yuanxm created SPARK-35667:
--

 Summary: spark.speculation causes incorrect query results with 
TRANSFORM
 Key: SPARK-35667
 URL: https://issues.apache.org/jira/browse/SPARK-35667
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.8
 Environment: {code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}
 
Reporter: yuanxm


The following SQL sometimes gets incorrect results when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the result of count is less than the correct one. 
It's more likely to get incorrect answer when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuanxm updated SPARK-35667:
---
Environment: (was: {code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}
 )

> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
>
> SQL as follow gets incorrect results sometimes when spark.speculation is 
> true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the result of count is less than the correct 
> one. It's more likely to get incorrect answer when there is more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuanxm updated SPARK-35667:
---
Description: 
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the result of count is less than the correct one. 
It's more likely to get incorrect answer when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}

  was:
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the result of count is less than the correct one. 
It's more likely to get incorrect answer when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 


> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
>
> SQL as follow gets incorrect results sometimes when spark.speculation is 
> true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the result of count is less than the correct 
> one. It's more likely to get incorrect answer when there is more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuanxm updated SPARK-35667:
---
Description: 
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the count result is less than the correct one. 
It's more likely to get incorrect answer when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}

  was:
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the result of count is less than the correct one. 
It's more likely to get incorrect answer when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}


> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
>
> SQL as follow gets incorrect results sometimes when spark.speculation is 
> true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the count result is less than the correct one. 
> It's more likely to get incorrect answer when there is more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-35668:
---

 Summary: Use "concurrency" setting on Github Actions
 Key: SPARK-35668
 URL: https://issues.apache.org/jira/browse/SPARK-35668
 Project: Spark
  Issue Type: Test
  Components: Project Infra
Affects Versions: 3.1.2
Reporter: Yikun Jiang


We are using the 
[cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
 job to cancel previous runs when a new run is queued. GitHub Actions now 
supports this natively via the 
["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
 syntax, which makes sure only a single job or workflow in the same concurrency 
group runs at a time.

related: https://github.com/apache/arrow/pull/10416 and 
https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35668:


Assignee: Apache Spark

> Use "concurrency" setting on Github Actions
> ---
>
> Key: SPARK-35668
> URL: https://issues.apache.org/jira/browse/SPARK-35668
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.2
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> We are using 
> [cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
>  job to cancel previous jobs when a new job is queued. Now, it has been 
> supported by the github action by using 
> ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
>  syntax to make sure only a single job or workflow using the same concurrency 
> group.
> related: https://github.com/apache/arrow/pull/10416 and 
> https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358596#comment-17358596
 ] 

Apache Spark commented on SPARK-35668:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32806

> Use "concurrency" setting on Github Actions
> ---
>
> Key: SPARK-35668
> URL: https://issues.apache.org/jira/browse/SPARK-35668
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.2
>Reporter: Yikun Jiang
>Priority: Major
>
> We are using 
> [cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
>  job to cancel previous jobs when a new job is queued. Now, it has been 
> supported by the github action by using 
> ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
>  syntax to make sure only a single job or workflow using the same concurrency 
> group.
> related: https://github.com/apache/arrow/pull/10416 and 
> https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35668:


Assignee: (was: Apache Spark)

> Use "concurrency" setting on Github Actions
> ---
>
> Key: SPARK-35668
> URL: https://issues.apache.org/jira/browse/SPARK-35668
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.2
>Reporter: Yikun Jiang
>Priority: Major
>
> We are using 
> [cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
>  job to cancel previous jobs when a new job is queued. Now, it has been 
> supported by the github action by using 
> ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
>  syntax to make sure only a single job or workflow using the same concurrency 
> group.
> related: https://github.com/apache/arrow/pull/10416 and 
> https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358597#comment-17358597
 ] 

Apache Spark commented on SPARK-35668:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32806

> Use "concurrency" setting on Github Actions
> ---
>
> Key: SPARK-35668
> URL: https://issues.apache.org/jira/browse/SPARK-35668
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.2
>Reporter: Yikun Jiang
>Priority: Major
>
> We are using 
> [cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
>  job to cancel previous jobs when a new job is queued. Now, it has been 
> supported by the github action by using 
> ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
>  syntax to make sure only a single job or workflow using the same concurrency 
> group.
> related: https://github.com/apache/arrow/pull/10416 and 
> https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35543) Small memory leak in BlockManagerMasterEndpoint

2021-06-07 Thread Attila Zsolt Piros (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros resolved SPARK-35543.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32790
[https://github.com/apache/spark/pull/32790]

> Small memory leak in BlockManagerMasterEndpoint 
> 
>
> Key: SPARK-35543
> URL: https://issues.apache.org/jira/browse/SPARK-35543
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Minor
> Fix For: 3.2.0
>
>
> It is regarding _blockStatusByShuffleService_: when all the blocks are removed 
> for a bmId, the map entry can be cleaned up too.
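
A minimal sketch of the idea (simplified, assumed types; not the merged patch): 
drop the outer map entry as soon as its inner block map becomes empty, so the 
endpoint does not keep accumulating empty maps.
{code:scala}
import scala.collection.mutable

val blockStatusByShuffleService = mutable.HashMap[String, mutable.HashMap[String, Long]]()

def removeBlock(bmId: String, blockId: String): Unit = {
  blockStatusByShuffleService.get(bmId).foreach { blocks =>
    blocks.remove(blockId)
    if (blocks.isEmpty) blockStatusByShuffleService.remove(bmId)  // the cleanup in question
  }
}
{code}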



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35669) fix special char in CSV header with filter pushdown

2021-06-07 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-35669:
---

 Summary: fix special char in CSV header with filter pushdown
 Key: SPARK-35669
 URL: https://issues.apache.org/jira/browse/SPARK-35669
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuanxm updated SPARK-35667:
---
Description: 
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the count result is less than the correct one. 
It's more likely to get incorrect result when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}

  was:
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the count result is less than the correct one. 
It's more likely to get incorrect answer when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}


> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
>
> SQL as follow gets incorrect results sometimes when spark.speculation is 
> true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the count result is less than the correct one. 
> It's more likely to get incorrect result when there is more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuanxm updated SPARK-35667:
---
Description: 
The following SQL sometimes gets incorrect results when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the count result is less than the correct one. 
It's more likely to get incorrect result when there are more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}

  was:
SQL as follow gets incorrect results sometimes when spark.speculation is true: 
{code:java}
SELECT count(1)
FROM
  (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
   FROM
 (SELECT dt
  FROM test_table)tmpa1)tmpa2{code}
With spark.speculation=true, the count result is less than the correct one. 
It's more likely to get incorrect result when there is more speculative tasks. 

`test.py`:
{code:java}
import sys
for line in sys.stdin:
line = line.strip()
arr = line.split()
print "\t".join(arr){code}
 

spark-sql command:
{code:java}
./bin/spark-sql --master yarn \ 
--conf spark.speculation=true \ 
--conf spark.shuffle.service.enabled=true \ 
--conf spark.dynamicAllocation.enabled=true \ 
--conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
--conf spark.dynamicAllocation.initialExecutor=1 \ 
--conf spark.dynamicAllocation.maxExecutors=40
{code}


> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
>
> SQL as follow gets incorrect results sometimes when spark.speculation is 
> true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the count result is less than the correct one. 
> It's more likely to get incorrect result when there are more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35669) fix special char in CSV header with filter pushdown

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35669:


Assignee: (was: Apache Spark)

> fix special char in CSV header with filter pushdown
> ---
>
> Key: SPARK-35669
> URL: https://issues.apache.org/jira/browse/SPARK-35669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35669) fix special char in CSV header with filter pushdown

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358626#comment-17358626
 ] 

Apache Spark commented on SPARK-35669:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/32807

> fix special char in CSV header with filter pushdown
> ---
>
> Key: SPARK-35669
> URL: https://issues.apache.org/jira/browse/SPARK-35669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35669) fix special char in CSV header with filter pushdown

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35669:


Assignee: Apache Spark

> fix special char in CSV header with filter pushdown
> ---
>
> Key: SPARK-35669
> URL: https://issues.apache.org/jira/browse/SPARK-35669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35663) Add Timestamp without time zone type

2021-06-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35663.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32802
[https://github.com/apache/spark/pull/32802]

> Add Timestamp without time zone type
> 
>
> Key: SPARK-35663
> URL: https://issues.apache.org/jira/browse/SPARK-35663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Extend Catalyst's type system by a new type that conforms to the SQL standard 
> (see SQL:2016, section 4.6.2):
> * TimestampNTZType represents the timestamp without time zone type
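
Not in the original description: a minimal sketch of how the new type could 
surface in a schema, assuming it is exposed as 
org.apache.spark.sql.types.TimestampNTZType (the name this sub-task uses) with a 
simple name of timestamp_ntz; both names are assumptions, not a confirmed public 
API at the time of this thread.
{code:java}
// Illustrative only: the type name and simpleString output are assumptions.
import org.apache.spark.sql.types.{StructField, StructType, TimestampNTZType}

val schema = StructType(Seq(StructField("event_time", TimestampNTZType)))
println(schema.simpleString) // expected: struct<event_time:timestamp_ntz>
{code}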



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35074) spark.jars.xxx configs should be moved to config/package.scala

2021-06-07 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-35074.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

> spark.jars.xxx configs should be moved to config/package.scala
> --
>
> Key: SPARK-35074
> URL: https://issues.apache.org/jira/browse/SPARK-35074
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Priority: Trivial
> Fix For: 3.2.0
>
>
> Currently {{spark.jars.xxx}} property keys (e.g. {{spark.jars.ivySettings}} 
> and {{spark.jars.packages}}) are hardcoded in multiple places within Spark 
> code across multiple modules. We should define them in 
> {{config/package.scala}} and reference them in all other places.
> This came up during reviews of SPARK-34472 at 
> https://github.com/apache/spark/pull/31591#discussion_r584848624



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35598) Improve Spark-ML PCA analysis

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35598:


Assignee: Apache Spark

> Improve Spark-ML PCA analysis
> -
>
> Key: SPARK-35598
> URL: https://issues.apache.org/jira/browse/SPARK-35598
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 3.1.2
>Reporter: Antonio Zammuto
>Assignee: Apache Spark
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely 
> no jargon.*
> When performing a PCA, the covariance matrix is calculated first.
>  In the function RowMatrix.computeCovariance() there is the following code:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> As can be seen, there are two different functions to compute the covariance 
> matrix, depending on whether the matrix is sparse or dense. Currently, the 
> decision whether the matrix is sparse or dense is based only on the type of 
> the first row (DenseVector or SparseVector).
> I propose to calculate the sparsity of the matrix as the ratio between the 
> number of zeros and the total number of elements, and to use that sparsity to 
> switch between the two functions.
> *Q2. What problem is this proposal NOT designed to solve?*
> The 2 functions give slightly different results. This SPIP doesn't solve this 
> issue
> *Q3. How is it done today, and what are the limits of current practice?*
> Basing the decision only on the type of the first row is questionable.
> It would be better to base it on the entire matrix, and on a mathematical 
> concept such as sparsity.
> *Q4. What is new in your approach and why do you think it will be successful?*
> The code is implemented in a more meaningful way: the decision is based on the 
> entire matrix.
> *Q5. Who cares? If you are successful, what difference will it make?*
> More consistent code for PCA analysis; it will benefit anyone using the PCA 
> analysis.
> *Q6. What are the risks?*
> You still need a threshold to decide whether the matrix is sparse or not. 
> There is no universally defined value for sparsity.
>  Therefore the threshold of 50% that I chose is arbitrary, and might not be 
> the best in all cases.
>  Still, I consider it an improvement compared to the current implementation.
> *Q7. How long will it take?*
> The change is easy to implement and test.
> *Q8. What are the mid-term and final “exams” to check for success?*
> We can check that the sparsity is calculated correctly and that there are no 
> "unexplainable" differences from the current implementation.
>  "Unexplainable" meaning anything beyond the fact that there will be cases 
> where it now uses computeDenseVectorCovariance whereas before it used 
> computeSparseVectorCovariance, and vice versa.
> The tests in the RowMatrixSuite are successful.
>  A couple of tests have been added to verify the sparsity is calculated 
> properly.
> *Appendix A. Proposed API Changes. Optional section defining APIs changes, if 
> any. Backward and forward compatibility must be taken into account.*
> defining a function to calculate the sparsity of the RowMatrix
> {code:java}
>   def calcSparsity(): Double = {
> rows.map{ vec => (vec.size-vec.numNonzeros) }.sum/
>   (rows.count() * rows.take(1)(0).size).toDouble
>   }
> {code}
> Changing the switching condition. Before was:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> Proposed new code is:
> {code:java}
> val sparsityThreshold = 0.5
> val sparsity = calcSparsity()
> if (sparsity < sparsityThreshold) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
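
Not part of the SPIP text: a small spark-shell sketch, with assumed toy data, 
showing what the proposed zeros-over-total-entries ratio evaluates to on a tiny 
RowMatrix and which covariance path a 0.5 threshold would select.
{code:java}
// Illustrative only; uses the same formula as the proposed calcSparsity().
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Toy 3 x 4 matrix with 8 zeros out of 12 entries.
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 0.0, 0.0, 2.0),
  Vectors.sparse(4, Seq((1, 3.0))),
  Vectors.dense(0.0, 0.0, 4.0, 0.0)))
val mat = new RowMatrix(rows)

val sparsity = rows.map(v => (v.size - v.numNonzeros).toDouble).sum() /
  (mat.numRows() * mat.numCols())
// sparsity = 8.0 / 12 = 0.67, so with a 0.5 threshold this matrix would take
// the computeSparseVectorCovariance path.
{code}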



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35598) Improve Spark-ML PCA analysis

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358659#comment-17358659
 ] 

Apache Spark commented on SPARK-35598:
--

User 'Antozamm' has created a pull request for this issue:
https://github.com/apache/spark/pull/32808

> Improve Spark-ML PCA analysis
> -
>
> Key: SPARK-35598
> URL: https://issues.apache.org/jira/browse/SPARK-35598
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 3.1.2
>Reporter: Antonio Zammuto
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely 
> no jargon.*
> When performing a PCA, the covariance matrix is calculated first.
>  In the function RowMatrix.computeCovariance() there is the following code:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> As can be seen, there are two different functions to compute the covariance 
> matrix, depending on whether the matrix is sparse or dense. Currently, the 
> decision whether the matrix is sparse or dense is based only on the type of 
> the first row (DenseVector or SparseVector).
> I propose to calculate the sparsity of the matrix as the ratio between the 
> number of zeros and the total number of elements, and to use that sparsity to 
> switch between the two functions.
> *Q2. What problem is this proposal NOT designed to solve?*
> The 2 functions give slightly different results. This SPIP doesn't solve this 
> issue
> *Q3. How is it done today, and what are the limits of current practice?*
> Basing the decision only on the type of the first row is questionable.
> It would be better to base it on the entire matrix, and on a mathematical 
> concept such as sparsity.
> *Q4. What is new in your approach and why do you think it will be successful?*
> The code is implemented in a more meaningful way: the decision is based on the 
> entire matrix.
> *Q5. Who cares? If you are successful, what difference will it make?*
> More consistent code for PCA analysis; it will benefit anyone using the PCA 
> analysis.
> *Q6. What are the risks?*
> You still need a threshold to decide whether the matrix is sparse or not. 
> There is no universally defined value for sparsity.
>  Therefore the threshold of 50% that I chose is arbitrary, and might not be 
> the best in all cases.
>  Still, I consider it an improvement compared to the current implementation.
> *Q7. How long will it take?*
> The change is easy to implement and test.
> *Q8. What are the mid-term and final “exams” to check for success?*
> We can check that the sparsity is calculated correctly and that there are no 
> "unexplainable" differences from the current implementation.
>  "Unexplainable" meaning anything beyond the fact that there will be cases 
> where it now uses computeDenseVectorCovariance whereas before it used 
> computeSparseVectorCovariance, and vice versa.
> The tests in the RowMatrixSuite are successful.
>  A couple of tests have been added to verify the sparsity is calculated 
> properly.
> *Appendix A. Proposed API Changes. Optional section defining APIs changes, if 
> any. Backward and forward compatibility must be taken into account.*
> defining a function to calculate the sparsity of the RowMatrix
> {code:java}
>   def calcSparsity(): Double = {
> rows.map{ vec => (vec.size-vec.numNonzeros) }.sum/
>   (rows.count() * rows.take(1)(0).size).toDouble
>   }
> {code}
> Changing the switching condition. Before was:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> Proposed new code is:
> {code:java}
> val sparsityThreshold = 0.5
> val sparsity = calcSparsity()
> if (sparsity < sparsityThreshold) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35598) Improve Spark-ML PCA analysis

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35598:


Assignee: (was: Apache Spark)

> Improve Spark-ML PCA analysis
> -
>
> Key: SPARK-35598
> URL: https://issues.apache.org/jira/browse/SPARK-35598
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 3.1.2
>Reporter: Antonio Zammuto
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely 
> no jargon.*
> When performing a PCA, the covariance matrix is calculated first.
>  In the function RowMatrix.computeCovariance() there is the following code:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> As can be seen, there are two different functions to compute the covariance 
> matrix, depending on whether the matrix is sparse or dense. Currently, the 
> decision whether the matrix is sparse or dense is based only on the type of 
> the first row (DenseVector or SparseVector).
> I propose to calculate the sparsity of the matrix as the ratio between the 
> number of zeros and the total number of elements, and to use that sparsity to 
> switch between the two functions.
> *Q2. What problem is this proposal NOT designed to solve?*
> The 2 functions give slightly different results. This SPIP doesn't solve this 
> issue
> *Q3. How is it done today, and what are the limits of current practice?*
> Basing the decision only on the type of the first row is questionable.
> It would be better to base it on the entire matrix, and on a mathematical 
> concept such as sparsity.
> *Q4. What is new in your approach and why do you think it will be successful?*
> The code is implemented in a more meaningful way: the decision is based on the 
> entire matrix.
> *Q5. Who cares? If you are successful, what difference will it make?*
> More consistent code for PCA analysis; it will benefit anyone using the PCA 
> analysis.
> *Q6. What are the risks?*
> You still need a threshold to decide whether the matrix is sparse or not. 
> There is no universally defined value for sparsity.
>  Therefore the threshold of 50% that I chose is arbitrary, and might not be 
> the best in all cases.
>  Still, I consider it an improvement compared to the current implementation.
> *Q7. How long will it take?*
> The change is easy to implement and test.
> *Q8. What are the mid-term and final “exams” to check for success?*
> We can check that the sparsity is calculated correctly and that there are no 
> "unexplainable" differences from the current implementation.
>  "Unexplainable" meaning anything beyond the fact that there will be cases 
> where it now uses computeDenseVectorCovariance whereas before it used 
> computeSparseVectorCovariance, and vice versa.
> The tests in the RowMatrixSuite are successful.
>  A couple of tests have been added to verify the sparsity is calculated 
> properly.
> *Appendix A. Proposed API Changes. Optional section defining APIs changes, if 
> any. Backward and forward compatibility must be taken into account.*
> defining a function to calculate the sparsity of the RowMatrix
> {code:java}
>   def calcSparsity(): Double = {
> rows.map{ vec => (vec.size-vec.numNonzeros) }.sum/
>   (rows.count() * rows.take(1)(0).size).toDouble
>   }
> {code}
> Changing the switching condition. Before was:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> Proposed new code is:
> {code:java}
> val sparsityThreshold = 0.5
> val sparsity = calcSparsity()
> if (sparsity < sparsityThreshold) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35598) Improve Spark-ML PCA analysis

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358660#comment-17358660
 ] 

Apache Spark commented on SPARK-35598:
--

User 'Antozamm' has created a pull request for this issue:
https://github.com/apache/spark/pull/32808

> Improve Spark-ML PCA analysis
> -
>
> Key: SPARK-35598
> URL: https://issues.apache.org/jira/browse/SPARK-35598
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 3.1.2
>Reporter: Antonio Zammuto
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely 
> no jargon.*
> When performing a PCA, the covariance matrix is calculated first.
>  In the function RowMatrix.computeCovariance() there is the following code:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> As can be seen, there are two different functions to compute the covariance 
> matrix, depending on whether the matrix is sparse or dense. Currently, the 
> decision whether the matrix is sparse or dense is based only on the type of 
> the first row (DenseVector or SparseVector).
> I propose to calculate the sparsity of the matrix as the ratio between the 
> number of zeros and the total number of elements, and to use that sparsity to 
> switch between the two functions.
> *Q2. What problem is this proposal NOT designed to solve?*
> The 2 functions give slightly different results. This SPIP doesn't solve this 
> issue
> *Q3. How is it done today, and what are the limits of current practice?*
> Basing the decision only on the type of the first row is questionable.
> It would be better to base it on the entire matrix, and on a mathematical 
> concept such as sparsity.
> *Q4. What is new in your approach and why do you think it will be successful?*
> The code is implemented in a more meaningful way: the decision is based on the 
> entire matrix.
> *Q5. Who cares? If you are successful, what difference will it make?*
> More consistent code for PCA analysis; it will benefit anyone using the PCA 
> analysis.
> *Q6. What are the risks?*
> You still need a threshold to decide whether the matrix is sparse or not. 
> There is no universally defined value for sparsity.
>  Therefore the threshold of 50% that I chose is arbitrary, and might not be 
> the best in all cases.
>  Still, I consider it an improvement compared to the current implementation.
> *Q7. How long will it take?*
> The change is easy to implement and test.
> *Q8. What are the mid-term and final “exams” to check for success?*
> We can check that the sparsity is calculated correctly and that there are no 
> "unexplainable" differences from the current implementation.
>  "Unexplainable" meaning anything beyond the fact that there will be cases 
> where it now uses computeDenseVectorCovariance whereas before it used 
> computeSparseVectorCovariance, and vice versa.
> The tests in the RowMatrixSuite are successful.
>  A couple of tests have been added to verify the sparsity is calculated 
> properly.
> *Appendix A. Proposed API Changes. Optional section defining APIs changes, if 
> any. Backward and forward compatibility must be taken into account.*
> defining a function to calculate the sparsity of the RowMatrix
> {code:java}
>   def calcSparsity(): Double = {
> rows.map{ vec => (vec.size-vec.numNonzeros) }.sum/
>   (rows.count() * rows.take(1)(0).size).toDouble
>   }
> {code}
> Changing the switching condition. Before was:
> {code:java}
> if (rows.first().isInstanceOf[DenseVector]) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}
> Proposed new code is:
> {code:java}
> val sparsityThreshold = 0.5
> val sparsity = calcSparsity()
> if (sparsity < sparsityThreshold) {
>   computeDenseVectorCovariance(mean, n, m)
> } else {
>   computeSparseVectorCovariance(mean, n, m)
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358707#comment-17358707
 ] 

Erik Krogen commented on SPARK-35667:
-

fyi [~vsowrirajan] [~ron8hu]

> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
>
> SQL as follow gets incorrect results sometimes when spark.speculation is 
> true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the count result is less than the correct one. 
> It's more likely to get incorrect result when there are more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35662) Support Timestamp without time zone data type

2021-06-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35662:
---
Description: 
Spark SQL today supports the TIMESTAMP data type. However the semantics 
provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
session local timezone and cast to UTC before being processed.
 These are desirable semantics in many cases, such as when dealing with 
calendars.
 In many (more) other cases, such as when dealing with log files it is 
desirable that the provided timestamps not be altered.
 SQL users expect that they can model either behavior and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
 Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will 
be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not 
exist in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIMEZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
standard semantic.
 Using these two types will provide clarity.
 We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
h3. Milestone 1 – Spark Timestamp equivalency ( The new Timestamp type 
TimestampWithoutTZ meets or exceeds all function of the existing SQL Timestamp):
 * Add a new DataType implementation for TimestampWithoutTZ.
 * Support TimestampWithoutTZ in Dataset/UDF.
 * TimestampWithoutTZ literals
 * TimestampWithoutTZ arithmetic(e.g. TimestampWithoutTZ - TimestampWithoutTZ, 
TimestampWithoutTZ - Date)
 * Datetime functions/operators: dayofweek, weekofyear, year, etc
 * Cast to and from TimestampWithoutTZ, cast String/Timestamp to 
TimestampWithoutTZ, cast TimestampWithoutTZ to string (pretty 
printing)/Timestamp, with the SQL syntax to specify the types
 * Support sorting TimestampWithoutTZ.

h3. Milestone 2 – Persistence:
 * Ability to create tables of type TimestampWithoutTZ
 * Ability to write to common file formats such as Parquet and JSON.
 * INSERT, SELECT, UPDATE, MERGE
 * Discovery

h3. Milestone 3 – Client support
 * JDBC support
 * Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration
 * Python UDF can take and return TimestampWithoutTZ
 * DataFrame support
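
To make the Milestone 1 items concrete, here is an illustrative sketch only: the 
SQL keyword and the type name were still in flux at this point (this very thread 
moves between TimestampNTZ and TimestampWithoutTZ), so the exact spelling below 
is an assumption rather than the final syntax.
{code:java}
// Assumed syntax, for illustration of the intended semantics only.
val df = spark.sql(
  "SELECT CAST('2021-06-07 12:00:00' AS TIMESTAMP WITHOUT TIME ZONE) AS ts")
df.printSchema()
// Unlike the existing TIMESTAMP (WITH LOCAL TIME ZONE), changing the session
// time zone should not change the value of `ts`.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
{code}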

  was:
Spark SQL today supports the TIMESTAMP data type. However the semantics 
provided actually match TIMESTAMP WITH LOCAL TIMEZONE as defined by Oracle. 
Timestamps embedded in a SQL query or passed through JDBC are presumed to be in 
session local timezone and cast to UTC before being processed.
 These are desirable semantics in many cases, such as when dealing with 
calendars.
 In many (more) other cases, such as when dealing with log files it is 
desirable that the provided timestamps not be altered.
 SQL users expect that they can model either behavior and do so by using 
TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH 
LOCAL TIME ZONE for time zone sensitive data.
 Most traditional RDBMS map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE and will 
be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that does not 
exist in the standard.

In this new feature, we will introduce TIMESTAMP WITH LOCAL TIMEZONE to 
describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for 
standard semantic.
 Using these two types will provide clarity.
 We will also allow users to set the default behavior for TIMESTAMP to either 
use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
h3. Milestone 1 – Spark Timestamp equivalency ( The new Timestamp type 
TimestampNTZ meets or exceeds all function of the existing SQL Timestamp):
 * Add a new DataType implementation for TimestampNTZ.
 * Support TimestampNTZ in Dataset/UDF.
 * TimestampNTZ literals
 * TimestampNTZ arithmetic(e.g. TimestampNTZ - TimestampNTZ, TimestampNTZ - 
Date)
 * Datetime functions/operators: dayofweek, weekofyear, year, etc
 * Cast to and from TimestampNTZ, cast String/Timestamp to TimestampNTZ, cast 
TimestampNTZ to string (pretty printing)/Timestamp, with the SQL syntax to 
specify the types
 * Support sorting TimestampNTZ.

h3. Milestone 2 – Persistence:
 * Ability to create tables of type TimestampNTZ
 * Ability to write to common file formats such as Parquet and JSON.
 * INSERT, SELECT, UPDATE, MERGE
 * Discovery

h3. Milestone 3 – Client support
 * JDBC support
 * Hive Thrift server

h3. Milestone 4 – PySpark and Spark R integration
 * Python UDF can take and return TimestampNTZ
 * DataFrame support


> Support Timestamp without time zone data type
> -
>
> Key: SPARK-35662
> URL: https://issues.apache

[jira] [Updated] (SPARK-35663) Add Timestamp without time zone type

2021-06-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35663:
---
Description: 
Extend Catalyst's type system by a new type that conforms to the SQL standard 
(see SQL:2016, section 4.6.2):

* TimestampWithoutTZ represents the timestamp without time zone type

  was:
Extend Catalyst's type system by a new type that conforms to the SQL standard 
(see SQL:2016, section 4.6.2):

* TimestampNTZType represents the timestamp without time zone type


> Add Timestamp without time zone type
> 
>
> Key: SPARK-35663
> URL: https://issues.apache.org/jira/browse/SPARK-35663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Extend Catalyst's type system by a new type that conforms to the SQL standard 
> (see SQL:2016, section 4.6.2):
> * TimestampWithoutTZ represents the timestamp without time zone type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35664) Support java.time. LocalDateTime as an external type of TimestampWithoutTZ type

2021-06-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35664:
---
Summary: Support java.time. LocalDateTime as an external type of 
TimestampWithoutTZ type  (was: Support java.time. LocalDateTime as an external 
type of TimestampNTZ type)

> Support java.time. LocalDateTime as an external type of TimestampWithoutTZ 
> type
> ---
>
> Key: SPARK-35664
> URL: https://issues.apache.org/jira/browse/SPARK-35664
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Allow parallelization/collection of java.time.LocalDateTime values, and 
> convert the values to timestamp values of TimestampNTZType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35664) Support java.time. LocalDateTime as an external type of TimestampWithoutTZ type

2021-06-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35664:
---
Description: Allow parallelization/collection of java.time.LocalDateTime 
values, and convert the values to timestamp values of TimestampWithoutTZType.  
(was: Allow parallelization/collection of java.time.LocalDateTime values, and 
convert the values to timestamp values of TimestampNTZType.)
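
Not part of the ticket: a sketch of the intended round trip, assuming the 
java.time.LocalDateTime encoder that this sub-task introduces is available 
through spark.implicits._ (it was not yet released at the time of this thread).
{code:java}
// Illustrative only; relies on the LocalDateTime encoder added by this sub-task.
import java.time.LocalDateTime
import spark.implicits._

val ds = Seq(LocalDateTime.parse("2021-06-07T12:00:00")).toDS()
ds.printSchema() // expected: a timestamp column of the new time-zone-free type
ds.collect()     // expected: Array(2021-06-07T12:00), unaffected by session TZ
{code}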

> Support java.time. LocalDateTime as an external type of TimestampWithoutTZ 
> type
> ---
>
> Key: SPARK-35664
> URL: https://issues.apache.org/jira/browse/SPARK-35664
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Allow parallelization/collection of java.time.LocalDateTime values, and 
> convert the values to timestamp values of TimestampWithoutTZType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35074) spark.jars.xxx configs should be moved to config/package.scala

2021-06-07 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-35074:
-

Assignee: dc-heros

> spark.jars.xxx configs should be moved to config/package.scala
> --
>
> Key: SPARK-35074
> URL: https://issues.apache.org/jira/browse/SPARK-35074
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: dc-heros
>Priority: Trivial
> Fix For: 3.2.0
>
>
> Currently {{spark.jars.xxx}} property keys (e.g. {{spark.jars.ivySettings}} 
> and {{spark.jars.packages}}) are hardcoded in multiple places within Spark 
> code across multiple modules. We should define them in 
> {{config/package.scala}} and reference them in all other places.
> This came up during reviews of SPARK-34472 at 
> https://github.com/apache/spark/pull/31591#discussion_r584848624



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32922) Add support for ShuffleBlockFetcherIterator to read from merged shuffle partitions and to fallback to original shuffle blocks if encountering failures

2021-06-07 Thread Chandni Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358768#comment-17358768
 ] 

Chandni Singh commented on SPARK-32922:
---

Splitting the changes into ESS server side/client side changes as per the 
comment here: https://github.com/apache/spark/pull/32140#issuecomment-856099709

> Add support for ShuffleBlockFetcherIterator to read from merged shuffle 
> partitions and to fallback to original shuffle blocks if encountering failures
> --
>
> Key: SPARK-32922
> URL: https://issues.apache.org/jira/browse/SPARK-32922
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
>
> With the extended MapOutputTracker, the reducers can now get the task input 
> data from the merged shuffle partitions for more efficient shuffle data 
> fetch. The reducers should also be able to fallback to fetching the original 
> unmarked blocks if it encounters failures when fetching the merged shuffle 
> partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1

2021-06-07 Thread David Christle (Jira)
David Christle created SPARK-35670:
--

 Summary: Upgrade ZSTD-JNI to 1.5.0-1
 Key: SPARK-35670
 URL: https://issues.apache.org/jira/browse/SPARK-35670
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.2.0
Reporter: David Christle






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35671) Add Support in the ESS to serve merged shuffle block meta and data to executors

2021-06-07 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-35671:
-

 Summary: Add Support in the ESS to serve merged shuffle block meta 
and data to executors
 Key: SPARK-35671
 URL: https://issues.apache.org/jira/browse/SPARK-35671
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.1.0
Reporter: Chandni Singh


With push-based shuffle enabled, the reducers send 2 different requests to the 
ESS:

1. Request to fetch the metadata of the merged shuffle block.
2. Requests to fetch the data of the merged shuffle block in chunks which are 
by default 2MB in size.
The ESS should be able to serve these requests and this Jira targets all the 
changes in the ESS to be able to support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35670:


Assignee: (was: Apache Spark)

> Upgrade ZSTD-JNI to 1.5.0-1
> ---
>
> Key: SPARK-35670
> URL: https://issues.apache.org/jira/browse/SPARK-35670
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: David Christle
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35670:


Assignee: Apache Spark

> Upgrade ZSTD-JNI to 1.5.0-1
> ---
>
> Key: SPARK-35670
> URL: https://issues.apache.org/jira/browse/SPARK-35670
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: David Christle
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358786#comment-17358786
 ] 

Apache Spark commented on SPARK-35670:
--

User 'dchristle' has created a pull request for this issue:
https://github.com/apache/spark/pull/32809

> Upgrade ZSTD-JNI to 1.5.0-1
> ---
>
> Key: SPARK-35670
> URL: https://issues.apache.org/jira/browse/SPARK-35670
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: David Christle
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358788#comment-17358788
 ] 

Apache Spark commented on SPARK-35670:
--

User 'dchristle' has created a pull request for this issue:
https://github.com/apache/spark/pull/32809

> Upgrade ZSTD-JNI to 1.5.0-1
> ---
>
> Key: SPARK-35670
> URL: https://issues.apache.org/jira/browse/SPARK-35670
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: David Christle
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35671) Add Support in the ESS to serve merged shuffle block meta and data to executors

2021-06-07 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-35671:
--
Description: 
With push-based shuffle enabled, the reducers send 2 different requests to the 
ESS:

1. Request to fetch the metadata of the merged shuffle block.
 2. Requests to fetch the data of the merged shuffle block in chunks which are 
by default 2MB in size.
 The ESS should be able to serve these requests and this Jira targets all the 
changes in the network-common and network-shuffle modules to be able to support 
this.

  was:
With push-based shuffle enabled, the reducers send 2 different requests to the 
ESS:

1. Request to fetch the metadata of the merged shuffle block.
2. Requests to fetch the data of the merged shuffle block in chunks which are 
by default 2MB in size.
The ESS should be able to serve these requests and this Jira targets all the 
changes in the ESS to be able to support this.


> Add Support in the ESS to serve merged shuffle block meta and data to 
> executors
> ---
>
> Key: SPARK-35671
> URL: https://issues.apache.org/jira/browse/SPARK-35671
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> With push-based shuffle enabled, the reducers send 2 different requests to 
> the ESS:
> 1. Request to fetch the metadata of the merged shuffle block.
>  2. Requests to fetch the data of the merged shuffle block in chunks which 
> are by default 2MB in size.
>  The ESS should be able to serve these requests and this Jira targets all the 
> changes in the network-common and network-shuffle modules to be able to 
> support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34791) SparkR throws node stack overflow

2021-06-07 Thread obfuscated_dvlper (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358839#comment-17358839
 ] 

obfuscated_dvlper commented on SPARK-34791:
---

Hi team, any updates on this? I tried with Spark 3.1.2 and am still getting 
"Error: node stack overflow". It looks like this 
[https://github.com/apache/spark/pull/26429#discussion_r346103050] was never 
addressed?

> SparkR throws node stack overflow
> -
>
> Key: SPARK-34791
> URL: https://issues.apache.org/jira/browse/SPARK-34791
> Project: Spark
>  Issue Type: Question
>  Components: SparkR
>Affects Versions: 3.0.1
>Reporter: obfuscated_dvlper
>Priority: Major
>
> SparkR throws "node stack overflow" error upon running code (sample below) on 
> R-4.0.2 with Spark 3.0.1.
> Same piece of code works on R-3.3.3 with Spark 2.2.1 (& SparkR 2.4.5)
> {code:java}
> source('sample.R')
> myclsr = myclosure_func()
> myclsr$get_some_date('2021-01-01')
> ## spark.lapply throws node stack overflow
> result = spark.lapply(c('2021-01-01', '2021-01-02'), function (rdate) {
> source('sample.R')
> another_closure = myclosure_func()
> return(another_closure$get_some_date(rdate))
> })
> {code}
> Sample.R
> {code:java}
> ## util function, which calls itself
> getPreviousBusinessDate <- function(asofdate) {
> asdt <- asofdate;
> asdt <- as.Date(asofdate)-1;
> wd <- format(as.Date(asdt),"%A")
> if(wd == "Saturday" | wd == "Sunday") {
> return (getPreviousBusinessDate(asdt));
> }
> return (asdt);
> }
> ## closure which calls util function
> myclosure_func = function() {
> myclosure = list()
> get_some_date = function (random_date) {
> return (getPreviousBusinessDate(random_date))
> }
> myclosure$get_some_date = get_some_date
> return(myclosure)
> }
> {code}
> This seems to have been caused by sourcing sample.R twice: once before 
> invoking the Spark session and once within the Spark session.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-07 Thread Erik Krogen (Jira)
Erik Krogen created SPARK-35672:
---

 Summary: Spark fails to launch executors with very large user 
classpath lists on YARN
 Key: SPARK-35672
 URL: https://issues.apache.org/jira/browse/SPARK-35672
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 3.1.2
 Environment: Linux RHEL7
Spark 3.1.1
Reporter: Erik Krogen


When running Spark on YARN, the {{user-class-path}} argument to 
{{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
executor processes. The argument is specified once for each JAR, and the URIs 
are fully-qualified, so the paths can be quite long. With large user JAR lists 
(say 1000+), this can result in system-level argument length limits being 
exceeded, typically manifesting as the error message:
{code}
/bin/bash: Argument list too long
{code}

A [Google 
search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
 indicates that this is not a theoretical problem and afflicts real users, 
including ours. This issue was originally observed on Spark 2.3, but has been 
confirmed to exist in the master branch as well.
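
Not from the ticket: a rough, back-of-the-envelope estimate of how quickly the 
per-jar {{--user-class-path}} arguments add up; the jar count and URI length 
below are assumptions, and the actual limit is system dependent (compare against 
{{getconf ARG_MAX}} on the host).
{code:java}
// Illustrative arithmetic only; numbers are assumed, not measured.
val numJars = 1000        // a large user dependency list
val avgUriLength = 250    // fully qualified HDFS URIs can easily be this long
val perJarFlag = "--user-class-path ".length
val approxBytes = numJars.toLong * (avgUriLength + perJarFlag)
println(s"approx bytes added to the executor launch command: $approxBytes")
{code}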



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35343) Make the conversion from/to pandas data-type-based for non-ExtensionDtypes

2021-06-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35343.
---
Fix Version/s: 3.2.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 32592
https://github.com/apache/spark/pull/32592

> Make the conversion from/to pandas data-type-based for non-ExtensionDtypes
> --
>
> Key: SPARK-35343
> URL: https://issues.apache.org/jira/browse/SPARK-35343
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> The conversion from/to pandas includes logic for checking data types and 
> behaving accordingly.
> That makes code hard to change or maintain.
> Since we have introduced the Ops class per non-ExtensionDtypes data type, we 
> ought to make the conversion from/to pandas data-type-based.
> Ops class per ExtensionDtype and its data-type-based from/to pandas will be 
> implemented in a separate PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35672:


Assignee: (was: Apache Spark)

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358849#comment-17358849
 ] 

Apache Spark commented on SPARK-35672:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/32810

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Priority: Major
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35672:


Assignee: Apache Spark

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Apache Spark
>Priority: Major
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22&oq=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35671) Add Support in the ESS to serve merged shuffle block meta and data to executors

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358885#comment-17358885
 ] 

Apache Spark commented on SPARK-35671:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/32811

> Add Support in the ESS to serve merged shuffle block meta and data to 
> executors
> ---
>
> Key: SPARK-35671
> URL: https://issues.apache.org/jira/browse/SPARK-35671
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> With push-based shuffle enabled, the reducers send 2 different requests to 
> the ESS:
> 1. Request to fetch the metadata of the merged shuffle block.
>  2. Requests to fetch the data of the merged shuffle block in chunks which 
> are by default 2MB in size.
>  The ESS should be able to serve these requests and this Jira targets all the 
> changes in the network-common and network-shuffle modules to be able to 
> support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35671) Add Support in the ESS to serve merged shuffle block meta and data to executors

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35671:


Assignee: Apache Spark

> Add Support in the ESS to serve merged shuffle block meta and data to 
> executors
> ---
>
> Key: SPARK-35671
> URL: https://issues.apache.org/jira/browse/SPARK-35671
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Assignee: Apache Spark
>Priority: Major
>
> With push-based shuffle enabled, the reducers send two different kinds of 
> requests to the ESS:
> 1. A request to fetch the metadata of the merged shuffle block.
>  2. Requests to fetch the data of the merged shuffle block in chunks, which 
> are 2 MB in size by default.
>  The ESS should be able to serve these requests; this Jira covers the changes 
> in the network-common and network-shuffle modules needed to support them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35671) Add Support in the ESS to serve merged shuffle block meta and data to executors

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35671:


Assignee: (was: Apache Spark)

> Add Support in the ESS to serve merged shuffle block meta and data to 
> executors
> ---
>
> Key: SPARK-35671
> URL: https://issues.apache.org/jira/browse/SPARK-35671
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> With push-based shuffle enabled, the reducers send two different kinds of 
> requests to the ESS:
> 1. A request to fetch the metadata of the merged shuffle block.
>  2. Requests to fetch the data of the merged shuffle block in chunks, which 
> are 2 MB in size by default.
>  The ESS should be able to serve these requests; this Jira covers the changes 
> in the network-common and network-shuffle modules needed to support them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35341) Introduce BooleanExtensionOps

2021-06-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-35341.
---
Fix Version/s: 3.2.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 32698
https://github.com/apache/spark/pull/32698

> Introduce BooleanExtensionOps
> -
>
> Key: SPARK-35341
> URL: https://issues.apache.org/jira/browse/SPARK-35341
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Now {{__and__}}, {{__or__}}, {{__rand__}}, and {{__ror__}} are not data type 
> based.
> So we would like to introduce these operators for the Boolean Spark type IndexOps.
> These bitwise operators should also be applicable to other data type classes; 
> however, that is not the goal of this PR.
> extension_dtypes processes these operators differently from the rest of the 
> types, so BooleanExtensionOps is introduced.
>  
>  
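
For context on the dunder protocol involved, a plain-Python sketch of data-type-based boolean operators over pandas' nullable boolean extension dtype; the class below is illustrative only, not the actual BooleanExtensionOps implementation:
{code:python}
import pandas as pd

class BooleanOpsSketch:
    """Illustrative wrapper showing __and__/__or__/__rand__/__ror__ dispatch
    over a nullable boolean Series (pd.NA follows Kleene logic)."""

    def __init__(self, data):
        self._s = pd.Series(data, dtype="boolean")  # extension dtype, keeps pd.NA

    def __and__(self, other):
        return self._s & other

    def __rand__(self, other):
        return other & self._s

    def __or__(self, other):
        return self._s | other

    def __ror__(self, other):
        return other | self._s


ops = BooleanOpsSketch([True, False, None])
print(ops & True)   # True, False, <NA>
print(False | ops)  # dispatch falls back to __ror__: True, False, <NA>
{code}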



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian King updated SPARK-34591:

Attachment: (was: Reproducible example of Spark bug.pdf)

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is a separate and user-controllable step.
>  
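
For readers without the attachment, a rough PySpark sketch of the kind of check involved (the random data here is only a stand-in; the attached PDF remains the authoritative reproduction):
{code:python}
import random
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.master("local[2]").getOrCreate()

# Stand-in data set: a binary label and a single numeric feature.
rows = [(float(random.randint(0, 1)), random.random()) for _ in range(1000)]
df = VectorAssembler(inputCols=["x"], outputCol="features") \
    .transform(spark.createDataFrame(rows, ["label", "x"]))

# There is no prune/pruning parameter to pass here: the Scala-side flag is not
# exposed, so whatever pruning happens after fitting cannot be switched off.
model = DecisionTreeClassifier(maxDepth=5).fit(df)
print(model.toDebugString)  # inspect how many splits survived
{code}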



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian King updated SPARK-34591:

Priority: Major  (was: Minor)

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is a separate and user-controllable step.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian King updated SPARK-34591:

Attachment: Reproducible example of Spark bug.pdf

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is a separate and user-controllable step.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to d

2021-06-07 Thread Julian King (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian King updated SPARK-34591:

Attachment: Reproducible example of Spark bug.pdf

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is a separate and user-controllable step.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Julian King (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358953#comment-17358953
 ] 

Julian King commented on SPARK-34591:
-

Here is a reproducible example of this bug which demonstrates a worst-case 
outcome, i.e. the tree has no splits whatsoever. 

[^Reproducible example of Spark bug.pdf]

 

[~srowen], I disagree about this being minor. Based on this, I consider both 
DecisionTreeClassifier and RandomForestClassifier to be functionally broken. 

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the gen

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358955#comment-17358955
 ] 

Sean R. Owen commented on SPARK-34591:
--

I think it's OK to at least expose the parameter in Pyspark, if that seems to 
address it.
Off the top of my head, in this example, there is no real signal as the 
features and label are random. It's perhaps just not able to find any way to 
split the data, given defaults, that results in progress in predicting the 
label?

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally

[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Julian King (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358956#comment-17358956
 ] 

Julian King commented on SPARK-34591:
-

The fact that there's no signal isn't the issue. The issue is that Spark is 
undertaking modifications to the tree in a way that is not consistent with 
what the documentation describes.

This is a trivial example with no signal, but I've seen the same behaviour on 
actual data sets that definitely have signal (which I cannot share for 
confidentiality reasons). 

Mechanically, decision trees can always keep splitting until there is only one 
data point per node. Using either the entropy or gini loss functions 
mathematically guarantees that you can find a split with an improved objective 
function in the sum of the children vs the parent. 

[~CBribiescas] has confirmed that when you disable the pruning parameter you 
get the desired behaviour. We will endeavour to submit a pull request soon. 

The original PR for SPARK-3159 should never have been accepted in the first 
place because it modifies the DT algo in a non-trivial way, and this change was 
not described in the Spark documentation.
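
As a concrete instance of that point, a quick Gini check on a toy split (the counts are made up for illustration):
{code:python}
def gini(pos, neg):
    """Gini impurity of a node containing pos positive and neg negative examples."""
    p = pos / (pos + neg)
    return 2 * p * (1 - p)

# Toy parent node with 6 positives and 4 negatives.
parent = gini(6, 4)                                        # 0.48

# A split into children (5 pos, 1 neg) and (1 pos, 3 neg) lowers the
# size-weighted impurity of the children relative to the parent.
children = (6 / 10) * gini(5, 1) + (4 / 10) * gini(1, 3)   # ~0.317

print(parent, children, children < parent)                 # ... True
{code}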

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the pruning described 
> above:
>  * Occurring always (even when the user may not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the Spark detailed 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The pr

[jira] [Created] (SPARK-35673) Spark fails on unrecognized hint in subquery

2021-06-07 Thread Willi Raschkowski (Jira)
Willi Raschkowski created SPARK-35673:
-

 Summary: Spark fails on unrecognized hint in subquery
 Key: SPARK-35673
 URL: https://issues.apache.org/jira/browse/SPARK-35673
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.1.2, 3.1.1, 3.0.2
Reporter: Willi Raschkowski


Spark fails on unrecognized hint in subquery.

To reproduce, try
{code:sql}
-- This succeeds with warning
SELECT /*+ use_hash */ 42;

-- This fails
SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
{code}

The first statement gives you
{code}
21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
42
{code}
while the second statement gives you
{code}
21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
Error in query: unresolved operator 'Project [*];
'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
   +- Project [42 AS 42#2]
      +- OneRowRelation
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35673) Spark fails on unrecognized hint in subquery

2021-06-07 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-35673:
--
Description: 
Spark fails on unrecognized hint in subquery.

To reproduce:
{code:sql}
SELECT /*+ use_hash */ 42;
-- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- 42

SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
-- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- Error in query: unresolved operator 'Project [*];
-- 'Project [*]
-- +- SubqueryAlias __auto_generated_subquery_name
--+- Project [42 AS 42#2]
--   +- OneRowRelation
{code}


  was:
Spark fails on unrecognized hint in subquery.

To reproduce, try
{code:sql}
-- This succeeds with warning
SELECT /*+ use_hash */ 42;

-- This fails
SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
{code}

The first statement gives you
{code}
21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
42
{code}
while the second statement gives you
{code}
21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
Error in query: unresolved operator 'Project [*];
'Project [*]
+- SubqueryAlias __auto_generated_subquery_name
   +- Project [42 AS 42#2]
      +- OneRowRelation
{code}


> Spark fails on unrecognized hint in subquery
> 
>
> Key: SPARK-35673
> URL: https://issues.apache.org/jira/browse/SPARK-35673
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1, 3.1.2
>Reporter: Willi Raschkowski
>Priority: Major
>
> Spark fails on unrecognized hint in subquery.
> To reproduce:
> {code:sql}
> SELECT /*+ use_hash */ 42;
> -- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- 42
> SELECT *
> FROM (
> SELECT /*+ use_hash */ 42
> );
> -- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- Error in query: unresolved operator 'Project [*];
> -- 'Project [*]
> -- +- SubqueryAlias __auto_generated_subquery_name
> --    +- Project [42 AS 42#2]
> --       +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35673) Spark fails on unrecognized hint in subquery

2021-06-07 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-35673:
--
Description: 
Spark queries fail on unrecognized hints in subqueries. An example to 
reproduce:
{code:sql}
SELECT /*+ use_hash */ 42;
-- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- 42

SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
-- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- Error in query: unresolved operator 'Project [*];
-- 'Project [*]
-- +- SubqueryAlias __auto_generated_subquery_name
--    +- Project [42 AS 42#2]
--       +- OneRowRelation
{code}

  was:
Spark queries seem to fail on unrecognized hints in subqueries. An example to 
reproduce:
{code:sql}
SELECT /*+ use_hash */ 42;
-- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- 42

SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
-- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- Error in query: unresolved operator 'Project [*];
-- 'Project [*]
-- +- SubqueryAlias __auto_generated_subquery_name
--    +- Project [42 AS 42#2]
--       +- OneRowRelation
{code}


> Spark fails on unrecognized hint in subquery
> 
>
> Key: SPARK-35673
> URL: https://issues.apache.org/jira/browse/SPARK-35673
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1, 3.1.2
>Reporter: Willi Raschkowski
>Priority: Major
>
> Spark queries fail on unrecognized hints in subqueries. An example to 
> reproduce:
> {code:sql}
> SELECT /*+ use_hash */ 42;
> -- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- 42
> SELECT *
> FROM (
> SELECT /*+ use_hash */ 42
> );
> -- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- Error in query: unresolved operator 'Project [*];
> -- 'Project [*]
> -- +- SubqueryAlias __auto_generated_subquery_name
> --    +- Project [42 AS 42#2]
> --       +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35673) Spark fails on unrecognized hint in subquery

2021-06-07 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-35673:
--
Description: 
Spark queries seem to fail on unrecognized hints in subqueries. An example to 
reproduce:
{code:sql}
SELECT /*+ use_hash */ 42;
-- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- 42

SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
-- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- Error in query: unresolved operator 'Project [*];
-- 'Project [*]
-- +- SubqueryAlias __auto_generated_subquery_name
--    +- Project [42 AS 42#2]
--       +- OneRowRelation
{code}

  was:
Spark fails on unrecognized hint in subquery.

To reproduce:
{code:sql}
SELECT /*+ use_hash */ 42;
-- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- 42

SELECT *
FROM (
SELECT /*+ use_hash */ 42
);
-- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
-- Error in query: unresolved operator 'Project [*];
-- 'Project [*]
-- +- SubqueryAlias __auto_generated_subquery_name
--    +- Project [42 AS 42#2]
--       +- OneRowRelation
{code}



> Spark fails on unrecognized hint in subquery
> 
>
> Key: SPARK-35673
> URL: https://issues.apache.org/jira/browse/SPARK-35673
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1, 3.1.2
>Reporter: Willi Raschkowski
>Priority: Major
>
> Spark queries seem to fail on unrecognized hints in subqueries. An example to 
> reproduce:
> {code:sql}
> SELECT /*+ use_hash */ 42;
> -- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- 42
> SELECT *
> FROM (
> SELECT /*+ use_hash */ 42
> );
> -- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- Error in query: unresolved operator 'Project [*];
> -- 'Project [*]
> -- +- SubqueryAlias __auto_generated_subquery_name
> --    +- Project [42 AS 42#2]
> --       +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35673) Spark fails on unrecognized hint in subquery

2021-06-07 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated SPARK-35673:
--
Issue Type: Bug  (was: Task)

> Spark fails on unrecognized hint in subquery
> 
>
> Key: SPARK-35673
> URL: https://issues.apache.org/jira/browse/SPARK-35673
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.1, 3.1.2
>Reporter: Willi Raschkowski
>Priority: Major
>
> Spark queries fail on unrecognized hints in subqueries. An example to 
> reproduce:
> {code:sql}
> SELECT /*+ use_hash */ 42;
> -- 21/06/08 01:28:05 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- 42
> SELECT *
> FROM (
> SELECT /*+ use_hash */ 42
> );
> -- 21/06/08 01:28:07 WARN HintErrorLogger: Unrecognized hint: use_hash()
> -- Error in query: unresolved operator 'Project [*];
> -- 'Project [*]
> -- +- SubqueryAlias __auto_generated_subquery_name
> --    +- Project [42 AS 42#2]
> --       +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29626) notEqual() should return true when the one is null, the other is not null

2021-06-07 Thread dc-heros (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358969#comment-17358969
 ] 

dc-heros commented on SPARK-29626:
--

I would like to work on this

> notEqual() should return true when the one is null, the other is not null
> -
>
> Key: SPARK-29626
> URL: https://issues.apache.org/jira/browse/SPARK-29626
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: zhouhuazheng
>Priority: Minor
>
> When one value is null and the other is not null, we hope the notEqual() 
> function returns true.
> e.g.:
> scala> df.show()
> +------+-------+
> |   age|   name|
> +------+-------+
> |  null|Michael|
> |    30|   Andy|
> |    19| Justin|
> |    35|   null|
> |    19| Justin|
> |  null|   null|
> |Justin| Justin|
> |    19|     19|
> +------+-------+
> scala> df.filter(col("age").notEqual(col("name"))).show
> +---+------+
> |age|  name|
> +---+------+
> | 30|  Andy|
> | 19|Justin|
> | 19|Justin|
> +---+------+
> scala> df.filter(col("age").equalTo(col("name"))).show
> +------+------+
> |   age|  name|
> +------+------+
> |  null|  null|
> |Justin|Justin|
> |    19|    19|
> +------+------+
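
Note that a null-aware "not equal" can already be written with the null-safe equality operator; a small PySpark sketch mirroring the Scala example above (data made up to match it):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[2]").getOrCreate()
df = spark.createDataFrame(
    [(None, "Michael"), ("30", "Andy"), ("19", "Justin"), ("35", None),
     ("19", "Justin"), (None, None), ("Justin", "Justin"), ("19", "19")],
    ["age", "name"])

# Plain inequality follows SQL three-valued logic: NULL <> x is NULL, so those rows drop out.
df.filter(col("age") != col("name")).show()

# A null-safe "not equal" also keeps rows where exactly one side is NULL.
df.filter(~col("age").eqNullSafe(col("name"))).show()
{code}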



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuanxm updated SPARK-35667:
---
Attachment: image-2021-06-08-10-02-34-979.png

> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
> Attachments: image-2021-06-08-10-02-34-979.png
>
>
> The following SQL sometimes returns incorrect results when spark.speculation 
> is true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the count result is less than the correct one. 
> An incorrect result is more likely when there are more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35667) spark.speculation causes incorrect query results with TRANSFORM

2021-06-07 Thread yuanxm (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358998#comment-17358998
 ] 

yuanxm commented on SPARK-35667:


FYI, here is a strange thing: the input size/records of the SUCCESS task is 
less than that of the KILLED task. [~xkrogen] [~vsowrirajan] [~ron8hu]

!image-2021-06-08-10-02-34-979.png!

> spark.speculation causes incorrect query results with TRANSFORM
> ---
>
> Key: SPARK-35667
> URL: https://issues.apache.org/jira/browse/SPARK-35667
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8
>Reporter: yuanxm
>Priority: Major
> Attachments: image-2021-06-08-10-02-34-979.png
>
>
> The following SQL sometimes returns incorrect results when spark.speculation 
> is true: 
> {code:java}
> SELECT count(1)
> FROM
>   (SELECT TRANSFORM(tmpa1.*) USING "python test.py" AS (dt)
>FROM
>  (SELECT dt
>   FROM test_table)tmpa1)tmpa2{code}
> With spark.speculation=true, the count result is less than the correct one. 
> An incorrect result is more likely when there are more speculative 
> tasks. 
> `test.py`:
> {code:java}
> import sys
> for line in sys.stdin:
> line = line.strip()
> arr = line.split()
> print "\t".join(arr){code}
>  
> spark-sql command:
> {code:java}
> ./bin/spark-sql --master yarn \ 
> --conf spark.speculation=true \ 
> --conf spark.shuffle.service.enabled=true \ 
> --conf spark.dynamicAllocation.enabled=true \ 
> --conf spark.dynamicAllocation.executorIdleTimeout=5s \ 
> --conf spark.dynamicAllocation.initialExecutor=1 \ 
> --conf spark.dynamicAllocation.maxExecutors=40
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35638) Introduce InternalField to manage dtypes and StructFields.

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35638.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32775
[https://github.com/apache/spark/pull/32775]

> Introduce InternalField to manage dtypes and StructFields.
> --
>
> Key: SPARK-35638
> URL: https://issues.apache.org/jira/browse/SPARK-35638
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently there are some performance issues in the pandas-on-Spark layer.
> One of them is accessing the Java DataFrame and running the analysis phase too 
> many times, especially just to retrieve the current column names or data types.
> We should reduce the amount of unnecessary access.
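
A minimal sketch of the caching idea (names and fields below are illustrative only, not the actual InternalField from the pull request):
{code:python}
from dataclasses import dataclass
from typing import Optional

import numpy as np
from pyspark.sql.types import LongType, StructField

@dataclass(frozen=True)
class CachedField:
    """Illustrative holder pairing a pandas dtype with its Spark StructField, so
    lookups of column name / data type need no extra JVM analysis round trip."""
    dtype: np.dtype
    struct_field: Optional[StructField] = None

    @property
    def name(self) -> Optional[str]:
        return self.struct_field.name if self.struct_field is not None else None

field = CachedField(dtype=np.dtype("int64"), struct_field=StructField("id", LongType()))
print(field.name, field.dtype)  # answered locally, without touching the Java DataFrame
{code}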



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35638) Introduce InternalField to manage dtypes and StructFields.

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35638:


Assignee: Takuya Ueshin

> Introduce InternalField to manage dtypes and StructFields.
> --
>
> Key: SPARK-35638
> URL: https://issues.apache.org/jira/browse/SPARK-35638
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>
> Currently there are some performance issues in the pandas-on-Spark layer.
> One of them is accessing the Java DataFrame and running the analysis phase too 
> many times, especially just to retrieve the current column names or data types.
> We should reduce the amount of unnecessary access.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35603) Add data source options link for R API documentation.

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35603.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32797
[https://github.com/apache/spark/pull/32797]

> Add data source options link for R API documentation.
> -
>
> Key: SPARK-35603
> URL: https://issues.apache.org/jira/browse/SPARK-35603
> Project: Spark
>  Issue Type: Documentation
>  Components: docs, R
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.2.0
>
>
> We should add the data source options link for the R documentation as well, as 
> we did in https://issues.apache.org/jira/browse/SPARK-34491 .



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35603) Add data source options link for R API documentation.

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35603:


Assignee: Haejoon Lee

> Add data source options link for R API documentation.
> -
>
> Key: SPARK-35603
> URL: https://issues.apache.org/jira/browse/SPARK-35603
> Project: Spark
>  Issue Type: Documentation
>  Components: docs, R
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We should add the data source options link for the R documentation as well, as 
> we did in https://issues.apache.org/jira/browse/SPARK-34491 .



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-35668:


Assignee: Yikun Jiang

> Use "concurrency" setting on Github Actions
> ---
>
> Key: SPARK-35668
> URL: https://issues.apache.org/jira/browse/SPARK-35668
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.2
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> We are using 
> [cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
>  job to cancel previous jobs when a new job is queued. This is now supported 
> natively by GitHub Actions via the 
> ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
>  syntax, which ensures that only a single job or workflow using the same 
> concurrency group runs at a time.
> related: https://github.com/apache/arrow/pull/10416 and 
> https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35668) Use "concurrency" setting on Github Actions

2021-06-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35668.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32806
[https://github.com/apache/spark/pull/32806]

> Use "concurrency" setting on Github Actions
> ---
>
> Key: SPARK-35668
> URL: https://issues.apache.org/jira/browse/SPARK-35668
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.1.2
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.2.0
>
>
> We are using 
> [cancel_duplicate_workflow_runs](https://github.com/apache/spark/blob/a70e66ecfa638cacc99b4e9a7c464e41ec92ad30/.github/workflows/cancel_duplicate_workflow_runs.yml#L1)
>  job to cancel previous jobs when a new job is queued. This is now supported 
> natively by GitHub Actions via the 
> ["concurrency"](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#concurrency)
>  syntax, which ensures that only a single job or workflow using the same 
> concurrency group runs at a time.
> related: https://github.com/apache/arrow/pull/10416 and 
> https://github.com/potiuk/cancel-workflow-runs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35636) Do not push down extract value in higher order function that references both sides of a join

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359025#comment-17359025
 ] 

Apache Spark commented on SPARK-35636:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/32812

> Do not push down extract value in higher order function that references both 
> sides of a join
> 
>
> Key: SPARK-35636
> URL: https://issues.apache.org/jira/browse/SPARK-35636
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, lambda keys can be referenced outside of the lambda function:
> {quote}Project [transform(keys#0, lambdafunction(_extract_v1#0, lambda key#0, 
> false)) AS a#0]
> +- 'Join Cross
> :- Project [kvs#0[lambda key#0].v1 AS _extract_v1#0]
> :  +- LocalRelation , [kvs#0]
> +- LocalRelation , [keys#0]{quote}
> This should be unchanged from the original state:
> {quote}Project [transform(keys#418, lambdafunction(kvs#417[lambda 
> key#420].v1, lambda key#420, false)) AS a#419]
> +- Join Cross
> :- LocalRelation , [kvs#417]
> +- LocalRelation , [keys#418]{quote}
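A hedged PySpark sketch of the query shape involved, using illustrative column names (kvs, keys, v1) that mirror the plans above; it is not asserted to reproduce the bug, only to show where the lambda variable is bound relative to the join.

{code:python}
from pyspark.sql import Row, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# One side supplies a map column, the other the array the lambda iterates over.
left = spark.createDataFrame([({"a": Row(v1=1)},)], "kvs map<string, struct<v1:int>>")
right = spark.createDataFrame([(["a"],)], "keys array<string>")

# `key` only exists inside transform(); the extraction kvs[key].v1 therefore
# has to stay above the join rather than being pushed into the left side.
result = left.crossJoin(right).select(
    F.transform("keys", lambda key: F.col("kvs")[key]["v1"]).alias("a")
)
result.explain(extended=True)  # inspect where the extraction ends up in the plan
{code}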



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359034#comment-17359034
 ] 

Apache Spark commented on SPARK-34591:
--

User 'CBribiescas' has created a pull request for this issue:
https://github.com/apache/spark/pull/32813

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the behaviour above:
>  * Occurring always (even if the user does not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the detailed Spark 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is treated as a separate and user-controllable step.
>  
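To make Problem 1 concrete, here is a small self-contained Python illustration (not Spark code; the tree layout and label counts are made up) of the merge rule described above: once sibling leaves that predict the same class are collapsed, their distinct per-leaf probabilities (0.90 and 0.55 below) are replaced by a single pooled estimate.

{code:python}
# Toy illustration of "merge sibling leaves with the same predicted class".
def predicted_class(counts):
    """Index of the class with the highest count in a leaf's label histogram."""
    return max(range(len(counts)), key=lambda c: counts[c])


def prune(node):
    """Recursively merge children whose leaves all predict the same class."""
    if "children" not in node:
        return node
    children = [prune(child) for child in node["children"]]
    if all("children" not in c for c in children):
        preds = {predicted_class(c["counts"]) for c in children}
        if len(preds) == 1:
            # Keep only the pooled label counts; per-leaf probabilities are lost.
            merged = [sum(c["counts"][i] for c in children)
                      for i in range(len(children[0]["counts"]))]
            return {"counts": merged}
    return {"children": children}


# Two leaves both predict class 0, but with P(class 0) = 0.90 and 0.55.
tree = {"children": [{"counts": [90, 10]}, {"counts": [55, 45]}]}
print(prune(tree))  # {'counts': [145, 55]} -> one leaf with P(class 0) = 0.725
{code}

This is why probability estimates from a pruned tree can differ materially from the unpruned tree, and why the effect is harder to tolerate on unbalanced data sets.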



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359033#comment-17359033
 ] 

Apache Spark commented on SPARK-34591:
--

User 'CBribiescas' has created a pull request for this issue:
https://github.com/apache/spark/pull/32813

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the behaviour above:
>  * Occurring always (even if the user does not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the detailed Spark 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is treated as a separate and user-controllable step.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34591:


Assignee: Apache Spark

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Assignee: Apache Spark
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the behaviour above:
>  * Occurring always (even if the user does not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the detailed Spark 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is treated as a separate and user-controllable step.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org

[jira] [Assigned] (SPARK-34591) Pyspark undertakes pruning of decision trees and random forests outside the control of the user, leading to undesirable and unexpected outcomes that are challenging to

2021-06-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34591:


Assignee: (was: Apache Spark)

> Pyspark undertakes pruning of decision trees and random forests outside the 
> control of the user, leading to undesirable and unexpected outcomes that are 
> challenging to diagnose and impossible to correct
> --
>
> Key: SPARK-34591
> URL: https://issues.apache.org/jira/browse/SPARK-34591
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.4, 3.1.1
>Reporter: Julian King
>Priority: Major
>  Labels: pyspark
> Attachments: Reproducible example of Spark bug.pdf
>
>
> *History of the issue*
> SPARK-3159 implemented a method designed to reduce the computational burden 
> for predictions from decision trees and random forests by pruning the tree 
> after fitting. This is done in such a way that branches where child leaves 
> all produce the same classification prediction are merged.
> This was implemented via a PR: [https://github.com/apache/spark/pull/20632]
> This feature is controllable by a "prune" parameter in the Scala version of 
> the code, which is set to True as the default behaviour. However, this 
> parameter is not exposed in the Pyspark API, resulting in the behaviour above:
>  * Occurring always (even if the user does not want it to occur)
>  * Not being documented in the ML documentation, leading to decision tree 
> behaviour that may conflict with what the user expects to happen
> *Why is this a problem?*
> +Problem 1: Inaccurate probabilities+
> Because the decision to prune is based on the classification prediction from 
> the tree (not the probability prediction from the node), this introduces 
> additional bias compared to the situation where the pruning is not done. The 
> impact here may be severe in some cases
> +Problem 2: Leads to completely unacceptable behaviours in some circumstances 
> and for some hyper-parameters+
> My colleagues and I encountered this bug in a scenario where we could not get 
> a decision tree classifier (or random forest classifier with a single tree) 
> to split a single node, despite this being eminently supported by the data. 
> This renders the decision trees and random forests completely unusable.
> +Problem 3: Outcomes are highly sensitive to the hyper-parameters chosen, and 
> how they interact with the data+
> Small changes in the hyper-parameters should ideally produce small changes in 
> the built trees. However, here we have found that small changes in the 
> hyper-parameters lead to large and unpredictable changes in the resultant 
> trees as a result of this pruning.
> In principle, this high degree of instability means that re-training the same 
> model, with the same hyper-parameter settings, on slightly different data may 
> lead to large variations in the tree structure simply as a result of the 
> pruning
> +Problem 4: The problems above are much worse for unbalanced data sets+
> Probability estimation on unbalanced data sets using trees should be 
> supported, but the pruning method described will make this very difficult
> +Problem 5: This pruning method is a substantial variation from the 
> description of the decision tree algorithm in the MLLib documents and is not 
> described+
> This made it extremely confusing for us in working out why we were seeing 
> certain behaviours - we had to trace back through all of the detailed Spark 
> release notes to identify where the problem might lie.
> *Proposed solutions*
> +Option 1 (much easier):+
> The proposed solution here is:
>  * Set the default pruning behaviour to False rather than True, thereby 
> bringing the default behaviour back into alignment with the documentation 
> whilst avoiding the issues described above
> +Option 2 (more involved):+
> The proposed solution here is:
>  * Leave the default pruning behaviour set to False
>  * Expand the pyspark API to expose the pruning behaviour as a 
> user-controllable option
>  * Document the change to the API
>  * Document the change to the tree building behaviour at appropriate points 
> in the Spark ML and Spark MLLib documentation
> We recommend that the default behaviour be set to False because this approach 
> is not the generally understood approach for building decision trees, where 
> pruning is treated as a separate and user-controllable step.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
