[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-30 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Labels:   (was: TODOC1.2)

Done: added a new section for 
[Parquet|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Parquet]
 and mentioned this property there.

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 1.2.0
>
> Attachments: HIVE-9482.2.patch, HIVE-9482.patch, HIVE-9482.patch, 
> parquet_external_time.parq
>
>
> In the current Hive implementation, timestamps are stored in UTC (converted from 
> the current timezone), based on the original Parquet timestamp spec.
> However, we find this is not compatible with other tools, and after some 
> investigation it is not how the other file formats behave, or even some 
> databases (Hive Timestamp is closer to the 'timestamp without timezone' 
> datatype).
> This is the first part of the fix, which will restore compatibility with 
> Parquet timestamp files generated by external tools by skipping conversion on 
> reading.
> A later fix will change the write path to not convert, and stop the 
> read conversion even for files written by Hive itself.
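
The mismatch described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Hive's actual reader or writer code: the function names, the fixed -05:00 reader zone, and the `skip_conversion` argument (a stand-in for the hive.parquet.timestamp.skip.conversion property) are hypothetical and chosen for illustration. It assumes an external tool stores the wall-clock value unchanged, while the Hive write path converts local time to UTC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zone for the Hive reader (illustration only).
READER_ZONE = timezone(timedelta(hours=-5))


def write_with_utc_conversion(wall_clock: datetime, zone: timezone) -> datetime:
    """Model of the write path in the issue: local wall clock -> stored UTC value."""
    local = wall_clock.replace(tzinfo=zone)
    return local.astimezone(timezone.utc).replace(tzinfo=None)


def write_without_conversion(wall_clock: datetime) -> datetime:
    """Assumed behaviour of an external tool: store the wall clock unchanged."""
    return wall_clock


def read(stored: datetime, zone: timezone, skip_conversion: bool) -> datetime:
    """Model of the read path; skip_conversion stands in for the new property."""
    if skip_conversion:
        return stored
    # Treat the stored value as UTC and shift it into the reader's zone.
    as_utc = stored.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(zone).replace(tzinfo=None)


ts = datetime(2015, 1, 27, 12, 0, 0)

# A file written with conversion round-trips when the reader converts back.
hive_file = write_with_utc_conversion(ts, READER_ZONE)
print(read(hive_file, READER_ZONE, skip_conversion=False))

# A file written without conversion is shifted unless conversion is skipped.
external_file = write_without_conversion(ts)
print(read(external_file, READER_ZONE, skip_conversion=False))
print(read(external_file, READER_ZONE, skip_conversion=True))
```

In this model, only the last read returns the original wall-clock value for the externally written file, which is the case the first part of the fix targets.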



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-29 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Labels: TODOC1.2  (was: )

Adds property "hive.parquet.timestamp.skip.conversion", which needs to be 
documented.

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-9482.2.patch, HIVE-9482.patch, HIVE-9482.patch, 
> parquet_external_time.parq
>
>


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-29 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

   Resolution: Fixed
Fix Version/s: (was: 0.15.0)
   1.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Brock for review.

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-9482.2.patch, HIVE-9482.patch, HIVE-9482.patch, 
> parquet_external_time.parq
>
>


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-28 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Attachment: HIVE-9482.2.patch

Address review comments.

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 0.15.0
>
> Attachments: HIVE-9482.2.patch, HIVE-9482.patch, HIVE-9482.patch, 
> parquet_external_time.parq
>
>


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-28 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Attachment: HIVE-9482.patch

Attaching again to trigger test

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 0.15.0
>
> Attachments: HIVE-9482.patch, HIVE-9482.patch, 
> parquet_external_time.parq
>
>


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-27 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Attachment: parquet_external_time.parq

Attaching the new data file, which is binary and cannot be displayed in the 
patch.  This should go in /data/files.

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 0.15.0
>
> Attachments: HIVE-9482.patch, parquet_external_time.parq
>
>


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-27 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Status: Patch Available  (was: Open)

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 0.15.0
>
> Attachments: HIVE-9482.patch, parquet_external_time.parq
>
>


[jira] [Updated] (HIVE-9482) Hive parquet timestamp compatibility

2015-01-27 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9482:

Attachment: HIVE-9482.patch

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Affects Versions: 0.15.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Fix For: 0.15.0
>
> Attachments: HIVE-9482.patch
>
>