[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=548277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548277
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Feb/21 10:05
Start Date: 05/Feb/21 10:05
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-773584083


   Thank you so much all! It's great!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548277)
Time Spent: 10h  (was: 9h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=548129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548129
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 05/Feb/21 09:48
Start Date: 05/Feb/21 09:48
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r570105800



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   Sure thing! This is now tracked as HIVE-24735 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548129)
Time Spent: 9h 50m  (was: 9h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547850
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 04/Feb/21 20:28
Start Date: 04/Feb/21 20:28
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-773584083


   Thank you so much all! It's great!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547850)
Time Spent: 9h 40m  (was: 9.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547552
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 04/Feb/21 10:18
Start Date: 04/Feb/21 10:18
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r570105800



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   Sure thing! This is now tracked as HIVE-24735 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547552)
Time Spent: 9.5h  (was: 9h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547351
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 04/Feb/21 00:02
Start Date: 04/Feb/21 00:02
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1823:
URL: https://github.com/apache/hive/pull/1823


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547351)
Time Spent: 9h 20m  (was: 9h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547313
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Feb/21 22:17
Start Date: 03/Feb/21 22:17
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-772864136


   Looks good to me. Thanks for the effort @pgaref 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547313)
Time Spent: 9h 10m  (was: 9h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547303
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Feb/21 21:55
Start Date: 03/Feb/21 21:55
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-772850624


   Thanks for addressing the comments @pgaref . I am fine from my side, +1.
   
   I'd like to hear from @mustafaiman , if it's fine from his side, we can 
merge it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547303)
Time Spent: 9h  (was: 8h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547300
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Feb/21 21:52
Start Date: 03/Feb/21 21:52
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r569775411



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   @pgaref , can we create a follow-up JIRA to implement TIMESTAMP WITH 
LOCAL TIME ZONE integration with ORC so we do not forget about it?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547300)
Time Spent: 8h 50m  (was: 8h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=547292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-547292
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Feb/21 21:40
Start Date: 03/Feb/21 21:40
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-772842805


   Gentle ping @mustafaiman @jcamachor  -- any further comments here?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 547292)
Time Spent: 8h 40m  (was: 8.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=546411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546411
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Feb/21 01:01
Start Date: 03/Feb/21 01:01
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-771227835


   > All q.out files show data size increase for tables. Since most of them are 
consistently additional 4 bytes per row, that seems like not a bug. However, I 
found some irregular increases too like 16 bytes per row. Can you explain why 
data size increased so we can check the irregularities and make sure they are 
expected?
   
   Hey @mustafaiman -- the main size differences are on Timestamp columns where 
we now support nanosecond precision (using 2 extra variables for the lower and 
the upper precision as part of the stats -- see 
[ORC-611](https://issues.apache.org/jira/browse/ORC-611)).
   
   Other than that there are other changes that can also affect size, such as: 
Trimming StringStatistics minimum and maximum values as part of ORC-203  or 
List and Map column statistics that was recently added as part of ORC-398.
   
   Happy to check further if you have doubts about a particular query.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 546411)
Time Spent: 8.5h  (was: 8h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=546366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546366
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 03/Feb/21 00:57
Start Date: 03/Feb/21 00:57
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568204044



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   Even though TimeStamp with local timezone was added as part of 
[ORC-189](https://issues.apache.org/jira/browse/ORC-189) 

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   LLAP_ALLOCATOR_MAX_ALLOC is used both for the LowLevelCacheImpl 
(buddyAllocator) and bufferSize on 
[WriterOptions](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L1553)
  Please check how this propagated from 
[SerDeEncodedDataReader](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L248)
 
   
   Llap is tightly coupled to ORC, thus it could make sense to use the same 
buffer size for serialized Buffers, and the ORC writer as we would not need to 
split/merge them  -- however I have nothing against splitting the conf or 
checking is the 8Mb limit is a hard one.
   All I am trying to say here is that this is orthogonal to the ORC version 
bump.

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   Sure, addressed as part of 2eca3b9de1f2332beabc3bde9ac0f89d62ec1527  -- 
also opened HIVE-24721 to investigate this further

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   Sure, addressed this as part of 2eca3b9de1f2332beabc3bde9ac0f89d62ec1527 
 -- also opened HIVE-24721 to investigate this further

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator()

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=546005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546005
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 12:39
Start Date: 02/Feb/21 12:39
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568567405



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   Sure, addressed this as part of 2eca3b9de1f2332beabc3bde9ac0f89d62ec1527 
 -- also opened HIVE-24721 to investigate this further

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   Sure, addressed this as part of 2eca3b9de1f2332beabc3bde9ac0f89d62ec1527 
 -- also opened HIVE-24721 to investigate further





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 546005)
Time Spent: 8h 10m  (was: 8h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=546003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546003
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 12:36
Start Date: 02/Feb/21 12:36
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568567405



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   Sure, addressed as part of 2eca3b9de1f2332beabc3bde9ac0f89d62ec1527  -- 
also opened HIVE-24721 to investigate this further





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 546003)
Time Spent: 8h  (was: 7h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545970
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 11:11
Start Date: 02/Feb/21 11:11
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568174337



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   LLAP_ALLOCATOR_MAX_ALLOC is used both for the LowLevelCacheImpl 
(buddyAllocator) and bufferSize on 
[WriterOptions](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L1553)
  Please check how this propagated from 
[SerDeEncodedDataReader](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L248)
 
   
   Llap is tightly coupled to ORC, thus it could make sense to use the same 
buffer size for serialized Buffers, and the ORC writer as we would not need to 
split/merge them  -- however I have nothing against splitting the conf or 
checking is the 8Mb limit is a hard one.
   All I am trying to say here is that this is orthogonal to the ORC version 
bump.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545970)
Time Spent: 7h 50m  (was: 7h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545689
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 23:16
Start Date: 01/Feb/21 23:16
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-771227835


   > All q.out files show data size increase for tables. Since most of them are 
consistently additional 4 bytes per row, that seems like not a bug. However, I 
found some irregular increases too like 16 bytes per row. Can you explain why 
data size increased so we can check the irregularities and make sure they are 
expected?
   
   Hey @mustafaiman -- the main size differences are on Timestamp columns where 
we now support nanosecond precision (using 2 extra variables for the lower and 
the upper precision as part of the stats -- see 
[ORC-611](https://issues.apache.org/jira/browse/ORC-611)).
   
   Other than that there are other changes that can also affect size, such as: 
Trimming StringStatistics minimum and maximum values as part of ORC-203  or 
List and Map column statistics that was recently added as part of ORC-398.
   
   Happy to check further if you have doubts about a particular query.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545689)
Time Spent: 7h 40m  (was: 7.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545684
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 23:05
Start Date: 01/Feb/21 23:05
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568204044



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   Even though TimeStamp with local timezone was added as part of 
[ORC-189](https://issues.apache.org/jira/browse/ORC-189) 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545684)
Time Spent: 7.5h  (was: 7h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545680
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:58
Start Date: 01/Feb/21 22:58
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568200811



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   I see ORC strictly enforces this now. I would set the appropriate 
setting at Hive-ORC boundary and leave the LLAP_ALLOCATOR_MAX_ALLOC as it is 
(Math.min(llap.allocator.max, what ORC enforces). If you think we should set 
LLAP_ALLOCATOR_MAX_ALLOC to be the same as what ORC enforces, that can be done 
in a seperate ticket. Like you said this is orthogonal to ORC version bump, 
therefore should be discussed in its own ticket.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545680)
Time Spent: 7h 20m  (was: 7h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545643
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:09
Start Date: 01/Feb/21 22:09
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568174337



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   LLAP_ALLOCATOR_MAX_ALLOC is used both for the LowLevelCacheImpl 
(buddyAllocator) and bufferSize on 
[WriterOptions](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L1553)
 
   
   Please check how this propagated from 
[SerDeEncodedDataReader](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L248)
 
   
   Llap is tightly coupled to ORC, thus it could make sense to use the same 
buffer size for serialized Buffers, and the ORC writer as we would not need to 
split/merge them  -- however I have nothing against splitting the conf or 
checking is the 8Mb limit is a hard one.
   All I am trying to say here is that this is orthogonal the ORC version bump.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545643)
Time Spent: 7h  (was: 6h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545644
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:09
Start Date: 01/Feb/21 22:09
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568174337



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   LLAP_ALLOCATOR_MAX_ALLOC is used both for the LowLevelCacheImpl 
(buddyAllocator) and bufferSize on 
[WriterOptions](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L1553)
  Please check how this propagated from 
[SerDeEncodedDataReader](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L248)
 
   
   Llap is tightly coupled to ORC, thus it could make sense to use the same 
buffer size for serialized Buffers, and the ORC writer as we would not need to 
split/merge them  -- however I have nothing against splitting the conf or 
checking is the 8Mb limit is a hard one.
   All I am trying to say here is that this is orthogonal the ORC version bump.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545644)
Time Spent: 7h 10m  (was: 7h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545623
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 21:54
Start Date: 01/Feb/21 21:54
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568166455



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/LlapRecordReaderUtils.java
##
@@ -0,0 +1,440 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.io.encoded;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.ql.io.orc.encoded.LlapDataReader;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.BufferChunk;
+import org.apache.orc.impl.DataReaderProperties;
+import org.apache.orc.impl.DirectDecompressionCodec;
+import org.apache.orc.impl.HadoopShims;
+import org.apache.orc.impl.HadoopShimsFactory;
+import org.apache.orc.impl.InStream;
+import org.apache.orc.impl.OrcCodecPool;
+import org.apache.orc.impl.OrcIndex;
+import org.apache.orc.impl.RecordReaderUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.function.Supplier;
+
+public class LlapRecordReaderUtils {
+
+  private static final HadoopShims SHIMS = HadoopShimsFactory.get();
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapRecordReaderUtils.class);
+
+  static HadoopShims.ZeroCopyReaderShim createZeroCopyShim(FSDataInputStream 
file, CompressionCodec codec,
+  RecordReaderUtils.ByteBufferAllocatorPool pool) throws IOException {
+return codec == null || (codec instanceof DirectDecompressionCodec && 
((DirectDecompressionCodec) codec)
+.isAvailable()) ? null : SHIMS.getZeroCopyReader(file, pool);

Review comment:
   Good catch, FIXed thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545623)
Time Spent: 6h 50m  (was: 6h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545617
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 21:45
Start Date: 01/Feb/21 21:45
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568161096



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/LlapRecordReaderUtils.java
##
@@ -0,0 +1,440 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.io.encoded;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.ql.io.orc.encoded.LlapDataReader;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.BufferChunk;
+import org.apache.orc.impl.DataReaderProperties;
+import org.apache.orc.impl.DirectDecompressionCodec;
+import org.apache.orc.impl.HadoopShims;
+import org.apache.orc.impl.HadoopShimsFactory;
+import org.apache.orc.impl.InStream;
+import org.apache.orc.impl.OrcCodecPool;
+import org.apache.orc.impl.OrcIndex;
+import org.apache.orc.impl.RecordReaderUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.function.Supplier;
+
+public class LlapRecordReaderUtils {
+
+  private static final HadoopShims SHIMS = HadoopShimsFactory.get();
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapRecordReaderUtils.class);
+
+  static HadoopShims.ZeroCopyReaderShim createZeroCopyShim(FSDataInputStream 
file, CompressionCodec codec,
+  RecordReaderUtils.ByteBufferAllocatorPool pool) throws IOException {
+return codec == null || (codec instanceof DirectDecompressionCodec && 
((DirectDecompressionCodec) codec)
+.isAvailable()) ? null : SHIMS.getZeroCopyReader(file, pool);

Review comment:
   I think this was equivalent to `codec == null || (codec instanceof 
DirectDecompressionCodec && ((DirectDecompressionCodec) codec).isAvailable()) ? 
SHIMS.getZeroCopyReader(file, pool) : null`
before. Looks like `null: SHIMS.getZeroCopyReader` thing got inverted.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545617)
Time Spent: 6h 40m  (was: 6.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input st

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545610
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 21:38
Start Date: 01/Feb/21 21:38
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568157322



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   I still do not understand why we need to change LLAP Allocator's maximum 
allocation size. Does LLAP allocator serve only ORC writers? I think it is used 
for other buffer needs too.
   
   Hive depends on ORC. So I dont understand how ORC uses 
LLAP_ALLOCATOR_MAX_ALLOC for anything. We pass orc writers the appropriate 
configs. If ORC writers need smaller buffer, we can configure that for those 
writers via WriterOptions. There is no need to change llap allocator's settings 
for that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545610)
Time Spent: 6.5h  (was: 6h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545493
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 18:19
Start Date: 01/Feb/21 18:19
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-771057135


   Tests just passed and comments are addressed above.
   @mustafaiman @jcamachor please take another look and let me know what you 
think :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545493)
Time Spent: 6h 20m  (was: 6h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545316
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:07
Start Date: 01/Feb/21 13:07
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566908774



##
File path: ql/src/test/results/clientpositive/tez/orc_merge12.q.out
##
@@ -162,8 +162,8 @@ Stripe Statistics:
 Column 6: count: 9174 hasNull: true min: -16379.0 max: 9763215.5639 sum: 
5.62236530305E7
 Column 7: count: 12288 hasNull: false min: 
00020767-dd8f-4f4d-bd68-4b7be64b8e44 max: fffa3516-e219-4027-b0d3-72bb2e676c52 
sum: 442368
 Column 8: count: 12288 hasNull: false min: 
000976f7-7075-4f3f-a564-5a375fafcc101416a2b7-7f64-41b7-851f-97d15405037e max: 
fffd0642-5f01-48cd-8d97-3428faee49e9b39f2b4c-efdc-4e5f-9ab5-4aa5394cb156 sum: 
884736
-Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.808
-Column 10: count: 9174 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.808
+Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.80899

Review comment:
   Yes, this is expected as we are now supporting Nanosecond precision for 
Timestamps: https://issues.apache.org/jira/browse/ORC-663





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545316)
Time Spent: 6h 10m  (was: 6h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545313
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:05
Start Date: 01/Feb/21 13:05
Worklog Time Spent: 10m 
  Work Description: pgaref edited a comment on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-769755146


   > I only partially reviewed this. Will continue reviewing.
   > One question: I see we do not care about column encryption related 
arguments in multiple places. Is it because column encryption is not supported?
   
   Hey @mustafaiman good question with a complicated answer -- while creating 
this I also did some digging to find out whats supported and what not. To sum 
up my findings: 
   
   -  It looks like we are currently able to encrypt entire tables and/or data 
on hdfs using kms: HIVE-8065
   -  Support for column level encryption/decryption (passing some encryption 
setting to the Table props and let Hive take care of the rest) started more 
than a while ago as part of HIVE-6329 
   -  There was a community discussion as part of HIVE-21848 to unify 
encryption table properties (at least for ORC and Parquet) that concluded in 
the accepted options
   - However, these properties are still not propagated to the tables: 
HIVE-21849
   
   I believe part of the reason is that Hive already integrates with Apache 
Ranger that can restrict user access to particular columns and also adds 
data-masking on top.
   However, I am more than happy discussing the revival of column encryption at 
some point.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545313)
Time Spent: 6h  (was: 5h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545312
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:03
Start Date: 01/Feb/21 13:03
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r567807669



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   The issue here is that LLAP_ALLOCATOR_MAX_ALLOC is also used as the ORC 
Writer buffer size (thus the change).
   
   Initial buffer size check was introduced in 
[ORC-238](https://github.com/apache/orc/pull/171/files) even though it was only 
applied when buffer size was enforced from table properties. Later, on ORC-1.6 
this was enforced for the [Writer buffer size in 
general](https://github.com/apache/orc/blob/0128f817b0ab28fa2d0660737234ac966f0f5c50/java/core/src/java/org/apache/orc/impl/WriterImpl.java#L171).
   
   The max bufferSize argument can be up to 2^(3*8 - 1) -- meaning less than 
8Mb and since we enforce the size to be power of 2 the next available is 4Mb.
   
   The main question here is if there is a reason for the upper limit to be < 8 
Mb (cc @prasanthj that might know more here) -- or if we should decouple the 
two configuration (LLAP alloc and ORC Writer buffer size).
   
   I believe the best thing to do for now is open a new Ticket to track this 
(as this will either require more work on LLAP, or a new release on ORC) -- and 
I do not expect this to cause any major issues until then. @mustafaiman what do 
you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545312)
Time Spent: 5h 50m  (was: 5h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545304
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 12:51
Start Date: 01/Feb/21 12:51
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r567800512



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/LocalCache.java
##
@@ -82,8 +82,7 @@ public void put(Path path, OrcTail tail) {
 if (bb.capacity() != bb.remaining()) {
   throw new RuntimeException("Bytebuffer allocated for path: " + path + " 
has remaining: " + bb.remaining() + " != capacity: " + bb.capacity());
 }
-cache.put(path, new TailAndFileData(tail.getFileTail().getFileLength(),
-tail.getFileModificationTime(), bb.duplicate()));
+cache.put(path, new TailAndFileData(bb.limit(), 
tail.getFileModificationTime(), bb.duplicate()));

Review comment:
   But I agree, cache should be populated with the original 
**getFileTail().getFileLength()**  as it is afterward used for comparison (thus 
reverted this change) -- however, where ReaderImpl.extractFileTail is now 
called uses the buffer size instead.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545304)
Time Spent: 5h 40m  (was: 5.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545302
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 12:48
Start Date: 01/Feb/21 12:48
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r567799059



##
File path: 
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_multicol.q.out
##
@@ -355,7 +355,7 @@ Stage-1 HIVE COUNTERS:
RECORDS_OUT_OPERATOR_TS_3: 800
TOTAL_TABLE_ROWS_WRITTEN: 0
 Stage-1 LLAP IO COUNTERS:
-   CACHE_HIT_BYTES: 138344
+   CACHE_MISS_BYTES: 138342

Review comment:
   This was a bit more complex, CacheWriter.getSparseOrcIndexFromDenseDest 
was called with colId = 0 from SerDeEncodedDataReader -- causing 
IndexOutOfBounds and Cache not being populated.
   
   This is now addressed by 
https://github.com/apache/hive/pull/1823/commits/da1aa077716a65c2a02d850828b16cdeece1f574





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545302)
Time Spent: 5.5h  (was: 5h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544478
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 20:40
Start Date: 29/Jan/21 20:40
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566867490



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -443,7 +444,8 @@ public void setBaseAndInnerReader(
   return new OrcRawRecordMerger.KeyInterval(null, null);
 }
 
-OrcTail orcTail = getOrcTail(orcSplit.getPath(), conf, cacheTag, 
orcSplit.getFileKey()).orcTail;
+VectorizedOrcAcidRowBatchReader.ReaderData orcReaderData =

Review comment:
   This is one of the breaking ORC changes introduced by encryption support.
   As Tail and thus StripeStatistics may be encrypted, we always need a reader 
instance to retrieve them.
   
   OrcTail maintained the API call for backwards compatibility but it still 
expects a reader to actually retrieve the stats:
   
https://github.com/apache/orc/blob/d78cc39a9299b62bc8a5d8f5c3fac9201e03cb8b/java/core/src/java/org/apache/orc/impl/OrcTail.java#L210





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544478)
Time Spent: 5h 20m  (was: 5h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544339
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 16:29
Start Date: 29/Jan/21 16:29
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566941929



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
##
@@ -325,7 +326,7 @@ public void testReadFormat_0_11() throws Exception {
 + "binary,string1:string,middle:struct>>,list:array>,"
 + "map:map>,ts:timestamp,"
-+ "decimal1:decimal(38,18)>", readerInspector.getTypeName());
++ "decimal1:decimal(38,10)>", readerInspector.getTypeName());

Review comment:
   Seems that that type scale was not properly propaged before and was 
using HiveDecimal.SYSTEM_DEFAULT_SCALE which is 18: 
https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java#L607
   
   This is actually in-line with ORC tests for the same file:
   
https://github.com/apache/orc/blob/b54d10cedf5ec1529cf06d77268510c216402cba/java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java#L141





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544339)
Time Spent: 5h  (was: 4h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544340
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 16:29
Start Date: 29/Jan/21 16:29
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566941929



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
##
@@ -325,7 +326,7 @@ public void testReadFormat_0_11() throws Exception {
 + "binary,string1:string,middle:struct>>,list:array>,"
 + "map:map>,ts:timestamp,"
-+ "decimal1:decimal(38,18)>", readerInspector.getTypeName());
++ "decimal1:decimal(38,10)>", readerInspector.getTypeName());

Review comment:
   Seems that that type scale was not properly propaged before and was 
using HiveDecimal.SYSTEM_DEFAULT_SCALE which is 18: 
https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java#L607
   
   Current schema (decimal with scale 10) is actually in-line with ORC tests 
for the same file:
   
https://github.com/apache/orc/blob/b54d10cedf5ec1529cf06d77268510c216402cba/java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java#L141





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544340)
Time Spent: 5h 10m  (was: 5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544327
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 15:56
Start Date: 29/Jan/21 15:56
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566919513



##
File path: ql/src/test/results/clientpositive/llap/orc_file_dump.q.out
##
@@ -249,15 +249,15 @@ Stripes:
   Entry 1: numHashFunctions: 4 bitCount: 6272 popCount: 182 loadFactor: 
0.029 expectedFpp: 7.090246E-7
   Stripe level merge: numHashFunctions: 4 bitCount: 6272 popCount: 1772 
loadFactor: 0.2825 expectedFpp: 0.0063713384
 Row group indices for column 9:
-  Entry 0: count: 1000 hasNull: false min: 2013-03-01 09:11:58.703 max: 
2013-03-01 09:11:58.703 positions: 0,0,0,0,0,0
-  Entry 1: count: 49 hasNull: false min: 2013-03-01 09:11:58.703 max: 
2013-03-01 09:11:58.703 positions: 0,7,488,0,1538,488
+  Entry 0: count: 1000 hasNull: false min: 2013-03-01 09:11:58.70307 max: 
2013-03-01 09:11:58.703325 positions: 0,0,0,0,0,0
+  Entry 1: count: 49 hasNull: false min: 2013-03-01 09:11:58.703076 max: 
2013-03-01 09:11:58.703325 positions: 0,7,488,0,1538,488
 Bloom filters for column 9:
   Entry 0: numHashFunctions: 4 bitCount: 6272 popCount: 4 loadFactor: 
0.0006 expectedFpp: 1.6543056E-13
   Entry 1: numHashFunctions: 4 bitCount: 6272 popCount: 4 loadFactor: 
0.0006 expectedFpp: 1.6543056E-13
   Stripe level merge: numHashFunctions: 4 bitCount: 6272 popCount: 4 
loadFactor: 0.0006 expectedFpp: 1.6543056E-13
 Row group indices for column 10:
-  Entry 0: count: 1000 hasNull: false min: 8 max: 9994 sum: 5118211 
positions: 0,0,0,0,0
-  Entry 1: count: 49 hasNull: false min: 248 max: 9490 sum: 246405 
positions: 0,2194,0,4,488
+  Entry 0: count: 1000 hasNull: false min: 0.08 max: 99.94 sum: 51182.11 
positions: 0,0,0,0,0

Review comment:
   yeah, this is related to the TS conversion issue described above





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544327)
Time Spent: 4h 50m  (was: 4h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544318
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 15:41
Start Date: 29/Jan/21 15:41
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566909094



##
File path: ql/src/test/results/clientpositive/llap/orc_file_dump.q.out
##
@@ -111,7 +111,7 @@ Stripe Statistics:
 Column 6: count: 1049 hasNull: false bytesOnDisk: 3323 min: 0.02 max: 
49.85 sum: 26286.3477
 Column 7: count: 1049 hasNull: false bytesOnDisk: 137 true: 526
 Column 8: count: 1049 hasNull: false bytesOnDisk: 3430 min:  max: zach 
zipper sum: 13443
-Column 9: count: 1049 hasNull: false bytesOnDisk: 1802 min: 2013-03-01 
09:11:58.703 max: 2013-03-01 09:11:58.703
+Column 9: count: 1049 hasNull: false bytesOnDisk: 1802 min: 2013-03-01 
09:11:58.70307 max: 2013-03-01 09:11:58.703325

Review comment:
   Yes, this is expected as are now supporting Nanosecond precision for 
Timestamps: https://issues.apache.org/jira/browse/ORC-663





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544318)
Time Spent: 4h 40m  (was: 4.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544317
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 15:40
Start Date: 29/Jan/21 15:40
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566908774



##
File path: ql/src/test/results/clientpositive/tez/orc_merge12.q.out
##
@@ -162,8 +162,8 @@ Stripe Statistics:
 Column 6: count: 9174 hasNull: true min: -16379.0 max: 9763215.5639 sum: 
5.62236530305E7
 Column 7: count: 12288 hasNull: false min: 
00020767-dd8f-4f4d-bd68-4b7be64b8e44 max: fffa3516-e219-4027-b0d3-72bb2e676c52 
sum: 442368
 Column 8: count: 12288 hasNull: false min: 
000976f7-7075-4f3f-a564-5a375fafcc101416a2b7-7f64-41b7-851f-97d15405037e max: 
fffd0642-5f01-48cd-8d97-3428faee49e9b39f2b4c-efdc-4e5f-9ab5-4aa5394cb156 sum: 
884736
-Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.808
-Column 10: count: 9174 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.808
+Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.80899

Review comment:
   Yes, this is expected as are now supporting Nanosecond precision for 
Timestamps: https://issues.apache.org/jira/browse/ORC-663





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544317)
Time Spent: 4.5h  (was: 4h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544315
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 15:39
Start Date: 29/Jan/21 15:39
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566784743



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -631,10 +630,19 @@ private OrcFileMetadata getFileFooterFromCacheOrDisk() 
throws IOException {
   OrcTail orcTail = getOrcTailFromLlapBuffers(tailBuffers);
   counters.incrCounter(LlapIOCounters.METADATA_CACHE_HIT);
   FileTail tail = orcTail.getFileTail();
-  stats = orcTail.getStripeStatisticsProto();
+  CompressionKind compressionKind = orcTail.getCompressionKind();
+  InStream.StreamOptions options = null;
+  if (compressionKind != CompressionKind.NONE) {
+options = InStream.options()
+
.withCodec(OrcCodecPool.getCodec(compressionKind)).withBufferSize(orcTail.getCompressionBufferSize());
+  }
+  InStream stream = InStream.create("stripe stats", 
orcTail.getTailBuffer(),

Review comment:
   Sure, makes sense -- extracted method above.

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
##
@@ -631,10 +630,19 @@ private OrcFileMetadata getFileFooterFromCacheOrDisk() 
throws IOException {
   OrcTail orcTail = getOrcTailFromLlapBuffers(tailBuffers);
   counters.incrCounter(LlapIOCounters.METADATA_CACHE_HIT);
   FileTail tail = orcTail.getFileTail();
-  stats = orcTail.getStripeStatisticsProto();
+  CompressionKind compressionKind = orcTail.getCompressionKind();
+  InStream.StreamOptions options = null;
+  if (compressionKind != CompressionKind.NONE) {
+options = InStream.options()
+
.withCodec(OrcCodecPool.getCodec(compressionKind)).withBufferSize(orcTail.getCompressionBufferSize());
+  }
+  InStream stream = InStream.create("stripe stats", 
orcTail.getTailBuffer(),
+  orcTail.getMetadataOffset(), orcTail.getMetadataSize(), options);
+  stats = 
OrcProto.Metadata.parseFrom(InStream.createCodedInputStream(stream)).getStripeStatsList();
   stripes = new ArrayList<>(tail.getFooter().getStripesCount());
+  int stripeIdx = 0;
   for (OrcProto.StripeInformation stripeProto : 
tail.getFooter().getStripesList()) {
-stripes.add(new ReaderImpl.StripeInformationImpl(stripeProto));
+stripes.add(new ReaderImpl.StripeInformationImpl(stripeProto, 
stripeIdx++, -1, null));

Review comment:
   In ORC-1.5 encryption is not supported -- Stripe info can also be 
encrypted since ORC-523 and thus the extra arguments here. Since we are not yet 
using encryption on LLAP the last 2 params can be null we I am keeping an 
incremental StripeId as it is used in a couple of places like the StripePlanner.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/LlapDataReader.java
##
@@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.io.orc.encoded;
+
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.OrcIndex;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/** An abstract data reader that IO formats can use to read bytes from 
underlying storage. */
+public interface LlapDataReader extends AutoCloseable, Cloneable {
+
+  /** Opens the DataReader, making it ready to use. */
+  void open() throws IOException;
+
+  OrcIndex readRowIndex(StripeInformation stripe,
+  T

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=544234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544234
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 29/Jan/21 11:41
Start Date: 29/Jan/21 11:41
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-769755146


   > I only partially reviewed this. Will continue reviewing.
   > One question: I see we do not care about column encryption related 
arguments in multiple places. Is it because column encryption is not supported?
   
   Hey @mustage good question with a complicated answer -- while creating this 
I also did some digging to find out whats supported and what not. To sum up my 
findings: 
   
   -  It looks like we are currently able to encrypt entire tables and/or data 
on hdfs using kms: HIVE-8065
   -  Support for column level encryption/decryption (passing some encryption 
setting to the Table props and let Hive take care of the rest) started more 
than a while ago as part of HIVE-6329 
   -  There was a community discussion as part of HIVE-21848 to unify 
encryption table properties (at least for ORC and Parquet) that concluded in 
the accepted options
   - However, these properties are still not propagated to the tables: 
HIVE-21849
   
   I believe part of the reason is that Hive already integrates with Apache 
Ranger that can restrict user access to particular columns and also adds 
data-masking on top.
   However, I am more than happy discussing the revival of column encryption at 
some point.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544234)
Time Spent: 4h 10m  (was: 4h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=542604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542604
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 27/Jan/21 03:28
Start Date: 27/Jan/21 03:28
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r564992785



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileFormatProxy.java
##
@@ -47,12 +47,14 @@ public SplitInfos applySargToMetadata(
 OrcTail orcTail = ReaderImpl.extractFileTail(fileMetadata);
 OrcProto.Footer footer = orcTail.getFooter();
 int stripeCount = footer.getStripesCount();
-boolean writerUsedProlepticGregorian = footer.hasCalendar()
-? footer.getCalendar() == OrcProto.CalendarKind.PROLEPTIC_GREGORIAN
-: OrcConf.PROLEPTIC_GREGORIAN_DEFAULT.getBoolean(conf);
+// Always convert To PROLEPTIC_GREGORIAN

Review comment:
   Why is it OK to use proleptic calendar always here? Could we leave short 
explanation in the comment for when we need to revisit this code?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java
##
@@ -282,6 +280,56 @@ public String toString() {
 }
   }
 
+  public static boolean[] findPresentStreamsByColumn(

Review comment:
   Can we add javadoc for these public static utility methods? If they are 
used only in this class, should we change their visibility?

##
File path: 
ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_part_all_primitive.q.out
##
@@ -687,11 +687,11 @@ POSTHOOK: Input: 
default@part_change_various_various_timestamp_n6
 POSTHOOK: Input: default@part_change_various_various_timestamp_n6@part=1
  A masked pattern was here 
 insert_num partc1  c2  c3  c4  c5  c6  c7  
c8  c9  c10 c11 c12 b
-1011   1970-01-01 00:00:00.001 1969-12-31 23:59:59.872 NULL
1969-12-07 03:28:36.352 NULLNULLNULLNULL6229-06-28 
02:54:28.970117179   6229-06-28 02:54:28.97011   6229-06-28 02:54:28.97011  
 1950-12-18 00:00:00 original
-1021   1970-01-01 00:00:00 1970-01-01 00:00:00.127 1970-01-01 
00:00:32.767 1970-01-25 20:31:23.647 NULLNULLNULLNULL5966-07-09 
03:30:50.597 5966-07-09 03:30:50.597 5966-07-09 03:30:50.597 2049-12-18 
00:00:00 original
+1011   1970-01-01 00:00:01 1969-12-31 23:57:52 NULL
1901-12-13 20:45:52 NULLNULLNULLNULL6229-06-28 
02:54:28.970117179   6229-06-28 02:54:28.97011   6229-06-28 02:54:28.97011  
 1950-12-18 00:00:00 original

Review comment:
   This shifting for timestamp values does not seem right (or at least I 
cannot make sense of it). Could you explain what is going on here? Some of the 
shifting is significant: For those, I remember there were some backwards 
incompatible changes in schema evolution in 1.6.x, it may be related to that? 
However, other shifting seems a bit more suspicious, e.g., 1 second, ~2 minutes?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   As @mustafaiman  mentioned, I think this should be always false indeed: 
TIMESTAMP_INSTANT is equivalent to TIMESTAMP_WITH_LOCAL_TIME_ZONE type in Hive. 
AFAIK support to read/write timestamp with local time zone in ORC is not 
implemented yet.

##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
##
@@ -325,7 +326,7 @@ public void testReadFormat_0_11() throws Exception {
 + "binary,string1:string,middle:struct>>,list:array>,"
 + "map:map>,ts:timestamp,"
-+ "decimal1:decimal(38,18)>", readerInspector.getTypeName());
++ "decimal1:decimal(38,10)>", readerInspector.getTypeName());

Review comment:
   Change in decimal scale. Expected?

##
File path: ql/src/test/results/clientpositive/llap/orc_file_dump.q.out
##
@@ -249,15 +249,15 @@ Stripes:
   Entry 1: numHashFunctions: 4 bitCount: 6272 popCount: 182 loadFactor: 
0.029 expectedFpp: 7.090246E-7
   Stripe level merge: numHashFunctions: 4 bitCount: 6272 popCount: 1772 
loadFactor: 0.2825 expectedFpp: 0.0063713384
 Row group indices for column 9:
-  Entry 0: count: 1000 hasNull: false min: 2013-03-01 09:11:58.703 max: 
2013-03-01 09:11:58.703 positions: 0,0,0,0,0,0
-

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=541832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-541832
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 26/Jan/21 04:24
Start Date: 26/Jan/21 04:24
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r563956041



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   You say this is changed to be compatible with ORC setting. I do not 
understand why this is necessary and what its impact is. This looks like a 
change that is not to be taken lightly

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/LlapRecordReaderUtils.java
##
@@ -0,0 +1,438 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.io.encoded;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.ql.io.orc.encoded.LlapDataReader;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.BufferChunk;
+import org.apache.orc.impl.DataReaderProperties;
+import org.apache.orc.impl.DirectDecompressionCodec;
+import org.apache.orc.impl.HadoopShims;
+import org.apache.orc.impl.HadoopShimsFactory;
+import org.apache.orc.impl.InStream;
+import org.apache.orc.impl.OrcCodecPool;
+import org.apache.orc.impl.OrcIndex;
+import org.apache.orc.impl.RecordReaderUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.function.Supplier;
+
+public class LlapRecordReaderUtils {
+
+  private static final HadoopShims SHIMS = HadoopShimsFactory.get();
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapRecordReaderUtils.class);
+
+  static HadoopShims.ZeroCopyReaderShim createZeroCopyShim(FSDataInputStream 
file, CompressionCodec codec, RecordReaderUtils.ByteBufferAllocatorPool pool) 
throws IOException {
+return codec != null && (!(codec instanceof DirectDecompressionCodec) || 
!((DirectDecompressionCodec)codec).isAvailable()) ? null : 
SHIMS.getZeroCopyReader(file, pool);

Review comment:
   Can you invert this condition. There are a lot of negatives making this 
hard to understand
   `codec == null || (codec instanceof DirectDecompressionCodec && 
((DirectDecompressionCodec) codec).isAvailable()) ? 
SHIMS.getZeroCopyReader(file, pool) : null` is much more understandable

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -443,7 +444,8 @@ public void setBaseAndInnerReader(
   return new OrcRawRecordMerger.KeyInterval(null, null);
 }
 
-OrcTail orcTail = getOrcTail(orcSplit.getPath(), conf, cacheTag, 
orcSplit.getFileKey()).orcTail;
+VectorizedOrcAcidRowBatchReader.ReaderData orcReaderData =

Review comment:
   I am not sure about this. Previously we did not create the full reader. 
Why do we need to create the reader now? All calls from here use orcTail anyway 
except `List stats = 
orcReaderData.reader.

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=541348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-541348
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 22:50
Start Date: 25/Jan/21 22:50
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r564077084



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory()  == 
TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   Isn't this always `false`? Don't we need another case for 
TIMESTAMP_INSTANT?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/LlapDataReader.java
##
@@ -0,0 +1,93 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.io.orc.encoded;
+
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.OrcIndex;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+/** An abstract data reader that IO formats can use to read bytes from 
underlying storage. */
+public interface LlapDataReader extends AutoCloseable, Cloneable {
+
+  /** Opens the DataReader, making it ready to use. */
+  void open() throws IOException;
+
+  OrcIndex readRowIndex(StripeInformation stripe,
+  TypeDescription fileSchema,
+  OrcProto.StripeFooter footer,
+  boolean ignoreNonUtf8BloomFilter,
+  boolean[] included,
+  OrcProto.RowIndex[] indexes,
+  boolean[] sargColumns,
+  OrcFile.WriterVersion version,
+  OrcProto.Stream.Kind[] bloomFilterKinds,
+  OrcProto.BloomFilterIndex[] bloomFilterIndices
+  ) throws IOException;
+
+  OrcProto.StripeFooter readStripeFooter(StripeInformation stripe) throws 
IOException;
+
+  /** Reads the data.
+   *
+   * Note that for the cases such as zero-copy read, caller must release the 
disk ranges
+   * produced after being done with them. Call isTrackingDiskRanges to find 
out if this is needed.
+   * @param range List if disk ranges to read. Ranges with data will be 
ignored.
+   * @param baseOffset Base offset from the start of the file of the ranges in 
disk range list.
+   * @param doForceDirect Whether the data should be read into direct buffers.
+   * @return New or modified list of DiskRange-s, where all the ranges are 
filled with data.
+   */
+  DiskRangeList readFileData(
+  DiskRangeList range, long baseOffset, boolean doForceDirect) throws 
IOException;
+
+
+  /**
+   * Whether the user should release buffers created by readFileData. See 
readFileData javadoc.
+   */
+  boolean isTrackingDiskRanges();
+
+  /**
+   * Releases buffers created by readFileData. See readFileData javadoc.
+   * @param toRelease The buffer to release.
+   */
+  void releaseBuffer(ByteBuffer toRelease);
+
+  /**
+   * Clone the entire state of the DataReader with the assumption that the
+   * clone will be closed at a different time. Thus, any file handles in the
+   * implementation need to be cloned.
+   * @return a new instance
+   */
+  LlapDataReader clone();
+
+  @Override
+  void close() throws IOException;
+
+  /**
+   * Returns the compression codec used by this datareader.
+   * We should consider removing this from the interface.
+   * @return the compression codec
+   */
+  CompressionCodec getCompressionCodec();

Review comment:
   This interface looks like a copy of ORC's DataReader except this method. 
ORC's DataReader returns as StreamOptions instead of CompressionCodec. As far 
as i understand, StreamOptions includes the 

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=541286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-541286
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 20:20
Start Date: 25/Jan/21 20:20
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r563956041



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   You say this is changed to be compatible with ORC setting. I do not 
understand why this is necessary and what its impact is. This looks like a 
change that is not to be taken lightly

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/LlapRecordReaderUtils.java
##
@@ -0,0 +1,438 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.io.encoded;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.ql.io.orc.encoded.LlapDataReader;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.BufferChunk;
+import org.apache.orc.impl.DataReaderProperties;
+import org.apache.orc.impl.DirectDecompressionCodec;
+import org.apache.orc.impl.HadoopShims;
+import org.apache.orc.impl.HadoopShimsFactory;
+import org.apache.orc.impl.InStream;
+import org.apache.orc.impl.OrcCodecPool;
+import org.apache.orc.impl.OrcIndex;
+import org.apache.orc.impl.RecordReaderUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.function.Supplier;
+
+public class LlapRecordReaderUtils {
+
+  private static final HadoopShims SHIMS = HadoopShimsFactory.get();
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapRecordReaderUtils.class);
+
+  static HadoopShims.ZeroCopyReaderShim createZeroCopyShim(FSDataInputStream 
file, CompressionCodec codec, RecordReaderUtils.ByteBufferAllocatorPool pool) 
throws IOException {
+return codec != null && (!(codec instanceof DirectDecompressionCodec) || 
!((DirectDecompressionCodec)codec).isAvailable()) ? null : 
SHIMS.getZeroCopyReader(file, pool);

Review comment:
   Can you invert this condition. There are a lot of negatives making this 
hard to understand
   `codec == null || (codec instanceof DirectDecompressionCodec && 
((DirectDecompressionCodec) codec).isAvailable()) ? 
SHIMS.getZeroCopyReader(file, pool) : null` is much more understandable

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -443,7 +444,8 @@ public void setBaseAndInnerReader(
   return new OrcRawRecordMerger.KeyInterval(null, null);
 }
 
-OrcTail orcTail = getOrcTail(orcSplit.getPath(), conf, cacheTag, 
orcSplit.getFileKey()).orcTail;
+VectorizedOrcAcidRowBatchReader.ReaderData orcReaderData =

Review comment:
   I am not sure about this. Previously we did not create the full reader. 
Why do we need to create the reader now? All calls from here use orcTail anyway 
except `List stats = 
orcReaderData.reader.

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=540403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540403
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 22/Jan/21 23:47
Start Date: 22/Jan/21 23:47
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-765763006


   Apache ORC 1.6.7 is officially release. Could you update the PR to use the 
official one, @pgaref ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 540403)
Time Spent: 3h 20m  (was: 3h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there ware many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)