[ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545689
 ]

ASF GitHub Bot logged work on HIVE-23553:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Feb/21 23:16
            Start Date: 01/Feb/21 23:16
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-771227835


   > All q.out files show data size increase for tables. Since most of them are 
consistently additional 4 bytes per row, that seems like not a bug. However, I 
found some irregular increases too like 16 bytes per row. Can you explain why 
data size increased so we can check the irregularities and make sure they are 
expected?
   
   Hey @mustafaiman -- the main size differences are on Timestamp columns where 
we now support nanosecond precision (using 2 extra variables for the lower and 
the upper precision as part of the stats -- see 
[ORC-611](https://issues.apache.org/jira/browse/ORC-611)).
   
   Other than that, other changes can also affect size, such as trimming 
StringStatistics minimum and maximum values (ORC-203) and the List and Map 
column statistics that were recently added as part of ORC-398.
   
   Happy to check further if you have doubts about a particular query.
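
   To narrow down which queries show an irregular increase, a rough sketch 
like the one below could compare the statistics lines of an old and a new 
q.out file. It assumes the usual `Num rows: N Data size: M` annotation from 
EXPLAIN output and that both files list operators in the same order; the 
function names and the default of 4 bytes per row (taken from the observation 
quoted above) are illustrative and not part of Hive or ORC.

```python
import re

# Assumed shape of the statistics annotation in EXPLAIN output,
# e.g. "Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE ..."
STATS_RE = re.compile(r"Num rows: (\d+) Data size: (\d+)")


def stats(text):
    """Return (num_rows, data_size) pairs in order of appearance."""
    return [(int(rows), int(size)) for rows, size in STATS_RE.findall(text)]


def irregular_increases(old_text, new_text, expected_per_row=4):
    """List (index, num_rows, per_row_delta) for entries whose per-row
    size change is neither zero nor the expected increase."""
    flagged = []
    pairs = zip(stats(old_text), stats(new_text))
    for i, ((old_rows, old_size), (new_rows, new_size)) in enumerate(pairs):
        if old_rows != new_rows or old_rows == 0:
            continue  # row counts differ or are empty: not comparable
        per_row = (new_size - old_size) / old_rows
        if per_row not in (0, expected_per_row):
            flagged.append((i, old_rows, per_row))
    return flagged
```

   Entries flagged this way would be the ones worth checking against the 
Timestamp, String, List and Map statistics changes mentioned above.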
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 545689)
    Time Spent: 7h 40m  (was: 7.5h)

> Upgrade ORC version to 1.6.7
> ----------------------------
>
>                 Key: HIVE-23553
>                 URL: https://issues.apache.org/jira/browse/HIVE-23553
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on ORC 1.5.X, and in order to take advantage of 
> the latest ORC improvements, such as column encryption, we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, 
> Tails, and streams, or to un/compress RG data. As there were many internal 
> changes from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), 
> the upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
