[jira] [Commented] (ORC-305) Add column statistics for the size on disk

Sandeep More (JIRA) Thu, 22 Mar 2018 08:45:18 -0700

    [ https://issues.apache.org/jira/browse/ORC-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409746#comment-16409746 ]


Sandeep More commented on ORC-305:
----------------------------------

Hello [~owen.omalley],
Thanks for the detailed explanation. I am getting closer and have a question 
about the following line:

bq. Now before TreeWriterBase.writeStripe saves the stripe statistics, use 
context.getPhysicalWriter().getFileBytes(id) to get the number of bytes for 
this column for this stripe.

I assume you mean calling 'context.getPhysicalWriter().getFileBytes(id)'
in WriterImpl, where we can get hold of the PhysicalWriter. The problem is 
that I have no way to get the column ids there (in WriterImpl.flushStripes()).

In TreeWriterBase, where I do have access to the column ids, I don't have 
access to the PhysicalWriter instance. Is there any other way to get hold of 
the PhysicalWriter instance there?
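
To make the question concrete, here is a minimal sketch of the wiring the
quoted line seems to describe: a tree writer that holds both its column id and
a context, and reaches the physical writer through that context. Every type
below (PhysicalWriter, WriterContext, CountingPhysicalWriter, ColumnWriter) is
a hypothetical stand-in written for illustration, not the actual ORC class,
and it assumes getFileBytes(id) returns the bytes for the current stripe.

```java
import java.util.HashMap;
import java.util.Map;

// All types here are hypothetical stand-ins, not the real ORC classes.
class ColumnBytesSketch {

    /** Stand-in for the physical writer: reports bytes written per column id. */
    interface PhysicalWriter {
        long getFileBytes(int columnId);
    }

    /** Stand-in for the context object a tree writer is constructed with. */
    interface WriterContext {
        PhysicalWriter getPhysicalWriter();
    }

    /** Toy physical writer that adds up the stream sizes of each column. */
    static class CountingPhysicalWriter implements PhysicalWriter {
        private final Map<Integer, Long> bytesByColumn = new HashMap<>();

        void recordStream(int columnId, long streamBytes) {
            bytesByColumn.merge(columnId, streamBytes, Long::sum);
        }

        @Override
        public long getFileBytes(int columnId) {
            return bytesByColumn.getOrDefault(columnId, 0L);
        }
    }

    /** Toy tree writer: knows its column id and keeps the context it was given. */
    static class ColumnWriter {
        private final int id;
        private final WriterContext context;
        private long bytesOnDisk;

        ColumnWriter(int id, WriterContext context) {
            this.id = id;
            this.context = context;
        }

        /** Before saving stripe statistics, ask the physical writer for this column's bytes. */
        void writeStripe() {
            bytesOnDisk = context.getPhysicalWriter().getFileBytes(id);
        }

        long getBytesOnDisk() {
            return bytesOnDisk;
        }
    }

    public static void main(String[] args) {
        CountingPhysicalWriter physical = new CountingPhysicalWriter();
        WriterContext context = () -> physical;

        physical.recordStream(1, 120); // e.g. dictionary stream of column 1
        physical.recordStream(1, 300); // e.g. data stream of column 1
        physical.recordStream(2, 50);  // a stream of another column

        ColumnWriter column1 = new ColumnWriter(1, context);
        column1.writeStripe();
        System.out.println(column1.getBytesOnDisk()); // prints 420
    }
}
```

If the real TreeWriterBase keeps a context like this, the column id and the
physical writer would be reachable from the same place; whether it does is
exactly what I am unsure about.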

Best,
Sandeep




> Add column statistics for the size on disk
> ------------------------------------------
>
>                 Key: ORC-305
>                 URL: https://issues.apache.org/jira/browse/ORC-305
>             Project: ORC
>          Issue Type: Test
>            Reporter: Owen O'Malley
>            Assignee: Sandeep More
>            Priority: Major
>
> It would be great to have the size on disk of each column.
> You can generate this by adding up the sizes of the dictionary and data 
> streams.
> It is only relevant at the stripe and file level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
