[
https://issues.apache.org/jira/browse/ORC-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409746#comment-16409746
]
Sandeep More commented on ORC-305:
----------------------------------
Hello [~owen.omalley]
Thanks for the detailed explanation. I am getting closer, and I have a
question about the following line:
bq. Now before TreeWriterBase.writeStripe saves the stripe statistics, use
context.getPhysicalWriter().getFileBytes(id) to get the number of bytes for
this column for this stripe.
I assume you mean calling 'context.getPhysicalWriter().getFileBytes(id)'
in WriterImpl, where I can get hold of the PhysicalWriter. The problem is
that I have no way to get the column ids there (in WriterImpl.flushStripes()).
In TreeWriterBase, where I do have access to the column ids, I don't have
access to the PhysicalWriter instance. Is there another way to get access to
the PhysicalWriter instance there?
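
For illustration, here is a minimal sketch of the access pattern the quoted
line seems to imply: the tree writer keeps its column id and reaches the
physical writer through its context. All of the types below
(SketchPhysicalWriter, SketchWriterContext, SketchTreeWriter,
InMemoryPhysicalWriter) are hypothetical stand-ins and not the real ORC
classes; whether the real writer context exposes getPhysicalWriter() to
TreeWriterBase is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the real ORC PhysicalWriter interface.
interface SketchPhysicalWriter {
  // Assumed semantics: bytes recorded so far for the given column id.
  long getFileBytes(int columnId);
}

// Hypothetical stand-in for the context handed to each tree writer.
interface SketchWriterContext {
  // Assumed accessor, mirroring the call in the quoted line.
  SketchPhysicalWriter getPhysicalWriter();
}

// Hypothetical tree writer: it knows its column id and holds the context.
class SketchTreeWriter {
  private final int id;
  private final SketchWriterContext context;
  private long lastStripeBytes = -1;

  SketchTreeWriter(int id, SketchWriterContext context) {
    this.id = id;
    this.context = context;
  }

  // Would run once per stripe, before the stripe statistics are saved.
  void writeStripe() {
    lastStripeBytes = context.getPhysicalWriter().getFileBytes(id);
  }

  long getLastStripeBytes() {
    return lastStripeBytes;
  }
}

// Toy physical writer that only tracks byte counts per column id.
class InMemoryPhysicalWriter implements SketchPhysicalWriter {
  private final Map<Integer, Long> bytesPerColumn = new HashMap<>();

  // Adds bytes to the running total of one column.
  void record(int columnId, long bytes) {
    bytesPerColumn.merge(columnId, bytes, Long::sum);
  }

  @Override
  public long getFileBytes(int columnId) {
    return bytesPerColumn.getOrDefault(columnId, 0L);
  }
}
```

With this shape, the column id never has to leave the tree writer, and
WriterImpl does not need to know the column ids at all.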
Best,
Sandeep
> Add column statistics for the size on disk
> ------------------------------------------
>
> Key: ORC-305
> URL: https://issues.apache.org/jira/browse/ORC-305
> Project: ORC
> Issue Type: Test
> Reporter: Owen O'Malley
> Assignee: Sandeep More
> Priority: Major
>
> It would be great to have the size on disk of each column.
> You can generate this by adding up the sizes of the dictionary and data
> streams.
> It is only relevant at the stripe and file level.