[ 
https://issues.apache.org/jira/browse/ORC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438106#comment-16438106
 ] 

ASF GitHub Bot commented on ORC-339:
------------------------------------

Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/247#discussion_r181530787
  
    --- Diff: site/specification/ORCv2.md ---
    @@ -0,0 +1,1032 @@
    +---
    +layout: page
    +title: Evolving Draft for ORC Specification v2
    +---
    +
    +This specification is rapidly evolving and should only be used for
    +developers on the project.
    +
    +# TO DO items
    --- End diff --
    
    Is this the final list for v2, or are we still working on it? I have one 
proposal to add to ORC v2, which I call a "clustered index". Basically, 
the writer can specify a sort order on one or more columns; we then 
create an index section in the ORC file whose keys are the column value(s) and 
whose values are row numbers. To keep the index small, each row group has one 
entry in the clustered index. This enables a new range-scan pattern in which 
the reader provides upper and lower bounds on the column value(s). 
    
    I can write up a detailed proposal for this.
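
    As a rough illustration of the clustered index idea above, here is a 
minimal Python sketch. It assumes a single sort column, rows already written in 
sorted order, and one index entry per row group holding the group's first key 
and first row number. The names build_clustered_index, row_range_for and 
ROW_GROUP_SIZE are hypothetical and are not part of any ORC API or of the 
eventual proposal.

```python
from bisect import bisect_left, bisect_right

# Hypothetical row group size, kept tiny so the example is easy to follow.
ROW_GROUP_SIZE = 4


def build_clustered_index(sorted_keys, row_group_size=ROW_GROUP_SIZE):
    """Return one (first_key, first_row) entry per row group.

    Assumes sorted_keys holds the sort column's values in file order,
    already sorted ascending by the writer.
    """
    return [(sorted_keys[row], row)
            for row in range(0, len(sorted_keys), row_group_size)]


def row_range_for(index, total_rows, lower, upper):
    """Return a half-open row range [start, end) that may hold keys in
    [lower, upper]. The range is conservative: the reader still has to
    filter the rows it reads.
    """
    first_keys = [key for key, _ in index]
    # The group just before the first group whose first key is >= lower
    # can still end with keys >= lower, so step back by one group.
    start_group = max(bisect_left(first_keys, lower) - 1, 0)
    # Groups whose first key is > upper cannot contain a match.
    end_group = bisect_right(first_keys, upper)
    if end_group <= start_group:
        return (0, 0)  # nothing can match
    start_row = index[start_group][1]
    end_row = index[end_group][1] if end_group < len(index) else total_rows
    return (start_row, end_row)


keys = [1, 2, 2, 3, 5, 5, 6, 7, 7, 8, 9, 10]
index = build_clustered_index(keys)
print(index)
print(row_range_for(index, len(keys), 6, 6))
```

    With one entry per row group the index stays small, at the cost of the 
reader sometimes scanning one extra row group at the lower end of the range.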


> Reorganize ORC specification
> ----------------------------
>
>                 Key: ORC-339
>                 URL: https://issues.apache.org/jira/browse/ORC-339
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>
> Currently we've put the ORC format specification in the documentation. Now 
> that we are starting the work to design ORCv2, it will be more convenient to 
> have each file format version as a separate page. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
