[ https://issues.apache.org/jira/browse/ORC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436395#comment-16436395 ]
ASF GitHub Bot commented on ORC-339:
------------------------------------
Github user wgtmac commented on a diff in the pull request:
https://github.com/apache/orc/pull/247#discussion_r181239251
--- Diff: site/specification/ORCv2.md ---
@@ -0,0 +1,1032 @@
+---
+layout: page
+title: Evolving Draft for ORC Specification v2
+---
+
+This specification is rapidly evolving and should only be used for
+developers on the project.
+
+# TO DO items
+
+The list of things that we plan to change:
+
+* Create a decimal representation with fixed scale using rle.
+* Create a better float/double encoding that splits mantissa and
+ exponent.
+* Create a dictionary encoding for float, double, and decimal.
+* Create RLEv3:
+ * 64 and 128 bit variants
+ * Zero suppression
+ * Evaluate the rle subformats
+* Group stripe data into stripelets to enable Async IO for reads.
+* Reorder stripe data into (stripe metadata, index, dictionary, data)
+* Stop sorting dictionaries and record the sort order separately in the index.
+* Remove use of RLEv1 and RLEv2.
+* Remove non-utf8 bloom filter.
+* Use numeric value for decimal bloom filter.
--- End diff --
We may also use a numeric value for decimal column statistics.
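
As a rough illustration of why the textual and numeric forms of a decimal behave differently, here is a minimal sketch. It is generic Python, not ORC code; the column values and variable names are made up for the example, and it only assumes that a textual form is compared or matched as a plain string.

```python
from decimal import Decimal

# Hypothetical decimal column values in textual form (not from ORC).
values = ["9.5", "10.25", "100", "2"]

# Comparing the textual form orders the values lexicographically.
string_min = min(values)
string_max = max(values)

# Comparing the numeric form gives the true bounds of the column.
numeric = [Decimal(v) for v in values]
numeric_min = min(numeric)
numeric_max = max(numeric)

# Equal numeric values can also have different textual forms, which
# matters for anything keyed on the text, such as a hash.
same_text = "1.0" == "1.00"
same_value = Decimal("1.0") == Decimal("1.00")

print(string_min, string_max)    # bounds from string comparison
print(numeric_min, numeric_max)  # bounds from numeric comparison
print(same_text, same_value)
```

In this sketch the string bounds and the numeric bounds disagree, and the two spellings of one compare as different strings but equal numbers.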
> Reorganize ORC specification
> ----------------------------
>
> Key: ORC-339
> URL: https://issues.apache.org/jira/browse/ORC-339
> Project: ORC
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Major
>
> Currently we've put the ORC format specification in the documentation. Now
> that we are starting the work to design ORCv2, it will be more convenient to
> have each file format version as a separate page.