[ 
https://issues.apache.org/jira/browse/ORC-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669373#comment-16669373
 ] 

ASF GitHub Bot commented on ORC-426:
------------------------------------

omalley closed pull request #329: ORC-426: Fix errors in ORC specification.
URL: https://github.com/apache/orc/pull/329

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/site/specification/ORCv0.md b/site/specification/ORCv0.md
index 32ce14a151..b4fea4e81b 100644
--- a/site/specification/ORCv0.md
+++ b/site/specification/ORCv0.md
@@ -725,7 +725,7 @@ DIRECT        | PRESENT         | Yes      | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding      | Stream Kind     | Optional | Contents
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index fb90c8353c..5dbd3d027f 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -581,8 +581,6 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Data values (W * L bits padded to the byte) - A sequence of W bit positive
-  values that are added to the base value.
 * Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
   that didn't fit within W bits. Each entry in the list consists of a
   gap, which is the number of elements skipped from the previous
@@ -899,7 +897,7 @@ DIRECT_V2     | PRESENT         | Yes      | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding      | Stream Kind     | Optional | Contents
@@ -978,7 +976,7 @@ group (default to 10,000 rows) in a column. Only the row groups that
 satisfy min/max row index evaluation will be evaluated against the
 bloom filter index.
 
-Each BloomFilterEntry stores the number of hash functions ('k') used
+Each bloom filter entry stores the number of hash functions ('k') used
 and the bitset backing the bloom filter. The original encoding (pre
 ORC-101) of bloom filters used the bitset field encoded as a repeating
 sequence of longs in the bitset field with a little endian encoding
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 76ee571f0e..d91139c0fe 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -601,8 +601,6 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Data values (W * L bits padded to the byte) - A sequence of W bit positive
-  values that are added to the base value.
 * Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
   that didn't fit within W bits. Each entry in the list consists of a
   gap, which is the number of elements skipped from the previous
@@ -916,7 +914,7 @@ DIRECT_V2     | PRESENT         | Yes      | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding      | Stream Kind     | Optional | Contents
@@ -995,7 +993,7 @@ group (default to 10,000 rows) in a column. Only the row groups that
 satisfy min/max row index evaluation will be evaluated against the
 bloom filter index.
 
-Each BloomFilterEntry stores the number of hash functions ('k') used
+Each bloom filter entry stores the number of hash functions ('k') used
 and the bitset backing the bloom filter. The original encoding (pre
 ORC-101) of bloom filters used the bitset field encoded as a repeating
 sequence of longs in the bitset field with a little endian encoding


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Errors in ORC Specification
> ---------------------------
>
>                 Key: ORC-426
>                 URL: https://issues.apache.org/jira/browse/ORC-426
>             Project: ORC
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Fang Zheng
>            Priority: Minor
>              Labels: documentation
>
> There are some errors in the ORC format specifications:
> 1. In specification/ORCv1.md and specification/ORCv2.md, the following
> sentence appears twice in the description of "Patched Base":
> Data values (W * L bits padded to the byte) - A sequence of W bit positive
>   values that are added to the base value.
> 2. In specification/ORCv0.md, specification/ORCv1.md, and
> specification/ORCv2.md, there is an error in the description of "Map Columns":
> Maps are encoded as the PRESENT stream and a length stream with number
> of items in each list. -> The last word "list" should be changed to "map".
> 3. In specification/ORCv1.md and specification/ORCv2.md, the word
> "BloomFilterEntry" should be changed to "bloom filter entry", as
> "BloomFilterEntry" does not exist in the source code or Protocol Buffers
> definition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
