viirya commented on code in PR #3534:
URL: https://github.com/apache/arrow-rs/pull/3534#discussion_r1071516782


##########
arrow-schema/src/datatype.rs:
##########
@@ -242,6 +242,19 @@ pub enum DataType {
     /// child fields may be respectively "entries", "key", and "value", but this is
     /// not enforced.
     Map(Box<Field>, bool),
+
+    /// A run-end encoding (REE) array is a variation of run-length encoding (RLE). These
+    /// encodings are well-suited for representing data containing sequences of the
+    /// same value, called runs. Each run is represented as a value and an integer giving
+    /// the index in the array where the run ends.
+    ///
+    /// A run-end encoded array has no buffers by itself, but has two child arrays. The
+    /// first child array, called the run ends array, holds either 16, 32, or 64-bit
+    /// signed integers. The actual values of each run are held in the second child array.
+    ///
+    /// These child arrays are prescribed the standard names of "run_ends" and "values"
+    /// respectively.
+    RunEndEncodedType(Box<Field>, Box<Field>),

Review Comment:
   The run-end encoded type has no buffers of its own, only child arrays, like
`Struct`, so I treat it as two `Field`s. I think it makes sense for `values` to
be a `Field`, since it may be a dictionary; as I recall, it has to be a field
for IPC serialization of dictionaries. `run_ends` is just a primitive array, so
I think it could simply be a `DataType`.
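
   For reference, the layout described in the doc comment can be sketched
roughly as below. This is only an illustration using plain `Vec`s, not the
arrow-rs implementation: the function names `run_end_encode` and `value_at`
are hypothetical, the run ends are fixed to `i32`, and each run end is taken
to be the exclusive end index of its run (an assumption about how "the index
where the run ends" is counted).

```rust
/// Run-end encode a slice into two parallel "child" vectors:
/// `run_ends[i]` is the exclusive logical end index of run `i`,
/// and `values[i]` is the value repeated throughout that run.
/// (Hypothetical helper; not part of arrow-rs.)
fn run_end_encode<T: PartialEq + Clone>(data: &[T]) -> (Vec<i32>, Vec<T>) {
    let mut run_ends: Vec<i32> = Vec::new();
    let mut values: Vec<T> = Vec::new();
    for (i, item) in data.iter().enumerate() {
        if values.last() == Some(item) {
            // Same value as the current run: extend that run's end.
            if let Some(end) = run_ends.last_mut() {
                *end = (i + 1) as i32;
            }
        } else {
            // A new run starts here and (so far) ends right after `i`.
            run_ends.push((i + 1) as i32);
            values.push(item.clone());
        }
    }
    (run_ends, values)
}

/// Look up the logical element at `index` by binary-searching the run ends
/// for the first run whose end is greater than `index`.
/// Returns `None` when `index` is past the last run.
fn value_at<'a, T>(run_ends: &[i32], values: &'a [T], index: usize) -> Option<&'a T> {
    let run = run_ends.partition_point(|&end| (end as usize) <= index);
    values.get(run)
}
```

   Under these assumptions, encoding `[1, 1, 2, 2, 2]` gives run ends `[2, 5]`
and values `[1, 2]`. Only the values side can hold an arbitrary (possibly
dictionary) type, while the run ends are always plain signed integers, which
is the asymmetry behind the `Field` vs. `DataType` suggestion.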



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
