[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984836#comment-16984836
 ] 

Andy Thomason edited comment on ARROW-5949 at 11/29/19 9:27 AM:
----------------------------------------------------------------

We should discuss the design for a dictionary type and the necessary serialisation.

For example, we could start by adding a variant holding the key and value types to DataType:

{code:java}
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
{code}
  
We may not need the extra dictionary field in the Schema, since this information would be carried by the DataType itself.
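To illustrate the point about the Schema, here is a minimal sketch using simplified, hypothetical stand-ins for DataType, Field and Schema (these are not the arrow crate definitions, and `dictionary_types` is an invented helper). The key and value types of a dictionary column are read straight from the field's DataType, so nothing extra is needed in the Schema.

```rust
// Hypothetical, simplified stand-ins for arrow's DataType, Field and Schema;
// these are NOT the real arrow crate definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt8,
    Int32,
    Utf8,
    /// Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

impl Schema {
    /// Hypothetical helper: returns the (key, value) types of a
    /// dictionary-encoded column, read directly from its DataType.
    /// Returns None if the column is missing or not dictionary-encoded.
    pub fn dictionary_types(&self, name: &str) -> Option<(&DataType, &DataType)> {
        let field = self.fields.iter().find(|f| f.name == name)?;
        match &field.data_type {
            DataType::Dictionary(key, value) => Some((key.as_ref(), value.as_ref())),
            _ => None,
        }
    }
}
```

With this layout, a "chromosome" column would simply be declared as `DataType::Dictionary(Box::new(DataType::UInt8), Box::new(DataType::Utf8))`.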
  
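{code:java}
use arrow::array::{ArrayDataRef, ArrayRef};

pub struct DictionaryArray {
    // One key (an index into a dictionary) per row; the key type is variable.
    keys: ArrayRef,
    // One dictionary per dictionary batch.
    values: Vec<ArrayDataRef>,
}{code}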
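As a rough illustration of what would end up in `keys` and `values`, here is a sketch of dictionary-encoding a string column. It deliberately uses plain vectors instead of ArrayRef/ArrayDataRef, a single dictionary and u8 keys; `SimpleDictionaryArray` and `dictionary_encode` are hypothetical names, not existing arrow APIs.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the proposed DictionaryArray:
/// plain vectors instead of ArrayRef/ArrayDataRef, and one dictionary only.
#[derive(Debug, PartialEq)]
pub struct SimpleDictionaryArray {
    pub keys: Vec<u8>,       // one key per row
    pub values: Vec<String>, // the dictionary: each distinct value stored once
}

/// Dictionary-encodes a string column in first-seen order.
/// Returns None if there are more than 256 distinct values (u8 keys).
pub fn dictionary_encode(column: &[&str]) -> Option<SimpleDictionaryArray> {
    let mut lookup: HashMap<&str, u8> = HashMap::new();
    let mut keys = Vec::with_capacity(column.len());
    let mut values = Vec::new();
    for &s in column {
        let key = match lookup.get(s) {
            Some(&k) => k,
            None => {
                // u8 keys can address at most 256 dictionary entries (0..=255).
                if values.len() > u8::MAX as usize {
                    return None;
                }
                let k = values.len() as u8;
                values.push(s.to_string());
                lookup.insert(s, k);
                k
            }
        };
        keys.push(key);
    }
    Some(SimpleDictionaryArray { keys, values })
}
```

The 256-entry limit is the price of one-byte keys; a column with more distinct values would need a wider key type in its DataType.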
 
 Note that to support multiple dictionary batches we need a vector of values,
 although in most of our use cases we have only used a single dictionary. An
 option to concatenate dictionaries might be useful.
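One possible way to concatenate dictionaries is sketched below, assuming each record batch carries its own dictionary and keys that index into it. The merged dictionary keeps the first occurrence of each value and every batch's keys are remapped into it. `DictBatch` and `concat_dictionaries` are hypothetical names, and u32 keys and String values are simplifying assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified dictionary-encoded column for one record batch.
pub struct DictBatch {
    pub keys: Vec<u32>,      // indexes into `values` of this batch
    pub values: Vec<String>, // this batch's dictionary
}

/// Concatenates several dictionary batches into one dictionary,
/// remapping every batch's keys so they index into the merged dictionary.
pub fn concat_dictionaries(batches: &[DictBatch]) -> (Vec<u32>, Vec<String>) {
    let mut merged: Vec<String> = Vec::new();
    let mut index: HashMap<String, u32> = HashMap::new();
    let mut keys = Vec::new();
    for batch in batches {
        // remap[i] is the merged-dictionary key for this batch's key i.
        let remap: Vec<u32> = batch
            .values
            .iter()
            .map(|v| {
                *index.entry(v.clone()).or_insert_with(|| {
                    merged.push(v.clone());
                    (merged.len() - 1) as u32
                })
            })
            .collect();
        keys.extend(batch.keys.iter().map(|&k| remap[k as usize]));
    }
    (keys, merged)
}
```

Values shared between batches (such as "X" appearing in two dictionaries) end up stored once in the merged dictionary.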
  
 Access is similar to ListArray, except that the key (index) type is variable.
 For example, we often have a "chromosome" column with values "1" .. "X", so
 each key fits in a single byte.
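A minimal sketch of row access with a variable key type, assuming the array is simply generic over its key type K (u8 for the chromosome column, i32 or wider elsewhere). `DictArray` and `value` are hypothetical names; the only requirement placed on K here is that it converts to a usize index.

```rust
use std::convert::TryInto;

/// Hypothetical dictionary array that is generic over its key type K.
pub struct DictArray<K> {
    pub keys: Vec<K>,
    pub values: Vec<String>,
}

impl<K: Copy + TryInto<usize>> DictArray<K> {
    /// Returns the value of row `i`: look up the row's key, then index the
    /// dictionary with it. None if `i` is out of range, or the key is
    /// negative or not a valid dictionary index.
    pub fn value(&self, i: usize) -> Option<&str> {
        let key: usize = (*self.keys.get(i)?).try_into().ok()?;
        self.values.get(key).map(|s| s.as_str())
    }
}
```

Unlike ListArray, where each row is a range of child values given by offsets, each row here is exactly one key that selects a single dictionary entry.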
  
 Fast access to dictionary components is essential: we need slices of the keys
 and values for each record batch. It would also be very useful for all types to
 have an rb.get_slice<T>("name") function that returns a named, typed slice of
 an array.
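To make the get_slice<T>("name") idea concrete, here is a sketch on a toy record batch that stores each column as a type-erased Vec<T> and uses std::any::Any to downcast. The `RecordBatch` type and `add_column` below are hypothetical simplifications, not the arrow RecordBatch; only the get_slice name comes from the proposal above.

```rust
use std::any::Any;

/// Hypothetical, simplified record batch: named columns stored as type-erased Vec<T>.
pub struct RecordBatch {
    names: Vec<String>,
    columns: Vec<Box<dyn Any>>,
}

impl RecordBatch {
    pub fn new() -> Self {
        RecordBatch { names: Vec::new(), columns: Vec::new() }
    }

    /// Appends a named column of any 'static element type.
    pub fn add_column<T: 'static>(&mut self, name: &str, data: Vec<T>) {
        self.names.push(name.to_string());
        self.columns.push(Box::new(data));
    }

    /// Sketch of the proposed rb.get_slice<T>("name"): returns the named
    /// column as &[T], or None if the name is unknown or T does not match
    /// the column's element type.
    pub fn get_slice<T: 'static>(&self, name: &str) -> Option<&[T]> {
        let idx = self.names.iter().position(|n| n == name)?;
        self.columns[idx].downcast_ref::<Vec<T>>().map(|v| v.as_slice())
    }
}
```

Asking for the wrong type fails cleanly with None rather than panicking, which is one possible choice; returning a Result with an error describing the mismatch would be another.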
  
 Andy.
  
  

 



> [Rust] Implement DictionaryArray
> --------------------------------
>
>                 Key: ARROW-5949
>                 URL: https://issues.apache.org/jira/browse/ARROW-5949
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust
>            Reporter: David Atienza
>            Priority: Major
>
> I am pretty new to the codebase, but I have noticed that DictionaryArray is not
> implemented in the Rust implementation.
> I went through the list of issues and could not find any work on this. Is
> there a blocker?
>
> The specification is a bit
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding]
> or even
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
> so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
