[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091640#comment-17091640
 ] 

Neville Dipale commented on ARROW-5949:
---------------------------------------

I think not providing more convenient ways of using DictionaryArray potentially 
defeats the purpose of having it. I've already mentioned the need for compute 
kernel support on dictionaries, some of which would require access to the 
array's keys as a primitive array (e.g. sort, take), and others which would 
need both keys and values (filter).
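
To make the keys-only versus keys-and-values distinction concrete, here is a minimal sketch on plain slices. It does not use the Arrow crate; `take_keys` and `filter_mask` are hypothetical helper names, and the value-based filter is only one possible reading of why a filter would need both buffers.

```rust
/// Gather keys at the given row indices. The dictionary values are not
/// touched at all; the result can reuse the same values buffer.
/// Panics if an index is out of bounds (a real kernel would return an error).
fn take_keys(keys: &[Option<u32>], indices: &[usize]) -> Vec<Option<u32>> {
    indices.iter().map(|&i| keys[i]).collect()
}

/// Build a row mask from a predicate on the dictionary values.
/// The predicate runs once per distinct value, then each row looks up
/// the result through its key; null keys never match.
/// Panics if a key is out of bounds for `values`.
fn filter_mask<V>(
    keys: &[Option<u32>],
    values: &[V],
    pred: impl Fn(&V) -> bool,
) -> Vec<bool> {
    let matches: Vec<bool> = values.iter().map(|v| pred(v)).collect();
    keys.iter()
        .map(|&k| k.map_or(false, |k| matches[k as usize]))
        .collect()
}
```

In this toy model, taking rows only rearranges keys, while the mask computation has to read the values and then map the result back through the keys.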

I would rather have DictionaryArray<T: PrimitiveType>::keys() return an 
ArrayRef instead of a NullableIter, and then support iteration over arrays in 
general.

Yes, building the primitive array is a bit expensive, and more importantly, 
it's opaque to a casual Arrow user; so I'd support providing that option.

For example, consider the following:
{code:java}
impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> {
     pub fn decode_dictionary(&self) -> Result<ArrayRef> {
         // convert the keys into an array
         let keys = Arc::new(PrimitiveArray::<K>::from(self.data.clone())) as ArrayRef;
         // cast keys to a uint32 array
         let keys = crate::compute::cast(&keys, &DataType::UInt32)?;
         let keys = UInt32Array::from(keys.data());
         // index into the values of the dictionary, with keys
         crate::compute::take(&self.values, &keys, None)
     }
 }{code}
This is how I'd convert a dictionary to a 'normal' array of an unknown type.
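
The decoding step itself (index into the values with the keys) can be sketched without the Arrow crate. This is a toy model under stated assumptions: `ToyDictionary` and its `decode` method are hypothetical names, keys are already `u32`, and nulls are modelled with `Option`.

```rust
/// Toy stand-in for a dictionary-encoded column (not Arrow's type).
struct ToyDictionary<V> {
    /// One key per row; `None` models a null row.
    keys: Vec<Option<u32>>,
    /// The distinct values the keys point into.
    values: Vec<V>,
}

impl<V: Clone> ToyDictionary<V> {
    /// Decode to a 'normal' column by taking from `values` at each key.
    /// Null keys stay null; returns `None` if any key is out of bounds.
    fn decode(&self) -> Option<Vec<Option<V>>> {
        self.keys
            .iter()
            .map(|key| match key {
                None => Some(None),
                Some(k) => self.values.get(*k as usize).cloned().map(Some),
            })
            .collect()
    }
}
```

The cast to UInt32 in the snippet above plays the role of the fixed `u32` key type here; the final take is the per-row lookup.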

Perhaps this could be a discussion for the mailing list? I'm interested in 
simplifying the dictionary API and widening dictionary support; this could be 
a good starting point. CC [~paddyhoran] [~andygrove]

> [Rust] Implement DictionaryArray
> --------------------------------
>
>                 Key: ARROW-5949
>                 URL: https://issues.apache.org/jira/browse/ARROW-5949
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust
>            Reporter: David Atienza
>            Assignee: David Atienza
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
