[ 
https://issues.apache.org/jira/browse/ARROW-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091547#comment-17091547
 ] 

Neville Dipale commented on ARROW-5949:
---------------------------------------

Hi [~vertexclique], there was some discussion around using sentinel values over 
a bitmask ([https://github.com/apache/arrow/pull/6095#discussion_r367760573]), 
and I believe it was a matter of sentinel values not being spec-compliant.

We never resolved the following point, but I was of the opinion that it'd be 
better to provide methods/functions that allow converting a dictionary array 
into a primitive array. 
My opinion was mainly informed by my concern that we don't have a way of using 
dictionary arrays in compute kernels, so at the time I preferred something to 
convert `dict(i32)[<k:[0, 0, null, 1, 2]><v:[1, 2, null]>]` to `i32<1, 1, null, 
2, null>`.
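
To make the conversion above concrete, here is a minimal sketch in plain Rust. It does not use the Arrow crate: `dict_to_primitive` is a hypothetical helper, and `Option` stands in for Arrow's null representation, so treat it as an illustration of the idea rather than the actual API.

```rust
/// Hypothetical helper (not part of the Arrow API): expands dictionary keys
/// into the values they refer to.
///
/// A slot in the result is null (`None`) when either the key is null or the
/// dictionary value it points at is null. An out-of-range key panics in this
/// sketch; a real implementation would likely return an error instead.
fn dict_to_primitive(keys: &[Option<usize>], values: &[Option<i32>]) -> Vec<Option<i32>> {
    keys.iter()
        // `and_then` propagates a null key; indexing yields the (possibly null) value.
        .map(|key| key.and_then(|k| values[k]))
        .collect()
}
```

Called with the keys `[0, 0, null, 1, 2]` and the values `[1, 2, null]` from the example, this produces `[1, 1, null, 2, null]`: the third slot is null because its key is null, and the last slot is null because the value it points at is null.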

The contributor of the PR provided a valid use-case, which led them down the 
route of providing iterator access, so we eventually merged the PR on the 
premise that more work could be done in future to provide other access methods.

Regarding the 2 reasons:

R1: what do you mean by "rebuilding from that lookup"? Do you mean rebuilding a 
primitive array from the dictionary's iterator? If so, would a method that 
converts a dict(i32) into a primitive(i32) suffice for your needs?

R2: could you please provide an example of what you mean by parallel comparison? 
My knowledge of SIMD and auto-vec is a bit limited, but what we noticed in the 
Rust implementation is that we can often forgo explicit SIMD on some 
computation kernels if we relegate null handling to bitmask manipulation, and 
operate on arrays without branching to check nulls 
([https://github.com/apache/arrow/pull/6086]).
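
As an illustration of relegating null handling to bitmask manipulation, here is a minimal sketch in plain Rust. `Int32Column`, `add` and `is_valid` are hypothetical names, not the Arrow crate's types, and the layout (one validity bit per slot, least-significant bit first) is an assumption made for the example. It is not the code from the linked PR.

```rust
/// Hypothetical minimal column: values plus a validity bitmap.
/// Bit `i % 8` of byte `i / 8` is set when slot `i` is valid (non-null).
struct Int32Column {
    values: Vec<i32>,
    validity: Vec<u8>,
}

/// Adds two columns of equal length without checking nulls per element.
fn add(left: &Int32Column, right: &Int32Column) -> Int32Column {
    assert_eq!(left.values.len(), right.values.len());
    // Values are added unconditionally, including in null slots. The loop
    // body has no branch on validity, which leaves the compiler free to
    // auto-vectorize it.
    let values = left
        .values
        .iter()
        .zip(&right.values)
        .map(|(a, b)| a.wrapping_add(*b))
        .collect();
    // A result slot is valid only if both inputs are valid: bitwise AND.
    let validity = left
        .validity
        .iter()
        .zip(&right.validity)
        .map(|(a, b)| a & b)
        .collect();
    Int32Column { values, validity }
}

/// Returns true when slot `i` of `col` is valid.
fn is_valid(col: &Int32Column, i: usize) -> bool {
    (col.validity[i / 8] & (1u8 << (i % 8))) != 0
}
```

Whatever value ends up in a null slot is meaningless and is simply masked out by the combined bitmap, so the value loop never needs to look at nulls.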

> [Rust] Implement DictionaryArray
> --------------------------------
>
>                 Key: ARROW-5949
>                 URL: https://issues.apache.org/jira/browse/ARROW-5949
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust
>            Reporter: David Atienza
>            Assignee: David Atienza
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 18h
>  Remaining Estimate: 0h
>
> I am pretty new to the codebase, but I have seen that DictionaryArray is not 
> implemented in the Rust implementation.
> I went through the list of issues and I could not see any work on this. Is 
> there any blocker?
>  
> The specification is a bit 
> [short|https://arrow.apache.org/docs/format/Layout.html#dictionary-encoding] 
> or even 
> [non-existent|https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding],
>  so I am not sure how to implement it myself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)