Thanks a lot Wes. I will give the arrow::compute::Cast API a try.

BTW, although I don't have a working proposal yet, what format/process do
we typically follow for such a proposal? I assume I need to experiment
locally, draft an email describing the proposal, and send it to the dev
mailing list. Is that correct? Or is there another place/process that
requires a more formal proposal, like a PEP for Python?

On Wed, Apr 22, 2020 at 12:22 AM Wes McKinney <wesmck...@gmail.com> wrote:

> On Tue, Apr 21, 2020 at 6:34 AM Yue Ni <niyue....@gmail.com> wrote:
> >
> > Hi there,
> >
> > I am currently using the Gandiva C++ library to do projection/selection
> > on Arrow record batches. In my record batch, some fields are dictionary
> > encoded, and I wonder how I can apply Gandiva functions to these
> > dictionary-encoded fields.
> >
> > Currently, no Gandiva function has a signature that supports dictionary
> > arrays, and if I use the dictionary array's value type to compose a
> > Gandiva function expression and create a projector, it reports "Field
> > definition in schema my_field dictionary<values=string, indices=int8,
> > ordered=0> different from field in expression my_field:string", which is
> > expected.
> >
> > I would like to know how to solve this problem in arrow/gandiva, more
> > specifically:
> > 1) Do I need to convert a dictionary array into a non-dictionary-encoded
> > array before applying such a projection?
>
> Currently yes
>
> > 2) Is there any API in Arrow that allows me to easily convert a
> > dictionary array into a non-dictionary-encoded array?
>
> Yes, use arrow::compute::Cast with the dense type as the target type
>
> > 3) Initially I thought a dictionary array could be accessed with an API
> > similar to other arrays, since dictionary encoding seems to me to be a
> > mechanism for organizing data internally in the array. I expected to
> > access a value in a dictionary array as in other arrays, for example
> > dict_array->Value(i), but it turns out users need a different API to
> > access the values (get the indices and the dictionary, then retrieve the
> > value). Because of this API difference, clients of the Arrow API have to
> > handle dictionary arrays and normal arrays differently. Is there any
> > approach/plan to make this transparent to API clients?
>
> There's no plan that I'm aware of, but you are welcome to propose one.
>
> > Thanks.
>
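
For reference, here is a minimal, library-free sketch of the access pattern
discussed in question 3: a dictionary-encoded column stores a list of
distinct values plus one index per row, so reading row i means looking up
the index first and then the dictionary entry. This is only an illustration
of the idea and does not use the Arrow API. DictColumn, ValueAt and Decode
are hypothetical names invented for this example, and nulls are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column.
// This is NOT an Arrow type; it only mirrors the indices + dictionary idea.
struct DictColumn {
  std::vector<std::string> dictionary;  // distinct values
  std::vector<std::int8_t> indices;     // one dictionary index per row
};

// Logical value at row i: read the index, then look it up in the dictionary.
// at() throws std::out_of_range for a bad row or a bad (e.g. negative) index.
inline const std::string& ValueAt(const DictColumn& col, std::size_t i) {
  const auto idx = static_cast<std::size_t>(col.indices.at(i));
  return col.dictionary.at(idx);
}

// Materialize a dense (non-dictionary-encoded) column with one value per row.
inline std::vector<std::string> Decode(const DictColumn& col) {
  std::vector<std::string> dense;
  dense.reserve(col.indices.size());
  for (std::size_t i = 0; i < col.indices.size(); ++i) {
    dense.push_back(ValueAt(col, i));
  }
  return dense;
}
```

Conceptually, Decode plays the role that a cast to the dense type plays in
the answer to question 2, while ValueAt is the kind of transparent per-row
accessor that question 3 asks about.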
