Hi there,

I am using the Gandiva C++ library to run projections and selections on
Arrow record batches. Some fields in my record batches are dictionary
encoded, and I would like to know how to apply Gandiva functions to these
dictionary-encoded fields.

Currently, no Gandiva function has a signature that accepts a dictionary
array. If I instead use the dictionary array's value type to build the
Gandiva expression and then create a projector, it reports "Field
definition in schema my_field dictionary<values=string, indices=int8,
ordered=0> different from field in expression my_field:string", which is
expected.

I would like to know how to solve this problem in Arrow/Gandiva. More
specifically:
1) Do I need to convert a dictionary array into a non-dictionary-encoded
array before applying such a projection?
2) Is there an API in Arrow that lets me convert a dictionary array into a
non-dictionary-encoded array easily?
3) Initially I thought a dictionary array could be accessed through the
same API as other arrays, since dictionary encoding looks to me like a way
of organizing the data inside the array. I expected to read a value the
same way as for a normal array, for example dict_array->Value(i). It turns
out that users need a different API to get at the values: fetch the
indices and the dictionary, then look the value up. Because of this API
difference, clients of the Arrow API have to handle dictionary arrays and
normal arrays differently. Is there any approach or plan to make this
transparent to API clients?
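
To make point 3 concrete, here is a minimal sketch in plain C++ of the
two-step lookup I mean, and of what converting to a non-dictionary-encoded
array would amount to. It deliberately does not use Arrow: the names
DictEncodedColumn, ValueAt and Decode are hypothetical stand-ins I made up
for illustration, not Arrow or Gandiva API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column.
// This is NOT an Arrow type; it only models the layout described above.
struct DictEncodedColumn {
  std::vector<int8_t> indices;          // one entry per row, pointing into `dictionary`
  std::vector<std::string> dictionary;  // the distinct values
};

// Two-step lookup: read the index stored for row i, then read the
// dictionary entry at that index. `.at()` throws on out-of-range access.
const std::string& ValueAt(const DictEncodedColumn& col, std::size_t i) {
  return col.dictionary.at(static_cast<std::size_t>(col.indices.at(i)));
}

// Materialize the column as plain values, one string per row. This is the
// "non-dictionary-encoded" form that an expression typed as string expects.
std::vector<std::string> Decode(const DictEncodedColumn& col) {
  std::vector<std::string> out;
  out.reserve(col.indices.size());
  for (std::size_t i = 0; i < col.indices.size(); ++i) {
    out.push_back(ValueAt(col, i));
  }
  return out;
}
```

What I was hoping for is something like ValueAt built into the array
itself, so that client code does not need to know whether the column is
dictionary encoded.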

Thanks.
