Hi there, I am currently using the Gandiva C++ library to do projection/selection on Arrow record batches. Some fields in my record batches are dictionary encoded, and I would like to know how to apply Gandiva functions to these fields.
As far as I can tell, no Gandiva function currently has a signature that accepts a dictionary array. If I instead use the dictionary array's value type to compose a Gandiva function expression and create a projector, it reports "Field definition in schema my_field dictionary<values=string, indices=int8, ordered=0> different from field in expression my_field:string", which is expected.

I would like to know how to solve this problem in Arrow/Gandiva. More specifically:

1) Do I need to convert a dictionary array into a non-dictionary-encoded array before applying such a projection?

2) Is there any API in Arrow that lets me convert a dictionary array into a non-dictionary-encoded array easily?

3) Initially I thought a dictionary array could be accessed through an API similar to that of other arrays, since dictionary encoding seems to me to be a mechanism for organizing the data internally in the array. I expected to be able to read a value the same way as for a normal array, for example dict_array->Value(i). It turns out that users need a different API to access the values: get the indices and the dictionary, then look up the value. Because of this difference, clients of the Arrow API have to handle dictionary arrays and normal arrays separately. Is there any approach or plan to make this transparent to API clients?

Thanks.
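P.S. To make question 3 concrete, here is a minimal sketch in plain C++ of the kind of access I have in mind. It deliberately does not use Arrow's classes: DictEncodedColumn, Value and Decode are hypothetical names I made up for illustration, and the sketch ignores nulls. It only mirrors the indices-plus-dictionary layout and shows a uniform per-row accessor, plus a helper that materializes a plain column.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column.
// This is NOT arrow::DictionaryArray; it only mirrors the layout
// (a dictionary of distinct values plus one index per row).
struct DictEncodedColumn {
  std::vector<std::string> dictionary;  // distinct values
  std::vector<std::int8_t> indices;     // one dictionary index per row

  // The uniform accessor I was hoping for: resolve row i through its index.
  // Throws std::out_of_range if i or the stored index is out of bounds.
  const std::string& Value(std::size_t i) const {
    return dictionary.at(static_cast<std::size_t>(indices.at(i)));
  }
};

// Materialize a plain (non-dictionary-encoded) column, one value per row.
std::vector<std::string> Decode(const DictEncodedColumn& col) {
  std::vector<std::string> out;
  out.reserve(col.indices.size());
  for (std::size_t i = 0; i < col.indices.size(); ++i) {
    out.push_back(col.Value(i));
  }
  return out;
}
```

With something like this, a client could call col.Value(i) without caring about the encoding, or call Decode(col) once and hand the plain column to code that expects a string field.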