Thanks a lot, Wes. I will give the arrow::compute::Cast API a try.

BTW, although I don't have a working proposal yet, what format/process do we typically follow for such a proposal? I assume I need to experiment locally, draft an email describing the proposal, and send it to the dev mailing list. Is that correct? Or is there another place/process that requires a more formal proposal, like a PEP for Python?

On Wed, Apr 22, 2020 at 12:22 AM Wes McKinney <wesmck...@gmail.com> wrote:

> On Tue, Apr 21, 2020 at 6:34 AM Yue Ni <niyue....@gmail.com> wrote:
> >
> > Hi there,
> >
> > I am currently using gandiva C++ library doing projection/selection for
> > Arrow record batch, in my record batch, I have some fields encoded with
> > dictionary encoding, I wonder how I can apply gandiva functions for these
> > dictionary encoded fields.
> >
> > Currently, there is no gandiva function having signature supporting
> > dictionary array, and if I tried using the dictionary array's value type to
> > compose a gandiva function expression and create a projector, it will
> > report "Field definition in schema my_field dictionary<values=string,
> > indices=int8, ordered=0> different from field in expression
> > my_field:string", which is expected.
> >
> > I would like to know how to solve this problem in arrow/gandiva, more
> > specifically:
> > 1) Do I need to convert a dictionary array into a non dictionary
> > encoded array for applying such a projection?
>
> Currently yes
>
> > 2) Is there any API in Arrow that allows me to convert a dictionary array
> > into a non dictionary encoded array easily?
>
> Yes, use arrow::compute::Cast with the dense type as the target type
>
> > 3) Initially I thought Dictionary Array could be accessed with similar API
> > like other arrays since dictionary encoding seems to me a mechanism for
> > organizing the data internally in the array, and I expect I can access the
> > value in the dictionary array like other normal arrays for example,
> > dict_array->Value(i), but it turns out users need to use a different API to
> > access the values in dictionary (get the indices/dictionaries and then
> > retrieve the value). Because of this API difference, other clients for the
> > arrow API have to handle dictionary array/normal array differently, is
> > there any approach/plan to make this transparent to the API clients?
>
> There's no plan that I'm aware of, but you are welcome to propose one.
>
> > Thanks.
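
To make the access-pattern difference from question 3 concrete, here is a minimal sketch in plain C++. It does not use the Arrow API: `ToyDictionaryArray`, its `Value(i)` accessor, and `DecodeToDense` are hypothetical names invented for illustration, and nulls are ignored. It only models the idea that a dictionary-encoded column stores small integer indices into a list of distinct values, and that decoding to a dense column (conceptually what a cast to the dense type produces) means resolving every index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model of a dictionary-encoded string column.
// This is NOT arrow::DictionaryArray; it only mirrors the layout idea
// (values=string, indices=int8) and does not model nulls.
struct ToyDictionaryArray {
  std::vector<std::string> dictionary;  // distinct values
  std::vector<std::int8_t> indices;     // one dictionary slot per row

  std::size_t length() const { return indices.size(); }

  // Transparent accessor: resolves the index so the caller sees the
  // plain value, as it would for a non dictionary encoded column.
  const std::string& Value(std::size_t i) const {
    assert(i < indices.size());
    const auto slot = static_cast<std::size_t>(indices[i]);
    assert(slot < dictionary.size());
    return dictionary[slot];
  }
};

// Materializes a dense (non dictionary encoded) column by resolving
// every row's index against the dictionary.
std::vector<std::string> DecodeToDense(const ToyDictionaryArray& arr) {
  std::vector<std::string> dense;
  dense.reserve(arr.length());
  for (std::size_t i = 0; i < arr.length(); ++i) {
    dense.push_back(arr.Value(i));
  }
  return dense;
}
```

With a dictionary of `{"apple", "pear"}` and indices `{0, 1, 1, 0}`, `Value(2)` resolves to `"pear"` and `DecodeToDense` returns four strings. The trade-off is the usual one: the accessor keeps the compact representation but adds an indirection per lookup, while decoding pays the memory cost once so that downstream code can treat the column as dense.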