Thanks!

I read somewhere that the string->int mapping is not guaranteed to be the
same across chunks. Is this correct?
If so, is it necessary to call unify_dictionaries() first?

Also, if the operations only work on chunks, is it up to the user to iterate
through all chunks to create the resulting array of integers?

Best,

Laurent


On Sun, 28 Apr 2024 at 14:28, Jacek Pliszka <jacek.plis...@gmail.com>
wrote:

> Hi!
>
> table.column('a').chunk(0).dictionary returns dictionary values as an
> array that you can map...
>
> Then you can construct new Dictionary Type columns from the mapped values
> and table.column('a').chunk(0).indices
> using pa.DictionaryArray.from_arrays
>
> BR
>
> J
>
>
>
> On Sun, 28 Apr 2024 at 20:19, Laurent Gautier <lgaut...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is there a way to cast an Array of data type DictionaryType (for
>> example, I have DictionaryType(dictionary<values=large_string,
>> indices=uint32, ordered=0>)) into integers (the indices) and retrieve the
>> mapping (string -> integer)?
>>
>> I cannot find anything about this in the documentation. For the first ask
>> (cast to integers), trying to cast does not work:
>>
>> >>> pyarrow.compute.cast(foo, pyarrow.int32())
>> ArrowInvalid: Failed to parse string: 'Some String' as a scalar of type
>> int32
>>
>>
>> Best,
>>
>>
>> Laurent
>>
>>
