Arrow will try to pass the dictionary along directly to Parquet, so if the
data is already dictionary-encoded, writing could be faster. However, the
first encoding attempted for raw (non-dictionary) types is also dictionary
encoding, so if the cardinality is low enough you should end up with similar
space savings either way.
On Thursday, June 23, 2022,
For writing Parquet using ParquetWriter, is there any particular advantage
(in terms of ending up with a compact dictionary encoded column being
efficiently written in Parquet) to starting with a DictionaryArray versus a
simpler type?
Thank you!
> I was wondering if it is possible to add a C++ Function to the Compute
> FunctionRegistry and then use the functions in python.
Yes. I believe this is partly handled automatically when functions
are added to the C++ function registry. There are generally two ways
to call the function. For