Re: Parquet from DictionaryArray of strings

2022-06-23 Thread Micah Kornfield
Arrow will try to pass along the dictionary directly to parquet so if data is already in that form it could be faster. But the first encoding attempted for raw types is dictionary encoding so if cardinality is low enough you should end up with similar space savings. On Thursday, June 23, 2022,

Parquet from DictionaryArray of strings

2022-06-23 Thread Kirby, Adam
For writing Parquet using ParquetWriter, is there any particular advantage (in terms of ending up with a compact dictionary encoded column being efficiently written in Parquet) to starting with a DictionaryArray versus a simpler type? Thank you!

Re: Arrow FunctionRegsitry usage in Python

2022-06-23 Thread Weston Pace
> I was wondering if it is possible to add a C++ Function to the Compute > FunctionRegistry and then use the functions in python. Yes. This is partly handled automatically I believe when functions are added to the C++ function registry. There are generally two ways to call the function. For