Re: Questions on Dictionary Array types.

2022-04-20 Thread Suresh V
Thank you very much for the response. I was looking directly at tab['x']. Didn't realize that the dictionary is present at chunk level. On Thu, Apr 21, 2022, 1:17 AM Weston Pace wrote: > > However I cannot figure out any easy way to get the mapping > > used to create the dictionary array (vals)

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-20 Thread Sutou Kouhei
Hi, Thanks for the update! It seems that we need to know more about macOS tools to debug this case... If I had an M1 Mac, I could look into this more, but I don't have one. Please use DYLD_FALLBACK_LIBRARY_PATH for now. Thanks, -- kou In "Re: [Ruby] Cannot require 'parquet' on M1 Mac" on Wed, 20

Re: Questions on Dictionary Array types.

2022-04-20 Thread Weston Pace
> However I cannot figure out any easy way to get the mapping > used to create the dictionary array (vals) easily from the table. Can > you please let me know the easiest way? A dictionary is going to be associated with an array and not a table. So you first need to get the array from the table.

Re: Compute expression using pc.call_function not working as expected

2022-04-20 Thread Weston Pace
No and no. This filter will not be used for predicate pushdown now or in 8.0.0. It could possibly come after 8.0.0. If parquet stores statistics for each column of a struct array (I don't know offhand whether it does) then we should create a JIRA to expose this. On Wed, Apr 20, 2022, 11:01 AM Partha

Re: Compute expression using pc.call_function not working as expected

2022-04-20 Thread Partha Dutta
That works! Thanks. Do you know offhand if this filter would be used in a predicate pushdown for a parquet dataset? Or would it possibly be coming in version 8.0.0? On Wed, Apr 20, 2022 at 3:49 PM Weston Pace wrote: > The second argument to `call_function` should be a list (the args to > the

Questions on Dictionary Array types.

2022-04-20 Thread Suresh V
Hi .. I created a pyarrow table from a dictionary array as shown below. However I cannot figure out any easy way to get the mapping used to create the dictionary array (vals) easily from the table. Can you please let me know the easiest way? Other than the ones which involve

Re: Compute expression using pc.call_function not working as expected

2022-04-20 Thread Weston Pace
The second argument to `call_function` should be a list (the args to the function). Since `arr3` is iterable it is interpreting it as a list of args and trying to treat each row as an argument to your call (this is the reason it thinks you have 3 arguments). This should work:

Compute expression using pc.call_function not working as expected

2022-04-20 Thread Partha Dutta
I'm trying to use the compute function struct_field in order to create an expression for dataset filtering, but I am running into an error. This is the code snippet: arr1 = pa.array([100, 200, 300]) arr2 = pa.array([400, 500, 600]) arr3 = pa.StructArray.from_arrays([arr1, arr2], ["one", "two"]) e =

Re: [C++] Null indices and byte lengths of string columns

2022-04-20 Thread Antoine Pitrou
On Mon, 18 Apr 2022 13:09:52 -0700 Micah Kornfield wrote: > Note that uncompressed size is encoded size so can be substantially smaller > than a simple concatenated string buffer Indeed, the only reliable way to get the desired information is to actually read and decode the Parquet data.

Re: [Ruby] Cannot require 'parquet' on M1 Mac

2022-04-20 Thread Sten Larsson
Hi kou, I rescued NameError instead of LoadError to catch the error, but unfortunately it still doesn't seem to show anything about loading libraries. I updated the gist with the output: https://gist.github.com/stenlarsson/02d777e4c3b9e485b6e0f80f834ed8f5 Thanks /Sten On Wed, 20 Apr 2022 at