to
make this clear?
Sam
From: Weston Pace
Sent: 15 January 2022 01:46
To: user@arrow.apache.org
Subject: Re: [PyArrow] DictionaryArray isDelta Support
I've been working with IPC files lately so I took another look at this and it
was much easier than I expected
));
> } else {
> RETURN_NOT_OK(
> GetDictionaryPayload(dictionary_id, dictionary, options_,
> ));
> }
> RETURN_NOT_OK(WritePayload(payload));
> ++stats_.num_dictionary_batches;
> if (dict
;
arrow/writer.cc at 91e3ac53e2e21736ce6291d73fc37da6fa21259d (apache/arrow, github.com)
?
____________
From: Weston Pace
Sent: 27 July 2021 19:52
To: user@arrow.apache.org
Subject:
dictionary_deltas=True)
>
> with pa.ipc.new_file("/tmp/sdavis_tmp.arrow", schema=schema, options=options)
> as writer:
> writer.write(b1)
>
> writer.write(b2)
> ```
>
> Best,
>
> Sam
>
> From: Wes McKinney
> Sent: 24 July 202
writer.write(b1)
writer.write(b2)
```
Best,
Sam
____________
From: Wes McKinney
Sent: 24 July 2021 01:43
To: user@arrow.apache.org
Subject: Re: [PyArrow] DictionaryArray isDelta Support
If I'm interpreting you correctly, the issue is that every dictionary
must be
I think this check in the C++ code triggers regardless of
> whether the delta option is turned on:
>
> https://github.com/apache/arrow/blob/e0401123736c85283e527797a113a3c38c0915f2/cpp/src/arrow/ipc/writer.cc#L1066
> ____
> From: Sam Davis
> Sent: 23 July 20
:43
To: user@arrow.apache.org
Subject: Re: [PyArrow] DictionaryArray isDelta Support
Yes, I know this, as it is quoted in the spec. What I am wondering is: for the
file format, how can I write deltas out using PyArrow?
The previous example was a trivial version of reality.
More concretely, say I want
g
Subject: Re: [PyArrow] DictionaryArray isDelta Support
Dictionary replacements aren't supported in the file format, only
deltas. Your use case is a replacement, not a delta. You could use the
stream format instead.
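
To illustrate the distinction, here is a minimal sketch in plain Python. It is not the Arrow implementation, and the helper names `classify_dictionary_update` and `delta_entries` are hypothetical. It assumes that a delta only appends entries to the end of the dictionary already written, so the old dictionary must be a strict prefix of the new one; anything else counts as a replacement.

```python
from typing import List, Sequence


def classify_dictionary_update(old: Sequence[str], new: Sequence[str]) -> str:
    """Classify how `new` relates to the previously written dictionary `old`.

    Assumption for this sketch: a delta may only append entries, so `old`
    must be a strict prefix of `new`.
    """
    old_list, new_list = list(old), list(new)
    if new_list == old_list:
        return "unchanged"
    if len(new_list) > len(old_list) and new_list[: len(old_list)] == old_list:
        return "delta"
    return "replacement"


def delta_entries(old: Sequence[str], new: Sequence[str]) -> List[str]:
    """Return only the appended entries, i.e. what a delta would need to carry."""
    if classify_dictionary_update(old, new) != "delta":
        raise ValueError("new dictionary is not a delta of the old one")
    return list(new)[len(old):]
```

For example, going from `["a", "b"]` to `["a", "b", "c"]` is a delta whose appended entries are `["c"]`, whereas going from `["a", "b"]` to `["c", "d"]` is a replacement, which is the case the file format rejects.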
On Fri, Jul 23, 2021 at 8:32 AM Sam Davis wrote:
>
> Hey Wes,
>
> Tha
rite(b1)
> writer.write(b2)
> ```
>
> Version printed: 4.0.1
>
> Sam
>
> From: Wes McKinney
> Sent: 23 July 2021 14:24
> To: user@arrow.apache.org
> Subject: Re: [PyArrow] DictionaryArray isDelta Support
>
> hi Sam
>
>
Hi,
We want to write out RecordBatches of data, where one or more `pa.string()`
columns in a batch are encoded as `pa.dictionary(pa.intX(), pa.string())`,
as each such column only contains a handful of unique values.
However, PyArrow seems to lack support for writing these batches out to