Re: [PyArrow] DictionaryArray isDelta Support

2022-01-17 Thread Sam Davis
to make this clear? Sam From: Weston Pace Sent: 15 January 2022 01:46 To: user@arrow.apache.org Subject: Re: [PyArrow] DictionaryArray isDelta Support I've been working with IPC files lately so I took another look at this and it was much easier than I expected

Re: [PyArrow] DictionaryArray isDelta Support

2022-01-14 Thread Weston Pace
)); > } else { > RETURN_NOT_OK( > GetDictionaryPayload(dictionary_id, dictionary, options_, > )); > } > RETURN_NOT_OK(WritePayload(payload)); > ++stats_.num_dictionary_batches; > if (dict

Re: [PyArrow] DictionaryArray isDelta Support

2022-01-07 Thread Sam Davis
arrow/writer.cc at 91e3ac53e2e21736ce6291d73fc37da6fa21259d (apache/arrow, github.com) ____________ From: Weston Pace Sent: 27 July 2021 19:52 To: user@arrow.apache.org Subject:

Re: [PyArrow] DictionaryArray isDelta Support

2021-07-27 Thread Weston Pace
dictionary_deltas=True) > > with pa.ipc.new_file("/tmp/sdavis_tmp.arrow", schema=schema, options=options) > as writer: > writer.write(b1) > > writer.write(b2) > ``` > > Best, > > Sam > > From: Wes McKinney > Sent: 24 July 202

Re: [PyArrow] DictionaryArray isDelta Support

2021-07-26 Thread Sam Davis
writer.write(b1) writer.write(b2) ``` Best, Sam ____________ From: Wes McKinney Sent: 24 July 2021 01:43 To: user@arrow.apache.org Subject: Re: [PyArrow] DictionaryArray isDelta Support If I'm interpreting you correctly, the issue is that every dictionary must be

Re: [PyArrow] DictionaryArray isDelta Support

2021-07-23 Thread Wes McKinney
I think this check in the C++ code triggers regardless of > whether the delta option is turned on: > > https://github.com/apache/arrow/blob/e0401123736c85283e527797a113a3c38c0915f2/cpp/src/arrow/ipc/writer.cc#L1066 > ____ > From: Sam Davis > Sent: 23 July 20

Re: [PyArrow] DictionaryArray isDelta Support

2021-07-23 Thread Sam Davis
:43 To: user@arrow.apache.org Subject: Re: [PyArrow] DictionaryArray isDelta Support Yes I know this as quoted in the spec. What I am wondering is for the file format how can I write deltas out using PyArrow? The previous example was a trivial version of reality. More concretely, say I want

Re: [PyArrow] DictionaryArray isDelta Support

2021-07-23 Thread Sam Davis
g Subject: Re: [PyArrow] DictionaryArray isDelta Support Dictionary replacements aren't supported in the file format, only deltas. Your use case is a replacement, not a delta. You could use the stream format instead. On Fri, Jul 23, 2021 at 8:32 AM Sam Davis wrote: > > Hey Wes, > > Tha

Re: [PyArrow] DictionaryArray isDelta Support

2021-07-23 Thread Wes McKinney
rite(b1) > writer.write(b2) > ``` > > Version printed: 4.0.1 > > Sam > > From: Wes McKinney > Sent: 23 July 2021 14:24 > To: user@arrow.apache.org > Subject: Re: [PyArrow] DictionaryArray isDelta Support > > hi Sam > >

[PyArrow] DictionaryArray isDelta Support

2021-07-23 Thread Sam Davis
Hi, We want to write out RecordBatches of data, where one or more columns in a batch are `pa.string()` columns encoded as `pa.dictionary(pa.intX(), pa.string())`, since each such column contains only a handful of unique values. However, PyArrow seems to lack support for writing these batches out to