This may be too late, but here's a simple example of iterating over a 
ChunkedArray containing primitive (fixed-width) types [1]. It mainly differs 
from Felipe's code in that it goes from an arrow::Array to a C array without 
going through ArraySpan or ArrayData (which you would need if you want to use 
validity bitmaps).

The main thing I wanted to share, which reflects what Weston mentioned in his 
first reply, is this piece:

`std::static_pointer_cast<Int32Array>(cell_indices->chunk(chunk_ndx))`

There, I'm using `chunk(int ndx)` to access a specific chunk in the 
ChunkedArray; each chunk is an Array. Then, I use a static_pointer_cast to treat 
the std::shared_ptr<Array> as a std::shared_ptr<Int32Array>. Finally, I call 
`raw_values()` to get a raw pointer to the chunk's values:

`const int32_t *chunk_vals = chunk_data->raw_values();`
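
Putting those pieces together, a minimal sketch of walking every value in an 
int32 ChunkedArray might look like the following (this is not the exact code 
from [1], and it assumes the column has no nulls):

#include <arrow/api.h>
#include <iostream>
#include <memory>

// Walks every value of an int32 ChunkedArray by going
// chunk() -> static_pointer_cast -> raw_values() for each chunk.
// Assumes there are no nulls, so the validity bitmap is never consulted.
void SumChunkedInt32(const std::shared_ptr<arrow::ChunkedArray> &cell_indices) {
  int64_t total = 0;

  for (int chunk_ndx = 0; chunk_ndx < cell_indices->num_chunks(); ++chunk_ndx) {
    auto chunk_data = std::static_pointer_cast<arrow::Int32Array>(
      cell_indices->chunk(chunk_ndx)
    );

    const int32_t *chunk_vals = chunk_data->raw_values();
    for (int64_t val_ndx = 0; val_ndx < chunk_data->length(); ++val_ndx) {
      total += chunk_vals[val_ndx];
    }
  }

  std::cout << "sum: " << total << std::endl;
}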

Using a ChunkedArray to select columns from a schema is definitely atypical, 
but the example might be useful.

Also, for reference, arrow::FieldVector is just a std::vector of 
std::shared_ptr<arrow::Field> [2], but I don't think I found that alias until 
after I wrote this code.

[1]: 
https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/mainline/src/cpp/processing/dataops.cpp?ref_type=heads#L254-L273

[2]: https://github.com/apache/arrow/blob/main/cpp/src/arrow/type_fwd.h#L68




# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Thursday, February 22nd, 2024 at 11:21, Weston Pace <weston.p...@gmail.com> 
wrote:

> >> ultimately, these do end up being loops at the lower levels (unless 
> >> there's some hardware support, eg SIMD/GPU etc).
> 

> > Even if you don't write explicit SIMD, (1) the compiler might
> > vectorize the loop for you, and (2) the superscalar nature of modern
> > CPUs means loops with less branches and memory indirections will run
> > faster.
> 

> Probably getting into the weeds at this point but my concern was less 
> branch/simd/etc. and more that `GetScalar` requires a heap allocation.
> 

> 

> On Thu, Feb 22, 2024 at 10:55 AM Felipe Oliveira Carvalho 
> <felipe...@gmail.com> wrote:
> 

> > > these do end up being loops at the lower levels
> > 

> > Even if you don't write explicit SIMD, (1) the compiler might
> > vectorize the loop for you, and (2) the superscalar nature of modern
> > CPUs means loops with less branches and memory indirections will run
> > faster.
> > 

> > > Now I just need to figure out the best way to do this over multiple 
> > > columns (row-wise).
> > 

> > You can usually turn loops that go row-by-row into loops that go
> > column-by-column by maintaining selection vectors or bitmaps that you
> > can use as masks to operations on the remaining columns.
> > 
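
A minimal sketch of that selection-vector idea, using hypothetical data (two 
equally long, contiguous double columns with no nulls), might look like:

#include <cstdint>
#include <vector>

// Pass 1: scan the filter column alone and record the row indices that pass.
std::vector<int64_t> BuildSelection(const double *col_a, int64_t num_rows) {
  std::vector<int64_t> selection;
  for (int64_t row = 0; row < num_rows; ++row) {
    if (col_a[row] > 0.0) { selection.push_back(row); }
  }
  return selection;
}

// Pass 2: use the selection as a mask over another column (no row-wise
// predicate on col_b; the filter was decided entirely by the pass over col_a).
double SumSelected(const double *col_b, const std::vector<int64_t> &selection) {
  double total = 0.0;
  for (int64_t row : selection) { total += col_b[row]; }
  return total;
}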

> > On Thu, Feb 22, 2024 at 1:39 PM Blair Azzopardi <blai...@gmail.com> wrote:
> > >
> > > Thanks @Weston and @Felipe. This information has been very helpful and 
> > > thank you for the examples too. I completely agree with vectorizing 
> > > computations; although, ultimately, these do end up being loops at the 
> > > lower levels (unless there's some hardware support, eg SIMD/GPU etc).
> > >
> > > @Weston, I managed to iterate over my chunked array as you suggested 
> > > (I found some useful examples under the test cases), i.e.:
> > >
> > > std::vector<double> values;
> > > for (auto elem : arrow::stl::Iterate<arrow::DoubleType>(*chunked_array)) {
> > >   if (elem.has_value()) {
> > >     values.push_back(*elem);
> > >   }
> > > }
> > >
> > > @Felipe, I had to adjust your snippet somewhat to get it to work (perhaps 
> > > the API is in flux). Eventually I did something like this:
> > >
> > > for (auto &chunk : chunked_array->chunks()) {
> > >   auto &data = chunk->data();
> > >   arrow::ArraySpan array_span(*data);
> > >   auto len = array_span.buffers[1].size / static_cast<int64_t>(sizeof(double));
> > >   auto raw_values = array_span.GetSpan<double>(1, len);
> > >   // able to inspect raw_values[N]
> > > }
> > >
> > > Now I just need to figure out the best way to do this over multiple 
> > > columns (row-wise).
> > >
> > > Thanks again!
> > >
> > >
> > > On Tue, 20 Feb 2024 at 19:51, Felipe Oliveira Carvalho 
> > > <felipe...@gmail.com> wrote:
> > >>
> > >> In a Vectorized querying system, scalars and conditionals should be
> > >> avoided at all costs. That's why it's called "vectorized" — it's about
> > >> the vectors and not the scalars.
> > >>
> > >> Arrow Arrays (AKA "vectors" in other systems) are the unit of data you
> > >> mainly deal with. Data abstraction (in the OOP sense) isn't possible
> > >> while also keeping performance — classes like Scalar and DoubleScalar
> > >> are not supposed to be instantiated for every scalar in an array when
> > >> you're looping. The disadvantage is that your loop now depends on the
> > >> type of the array you're dealing with (no data abstraction based on
> > >> virtual dispatching).
> > >>
> > >> > Also, is there an efficient way to loop through a slice perhaps by 
> > >> > incrementing a pointer?
> > >>
> > >> That's the right path. Given a ChunkedArray, this is what you can do:
> > >>
> > >> auto &dt = chunked_array->type();
> > >> assert(dt->id() == Type::DOUBLE);
> > >> for (auto &chunk : chunked_array->chunks()) {
> > >>   // each chunk is an arrow::Array
> > >>   const ArrayData &data = *chunk->data();
> > >>   util::span<const double> raw_values = data.GetSpan<double>(1);  // 1 is the data buffer
> > >>   // ^ all the scalars of the chunk are tightly packed here:
> > >>   // 64 bits for every double, even if it's logically NULL
> > >> }
> > >>
> > >> If data.IsNull(i), the value of raw_values[i] is undefined; depending
> > >> on what you're doing with the raw_values, you may not have to care.
> > >> Compute functions commonly have two different loops: one that handles
> > >> nulls, and a faster one (without checks in the loop body) that you can
> > >> use when data.GetNullCount() == 0.
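
For instance, a minimal sketch of that two-loop pattern, for a hypothetical sum 
over a DoubleArray:

#include <arrow/api.h>

// Sums a DoubleArray with a fast loop when there are no nulls and a
// null-aware loop otherwise.
double SumDoubleArray(const arrow::DoubleArray &arr) {
  const double *vals = arr.raw_values();
  double total = 0.0;

  if (arr.null_count() == 0) {
    // Fast path: no per-element checks in the loop body.
    for (int64_t ndx = 0; ndx < arr.length(); ++ndx) { total += vals[ndx]; }
  } else {
    // Null-aware path: skip slots whose validity bit is unset.
    for (int64_t ndx = 0; ndx < arr.length(); ++ndx) {
      if (!arr.IsNull(ndx)) { total += vals[ndx]; }
    }
  }

  return total;
}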
> > >>
> > >> Another trick is to compute on all the values and carry the same
> > >> validity-bitmap to the result. Possible when the operation is based on
> > >> each value independently of the others.
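
A minimal sketch of that bitmap-carrying trick, as a hypothetical element-wise 
"times two" (it assumes the input array has a zero offset):

#include <arrow/api.h>
#include <memory>

// Computes on every slot, including logically-null ones, then reuses the
// input's validity bitmap (buffer 0) unchanged for the result.
arrow::Result<std::shared_ptr<arrow::Array>> TimesTwo(
    const std::shared_ptr<arrow::DoubleArray> &input) {
  ARROW_ASSIGN_OR_RAISE(
    auto out_buf, arrow::AllocateBuffer(input->length() * sizeof(double)));

  const double *in_vals = input->raw_values();
  double *out_vals = reinterpret_cast<double *>(out_buf->mutable_data());
  for (int64_t ndx = 0; ndx < input->length(); ++ndx) {
    out_vals[ndx] = in_vals[ndx] * 2;  // garbage for null slots; masked below
  }

  auto out_data = arrow::ArrayData::Make(
    arrow::float64(), input->length(),
    {input->data()->buffers[0], std::move(out_buf)}, input->null_count());
  return arrow::MakeArray(out_data);
}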
> > >>
> > >> Hope this helps. The ultra generic loop on all possible array types is
> > >> not possible without many allocations and branches per array element.
> > >>
> > >> --
> > >> Felipe
> > >>
> > >>
> > >>
> > >> On Mon, Feb 19, 2024 at 9:23 AM Weston Pace <weston.p...@gmail.com> 
> > >> wrote:
> > >> >
> > >> > There is no advantage to using a Datum here. The Datum class is mainly 
> > >> > intended for representing something that might be a Scalar or might be 
> > >> > an Array.
> > >> >
> > >> > > Also, is there an efficient way to loop through a slice perhaps by 
> > >> > > incrementing a pointer?
> > >> >
> > >> > You will want to cast the Array and avoid Scalar instances entirely. 
> > >> > For example, if you know there are no nulls in your data then you can 
> > >> > use methods like `DoubleArray::raw_values` which will give you a 
> > >> > `double*`. Since it is a chunked array you would also have to deal 
> > >> > with indexing and iterating the chunks.
> > >> >
> > >> > There are also some iterator utility classes like 
> > >> > `arrow::stl::ChunkedArrayIterator` which can be easier to use.
> > >> >
> > >> > On Mon, Feb 19, 2024 at 3:54 AM Blair Azzopardi <blai...@gmail.com> 
> > >> > wrote:
> > >> >>
> > >> >> On 2nd thoughts, the 2nd method could also be done in a single line.
> > >> >>
> > >> >> auto low3 = 
> > >> >> arrow::Datum(st_s_low.ValueOrDie()).scalar_as<arrow::DoubleScalar>().value;
> > >> >>
> > >> >> That said, I'm still keen to hear if there's an advantage to using 
> > >> >> Datum or without; and on my 2nd question regarding efficiently 
> > >> >> looping through a slice's values.
> > >> >>
> > >> >> On Mon, 19 Feb 2024 at 09:24, Blair Azzopardi <blai...@gmail.com> 
> > >> >> wrote:
> > >> >>>
> > >> >>> Hi
> > >> >>>
> > >> >>> I'm trying to figure out the optimal way to extract scalar 
> > >> >>> values from a table; I've found two ways, using a dynamic cast or 
> > >> >>> using Datum and a cast. Is one better than the other? The advantage of 
> > >> >>> the dynamic cast seems, at least, to be that it's a one-liner.
> > >> >>>
> > >> >>> auto c_val1 = table.GetColumnByName("Val1");
> > >> >>> auto st_c_val1 = c_val1->GetScalar(0);
> > >> >>> if (st_c_val1.ok()) {
> > >> >>>
> > >> >>>   // method 1 - via dyn cast
> > >> >>>   auto val1 =
> > >> >>>       std::dynamic_pointer_cast<arrow::DoubleScalar>(st_c_val1.ValueOrDie())->value;
> > >> >>>
> > >> >>>   // method 2 - via Datum & cast
> > >> >>>   arrow::Datum val(st_c_val1.ValueOrDie());
> > >> >>>   auto val2 = val.scalar_as<arrow::DoubleScalar>().value;
> > >> >>> }
> > >> >>>
> > >> >>> Also, is there an efficient way to loop through a slice, perhaps by 
> > >> >>> incrementing a pointer? I know a chunked array might mean that the 
> > >> >>> underlying data isn't stored contiguously, so perhaps this is tricky 
> > >> >>> to do. I imagine the compute functions might do this. Otherwise, it 
> > >> >>> feels like each access to a value in memory requires calls to several 
> > >> >>> functions (GetScalar/ok/ValueOrDie etc).
> > >> >>>
> > >> >>> Thanks in advance
> > >> >>> Blair
