This may be too late, but here's a simple example of iterating over a ChunkedArray containing primitive (fixed-width) types [1]. This mainly differs from Felipe's code because it goes from an arrow::Array to a C array without going through ArraySpan or ArrayData (which are needed if you want to use validity bitmaps).
The main thing I wanted to share, which reflects what Weston mentioned in his first reply, is this piece:

    std::static_pointer_cast<Int32Array>(cell_indices->chunk(chunk_ndx))

There, I'm using `chunk(int ndx)` to access a specific chunk of the ChunkedArray, which is an Array. Then, I'm doing a static_pointer_cast to treat the std::shared_ptr<Array> as a std::shared_ptr<Int32Array>. Finally, I use `raw_values()` to get a C array:

    const int32_t *chunk_vals = chunk_data->raw_values();

Using a ChunkedArray to select columns from a schema is definitely atypical, but the example might be useful. Also, for reference, a FieldVector is just a vector of std::shared_ptr<arrow::Field> [2], but I think I didn't find that alias until after I wrote this code.

[1]: https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/mainline/src/cpp/processing/dataops.cpp?ref_type=heads#L254-L273
[2]: https://github.com/apache/arrow/blob/main/cpp/src/arrow/type_fwd.h#L68

# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene

On Thursday, February 22nd, 2024 at 11:21, Weston Pace <weston.p...@gmail.com> wrote:

> >> ultimately, these do end up being loops at the lower levels (unless
> >> there's some hardware support, eg SIMD/GPU etc).
>
> > Even if you don't write explicit SIMD, (1) the compiler might
> > vectorize the loop for you, and (2) the superscalar nature of modern
> > CPUs means loops with less branches and memory indirections will run
> > faster.
>
> Probably getting into the weeds at this point, but my concern was less
> branch/simd/etc. and more that `GetScalar` requires a heap allocation.
>
> On Thu, Feb 22, 2024 at 10:55 AM Felipe Oliveira Carvalho
> <felipe...@gmail.com> wrote:
> >
> > > these do end up being loops at the lower levels
> >
> > Even if you don't write explicit SIMD, (1) the compiler might
> > vectorize the loop for you, and (2) the superscalar nature of modern
> > CPUs means loops with less branches and memory indirections will run
> > faster.
> >
> > > Now I just need to figure out the best way to do this over multiple
> > > columns (row-wise).
> >
> > You can usually turn loops that go row-by-row into loops that go
> > column-by-column by maintaining selection vectors or bitmaps that you
> > can use as masks to operations on the remaining columns.
> >
> > On Thu, Feb 22, 2024 at 1:39 PM Blair Azzopardi <blai...@gmail.com> wrote:
> > >
> > > Thanks @Weston and @Felipe. This information has been very helpful and
> > > thank you for the examples too. I completely agree with vectorizing
> > > computations; although, ultimately, these do end up being loops at the
> > > lower levels (unless there's some hardware support, e.g. SIMD/GPU etc).
> > >
> > > @Weston, I managed to iterate over my chunked array as you suggested
> > > (found some useful examples under the test cases), i.e.
> > >
> > > std::vector<double> values;
> > > for (auto elem : arrow::stl::Iterate<arrow::DoubleType>(*chunked_array)) {
> > >   if (elem.has_value()) {
> > >     values.push_back(*elem);
> > >   }
> > > }
> > >
> > > @Felipe, I had to adjust your snippet somewhat to get it to work (perhaps
> > > the API is in flux).
> > > Eventually I did something like this:
> > >
> > > for (auto &chunk : chunked_array->chunks()) {
> > >   auto &data = chunk->data();
> > >   arrow::ArraySpan array_span(*data);
> > >   auto len = array_span.buffers[1].size / static_cast<int64_t>(sizeof(double));
> > >   auto raw_values = array_span.GetSpan<double>(1, len);
> > >   // able to inspect (double)*(raw_values.data_ + N)
> > > }
> > >
> > > Now I just need to figure out the best way to do this over multiple
> > > columns (row-wise).
> > >
> > > Thanks again!
> > >
> > > On Tue, 20 Feb 2024 at 19:51, Felipe Oliveira Carvalho
> > > <felipe...@gmail.com> wrote:
> > >>
> > >> In a Vectorized querying system, scalars and conditionals should be
> > >> avoided at all costs. That's why it's called "vectorized" — it's about
> > >> the vectors and not the scalars.
> > >>
> > >> Arrow Arrays (AKA "vectors" in other systems) are the unit of data you
> > >> mainly deal with. Data abstraction (in the OOP sense) isn't possible
> > >> while also keeping performance — classes like Scalar and DoubleScalar
> > >> are not supposed to be instantiated for every scalar in an array when
> > >> you're looping. The disadvantage is that your loop now depends on the
> > >> type of the array you're dealing with (no data abstraction based on
> > >> virtual dispatching).
> > >>
> > >> > Also, is there an efficient way to loop through a slice perhaps by
> > >> > incrementing a pointer?
> > >>
> > >> That's the right path.
> > >> Given a ChunkedArray, here is what you can do:
> > >>
> > >> auto &dt = chunked_array->type();
> > >> assert(dt->id() == Type::DOUBLE);
> > >> for (auto &chunk : chunked_array->chunks()) {
> > >>   // each chunk is an arrow::Array
> > >>   ArrayData &data = chunk->data();
> > >>   util::span<const double> raw_values = data.GetSpan<double>(1);  // 1 is the data buffer
> > >>   // ^ all the scalars of the chunk are tightly packed here:
> > >>   //   64 bits for every double, even if it's logically NULL
> > >> }
> > >>
> > >> If data.IsNull(i), the value of raw_values[i] is undefined; depending
> > >> on what you're doing with the raw_values, you may not have to care.
> > >> Compute functions commonly have two different loops: one that handles
> > >> nulls and a faster one (without checks in the loop body) that you can
> > >> use when data.GetNullCount() == 0.
> > >>
> > >> Another trick is to compute on all the values and carry the same
> > >> validity bitmap over to the result. This is possible when the operation
> > >> is based on each value independently of the others.
> > >>
> > >> Hope this helps. The ultra-generic loop over all possible array types is
> > >> not possible without many allocations and branches per array element.
> > >>
> > >> --
> > >> Felipe
> > >>
> > >> On Mon, Feb 19, 2024 at 9:23 AM Weston Pace <weston.p...@gmail.com>
> > >> wrote:
> > >> >
> > >> > There is no advantage to using a Datum here. The Datum class is mainly
> > >> > intended for representing something that might be a Scalar or might be
> > >> > an Array.
> > >> >
> > >> > > Also, is there an efficient way to loop through a slice perhaps by
> > >> > > incrementing a pointer?
> > >> >
> > >> > You will want to cast the Array and avoid Scalar instances entirely.
> > >> > For example, if you know there are no nulls in your data then you can
> > >> > use methods like `DoubleArray::raw_values` which will give you a
> > >> > `double*`.
> > >> > Since it is a chunked array you would also have to deal
> > >> > with indexing and iterating the chunks.
> > >> >
> > >> > There are also some iterator utility classes like
> > >> > `arrow::stl::ChunkedArrayIterator` which can be easier to use.
> > >> >
> > >> > On Mon, Feb 19, 2024 at 3:54 AM Blair Azzopardi <blai...@gmail.com>
> > >> > wrote:
> > >> >>
> > >> >> On 2nd thoughts, the 2nd method could also be done in a single line:
> > >> >>
> > >> >> auto low3 = arrow::Datum(st_s_low.ValueOrDie()).scalar_as<arrow::DoubleScalar>().value;
> > >> >>
> > >> >> That said, I'm still keen to hear if there's an advantage to using
> > >> >> Datum or not, and about my 2nd question regarding efficiently
> > >> >> looping through a slice's values.
> > >> >>
> > >> >> On Mon, 19 Feb 2024 at 09:24, Blair Azzopardi <blai...@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> Hi
> > >> >>>
> > >> >>> I'm trying to figure out the optimal way of extracting scalar
> > >> >>> values from a table; I've found two ways: using a dynamic cast, or
> > >> >>> using Datum and cast. Is one better than the other? The advantage of
> > >> >>> the dynamic cast, at least, seems to be that it's a one-liner.
> > >> >>>
> > >> >>> auto c_val1 = table.GetColumnByName("Val1");
> > >> >>> auto st_c_val1 = s_low->GetScalar(0);
> > >> >>> if (st_c_val1.ok()) {
> > >> >>>   // method 1 - via dyn cast
> > >> >>>   auto val1 = std::dynamic_pointer_cast<arrow::DoubleScalar>(st_c_val1.ValueOrDie())->value;
> > >> >>>
> > >> >>>   // method 2 - via Datum & cast
> > >> >>>   arrow::Datum val(st_c_val1.ValueOrDie());
> > >> >>>   auto val1 = val.scalar_as<arrow::DoubleScalar>().value;
> > >> >>> }
> > >> >>>
> > >> >>> Also, is there an efficient way to loop through a slice, perhaps by
> > >> >>> incrementing a pointer? I know a chunked array might mean that the
> > >> >>> underlying data isn't stored contiguously, so perhaps this is tricky
> > >> >>> to do. I imagine the compute functions might do this.
> > >> >>> Otherwise, it feels like each access to a value in memory requires
> > >> >>> calls to several functions (GetScalar/ok/ValueOrDie, etc.).
> > >> >>>
> > >> >>> Thanks in advance,
> > >> >>> Blair