>> ultimately, these do end up being loops at the lower levels (unless
>> there's some hardware support, eg SIMD/GPU etc).
>
> Even if you don't write explicit SIMD, (1) the compiler might
> vectorize the loop for you, and (2) the superscalar nature of modern
> CPUs means loops with fewer branches and memory indirections will run
> faster.

Probably getting into the weeds at this point, but my concern was less
about branches/SIMD/etc. and more that `GetScalar` requires a heap
allocation.

On Thu, Feb 22, 2024 at 10:55 AM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
>
> > these do end up being loops at the lower levels
>
> Even if you don't write explicit SIMD, (1) the compiler might
> vectorize the loop for you, and (2) the superscalar nature of modern
> CPUs means loops with fewer branches and memory indirections will run
> faster.
>
> > Now I just need to figure out the best way to do this over multiple
> > columns (row-wise).
>
> You can usually turn loops that go row-by-row into loops that go
> column-by-column by maintaining selection vectors or bitmaps that you
> can use as masks to operations on the remaining columns.
>
> On Thu, Feb 22, 2024 at 1:39 PM Blair Azzopardi <blai...@gmail.com> wrote:
> >
> > Thanks @Weston and @Felipe. This information has been very helpful, and
> > thank you for the examples too. I completely agree with vectorizing
> > computations; although, ultimately, these do end up being loops at the
> > lower levels (unless there's some hardware support, eg SIMD/GPU etc).
> >
> > @Weston, I managed to iterate over my chunked array as you suggested
> > (found some useful examples under the test cases), i.e.
> >
> >     std::vector<double> values;
> >     for (auto elem : arrow::stl::Iterate<arrow::DoubleType>(*chunked_array)) {
> >       if (elem.has_value()) {
> >         values.push_back(*elem);
> >       }
> >     }
> >
> > @Felipe, I had to adjust your snippet somewhat to get it to work
> > (perhaps the API is in flux).
> > Eventually I did something like this:
> >
> >     for (auto &chunk : chunked_array->chunks()) {
> >       auto &data = chunk->data();
> >       arrow::ArraySpan array_span(*data);
> >       auto len = array_span.buffers[1].size / static_cast<int64_t>(sizeof(double));
> >       auto raw_values = array_span.GetSpan<double>(1, len);
> >       // able to inspect (double)*(raw_values.data_ + N)
> >     }
> >
> > Now I just need to figure out the best way to do this over multiple
> > columns (row-wise).
> >
> > Thanks again!
> >
> > On Tue, 20 Feb 2024 at 19:51, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> >>
> >> In a vectorized querying system, scalars and conditionals should be
> >> avoided at all costs. That's why it's called "vectorized" — it's about
> >> the vectors and not the scalars.
> >>
> >> Arrow Arrays (AKA "vectors" in other systems) are the unit of data you
> >> mainly deal with. Data abstraction (in the OOP sense) isn't possible
> >> while also keeping performance — classes like Scalar and DoubleScalar
> >> are not supposed to be instantiated for every scalar in an array when
> >> you're looping. The disadvantage is that your loop now depends on the
> >> type of the array you're dealing with (no data abstraction based on
> >> virtual dispatching).
> >>
> >> > Also, is there an efficient way to loop through a slice perhaps by
> >> > incrementing a pointer?
> >>
> >> That's the right path.
> >> Given a ChunkedArray, this is what you can do:
> >>
> >>     auto &dt = chunked_array->type();
> >>     assert(dt->id() == Type::DOUBLE);
> >>     for (auto &chunk : chunked_array->chunks()) {
> >>       // each chunk is an arrow::Array
> >>       ArrayData &data = chunk->data();
> >>       // 1 is the data buffer
> >>       util::span<const double> raw_values = data.GetSpan<double>(1);
> >>       // ^ all the scalars of the chunk are tightly packed here:
> >>       // 64 bits for every double, even if it's logically NULL
> >>     }
> >>
> >> If data.IsNull(i), the value of raw_values[i] is undefined; depending
> >> on what you're doing with the raw_values, you may not have to care.
> >> Compute functions commonly have two different loops: one that handles
> >> nulls and a faster one (without checks in the loop body) that you can
> >> use when data.GetNullCount() == 0.
> >>
> >> Another trick is to compute on all the values and carry the same
> >> validity bitmap over to the result. This is possible when the operation
> >> is applied to each value independently of the others.
> >>
> >> Hope this helps. The ultra-generic loop over all possible array types is
> >> not possible without many allocations and branches per array element.
> >>
> >> --
> >> Felipe
> >>
> >> On Mon, Feb 19, 2024 at 9:23 AM Weston Pace <weston.p...@gmail.com> wrote:
> >> >
> >> > There is no advantage to using a Datum here. The Datum class is
> >> > mainly intended for representing something that might be a Scalar or
> >> > might be an Array.
> >> >
> >> > > Also, is there an efficient way to loop through a slice perhaps by
> >> > > incrementing a pointer?
> >> >
> >> > You will want to cast the Array and avoid Scalar instances entirely.
> >> > For example, if you know there are no nulls in your data then you can
> >> > use methods like `DoubleArray::raw_values` which will give you a
> >> > `double*`. Since it is a chunked array you would also have to deal
> >> > with indexing and iterating the chunks.
> >> >
> >> > There are also some iterator utility classes like
> >> > `arrow::stl::ChunkedArrayIterator` which can be easier to use.
> >> >
> >> > On Mon, Feb 19, 2024 at 3:54 AM Blair Azzopardi <blai...@gmail.com> wrote:
> >> >>
> >> >> On 2nd thoughts, the 2nd method could also be done in a single line:
> >> >>
> >> >>     auto low3 = arrow::Datum(st_s_low.ValueOrDie()).scalar_as<arrow::DoubleScalar>().value;
> >> >>
> >> >> That said, I'm still keen to hear if there's an advantage to using
> >> >> Datum or not, and on my 2nd question regarding efficiently looping
> >> >> through a slice's values.
> >> >>
> >> >> On Mon, 19 Feb 2024 at 09:24, Blair Azzopardi <blai...@gmail.com> wrote:
> >> >>>
> >> >>> Hi
> >> >>>
> >> >>> I'm trying to figure out the optimal way of extracting scalar
> >> >>> values from a table. I've found two ways: using a dynamic cast, or
> >> >>> using Datum and a cast. Is one better than the other? The advantage
> >> >>> of the dynamic cast, at least, seems to be that it's a one-liner.
> >> >>>
> >> >>>     auto c_val1 = table.GetColumnByName("Val1");
> >> >>>     auto st_c_val1 = c_val1->GetScalar(0);
> >> >>>     if (st_c_val1.ok()) {
> >> >>>       // method 1 - via dyn cast
> >> >>>       auto val1 = std::dynamic_pointer_cast<arrow::DoubleScalar>(st_c_val1.ValueOrDie())->value;
> >> >>>
> >> >>>       // method 2 - via Datum & cast
> >> >>>       arrow::Datum val(st_c_val1.ValueOrDie());
> >> >>>       auto val2 = val.scalar_as<arrow::DoubleScalar>().value;
> >> >>>     }
> >> >>>
> >> >>> Also, is there an efficient way to loop through a slice, perhaps by
> >> >>> incrementing a pointer? I know a chunked array might mean that the
> >> >>> underlying data isn't stored contiguously, so perhaps this is tricky
> >> >>> to do. I imagine the compute functions might do this. Otherwise, it
> >> >>> feels like each access to a value in memory requires calls to several
> >> >>> functions (GetScalar/ok/ValueOrDie etc).
> >> >>>
> >> >>> Thanks in advance
> >> >>> Blair
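The selection-vector idea Felipe mentions (turning a row-by-row loop into column-at-a-time loops) can be sketched without Arrow at all, using plain std::vector columns. Everything here (function names, the threshold predicate) is illustrative, not part of any Arrow API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pass 1: evaluate a predicate over one column and record the passing
// row indices in a selection vector.
std::vector<int64_t> SelectRows(const std::vector<double>& col,
                                double threshold) {
  std::vector<int64_t> sel;
  for (int64_t i = 0; i < static_cast<int64_t>(col.size()); ++i) {
    if (col[i] > threshold) sel.push_back(i);
  }
  return sel;
}

// Pass 2: the selection vector drives a tight loop over any other
// column; no per-row branching or scalar objects are needed.
double SumSelected(const std::vector<double>& col,
                   const std::vector<int64_t>& sel) {
  double sum = 0.0;
  for (int64_t i : sel) sum += col[i];
  return sum;
}
```

The point is that the predicate is evaluated once, column-wise, and every subsequent column is processed with a branch-free indexed loop.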
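Felipe's two-loop pattern (a null-aware loop plus a faster branch-free loop when the null count is zero) can also be illustrated standalone. The layout mirrors Arrow's (values tightly packed with a slot even for logical NULLs, plus a separate validity bitmap), but the code itself is a hedged sketch, not Arrow's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if bit i of the validity bitmap is set (least-significant-bit
// numbering within each byte, as in Arrow's layout).
inline bool IsValid(const std::vector<uint8_t>& bitmap, int64_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

double Sum(const double* values, int64_t len,
           const std::vector<uint8_t>& validity, int64_t null_count) {
  double sum = 0.0;
  if (null_count == 0) {
    // Fast path: no per-element branch at all.
    for (int64_t i = 0; i < len; ++i) sum += values[i];
  } else {
    // Null-aware path: skip slots whose validity bit is unset
    // (their packed value is undefined).
    for (int64_t i = 0; i < len; ++i) {
      if (IsValid(validity, i)) sum += values[i];
    }
  }
  return sum;
}
```

Checking the null count once, outside the loop, is what lets the common all-valid case run without any per-element checks.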
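Finally, the pointer-incrementing iteration Weston describes (raw `double*` per chunk, as with `DoubleArray::raw_values`) reduces to the following shape, here modeled with a vector of vectors standing in for a ChunkedArray of all-valid doubles:

```cpp
#include <cassert>
#include <vector>

// Each chunk is contiguous, so within a chunk a raw pointer can be
// walked directly; only the chunk boundaries need any extra logic.
// No GetScalar calls, no per-element heap allocation.
double SumChunked(const std::vector<std::vector<double>>& chunks) {
  double sum = 0.0;
  for (const auto& chunk : chunks) {
    const double* p = chunk.data();
    const double* end = p + chunk.size();
    for (; p != end; ++p) sum += *p;
  }
  return sum;
}
```

This is the shape the original question was after: per-element cost is a pointer increment and a load, with the chunking handled once per chunk rather than once per value.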