>> ultimately, these do end up being loops at the lower levels (unless
>> there's some hardware support, e.g. SIMD/GPU etc.).

> Even if you don't write explicit SIMD, (1) the compiler might
> vectorize the loop for you, and (2) the superscalar nature of modern
> CPUs means loops with fewer branches and memory indirections will run
> faster.

Probably getting into the weeds at this point, but my concern was less
about branches/SIMD/etc. and more that `GetScalar` requires a heap allocation.

On Thu, Feb 22, 2024 at 10:55 AM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

> > these do end up being loops at the lower levels
>
> Even if you don't write explicit SIMD, (1) the compiler might
> vectorize the loop for you, and (2) the superscalar nature of modern
> CPUs means loops with fewer branches and memory indirections will run
> faster.
>
> > Now I just need to figure out the best way to do this over multiple
> columns (row-wise).
>
> You can usually turn loops that go row-by-row into loops that go
> column-by-column by maintaining selection vectors or bitmaps that you
> can use as masks to operations on the remaining columns.
>
> On Thu, Feb 22, 2024 at 1:39 PM Blair Azzopardi <blai...@gmail.com> wrote:
> >
> > Thanks @Weston and @Felipe. This information has been very helpful, and
> > thank you for the examples too. I completely agree with vectorizing
> > computations, although ultimately these do end up being loops at the
> > lower levels (unless there's some hardware support, e.g. SIMD/GPU etc.).
> >
> > @Weston, I managed to iterate over my chunked array as you suggested
> > (found some useful examples under the test cases), i.e.
> >
> >     std::vector<double> values;
> >     for (auto elem : arrow::stl::Iterate<arrow::DoubleType>(*chunked_array)) {
> >         if (elem.has_value()) {
> >             values.push_back(*elem);
> >         }
> >     }
> >
> > @Felipe, I had to adjust your snippet somewhat to get it to work
> > (perhaps the API is in flux). Eventually I did something like this:
> >
> >     for (auto &chunk : chunked_array->chunks()) {
> >         auto &data = chunk->data();
> >         arrow::ArraySpan array_span(*data);
> >         auto len = array_span.buffers[1].size / static_cast<int64_t>(sizeof(double));
> >         auto raw_values = array_span.GetSpan<double>(1, len);
> >         // able to inspect (double)*(raw_values.data_ + N)
> >     }
> >
> > Now I just need to figure out the best way to do this over multiple
> > columns (row-wise).
> >
> > Thanks again!
> >
> >
> > On Tue, 20 Feb 2024 at 19:51, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:
> >>
> >> In a Vectorized querying system, scalars and conditionals should be
> >> avoided at all costs. That's why it's called "vectorized" — it's about
> >> the vectors and not the scalars.
> >>
> >> Arrow Arrays (AKA "vectors" in other systems) are the unit of data you
> >> mainly deal with. Data abstraction (in the OOP sense) isn't possible
> >> while also keeping performance — classes like Scalar and DoubleScalar
> >> are not supposed to be instantiated for every scalar in an array when
> >> you're looping. The disadvantage is that your loop now depends on the
> >> type of the array you're dealing with (no data abstraction based on
> >> virtual dispatching).
> >>
> >> > Also, is there an efficient way to loop through a slice perhaps by
> incrementing a pointer?
> >>
> >> That's the right path. Given a ChunkedArray, this is what you can do:
> >>
> >> auto &dt = chunked_array->type();
> >> assert(dt->id() == Type::DOUBLE);
> >> for (auto &chunk : chunked_array->chunks()) {
> >>    // each chunk is an arrow::Array
> >>    ArrayData &data = *chunk->data();
> >>    util::span<const double> raw_values = data.GetSpan<double>(1);  // 1 is the data buffer
> >>    // ^ all the scalars of the chunk are tightly packed here:
> >>    // 64 bits for every double even if it's logically NULL
> >> }
> >>
> >> If data.IsNull(i), the value of raw_values[i] is undefined; depending
> >> on what you're doing with the raw_values, you may not have to care.
> >> Compute functions commonly have two different loops: one that handles
> >> nulls and a faster one (without checks in the loop body) that you can
> >> use when data.GetNullCount() == 0.
> >>
> >> Another trick is to compute on all the values and carry the same
> >> validity bitmap over to the result. This works when the operation
> >> depends on each value independently of the others.
> >>
> >> Hope this helps. The ultra-generic loop over all possible array types
> >> is not possible without many allocations and branches per array element.
> >>
> >> --
> >> Felipe
> >>
> >>
> >>
> >> On Mon, Feb 19, 2024 at 9:23 AM Weston Pace <weston.p...@gmail.com> wrote:
> >> >
> >> > There is no advantage to using a Datum here. The Datum class is
> >> > mainly intended for representing something that might be a Scalar
> >> > or might be an Array.
> >> >
> >> > > Also, is there an efficient way to loop through a slice perhaps
> >> > > by incrementing a pointer?
> >> >
> >> > You will want to cast the Array and avoid Scalar instances entirely.
> >> > For example, if you know there are no nulls in your data then you
> >> > can use methods like `DoubleArray::raw_values` which will give you a
> >> > `double*`. Since it is a chunked array you would also have to deal
> >> > with indexing and iterating the chunks.
> >> >
> >> > There are also some iterator utility classes like
> >> > `arrow::stl::ChunkedArrayIterator` which can be easier to use.
> >> >
> >> > On Mon, Feb 19, 2024 at 3:54 AM Blair Azzopardi <blai...@gmail.com> wrote:
> >> >>
> >> >> On second thought, the second method could also be done in a single line:
> >> >>
> >> >> auto low3 = arrow::Datum(st_s_low.ValueOrDie()).scalar_as<arrow::DoubleScalar>().value;
> >> >>
> >> >> That said, I'm still keen to hear whether there's an advantage to
> >> >> using Datum or not, and about my second question regarding
> >> >> efficiently looping through a slice's values.
> >> >>
> >> >> On Mon, 19 Feb 2024 at 09:24, Blair Azzopardi <blai...@gmail.com> wrote:
> >> >>>
> >> >>> Hi
> >> >>>
> >> >>> I'm trying to figure out the optimal way to extract scalar values
> >> >>> from a table. I've found two ways: using a dynamic cast, or using
> >> >>> Datum and a cast. Is one better than the other? The advantage of
> >> >>> the dynamic cast, at least, seems to be that it's a one-liner.
> >> >>>
> >> >>> auto c_val1 = table.GetColumnByName("Val1");
> >> >>> auto st_c_val1 = c_val1->GetScalar(0);
> >> >>> if (st_c_val1.ok()) {
> >> >>>
> >> >>>     // method 1 - via dyn cast
> >> >>>     auto val1 = std::dynamic_pointer_cast<arrow::DoubleScalar>(st_c_val1.ValueOrDie())->value;
> >> >>>
> >> >>>     // method 2 - via Datum & cast
> >> >>>     arrow::Datum val(st_c_val1.ValueOrDie());
> >> >>>     auto val2 = val.scalar_as<arrow::DoubleScalar>().value;
> >> >>> }
> >> >>>
> >> >>> Also, is there an efficient way to loop through a slice, perhaps
> >> >>> by incrementing a pointer? I know a chunked array might mean that
> >> >>> the underlying data isn't stored contiguously, so perhaps this is
> >> >>> tricky to do. I imagine the compute functions might do this.
> >> >>> Otherwise, it feels like each access to a value in memory requires
> >> >>> calls to several functions (GetScalar/ok/ValueOrDie etc.).
> >> >>>
> >> >>> Thanks in advance
> >> >>> Blair
>
