On 28/07/2021 at 03:33, Wes McKinney wrote:

I don't have the solution worked out for this, but the basic gist is:

* To be 10-100x more efficient, ExecBatch slicing cannot call
ArrayData::Slice for every field as it does now
* The atomic operations associated with copying shared_ptr<ArrayData> /
shared_ptr<Buffer> add meaningful overhead
* The way that array kernels are currently implemented would need to
shift to accommodate the changes needed to make ExecBatch lighter
weight.

One initial option is to move the "batch offset" to the top level of
ExecBatch (to remove the need to copy ArrayData), but then quite a bit
of code would need to be adapted to combine that offset with the
ArrayData's offsets to compute memory addresses. If this memory
address arithmetic has leaked into kernel implementations, this might
be a good opportunity to unleak it. That wouldn't fix the
shared_ptr/atomics overhead, so I'm open to ideas about how that could
be addressed also.

We could have a non-owning ArraySpan:

struct ArraySpan {
  ArrayData* data;
  const int64_t offset, length;

  int64_t absolute_offset() const {
    return offset + data->offset;
  }
};

And/or a more general (Datum-like) Span:

class Span {
  util::variant<ArraySpan*, Scalar*> datum_;

 public:
  // Datum-like accessors
};

or

class Span {
  util::variant<ArrayData*, Scalar*> datum_;
  const int64_t offset_, length_;

 public:
  // Datum-like accessors
};


Then ExecBatch could be a glorified std::vector<Span>.

(Also, Arrow would probably still benefit from a small-vector implementation, at least for internal use, since we can't easily expose one in public-facing APIs in place of regular std::vector.)

Regards

Antoine.