On Tue, Apr 7, 2020 at 2:40 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Le 06/04/2020 à 19:22, Todd Lipcon a écrit :
> >
> > The spec should also probably cover thread-safety: if the consumer gets
> > an ArrowArray, is it safe to pass off the children to multiple threads
> > and have them call release() concurrently? In other words, do I need to
> > use a thread-safe reference count? I would guess so.
>
> Hmm, the spec may not be clear enough on this, but if you move a child
> and release the parent, then the other children are not usable anymore.
>
> In your case, you don't call release() on every child.  You just call
> release() on the parent once you are done with all children.  So the
> synchronization has to be managed on the consumer side, the producer
> doesn't see any of it.
>

I'm not sure I follow. If the consumer takes a top-level ArrowArray, moves
out each of the children, and then calls release() on the top-level array,
the expectation according to the spec is that the children stay valid. The
consumer is then responsible for eventually calling release() on each of
the children.

The question here is whether the release() callback on different children
needs to be thread-safe (e.g. by using a shared_ptr rather than a non-atomic
reference count). Again, it sounds like yes, even though in the most common
use case everything will happen on a single thread.

-Todd



> (and of course, the simplest way to do that in C++ may be to use a
> shared_ptr to an object holding the ArrowArray struct)
>

Yea, I'm doing more or less this, but using a manual atomic reference count
so that I can do a single atomic operation in the case that release() is
called on the parent, rather than doing O(n_children):
https://gist.github.com/toddlipcon/5d5c6629dd29bc6975dd509e75424454
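
For readers without the gist at hand, here is a minimal sketch of the idea
described above. It is not the code from the gist. It assumes the
ArrowArray layout from the C data interface; SharedState, Unref,
ExportArray, MoveAndRelease and g_states_freed are hypothetical names made
up for illustration, and buffers are left out entirely. The parent and all
children share one atomic count that starts at 1 + n_children, so the
parent's release() can drop its own reference plus those of any children
that were not moved out in a single fetch_sub.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Struct layout from the Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical test hook: counts how many SharedState objects were destroyed.
static std::atomic<int> g_states_freed{0};

// Hypothetical producer-side state shared by the parent and all its children.
struct SharedState {
  std::atomic<int64_t> refs{0};
  std::vector<ArrowArray> child_structs;
  std::vector<ArrowArray*> child_ptrs;
  ~SharedState() { g_states_freed.fetch_add(1); }
};

// Drop n references with one atomic operation; the last owner frees the state.
static void Unref(SharedState* s, int64_t n) {
  if (s->refs.fetch_sub(n, std::memory_order_acq_rel) == n) delete s;
}

static void ReleaseChild(ArrowArray* a) {
  auto* s = static_cast<SharedState*>(a->private_data);
  a->release = nullptr;  // mark the struct released, as the spec requires
  Unref(s, 1);
}

static void ReleaseParent(ArrowArray* a) {
  auto* s = static_cast<SharedState*>(a->private_data);
  int64_t n = 1;  // the parent's own reference
  for (int64_t i = 0; i < a->n_children; ++i) {
    ArrowArray* c = a->children[i];
    // A child whose release is null was moved out (or already released).
    if (c->release != nullptr) { c->release = nullptr; ++n; }
  }
  a->release = nullptr;
  Unref(s, n);  // one atomic decrement covers the parent and unmoved children
}

// Producer: export an empty array with n_children empty children.
static void ExportArray(int64_t n_children, ArrowArray* out) {
  assert(n_children >= 0);
  auto* s = new SharedState;
  s->refs.store(1 + n_children);
  s->child_structs.resize(n_children);  // value-initialized (all zero)
  s->child_ptrs.resize(n_children);
  for (int64_t i = 0; i < n_children; ++i) {
    ArrowArray& c = s->child_structs[i];
    c.release = ReleaseChild;
    c.private_data = s;
    s->child_ptrs[i] = &c;
  }
  *out = ArrowArray{};
  out->n_children = n_children;
  out->children = s->child_ptrs.data();
  out->release = ReleaseParent;
  out->private_data = s;
}

// Consumer: move out the first n_moved children, release the parent, then
// release each moved child on its own thread. Returns true if the shared
// state outlived the parent (when children were moved) and was freed once.
static bool MoveAndRelease(int64_t n_children, int64_t n_moved) {
  const int before = g_states_freed.load();
  ArrowArray parent;
  ExportArray(n_children, &parent);
  std::vector<ArrowArray> moved(n_moved);
  for (int64_t i = 0; i < n_moved; ++i) {
    moved[i] = *parent.children[i];         // bitwise copy...
    parent.children[i]->release = nullptr;  // ...then mark the source released
  }
  parent.release(&parent);
  const bool alive_ok =
      g_states_freed.load() == before + (n_moved == 0 ? 1 : 0);
  std::vector<std::thread> threads;
  for (ArrowArray& c : moved) threads.emplace_back([&c] { c.release(&c); });
  for (std::thread& t : threads) t.join();
  return alive_ok && g_states_freed.load() == before + 1;
}
```

MoveAndRelease(4, 2), for example, moves two of four children out, lets the
parent's release() account for itself and the two remaining children in one
decrement, and then releases the moved children concurrently. The parent
still has to walk its children to see which ones were moved, but that walk
only reads plain pointers; the atomic traffic stays at one operation.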

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
