Yes, I think you mean this post by Weston:
<https://lists.apache.org/thread/llfm5dfh2988w2w4j6off417w9szp1tg>. I'll
look into adding this sequential option to the source node and report back.
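One way a "sequential" option could behave is to serialize pulls from the generator and hand each batch downstream before the next pull starts. The sketch below is Arrow-independent. `Batch`, `Generator`, `SerializedGenerator`, `PullOne` and `RunSerializedDemo` are hypothetical names, not Arrow's API, and a plain `int` stands in for a record batch.

```cpp
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch: just its index.
using Batch = int;

// A generator returns the next batch, or std::nullopt at end of stream.
using Generator = std::function<std::optional<Batch>()>;

// Hypothetical wrapper: only one caller may pull at a time, and the batch is
// delivered to the sink while the lock is still held. The sink therefore sees
// batches in generation order even if many threads call PullOne concurrently.
class SerializedGenerator {
 public:
  explicit SerializedGenerator(Generator source) : source_(std::move(source)) {}

  // Returns false once the underlying generator is exhausted.
  bool PullOne(const std::function<void(Batch)>& sink) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::optional<Batch> batch = source_();
    if (!batch) return false;
    sink(*batch);
    return true;
  }

 private:
  std::mutex mutex_;
  Generator source_;
};

// Usage sketch: a counting generator drained by several worker threads.
std::vector<Batch> RunSerializedDemo(int num_batches, int num_threads) {
  int next = 0;  // only touched under the wrapper's mutex
  Generator counter = [&next, num_batches]() -> std::optional<Batch> {
    if (next >= num_batches) return std::nullopt;
    return next++;
  };
  SerializedGenerator gen(counter);
  std::vector<Batch> out;  // only appended to under the wrapper's mutex
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&gen, &out] {
      // Keep pulling until the stream is exhausted.
      while (gen.PullOne([&out](Batch b) { out.push_back(b); })) {
      }
    });
  }
  for (auto& t : workers) t.join();
  return out;
}
```

The trade-off is that holding the lock across delivery gives up parallelism between pulling and downstream work, which is presumably why a real option would need to be opt-in.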


Yaron.
________________________________
From: Li Jin <ice.xell...@gmail.com>
Sent: Monday, July 25, 2022 11:39 AM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] Clarifying the behavior of source node and executor

Now that I think about it more, Weston has probably already answered this in
another mailing thread: ordering is not guaranteed, and our observation of
batches coming out of the file reader + source node in order happened by
chance. Perhaps we can look into adding an option to the source node to
ensure "sequential" output.
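Such an option could let a thread pool read batches in parallel and then restore order with a re-sequencing buffer. The following is a minimal Arrow-independent sketch, assuming each batch is tagged with its source position when it is read. `IndexedBatch` and `Sequencer` are hypothetical names, not part of Arrow.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a record batch tagged with its source position.
struct IndexedBatch {
  int64_t index;  // position assigned when the batch was read
  int value;      // placeholder payload
};

// Accepts batches in any order (e.g. from a thread pool) and forwards them to
// `emit` strictly in index order 0, 1, 2, ..., buffering early arrivals.
class Sequencer {
 public:
  explicit Sequencer(std::function<void(const IndexedBatch&)> emit)
      : emit_(std::move(emit)) {}

  void Push(IndexedBatch batch) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.emplace(batch.index, batch);
    // std::map is ordered, so flush while the smallest buffered index is the
    // one we are waiting for. emit_ runs under the lock, so it must not call
    // Push again and should be cheap.
    while (!pending_.empty() && pending_.begin()->first == next_index_) {
      emit_(pending_.begin()->second);
      pending_.erase(pending_.begin());
      ++next_index_;
    }
  }

  // Number of batches held back waiting for an earlier index.
  std::size_t Buffered() {
    std::lock_guard<std::mutex> lock(mutex_);
    return pending_.size();
  }

 private:
  std::mutex mutex_;
  std::map<int64_t, IndexedBatch> pending_;
  int64_t next_index_ = 0;
  std::function<void(const IndexedBatch&)> emit_;
};
```

Unlike serializing the reads themselves, this keeps reading parallel and only orders the output, at the cost of buffering batches that arrive early.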

Li

On Mon, Jul 25, 2022 at 11:18 AM Yaron Gvili <rt...@hotmail.com> wrote:

> I've also been using the source node with a generator, but observed batches
> in random order (in a version of Arrow that was one to two months old). So,
> I'd be surprised if ordering is guaranteed, and I'm also interested in how
> to obtain such a guarantee.
>
>
> Yaron.
> ________________________________
> From: Li Jin <ice.xell...@gmail.com>
> Sent: Monday, July 25, 2022 11:10 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: [C++] Clarifying the behavior of source node and executor
>
> Sorry, the link to the generator above is wrong. We traced into the code
> and found that it uses BackgroundGenerator:
>
> https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L1581
>
> On Mon, Jul 25, 2022 at 11:07 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Hi,
> >
> > Ivan and I are debugging some behavior of the source node this morning and
> > I was hoping to clarify that our understanding is correct.
> >
> > We observed that when using source node with a generator:
> >
> >
> > https://github.com/apache/arrow/blob/66c66d040bbf81a4819b276aee306625dc02837c/cpp/src/arrow/compute/exec/options.h#L54
> >
> > The source node becomes "sequential" (batches come out in order one at a
> > time) even with a GetCpuThreadPool() attached to the plan.
> >
> > We traced the code into this class:
> >
> >
> > https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L316
> >
> > It seems that, because of the synchronization in this class, it
> > generates batches sequentially. Is this understanding correct, and is it
> > intentional that the source node is sequential when backed by a
> > generator? (This is actually the behavior that we want.)
> >
>
