Hi!

I think your strikethrough got lost because this is a text-only mailing
list. To make sure, I think you're asking the following:
"Would it be reasonable to think of splitIntoBundles as generateSplits?"
(i.e., you struck through "Initial")

They are very similar, and I definitely also think of them as occupying the
same niche. I'll let someone who was around for the naming discuss whether
the difference was intentional. Conceptually, bounded and streaming sources
are handled in slightly different ways: a bounded source really creates
physical chunks of the data, whereas a streaming source creates conceptual
divisions of the data that will be used later. I'm not sure that distinction
is worth the confusion caused by the different names.

One thing to clarify: splitIntoBundles does have an "Initial" aspect to
it. I don't believe there is a publicly documented order in which the
Source and Reader methods are called, but a runner aiming for efficiency
can call splitIntoBundles during job startup to divide the work before
creating any readers, rather than creating readers first and then waiting
to use splitAtFraction.
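
To make that ordering concrete, here is a minimal sketch in plain Java. It
does not use the real SDK classes: RangeSource and RangeReader are
hypothetical stand-ins over a range of longs, and their simplified signatures
are assumptions made for illustration only. It shows an initial split done
before any reader exists, followed by a dynamic split on a reader that has
already started.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** Hypothetical bounded source over the half-open range [start, end). */
    static final class RangeSource {
        final long start;
        final long end;

        RangeSource(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** Initial split: can run at job startup, before any reader exists. */
        List<RangeSource> splitIntoBundles(long desiredBundleSize) {
            List<RangeSource> bundles = new ArrayList<>();
            for (long s = start; s < end; s += desiredBundleSize) {
                bundles.add(new RangeSource(s, Math.min(s + desiredBundleSize, end)));
            }
            return bundles;
        }

        RangeReader createReader() {
            return new RangeReader(this);
        }
    }

    /** Hypothetical reader that can hand off the unread tail of its range. */
    static final class RangeReader {
        private final long start;
        private long end;
        private long next; // next position that has not been read yet

        RangeReader(RangeSource source) {
            this.start = source.start;
            this.end = source.end;
            this.next = source.start;
        }

        /** Consumes one position; returns false once the range is exhausted. */
        boolean advance() {
            if (next >= end) {
                return false;
            }
            next++;
            return true;
        }

        long currentEnd() {
            return end;
        }

        /**
         * Dynamic split: shrinks this reader's range and returns the residual,
         * or null when the split point is not strictly ahead of the reader
         * (a conservative choice for this sketch) or would leave no residual.
         */
        RangeSource splitAtFraction(double fraction) {
            long splitPos = start + (long) Math.ceil(fraction * (end - start));
            if (splitPos <= next || splitPos >= end) {
                return null;
            }
            RangeSource residual = new RangeSource(splitPos, end);
            end = splitPos;
            return residual;
        }
    }

    public static void main(String[] args) {
        // Job startup: divide the work before creating any reader.
        List<RangeSource> bundles = new RangeSource(0, 100).splitIntoBundles(40);
        System.out.println("initial bundles: " + bundles.size());

        // Later: a reader is already running, so the split is dynamic.
        RangeReader reader = bundles.get(0).createReader();
        reader.advance();
        RangeSource residual = reader.splitAtFraction(0.5);
        System.out.println("residual: [" + residual.start + ", " + residual.end + ")");
    }
}
```

The point of the sketch is only the ordering: the first call needs nothing
but the source, while the second needs a live reader and has to respect how
far that reader has already progressed.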

S

On Sun, Jan 8, 2017 at 6:06 AM Stas Levin <stasle...@gmail.com> wrote:

> Hi,
>
> A short terminology question regarding "bundle", and
> particularly splitIntoBundles vs. generateInitialSplits.
>
> In *BoundedSource* we have:
> List<? extends BoundedSource<T>> *splitIntoBundles*(...)
>
> In *UnboundedSource* we have:
> List<? extends UnboundedSource<OutputT, CheckpointMarkT>>
> *generateInitialSplits*(...)
>
> I was wondering if the names were intentionally made different, i.e. "into
> bundles" vs "into splits"?
> In a way these two methods carry out a very similar task, would it be
> reasonable to think of *splitIntoBundles *as *generate*Initial*Splits? *
> (strikethrough due to "initial" not being applicable in the case of bounded
> sources)
>
> Regards,
> Stas
>
