Hi! I think your strikethrough got lost because this is a text-only email list. To make sure, I think you're asking the following: "would it be reasonable to think of splitIntoBundles as generateSplits?" (i.e., you struck through Initial)
They are very similar and I definitely also think of them as occupying the same niche. I'll let someone else who was around for naming discuss whether it was intentional or not.

Conceptually, the way that bounded vs streaming are handled means that they are doing slightly different things: a bounded source is really kind of creating physical chunks of the data, whereas the streaming source is creating conceptual divisions of the data that will be used later. I'm not sure that's worth the confusion caused by the differences.

One thing to clarify - splitIntoBundles does have an "Initial" aspect to it. I don't believe there is a publicly defined/written down order the Sources & Reader methods are called in, but a runner trying to get efficiency would be able to use splitIntoBundles during job startup to be able to split up the work before creating readers rather than after creating readers and waiting to use splitAtFraction.

S

On Sun, Jan 8, 2017 at 6:06 AM Stas Levin <stasle...@gmail.com> wrote:

> Hi,
>
> A short terminology question regarding "bundle", and
> particularly splitIntoBundles vs. generateInitialSplits.
>
> In *BoundedSource* we have:
> List<? extends BoundedSource<T>> *splitIntoBundles*(...)
>
> In *UnboundedSource* we have:
> List<? extends UnboundedSource<OutputT, CheckpointMarkT>>
> *generateInitialSplits*(...)
>
> I was wondering if the names were intentionally made different, i.e. "into
> bundles" vs "into splits"?
> In a way these two methods carry out a very similar task, would it be
> reasonable to think of *splitIntoBundles *as *generate*Initial*Splits? *
> (strikethrough due to "initial" not being applicable in the case of bounded
> sources)
>
> Regards,
> Stas
>
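
As a rough illustration of the up-front splitting described in the reply above, here is a toy Java sketch. It is not the Beam API: `RangeSource`, its fields, and the `desiredBundleSize` parameter are hypothetical stand-ins, and only the method name `splitIntoBundles` is borrowed from the thread. It assumes a bounded source can be modelled as a simple offset range.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these types come from the Beam SDK. */
public class InitialSplitSketch {

    /** Hypothetical bounded source covering the offsets [start, end). */
    static final class RangeSource {
        final long start;
        final long end;

        RangeSource(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /**
         * Plays the role of splitIntoBundles: cuts the range into physical
         * chunks before any reader exists. The parameter is an assumption
         * made for this sketch.
         */
        List<RangeSource> splitIntoBundles(long desiredBundleSize) {
            if (desiredBundleSize <= 0) {
                throw new IllegalArgumentException("desiredBundleSize must be positive");
            }
            List<RangeSource> bundles = new ArrayList<>();
            for (long s = start; s < end; s += desiredBundleSize) {
                bundles.add(new RangeSource(s, Math.min(s + desiredBundleSize, end)));
            }
            return bundles;
        }
    }

    public static void main(String[] args) {
        // A runner could do this at job startup, then create one reader per bundle.
        RangeSource whole = new RangeSource(0, 100);
        for (RangeSource b : whole.splitIntoBundles(30)) {
            System.out.println("[" + b.start + ", " + b.end + ")");
        }
    }
}
```

With the range [0, 100) and a desired size of 30, the sketch yields four sub-sources, the last one shorter than the rest. Splitting a reader that is already running (the splitAtFraction case) is deliberately left out.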