It depends on the size of the files. If there are a bunch of tiny ones, it
can be worthwhile to use a CombineFileInputFormat, since otherwise each small
file typically ends up with its own map task. See:

http://yaseminavcular.blogspot.com/2011/03/many-small-input-files.html
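
To give a rough idea of what combining buys you, here is a minimal sketch in
plain Java of the packing idea: group small files into splits up to a size
cap, so that one map task handles many files. The class SmallFilePacker, the
pack method, and the 60-byte cap are all made up for illustration. This is
not Hadoop's CombineFileInputFormat implementation, which also takes things
like block locality into account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or Crunch.
public class SmallFilePacker {

    // Greedily packs file sizes (in bytes, in input order) into groups whose
    // total stays at or below maxSplitBytes. A single file larger than the
    // cap still gets a group of its own.
    public static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current group if this file would push it over the cap.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 20L, 30L, 50L, 5L);
        // Five files collapse into two groups instead of five separate tasks.
        System.out.println(pack(sizes, 60L)); // prints [[10, 20, 30], [50, 5]]
    }
}
```

With several hundred tiny inputs, the number of groups (and so the number of
map tasks) is driven by the cap rather than by the file count, which is where
the savings come from.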

J


On Tue, Feb 12, 2013 at 1:56 PM, Victor Iacoban <[email protected]> wrote:

> Thanks Josh,
> Is there any performance penalty for unions, assuming that I have several
> hundred input files?
>
>
> On Tue, Feb 12, 2013 at 4:39 PM, Josh Wills <[email protected]> wrote:
>
> > Yeah, of course-- that's how stuff like joins work.
> >
> > PTable<K, V> first = pipeline.read(new TableSource<K, V>(firstFile));
> > PTable<K, V> second = ...;
> > PTable<K, V> union = first.union(second);
> >
> > etc.
> >
> >
> > On Tue, Feb 12, 2013 at 1:36 PM, Victor Iacoban <[email protected]> wrote:
> >
> > > Is there any support in Crunch for using multiple sequence files as a
> > > pipeline source, something similar to the standard MultipleInputs?
> > >
> > > Thanks,
> > > victor
> > >
> >
>
