Thanks for the reply. What I am really trying to do is implement a map-side join. In my mind, the files must not be splittable, so that each map will process partitions with the same key.
I saw in Hadoop: The Definitive Guide that I can force files not to be split by setting the minimum split size to the maximum possible value. Another option is to override the isSplitable method. Does that make sense? Sorry for the spelling mistakes; I am writing from my iPhone.

On Thursday, August 7, 2014, Chris Douglas <cdoug...@apache.org> wrote:

> Is that quote from product documentation?
>
> Whether the output files are splittable is a practical consideration
> when setting up the join; the quote is identifying a common case that
> satisfies the constraints. The size of each partition is irrelevant,
> provided that the splits are generated consistently across all
> InputFormats involved in the expression (i.e., given datasets A, B in a
> join expression and a key K in A, K is in partition N iff K is in
> partition N for InputFormat B OR K is not in B). -C
>
> On Mon, Aug 4, 2014 at 1:36 PM, Pedro Magalhaes <pedror...@gmail.com> wrote:
> > I saw that one of the requirements to use CompositeInputFormat is:
> >
> > "A map-side join can be used to join the outputs of several jobs that had
> > the same number of reducers, the same keys, and output files that are not
> > splittable (by being smaller than an HDFS block, or by virtue of being gzip
> > compressed, for example)"
> >
> > So does my partition size have to be equal to or smaller than the HDFS block?
> >
> > If I have a 1 GB file = 1024 MB, will I have 16 partitions of 64 MB?
> >
> > How can I control the size of the partition?
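As a minimal sketch of the consistency invariant Chris describes (plain Java, not the Hadoop API): if the jobs that produced datasets A and B used the same partitioner and the same number of reducers, then any key K lands in the same partition index in both outputs. The formula below assumes both jobs used the default HashPartitioner; the class and method names here are hypothetical, chosen for illustration.

```java
import java.util.Arrays;

public class PartitionCheck {
    // Same formula as Hadoop's default HashPartitioner.getPartition:
    // mask off the sign bit, then take the remainder mod numReducers.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 16; // must match across ALL jobs feeding the join

        // Because the function is deterministic, dataset A and dataset B
        // place the same key in the same partition index -- the property
        // CompositeInputFormat relies on for a map-side join.
        for (String k : Arrays.asList("user-1", "user-2", "order-99")) {
            System.out.println(k + " -> partition " + partition(k, reducers));
        }
    }
}
```

With non-splittable output files, each map task then reads exactly one whole partition file from A and its counterpart from B, so matching keys always meet in the same mapper.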