I forgot to mention that the quote is from Hadoop: The Definitive Guide.
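
To make the split-size question from the thread below concrete, here is a minimal sketch of the arithmetic as I understand it. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is how I read FileInputFormat in the new API, and it ignores the small slop Hadoop allows on the last split. The class and method names (SplitSizeSketch, computeSplitSize, countSplits) are made up for illustration and are not Hadoop API.

```java
// Hypothetical helper, not part of Hadoop: models how a split size is picked
// and how many splits a single file of a given length would produce.
final class SplitSizeSketch {

    // Assumed formula: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Simplified count for a non-empty file: one split per full splitSize
    // chunk, plus one for any remainder. Written with division and modulo
    // so it does not overflow when splitSize is Long.MAX_VALUE.
    static long countSplits(long fileLength, long splitSize) {
        long full = fileLength / splitSize;
        return (fileLength % splitSize == 0) ? full : full + 1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1024 * mb; // the 1 GB file from the question
        long blockSize = 64 * mb;    // 64 MB HDFS block

        // Default-like settings: minimum of 1 byte, no effective maximum.
        long normal = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("default: " + countSplits(fileLength, normal) + " splits");

        // Minimum split size forced to Long.MAX_VALUE: the whole file is one split.
        long forced = computeSplitSize(blockSize, Long.MAX_VALUE, Long.MAX_VALUE);
        System.out.println("forced:  " + countSplits(fileLength, forced) + " splits");
    }
}
```

Under these assumptions the 1 GB file gives 16 splits of 64 MB with default settings and a single split once the minimum split size is raised to Long.MAX_VALUE. In a real job I believe the minimum is set through the mapreduce.input.fileinputformat.split.minsize property, and the alternative is to subclass the input format and have isSplitable return false.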
On Thu, Aug 7, 2014 at 6:04 PM, Pedro Magalhaes <[email protected]> wrote:
> Thanks for the reply.
>
> What I am really trying to do is implement a map-side join. As I
> understand it, the files must not be splittable, so that each map
> processes partitions with the same keys.
>
> I saw in Hadoop: The Definitive Guide that I can force files not to be
> split by setting the minimum split size to the maximum integer. The
> other option is to override the isSplitable method.
>
> Does that make sense?
>
> Sorry for the spelling mistakes. I am writing from my iPhone.
>
> On Thursday, August 7, 2014, Chris Douglas <[email protected]> wrote:
>
>> Is that quote from product documentation?
>>
>> Whether the output files are splittable is a practical consideration
>> when setting up the join; the quote is identifying a common case that
>> satisfies the constraints. The size of each partition is irrelevant,
>> provided that the splits are generated consistently across all
>> InputFormats involved in the expression (i.e., given datasets A,B in a
>> join expression and a key K in A, K is in partition N iff K is in
>> partition N for InputFormat B OR K is not in B). -C
>>
>> On Mon, Aug 4, 2014 at 1:36 PM, Pedro Magalhaes <[email protected]> wrote:
>> > I saw that one of the requirements to use CompositeInputFormat is:
>> > "A map-side join can be used to join the outputs of several jobs that
>> > had the same number of reducers, the same keys, and output files that
>> > are not splittable (by being smaller than an HDFS block, or by virtue
>> > of being gzip compressed, for example)"
>> >
>> > So must my partition size be equal to or smaller than the HDFS block?
>> >
>> > If I have a 1 GB file (1024 MB), will I have 16 partitions of 64 MB?
>> >
>> > How can I control the size of the partition?
