I forgot to mention that the quote is from Hadoop: The Definitive Guide.

On Thu, Aug 7, 2014 at 6:04 PM, Pedro Magalhaes <pedror...@gmail.com> wrote:

> Thanks for the reply.
>
> What I am really trying to do is implement a map-side join. As I
> understand it, the files must not be splittable, so that each map task
> processes the partitions with the same key.
>
> I saw in Hadoop: The Definitive Guide that I can force files not to be
> split by setting the minimum split size to its largest possible value
> (Long.MAX_VALUE). The other option is to override the isSplitable method.
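Both options Pedro lists take effect in the same place, FileInputFormat.getSplits(). The following is a simplified, Hadoop-free model of that logic, assuming the split-size formula max(minSize, min(maxSize, blockSize)) and the 10% "slop" allowance for the last split used by FileInputFormat in Hadoop 1.x/2.x. The class name, method names, and the long[] split representation are hypothetical. It shows that raising the minimum split size to Long.MAX_VALUE and making isSplitable return false both produce one split per file.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, Hadoop-free model of how FileInputFormat.getSplits() cuts one file.
// SPLIT_SLOP and computeSplitSize mirror Hadoop's FileInputFormat; the class,
// method names and the {offset, length} representation are hypothetical.
class SplitSketch {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} for each split of a file of fileLength bytes.
    static List<long[]> splits(long fileLength, long blockSize,
                               long minSize, long maxSize, boolean splitable) {
        List<long[]> result = new ArrayList<>();
        if (!splitable) {
            // isSplitable() returned false: the whole file becomes one split.
            result.add(new long[] {0, fileLength});
            return result;
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        // Emit full-size splits while more than 1.1 splits' worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (at most 1.1 * splitSize) becomes the final split.
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 1024 * mb, block = 64 * mb;
        // Default minimum split size (1): one split per 64 MB block.
        System.out.println(splits(file, block, 1, Long.MAX_VALUE, true).size());  // 16
        // Minimum split size raised to Long.MAX_VALUE: one split for the file.
        System.out.println(splits(file, block, Long.MAX_VALUE, Long.MAX_VALUE, true).size());  // 1
        // isSplitable() overridden to return false: also one split.
        System.out.println(splits(file, block, 1, Long.MAX_VALUE, false).size());  // 1
    }
}
```

In this model a 1 GB file with 64 MB blocks yields 16 splits under default settings. Note that these are map input splits; the number of output partitions of each job being joined is set by that job's number of reducers, which is why the quoted requirement talks about reducers.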
>
> Does that make sense?
>
> Sorry for the spelling mistakes; I am writing from my iPhone.
>
>
>
> On Thursday, August 7, 2014, Chris Douglas <cdoug...@apache.org>
> wrote:
>
>> Is that quote from product documentation?
>>
>> Whether the output files are splittable is a practical consideration
>> when setting up the join; the quote identifies a common case that
>> satisfies the constraints. The size of each partition is irrelevant,
>> provided that the splits are generated consistently across all
>> InputFormats involved in the expression (i.e., given datasets A and B
>> in a join expression and a key K in A, K is in partition N for
>> InputFormat A iff K is in partition N for InputFormat B, or K is not
>> in B). -C
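Chris's consistency condition can be checked mechanically. This is a hedged, Hadoop-free sketch assuming both jobs used the default HashPartitioner, whose partition formula is (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. String stands in for the real Writable key type (Text, for example, has its own hashCode), and the class, methods, and sample keys are hypothetical. With the same number of reducers, every shared key lands in the same partition index in both outputs; with different reducer counts the condition can break.

```java
import java.util.Arrays;
import java.util.List;

// Hadoop-free illustration of the partition-consistency condition.
// partition() reproduces the formula of Hadoop's default HashPartitioner;
// the class, method names and sample keys are hypothetical.
class PartitionCheck {
    static int partition(Object key, int numReduceTasks) {
        // Mask the sign bit so negative hash codes still map to [0, n).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // True if every key of A that also occurs in B is in the same partition
    // index in both outputs. Keys of A that are absent from B are allowed.
    static boolean consistent(List<String> keysA, int reducersA,
                              List<String> keysB, int reducersB) {
        for (String k : keysA) {
            if (keysB.contains(k) && partition(k, reducersA) != partition(k, reducersB)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b", "c");
        List<String> b = Arrays.asList("b", "c", "d");
        System.out.println(consistent(a, 4, b, 4)); // true: same reducer count
        System.out.println(consistent(a, 4, b, 5)); // false: "b" goes to 2 vs 3
    }
}
```

The check only cares which partition index a key lands in, not how many bytes each partition holds, which matches Chris's point that partition size is irrelevant.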
>>
>> On Mon, Aug 4, 2014 at 1:36 PM, Pedro Magalhaes <pedror...@gmail.com> wrote:
>> > I saw that one of the requirements to use CompositeInputFormat is:
>> > "A map-side join can be used to join the outputs of several jobs that
>> > had the same number of reducers, the same keys, and output files that
>> > are not splittable (by being smaller than an HDFS block, or by virtue
>> > of being gzip compressed, for example)"
>> >
>> > So must my partition size be equal to or smaller than the HDFS block
>> > size?
>> >
>> > If I have a 1 GB file (1024 MB), will I have 16 partitions of 64 MB?
>> >
>> > How can I control the size of the partitions?
>> >
>> >
>>
>
