Thank you, Chris. This answers my questions.
-Kevin

On Mon, Jul 14, 2008 at 11:17 AM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> "Yielding equal partitions" means that each input source will offer n
> partitions and, for any given partition 0 <= i < n, the records in that
> partition are (1) sorted on the same key and (2) unique to that partition,
> i.e. if a key k is in partition i for a given source, k appears in no other
> partitions from that source, and if any other source contains k, all
> occurrences appear in partition i from that source. All the framework really
> effects is the cartesian product of all matching keys, so yes, that implies
> equi-joins.
>
> It's a fairly strict requirement. Satisfying it is less onerous if one is
> joining the output of several m/r jobs, provided each job uses the same
> keys/partitioner and the same number of reduces, and each output file
> (part-xxxxx) of each job is not splittable. In this case, n is equal to the
> number of output files from each job (the number of reduces), (1) is
> satisfied if the reduce emits records in the same order (i.e. no new keys,
> no records out of order), and (2) is guaranteed by the partitioner and (1).
>
> An InputFormat capable of parsing metadata about each source to generate
> partitions from the set of input sources is ideal, but I can point to no
> existing implementation. -C
>
> On Jul 14, 2008, at 9:20 AM, Kevin wrote:
>
>> Hi,
>>
>> I can find only limited information about this package, which looks like
>> it could do an "equi?" join. "Given a set of sorted datasets keyed with
>> the same class and yielding equal partitions, it is possible to effect a
>> join of those datasets prior to the map." What does "yielding equal
>> partitions" mean?
>>
>> Thank you.
>>
>> -Kevin
>
>
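
For anyone finding this thread later, the property Chris describes can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Hadoop join package or its API: partition_of, make_source, join_partition and join_sources are hypothetical names, and the byte-sum partitioner is an assumption standing in for whatever partitioner the jobs share. Each source is split into n key-sorted partitions with the same partitioner, so partition i of one source only ever needs to be merged with partition i of the other, and each matching key yields the cartesian product of its values.

```python
from itertools import groupby, product
from operator import itemgetter


def partition_of(key, n):
    # Hypothetical stand-in for a shared partitioner: deterministic for str keys.
    return sum(key.encode()) % n


def make_source(records, n):
    """Split (key, value) records into n partitions, each sorted on the key."""
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[partition_of(key, n)].append((key, value))
    return [sorted(p, key=itemgetter(0)) for p in parts]


def grouped(partition):
    """Collapse a key-sorted partition into (key, [values]) pairs."""
    return [(k, [v for _, v in g]) for k, g in groupby(partition, key=itemgetter(0))]


def join_partition(left, right):
    """Merge-join two key-sorted partitions, emitting the cross product per key."""
    lgroups, rgroups = grouped(left), grouped(right)
    i = j = 0
    out = []
    while i < len(lgroups) and j < len(rgroups):
        lkey, lvals = lgroups[i]
        rkey, rvals = rgroups[j]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            out.extend((lkey, a, b) for a, b in product(lvals, rvals))
            i += 1
            j += 1
    return out


def join_sources(a, b):
    """Join two sources partition by partition; both must offer the same n."""
    if len(a) != len(b):
        raise ValueError("sources must yield the same number of partitions")
    return [row for pa, pb in zip(a, b) for row in join_partition(pa, pb)]
```

Because both sources use the same partitioner and the same n, a key such as "bob" can only ever sit in one partition index, which is why no data has to move between partitions before the merge.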
