Thank you, Chris. This answers my questions.
-Kevin
On Mon, Jul 14, 2008 at 11:17 AM, Chris Douglas <[EMAIL PROTECTED]> wrote:

> "Yielding equal partitions" means that each input source will offer n
> partitions and for any given partition 0 <= i < n, the records in that
> partition are 1) sorted on the same key 2) unique to that partition, i.e. if
> a key k is in partition i for a given source, k appears in no other
> partitions from that source and if any other source contains k, all
> occurrences appear in partition i from that source. All the framework really
> effects is the cartesian product of all matching keys, so yes, that implies
> equi-joins.
>
> It's a fairly strict requirement. Satisfying it is less onerous if one is
> joining the output of several m/r jobs, each of which uses the same
> keys/partitioner, the same number of reduces, and each output file
> (part-xxxxx) of each job is not splittable. In this case, n is equal to the
> number of output files from each job (the number of reduces), (1) is
> satisfied if the reduce emits records in the same order (i.e. no new keys,
> no records out of order), and (2) is guaranteed by the partitioner and (1).
>
> An InputFormat capable of parsing metadata about each source to generate
> partitions from the set of input sources is ideal, but I can point to no
> existing implementation.
>
> -C
>
> On Jul 14, 2008, at 9:20 AM, Kevin wrote:
>
>> Hi,
>>
>> I find limited information about this package which looks like could
>> do "equi?" join. "Given a set of sorted datasets keyed with the same
>> class and yielding equal partitions, it is possible to effect a join
>> of those datasets prior to the map." What does "yielding equal
>> partitions" mean?
>>
>> Thank you.
>>
>> -Kevin
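To make the "equal partitions" contract concrete, here is a minimal, framework-free Java sketch of the two conditions Chris lists (1: each partition sorted on the key; 2: a key lives at the same partition index i in every source) and of the per-partition equi-join they enable. It does not use or reproduce Hadoop's join package: the class `EqualPartitionJoin`, the `Rec` type, and the `key:leftValue,rightValue` output format are hypothetical, and each source is modelled as an in-memory list of n sorted partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not Hadoop's implementation. A source is List<List<Rec>>: n sorted partitions. */
public class EqualPartitionJoin {

    /** A key/value record; String keys are an assumption made for illustration. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** True if every source has n partitions, each sorted by key (condition 1),
     *  and every key appears at one partition index across all sources (condition 2). */
    static boolean hasEqualPartitions(List<List<List<Rec>>> sources, int n) {
        Map<String, Integer> home = new HashMap<>(); // key -> partition index where it was first seen
        for (List<List<Rec>> src : sources) {
            if (src.size() != n) return false;
            for (int i = 0; i < n; i++) {
                List<Rec> part = src.get(i);
                for (int j = 0; j < part.size(); j++) {
                    Rec r = part.get(j);
                    // Condition 1: keys within a partition are non-decreasing.
                    if (j > 0 && part.get(j - 1).key.compareTo(r.key) > 0) return false;
                    // Condition 2: the key must not already live in a different partition index.
                    Integer prev = home.putIfAbsent(r.key, i);
                    if (prev != null && prev != i) return false;
                }
            }
        }
        return true;
    }

    /** Sorted merge join of one partition pair: emits the cartesian product
     *  of values for every key present on both sides (an inner equi-join). */
    static List<String> joinPartition(List<Rec> left, List<Rec> right) {
        List<String> out = new ArrayList<>();
        int a = 0, b = 0;
        while (a < left.size() && b < right.size()) {
            int cmp = left.get(a).key.compareTo(right.get(b).key);
            if (cmp < 0) { a++; continue; }
            if (cmp > 0) { b++; continue; }
            String k = left.get(a).key;
            int aEnd = a, bEnd = b;
            // Find the run of equal keys on each side.
            while (aEnd < left.size() && left.get(aEnd).key.equals(k)) aEnd++;
            while (bEnd < right.size() && right.get(bEnd).key.equals(k)) bEnd++;
            for (int x = a; x < aEnd; x++)
                for (int y = b; y < bEnd; y++)
                    out.add(k + ":" + left.get(x).value + "," + right.get(y).value);
            a = aEnd;
            b = bEnd;
        }
        return out;
    }

    /** Joins two equally partitioned sources: partition i only meets partition i. */
    static List<String> join(List<List<Rec>> left, List<List<Rec>> right) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            out.addAll(joinPartition(left.get(i), right.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sources with n = 2; key "a" is in partition 0 and "b" in partition 1 of both.
        List<List<Rec>> users = List.of(
            List.of(new Rec("a", "alice"), new Rec("c", "carol")),
            List.of(new Rec("b", "bob")));
        List<List<Rec>> orders = List.of(
            List.of(new Rec("a", "o1"), new Rec("a", "o2")),
            List.of(new Rec("b", "o3"), new Rec("d", "o4")));
        System.out.println(hasEqualPartitions(List.of(users, orders), 2)); // true
        System.out.println(join(users, orders)); // [a:alice,o1, a:alice,o2, b:bob,o3]
    }
}
```

Because any key that can match sits at the same partition index in every source, partition i of one source never has to be compared with any other partition of the other source. That is what lets such a join run per partition, before the map, with no shuffle. If one source put "b" in partition 0 instead, `hasEqualPartitions` would return false and the per-partition join would silently miss matches.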