One last possible trick to consider:

If you were to subclass SequenceFileRecordReader, you'd gain access to its
protected seek method, allowing you to rewind the reducer input. You could
then implement a block hash join along the lines of the following pseudocode:

ahash = new HashMap<Key, List<Val>>()   // many-to-many: a key can map to several A records
while (RAM is available) {
  read a record
  if (the record is from table B) break
  add the record to ahash
}
nextAPos = reader.getPos()   // resume hashing A records from here on the next pass

while (current record is an A record) {
  skip to the next record
}
firstBPos = reader.getPos()

while (current record has the current key) {
  read a B record and join it against the matching entries in ahash
  process each joined result
}

if (firstBPos > nextAPos) {   // some A records didn't fit in RAM
  seek(nextAPos)              // rewind and hash the next block of A records
  go back to the top
}
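
For concreteness, here's a minimal sketch of the subclass itself, assuming
the old org.apache.hadoop.mapred API (the name RewindableRecordReader is
mine, not an existing Hadoop class):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

public class RewindableRecordReader<K, V>
    extends SequenceFileRecordReader<K, V> {

  public RewindableRecordReader(Configuration conf, FileSplit split)
      throws IOException {
    super(conf, split);
  }

  // seek() is protected in SequenceFileRecordReader; widening it to
  // public lets the join loop above rewind to a position previously
  // saved with getPos().
  @Override
  public void seek(long pos) throws IOException {
    super.seek(pos);
  }
}

Wiring this reader up so it actually feeds your reduce-side logic is the
fiddly part I'm glossing over; the point is just that getPos/seek make the
rewind possible.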


On Thu, May 28, 2009 at 8:05 AM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Stuart,
>
> It seems to me like you have a few options.
>
> Option 1: Just use a lot of RAM. Unless you really expect many millions of
> entries on both sides of the join, you might be able to get away with
> buffering despite its inefficiency.
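
For what it's worth, a buffering reducer along the lines of option 1 might
look like this, assuming the old mapred API, values tagged with "A:"/"B:"
prefixes, and a sort that delivers all A records first (the tagging scheme
is illustrative, not anything Todd specified):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class BufferingJoinReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // Buffer every A record for this key in RAM.
    List<String> aRecords = new ArrayList<String>();
    while (values.hasNext()) {
      String v = values.next().toString();
      if (v.startsWith("A:")) {
        aRecords.add(v.substring(2));
      } else {
        // A B record: join it against every buffered A record.
        for (String a : aRecords) {
          out.collect(key, new Text(a + "\t" + v.substring(2)));
        }
      }
    }
  }
}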
>
> Option 2: Use LocalDirAllocator to find some local storage to spill all of
> the left table's records to disk in a MapFile format. Then as you iterate
> over the right table, do lookups in the MapFile. This is really the same as
> option 1, except that you're using disk as an extension of RAM.
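
A minimal sketch of the spill-and-probe idea with the classic MapFile API
(the /tmp path and Text types are illustrative; a real reducer would ask
LocalDirAllocator for the directory rather than hard-coding one):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileSpillJoin {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    String dir = "/tmp/join-spill";  // illustrative; use LocalDirAllocator

    // Spill phase: write left-table records. MapFile requires keys to
    // arrive in sorted order, which reducer input already guarantees.
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
    writer.append(new Text("k1"), new Text("left-value-1"));
    writer.append(new Text("k2"), new Text("left-value-2"));
    writer.close();

    // Probe phase: stream right-table records and look up matches.
    MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
    Text match = new Text();
    if (reader.get(new Text("k2"), match) != null) {
      System.out.println("joined: k2 -> " + match);  // emit joined record
    }
    reader.close();
  }
}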
>
> Option 3: Convert this to a map-side merge join. Basically what you need to
> do is sort both tables by the join key, and partition them with the same
> partitioner into the same number of partitions. This way you have an equal
> number of part-NNNNN files for both tables, and within each part-NNNNN file
> they're ordered by join key. In each map task, you open both tableA/part-N
> and tableB/part-N and do a sequential merge to perform the join. I believe
> the CompositeInputFormat class helps with this, though I've never used it.
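
A minimal configuration sketch for this, again assuming the old mapred
join package; the table paths are illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MergeJoinConfig {
  public static JobConf configure() {
    JobConf job = new JobConf(MergeJoinConfig.class);
    job.setInputFormat(CompositeInputFormat.class);
    // "inner" requests an inner join; both inputs must already be sorted
    // by the join key and partitioned identically (same partitioner,
    // same number of part-NNNNN files).
    job.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", SequenceFileInputFormat.class,
        new Path("/data/tableA"), new Path("/data/tableB")));
    // Each map task then sees (key, TupleWritable) pairs holding the
    // matching records from tableA and tableB.
    return job;
  }
}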
>
> Option 4: Perform the join in several passes. Whichever table is smaller,
> break it into pieces that fit in RAM. Unless my relational algebra is off,
> if B = B1 UNION B2, then A JOIN B = A JOIN (B1 UNION B2) = (A JOIN B1)
> UNION (A JOIN B2).
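
A sketch of the multi-pass driver this implies, where runJoin() is a
hypothetical helper that submits one ordinary reduce-side join job and the
chunk paths are illustrative:

public class MultiPassJoin {

  // Hypothetical helper: submits one reduce-side join of the full table A
  // against a single RAM-sized chunk of table B, writing to outPath.
  static void runJoin(String aPath, String bChunkPath, String outPath) {
    // configure and submit the MapReduce join job here
  }

  public static void main(String[] args) {
    // Chunks of the smaller table B, each small enough to buffer in RAM.
    String[] bChunks = { "/data/B/chunk-0", "/data/B/chunk-1" };
    for (int i = 0; i < bChunks.length; i++) {
      // Since A JOIN B = (A JOIN B1) UNION (A JOIN B2), the per-chunk
      // outputs simply concatenate to the full join result.
      runJoin("/data/A", bChunks[i], "/out/join-pass-" + i);
    }
  }
}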
>
> Hope that helps
> -Todd
>
>
> On Thu, May 28, 2009 at 5:02 AM, Stuart White <stuart.whi...@gmail.com> wrote:
>
>> I need to do a reduce-side join of two datasets.  It's a many-to-many
>> join; that is, each dataset can contain multiple records with any given
>> key.
>>
>> Every description of a reduce-side join I've seen involves
>> constructing your keys in your mapper such that records from one
>> dataset will be presented to the reducers before records from the
>> second dataset.  I should "hold on" to the value from the first dataset
>> and remember it as I iterate across the values from the second
>> dataset.
>>
>> This seems like it only works well for one-to-many joins (when one of
>> your datasets will only have a single record with any given key).
>> This scales well because you're only remembering one value.
>>
>> In a many-to-many join, if you apply this same algorithm, you'll need
>> to remember all values from one dataset, which of course will be
>> problematic (and won't scale) when dealing with large datasets with
>> large numbers of records with the same keys.
>>
>> Does an efficient algorithm exist for a many-to-many reduce-side join?
>>
>
>
