Edward J. Yoon wrote:
Hi communities,

Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?

Input : (MapFile or Hbase Table)

<Key1, Value or RowResult>
<Key2, Value or RowResult>
<Key3, Value or RowResult>
<Key4, Value or RowResult>

Output :

<Key1, Key2>
<Key1, Key3>
<Key1, Key4>
<Key2, Key3>
<Key2, Key4>
<Key3, Key4>
One way to do it would be as follows.
Map :
For every key with index i (one has to define an index on the keys, i.e. some kind of ordering, starting at 1),
for (k = 1; k <= i; k++) {
  emit(k, key_i)
}
So the above input becomes
1,key1
1,key2
2,key2
1,key3
2,key3
3,key3
1,key4
2,key4
3,key4
4,key4
etc
Now if you reduce them, you get
Reduce :
index, {key_index, key_(index+1), ..., key_n} for which you emit
for (i = index + 1; i <= n; i++) { // n is num_keys
  emit(key_index, key_i)
}
I think it can be optimized further.
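
The map and reduce steps above can be simulated in memory to check that they produce every pair exactly once. This is only a rough sketch in plain Python, not Hadoop or HBase code. The function names (map_phase, shuffle, reduce_phase, all_pairs) are made up for illustration, and it assumes the mapper sends the index i along with key_i so that the reducer can tell which value is key_index.

```python
from collections import defaultdict

def map_phase(keys):
    """For the key with 1-based index i, emit (k, (i, key_i)) for k = 1..i."""
    for i, key in enumerate(keys, start=1):
        for k in range(1, i + 1):
            # Assumption: the index i travels with the key in the value.
            yield k, (i, key)

def shuffle(pairs):
    """Group the mapper output by reduce key, as the framework would."""
    groups = defaultdict(list)
    for k, value in pairs:
        groups[k].append(value)
    return groups

def reduce_phase(index, values):
    """Receive {key_index, ..., key_n} and emit (key_index, key_i) for i > index."""
    ordered = sorted(values)       # order by the index carried in the value
    _, head = ordered[0]           # the smallest index in the group is key_index
    for _, other in ordered[1:]:
        yield head, other

def all_pairs(keys):
    """Run map, shuffle and reduce in sequence over a list of keys."""
    groups = shuffle(map_phase(keys))
    result = []
    for index in sorted(groups):
        result.extend(reduce_phase(index, groups[index]))
    return result

if __name__ == "__main__":
    for pair in all_pairs(["Key1", "Key2", "Key3", "Key4"]):
        print(pair)
```

Note that the mapper emits n(n+1)/2 records for n keys, so the intermediate data grows quadratically with the number of keys.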

Amar
P.S. Please ignore my previous post. I clicked on send by mistake.
It would be nice if someone can review my pseudo code of traditional
CF using cosine similarity.
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

Thanks.
