Re: Get the pairs of all row key combinations w/o repetition
Hmmm, yes!! I'll try it as above. It looks good.

Thanks,
Ed

On Wed, Aug 13, 2008 at 10:52 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> Edward J. Yoon wrote:
>> Yes, but then, as i grows, the task-to-workload ratio gets larger
>> and larger. Is that right?
>
> I hope you have seen the corrected version in the latest email. [...]
> Since there will be mC2 combinations (m: number of keys), one can optimize
> it to have mC2 / N values per reducer (N: number of reducers). [...]
> I think this might work.
> Amar

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
Re: Get the pairs of all row key combinations w/o repetition
Edward J. Yoon wrote:
> Yes, but then, as i grows, the task-to-workload ratio gets larger and
> larger. Is that right?

I hope you have seen the corrected version in the latest email. What do you mean by *i*? If you mean the index in the key ordering, then the smaller the index, the more keys are associated with it, and that is what we want. If you mean the total number of keys, then yes: the larger the number of keys, the more combinations/associations the smaller keys have to make. Since there will be mC2 combinations (m: number of keys), one can optimize it to have mC2 / N values per reducer (N: number of reducers). Something like

  int partition(index i, key key_j, int N) { // N is num reducers; j is the index of key_j (j >= i)
    // find the data per reducer
    int dataPerRed = max(1, mC2 / N); // assuming m is known
    int prev_sum = 0;
    // calculate the total values contributed by previous indexes
    for (int k = 1; k < i; k++) {
      prev_sum += m - k + 1; // values contributed by the kth index (key_k .. key_m)
    }
    prev_sum += j - i + 1; // self contribution
    // turn the running count into a reducer number in [0, N)
    return min((prev_sum - 1) / dataPerRed, N - 1);
  }

I think this might work.
Amar

> -Edward
>
> On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> [...]
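A runnable sketch of the partition idea above, written in Python rather than Hadoop Java so the balance can be checked directly. Assumptions not in the thread: the partitioner knows m, the group index i and the index j of key_j; `partition` and `reducer_loads` are hypothetical names. The running count is turned into a reducer number by integer division, since a partition number must lie in [0, N). The sketch also divides by the number of map-output records, m(m+1)/2, instead of mC2: index k carries m - k + 1 values, so with mC2 the last reducer would absorb the surplus.

```python
from collections import Counter


def partition(i: int, j: int, m: int, n_reducers: int) -> int:
    """Reducer number for the map-output record (i, key_j); indexes are 1-based, j >= i.

    Hypothetical helper: assumes the partitioner knows m and the index j of key_j.
    """
    total_records = m * (m + 1) // 2               # index k carries m - k + 1 values
    per_reducer = -(-total_records // n_reducers)  # ceiling division
    # values contributed by the previous indexes 1 .. i-1
    prev_sum = sum(m - k + 1 for k in range(1, i))
    prev_sum += j - i + 1                          # position of this record within index i
    return (prev_sum - 1) // per_reducer           # always in [0, n_reducers)


def reducer_loads(m: int, n_reducers: int) -> list[int]:
    """Count how many map-output records each reducer receives."""
    loads = Counter(
        partition(i, j, m, n_reducers)
        for i in range(1, m + 1)
        for j in range(i, m + 1)
    )
    return [loads[r] for r in range(n_reducers)]


print(reducer_loads(10, 4))  # [14, 14, 14, 13]
```

One caveat, assuming a standard shuffle: because the records of a single index can be split across reducers, a reducer that does not receive key_i itself cannot form the pairs for index i, so a real job would have to ship key_i with each record or balance whole index groups instead.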
Re: Get the pairs of all row key combinations w/o repetition
Is there another option, or an efficient workload-balancing algorithm for this case? If so, please let me know.

Thanks,
Ed

On Wed, Aug 13, 2008 at 9:55 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Yes, but then, as i grows, the task-to-workload ratio gets larger
> and larger. Is that right?
>
> -Edward
>
> [...]
Re: Get the pairs of all row key combinations w/o repetition
Yes, but then, as i grows, the task-to-workload ratio gets larger and larger. Is that right?

-Edward

On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> Edward J. Yoon wrote:
>> [...]
>
> One way to do it would be as follows
> For every key with index i,
>   for (k=0; k < i; k++) {
>     emit(i,key_i)
>   }
> So the above input becomes
> 1,key1
> 1,key1
> 1,key1
> [...]
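Edward's question can be made concrete with a little arithmetic. Assuming the corrected map step (emit (k, key_i) for k = 1..i), the reduce call for index k receives key_k .. key_m and emits m - k pairs, so the work shrinks as the index grows and index 1 carries the most. The sketch below also assumes, hypothetically, that index k is sent to reducer k mod N, roughly what a hash partitioner does for small non-negative integer keys whose hash code is the value itself (as with Hadoop's IntWritable); `pairs_per_index` and `loads_by_modulo` are illustrative names.

```python
def pairs_per_index(m: int) -> list[int]:
    """Pairs emitted by the reduce call for each index k = 1..m (key_k paired with key_{k+1}..key_m)."""
    return [m - k for k in range(1, m + 1)]


def loads_by_modulo(m: int, n_reducers: int) -> list[int]:
    """Total pairs per reducer if index k is sent to reducer k % n_reducers (an assumption)."""
    loads = [0] * n_reducers
    for k, pairs in enumerate(pairs_per_index(m), start=1):
        loads[k % n_reducers] += pairs
    return loads


print(pairs_per_index(10))     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(loads_by_modulo(10, 4))  # [8, 15, 12, 10]
```

Per-index work is skewed (9 pairs for index 1, none for index 10), and with modulo assignment the busiest reducer does almost twice the work of the lightest one; this is the imbalance the partition function in Amar's reply tries to even out.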
Re: Get the pairs of all row key combinations w/o repetition
Amar Kamat wrote:
Edward J. Yoon wrote:
> Hi communities,
>
> Do you have any idea how to get the pairs of all row key combinations
> without repetition on Map/Reduce, as described below?
>
> Input: (MapFile or HBase table)
>
> Output:

One way to do it would be as follows.

Map: for every key with index i (one has to define an index on the keys, i.e. some kind of ordering; indexes start at 1),
  for (k = 1; k <= i; k++) {
    emit(k, key_i);
  }

So the above input becomes
1,key1
1,key2
2,key2
1,key3
2,key3
3,key3
1,key4
2,key4
3,key4
4,key4
etc.

Now if you reduce them, you get

Reduce: index, {key_index, key_(index+1), ..., key_n}
for which you emit
  for (j = index + 1; j <= n; j++) { // n is num_keys
    emit(key_index, key_j);
  }

I think it can be optimized further.
Amar

P.S. Please ignore my previous post; I clicked Send by mistake.

> It would be nice if someone could review my pseudocode of traditional
> CF using cosine similarity.
> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>
> Thanks.
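The map and reduce steps described above can be simulated in a few lines of Python to check that every pair comes out exactly once. This is a single-process sketch, not Hadoop code: `map_phase`, `shuffle`, `reduce_phase` and `all_pairs` are hypothetical names, the shuffle is modeled with a dict, and the index of a key is assumed to be its 1-based position in sorted order.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(sorted_keys):
    """For key_i (1-based index i), emit (k, key_i) for k = 1..i."""
    for i, key in enumerate(sorted_keys, start=1):
        for k in range(1, i + 1):
            yield k, key


def shuffle(records):
    """Group map output by index, as the framework does before the reduce step."""
    groups = defaultdict(list)
    for index, key in records:
        groups[index].append(key)
    return groups


def reduce_phase(values):
    """values holds key_index .. key_n; pair key_index with every later key."""
    values = sorted(values)  # value order is not guaranteed, so restore the key ordering
    first = values[0]
    for other in values[1:]:
        yield first, other


def all_pairs(keys):
    sorted_keys = sorted(keys)  # the ordering that defines each key's index
    groups = shuffle(map_phase(sorted_keys))
    pairs = []
    for index in sorted(groups):
        pairs.extend(reduce_phase(groups[index]))
    return pairs


keys = ["key3", "key1", "key4", "key2"]
print(all_pairs(keys))
# [('key1', 'key2'), ('key1', 'key3'), ('key1', 'key4'), ('key2', 'key3'), ('key2', 'key4'), ('key3', 'key4')]
assert all_pairs(keys) == list(combinations(sorted(keys), 2))
```

Sorting inside `reduce_phase` stands in for ordering the values by index; in a real job that would more likely be done with a secondary sort.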
Re: Get the pairs of all row key combinations w/o repetition
Edward J. Yoon wrote:
> Hi communities,
>
> Do you have any idea how to get the pairs of all row key combinations
> without repetition on Map/Reduce, as described below?
>
> Input: (MapFile or HBase table)
>
> Output:

One way to do it would be as follows.

Map: for every key with index i (one has to define an index on the keys, i.e. some kind of ordering; indexes start at 1),
  for (k = 1; k <= i; k++) {
    emit(k, key_i);
  }

So the above input becomes
1,key1
1,key2
2,key2
1,key3
2,key3
3,key3
1,key4
2,key4
3,key4
4,key4
etc.

Now if you reduce them, you get

Reduce: index, {key_index, key_(index+1), ..., key_n}
for which you emit
  for (j = index + 1; j <= n; j++) { // n is num_keys
    emit(key_index, key_j);
  }

I think it can be optimized further.
Amar

P.S. Please ignore my previous post; I clicked Send by mistake.

> It would be nice if someone could review my pseudocode of traditional
> CF using cosine similarity.
> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>
> Thanks.
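The record counts in the example above can be checked in general. Assuming m keys indexed 1..m, the map step emits i records for key_i, and the reduce call for index k emits m - k pairs:

```latex
\text{map output records} = \sum_{i=1}^{m} i = \frac{m(m+1)}{2},
\qquad
\text{emitted pairs} = \sum_{k=1}^{m} (m-k) = \frac{m(m-1)}{2} = \binom{m}{2}.
```

For m = 4 this gives the 10 records listed above and 6 pairs; in general the shuffle carries exactly m more records than there are pairs (one self record per index).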
Re: Get the pairs of all row key combinations w/o repetition
Edward J. Yoon wrote:
> Hi communities,
>
> Do you have any idea how to get the pairs of all row key combinations
> without repetition on Map/Reduce, as described below?
>
> Input: (MapFile or HBase table)
>
> Output:

One way to do it would be as follows
For every key with index i,
  for (k=0; k < i; k++) {
    emit(i,key_i)
  }
So the above input becomes
1,key1
1,key1
1,key1

> It would be nice if someone could review my pseudocode of traditional
> CF using cosine similarity.
> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>
> Thanks.