Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Edward J. Yoon
Hmm, yes! I'll try the approach above; it looks good.

Thanks, Ed

On Wed, Aug 13, 2008 at 10:52 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> Edward J. Yoon wrote:
>>
>> Yes, but then, as i grows, the task-to-workload ratio gets larger
>> and larger. Is that right?
>>
>
> I hope you have seen the corrected version in the latest email. What do you
> mean by *i*? If you mean the index in the key ordering, then the smaller the
> index, the more keys are associated with it, and that is what we want. If you
> mean the total number of keys, then yes: the larger the number of keys, the
> more combinations/associations the smaller key has to make. Since there will
> be mC2 combinations (m: number of keys), one can optimize it to have mC2 / N
> values per reducer (N: number of reducers). Something like
>
> partition(index i, key key_j, int N) { // N is num reducers
>  // find the data per reducer
>  int dataPerRed = mC2 / N; // assuming m is known
>  int prev_sum = 0;
>  // calculate the total combinations contributed by previous indexes
>  for (k=1; k < i; k++) {
>   // this adds the number of combinations contributed by kth index
>   prev_sum += m - k + 1;
>  }
>  prev_sum += j - i + 1; // self contribution
>  return prev_sum % dataPerRed;
> }
> I think this might work.
> Amar
>>
>> -Edward
>>
>> On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Edward J. Yoon wrote:
>>>

 Hi communities,

 Do you have any idea how to get the pairs of all row key combinations
 w/o repetition on Map/Reduce as described below?

 Input : (MapFile or Hbase Table)

 
 
 
 

 Output :

 
 
 
 
 
 


>>>
>>> One way to do it would be as follows
>>> For every key with index i,
>>> for (k=0; k < i; k++) {
>>> emit(i,key_i)
>>> }
>>> So the above input becomes
>>> 1,key1
>>> 1,key1
>>> 1,key1
>>>

 It would be nice if someone can review my pseudo code of traditional
 CF using cosine similarity.
 http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

 Thanks.





-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Amar Kamat

Edward J. Yoon wrote:

Yes, but then, as i grows, the task-to-workload ratio gets larger
and larger. Is that right?
  
I hope you have seen the corrected version in the latest email. What do
you mean by *i*? If you mean the index in the key ordering, then the
smaller the index, the more keys are associated with it, and that is what
we want. If you mean the total number of keys, then yes: the larger the
number of keys, the more combinations/associations the smaller key has to
make. Since there will be mC2 combinations (m: number of keys), one can
optimize it to have mC2 / N values per reducer (N: number of reducers).
Something like


partition(index i, key key_j, int N) { // N is num reducers
 // find the data per reducer
 int dataPerRed = mC2 / N; // assuming m is known
 int prev_sum = 0;
 // calculate the total combinations contributed by previous indexes
 for (k=1; k < i; k++) {
   // this adds the number of combinations contributed by kth index
   prev_sum += m - k + 1;
 }
 prev_sum += j - i + 1; // self contribution
 return prev_sum % dataPerRed;
}
I think this might work.
Amar
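
The balancing idea in the pseudocode above can be tried out in plain Python. This is only a sketch under stated assumptions, not the posted function: indexes are 1-based, `records_before` and `partition` are hypothetical names, the per-reducer share is computed from the total number of emitted records m(m+1)/2 (self pairs included) instead of mC2, and integer division replaces the modulo so that the result is a reducer number in the range 0 to N-1.

```python
from collections import Counter


def records_before(i, m):
    """Records emitted for indexes 1..i-1; index k contributes m - k + 1."""
    return sum(m - k + 1 for k in range(1, i))


def partition(i, j, m, n_reducers):
    """Reducer number for the record (i, key_j), with 1 <= i <= j <= m."""
    total = m * (m + 1) // 2                # all emitted records (assumption)
    per_reducer = -(-total // n_reducers)   # ceiling division
    rank = records_before(i, m) + (j - i)   # 0-based position of the record
    return rank // per_reducer


def reducer_loads(m, n_reducers):
    """Count how many records each reducer would receive."""
    return Counter(
        partition(i, j, m, n_reducers)
        for i in range(1, m + 1)
        for j in range(i, m + 1)
    )


print(reducer_loads(4, 2))
```

For m = 4 and two reducers this assigns five records to each reducer. Note that the records of a single index can straddle two reducers (index 2 in this example), so the reduce side would still need a way to see the row key belonging to that index on both; the sketch does not address that.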

-Edward

On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
  

Edward J. Yoon wrote:


Hi communities,

Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?

Input : (MapFile or Hbase Table)






Output :








  

One way to do it would be as follows
For every key with index i,
for (k=0; k < i; k++) {
emit(i,key_i)
}
So the above input becomes
1,key1
1,key1
1,key1


It would be nice if someone can review my pseudo code of traditional
CF using cosine similarity.
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

Thanks.

  





  




Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Edward J. Yoon
Is there another option, or an efficient workload-balancing algorithm
for this case? If so, please let me know.

Thanks, Ed

On Wed, Aug 13, 2008 at 9:55 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Yes, but then, as i grows, the task-to-workload ratio gets larger
> and larger. Is that right?
>
> -Edward
>
> On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
>> Edward J. Yoon wrote:
>>>
>>> Hi communities,
>>>
>>> Do you have any idea how to get the pairs of all row key combinations
>>> w/o repetition on Map/Reduce as described below?
>>>
>>> Input : (MapFile or Hbase Table)
>>>
>>> 
>>> 
>>> 
>>> 
>>>
>>> Output :
>>>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>
>>
>> One way to do it would be as follows
>> For every key with index i,
>> for (k=0; k < i; k++) {
>> emit(i,key_i)
>> }
>> So the above input becomes
>> 1,key1
>> 1,key1
>> 1,key1
>>>
>>> It would be nice if someone can review my pseudo code of traditional
>>> CF using cosine similarity.
>>> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>>>
>>> Thanks.
>>>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Edward J. Yoon
Yes, but then, as i grows, the task-to-workload ratio gets larger
and larger. Is that right?

-Edward
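
To make the concern concrete, here is a minimal sketch that counts how many records the quoted loop (`for (k=0; k < i; k++)`) emits for the key at index i. The function name `records_emitted` and the value m = 5 are hypothetical and chosen only for illustration.

```python
def records_emitted(i):
    """Records the quoted loop emits for the key at index i (k = 0 .. i-1)."""
    return sum(1 for _ in range(i))


m = 5  # hypothetical number of row keys
per_key = [records_emitted(i) for i in range(1, m + 1)]
print(per_key)       # [1, 2, 3, 4, 5]
print(sum(per_key))  # 15, i.e. m * (m + 1) / 2
```

The per-key count grows linearly with the index, so the total grows quadratically with the number of keys.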

On Wed, Aug 13, 2008 at 9:23 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> Edward J. Yoon wrote:
>>
>> Hi communities,
>>
>> Do you have any idea how to get the pairs of all row key combinations
>> w/o repetition on Map/Reduce as described below?
>>
>> Input : (MapFile or Hbase Table)
>>
>> 
>> 
>> 
>> 
>>
>> Output :
>>
>> 
>> 
>> 
>> 
>> 
>> 
>>
>
> One way to do it would be as follows
> For every key with index i,
> for (k=0; k < i; k++) {
> emit(i,key_i)
> }
> So the above input becomes
> 1,key1
> 1,key1
> 1,key1
>>
>> It would be nice if someone can review my pseudo code of traditional
>> CF using cosine similarity.
>> http://wiki.apache.org/hama/TraditionalCollaborativeFiltering
>>
>> Thanks.
>>
>
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Amar Kamat

Amar Kamat wrote:

Edward J. Yoon wrote:

Hi communities,

Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?

Input : (MapFile or Hbase Table)






Output :







  

One way to do it would be as follows
Map :
For every key with index i (one has to define an index on a key, some 
kind of ordering),

for (k=0; k < i; k++) {
emit(k,key_i); // correction: emit the loop index k, not i
}
So the above input becomes
1,key1
1,key2
2,key2
1,key3
2,key3
3,key3
1,key4
2,key4
3,key4
4,key4
etc
Now if you reduce them you get
Reduce :
index,{key_index, key_index+1,  ...} for which you emit
for (i=index + 1; i < n; i++) { // n is num_keys
emit (key_index,key_i)
}
I think it can be optimized further.

Amar
P.S. Please ignore my previous post; I clicked send by mistake.
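
The map and reduce steps above can be simulated in plain Python to check that every pair comes out exactly once. This is only a sketch under assumptions: indexes are 1-based (matching the example output), each emitted value carries its position alongside the row key so the reducer can tell which key is its own, an in-memory dictionary stands in for the shuffle, and `map_phase`, `reduce_phase`, and `all_pairs` are hypothetical names.

```python
from collections import defaultdict


def map_phase(keys):
    """Emit (k, (i, key_i)) for every index k <= i, using 1-based indexes."""
    for i, key in enumerate(keys, start=1):
        for k in range(1, i + 1):
            yield k, (i, key)


def reduce_phase(index, values):
    """Pair the key at position `index` with every later key in its group."""
    by_pos = dict(values)  # position -> row key
    own = by_pos[index]
    for pos in sorted(by_pos):
        if pos > index:
            yield own, by_pos[pos]


def all_pairs(keys):
    """Run map, group by index (the shuffle), then reduce."""
    groups = defaultdict(list)
    for k, value in map_phase(keys):
        groups[k].append(value)
    pairs = []
    for index in sorted(groups):
        pairs.extend(reduce_phase(index, groups[index]))
    return pairs


print(all_pairs(["key1", "key2", "key3", "key4"]))
```

With four keys the result holds the six pairs (key1,key2), (key1,key3), (key1,key4), (key2,key3), (key2,key4), and (key3,key4). The group for index 1 is the largest and the group for the last index produces nothing, which is the imbalance discussed elsewhere in this thread.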

It would be nice if someone can review my pseudo code of traditional
CF using cosine similarity.
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

Thanks.
  






Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Amar Kamat

Edward J. Yoon wrote:

Hi communities,

Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?

Input : (MapFile or Hbase Table)






Output :







  

One way to do it would be as follows
Map :
For every key with index i (one has to define an index on a key, some 
kind of ordering),

for (k=0; k < i; k++) {
emit(i,key_i)
}
So the above input becomes
1,key1
1,key2
2,key2
1,key3
2,key3
3,key3
1,key4
2,key4
3,key4
4,key4
etc
Now if you reduce them you get
Reduce :
index,{key_index, key_index+1,  ...} for which you emit
for (i=index + 1; i < n; i++) { // n is num_keys
emit (key_index,key_i)
}
I think it can be optimized further.

Amar
P.S. Please ignore my previous post; I clicked send by mistake.

It would be nice if someone can review my pseudo code of traditional
CF using cosine similarity.
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

Thanks.
  




Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Amar Kamat

Edward J. Yoon wrote:

Hi communities,

Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?

Input : (MapFile or Hbase Table)






Output :







  

One way to do it would be as follows
For every key with index i,
for (k=0; k < i; k++) {
emit(i,key_i)
}
So the above input becomes
1,key1
1,key1
1,key1

It would be nice if someone can review my pseudo code of traditional
CF using cosine similarity.
http://wiki.apache.org/hama/TraditionalCollaborativeFiltering

Thanks.