Actually, it occurs to me that as far as reducers are concerned, we can thin
things down even further by splitting Qhat blocks, but mappers
have to hold Q blocks of (k+p) x r in memory in their entirety.
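A quick back-of-envelope for that mapper footprint (an illustrative sketch, not code from the thread; `qblock_bytes` is a hypothetical helper, and it counts only the raw doubles, not JVM object overhead):

```python
def qblock_bytes(k_plus_p, r):
    """Raw memory for a dense (k+p) x r block of 8-byte doubles."""
    return k_plus_p * r * 8

# e.g. a square block with k+p = r = 1400:
print(qblock_bytes(1400, 1400) / 1e6)  # 15.68 (MB of raw data)
```

The raw array is well under a default -Xmx200M heap; the practical limit quoted below is lower because of duplicate buffers and other per-task overhead.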

On Thu, Nov 18, 2010 at 12:00 PM, Dmitriy Lyubimov <[email protected]> wrote:

> actually, perhaps somewhat less than that (around k+p = 800...1000), since
> we'll have to hold 2 Q buffers in reducers at the same time for Q merging.
>
>
> On Thu, Nov 18, 2010 at 11:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Ok, I guess we'll have to see how it plays out at scale. The current version
>> does computation on Q blocks that have to be k+p wide. With the Hadoop default
>> setting, which I think is -Xmx200M, and the constraint m >= n for a Q block, that
>> puts the upper limit on k+p in the area of ~1.4K for completely square dense
>> Q blocks, other expenses notwithstanding, with default child process
>> settings. I am going to guess that is certainly enough for my personal
>> purposes :-). I'd expect somebody to provide a correction on that for
>> Mahout's goals.
>>
>> On Thu, Nov 18, 2010 at 11:41 AM, Ted Dunning <[email protected]> wrote:
>>
>>> There is an ironic tension with these.  Using the power iterations is
>>> generally bad numerically, but having a small p is much worse for accuracy.
>>> That means that factoring (A' A)^q A will get much more accurate values for
>>> the same value of p.  Alternately phrased, getting the same accuracy would
>>> require a much larger value of p and thus would overcome the cost of the
>>> initial power iteration.
>>>
>>> How this works out in practice at truly massive scale is totally up in the
>>> air.  The result of the stochastic projection can actually be *larger* than
>>> the original sparse matrix, which would seem to imply that the power method
>>> might actually save time sometimes.
>>>
>>> On Thu, Nov 18, 2010 at 11:07 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>> > Further work on this may include an implementation of power iterations
>>> > (although I doubt there's much to be gained from them on such big volumes).
>>> >
>>>
>>
>>
>
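The power-iteration trade-off Ted describes can be sketched in a few lines (an illustrative NumPy version of the standard randomized range finder, not Mahout's implementation; q = 0 is the plain stochastic projection, and each loop pass applies one extra multiplication by A and A', i.e. the power-iterated sketch commonly written (A A')^q A Omega):

```python
import numpy as np

def randomized_range(A, k, p, q=0, seed=0):
    """Return an orthonormal basis Q with k+p columns whose span
    approximates the dominant range of A. Larger p or q improves
    accuracy; q > 0 costs extra passes over A."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k + p))  # random test matrix
    Y = A @ omega                                     # plain projection
    for _ in range(q):
        Y = A @ (A.T @ Y)                             # one power iteration
    Q, _ = np.linalg.qr(Y)                            # orthonormalize
    return Q
```

For an exactly rank-k input, projecting A onto span(Q) recovers it to machine precision once k+p >= k; the point of the thread is that on noisy data a modest q can substitute for a much larger p, at the cost of more passes over A.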
