On 09/06/10 20:20, Jake Mannix wrote:
On Wed, Jun 9, 2010 at 11:57 AM, Richard Simon Just<
[email protected]> wrote:
On 09/06/10 00:32, Jake Mannix wrote:
Too bad it's not IntWritable,VectorWritable, because then we'd actually
be able to transpose() properly - our Vector implementation assumes int
indices. :\
You can do decomposition on this, because transpose isn't needed. But
if one of the later steps needs it...
Doesn't the MatrixMultiplicationJob, and therefore decomposition, also
assume int? If so, is the conversion fairly easy, or would it be a case of
writing a separate input-conversion M/R job?
MatrixMultiplicationJob assumes int, but SVD does not, as it turns out,
because we never actually look at what defines the row-space - only the
column space is operated on. I think the LanczosDecompositionJob can
actually run on any SequenceFile<Writable,VectorWritable>, with any
Writable type as key.
Nice!
On 09/06/10 00:47, Ted Dunning wrote:
@Jake and Sean
My understanding is that the adding of biases and average rating to the
prediction is based on what is done in terms of normalisation before the SVD
computation. On that topic, could someone clarify the difference between
normalization and regularization for me, and also where/if the two interact?
I'm not sure what kind of regularization we're doing here, actually...
I guess what I'm asking is: would regularization normally be a part of
the normalization process, or are they completely separate? In the
literature, when normalization is talked about they generally seem to be
talking about imputation and the filling in of the null entries, whereas
when regularization is mentioned it's more in the context of gradient
descent.
Nice to see it can be done with just A and V. Could you talk me through
what timesSquared does? I can't quite tell from the src.
matrix.timesSquared(Vector v) ==
(matrix.transpose().times(matrix)).times(v)
in one pass over the matrix, without ever computing the transpose or
multiplying two matrices (and on a DistributedRowMatrix, it does this in
one MapReduce pass over the data).
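
To make the one-pass idea concrete, here is a minimal sketch in plain Java. It is not the Mahout code: the class name TimesSquaredSketch, the use of dense double[][] rows, and the naive() helper are all assumptions made for illustration. The point it shows is that (A^T A) v equals the sum over rows r of (r . v) * r, so each row is visited once and neither the transpose nor A^T A is ever built.

```java
import java.util.Arrays;

// Illustrative sketch only; names and dense arrays are assumptions, not Mahout's API.
public class TimesSquaredSketch {

    // Computes (A^T A) v in a single pass over the rows of A.
    // For each row r: result += (r . v) * r
    public static double[] timesSquared(double[][] rows, double[] v) {
        double[] result = new double[v.length];
        for (double[] row : rows) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += row[j] * v[j];
            }
            for (int j = 0; j < v.length; j++) {
                result[j] += dot * row[j];
            }
        }
        return result;
    }

    // Reference version for comparison: builds A^T A explicitly, then multiplies by v.
    public static double[] naive(double[][] rows, double[] v) {
        int n = v.length;
        double[][] ata = new double[n][n];
        for (double[] row : rows) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    ata[i][j] += row[i] * row[j];
                }
            }
        }
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                result[i] += ata[i][j] * v[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[] v = {1, 1};
        System.out.println(Arrays.toString(timesSquared(a, v))); // [79.0, 100.0]
        System.out.println(Arrays.toString(naive(a, v)));        // [79.0, 100.0]
    }
}
```

Note that the per-row work never looks at a row index or key, which fits the earlier point that only the column space is operated on. In a distributed setting each mapper could emit its partial (r . v) * r vectors and a reducer could sum them, though that mapping onto MapReduce is only a guess at the general shape, not a description of the actual job.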
-jake
That's awesome!
Cheers
-Richard