> If instead you computed kernelized Lanczos all in one fell
> swoop, you'll be making k passes over the data, and computing
> the kernel each time, yes, but the scaling is just that of Lanczos:
>
> O(k * numUsers * kernelCost(userSparsity) )
>
> (as you mention, I forgot the kernel cost here, yes)
>
> This is tremendously better than computing the full kernel matrix,
> by a factor of k / numUsers = 5000 or so.  Unless I'm missing
> a factor somewhere.

I still don't see this. Hopefully I can explain my confusion. It seems
to me that the complexity you wrote assumes a kernel that does not
require pairwise computations; it doesn't look like it is actually
operating on the Gram matrix, H.

(Call the data matrix X.)
I think a more accurate complexity would be O(k * numUsers *
(kernelCost * numUsers)) = O(k * numUsers^2 * kernelCost), which is no
better than precomputing H. For every row i of X that the Lanczos
algorithm touches, we need to evaluate k(X_i, X_{1..N}).
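
To make the counting concrete, here is a minimal Python sketch (my own
names, not Mahout code) of a matrix-free multiply H*v against a Lanczos
vector v; even though H is never stored, each pass still evaluates the
kernel numUsers^2 times:

import numpy as np

def gram_matvec(rows, kernel, v):
    # rows: the N sparse rows of X (e.g. sets of column indices)
    # kernel: a pairwise function k(row_i, row_j)
    # v: a dense length-N Lanczos vector
    # Computes H*v without materializing H, but still does N^2 kernel calls.
    n = len(rows)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            out[i] += kernel(rows[i], rows[j]) * v[j]
    return out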

> It's the former, but the difference is between computing the k(r_i, r_j)
> for all i and j, and computing k(r_i, V) for all i, for a set of only K =
> 100's of dense eigenvectors (well, Lanczos vectors, used to produce
> eigenvectors).  Way way way less computation of the kernel.

I'm not sure what V is here. If V is an eigenvector (or Lanczos
vector), isn't it of a different dimensionality than r_i? I'm not sure
how this would work if that were the case, and I can't think of what
else you could mean by V.

Let's say that I am completely misunderstanding what you're saying and
that your suggestion works. We would then still have the problem of
centering H. There may be tricks to do this row-wise, but I am pretty
sure that the entire matrix has to be computed first, meaning that
your suggestion wouldn't work out after all.
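
For reference, the standard kernel-PCA centering I have in mind is
Hc = H - 1_N H - H 1_N + 1_N H 1_N, where 1_N is the N x N matrix whose
entries are all 1/N. A minimal numpy sketch (assuming H fits in memory,
which is exactly the problem):

import numpy as np

def center_gram(H):
    # Standard Gram-matrix centering for kernel PCA:
    # Hc = H - 1N*H - H*1N + 1N*H*1N, with 1N the all-(1/N) matrix.
    # Every entry of Hc depends on a full row and a full column of H,
    # which is why doing this one row at a time is awkward.
    n = H.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return H - one_n @ H - H @ one_n + one_n @ H @ one_n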

> And you know, if you've got some clever MapReduce code to compute
> the full sparse kernelized H computation,  we could certainly use it here
> in Mahout!

I don't know if it's clever, but this is what we're currently doing:

Given an input file with (r,c) pairs, meaning there is a 1 at (r,c) in
the data matrix:
Round 1:
Identity Map
Expanding Reducer: (r, [c_{1..m}]) -> (c_1, (r, [c_{1..m}])), ..., (c_m, (r, [c_{1..m}]))
i.e.:
1 2
1 3
2 1
2 4
4 1
4 2
would become
2    (1, [2, 3])
3    (1, [2, 3])
1    (2, [1, 4])
4    (2, [1, 4])
1    (4, [1, 2])
2    (4, [1, 2])

This is done to find the rows that have some overlap.
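
In case it helps, here's a rough single-machine Python sketch of Round 1
(illustrative only; the names are mine, not our actual Hadoop job):

from collections import defaultdict

def round1(pairs):
    # pairs: (r, c) tuples, one per nonzero entry of the data matrix.
    # Group by row, then re-emit each row's column list keyed by each column.
    by_row = defaultdict(list)
    for r, c in pairs:
        by_row[r].append(c)
    out = []
    for r, cols in by_row.items():
        for c in cols:
            out.append((c, (r, cols)))
    return out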

Round 2:
Identity Map
TupleMaker Reducer: for every pair of rows r_1, r_2 that share the column
key c: (c, (r, [c_{1..m}])) -> ((r_1, r_2), ([c_1_{1..m}], [c_2_{1..m}]))
i.e.:
1    2 [1 4]
1    4 [1 2]
2    1 [2 3]
2    4 [1 2]
3    1 [2 3]
4    2 [1 4]
becomes:
[2 4]    [1 4] [1 2]
[1 4]    [2 3] [1 2]

Now we have the pairs of rows on which the kernel function needs to be evaluated.
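
A single-machine Python sketch of Round 2 in the same illustrative style:

from collections import defaultdict
from itertools import combinations

def round2(records):
    # records: (c, (r, cols)) tuples from Round 1.
    # For each column, pair up every two rows that share it.
    by_col = defaultdict(list)
    for c, (r, cols) in records:
        by_col[c].append((r, cols))
    out = []
    for c, rows in by_col.items():
        for (r1, cols1), (r2, cols2) in combinations(sorted(rows), 2):
            out.append(((r1, r2), (cols1, cols2)))
    return out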

Round 3:
Identity Map
Remove Duplicates Reducer

Obviously, this removes any duplicate pairs that might appear (there are
none in this example, but it does happen).
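
Locally, Round 3 amounts to keeping one record per (r_1, r_2) key
(again, sketch only):

def round3(pair_records):
    # pair_records: ((r1, r2), (cols1, cols2)) tuples from Round 2.
    # Keep a single record per (r1, r2) key.
    seen = {}
    for key, value in pair_records:
        seen.setdefault(key, value)
    return list(seen.items())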

Round 4:
Kernel Map: ((r_1, r_2), ([c_1_{1..m}], [c_2_{1..m}])) -> ([r_1, r_2],
k(r_1, r_2))
i.e., say we're using the dot product as the kernel function. Since the
rows are binary, the dot product is just the number of columns the two
rows share, so:
[2 4]    [1 4] [1 2]
[1 4]    [2 3] [1 2]
becomes:
[2 4]    1
[1 4]    1

We are doing so many rounds because we want a general approach to
applying kernel functions, and don't want to write a special case for
each kernel function we try. So this approach should work for any
pairwise kernel (dot product, Jaccard, cosine, etc.).
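
To show what I mean by pluggable, here is an illustrative Python sketch
of Round 4 with a few such kernels on the binary rows (my own simplified
version, not the real mapper):

import math

def dot_kernel(cols1, cols2):
    # Dot product of two binary rows = number of shared columns.
    return len(set(cols1) & set(cols2))

def jaccard_kernel(cols1, cols2):
    a, b = set(cols1), set(cols2)
    return len(a & b) / float(len(a | b))

def cosine_kernel(cols1, cols2):
    a, b = set(cols1), set(cols2)
    return len(a & b) / math.sqrt(len(a) * len(b))

def round4(pair_records, kernel):
    # pair_records: deduplicated ((r1, r2), (cols1, cols2)) tuples from Round 3.
    return [((r1, r2), kernel(cols1, cols2))
            for (r1, r2), (cols1, cols2) in pair_records]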


And now to Ted:
> I would be tempted to plot the rank vs number of articles
> and look for the point where users seem to be above the Zipf curve.  I would
> just eliminate those users from the data and see what happens.

Oh, that's an interesting idea! It might be some time before we get to
this, but we will definitely try it!

Steven Buss
[email protected]
http://www.stevenbuss.com/
