If you just want to find the top eigenvalue/eigenvector, you can use
something like the Lanczos method. There is a description of a
MapReduce-based algorithm in Section 4.2 of [1].

[1] http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
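For intuition about what Lanczos refines, here is plain power iteration for the top eigenpair in a small pure-Python sketch (not distributed, purely illustrative; the 2x2 test matrix is an arbitrary example):

```python
def mat_vec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def power_iteration(A, iters=200):
    """Return (top eigenvalue, top eigenvector) of a symmetric matrix A."""
    v = [1.0] * len(A)                 # arbitrary nonzero start vector
    for _ in range(iters):
        w = mat_vec(A, v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]      # renormalize each step
    lam = dot(v, mat_vec(A, v))        # Rayleigh quotient estimate
    return lam, v

A = [[2.0, 1.0], [1.0, 2.0]]           # eigenvalues are 3 and 1
lam, v = power_iteration(A)            # lam ~ 3.0
```

Convergence is geometric in the gap between the top two eigenvalues, which is why Lanczos (which builds a whole Krylov basis from the same matrix-vector products) is the better-behaved choice at scale.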


On Thu, Aug 7, 2014 at 10:54 AM, Li Pu <l...@twitter.com.invalid> wrote:

> @Miles, the latest SVD implementation in MLlib is partially distributed.
> The matrix-vector multiplication is computed across all workers, but the
> right singular vectors are all stored on the driver. If your symmetric
> matrix is n x n and you want the first k eigenvalues, you will need to
> fit n x k doubles in the driver's memory. Behind the scenes, it calls
> ARPACK to compute the eigen-decomposition of A^T A. You can look into
> the source code for the details.
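A back-of-the-envelope estimate of that driver-side footprint (n x k doubles at 8 bytes each, ignoring JVM object overhead): n = 5,000,000 comes from the question later in the thread, and k = 10 is an assumed number of requested eigenvalues.

```python
def driver_bytes(n, k, bytes_per_double=8):
    """Raw size of an n x k dense matrix of doubles, in bytes
    (illustrative lower bound; real JVM overhead would add to this)."""
    return n * k * bytes_per_double

mb = driver_bytes(5_000_000, 10) / 1e6   # 400.0 MB for the right singular vectors
```

So even the 5M x 5M case is driver-feasible for small k, which is the point of keeping only the n x k factor local.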
>
> @Sean, the SVD++ implementation in GraphX is not the canonical
> definition of SVD; it doesn't have the orthogonality guarantees that SVD
> provides. But we might want to use GraphX as the underlying matrix
> representation for mllib.SVD to address the problem of skewed entry
> distributions.
>
>
> On Thu, Aug 7, 2014 at 10:51 AM, Evan R. Sparks <evan.spa...@gmail.com>
> wrote:
>
>> Reza Zadeh has contributed the distributed implementation of
>> (Tall/Skinny) SVD (
>> http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html),
>> which is in MLlib (Spark 1.0), and a distributed sparse SVD is coming in
>> Spark 1.1 (https://issues.apache.org/jira/browse/SPARK-1782). If your
>> data is sparse (as it often is in social networks), you may have better
>> luck with this.
>>
>> I haven't tried the GraphX implementation, but those algorithms are often
>> well-suited for power-law distributed graphs as you might see in social
>> networks.
>>
>> FWIW, I believe you need to square the elements of the sigma matrix from
>> the SVD to recover the eigenvalues of A^T A; for a symmetric matrix, the
>> singular values are the absolute values of its eigenvalues.
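A quick numeric sanity check of that relation, using the closed-form eigenvalues of a 2x2 matrix (pure Python; the matrix is an arbitrary symmetric example):

```python
def eig2(a, b, c, d):
    """Eigenvalues of the 2x2 matrix [[a, b], [c, d]] via the
    characteristic polynomial: lambda = (tr +/- sqrt(tr^2 - 4 det)) / 2."""
    tr, det = a + d, a * d - b * c
    disc = (tr * tr - 4 * det) ** 0.5
    return (tr + disc) / 2, (tr - disc) / 2

A = [[2.0, 1.0], [1.0, 2.0]]             # symmetric PSD, eigenvalues 3 and 1
AtA = [[5.0, 4.0], [4.0, 5.0]]           # A^T A (= A @ A since A is symmetric)

lams = eig2(*A[0], *A[1])                # eigenvalues of A     -> (3.0, 1.0)
sig_sq = eig2(*AtA[0], *AtA[1])          # eigenvalues of A^T A -> (9.0, 1.0)
sigmas = tuple(s ** 0.5 for s in sig_sq) # singular values of A -> (3.0, 1.0)
```

So sigma_i^2 gives the eigenvalues of A^T A, and for this positive semidefinite A, sigma_i already equals lambda_i.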
>>
>>
>> On Thu, Aug 7, 2014 at 10:20 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> (-incubator, +user)
>>>
>>> If your matrix is symmetric (and real I presume), and if my linear
>>> algebra isn't too rusty, then its SVD is its eigendecomposition. The
>>> SingularValueDecomposition object you get back has U and V, both of
>>> which have columns that are the eigenvectors.
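The sign caveat can be seen concretely. The values below are hand-derived for one specific symmetric matrix with a negative eigenvalue (an illustrative example, not Spark output): its SVD takes sigma = |lambda|, and the sign of the negative eigenvalue moves into V, so the U and V columns match only up to sign.

```python
# A = [[1, 2], [2, 1]] has eigenvalues 3 and -1 with normalized
# eigenvectors [1, 1]/sqrt(2) and [1, -1]/sqrt(2).
s = 2 ** -0.5
u1, u2 = [s, s], [s, -s]                 # eigenvectors of A (columns of U)
sigma = (3.0, 1.0)                        # singular values: |3|, |-1|
v1, v2 = u1, [-x for x in u2]            # V absorbs the sign of lambda_2 = -1

# Reconstruct A[i][j] = sum_k sigma[k] * U_col_k[i] * V_col_k[j]
U, V = [u1, u2], [v1, v2]                # columns stored as rows here
A = [[sum(sigma[k] * U[k][i] * V[k][j] for k in range(2))
      for j in range(2)] for i in range(2)]
# A ~ [[1, 2], [2, 1]] up to float rounding
```

With all eigenvalues nonnegative, U and V would be identical and the SVD would literally be the eigendecomposition.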
>>>
>>> There are a few SVDs in the Spark code. The one in mllib is not
>>> distributed (right?) and is probably not an efficient means of
>>> computing eigenvectors if you really just want a decomposition of a
>>> symmetric matrix.
>>>
>>> The one I see in graphx is distributed? I haven't used it though.
>>> Maybe it could be part of a solution.
>>>
>>>
>>>
>>> On Thu, Aug 7, 2014 at 2:21 PM, yaochunnan <yaochun...@gmail.com> wrote:
>>> > Our lab needs to do some simulations on online social networks. We
>>> > need to handle a 5000*5000 adjacency matrix, namely, to get its
>>> > largest eigenvalue and the corresponding eigenvector. MATLAB can be
>>> > used, but it is time-consuming. Is Spark effective for linear
>>> > algebra calculations and transformations? Later we will have a
>>> > 5000000*5000000 matrix to process, so it seems urgent that we find
>>> > a distributed computation platform.
>>> >
>>> > I see SVD has been implemented, and I can get the eigenvalues of a
>>> > matrix through this API. But when I want both eigenvalues and
>>> > eigenvectors, or at least the largest eigenvalue and the
>>> > corresponding eigenvector, it seems that current Spark doesn't have
>>> > such an API. Is it possible for me to write eigenvalue decomposition
>>> > from scratch? What should I do? Thanks a lot!
>>> >
>>> >
>>> > Miles Yao
>>>
>>>
>>
>
>
> --
> Li
> @vrilleup
>
