[ 
https://issues.apache.org/jira/browse/MAHOUT-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shannon Quinn updated MAHOUT-986:
---------------------------------

    Description: 
Dan Brickley and I have been testing SpectralKMeans with a DBpedia dataset ( 
http://danbri.org/2012/spectral/dbpedia/ ); effectively, a graph with 4,192,499 
nodes. Not surprisingly, the LanczosSolver throws an OutOfMemoryError when it 
attempts to instantiate a DenseMatrix of dimensions 4192499-by-4192499 (~17.5 
trillion double-precision floating-point values). Here's the full stack trace:

{quote}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:50)
        at 
org.apache.mahout.math.decomposer.lanczos.LanczosState.<init>(LanczosState.java:45)
        at 
org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:146)
        at 
org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at 
org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:53)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{quote}
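
For scale, here is a back-of-the-envelope tally of the allocation that fails. This is an illustrative aside, not part of the original report; it only assumes the node count above and the standard 8 bytes per Java double.

{code:java}
// Rough memory footprint of the n-by-n DenseMatrix that LanczosState
// tries to allocate (illustration only).
public class DenseMatrixFootprint {
  public static void main(String[] args) {
    long n = 4192499L;                      // nodes in the DBpedia graph
    long cells = n * n;                     // entries in an n-by-n dense matrix
    double terabytes = cells * 8.0 / 1e12;  // 8 bytes per double
    System.out.printf("%d cells (~%.1f trillion), ~%.0f TB of heap%n",
        cells, cells / 1e12, terabytes);
    // Prints roughly 17.6 trillion cells and ~141 TB -- no JVM heap can hold that.
  }
}
{code}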

Obviously SKM needs a more sustainable and memory-efficient way of performing 
an eigen-decomposition of the graph Laplacian. For those more knowledgeable 
than I am about Mahout's linear solvers: can the Lanczos parameters be tweaked 
so that a full DenseMatrix is never allocated, or should SKM move to SSVD 
instead?
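
If SKM does move to SSVD, the wiring might look roughly like the sketch below. This is an assumption-laden illustration, not a patch: it presumes the SSVDSolver constructor (conf, input paths, output path, block height, rank k, oversampling p, reduce tasks) plus the setComputeU/setComputeV/setOverwrite/getUPath accessors from org.apache.mahout.math.hadoop.stochasticsvd, and every path and numeric parameter is a placeholder to be tuned against the Mahout version in use.

{code:java}
// Hypothetical sketch: decompose the Laplacian with stochastic SVD instead of
// Lanczos, so no n-by-n DenseMatrix is ever allocated. Paths and parameter
// values are illustrative placeholders, not tested settings.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver;

public class SpectralSsvdSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path laplacian = new Path("/user/mahout/skm/laplacian");  // row-matrix form of the Laplacian
    Path output    = new Path("/user/mahout/skm/ssvd-out");

    int rank         = 100;     // k: number of eigenvectors/clusters desired
    int oversampling = 15;      // p: extra sampled columns for accuracy
    int blockHeight  = 30000;   // rows per block; tune to available memory
    int reduceTasks  = 10;

    SSVDSolver solver =
        new SSVDSolver(conf, new Path[] {laplacian}, output,
                       blockHeight, rank, oversampling, reduceTasks);
    solver.setComputeU(true);   // rows of U give the spectral embedding fed to k-means
    solver.setComputeV(false);  // V is redundant for the symmetric Laplacian
    solver.setOverwrite(true);
    solver.run();

    System.out.println("U written under: " + solver.getUPath());
  }
}
{code}

The point of the sketch is only that SSVD keeps everything blocked and distributed, so memory scales with the block height and rank rather than with n squared.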

> OutOfMemoryError in LanczosState by way of SpectralKMeans
> ---------------------------------------------------------
>
>                 Key: MAHOUT-986
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-986
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: Ubuntu 11.10 (64-bit)
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>            Priority: Minor
>             Fix For: 0.7

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
