The exponential law for the simulated singular values is probably too aggressive. Also, the Q normalizations are not needed. I need to poke at the data simulation there a bit more.
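A rough sketch of why an exponential law can be too aggressive for a Cholesky-based approach. This is not the test code from MathSuite.scala: the decay rate, the spectrum generators, and the function names are all assumptions made for illustration. The one fact it relies on is that a Cholesky-based QR works with the Gram matrix, whose eigenvalues are the squared singular values, so the usable dynamic range in double precision is roughly halved.

```python
import math
import sys


def exponential_spectrum(n, top=100.0, rate=1.0):
    """Hypothetical exponential law: sigma_k = top * exp(-rate * k).

    The rate is an assumption; the thread does not state the one used.
    """
    return [top * math.exp(-rate * k) for k in range(n)]


def linear_spectrum(n, top=100.0, bottom=1.0):
    """Hypothetical linear law from `top` down to `bottom` (a 100:1 spread)."""
    step = (top - bottom) / (n - 1)
    return [top - step * k for k in range(n)]


def resolvable_rank(sigmas, eps=sys.float_info.epsilon):
    """Count leading singular values whose squares stay above eps * sigma_max^2.

    A Gram-matrix (Cholesky) approach sees sigma^2 rather than sigma, so
    values below this threshold look like zero in double precision.
    Assumes `sigmas` is sorted in decreasing order.
    """
    top_sq = sigmas[0] ** 2
    rank = 0
    for s in sigmas:
        if s * s <= eps * top_sq:
            break
        rank += 1
    return rank


if __name__ == "__main__":
    print("exponential:", resolvable_rank(exponential_spectrum(100)))
    print("linear 100:1:", resolvable_rank(linear_spectrum(100)))
```

With these assumed parameters the exponential spectrum runs out of resolvable values long before the 100th one, while the linear 100:1 spectrum keeps all 100. Whether this is what actually happens in the test depends on the real decay rate used there.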
On Mon, Mar 17, 2014 at 3:26 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Hm. Yeah. I can do the version of distributed QR used in MR SSVD and
> subsequently defined by Nathan Halko in his dissertation. That version
> seemed to be incredibly numerically stable.
>
> But I guess this is too much for work not aligned with my current
> interest.
>
> Anyway, Cholesky-based SSVD should be enough (for now), I suppose. My PCA
> test exhibits strange behavior where SSVD finds rank deficiency at the 25th
> value even though I generate the input with 100 singular vectors and a
> spectrum of 100:1. I may have an error in the input generation part, but
> even if I do, I would not expect it to be that bad.
>
> https://github.com/apache/mahout/blob/trunk/math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MathSuite.scala
> line 176, test ("spca") is the in-core version of the test (the distributed
> test generated 100% identical input, with 100% identical results seen).
>
> On Mon, Mar 17, 2014 at 2:26 PM, Dmitriy Lyubimov (JIRA)
> <j...@apache.org> wrote:
>
>> [ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Dmitriy Lyubimov updated MAHOUT-1346:
>> -------------------------------------
>>
>> Attachment: ScalaSparkBindings.pdf
>>
>> Updating docs to reflect latest committed state.
>> Brought in distributed and in-core stochastic PCA scripts, colmeans,
>> colsums, drm-vector multiplication, more tests, etc. See the doc.
>>
>> > Spark Bindings (DRM)
>> > --------------------
>> >
>> > Key: MAHOUT-1346
>> > URL: https://issues.apache.org/jira/browse/MAHOUT-1346
>> > Project: Mahout
>> > Issue Type: Improvement
>> > Affects Versions: 0.9
>> > Reporter: Dmitriy Lyubimov
>> > Assignee: Dmitriy Lyubimov
>> > Fix For: 1.0
>> >
>> > Attachments: ScalaSparkBindings.pdf
>> >
>> > Spark bindings for Mahout DRM.
>> > DRM DSL.
>> > Disclaimer. This will all be experimental at this point.
>> > The idea is to wrap DRM by Spark RDD with support of some basic
>> > functionality, perhaps some humble beginning of Cost-based optimizer
>> > (0) Spark serialization support for Vector, Matrix
>> > (1) Bagel transposition
>> > (2) slim X'X
>> > (2a) not-so-slim X'X
>> > (3) blockify() (compose RDD containing vertical blocks of original
>> > input)
>> > (4) read/write Mahout DRM off HDFS
>> > (5) A'B
>> > ...
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)