The exponential law for the simulated singular values is probably too
aggressive. Also, the Q normalizations are not needed. I need to poke at the
data simulation there a bit more.
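
A rough way to check whether a decay law is "too aggressive" is to count how many simulated singular values survive once they are squared, since a Cholesky-based QR works on the Gram matrix, whose eigenvalues are the squared singular values. The sketch below is only an illustration under stated assumptions: the law sigma_i = exp(-decay * i), the function names, and the machine-epsilon cutoff are all hypothetical and are not taken from the Mahout test.

```python
import math
import sys


def exponential_spectrum(k, decay):
    """Return k simulated singular values sigma_i = exp(-decay * i).

    The exponential form is an assumption made for illustration only.
    """
    return [math.exp(-decay * i) for i in range(k)]


def numerical_rank_of_gram(sigmas, eps=sys.float_info.epsilon):
    """Count the squared singular values that stay above eps * largest.

    The Gram matrix has eigenvalues sigma_i ** 2, so its condition
    number is the square of the original one. Values below the cutoff
    are treated here as numerically zero (a hypothetical criterion).
    """
    top = max(sigmas) ** 2
    return sum(1 for s in sigmas if s * s > eps * top)


if __name__ == "__main__":
    # Aggressive law: each value is e times smaller than the previous one.
    aggressive = exponential_spectrum(100, 1.0)
    # Gentle law: geometric decay chosen so that first:last is 100:1.
    gentle = exponential_spectrum(100, math.log(100) / 99)
    print("aggressive:", numerical_rank_of_gram(aggressive))
    print("gentle:", numerical_rank_of_gram(gentle))
```

Under these assumptions, a steep exponent leaves far fewer than 100 usable values, while a geometric decay spanning 100:1 keeps all of them. Whether this matches what the test actually does depends on the real input generator.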


On Mon, Mar 17, 2014 at 3:26 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Hm. yeah. i can do the version of distributed QR used in MR SSVD and
> subsequently defined by Nathan Halko in his dissertation. That version
> seemed to be incredibly numerically stable.
>
> But i guess this is too much for a work not aligned with my current
> interest.
>
> Anyway, Cholesky-based SSVD should be enough (for now), i suppose. My PCA
> test exhibits strange behavior: SSVD finds rank deficiency at the 25th
> singular value, although i generate the input with 100 singular vectors and
> a 100:1 spectrum. I may have an error in the input generation part, but even
> if i do, i would not expect it to be that bad.
>
>
> https://github.com/apache/mahout/blob/trunk/math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MathSuite.scala
> Line 176, test("spca"), is the in-core version of the test (the distributed
> test generated 100% identical input, with 100% identical results seen).
>
>
> On Mon, Mar 17, 2014 at 2:26 PM, Dmitriy Lyubimov (JIRA) <j...@apache.org> wrote:
>
>>
>>      [
>> https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>
>> Dmitriy Lyubimov updated MAHOUT-1346:
>> -------------------------------------
>>
>>     Attachment: ScalaSparkBindings.pdf
>>
>> Updating docs to reflect the latest committed state.
>> Brought in distributed and in-core stochastic PCA scripts, colmeans,
>> colsums, drm-vector multiplication, more tests, etc. See the doc.
>>
>> > Spark Bindings (DRM)
>> > --------------------
>> >
>> >                 Key: MAHOUT-1346
>> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1346
>> >             Project: Mahout
>> >          Issue Type: Improvement
>> >    Affects Versions: 0.9
>> >            Reporter: Dmitriy Lyubimov
>> >            Assignee: Dmitriy Lyubimov
>> >             Fix For: 1.0
>> >
>> >         Attachments: ScalaSparkBindings.pdf
>> >
>> >
>> > Spark bindings for Mahout DRM.
>> > DRM DSL.
>> > Disclaimer. This will all be experimental at this point.
>> > The idea is to wrap DRM by Spark RDD with support of some basic functionality, perhaps some humble beginning of Cost-based optimizer
>> > (0) Spark serialization support for Vector, Matrix
>> > (1) Bagel transposition
>> > (2) slim X'X
>> > (2a) not-so-slim X'X
>> > (3) blockify() (compose RDD containing vertical blocks of original input)
>> > (4) read/write Mahout DRM off HDFS
>> > (5) A'B
>> > ...
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)
>>
>
>
