[ https://issues.apache.org/jira/browse/MAHOUT-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086365#comment-13086365 ]
Ted Dunning commented on MAHOUT-771:
------------------------------------

I think so. It is a useful idea and I will include a variant in the SSVD refresh that I have coming up.

I think that murmurhash is overkill here and a simple prime number congruential approach applied to the indexes is likely just as good.

> Random Projection using sampled values
> --------------------------------------
>
>                 Key: MAHOUT-771
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-771
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Lance Norskog
>            Priority: Minor
>         Attachments: RandomProjector.patch, RandomProjectorBenchmark.java
>
>
> Random Projection implementation which follows two deterministic guarantees:
> # The same data projected multiple times produces the same output
> # Dense and sparse data with the same contents produce the same output
> Custom class that does Random Projection based on Johnson-Lindenstrauss. This
> implementation uses Achlioptas's results, which allow using methods other than
> a full-range random multiplier per sample:
> * use 1 random bit to add or subtract a sample to a row sum
> * use a six-way random choice to add (1 out of 6), subtract (1 out of 6), or
> ignore (4 out of 6) a sample in a row sum
> Custom implementations for both dense and sparse vectors are included. The
> sparse vector implementation assumes the active values will fit in memory.
> An implementation using full-range random multipliers made by
> java.util.Random is included for reference/research.
> *Database-friendly random projections: Johnson-Lindenstrauss with binary
> coins*
> _Dimitris Achlioptas_
> [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.4546&rep=rep1&type=pdf]
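
To make the two guarantees in the issue description concrete, here is a minimal Java sketch of one way the add/subtract/ignore choice could be derived from the (row, column) indexes with a simple prime-modulus congruential hash, so that no random number generator state is involved. This is not the code in RandomProjector.patch and not a Mahout API: the class name, method names, and all constants are assumptions chosen for illustration, and nothing is claimed about the statistical quality of this particular hash. The sqrt(3/k) scaling follows the sparse distribution in the Achlioptas paper.

```java
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch: names and constants are illustrative assumptions,
// not taken from RandomProjector.patch or from Mahout.
final class HashedProjectionSketch {
    private static final long MODULUS = 2147483647L; // 2^31 - 1, a Mersenne prime
    private static final long ROW_PRIME = 1000003L;  // arbitrary prime (assumption)
    private static final long COL_PRIME = 999983L;   // arbitrary prime (assumption)
    private static final long MULTIPLIER = 48271L;   // multiplier used by MINSTD

    // Maps an index pair to one of six buckets. Requires row >= 0 and col >= 0.
    static int bucket(int row, int col) {
        long h = (row * ROW_PRIME + col * COL_PRIME + 1L) % MODULUS;
        h = (h * MULTIPLIER) % MODULUS;
        return (int) (h % 6L);
    }

    // Bucket 0 adds the sample, bucket 1 subtracts it, buckets 2..5 ignore it.
    static int sign(int row, int col) {
        int b = bucket(row, col);
        return b == 0 ? 1 : (b == 1 ? -1 : 0);
    }

    // Projects a dense vector down to k dimensions.
    static double[] projectDense(double[] input, int k) {
        double scale = Math.sqrt(3.0 / k);
        double[] out = new double[k];
        for (int row = 0; row < k; row++) {
            double sum = 0.0;
            for (int col = 0; col < input.length; col++) {
                double v = input[col];
                if (v == 0.0) {
                    continue; // skip zeros so dense and sparse do identical arithmetic
                }
                int s = sign(row, col);
                if (s > 0) {
                    sum += v;
                } else if (s < 0) {
                    sum -= v;
                }
            }
            out[row] = scale * sum;
        }
        return out;
    }

    // Projects a sparse vector given as index -> value. The sorted map is
    // iterated in ascending index order, matching the dense summation order.
    static double[] projectSparse(SortedMap<Integer, Double> input, int k) {
        double scale = Math.sqrt(3.0 / k);
        double[] out = new double[k];
        for (int row = 0; row < k; row++) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> e : input.entrySet()) {
                double v = e.getValue();
                if (v == 0.0) {
                    continue;
                }
                int s = sign(row, e.getKey());
                if (s > 0) {
                    sum += v;
                } else if (s < 0) {
                    sum -= v;
                }
            }
            out[row] = scale * sum;
        }
        return out;
    }
}
```

Because each weight depends only on the index pair, calling projectDense twice on the same array returns identical output, and projectSparse on a sorted map holding the same nonzero entries performs the same additions in the same order, so the results should match exactly. Replacing the six-way bucket with a single bit of the hash would give the add-or-subtract variant from the first bullet, with a scale of sqrt(1/k) instead.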