[ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168754#comment-13168754 ]
Hudson commented on MAHOUT-922:
-------------------------------

Integrated in Mahout-Quality #1251 (See [https://builds.apache.org/job/Mahout-Quality/1251/])
    MAHOUT-922: optional p, AB' tweaks, faster tests, style overhaul.

dlyubimov : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1213842
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BBtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/DenseBlockWritable.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SparseRowBlockWritable.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UpperTriangular.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GivensThinSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GramSchmidt.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GrammSchmidt.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/QRFirstStep.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/QRLastStep.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java

> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5m),
> AB' performance is still a bottleneck if one uses power iterations. For
> sufficiently sparse inputs it may turn out that mappers cannot form the
> entire blocked product Y_i in memory. The Y_i block is of size
> s x (k+p), where s is the number of A rows read in a given split. When A
> is extra sparse, such blocks may actually take more space than the A
> input. When this happens, s is constrained by the -oh parameter, and
> combiners and reducers get flooded by partial oh x (k+p) outer products
> and seem to have a hard time sorting and shuffling them (especially high
> pressure on combiners has been seen).
> So, several improvements in this patch:
> -- present Y_i blocks as dense (they are believed to be dense anyway, so
> keeping them sparse just eats up RAM on sparse encoding; with dense
> storage, blocks at least twice as high can be formed);
> -- eliminate combining completely. Instead of persisting, sorting, and
> summing up partial products in the combiner, sum up map-side.
> If the block height is still insufficient and cannot be extended due to
> RAM constraints (unlikely for Sebastian's 4.5 x 4.5 mln case), just
> perform additional passes over B'. Since the computation is CPU bound,
> additional passes over B' should not register. However, elimination of
> the combiner phase for high load cases is probably going to have a
> dramatic effect.
> -- set max block height for Q'A and AB' separately instead of a single
> -oh option. Their scaling seems to be quite different in terms of OOM
> danger. In my experiments Q'A blocking enters the red zone at ~150,000
> already, whereas AB' block height can roam over a million easily for the
> same RAM. I provide 200,000 (~160MB for k+p=100) as a default for AB'
> blocks, which should be enough for Sebastian's 4.5 x 4.5 mln sparse case
> without causing more than one block.
> Miscellanea:
> -- Test run time: removed redundant tests and checks for SSVD; reduced
> test input size.
> -- Per Nathan's suggestion, the p parameter is now optional; the default
> is 15 (computation time is cubic in it, so I want to be careful not to
> set it too high by default).
> Current patch branch work is here:
> https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922
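
The ~160MB figure quoted for the default AB' block height can be checked with simple arithmetic. What follows is a minimal sketch in Java, assuming the dense Y_i block stores 8-byte doubles and ignoring JVM object overhead. The class and method names are hypothetical and not part of Mahout, and the split size used for the pass count is a made-up illustration, not a measured value.

```java
// Hypothetical helper, not part of Mahout: back-of-the-envelope sizing
// for dense s x (k+p) blocks as described in the issue.
final class DenseBlockEstimate {
  private DenseBlockEstimate() {}

  /** Bytes for a dense blockHeight x kp block of doubles (no JVM overhead). */
  static long denseBlockBytes(long blockHeight, int kp) {
    return blockHeight * kp * (long) Double.BYTES;
  }

  /**
   * Passes over B' needed when a split contributes rowsInSplit rows of A
   * but a block may hold at most maxBlockHeight rows (ceiling division).
   */
  static long passesOverBt(long rowsInSplit, long maxBlockHeight) {
    return (rowsInSplit + maxBlockHeight - 1) / maxBlockHeight;
  }

  public static void main(String[] args) {
    // Default AB' block height from the issue, with k+p = 100.
    long bytes = denseBlockBytes(200_000L, 100);
    System.out.println(bytes / 1_000_000 + " MB"); // 160 MB

    // Assumed split of 450,000 rows (illustrative only): three passes.
    System.out.println(passesOverBt(450_000L, 200_000L));
  }
}
```

Under these assumptions, a split that fits within one block height needs a single pass over B', which is the case the default of 200,000 is aiming for.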