[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702344#comment-13702344 ]

Saleem Ansari commented on MAHOUT-1193:
---

Hello Ted,

I have fixed the test cases. The central problem was that the class members "rows" and "columns" conflicted with the members of the parent class (AbstractMatrix). Resolving that fixed all test cases except two:

* testClone() -- this failed because the clone() method was missing
* testViewColumnIndexOver() -- this failed because BlockSparseMatrix has extensible rows

I have added a clone() method and fixed the remaining test cases in the BlockSparseMatrixTest class. All tests now pass.

Please have a look at the patch attached in the previous comment: [^MAHOUT-1193-all-tests-pass.patch]

Thanks,
Saleem

> We may want a BlockSparseMatrix
> ---
>
> Key: MAHOUT-1193
> URL: https://issues.apache.org/jira/browse/MAHOUT-1193
> Project: Mahout
> Issue Type: Bug
> Reporter: Ted Dunning
> Fix For: Backlog
>
> Attachments: MAHOUT-1193-all-tests-pass.patch, MAHOUT-1193-fix-compile-errors-tests-still-fail.patch, MAHOUT-1193.patch
>
>
> Here is an implementation.
> Is it good enough to commit?
> Is it useful?
> Is it redundant?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
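
One way a conflict between subclass members and parent-class members such as "rows" and "columns" can arise in Java is field hiding: a subclass re-declares a field that the parent already has, so the parent's accessors keep reading a copy that is never updated. Whether this is exactly what happened in the patch is an assumption. The sketch below uses hypothetical stand-in classes (BaseMatrix, HidingMatrix, FixedMatrix), not the real Mahout AbstractMatrix or BlockSparseMatrix.

```java
// Hypothetical stand-ins for illustration only; not the Mahout classes.
abstract class BaseMatrix {
    protected int rows;
    protected int columns;

    BaseMatrix(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
    }

    // Inherited accessors always read the fields declared in BaseMatrix.
    int rowSize() { return rows; }
    int columnSize() { return columns; }
}

class HidingMatrix extends BaseMatrix {
    // Re-declaring the field hides BaseMatrix.rows inside this class.
    private int rows;

    HidingMatrix(int columns) { super(0, columns); }

    void extendToRow(int row) {
        // Updates only the subclass copy; BaseMatrix.rows is untouched.
        rows = Math.max(row + 1, rows);
    }
}

class FixedMatrix extends BaseMatrix {
    FixedMatrix(int columns) { super(0, columns); }

    void extendToRow(int row) {
        // No re-declaration, so this assigns the inherited field.
        rows = Math.max(row + 1, rows);
    }
}

class FieldHidingDemo {
    public static void main(String[] args) {
        HidingMatrix hiding = new HidingMatrix(3);
        hiding.extendToRow(4);
        System.out.println("hiding.rowSize() = " + hiding.rowSize());

        FixedMatrix fixed = new FixedMatrix(3);
        fixed.extendToRow(4);
        System.out.println("fixed.rowSize() = " + fixed.rowSize());
    }
}
```

In the hiding variant the matrix grows internally while rowSize() still reports the parent's stale value, which is the kind of mismatch that can make many unrelated-looking tests fail at once.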
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701127#comment-13701127 ]

Ted Dunning commented on MAHOUT-1193:
-

Saleem,

Thanks for your work.

These errors indicate that the new matrix is pretty much not working as expected. The desired behavior is that the matrix should emulate the operation of the normal matrices like Dense or SparseVector within reasonably broad limits.

I won't have time to look at this right away, but I suspect a fairly central problem is causing all of these issues.
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700957#comment-13700957 ]

Saleem Ansari commented on MAHOUT-1193:
---

I have attached a patch which fixes the errors as mentioned above.
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700956#comment-13700956 ]

Saleem Ansari commented on MAHOUT-1193:
---

Hi,

I was trying to understand how this Block Sparse Matrix is supposed to work. To begin with, I have only tried to fix the compile errors with respect to the current trunk codebase.

BlockSparseMatrix.java
* Resolved compilation errors against trunk.
* Added unimplemented methods: mergeUpdates(), getLookupCost(), getIteratorAdvanceCost(), isAddConstantTime()
* Implemented methods: mergeUpdates(), getLookupCost(), getIteratorAdvanceCost(), isAddConstantTime()
* Changed getColumn() -> viewColumn()
* Changed getRow() -> viewRow()

BlockSparseMatrixTest.java
* Use viewRow instead of getRow. The member 'test' is still private
* Comment out private member

However, many of the tests failed. I have put the test errors in a pastebin:
* Test Errors: http://pastebin.com/0Za4AF3q

Is there any reference document or paper on which this implementation was based?

Thanks,
Saleem
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642752#comment-13642752 ]

Gokhan Capan commented on MAHOUT-1193:
--

Sorry I missed that.

I modified the SparseMatrix code to handle dense rows and I am happy with that. The code is not patch-quality, but I can implement a flexible extension to the current implementation if that is desired (I believe that might be a common use case).

I personally liked the BlockSparseMatrix idea and its very flexible schema. I did a quick implementation to make it work with a configurable block size; in a few days I can submit an additional diff to the reviewboard so we can discuss the code. One thing to consider: I suspect my version's CPU usage is rather high.

I believe both versions are valuable and important. Each has its own benefits, particularly as an input to online learning algorithms.
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640556#comment-13640556 ]

Ted Dunning commented on MAHOUT-1193:
-

Gokhan,

Is there a patch you meant to attach?

What do you think that Mahout should have? Stay with SparseMatrix? Perhaps add a flexible schema capability to SparseMatrix?
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640268#comment-13640268 ]

Gokhan Capan commented on MAHOUT-1193:
--

Ok, here are the updates:

I modified the code a little (made it run and changed it as I had commented previously), and did some tests within the real application that I mentioned on the user list.

Performance of gets and sets, fastest first:
DenseMatrix > SparseMatrix (with dense rows) > BlockSparseMatrix > SparseMatrix (with sparse rows) > SparseColumnMatrix

The performance difference between SparseMatrix with dense rows and BlockSparseMatrix is small.

One drawback of SparseMatrix might be that you need to specify the rowSize in advance (which means you need to set a bound on your row indices). This wasn't a problem for me, but it's worth mentioning.

With this version of BlockSparseMatrix, there might also be a memory overhead depending on blockSize.

I decided to go with SparseMatrix with dense rows for now, but I am also working on the BlockSparseMatrix code (because of its flexible schema).
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635838#comment-13635838 ]

Ted Dunning commented on MAHOUT-1193:
-

The last time that this code compiled was well over a year ago. Don't imagine that it is correct or even all that sensible.
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635670#comment-13635670 ]

Gokhan Capan commented on MAHOUT-1193:
--

Is it just me, or does it fail to compile because it has no constructor matching the superclass and cardinality is not declared?

What I understand from the implementation is that we create a Map, each Entry of which represents a block and the associated DenseMatrix. If I didn't totally misunderstand the implementation, if the blockSize is always 1, this associates a matrix with each row.

Say I want to sacrifice some memory and set blockSize to 5, so if there were n actual rows in [row/blockSize, row/blockSize+5), there would be 5-n empty ones, and I am OK with that.

Shouldn't we modify the extendToThisRow method such that:

    int blockIndex = row / blockSize;
    Matrix block = data.get(blockIndex);
    if (block == null) {
      data.put(blockIndex, new DenseMatrix(blockSize, columns));
    } else if (!block.hasRow(row)) {
      // row % blockSize is the offset of the row within its block
      block.assignRow(row % blockSize, new DenseVector(columns));
    }
    rows = Math.max(row + 1, rows);
    cardinality[ROW] = rows;
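
The block layout described in this comment (a Map from block index to a dense block of blockSize rows, with the row count growing as higher rows are touched) can be sketched without any Mahout dependency. This is a simplified illustration, not the attached patch: the class name BlockSparseSketch and its methods are hypothetical, and plain double[][] arrays stand in for DenseMatrix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch; double[][] stands in for DenseMatrix.
class BlockSparseSketch {
    private final int blockSize;
    private final int columns;
    private int rows; // grows as higher row indices are written
    private final Map<Integer, double[][]> blocks = new HashMap<>();

    BlockSparseSketch(int blockSize, int columns) {
        if (blockSize <= 0 || columns <= 0) {
            throw new IllegalArgumentException("blockSize and columns must be positive");
        }
        this.blockSize = blockSize;
        this.columns = columns;
    }

    void set(int row, int column, double value) {
        checkIndex(row, column);
        int blockIndex = row / blockSize;
        // Allocate the whole dense block on first touch.
        double[][] block =
            blocks.computeIfAbsent(blockIndex, k -> new double[blockSize][columns]);
        block[row % blockSize][column] = value; // offset within the block
        rows = Math.max(row + 1, rows);
    }

    double get(int row, int column) {
        checkIndex(row, column);
        double[][] block = blocks.get(row / blockSize);
        // Rows in blocks that were never allocated read as zero.
        return block == null ? 0.0 : block[row % blockSize][column];
    }

    int rowSize() { return rows; }

    int allocatedBlocks() { return blocks.size(); }

    private void checkIndex(int row, int column) {
        if (row < 0 || column < 0 || column >= columns) {
            throw new IndexOutOfBoundsException("row " + row + ", column " + column);
        }
    }
}
```

With blockSize set to 5, writing to row 12 allocates only the block covering rows 10 to 14; the untouched rows in that block are the memory overhead accepted in exchange for dense access within the block.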
[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix
[ https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635447#comment-13635447 ]

Dmitriy Lyubimov commented on MAHOUT-1193:
--

I think this is very useful. I could use something like that for streaming QR solvers.

Typically what happens is that there's a block of fixed height and width (usually dense or almost dense), but it acts queue-like for rows (i.e. one can push one row from below and remove one row from above). The most efficient scheme I have used avoids object creation in that scenario: the memory that backs the pushed-out row is reused for the new row pushed from below. But I used just a Java 2D array; I am sure there could be something more clever and more general.
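
The queue-like block described in this comment can be sketched as a ring buffer over a Java 2D array: when the block is full, pushing a row from below evicts the top row and copies the new values into the evicted row's existing array, so no new objects are created. This is only one possible reading of the scheme, not the commenter's actual code; the class name RowWindow and its methods are hypothetical.

```java
// Hypothetical sketch of a fixed-size, queue-like block of rows.
class RowWindow {
    private final double[][] buffer; // allocated once, never reallocated
    private final int width;
    private int head; // physical index of the oldest (top) row
    private int size; // number of rows currently held

    RowWindow(int height, int width) {
        if (height <= 0 || width <= 0) {
            throw new IllegalArgumentException("height and width must be positive");
        }
        this.buffer = new double[height][width];
        this.width = width;
    }

    // Pushes a row at the bottom. When full, the top row is dropped and
    // its backing array is reused for the new row.
    void push(double[] row) {
        if (row.length != width) {
            throw new IllegalArgumentException("expected " + width + " values");
        }
        int slot;
        if (size == buffer.length) {
            slot = head;                       // reuse the evicted row's memory
            head = (head + 1) % buffer.length; // next-oldest row becomes the top
        } else {
            slot = (head + size) % buffer.length;
            size++;
        }
        System.arraycopy(row, 0, buffer[slot], 0, width);
    }

    // Logical row 0 is the oldest row still in the window.
    double get(int row, int column) {
        if (row < 0 || row >= size || column < 0 || column >= width) {
            throw new IndexOutOfBoundsException("row " + row + ", column " + column);
        }
        return buffer[(head + row) % buffer.length][column];
    }

    int rowSize() { return size; }
}
```

Because push copies into a preallocated array, the caller can also reuse a single scratch array for incoming rows.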