[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-07-08 Thread Saleem Ansari (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702344#comment-13702344
 ] 

Saleem Ansari commented on MAHOUT-1193:
---

Hello Ted,

I have fixed the test cases. The central issue was that the class members 
"rows" and "columns" conflicted with (hid) the members of the same name in the 
parent class (AbstractMatrix).

That fixed all test cases except two:
 * testClone() -- this failed because of a missing clone() method
 * testViewColumnIndexOver() -- this was failing because BlockSparseMatrix has 
extensible rows
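The "rows"/"columns" conflict described above is Java field hiding: a subclass field with the same name as a superclass field hides it, so methods inherited from the superclass keep reading the superclass copy. A minimal sketch, assuming hypothetical stand-ins for AbstractMatrix and its subclass (BaseMatrix, ShadowingMatrix and FixedMatrix are illustrative only, not Mahout classes):

```java
// Hypothetical stand-in for AbstractMatrix; not Mahout's real class.
abstract class BaseMatrix {
  protected int rows;     // the superclass's notion of the row count
  protected int columns;

  BaseMatrix(int rows, int columns) {
    this.rows = rows;
    this.columns = columns;
  }

  // Inherited methods always read the superclass fields.
  int rowSize() {
    return rows;
  }
}

// BUG: redeclaring rows/columns hides BaseMatrix.rows/columns.
class ShadowingMatrix extends BaseMatrix {
  private int rows;     // separate field, starts at 0
  private int columns;  // separate field, never read by BaseMatrix

  ShadowingMatrix(int rows, int columns) {
    super(rows, columns);
  }

  void extendTo(int row) {
    rows = Math.max(rows, row + 1); // updates only the hidden subclass field
  }
}

// Fix: drop the duplicate declarations and use the inherited fields.
class FixedMatrix extends BaseMatrix {
  FixedMatrix(int rows, int columns) {
    super(rows, columns);
  }

  void extendTo(int row) {
    rows = Math.max(rows, row + 1); // updates the field rowSize() reads
  }
}
```

With `new ShadowingMatrix(3, 4)` followed by `extendTo(9)`, `rowSize()` still returns 3; the same calls on `FixedMatrix` return 10.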

I have added a clone() method and also fixed the remaining test cases in the 
BlockSparseMatrixTest class.
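For a matrix stored as a map of blocks, clone() generally has to deep-copy the blocks; a plain Object.clone() copies only the map reference, so the copy and the original would share storage. A sketch under that assumption, using a hypothetical BlockStore class that keeps blocks as double[][] arrays (not the clone() from the attached patch):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical block store: block index -> dense block of blockSize rows.
class BlockStore implements Cloneable {
  final int blockSize;
  final int columns;
  Map<Integer, double[][]> blocks = new HashMap<>(); // non-final so clone() can replace it

  BlockStore(int blockSize, int columns) {
    this.blockSize = blockSize;
    this.columns = columns;
  }

  void set(int row, int col, double value) {
    double[][] block = blocks.computeIfAbsent(row / blockSize,
        k -> new double[blockSize][columns]);
    block[row % blockSize][col] = value; // local row index inside the block
  }

  double get(int row, int col) {
    double[][] block = blocks.get(row / blockSize);
    return block == null ? 0.0 : block[row % blockSize][col];
  }

  @Override
  public BlockStore clone() {
    try {
      BlockStore copy = (BlockStore) super.clone(); // copies primitive fields
      copy.blocks = new HashMap<>();                // fresh map for the copy
      for (Map.Entry<Integer, double[][]> e : blocks.entrySet()) {
        double[][] src = e.getValue();
        double[][] dst = new double[src.length][];
        for (int i = 0; i < src.length; i++) {
          dst[i] = src[i].clone();                  // copy each row array
        }
        copy.blocks.put(e.getKey(), dst);
      }
      return copy;
    } catch (CloneNotSupportedException ex) {
      throw new AssertionError(ex); // cannot happen: we implement Cloneable
    }
  }
}
```

Writing to the clone after `clone()` leaves the original untouched, which is the property a testClone()-style check would look for.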

Now all tests are passing. Please have a look at the patch attached in my 
previous comment: [^MAHOUT-1193-all-tests-pass.patch]


Thanks,
Saleem


> We may want a BlockSparseMatrix
> ---
>
> Key: MAHOUT-1193
> URL: https://issues.apache.org/jira/browse/MAHOUT-1193
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
> Fix For: Backlog
>
> Attachments: MAHOUT-1193-all-tests-pass.patch, 
> MAHOUT-1193-fix-compile-errors-tests-still-fail.patch, MAHOUT-1193.patch
>
>
> Here is an implementation.
> Is it good enough to commit?
> Is it useful?
> Is it redundant?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-07-05 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701127#comment-13701127
 ] 

Ted Dunning commented on MAHOUT-1193:
-

Saleem,

Thanks for your work.

These errors indicate that the new matrix is pretty much not working as 
expected.  The desired behavior is that the matrix should emulate the operation 
of the normal matrices like Dense or SparseVector within reasonably broad 
limits.

I won't have time to look at this right away, but I suspect a fairly central 
problem is causing all of these issues.


> We may want a BlockSparseMatrix
> ---
>
> Key: MAHOUT-1193
> URL: https://issues.apache.org/jira/browse/MAHOUT-1193
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
> Fix For: Backlog
>
> Attachments: MAHOUT-1193-fix-compile-errors-tests-still-fail.patch, 
> MAHOUT-1193.patch
>
>
> Here is an implementation.
> Is it good enough to commit?
> Is it useful?
> Is it redundant?


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-07-05 Thread Saleem Ansari (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700957#comment-13700957
 ] 

Saleem Ansari commented on MAHOUT-1193:
---

I have attached a patch that fixes the compile errors described in my previous comment.


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-07-05 Thread Saleem Ansari (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700956#comment-13700956
 ] 

Saleem Ansari commented on MAHOUT-1193:
---

Hi,

I was trying to understand how this Block Sparse Matrix is supposed to work. To 
begin with, I have only tried to fix compile errors with respect to the current 
trunk codebase.

BlockSparseMatrix.java

 * Resolved compilation errors against trunk.
 * Added and implemented the previously missing methods: mergeUpdates(), 
getLookupCost(), getIteratorAdvanceCost(), isAddConstantTime()
 * Changed getColumn() -> viewColumn()
 * Changed getRow() -> viewRow()

BlockSparseMatrixTest.java

 * Used viewRow() instead of getRow(). The member 'test' is still private.
 * Commented out the private member.

However, many of the tests failed. I have put the test errors in a pastebin:

 * Test Errors: http://pastebin.com/0Za4AF3q

Is there any reference document or paper on which this implementation 
was based?

Thanks,
Saleem


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-04-26 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642752#comment-13642752
 ] 

Gokhan Capan commented on MAHOUT-1193:
--

Sorry I missed that.

I modified the SparseMatrix code to handle dense rows and I am happy with that. 
The code is not patch-quality, but I can implement a flexible extension to the 
current implementation if that is desired (I believe that might be a common use 
case).
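The modified SparseMatrix code is not attached here. A minimal sketch of the "sparse over rows, dense within a row" idea, assuming a row-index to double[] map, dense rows allocated on first write, and a row count fixed at construction (the rowSize bound Gokhan mentions in another comment). DenseRowSparseMatrix is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sparse over rows, dense within each stored row.
class DenseRowSparseMatrix {
  private final int rowSize;     // fixed in advance: row indices must be < rowSize
  private final int columnSize;
  private final Map<Integer, double[]> rowData = new HashMap<>();

  DenseRowSparseMatrix(int rowSize, int columnSize) {
    this.rowSize = rowSize;
    this.columnSize = columnSize;
  }

  double get(int row, int column) {
    checkIndex(row, column);
    double[] r = rowData.get(row);
    return r == null ? 0.0 : r[column]; // rows never written read as zeros
  }

  void set(int row, int column, double value) {
    checkIndex(row, column);
    // Allocate the whole dense row the first time any cell in it is written.
    rowData.computeIfAbsent(row, k -> new double[columnSize])[column] = value;
  }

  int materializedRows() {
    return rowData.size();
  }

  private void checkIndex(int row, int column) {
    if (row < 0 || row >= rowSize || column < 0 || column >= columnSize) {
      throw new IndexOutOfBoundsException("(" + row + ", " + column + ")");
    }
  }
}
```

Gets and sets on a stored row are plain array accesses after one map lookup, which is the kind of access pattern a dense-row variant is meant to speed up.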

I personally liked the BlockSparseMatrix idea and its really flexible schema. I 
did a quick implementation that works with a configurable block size; in a 
few days I can submit an additional diff to Review Board so we can discuss 
the code. One thing to consider: I suspect my version's CPU usage is rather 
high.

I believe both versions are valuable and important; each has its own 
benefits, particularly as input to online learning algorithms.

> We may want a BlockSparseMatrix
> ---
>
> Key: MAHOUT-1193
> URL: https://issues.apache.org/jira/browse/MAHOUT-1193
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
> Attachments: MAHOUT-1193.patch
>
>
> Here is an implementation.
> Is it good enough to commit?
> Is it useful?
> Is it redundant?


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-04-24 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640556#comment-13640556
 ] 

Ted Dunning commented on MAHOUT-1193:
-

Gokhan,

Is there a patch you meant to attach?

What do you think that Mahout should have?  Stay with SparseMatrix?  Perhaps 
add a flexible schema capability to SparseMatrix?


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-04-24 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640268#comment-13640268
 ] 

Gokhan Capan commented on MAHOUT-1193:
--

Ok, here are the updates:

I modified the code a little (made it run and changed it as I had described 
previously), and did some tests within the real application that I mentioned on 
the user list.

Performance of gets and sets (bigger is better):
DenseMatrix > SparseMatrix (with dense rows) > BlockSparseMatrix > 
SparseMatrix (with sparse rows) > SparseColumnMatrix


Performance difference between SparseMatrix with dense rows and 
BlockSparseMatrix is small.

One drawback of SparseMatrix might be that you need to specify the rowSize in 
advance (which means you need to set an upper bound on your row indices). This 
wasn't a problem for me, but it's worth mentioning. With this version of 
BlockSparseMatrix, there might also be a memory overhead depending on 
blockSize.
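A back-of-the-envelope way to see the blockSize-dependent overhead, assuming every touched block is materialized as a full dense block of blockSize rows (as in the extendToThisRow proposal in this thread): allocated rows = blockSize x (number of distinct blocks touched). BlockOverhead is a hypothetical helper that only counts rows; multiplying by the column count and 8 bytes per double gives a rough byte figure.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts dense rows allocated when each touched block
// is stored as a full blockSize x columns dense block.
final class BlockOverhead {
  private BlockOverhead() {}

  static long allocatedRows(int[] touchedRows, int blockSize) {
    Set<Integer> blocks = new HashSet<>();
    for (int row : touchedRows) {
      blocks.add(row / blockSize); // block index, as in extendToThisRow
    }
    return (long) blocks.size() * blockSize;
  }

  // Allocated rows that hold no real data.
  static long wastedRows(int[] touchedRows, int blockSize) {
    Set<Integer> distinct = new HashSet<>();
    for (int row : touchedRows) {
      distinct.add(row);
    }
    return allocatedRows(touchedRows, blockSize) - distinct.size();
  }
}
```

For touched rows {0, 1, 7, 40} and blockSize 5, three blocks (0, 1 and 8) are allocated, i.e. 15 rows holding 4 real rows, so 11 rows are overhead; with blockSize 1 the overhead is zero.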

I decided to go with SparseMatrix with dense rows for now, but I am also working 
on the BlockSparseMatrix code (because of its flexible schema).


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-04-18 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635838#comment-13635838
 ] 

Ted Dunning commented on MAHOUT-1193:
-

The last time that this code compiled was well over a year ago.  Don't imagine 
that it is correct or even all that sensible.



[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-04-18 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635670#comment-13635670
 ] 

Gokhan Capan commented on MAHOUT-1193:
--

Is it just me, or does it fail to compile because it has no constructor matching 
a superclass constructor and cardinality is not declared?

What I understand from the implementation is that we create a Map, each Entry of which represents a block and the associated DenseMatrix.

Unless I have totally misunderstood the implementation, if blockSize is always 
1, this associates a separate matrix with each row.

Say I want to sacrifice some memory and set blockSize to 5. Then if n rows 
actually exist in the block [blockSize * (row / blockSize), blockSize * (row / blockSize) + blockSize), 
the other 5 - n rows are empty, and I am OK with that. Shouldn't we modify the 
extendToThisRow method as follows:

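int blockIndex = row / blockSize;           // which block holds this row
Matrix block = data.get(blockIndex);
if (block == null) {
  // first touch of this block: allocate all blockSize rows at once
  data.put(blockIndex, new DenseMatrix(blockSize, columns));
} else if (!block.hasRow(row)) {            // hasRow() is assumed here: a per-row presence check
  block.assignRow(row % blockSize, new DenseVector(columns));  // local row index within the block
}
rows = Math.max(row + 1, rows);
cardinality[ROW] = rows;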
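A self-contained sketch of the same extendToThisRow logic, assuming plain double[][] blocks in place of Mahout's DenseMatrix (so it runs without Mahout) and non-negative row indices. BlockSparseSketch and its methods are hypothetical names, not the code in the attached patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical block-sparse storage with a configurable block size.
// Each block holds blockSize consecutive rows as double[blockSize][columns].
class BlockSparseSketch {
  private final int blockSize;
  private final int columns;
  private final Map<Integer, double[][]> data = new HashMap<>();
  private int rows; // extensible row count: grows as rows are touched

  BlockSparseSketch(int blockSize, int columns) {
    if (blockSize < 1) {
      throw new IllegalArgumentException("blockSize must be >= 1");
    }
    this.blockSize = blockSize;
    this.columns = columns;
  }

  // Counterpart of extendToThisRow: ensure the block holding `row` exists.
  private double[][] extendToThisRow(int row) {
    int blockIndex = row / blockSize;
    double[][] block = data.get(blockIndex);
    if (block == null) {
      // Allocating the block allocates all of its rows, so dense blocks
      // need no separate per-row presence check.
      block = new double[blockSize][columns];
      data.put(blockIndex, block);
    }
    rows = Math.max(row + 1, rows);
    return block;
  }

  void set(int row, int column, double value) {
    extendToThisRow(row)[row % blockSize][column] = value; // local row index
  }

  double get(int row, int column) {
    double[][] block = data.get(row / blockSize);
    return block == null ? 0.0 : block[row % blockSize][column];
  }

  int rowCount() {
    return rows;
  }

  int blockCount() {
    return data.size();
  }
}
```

With blockSize 5, setting rows 12 and 14 allocates a single block (block 2) and reports 15 rows; setting row 0 afterwards adds block 0 without changing the row count.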


[jira] [Commented] (MAHOUT-1193) We may want a BlockSparseMatrix

2013-04-18 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635447#comment-13635447
 ] 

Dmitriy Lyubimov commented on MAHOUT-1193:
--

I think this is very useful. 

I could use something like that for streaming QR solvers. Typically there is a 
block of fixed height and width (usually dense or almost dense) that acts like a 
queue of rows (i.e. one can push one row in from below and remove one row from 
above).

The most efficient scheme I used, which avoids object creation in that scenario, 
reuses the memory that backed the pushed-out row for the new row pushed in from 
below. But I just used a plain Java 2D array; I am sure there could be something 
more clever and more general.
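The row-queue pattern described above can be sketched as a circular buffer over a preallocated 2D array: pushing a row into a full window overwrites the slot of the evicted top row, so steady-state pushes allocate nothing. This is an illustration under that assumption, not Dmitriy's solver code; RowWindow is a hypothetical name, and pushed rows are assumed to have at least `width` entries.

```java
// Hypothetical fixed-height window of rows that behaves like a queue.
// Storage is allocated once; evicted rows' arrays are reused for new rows.
class RowWindow {
  private final double[][] buffer; // height x width, allocated once
  private int top;                 // physical index of the oldest row
  private int size;                // number of rows currently held

  RowWindow(int height, int width) {
    buffer = new double[height][width];
  }

  // Copies `values` into the bottom of the window; evicts the top row if full.
  void push(double[] values) {
    int height = buffer.length;
    int slot;
    if (size < height) {
      slot = (top + size) % height; // next free slot below the last row
      size++;
    } else {
      slot = top;                   // reuse the evicted top row's memory
      top = (top + 1) % height;     // the next-oldest row becomes the top
    }
    System.arraycopy(values, 0, buffer[slot], 0, buffer[slot].length);
  }

  // Logical row i, where 0 is the oldest row still in the window.
  double get(int i, int j) {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException("row " + i);
    }
    return buffer[(top + i) % buffer.length][j];
  }

  int size() {
    return size;
  }
}
```

After pushing rows {k, 10k} for k = 0..4 into a window of height 3, the window holds rows 2, 3 and 4, and only the original 3 x 2 array has been allocated for row storage.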
