[
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577181#action_12577181
]
Jason Rennie commented on MAHOUT-6:
-----------------------------------
David's right, I'm wrong---a primitive map implementation didn't help (with
respect to speed). I tested the pcj IntKeyDoubleOpenHashMap against
java.util.HashMap<Integer,Double>. I tried two load factors, 25% and 50%
(initial capacity was set to nnz/loadFactor, i.e. nnz * sizeMultiple in the
code below). The two maps were roughly comparable at 25%; java.util.HashMap
was faster at 50%. The CRS implementation was 2.3-3.6x faster than
java.util.HashMap (after subtracting the StopWatch "base" times).
The details:
/**
 * <ul>
 * <li> Finished HashMap dot product in 6.501 seconds
 * <li> Finished PrimitiveMap dot product in 6.589 seconds
 * <li> Finished CRS dot product in 2.632 seconds
 * <li> Finished SW base time in 1.162 seconds
 * <li> numTrials=1000000 vectorSize=1000 nnz1=100 nnz2=100 loadFactor=25%
 * </ul>
 * <ul>
 * <li> Finished HashMap dot product in 4.573 seconds
 * <li> Finished PrimitiveMap dot product in 4.033 seconds
 * <li> Finished CRS dot product in 2.290 seconds
 * <li> Finished SW base time in 1.103 seconds
 * <li> numTrials=1000000 vectorSize=1000 nnz1=200 nnz2=50 loadFactor=25%
 * </ul>
 * <ul>
 * <li> Finished HashMap dot product in 6.244 seconds
 * <li> Finished PrimitiveMap dot product in 7.417 seconds
 * <li> Finished CRS dot product in 2.575 seconds
 * <li> Finished SW base time in 1.054 seconds
 * <li> numTrials=1000000 vectorSize=1000 nnz1=100 nnz2=100 loadFactor=50%
 * </ul>
 * <ul>
 * <li> Finished HashMap dot product in 3.745 seconds
 * <li> Finished PrimitiveMap dot product in 4.249 seconds
 * <li> Finished CRS dot product in 2.270 seconds
 * <li> Finished SW base time in 1.164 seconds
 * <li> numTrials=1000000 vectorSize=1000 nnz1=200 nnz2=50 loadFactor=50%
 * </ul>
 */
public void testSparseVectorPerformance() throws Exception {
    StopWatch hmvSW = new StopWatch("HashMap dot product", log, false);
    StopWatch pmvSW = new StopWatch("PrimitiveMap dot product", log, false);
    StopWatch crsSW = new StopWatch("CRS dot product", log, false);
    StopWatch baseSW = new StopWatch("SW base time", log, false);
    final int numTrials = 1000000;
    final int vectorSize = 1000;
    final int nnz1 = 100;
    final int nnz2 = 100;
    // initial capacity = nnz * sizeMultiple, so loadFactor = 1/sizeMultiple
    final int sizeMultiple = 2;
    for (int i = 0; i < numTrials; ++i) {
        // ignore first 10% of iterations (warm-up)
        if (i == numTrials / 10) {
            hmvSW.reset();
            pmvSW.reset();
            crsSW.reset();
            baseSW.reset();
        }
        SparseVectorPrimitiveMap pmv1 = new SparseVectorPrimitiveMap(nnz1 * sizeMultiple);
        SparseVectorPrimitiveMap pmv2 = new SparseVectorPrimitiveMap(nnz2 * sizeMultiple);
        SparseVectorHashMap hmv1 = new SparseVectorHashMap(nnz1 * sizeMultiple);
        SparseVectorHashMap hmv2 = new SparseVectorHashMap(nnz2 * sizeMultiple);
        // fill both map-backed vectors with identical random entries
        for (int j = 0; j < nnz1; ++j) {
            int index = this.rand.nextInt(vectorSize) + 1;
            double value = this.rand.nextDouble();
            pmv1.set(index, value);
            hmv1.set(index, value);
        }
        for (int j = 0; j < nnz2; ++j) {
            int index = this.rand.nextInt(vectorSize) + 1;
            double value = this.rand.nextDouble();
            pmv2.set(index, value);
            hmv2.set(index, value);
        }
        SparseVector crsv1 = pmv1.buildSparseVector();
        SparseVector crsv2 = pmv2.buildSparseVector();
        hmvSW.start();
        hmv1.dot(hmv2);
        hmvSW.stop();
        pmvSW.start();
        pmv1.dot(pmv2);
        pmvSW.stop();
        crsSW.start();
        crsv1.dot(crsv2);
        crsSW.stop();
        // measure the overhead of an empty start/stop pair
        baseSW.start();
        baseSW.stop();
    }
    hmvSW.logEndMessage();
    pmvSW.logEndMessage();
    crsSW.logEndMessage();
    baseSW.logEndMessage();
    log.debug("numTrials=" + numTrials + " vectorSize=" + vectorSize
        + " nnz1=" + nnz1 + " nnz2=" + nnz2
        + " loadFactor=" + (100 / sizeMultiple) + "%");
}
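For reference, the CRS-style win over the hash maps most plausibly comes from a merge-style dot product over sorted parallel index/value arrays, with no hashing or Integer/Double boxing per entry. A minimal sketch of that technique follows; the class name and array layout here are my assumptions for illustration, not the API of the attached patches:

```java
public final class SparseDotSketch {
    /**
     * Dot product of two sparse vectors stored as parallel arrays of
     * sorted indices and their values (the layout a CRS row uses).
     * Runs in O(nnz1 + nnz2) by merging the two sorted index lists.
     */
    public static double dot(int[] idx1, double[] val1,
                             int[] idx2, double[] val2) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < idx1.length && j < idx2.length) {
            if (idx1[i] == idx2[j]) {
                // indices match: multiply the values and advance both
                sum += val1[i++] * val2[j++];
            } else if (idx1[i] < idx2[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }
}
```

Because both index arrays are already sorted, each element is visited at most once, which avoids the per-lookup hashing cost that dominates the map-backed implementations.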
> Need a matrix implementation
> ----------------------------
>
> Key: MAHOUT-6
> URL: https://issues.apache.org/jira/browse/MAHOUT-6
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Assignee: Grant Ingersoll
> Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff,
> MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff,
> MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff,
> MAHOUT-6l.patch
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different
> reducers
> g) a reasonable set of matrix operations should be supported; these should
>    eventually include:
>    - simple matrix-matrix, matrix-vector, and matrix-scalar linear algebra
>      operations: A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>    - row and column sums
>    - generalized level 2 and 3 BLAS primitives: alpha A B + beta C and
>      alpha A u + beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations
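To make requirement (g) concrete, the level-2 BLAS primitive alpha A u + beta v could look like the sketch below for a dense row-major matrix. The class and method names are placeholders, not anything from the attached patches:

```java
public final class DenseBlasSketch {
    /**
     * Level-2 BLAS style update from requirement (g):
     * v := alpha * A * u + beta * v, with A dense and row-major.
     * Overwrites v in place, as the BLAS gemv convention does.
     */
    public static void gemv(double alpha, double[][] a, double[] u,
                            double beta, double[] v) {
        for (int row = 0; row < a.length; ++row) {
            // accumulate the row-vector product A[row] . u
            double sum = 0.0;
            for (int col = 0; col < u.length; ++col) {
                sum += a[row][col] * u[col];
            }
            v[row] = alpha * sum + beta * v[row];
        }
    }
}
```

A sparse implementation would replace the inner loop with an iteration over the nonzero entries of the row, which is where requirement (h)'s efficient iteration constructs come in.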