[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631235#comment-13631235 ]

Robin Anil commented on MAHOUT-1190:
------------------------------------

A few more good updates:
1) I moved the iterator logic of RASV (RandomAccessSparseVector) into
OpenIntDoubleHashMap. This removes the extra copy that was needed and iterates
directly over the hash map's arrays, which gives an extra bump of 5-10% for
RASV on the benchmarks. Overall, the benchmarks have improved 30-55% for RASV.
2) The dot product for SASV (SequentialAccessSparseVector) was inefficiently
implemented as well. I also noticed that the dot product has some magic
constants. We need to remove those, because tests on one machine are not an
indicator of overall performance. Rewriting the dot product for SASV pulled it
ahead of RASV by 40% (even with all the optimizations listed above).
3) The benchmark code was running at high density. After increasing the
cardinality to 100K and keeping the doc length at 1000 (like a text corpus),
the benchmarks look more realistic.
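
To make points 1 and 2 concrete, here is a minimal sketch of the two access patterns. It is not the code from the patch: the class name, the method names, and the FREE/FULL slot markers are all hypothetical, and the storage layout (parallel arrays) is an assumption made for illustration only.

```java
/** Hypothetical sketch of the two access patterns; not Mahout's actual classes. */
final class SparseVectorSketch {

  // Assumed slot markers for an open-addressing hash table.
  static final byte FREE = 0;
  static final byte FULL = 1;

  /**
   * Dot product of two sparse vectors stored as parallel arrays whose
   * indices are strictly increasing. A single merge pass visits each
   * stored entry at most once and needs no lookups by index.
   */
  static double dotSequential(int[] idxA, double[] valA, int[] idxB, double[] valB) {
    double sum = 0.0;
    int i = 0;
    int j = 0;
    while (i < idxA.length && j < idxB.length) {
      if (idxA[i] == idxB[j]) {
        sum += valA[i] * valB[j];
        i++;
        j++;
      } else if (idxA[i] < idxB[j]) {
        i++;
      } else {
        j++;
      }
    }
    return sum;
  }

  /**
   * Dot product of a hash-backed sparse vector with a dense array. The
   * loop scans the table's backing arrays directly instead of first
   * copying the keys and values into temporary lists.
   */
  static double dotHashedWithDense(int[] keys, double[] values, byte[] state, double[] dense) {
    double sum = 0.0;
    for (int slot = 0; slot < state.length; slot++) {
      if (state[slot] == FULL) {
        sum += values[slot] * dense[keys[slot]];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Indices 4 and 7 overlap: 3 * 10 + 5 * 1.
    System.out.println(dotSequential(
        new int[] {1, 4, 7}, new double[] {2.0, 3.0, 5.0},
        new int[] {4, 7, 9}, new double[] {10.0, 1.0, 4.0}));

    // Two occupied slots holding (7 -> 5.0) and (1 -> 2.0).
    double[] dense = {0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0};
    System.out.println(dotHashedWithDense(
        new int[] {0, 7, 0, 1}, new double[] {0.0, 5.0, 0.0, 2.0},
        new byte[] {FREE, FULL, FREE, FULL}, dense));
  }
}
```

The merge pass costs time proportional to the number of stored entries in both vectors, independent of the cardinality, which is why it should hold up at 100K cardinality and 1K doc length.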

Benchmark running at 100K cardinality and 1K doc length.

{noformat}
Create (copy)
Vectors           nCalls  sum         min      max       mean        stdDev      Speed           Rate
DenseVector       20000   2.898515s   0.083ms  43.645ms  0.144925ms  0.95928ms   6900.085 /sec   82.801025 MB/s
RandSparseVector  20000   1.598739s   0.059ms  14.333ms  0.079936ms  0.173512ms  12509.859 /sec  150.11832 MB/s
SeqSparseVector   20000   0.744008s   0.027ms  21.283ms  0.0372ms    0.150829ms  26881.432 /sec  322.57718 MB/s

DotProduct
Vectors           nCalls  sum         min      max       mean        stdDev      Speed           Rate
DenseVector       20000   2.077209s   0.084ms  5.707ms   0.10386ms   0.040833ms  9628.304 /sec   115.53966 MB/s
RandSparseVector  20000   1.117693s   0.038ms  5.051ms   0.055884ms  0.055292ms  17894.0 /sec    214.72801 MB/s
SeqSparseVector   20000   0.712275s   0.016ms  0.984ms   0.035613ms  0.034072ms  28079.041 /sec  336.94852 MB/s
Dense.fn(Rand)    20000   1.932336s   0.072ms  0.419ms   0.096616ms  0.020768ms  10350.167 /sec  124.202 MB/s
Dense.fn(Seq)     20000   3.017146s   0.127ms  6.065ms   0.150857ms  0.061155ms  6628.781 /sec   79.54537 MB/s
Rand.fn(Dense)    20000   1.910794s   0.078ms  0.221ms   0.095539ms  0.005271ms  10466.853 /sec  125.60224 MB/s
Rand.fn(Seq)      20000   1.990205s   0.086ms  0.614ms   0.09951ms   0.0227ms    10049.216 /sec  120.59059 MB/s
Seq.fn(Dense)     20000   3.039884s   0.131ms  0.913ms   0.151994ms  0.037256ms  6579.198 /sec   78.950386 MB/s
Seq.fn(Rand)      20000   0.769796s   0.022ms  0.31ms    0.038489ms  0.024026ms  25980.91 /sec   311.77094 MB/s

org.apache.mahout.common.distance.CosineDistanceMeasure
Vectors           nCalls  sum         min      max       mean        stdDev      Speed           Rate
DenseVector       20000   20.332047s  0.927ms  4.847ms   1.016602ms  0.0485ms    983.6688 /sec   11.804027 MB/s
RandSparseVector  20000   10.604119s  0.47ms   3.641ms   0.530205ms  0.061561ms  1886.0596 /sec  22.632715 MB/s
SeqSparseVector   20000   9.226591s   0.395ms  5.204ms   0.461329ms  0.120737ms  2167.6477 /sec  26.011774 MB/s
Dense.fn(Rand)    20000   15.916052s  0.725ms  1.338ms   0.795802ms  0.032558ms  1256.593 /sec   15.079117 MB/s
Dense.fn(Seq)     20000   27.26601s   1.261ms  6.759ms   1.3633ms    0.085586ms  733.51404 /sec  8.802169 MB/s
Rand.fn(Dense)    20000   15.092329s  0.584ms  1.486ms   0.754616ms  0.08194ms   1325.1765 /sec  15.902119 MB/s
Rand.fn(Seq)      20000   7.213573s   0.327ms  0.677ms   0.360678ms  0.039636ms  2772.551 /sec   33.270615 MB/s
Seq.fn(Dense)     20000   29.085011s  1.341ms  2.713ms   1.45425ms   0.081356ms  687.6394 /sec   8.251673 MB/s
Seq.fn(Rand)      20000   19.878921s  0.919ms  53.207ms  0.993946ms  0.546942ms  1006.0908 /sec  12.073091 MB/s

org.apache.mahout.common.distance.EuclideanDistanceMeasure
Vectors           nCalls  sum         min      max       mean        stdDev      Speed           Rate
DenseVector       20000   20.754439s  0.932ms  2.279ms   1.037721ms  0.07615ms   963.6493 /sec   11.563792 MB/s
RandSparseVector  20000   11.289392s  0.51ms   4.234ms   0.564469ms  0.063422ms  1771.5746 /sec  21.258896 MB/s
SeqSparseVector   20000   9.224848s   0.419ms  14.486ms  0.461242ms  0.104271ms  2168.0574 /sec  26.01669 MB/s
Dense.fn(Rand)    20000   16.26417s   0.663ms  1.672ms   0.813208ms  0.062487ms  1229.6969 /sec  14.756363 MB/s
Dense.fn(Seq)     20000   27.594554s  1.266ms  2.89ms    1.379727ms  0.092051ms  724.7807 /sec   8.697369 MB/s
Rand.fn(Dense)    20000   15.53272s   0.571ms  1.745ms   0.776636ms  0.106946ms  1287.6045 /sec  15.451254 MB/s
Rand.fn(Seq)      20000   7.403939s   0.326ms  0.973ms   0.370196ms  0.056665ms  2701.265 /sec   32.41518 MB/s
Seq.fn(Dense)     20000   29.404334s  1.338ms  3.133ms   1.470216ms  0.093252ms  680.1718 /sec   8.162063 MB/s
Seq.fn(Rand)      20000   19.927117s  0.919ms  1.858ms   0.996355ms  0.064435ms  1003.6574 /sec  12.04389 MB/s
{noformat}
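
For anyone reading the numbers above, the derived statistics appear to follow from nCalls and sum. The sketch below reproduces the DenseVector DotProduct figures under that reading. The class and method names are hypothetical, and the 12,000 bytes per call is an assumption inferred from the Rate figures (it would match 1,000 entries of 12 bytes each); it is not taken from the benchmark source.

```java
/** Hypothetical helper relating the benchmark statistics to each other. */
final class BenchmarkStatsSketch {

  // Assumption: bytes touched per call, inferred from the Rate figures.
  static final int ASSUMED_BYTES_PER_CALL = 12_000;

  /** Mean time per call in milliseconds. */
  static double meanMs(int nCalls, double sumSeconds) {
    return sumSeconds / nCalls * 1000.0;
  }

  /** Throughput in calls per second. */
  static double callsPerSecond(int nCalls, double sumSeconds) {
    return nCalls / sumSeconds;
  }

  /** Data rate in MB/s, taking 1 MB as 10^6 bytes. */
  static double megabytesPerSecond(double callsPerSecond, int bytesPerCall) {
    return callsPerSecond * bytesPerCall / 1e6;
  }

  public static void main(String[] args) {
    // DenseVector DotProduct: nCalls = 20000, sum = 2.077209s.
    int nCalls = 20000;
    double sumSeconds = 2.077209;
    double speed = callsPerSecond(nCalls, sumSeconds);
    System.out.printf("mean  = %.5f ms%n", meanMs(nCalls, sumSeconds));
    System.out.printf("speed = %.3f /sec%n", speed);
    System.out.printf("rate  = %.3f MB/s%n",
        megabytesPerSecond(speed, ASSUMED_BYTES_PER_CALL));
  }
}
```

Under these assumptions the computed values agree with the reported mean (0.10386ms), Speed (9628.304 /sec) and Rate (115.53966 MB/s) to within rounding.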

                
> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1190
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1190
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Dan Filimon
>         Attachments: MAHOUT-1190-1.patch, MAHOUT-1190.patch
>
>
> Currently, calling .assign() on a SASV with another vector and a custom 
> function iterates through it and assigns every single entry while also 
> referring to it by index.
> This makes the process *hugely* expensive (on a run of BallKMeans on the 20 
> newsgroups data set, profiling reveals that 92% of the runtime was spent 
> assigning the vectors).
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
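
The cost described in the issue can be sketched as follows. This is not the prototype patch: the class below is hypothetical, and it assumes that the vector is stored as parallel arrays with strictly increasing indices and that the function satisfies f(0, 0) = 0, so positions that are zero in both vectors can be skipped.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sequential-access sparse vector; not Mahout's implementation. */
final class SequentialAssignSketch {
  final int[] indices;   // strictly increasing
  final double[] values; // values[k] belongs to indices[k]

  SequentialAssignSketch(int[] indices, double[] values) {
    this.indices = indices;
    this.values = values;
  }

  /** Lookup by index costs a binary search, so doing it per entry adds up. */
  double get(int index) {
    int pos = Arrays.binarySearch(indices, index);
    return pos >= 0 ? values[pos] : 0.0;
  }

  /** Applies f entry by entry in one merge pass; assumes f(0, 0) == 0. */
  SequentialAssignSketch assign(SequentialAssignSketch other, DoubleBinaryOperator f) {
    int[] outIdx = new int[indices.length + other.indices.length];
    double[] outVal = new double[outIdx.length];
    int i = 0;
    int j = 0;
    int n = 0;
    while (i < indices.length || j < other.indices.length) {
      int index;
      double x = 0.0;
      double y = 0.0;
      if (j >= other.indices.length
          || (i < indices.length && indices[i] < other.indices[j])) {
        index = indices[i];          // entry present only in this vector
        x = values[i++];
      } else if (i >= indices.length || other.indices[j] < indices[i]) {
        index = other.indices[j];    // entry present only in the other vector
        y = other.values[j++];
      } else {
        index = indices[i];          // entry present in both vectors
        x = values[i++];
        y = other.values[j++];
      }
      double result = f.applyAsDouble(x, y);
      if (result != 0.0) {           // keep the result sparse
        outIdx[n] = index;
        outVal[n] = result;
        n++;
      }
    }
    return new SequentialAssignSketch(Arrays.copyOf(outIdx, n), Arrays.copyOf(outVal, n));
  }

  public static void main(String[] args) {
    SequentialAssignSketch a =
        new SequentialAssignSketch(new int[] {1, 4}, new double[] {2.0, 3.0});
    SequentialAssignSketch b =
        new SequentialAssignSketch(new int[] {4, 9}, new double[] {10.0, 4.0});
    SequentialAssignSketch sum = a.assign(b, Double::sum);
    System.out.println(Arrays.toString(sum.indices)); // [1, 4, 9]
    System.out.println(Arrays.toString(sum.values));  // [2.0, 13.0, 4.0]
  }
}
```

A function with f(0, 0) != 0 would still have to visit every index up to the cardinality, so the merge pass only helps for functions that preserve zeros.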
