[ 
https://issues.apache.org/jira/browse/MAHOUT-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588388#comment-14588388
 ] 

ASF GitHub Bot commented on MAHOUT-1691:
----------------------------------------

Github user dlyubimov commented on the pull request:

    https://github.com/apache/mahout/pull/138#issuecomment-112499826
  
    Alexey, there are a few problems here.
    
    I believe a much more computationally efficient way to write this is
        block.cloned := { (r, c, v) => (v - mean(c)) / std(c) }
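
To make the arithmetic of the one-liner concrete, here is a minimal sketch in plain Scala with no Mahout dependency. The `assignInPlace` helper is hypothetical; it only stands in for the functional assignment `:=` and is not Mahout's implementation. The sample values, means and standard deviations are made up for illustration.

```scala
object InPlaceStandardize {
  // Hypothetical stand-in for a functional assignment such as
  // `m := { (r, c, v) => ... }`: overwrites every cell with
  // f(row, col, oldValue) without allocating a second matrix.
  def assignInPlace(m: Array[Array[Double]])(f: (Int, Int, Double) => Double): Array[Array[Double]] = {
    for (r <- m.indices; c <- m(r).indices) m(r)(c) = f(r, c, m(r)(c))
    m
  }

  def main(args: Array[String]): Unit = {
    val block = Array(Array(1.0, 10.0), Array(3.0, 30.0))
    val mean = Array(2.0, 20.0) // assumed per-column means
    val std = Array(1.0, 10.0)  // assumed per-column standard deviations

    // The parentheses matter: `v - mean(c) / std(c)` would divide only the mean.
    assignInPlace(block) { (_, c, v) => (v - mean(c)) / std(c) }

    block.foreach(row => println(row.mkString(" ")))
  }
}
```

The function receives the column index `c`, so per-column statistics can be looked up directly and no intermediate row vectors are created.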
    
    (1) Creating a new matrix and then assigning into it is much slower 
than assigning in place.
    (2) Functional assignments take the matrix structure into account and 
avoid inefficient iteration directions. E.g., if block is really a 
column-wise sparse matrix consisting of sparse sequential columns, this 
row-wise iteration is 10...100x slower than it needs to be (as demonstrated 
by #135).
    (3) This syntax already exists in the form of dense() or sparse() (if 
you want to assemble a matrix from a collection of vector rows). 
    (4) Finally, this code most likely misses your intent, because the 
order in which row slices come from the iterator is not guaranteed. I.e., 
iterator() may return row number 20 first, then 5, then 31, etc. You 
assemble the rows back in iteration order, which is probably not what you 
want. Note that iterators return a MatrixSlice, not just a vector, and the 
slice has an index() method that indicates its true row ordinal. 
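
Point (4) can be illustrated with a small self-contained sketch in plain Scala. The `Slice` case class is a hypothetical stand-in for a MatrixSlice (a row plus its true ordinal), and `assemble` is an illustrative helper, not part of Mahout. The idea is simply to place each row by its own index instead of by its position in the iteration.

```scala
object AssembleByIndex {
  // Hypothetical stand-in for a MatrixSlice: a row vector plus its true row ordinal.
  final case class Slice(index: Int, values: Array[Double])

  // Place each row by its own index rather than by its position in the iteration.
  def assemble(nrow: Int, ncol: Int, slices: Iterator[Slice]): Array[Array[Double]] = {
    val m = Array.ofDim[Double](nrow, ncol)
    slices.foreach(s => Array.copy(s.values, 0, m(s.index), 0, ncol))
    m
  }

  def main(args: Array[String]): Unit = {
    // Slices arrive out of order, as an iterator is allowed to return them.
    val slices = Iterator(
      Slice(2, Array(5.0, 6.0)),
      Slice(0, Array(1.0, 2.0)),
      Slice(1, Array(3.0, 4.0)))

    val m = assemble(3, 2, slices)
    m.foreach(row => println(row.mkString(" ")))
  }
}
```

Had the rows been written with `zipWithIndex` over the iteration order, the row carrying index 2 would have landed in row 0.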



> iterable of vectors to matrix 
> ------------------------------
>
>                 Key: MAHOUT-1691
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1691
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.10.1
>            Reporter: Alexey Grigorev
>            Priority: Minor
>              Labels: math, scala
>
> In the Mahout Scala bindings, instead of writing
> {code}
> val res = drmX.mapBlock(drmX.ncol) {
>   case (keys, block) => {
>     val copy = block.like
>     copy := block.map(row => (row - mean) / std)
>     (keys, copy)
>   }
> }
> {code}
> I would like to be able to write 
> {code}
> val res = drmX.mapBlock(drmX.ncol) {
>   case (keys, block) => {
>     keys -> block.map(row => (row - mean) / std)
>   }
> }
> {code}
> Solution: add a method for implicit conversion from iterable to Matrix:
> {code}
>   implicit def iterable2Matrix(that: Iterable[Vector]): Matrix = {
>     val first = that.head
>     val nrow = that.size
>     val ncol = first.size
>     val m = if (first.isDense) {
>       new DenseMatrix(nrow, ncol)
>     } else {
>       new SparseRowMatrix(nrow, ncol)
>     }
>     that.zipWithIndex.foreach { case (row, idx) => 
>       m.assignRow(idx.toInt, row)
>     }
>     m
>   }
> {code}
> If this sounds good, I can send a pull request with this implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
