[ 
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836249#action_12836249
 ] 

Ted Dunning commented on MAHOUT-300:
------------------------------------

{quote}
ted:    It may be that someday we will need maxNonZero, but we can do that when 
it comes up.

Then no need of all these checks. Just need to iterateAll(). It will be slow, 
but that's the penalty you pay for using this function (should be documented) 
on a large sparse vector. If you just need maxNonZero, that should use 
iterateNonZero. All implementations return -INF if nothing is found.
{quote}

Quite the contrary!

We need sparse implementations that compute the same result as the iterateAll 
version, but much faster for sparse cases.  The difference between max and 
maxNonZero is definitional, and each should return the same value whether its 
input is stored as a sparse or a non-sparse vector.  Both should also take 
advantage of sparseness to be as fast as possible.  And all of these cases 
should have semi-reasonable behavior on unreasonable inputs.

max should return the value x in the vector such that y <= x for all values y 
in the vector.  This can be done by scanning all non-zero values explicitly and 
then handling all zero values in one comparison against 0 (needed only when the 
vector actually contains zeros).  If there are NO values in the vector, then 
the result is undefined and a domain exception is warranted.

maxNonZero should do the same thing, but should be applied only to non-zero 
values.  If there are no non-zero values, then the same domain exception needs 
to be raised.
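
A minimal sketch of these two definitions, not the actual Mahout code.  The 
class name SparseMax, the representation of a vector as a logical size plus a 
map of stored non-zero entries, and the use of IllegalStateException as the 
"domain exception" are all assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical stand-in for a sparse vector: a logical size plus stored non-zero entries. */
final class SparseMax {

  private SparseMax() {}

  /** Largest element of the vector, implicit zeros included. */
  static double max(int size, Map<Integer, Double> nonZeros) {
    if (size == 0) {
      // No values at all: the result is undefined.
      throw new IllegalStateException("max is undefined for an empty vector");
    }
    double best = Double.NEGATIVE_INFINITY;
    // Scan only the explicitly stored (non-zero) values.
    for (double v : nonZeros.values()) {
      best = Math.max(best, v);
    }
    // All implicit zeros are handled with a single comparison,
    // and only if the vector really contains at least one zero.
    if (nonZeros.size() < size) {
      best = Math.max(best, 0.0);
    }
    return best;
  }

  /** Largest element among the non-zero values only. */
  static double maxNonZero(Map<Integer, Double> nonZeros) {
    if (nonZeros.isEmpty()) {
      // No non-zero values: same domain error as above.
      throw new IllegalStateException("maxNonZero is undefined without non-zero values");
    }
    double best = Double.NEGATIVE_INFINITY;
    for (double v : nonZeros.values()) {
      best = Math.max(best, v);
    }
    return best;
  }
}
```

With size 5 and stored entries {1: -3.0, 4: -1.5}, max gives 0.0 because of the 
implicit zeros, while maxNonZero gives -1.5.  Both touch only the two stored 
entries, so the cost depends on the number of non-zeros, not on the size.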


> Solve performance issues with Vector Implementations
> ----------------------------------------------------
>
>                 Key: MAHOUT-300
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-300
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-300.patch, MAHOUT-300.patch
>
>
> AbstractVector operations like times
>   public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       int index = element.index();
>       result.setQuick(index, element.get() * x);
>     }
>     return result;
>   }
> should be implemented as follows
>  public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = result.iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       element.set(element.get() * x);
>     }
>     return result;
>   }

