[ 
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836433#action_12836433
 ] 

Ted Dunning edited comment on MAHOUT-300 at 2/21/10 7:52 PM:
-------------------------------------------------------------

I think that this is a cleaner style for the merge loop.  In particular, the 
average inner loop is much tighter.  The trick is that either iterator can take 
many steps per outer iteration and that whenever either iterator is stepping, 
you only check that iterator, its index and a constant.  In sparse vectors, 
this is a big win.  Even in fairly dense vectors there are just a few extra 
tests which the compiler may well be able to eliminate with common 
sub-expression analysis.  This form has the added benefit of having a simple 
correctness argument.

{noformat}
      while (myIter.hasNext() && otherIter.hasNext()) {
        // loop invariant: neither entry is up to date
        Iterator<Element> myIter = iterateNonZero();
        Iterator<Element> otherIter = x.iterateNonZero();

        // scan to end or equality
        while (myCurrent.index() != otherCurrent.index() &&
                  ((myIter.hasNext() && myCurrent.index() < 
otherCurrent.index()) ||
                  (otherIter.hasNext() && otherCurrent.index() < 
myCurrent.index()))) {
          // invariant: both entries are current

          // catch up my side.  This will find first match, if possible by 
stepping myIter
          while (myIter.hasNext() && myCurrent.index() < otherCurrent.index()) {
            myCurrent = myIter.next();
          }
          // if match, it was the first one possible

          // catch up other side.  This will find first match, if possible
          while (otherIter.hasNext() && otherCurrent.index() < 
myCurrent.index()) {
            otherCurrent = otherIter.next();
          }
          // if match, it was the first one possible

          // invariant: both entries are current
        }
        // exit: (both entries are current AND equal index AND it was first 
match) OR one side ran out early

        if (myCurrent.index() == otherCurrent.index()) {
          // if equal, use it
          result += myCurrent.get() * otherCurrent.get();
        }
        // invariant: neither entry is up to date (or we will exit loop)
      }
{noformat}

      was (Author: tdunning):
    I think that this is a cleaner style for the merge loop.  In particular, 
the average inner loop is much tighter.  The trick is that either iterator can 
take many steps per outer iteration and that whenever either iterator is 
stepping, you only check that iterator, its index and a constant.  In sparse 
vectors, this is a big win.  Even in fairly dense vectors there are just a few 
extra tests which the compiler may well be able to eliminate with common 
sub-expression analysis.  This form has the added benefit of having a simple 
correctness argument.

{noformat}
      Iterator<Element> myIter = iterateNonZero();
      Iterator<Element> otherIter = x.iterateNonZero();
      while (myIter.hasNext() && otherIter.hasNext()) {
        // loop invariant: neither entry is up to date

        // scan to end or equality
        while (myCurrent.index() != otherCurrent.index() &&
                  ((myIter.hasNext() && myCurrent.index() < 
otherCurrent.index()) ||
                  (otherIter.hasNext() && otherCurrent.index() < 
myCurrent.index()))) {
          // invariant: both entries are current

          // catch up my side
          while (myIter.hasNext() && myCurrent.index() < otherCurrent.index()) {
            myCurrent = myIter.next();
          }

          // catch up other side  
          while (otherIter.hasNext() && otherCurrent.index() < 
myCurrent.index()) {
            otherCurrent = otherIter.next();
          }

          // invariant: both entries are current
        }
        // exit: (both entries are current AND equal index) OR one side ran out 
early

        if (myCurrent.index() == otherCurrent.index()) {
          // if equal, use it
          result += myCurrent.get() * otherCurrent.get();
        }
        // invariant: neither entry is up to date or we will exit loop
      }
{noformat}
  
> Solve performance issues with Vector Implementations
> ----------------------------------------------------
>
>                 Key: MAHOUT-300
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-300
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-300.patch, MAHOUT-300.patch, MAHOUT-300.patch
>
>
> AbstractVector operations like times
>   public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       int index = element.index();
>       result.setQuick(index, element.get() * x);
>     }
>     return result;
>   }
> should be implemented as follows
>  public Vector times(double x) {
>     Vector result = clone();
>     Iterator<Element> iter = result.iterateNonZero();
>     while (iter.hasNext()) {
>       Element element = iter.next();
>       element.set(element.get() * x);
>     }
>     return result;
>   }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to