On Mon, Mar 11, 2013 at 5:14 PM, Ted Dunning <[email protected]> wrote:
> [mvn compile|test|package] will do the trick.
>
> Everything is built-in. The code generator is a maven plug-in that runs
> whenever you build math. That is why the build isn't real incremental.
> Not that it matters much since the compile is so fast.
>
Ok, I'll try that. For some reason it wasn't doing anything before (I think?),
possibly because we hardcode a dependency on mahout-collections-1.0 in a lot
of poms.
> Getting a good iterator would be awesome. Should be easy to have internal
> state variables with getters to avoid cons'ing up temporary values or to
> avoid boxing. For key/value pairs, the iterator can nominally be over the
> keys but have an extra method to be called after next() which will give the
> value without a real lookup.
>
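To make the quoted suggestion concrete, here is a minimal sketch of a key
cursor with a value accessor. Everything in it is an assumption for
illustration: IntDoubleCursor, nextKey(), value() and the keys/values/full
arrays are hypothetical names, not the generated mahout-collections code.

```java
import java.util.NoSuchElementException;

/** Hypothetical cursor over parallel primitive arrays; not part of mahout-collections. */
final class IntDoubleCursor {
  private final int[] keys;
  private final double[] values;
  private final boolean[] full;   // assumed occupancy flags, one per slot
  private int current = -1;       // slot of the key last returned
  private int upcoming;           // next occupied slot, or full.length if none

  IntDoubleCursor(int[] keys, double[] values, boolean[] full) {
    this.keys = keys;
    this.values = values;
    this.full = full;
    this.upcoming = advance(0);
  }

  /** Returns the first occupied slot at or after 'from'. */
  private int advance(int from) {
    while (from < full.length && !full[from]) {
      from++;
    }
    return from;
  }

  boolean hasNext() {
    return upcoming < full.length;
  }

  /** Moves to the next occupied slot and returns its key as a primitive int (no boxing). */
  int nextKey() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    current = upcoming;
    upcoming = advance(upcoming + 1);
    return keys[current];
  }

  /** Value for the key last returned by nextKey(): a plain array read, not a hash lookup. */
  double value() {
    if (current < 0) {
      throw new IllegalStateException("call nextKey() first");
    }
    return values[current];
  }
}
```

A caller would loop with while (cursor.hasNext()), read cursor.nextKey(), and
then ask for cursor.value() only when it actually needs the value.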
I was imagining doing something very similar to what we have in our vectors:
truly implement Iterable<${KeyTypeCap}${ValueTypeCap}Pair>, instantiating
exactly *one* ${KeyTypeCap}${ValueTypeCap}Pair per iterator, and having
it serve as a layer of indirection that fetches keys/values directly from the
underlying primitive arrays (keeping only the simple state of the index offset
into the arrays, which is incremented as iteration proceeds).
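
A rough sketch of that idea with the template variables fixed to int/double.
The names here (IntDoublePair, IntDoubleSlots, the full[] occupancy array,
anyGreaterThan) are hypothetical stand-ins chosen for illustration, and the
real hash map's internal layout may well differ:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical int/double instance of the Pair type: a view onto one slot of the arrays. */
final class IntDoublePair {
  private final int[] keys;
  private final double[] values;
  int offset = -1; // slot this view currently points at

  IntDoublePair(int[] keys, double[] values) {
    this.keys = keys;
    this.values = values;
  }

  int getKey() { return keys[offset]; }
  double getValue() { return values[offset]; }
}

/** Hypothetical fixed-size stand-in for the map's storage (assumed layout). */
final class IntDoubleSlots implements Iterable<IntDoublePair> {
  final int[] keys;
  final double[] values;
  final boolean[] full; // true where a slot holds an entry

  IntDoubleSlots(int[] keys, double[] values, boolean[] full) {
    this.keys = keys;
    this.values = values;
    this.full = full;
  }

  @Override
  public Iterator<IntDoublePair> iterator() {
    return new Iterator<IntDoublePair>() {
      // Exactly one pair object per iterator; next() only moves its offset.
      private final IntDoublePair pair = new IntDoublePair(keys, values);
      private int nextSlot = advance(0);

      /** Returns the first occupied slot at or after 'from'. */
      private int advance(int from) {
        while (from < full.length && !full[from]) {
          from++;
        }
        return from;
      }

      @Override
      public boolean hasNext() { return nextSlot < full.length; }

      @Override
      public IntDoublePair next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        pair.offset = nextSlot;
        nextSlot = advance(nextSlot + 1);
        return pair;
      }

      @Override
      public void remove() { throw new UnsupportedOperationException(); }
    };
  }
}

final class PairIterationSketch {
  /** Stops at the first value above the threshold instead of copying all entries first. */
  static boolean anyGreaterThan(Iterable<IntDoublePair> pairs, double threshold) {
    for (IntDoublePair p : pairs) {
      if (p.getValue() > threshold) {
        return true;
      }
    }
    return false;
  }
}
```

Because the same pair instance is handed back on every call to next(), callers
must copy the key/value out if they need them after advancing.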
>
> On Mon, Mar 11, 2013 at 4:42 PM, Jake Mannix <[email protected]>
> wrote:
>
> > On Mon, Mar 11, 2013 at 4:21 PM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > It is part of math now since we had zero pull for it separate from
> > > math.
> > >
> >
> > I see the code templates living in math, yes, but how to build it?
> >
> >
> > > What did you need?
> > >
> >
> > Iterators.
> >
> > The way we use OpenIntDoubleHashMap in our primary sparse vector impl is to
> > use forEachPair() to fill a secondary structure with the keys and values,
> > and then iterate over this. In addition to being wasteful in the usual
> > case of iterating over all values (both for CPU and memory), it's super
> > wasteful if your iteration terminates early: you've already done the full
> > O(n) walk, but the "second pass" might terminate after a few values. Say
> > you want to know whether the vector has any values > 1.0. You might find
> > out that the first one does, but instead of being an O(1) operation, it's
> > O(n).
> >
> > For raw OpenIntDoubleHashMap, you can use forEachXYZ methods, but exposing
> > these in the Vector interface is a bit heavy-handed. What would be better
> > would be to implement the iterateAllNonZero() method so that it delegates
> > to an efficient iterator() method on OpenIntDoubleHashMap. It's not hard
> > to write (it's basically what we have in RandomAccessSparseVector), it
> > just needs to be implemented in the templates.
> >
> >
> > >
> > > On Mon, Mar 11, 2013 at 1:43 PM, Jake Mannix <[email protected]>
> > > wrote:
> > >
> > > > Question which I ought to know the answer to, but don't: if we want to
> > > > make changes to mahout-collections, what's the build process / maven
> > > > target to do this?
> > > >
> > > > --
> > > >
> > > > -jake
> > > >
> > >
> >
> >
> >
> > --
> >
> > -jake
> >
>
--
-jake