On Mon, Mar 11, 2013 at 5:44 PM, Jake Mannix <[email protected]> wrote:
>
> On Mon, Mar 11, 2013 at 5:14 PM, Ted Dunning <[email protected]> wrote:
>
>> [mvn compile|test|package] will do the trick.
>>
>> Everything is built-in.  The code generator is a maven plug-in that runs
>> whenever you build math.  That is why the build isn't really incremental.
>> Not that it matters much since the compile is so fast.
>>
>
> Ok, I'll try that.  For some reason, it wasn't doing anything (I think?)
> before, as we hardcode a dependency on mahout-collections-1.0 in a lot of
> poms, I think?
>

In particular, when I build, I notice that I see:

Downloading: http://repo1.maven.org/maven2/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar

Which implies to me that I'm not going to be using my newly minted code...

>
>> Getting a good iterator would be awesome.  Should be easy to have internal
>> state variables with getters to avoid cons'ing up temporary values or to
>> avoid boxing.  For key/value pairs, the iterator can nominally be over the
>> keys but have an extra method to be called after next() which will give
>> the value without a real lookup.
>>
>
> I was imagining doing something very similar to what we have in our
> vectors: truly implement Iterable<${KeyTypeCap}${ValueTypeCap}Pair>, by
> instantiating exactly *one* ${KeyTypeCap}${ValueTypeCap}Pair per iterator,
> and having it serve as a layer of indirection to fetch keys/values directly
> from the underlying primitive arrays (and keeping the simple state of the
> index offset into the arrays, which is incremented as iteration proceeds).
>
>>
>> On Mon, Mar 11, 2013 at 4:42 PM, Jake Mannix <[email protected]> wrote:
>>
>> > On Mon, Mar 11, 2013 at 4:21 PM, Ted Dunning <[email protected]> wrote:
>> >
>> > > It is part of math now since we had zero pull for it separate from
>> > > math.
>> > >
>> >
>> > I see the code templates living in math, yes, but how to build it?
>> >
>> > > What did you need?
>> > >
>> >
>> > Iterators.
>> >
>> > The way we use OpenIntDoubleHashMap in our primary sparse vector impl is
>> > to use forEachPair() to fill a secondary structure with the keys and
>> > values, and then iterate over this.  In addition to being wasteful in the
>> > usual case of iterating over all values (both for CPU and memory), it's
>> > super wasteful if your iteration terminates early: you've already done
>> > the full O(n) walk, but the "second pass" might terminate after a few
>> > values.  Say you want to know whether the vector has any values > 1.0.
>> > You might find out that the first one does, but instead of being an O(1)
>> > operation, it's O(n).
>> >
>> > For raw OpenIntDoubleHashMap, you can use forEachXYZ methods, but exposing
>> > these in the Vector interface is a bit heavy-handed.  What would be better
>> > would be to just implement the iterateAllNonZero() method so that it
>> > delegates to an efficient iterator() method on OpenIntDoubleHashMap.
>> > It's not hard to write (it's basically what we have in
>> > RandomAccessSparseVector), it just needs to be implemented in the
>> > templates.
>> >
>> > >
>> > > On Mon, Mar 11, 2013 at 1:43 PM, Jake Mannix <[email protected]> wrote:
>> > >
>> > > > Question which I ought to know the answer to, but don't: if we want
>> > > > to make changes to mahout-collections, what's the build process /
>> > > > maven target to do this?
>> > > >
>> > > > --
>> > > >
>> > > >   -jake
>> > > >
>> > >
>> >
>> > --
>> >
>> >   -jake
>> >
>
> --
>
>   -jake

--

  -jake
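
For concreteness, here is a rough sketch of the single-reused-pair iterator idea discussed in this thread. It is not the mahout-collections code and not what the templates generate: TinyIntDoubleMap, IntDoublePair, and anyGreaterThan are hypothetical names, and the map is a deliberately minimal fixed-capacity, linear-probing table (no resize, no removal), assumed only to show how iteration can read straight from the primitive arrays.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical mutable view onto one slot; each iterator reuses a single instance. */
final class IntDoublePair {
  private final TinyIntDoubleMap map;
  int slot = -1; // index into the backing arrays, set by the iterator

  IntDoublePair(TinyIntDoubleMap map) {
    this.map = map;
  }

  int key() {
    return map.keys[slot];
  }

  double value() {
    return map.values[slot];
  }
}

/** Hypothetical stand-in for an open-addressed int -> double map. */
final class TinyIntDoubleMap implements Iterable<IntDoublePair> {
  final int[] keys;
  final double[] values;
  final boolean[] full; // true where a slot holds an entry
  private int size;

  TinyIntDoubleMap(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    keys = new int[capacity];
    values = new double[capacity];
    full = new boolean[capacity];
  }

  void put(int key, double value) {
    // Linear probing; one slot is always left free, so the loop terminates.
    int i = Math.floorMod(key, keys.length);
    while (full[i] && keys[i] != key) {
      i = (i + 1) % keys.length;
    }
    if (!full[i]) {
      if (size == keys.length - 1) {
        throw new IllegalStateException("map is full (sketch does not resize)");
      }
      full[i] = true;
      keys[i] = key;
      size++;
    }
    values[i] = value;
  }

  @Override
  public Iterator<IntDoublePair> iterator() {
    return new Iterator<IntDoublePair>() {
      // Exactly one pair object per iterator: no boxing, no per-element allocation.
      private final IntDoublePair pair = new IntDoublePair(TinyIntDoubleMap.this);
      private int next = advance(0);

      private int advance(int from) {
        int i = from;
        while (i < full.length && !full[i]) {
          i++;
        }
        return i;
      }

      @Override
      public boolean hasNext() {
        return next < full.length;
      }

      @Override
      public IntDoublePair next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        pair.slot = next;          // repoint the shared pair at the current slot
        next = advance(next + 1);  // find the following occupied slot
        return pair;
      }
    };
  }

  /** Early-exit scan: stops at the first value above the threshold. */
  static boolean anyGreaterThan(Iterable<IntDoublePair> pairs, double threshold) {
    for (IntDoublePair p : pairs) {
      if (p.value() > threshold) {
        return true;
      }
    }
    return false;
  }
}
```

With something of this shape, a check like anyGreaterThan(map, 1.0) stops at the first qualifying entry instead of first copying every key and value into a secondary structure. The trade-off of reusing one pair is that callers must not hold on to the returned object across calls to next().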
