I'd like to ask that we take a moment to agree on, and then implement, some small code hygiene rules. These should be things we always do to adhere to project and industry norms:
- Make sure a copyright statement appears in each file
- Let's not do * imports
- No serialVersionUID
- No printStackTrace() or System.{out,err}; use a logger instead (except maybe in command-line program classes, or maybe test classes)
- No empty javadoc, if you please
- Let's use canonical literals for floats and doubles: 1.0 or 1.0f, instead of 1f, 1d or 1.0d

Also, there are now two "cache" abstractions in this project. This may be in vain, but it would be good to attempt to unify them. I prefer how org.apache.mahout.cf.taste.impl.common.Cache works; I don't think a cache should have a set() operation, for instance. The difference between the two is in eviction strategy. If it's actually necessary to have different strategies, we could refactor this. I personally find the Bitset-based one faster and just as good. I'm not going to push on it, but ideally we'd do a lot more unification like this. There are also two Pair classes.

On that subject, I also thought I'd ask about the org.apache.mahout.utils package. It makes sense to have a place to put miscellaneous methods, though some classes literally have one three-line method. Also, what are the distance measure classes doing there? And how does this package relate to org.apache.mahout.common?

The reason I am nit-picking here is that I would like to clean up the 'common spaces' a bit to pave the way for more reuse across the code base. That is good in itself, and it also counteracts the tendency for this project to end up as the mere concatenation of five developers' personal projects.
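
To make the hygiene list at the top concrete, here is a minimal sketch of a class that follows most of those items. The class name HygieneExample and its methods are hypothetical and exist only for illustration. The header text is a placeholder for whatever license statement the project actually uses, and java.util.logging appears only so the example is self-contained; the project's own logging library would take its place.

```java
/*
 * Copyright statement goes here (placeholder, not the project's real header).
 */

// Explicit imports instead of java.util.logging.*
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper that parses and damps weights; illustration only. */
final class HygieneExample {

  private static final Logger log = Logger.getLogger(HygieneExample.class.getName());

  // Canonical literals: 1.0 and 0.5f rather than 1d, 1.0d or .5f
  private static final double DEFAULT_WEIGHT = 1.0;
  private static final float DAMPING = 0.5f;

  private HygieneExample() {
  }

  /**
   * @param text decimal text such as "2.5"
   * @return the parsed value, or the default weight if the text is not a number
   */
  static double parseWeight(String text) {
    try {
      return Double.parseDouble(text);
    } catch (NumberFormatException e) {
      // Report through the logger, not e.printStackTrace() or System.err
      log.log(Level.WARNING, "Could not parse weight: " + text, e);
      return DEFAULT_WEIGHT;
    }
  }

  /**
   * @param value value to damp
   * @return the value multiplied by the damping factor
   */
  static float damp(float value) {
    return value * DAMPING;
  }
}
```

The javadoc comments here carry actual content, which is the alternative to the empty javadoc mentioned in the list. The class is not Serializable, so the serialVersionUID question does not arise for it.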