I am considering a somewhat large change to the org.apache.mahout.cf.taste code and would like to solicit feedback from users.

The change would be to remove the User, Item and Preference interfaces/abstractions from the code. Everything would instead proceed in terms of user IDs, item IDs and preference values.

The original reasons for these interfaces were, well, that it seemed nice. They also gave implementors a way to substitute domain-specific implementations carrying additional information. But there are problems too.

- Do methods take a User, or a user ID? The code is not consistent in this regard. If a method takes a User, a caller that only has an ID is forced to look up a User. (Conversely, if the caller already has a User, and the method needs a User, then passing only an ID forces a redundant lookup. I think this is rarer.)

- Factory method problem. There are many points in the code where it should call factory methods to create a User/Item/Preference object, since the domain may use specialized implementations instead of GenericUser, etc. At the moment some methods just assume GenericUser, etc. Fixing this would be a bit hard but, more importantly, would hurt performance, I think.

- Object overhead. Holding these extra objects has a cost in memory and performance.

The code already really assumes there is nothing but user and item IDs and a pref value. So why not make the core reflect this and gain some simplicity and speed? I think that domains that need to inject extra information can still do so without needing custom User and Item implementations.

It is just a thought for now. Does anybody have more?

Sean
