I've been thinking a bit about the problems Luke and I have outlined together related to using CouchDB to its full potential in Melkjug. I'm going to lay out a few of those problems, each with a quick summary of my thoughts. These ideas are all still a little half-baked, so feedback would be appreciated.
*How do we get a random subset of a query?*

When writing a view for CouchDB, each document is run through a map function to generate the view. The map function is supposed to emit a key/value pair for each document (though the key and the value can each be complex types). If we want a random subset of documents, we can make the key (or some element of a complex key) a random integer. Then we can just take the first n results.

*How do we use CouchDB to distribute our filtering onto a cluster?*

Unless I'm mistaken (and only a subset of JavaScript is available to a map function, in which case maybe we can use a Python view server), we should be able to calculate scores for each document through a view. As I see it, the main requirement is that the filter be accessible from the map function. Either we:

- build our own view server in Python and include our filter modules so they can be called directly to calculate scores.
- implement a RESTful pattern for calling filter modules via HTTP/JSON (a happy side effect is the possibility of off-site filters).
- maybe do both.

*Is there anything we can do to improve our measurements of "goodness" and our results?*

There has been discussion about the recommendation/rating algorithm for Melkjug:

http://tinyurl.com/5s8tdh
http://tinyurl.com/5gawhk

If we wanted to get really nutty (read: awesome), it seems feasible to implement a closest-n articles view (by Euclidean distance or dot product) in a way that distributes. A Python view server could have views which leverage NumPy to do the linear algebra for us. We could generate a view for each user. For each score document (a document which stores the scores for a given article against all filters), this view would calculate the distance we require against the user's preference vector. Since we cannot pass arguments to CouchDB views, we can simply update the view whenever a user changes their filtering preferences.

-Randall
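On the random-subset question, here is a rough sketch of how the keying might look. It is plain Python that only simulates a view, and every name in it (`map_random_key`, `build_view`, `random_subset`) is made up for illustration. It also swaps in a slightly different technique: because CouchDB computes a document's view rows at index time and reuses them until the document changes, a key drawn from a random number generator would give the same "first n" on every query. The sketch therefore derives a stable pseudo-random key from a hash of the document id and picks a random starting point at query time instead.

```python
import hashlib
import random


def map_random_key(doc):
    """Map function sketch: emit a stable pseudo-random key in [0, 1]."""
    digest = hashlib.sha1(doc["_id"].encode("utf-8")).hexdigest()
    key = int(digest[:8], 16) / 0xFFFFFFFF
    yield key, doc["_id"]


def build_view(docs):
    """Stand-in for the view index: all emitted rows, sorted by key."""
    rows = [row for doc in docs for row in map_random_key(doc)]
    rows.sort()
    return rows


def random_subset(rows, n, rng=random):
    """Simulate querying with a random start key and a limit of n.

    Rows before the start key are appended so the read wraps around;
    against a real database that would take a second query.
    """
    start = rng.random()
    after = [row for row in rows if row[0] >= start]
    before = [row for row in rows if row[0] < start]
    return [doc_id for _, doc_id in (after + before)[:n]]


if __name__ == "__main__":
    docs = [{"_id": "article-%d" % i} for i in range(20)]
    print(random_subset(build_view(docs), 5))
```

The subset is a contiguous run of the hashed key space, so it is only as random as the hash is uniform, which is probably fine for sampling articles but worth keeping in mind.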
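For the closest-n idea, a minimal sketch of what a per-user view might compute, assuming score documents look like `{"article_id": ..., "scores": [...]}` (that layout, and the names `make_distance_map` and `closest_n`, are assumptions for illustration, not anything Melkjug defines today). The user's preference vector is baked into the map function when it is created, which mirrors the point that views take no arguments and have to be regenerated when preferences change.

```python
import numpy as np


def make_distance_map(preference_vector):
    """Build a map function with one user's preferences baked in."""
    prefs = np.asarray(preference_vector, dtype=float)

    def map_distance(doc):
        # Key each score document by its Euclidean distance to the user.
        scores = np.asarray(doc["scores"], dtype=float)
        distance = float(np.linalg.norm(scores - prefs))
        yield distance, doc["article_id"]

    return map_distance


def closest_n(score_docs, preference_vector, n):
    """Simulate reading the first n rows of the distance-sorted view."""
    map_fun = make_distance_map(preference_vector)
    rows = [row for doc in score_docs for row in map_fun(doc)]
    rows.sort()
    return [article_id for _, article_id in rows[:n]]


if __name__ == "__main__":
    score_docs = [
        {"article_id": "a", "scores": [1.0, 0.0]},
        {"article_id": "b", "scores": [0.0, 1.0]},
        {"article_id": "c", "scores": [0.9, 0.1]},
    ]
    print(closest_n(score_docs, [1.0, 0.0], 2))
```

Because the view is sorted by distance, "closest n" becomes the same take-the-first-n read as in the random-subset case. A dot-product variant would emit the negated dot product as the key so that the best matches still sort first.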

