On 12.08.2010 18:16, Clint Byrum wrote:
> On Aug 12, 2010, at 7:55 AM, Abel Deuring wrote:

[...]

> Having recently come from a search company (job searches are not all that
> different than bug/project/etc. searches. :), I can offer some insight,
> as over 6 years, we solved this problem 3 different ways.
Cool, I suspected that I missed something obvious while thinking about the
"search subscriptions", but now it seems that this is a problem where it is
hard to come up with a really good solution.

> Method 1 above is pretty similar to the final solution we settled on,
> though it was fraught with one big problem. Every time we changed search,
> we had to go through all of the distinct search keys and make sure we
> didn't break peoples' searches. Or we'd forget that step (often) and just
> break lots of saved searches.

Yes, that's what I thought too; OTOH we don't change the search parameters
(or the search behaviour) that often.

> The biggest pitfall here was that if we changed a field from structured to
> free form, we had to go generate free-form representations of the old
> structured search. One example of this was location. For a long time,
> location was City, State/Province, Country, Zip/PostalCode. This became
> "Location", and search location providers (such as Google Maps, and others)
> were used to interpret it. The re-interpretations can also be done at
> run-time, whenever one of the old fields is encountered, but I prefer to
> keep code simple, and the simplest way to do that is to have less
> variation in your stored data.
>
> Method 2 has the same problem, but now instead of having things in
> predictable, structured storage where it's easy to find any rows you may
> break, you have to go digging/regexing through all the URL query parts.
> One thing that mitigates this problem is that you will often keep
> "old style" searches working for a while, so that links to old
> searches continue to work, so you can simply use whatever method you use
> there to re-process these searches.

Right, there is a risk that we have to do some maintenance work on stored
queries.
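To make the maintenance work concrete, here is a minimal sketch of what migrating stored query parts might look like, using the location example above. The field names (`city`, `state`, `country`, `zip`, `location`) are my assumptions, not Launchpad's actual parameters; only the standard library's urllib is used:

```python
from urllib.parse import parse_qs, urlencode

def migrate_query(stored_query):
    """Rewrite an old-style stored query part (hypothetical field
    names) so that the structured location fields are collapsed
    into one free-form 'location' value."""
    params = parse_qs(stored_query)
    old_fields = ["city", "state", "country", "zip"]
    # parse_qs wraps every value in a list; take the first value of
    # each old structured field that is present.
    parts = [params.pop(f)[0] for f in old_fields if f in params]
    if parts:
        params["location"] = [" ".join(parts)]
    # doseq unwraps the single-element lists again.
    return urlencode(params, doseq=True)

print(migrate_query("city=Hamburg&country=Germany&q=engineer"))
# → q=engineer&location=Hamburg+Germany
```

The same function could serve both as a one-off migration over all stored rows and as the run-time re-interpretation Clint mentions, whenever an old-style query part is encountered.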
But that would also be a good reminder that, when we have to update the
stored query parameters, we would likewise break URLs for bug searches
which people have bookmarked ;)

> A third method, which was only done as an experiment and never rolled out,
> was to store searches as documents in CouchDB. This allowed flexibility
> in the schema, so while it resembled method 2, instead of a query part,
> it was still "structured" and had indexes for querying. It also took a single
> read for every search, rather than, as you put it, 12 rows for a single
> search. It was also more logical to store and retrieve a full data structure,
> rather than try to break it up and reassemble it from relational rows.
>
> The hottest part of this was of course that you could cache the actual result
> in the CouchDB document, and simply tag it with a date/time stamp, and then
> only refresh the result when it was out of date. This is cool because, for a
> user viewing their subscriptions, it is very low cost to show them the results.

Could you explain a bit more how CouchDB would be used? Would query
parameters be mapped to search results? Perhaps I am somewhat slow, but I
don't see how this would help for the case "does bug notification X match
the search criteria specified by subscriber Y".

> The only reason it wasn't pursued further was CouchDB's, at the time, dismal
> performance. This was almost 2 years ago, and I'm sure by now CouchDB has
> gotten much better.
>
> The "12 rows for a single search" mentioned above isn't all that bad. As I
> understand it, if you do 12 inserts in a row to a single table in PostgreSQL,
> at that point, those 12 rows are physically stored in serial. So at least
> that row will remain fast until the table is re-clustered (apologies for
> my terminology, I am more familiar w/ MySQL than Postgres).
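If I understand the caching part correctly, it would be something like the sketch below: one document per saved search holding the criteria, the cached result, and a timestamp, with the result recomputed only when stale. A plain dict stands in for the CouchDB document, the field names and the freshness window are my assumptions, and `run_search` is a stand-in for the real search backend:

```python
import time

MAX_AGE = 300  # seconds; assumed freshness window

def get_results(doc, run_search, now=None):
    """Return the cached search results stored in `doc`, refreshing
    them via `run_search` only when the cache is out of date."""
    now = time.time() if now is None else now
    if doc["cached_at"] is None or now - doc["cached_at"] > MAX_AGE:
        doc["results"] = run_search(doc["criteria"])
        doc["cached_at"] = now
    return doc["results"]

# One document per saved search (hypothetical field names):
doc = {
    "subscriber": "subscriber-x",
    "criteria": {"status": ["New", "Triaged"]},
    "results": None,
    "cached_at": None,
}
```

That would make the "show me my subscriptions" page cheap, though as far as I can see it still leaves the notification-matching question open: deciding whether bug notification X matches subscriber Y's criteria means evaluating the stored criteria against the bug, not looking up a cached result.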
The expansion of three values for parameter 1 and two values each for
parameters 2 and 3 into twelve rows (3 × 2 × 2 = 12) is simply
unaesthetic ;) But storing the selected values of parameter 1 in table 1,
the values of parameter 2 in table 2, etc., and joining all these tables
may also work sufficiently fast in queries.

Abel

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp

