I updated http://wiki.apache.org/couchdb/FullTextIndexWithView with a slightly more robust implementation. Still no boolean abilities though -- I'm combing the internets trying to figure out how Google does it in m/r, but my best guess is they just brute-force the merge (and probably track some stats to estimate a total). This doesn't seem like something that would lend itself easily to Couch -- but I could be wrong. I'm probably wrong. Please, someone tell me I'm wrong...
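
A minimal sketch of the brute-force merge mentioned above, assuming the document ids for each term have already been pulled down from the view to the client (one list per term). The function names intersectIds and unionIds are hypothetical; nothing here is a CouchDB API, and this is not the code on the wiki page.

```javascript
// Client-side boolean merge over per-term result lists.
// Each inner array holds the document ids one term matched.

// AND: keep only ids that appear in every term's list.
function intersectIds(lists) {
  if (lists.length === 0) return [];
  // Start from the shortest list so the candidate set stays small.
  const sorted = lists.slice().sort((a, b) => a.length - b.length);
  let result = Array.from(new Set(sorted[0]));
  for (const list of sorted.slice(1)) {
    const present = new Set(list);
    result = result.filter((id) => present.has(id));
  }
  return result;
}

// OR: collect every id that appears in at least one term's list.
function unionIds(lists) {
  const all = new Set();
  for (const list of lists) {
    for (const id of list) all.add(id);
  }
  return Array.from(all);
}
```

The cost is the one the thread complains about: every term's full result list has to be loaded before anything can be merged.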
Dean

On Mon, Jul 28, 2008 at 1:18 PM, Dean Landolt <[EMAIL PROTECTED]> wrote:

> Gladly. I'll get it on the wiki and send a link after I clean it up.
>
> Regarding merging views, something like that would be fantastic, though I
> can't really comprehend the performance implications. If a view can peer
> into another view for its processing, I gather this would mean it would
> have to be updated every time a change happens in the referenced view(s),
> and an incremental update here may really mean a full update of the view
> in question, but I'm just guessing. Though this would allow real *joins*
> and end that whole question once and for all... :)
>
> On Sun, Jul 27, 2008 at 7:04 PM, Dan Reverri <[EMAIL PROTECTED]> wrote:
>
>> Dean,
>>
>> Any chance you want to share your view code?
>>
>> In regards to the query parsing, I am not sure how this will work. Right
>> now results for each term have to be pulled down to the client and
>> merged together. Perhaps we could add a query method to views that
>> allows different key values to be combined.
>>
>> A user could query a view with a set of keys and a merge function that
>> could define how the key values could be combined.
>>
>> On Fri, Jul 25, 2008 at 5:01 PM, Dean Landolt <[EMAIL PROTECTED]> wrote:
>>
>> > On Mon, Jul 21, 2008 at 11:45 AM, Dean Landolt <[EMAIL PROTECTED]> wrote:
>> >
>> > > On Mon, Jul 21, 2008 at 1:08 AM, Dan Reverri <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> Is it worthwhile to implement a full text indexer on top of
>> > >> CouchDB's map/reduce functionality?
>> > >>
>> > >> http://wiki.apache.org/couchdb/FullTextIndexWithView
>> > >
>> > > Interesting idea. There's definitely more to FTI than tokenization
>> > > alone, but then again there's an awful lot of power in m/r and
>> > > javascript -- it didn't take me a second to find a porter stemming
>> > > algorithm in js:
>> > > http://tartarus.org/~martin/PorterStemmer/js.txt
>> > >
>> > > I bet variable weighting would be pretty close to impossible in the
>> > > m/r paradigm though, and probably some other features (of course, I
>> > > could be wrong, and when it comes to couchdb, thus far I usually
>> > > am). For a straight-up word search, this is serviceable as is. I'm
>> > > going to see if I can't figure out how to shoehorn in some boolean
>> > > features.
>> >
>> > I gave this approach another look and I was able to get a view
>> > together that did a little more (stemming, optional
>> > case-insensitivity, min length for tokens, better whitespace
>> > handling). I'm working on an ngram view too and so far it's
>> > promising. But there's still one huge problem -- for the life of me I
>> > can't figure out a workable strategy for boolean operations that
>> > doesn't involve fully loading each piece of the query. Am I missing
>> > something? Is something like this even possible? I know there's no way
>> > to load a piece of a view from another view -- but I just can't help
>> > but really wish there were.
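
For readers following along, here is a rough sketch of the kind of tokenizing map function the thread describes (optional case-insensitivity, a minimum token length, tolerant whitespace handling, a stemming hook). It is an illustration under assumptions, not the code on the wiki page: the `body` field, the option names, the ASCII-only split, and the identity `stem` placeholder are all made up for the example, and `emit` is a local stand-in so the snippet runs outside CouchDB's view server.

```javascript
// Stand-in for CouchDB's emit() so the sketch runs on its own.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Assumed options; the names are illustrative only.
const MIN_LENGTH = 3;
const CASE_INSENSITIVE = true;

// Placeholder stemmer (identity). A real one, such as a Porter
// stemmer, would be swapped in here.
function stem(token) {
  return token;
}

// Split on any run of non-alphanumeric characters (ASCII only), so
// repeated spaces, tabs, newlines and punctuation all act as separators.
function tokenize(text) {
  const source = CASE_INSENSITIVE ? text.toLowerCase() : text;
  return source
    .split(/[^A-Za-z0-9]+/)
    .filter((token) => token.length >= MIN_LENGTH)
    .map(stem);
}

// Map function: emit one row per distinct token in the document.
function map(doc) {
  if (typeof doc.body !== "string") return;
  const seen = new Set();
  for (const token of tokenize(doc.body)) {
    if (!seen.has(token)) {
      seen.add(token);
      emit(token, doc._id);
    }
  }
}

map({ _id: "d1", body: "  The Quick  brown fox, the FOX! " });
```

Querying such a view by key returns the documents containing one term; combining terms is still left to the client.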
