On Mon, Apr 23, 2007 at 03:52:51AM +0200, Danny Burkes wrote: [..]
> 
> Sorry, I should have been more clear- what I was referring to was not
> storage, but rather tokenization.  My understanding is that many
> people use a simple Regex-based one-token-per-character tokenizer for
> non-Latin languages, but, since our languages are mixed, I wasn't sure
> what type of approach to tokenization would be best.  Clearly we can't
> use that one-token-per-character analyzer on latin text, right?

Right :-) 

A heuristic that guesses which language the current text is in would be
a good way to select a suitable analyzer.

The Nutch search engine (Java, Lucene-based) seems to have something
like that; possibly we could port it:
http://wiki.apache.org/nutch/MultiLingualSupport
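
To illustrate the idea, here is a minimal sketch in plain Ruby. It is
not Nutch's approach and not part of Ferret or acts_as_ferret: the names
dominant_script and tokens_for, the 0.5 threshold, and the choice of
scripts are all assumptions made for the example. It guesses the
dominant script of a piece of text and then picks either
one-token-per-character or one-token-per-word splitting.

```ruby
# Hypothetical helpers; not part of Ferret or acts_as_ferret.
CJK_CHAR = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/

# Guess the dominant script from the share of CJK characters among all
# letters. Anything that is not mostly CJK is treated like Latin text.
def dominant_script(text, threshold = 0.5)
  letters = text.scan(/\p{L}/)
  return :latin if letters.empty?
  cjk = letters.count { |ch| ch =~ CJK_CHAR }
  cjk.to_f / letters.size >= threshold ? :cjk : :latin
end

# Pick a tokenization strategy based on the guess.
def tokens_for(text)
  if dominant_script(text) == :cjk
    text.scan(/[\p{L}\p{N}]/)            # one token per character
  else
    text.downcase.scan(/[\p{L}\p{N}]+/)  # one token per lowercased word
  end
end
```

The guess is made per piece of text, so a field that mixes scripts gets
whichever strategy wins the vote. A real analyzer would probably have to
switch strategy inside a field.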

> > However, if your search system isn't online (ie, the feature isn't
> > enabled in the front end), why would you need anything special? The
> > AAF DRb server can serve requests while you're running a rebuild (as
> > long as you don't use the current rebuild_index method).
> > 
> 
> Perhaps I'm remembering incorrectly, but my recollection was that, the
> first time I created a new record for a model that uses aaf, the whole
> instance blocked while aaf was creating the index.  Did I remember that
> wrong?

No, you remember correctly. You can force a rebuild by calling
Model.rebuild_index from the console. 

> If that is the way that it works, then, clearly, I need to start the
> rebuild from outside of the application, before any users can create new
> model objects.
> 
> Further, are you saying that model creations during the rebuild won't
> block (I guess they realize that a rebuild is already happening and just
> return immediately)?

Unfortunately the DRb server doesn't realize this yet. As Ryan wrote, I
plan to rework the re-indexing code in the near future; most likely
there will then be some kind of index rotation and a queue that records
model updates that occur while a rebuild is running.
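
To make the rotation-plus-queue idea concrete, here is a rough sketch in
plain Ruby. It does not describe existing acts_as_ferret code: the class
RotatingIndex and its methods are hypothetical, and a Hash stands in for
the real index. Updates that arrive during a rebuild go to the live
index and are also queued, then replayed into the fresh index before the
two are swapped.

```ruby
# Hypothetical sketch; a Hash stands in for a Ferret index.
class RotatingIndex
  def initialize
    @lock = Mutex.new
    @live = {}          # index currently answering searches
    @pending = []       # updates seen while a rebuild is running
    @rebuilding = false
  end

  def search(id)
    @lock.synchronize { @live[id] }
  end

  # Writes never wait for a rebuild; they are only remembered for replay.
  def update(id, doc)
    @lock.synchronize do
      @live[id] = doc
      @pending << [id, doc] if @rebuilding
    end
  end

  # records: any enumerable of [id, doc] pairs read from the database.
  def rebuild(records)
    @lock.synchronize do
      @rebuilding = true
      @pending.clear
    end
    fresh = {}
    records.each { |id, doc| fresh[id] = doc }  # slow part, lock not held
    @lock.synchronize do
      @pending.each { |id, doc| fresh[id] = doc }  # replay queued updates
      @live = fresh                                # rotate the index
      @pending.clear
      @rebuilding = false
    end
  end
end
```

The lock is held only for the short bookkeeping steps, so searches and
updates keep working while the fresh index is being filled.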

> >> 5.  I suspect we will have to disable_ferret(:always) on our utterance
> >> model, then update the index manually on some periodic basis (cron job,
> >> backgroundrb worker, etc.).  The reason for this is that we don't want
> >> to introduce any delay into the process of storing a new utterance,
> >> which occurs in realtime during a chat session.  Anyone have experience
> >> doing this?
> > 
> > It's pretty fast. The only time you'd see a slowdown is when you
> > encounter a lock in the DRb server.
> >
> 
> And what would cause that?  Do normal model creates cause a lock?

Index updates are synchronized, since only one thread may write to the
index at a time. If immediate indexing of new or updated records is not
needed, I see no problem in doing it later from cron or backgroundrb,
based on some flag or timestamp. 

Ferret *is* fast, but you also have to take the DRb round trip time
into account, so deferred indexing really could make sense for a chat
application.
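
A timestamp-based batch run could look roughly like the sketch below. It
is plain Ruby with assumed names: the Utterance struct, the
DeferredIndexer class and the Hash used as the index are stand-ins for
illustration, not acts_as_ferret API. Each run picks up the records
changed since the previous run and indexes them in one go.

```ruby
# Hypothetical stand-in for the model; only the fields the sketch needs.
Utterance = Struct.new(:id, :text, :updated_at)

# Hypothetical batch indexer, meant to be called from cron or a worker.
class DeferredIndexer
  attr_reader :last_run

  def initialize(index, last_run = Time.at(0))
    @index = index        # a Hash standing in for the real index
    @last_run = last_run  # would be persisted between runs in practice
  end

  # Index everything changed since the previous run and return the ids.
  def run(records, now = Time.now)
    stale = records.select { |r| r.updated_at > @last_run && r.updated_at <= now }
    stale.each { |r| @index[r.id] = r.text }
    @last_run = now
    stale.map(&:id)
  end
end
```

Storing an utterance then never touches the index; the price is that new
records stay unsearchable until the next run.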

cheers,
Jens


-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[EMAIL PROTECTED] | www.webit.de
 
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
