> This is a good approach if the number of total documents doesn't grow
> too much. There's obviously a limit to full index runs at some point.

Well, I was actually planning to go with incremental indexing, since a full
reindex will probably take ~1 hour. Our data set is relatively fixed in
size, but it's updated very frequently - almost 100% turnover per day.
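
For what it's worth, here's roughly the shape of the incremental updating
I have in mind - just a minimal sketch against the classic Lucene API (it
assumes a version with IndexWriter.updateDocument, and the index path and
field names are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IncrementalUpdate {

    // Replace one changed row in place, keyed on a unique "id" field.
    // updateDocument deletes any existing document with that id and
    // adds the new version, so no full rebuild is needed.
    static void updateRow(IndexWriter writer, String id, String body)
            throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", id,
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("body", body,
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.updateDocument(new Term("id", id), doc);
    }

    public static void main(String[] args) throws Exception {
        // false = append to the existing index, don't create a new one
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        updateRow(writer, "row-42", "updated body text");
        writer.close();
    }
}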

> You need to find out where you lose most of the time:

Fair enough - I haven't tried much in the way of profiling yet. I just
thought you might have found some Lucene settings that made a big difference
for you, or found that indexing into a RAMDirectory and then dumping it to
disk was faster, etc. But it sounds like you're pretty happy with
near-default settings.
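
In case it helps anyone else reading the thread, the RAMDirectory approach
I was referring to looks something like this - again just a sketch with
placeholder paths and field names, not a tuned implementation:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamThenDisk {
    public static void main(String[] args) throws Exception {
        // Build a batch of documents entirely in memory first.
        RAMDirectory ram = new RAMDirectory();
        IndexWriter ramWriter =
            new IndexWriter(ram, new StandardAnalyzer(), true);
        for (int i = 0; i < 1000; i++) {
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(i),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("body", "document body " + i,
                    Field.Store.NO, Field.Index.TOKENIZED));
            ramWriter.addDocument(doc);
        }
        ramWriter.close();

        // Then merge the in-memory index into the on-disk index
        // in a single pass.
        IndexWriter diskWriter = new IndexWriter(
            FSDirectory.getDirectory("/path/to/index", false),
            new StandardAnalyzer(), false);
        diskWriter.addIndexes(new Directory[] { ram });
        diskWriter.close();
    }
}

Whether that actually beats writing straight to an FSDirectory with a
larger mergeFactor is presumably exactly the kind of thing I'd need to
profile.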

> However, what I wonder: if you have your data in a database anyway, why
> not use the database's indexing features? It seems like Lucene is an
> additional layer on top of your data, which you don't really need.

Our current DB server (running SQL Server) is under enormous strain, partly
because of the complex searches being run against it. We've already tweaked
it pretty heavily, so I don't think there's much room left to improve on
that front. The idea is to use Lucene to take the search load off it so it
can get on with everything else it has to do. The Lucene implementation I'm
working on here is just a proof of concept; we may well stay with SQL Server
in the long run, but Lucene definitely seems worth investigating - it has
certainly worked well for us on smaller projects.



