The approach I am currently using is (in pseudocode):
countChangedOrNew = select count(*) from docs
                    where date_modified > lastIndexRunDate

if ((countChangedOrNew / reader.numDocs()) > 50%)
{
    // quicker to rebuild the whole index
    wipeIndex;
    select * from docs
    for (each record)
    {
        writer.addDoc(new Doc(record));
    }
}
else
{
    // patch the data
    // first delete any changed docs already in the index
    select id from docs where date_modified > lastIndexRunDate
    for (each id)
    {
        reader.delete(new Term("dbkey", id));
    }
    reader.close();
    // now add the changed/new docs
    select * from docs where date_modified > lastIndexRunDate
    for (each record)
    {
        writer.addDoc(new Doc(record));
    }
}
save lastIndexRunDate;
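
To make the "patch the data" branch concrete, here is a rough Java sketch against a later Lucene API, where IndexWriter.updateDocument(Term, Document) deletes any existing doc with the matching term and adds the replacement in one call, so the separate delete pass isn't needed. The table/column names ("docs", "id", "body") and field choices are placeholders for illustration, not our actual schema.

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IncrementalIndexer {

    // Re-index only the rows changed since the last run.
    static void patchIndex(Connection conn, IndexWriter writer,
                           Timestamp lastIndexRunDate)
            throws SQLException, IOException {
        PreparedStatement ps = conn.prepareStatement(
                "select id, body from docs where date_modified > ?");
        ps.setTimestamp(1, lastIndexRunDate);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            String id = rs.getString("id");
            Document doc = new Document();
            // exact-match key used to find and replace the old copy
            doc.add(new StringField("dbkey", id, Field.Store.YES));
            doc.add(new TextField("body", rs.getString("body"), Field.Store.NO));
            // delete-then-add in a single call
            writer.updateDocument(new Term("dbkey", id), doc);
        }
        rs.close();
        ps.close();
        writer.commit();
    }
}

The same IndexWriter can also drive the full-rebuild branch: deleteAll() followed by plain addDocument() calls over the full table scan.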
We've found there are database-specific JDBC streaming
settings that help when reading huge volumes of
records.
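
As an illustration (these settings are from memory, so treat them as assumptions and check your driver's documentation), the kind of driver-specific streaming hints meant above look roughly like this for the MySQL and PostgreSQL JDBC drivers:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingStatements {

    // MySQL Connector/J: row-by-row streaming needs a forward-only,
    // read-only statement with a fetch size of Integer.MIN_VALUE.
    static Statement mysqlStreaming(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    // PostgreSQL: cursor-based fetching needs autoCommit off plus a positive
    // fetch size, otherwise the whole result set is buffered in memory.
    static Statement postgresStreaming(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(1000);
        return stmt;
    }
}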
--- N <[EMAIL PROTECTED]> wrote:
> Hi
>
> I am indexing database tables with huge amounts of
> data via Lucene. Do I need to reindex the whole
> table(s) as changes are made to keep the search up
> to date? Since it is time-consuming to create a new
> index from scratch every time the data in the tables
> is modified, can anybody suggest a more efficient
> approach?
>
> Thanks in advance
> Noon
>
>