Hey all,

I have run into an interesting case.

Our system stores notes, and these need to be indexed. Each note is an XML file
called default.xml that is easy to parse and index. No problem there; I have
been doing it all week.

The problem is that when someone edits a note, the system doesn't update
default.xml. Instead it creates a new file, default_1.xml. Every edit creates a
new file with an incremented number, and the system only displays the content
of the file with the highest number.
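A small sketch of reading the version out of a filename, assuming the naming rule is exactly as described above (default.xml is the original, i.e. version 0, and default_N.xml is the Nth edit). The function name note_version is hypothetical:

```python
import re

# Assumed naming rule: "default.xml" is version 0, "default_N.xml" is version N.
_VERSION_RE = re.compile(r"^default(?:_(\d+))?\.xml$")


def note_version(filename):
    """Return the edit number encoded in a note filename,
    or None if the file is not a version of the note."""
    match = _VERSION_RE.match(filename)
    if match is None:
        return None
    # No "_N" suffix means the original, unedited note.
    return int(match.group(1)) if match.group(1) else 0


print(note_version("default.xml"))     # 0
print(note_version("default_1.xml"))   # 1
print(note_version("default_12.xml"))  # 12
print(note_version("other.xml"))       # None
```

The numbers are compared as integers on purpose: as plain strings, default_10.xml would sort before default_2.xml.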

My problem is that I index all the documents and end up with terms that were
removed from a note several versions ago still matching queries. From my point
of view this makes sense, because the older files are still in the content. But
to a user it is confusing: they have no idea that every change they make to a
note spawns a new file, and now they are seeing a term they removed from their
note two weeks ago showing up in a query.

I have started modifying my incremental update to look for multiple versions
of default.xml, but it is more work than I thought and is going to make things
complex.

Maybe there is an easier way? If I just let it run and create the index, can
somebody suggest a way I could easily scan the index folder, ensuring that only
the default file with the highest number in its filename remains (only for
folders where there is more than one default file)? Or is this wishful thinking?
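One possible shape for that cleanup, as a sketch only: it assumes you can list the path (or URL) stored for each indexed document, that those paths use forward slashes, and that the naming rule is default.xml / default_N.xml. The function stale_note_paths is hypothetical; actually deleting the returned documents depends on whatever delete-by-path mechanism your indexer offers, which is not shown here.

```python
import posixpath
import re
from collections import defaultdict

# Assumed naming rule: "default.xml" is version 0, "default_N.xml" is version N.
_VERSION_RE = re.compile(r"^default(?:_(\d+))?\.xml$")


def stale_note_paths(paths):
    """Return every note path that is NOT the highest-numbered
    default*.xml in its folder. Other files are ignored."""
    versions_by_folder = defaultdict(list)
    for path in paths:
        folder, name = posixpath.split(path)
        match = _VERSION_RE.match(name)
        if match is None:
            continue  # not a note version; leave it alone
        version = int(match.group(1)) if match.group(1) else 0
        versions_by_folder[folder].append((version, path))

    stale = []
    for entries in versions_by_folder.values():
        newest = max(version for version, _ in entries)
        # A folder with a single version yields nothing here.
        stale.extend(path for version, path in entries if version < newest)
    return sorted(stale)


indexed = [
    "/notes/a/default.xml",
    "/notes/a/default_1.xml",
    "/notes/a/default_10.xml",
    "/notes/a/default_2.xml",
    "/notes/b/default.xml",
    "/notes/b/attachment.pdf",
]
for path in stale_note_paths(indexed):
    print(path)
# /notes/a/default.xml
# /notes/a/default_1.xml
# /notes/a/default_2.xml
```

The same function could also be run over the file list before indexing, indexing only the paths it does not return, which avoids touching the index afterwards.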

Thanks,

Luke
