Hey all, I have run into an interesting case.
Our system has notes, and these need to be indexed. They are XML files called default.xml and are easily parsed and indexed. No problem, I have been doing it all week.

The problem is that if someone edits a note, the system doesn't update default.xml. It creates a new file, default_1.xml (every edit creates a new file with an incremented number, and the system only displays the content from the highest number). So when I index all the documents, I end up with terms that were taken out of a note several versions ago still showing up in query results.

From my point of view this makes sense, because the old files are still in the content. But to a user it is confusing: they have no idea that every change they make to a note spawns a new file, and now they are seeing a term they removed from their note two weeks ago showing up in a query.

I have started modifying my incremental update to look for multiple versions of default.xml, but it is more work than I thought and is going to make things complex. Maybe there is an easier way? If I just let it run and create the index, can somebody suggest a way I could easily scan the index folder and ensure that only the default.xml with the highest number in its filename remains (only for folders where there is more than one default.xml file)? Or is this wishful thinking?

Thanks,
Luke
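
To make the selection rule concrete, here is a minimal Python sketch of picking only the highest-numbered file per folder. It filters the files before they are indexed rather than cleaning the index afterwards, which is a different approach from the one asked about. It assumes the original note is default.xml (treated as version 0), that edits are named default_N.xml, and that all versions of a note sit in the same folder. The function name `latest_note_files` and the `root` parameter are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path

# Matches default.xml (treated as version 0) and default_<N>.xml (version N).
VERSION_RE = re.compile(r"^default(?:_(\d+))?\.xml$")


def latest_note_files(root):
    """Map each folder under root to its highest-numbered default*.xml file."""
    best = {}  # folder -> (version, path)
    for path in Path(root).rglob("default*.xml"):
        match = VERSION_RE.match(path.name)
        if match is None or not path.is_file():
            continue  # skip names such as default_old.xml, and directories
        # Compare versions as integers so default_10.xml beats default_2.xml.
        version = int(match.group(1)) if match.group(1) else 0
        current = best.get(path.parent)
        if current is None or version > current[0]:
            best[path.parent] = (version, path)
    return {folder: path for folder, (_, path) in best.items()}
```

Only the paths returned by this function would then be handed to the indexer, so older versions never reach the index in the first place. An incremental update would still need to delete the previously indexed version of a note when a newer file appears.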