A specific example:

You have a criminal justice system that indexes past court cases.

You search for cases involving Joe Smith because you are a judge and want to review priors before sentencing. The same concern applies to related cases, case history, etc.

Is it better to return something that may not be correct, or to return an error saying the index is offline and being rebuilt, so please perform your search later? In this case old false positives are just as bad as missing new records. I hope that demonstrates the position clearly.

As I stated, there are several classes of applications where "any data", whether or not it is current or valid, is acceptable, but I would argue that in MOST cases it is not, and that if the interested parties fully reviewed their requirements they would not accept that solution. It is easily summarized by the old adage "garbage in, garbage out".

The only reason that corruption is ok is that you need to reindex anyway, and rebuilding from scratch is often faster than determining the affected documents and updating them (especially if corruption is a possibility).

It was in fact me who raised the issue that none of the "lockless commits" code fixed anything related to corruption. The only way to ensure non-corruption is to sync all data files, then write and sync the segments file. I think this change could have been accomplished in about 10 lines of code; it is completely independent of lockless commits, and in most cases makes lockless commits obsolete. But to be honest, I am not really certain how lockless commits can actually work in an environment that allows updates to the documents (and/or related resources), so I am sure there are aspects I am just ignorant of.
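
To make that ordering concrete, here is a minimal sketch in plain Java NIO. It is not Lucene code: the class name DurableCommit, the commit() helper and the "segments.tmp" file name are all hypothetical, and writing to a temporary file and renaming it into place is an added assumption beyond the "sync data, then write and sync segments" rule stated above. Syncing the directory entry itself is left out.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the commit ordering only.
final class DurableCommit {

    /** Force one file's contents and metadata to stable storage. */
    static void sync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /**
     * Step 1: sync every data file. Step 2: write and sync the new segments
     * file. Step 3 (assumption): publish it with an atomic rename, so a crash
     * at any point leaves either the old or the new segments file in place.
     */
    static void commit(Path dir, List<Path> dataFiles, String segmentsName,
                       byte[] segmentsBytes) throws IOException {
        for (Path dataFile : dataFiles) {
            sync(dataFile);
        }

        Path tmp = dir.resolve(segmentsName + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(segmentsBytes);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // the segments file is durable before it becomes visible
        }

        // Whether an existing target is replaced is platform specific for ATOMIC_MOVE.
        Files.move(tmp, dir.resolve(segmentsName), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        Path data = dir.resolve("_1.cfs");
        Files.write(data, new byte[] {1, 2, 3});
        commit(dir, List.of(data), "segments", "_1".getBytes(StandardCharsets.UTF_8));
        System.out.println("committed: " + Files.exists(dir.resolve("segments")));
    }
}
```

The point of the ordering is that the segments file never names a data file that is not already on disk, so a reader that opens the index after a crash sees the last complete commit.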

As an aside, we engineered our software years ago to work around these issues, which is why we still use a 1.9 derivative, and monitor the trunk for important fixes and enhancements.

On Jan 22, 2008, at 8:35 PM, Mark Miller wrote:



robert engels wrote:
I think there are a lot of applications using Lucene where "whether it's lost a bit of data or not" is not acceptable.
Yeah, and I have one of them. Which is why I would love the support you're talking about. But it's not there yet, and I am just grateful that I can get my customers back up and searching as quickly as possible rather than experience an index corruption. Access to the data is more important than complete access to the data for my customers (though they'd say they certainly want both). After such an experience I have to run through the database and check if anything from the index is missing, and if it is, reindex. Not ideal, but what can you do? I find it odd that you don't think non-corruption is better than nothing. It's a big feature for me. If the server reboots at night and causes a corruption, I have customers that will be SOL for some time... I'd prefer that when the server reboots, my index, whatever is left of it, is searchable. My customers need to work. Can't get behind on a daily product :)

I'd prefer what you're talking about, but there are tons of other things I'd love to see in Lucene as well. It just seems odd to complain about them. I'd think that instead, I might spearhead the development. I'm just not experienced enough myself to do a lot of the deeper work. You don't appear so limited. How about helping out with some transactional support :)


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


