petite_abeille wrote:

Well, the subject says it all...

If there is one thing which is overly cumbersome in Lucene, it's updating documents. Hence this Request For Enhancement:

Please consider enhancing the IndexWriter API to include an updateDocument(...) method to take care of all the gory details involved in such an operation.

I agree, this is always a hassle to get right, because you have to use both IndexWriter and IndexReader and open and close them properly.


I have a prelim version of a "batched index writer" that I use. The code is kinda messy, but for discussion here's what it does:

Briefly the methods are:

// [1]
// the constructor has parameters:
//    'batch size # docs' e.g. it will flush pending updates every 100 docs
//    'batch freq' e.g. auto flush every 60 sec


// [2]
// queue a document to be added to the index
// 'key' is the primary key name e.g. "url"
// 'val' is the primary key val e.g. "http://www.tropo.com/"
// 'doc' is the doc to be added
update( String key, String val, Document doc)

// [3]
//  queue a document for removal
// 'key' and 'val' are the params, as from [2]
remove( String key, String val)

// [4]
// periodic flush, called automatically or on demand; 2 stages (delete, then add):
//         1. call IndexReader.delete() on all pending (key,val) pairs
//         2. close IndexReader
//         3. call IndexWriter.addDocument() on all pending documents
//         4. optionally call optimize()
//         5. close IndexWriter
flush()
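
For discussion, the delete-then-add ordering in [4] could be sketched roughly as below. This is not the actual code: IndexBackend, MemoryBackend and PendingUpdates are hypothetical names, documents are plain Strings rather than Lucene Document objects, and the in-memory backend only stands in for the IndexReader/IndexWriter pair so the sketch runs without Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index. A real version would do the deletes
// through an IndexReader and the adds through an IndexWriter.
interface IndexBackend {
    void deleteByKey(String key, String val);
    void add(String key, String val, String doc);
}

// In-memory backend, used only to make the sketch runnable.
class MemoryBackend implements IndexBackend {
    final Map<String, String> docs = new LinkedHashMap<>();
    final List<String> log = new ArrayList<>();

    public void deleteByKey(String key, String val) {
        docs.remove(key + "=" + val);
        log.add("delete " + val);
    }

    public void add(String key, String val, String doc) {
        docs.put(key + "=" + val, doc);
        log.add("add " + val);
    }
}

class PendingUpdates {
    private final IndexBackend backend;
    // Pending work keyed by "key=val"; a null doc means "remove only".
    // A later call for the same (key, val) replaces the earlier one.
    private final Map<String, String[]> pending = new LinkedHashMap<>();

    PendingUpdates(IndexBackend backend) {
        this.backend = backend;
    }

    void update(String key, String val, String doc) {
        pending.put(key + "=" + val, new String[] {key, val, doc});
    }

    void remove(String key, String val) {
        pending.put(key + "=" + val, new String[] {key, val, null});
    }

    int size() {
        return pending.size();
    }

    void flush() {
        // Stage 1: delete every pending (key, val) so an update replaces the old copy.
        for (String[] p : pending.values()) {
            backend.deleteByKey(p[0], p[1]);
        }
        // Stage 2: add the queued documents; pure removals have nothing to add.
        for (String[] p : pending.values()) {
            if (p[2] != null) {
                backend.add(p[0], p[1], p[2]);
            }
        }
        pending.clear();
    }
}
```

Doing all deletes before any add means the reader and the writer never need to be open at the same time, which seems to be the point of the two stages.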



//----

So in normal usage you just keep calling update() and it periodically flushes the pending updates to the index. By its nature this uses memory, but the number of documents it queues in memory is tunable.
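
As a rough illustration of the two tunables from [1] (batch size and batch frequency), the "should I flush now?" decision might look like the sketch below. FlushPolicy and its method names are made up for this example, and the clock is passed in only so the timing can be exercised without waiting.

```java
import java.util.function.LongSupplier;

// Hypothetical policy object: decides when the queued updates should be flushed.
class FlushPolicy {
    private final int batchSize;        // flush once this many docs are queued
    private final long batchFreqMillis; // or once this much time has passed
    private final LongSupplier clock;   // injectable source of "now" in millis
    private long lastFlush;
    private int queued;

    FlushPolicy(int batchSize, long batchFreqMillis, LongSupplier clock) {
        this.batchSize = batchSize;
        this.batchFreqMillis = batchFreqMillis;
        this.clock = clock;
        this.lastFlush = clock.getAsLong();
    }

    // Record one queued update; returns true if the caller should flush now.
    boolean onQueued() {
        queued++;
        return shouldFlush();
    }

    // True when something is queued and either limit has been reached.
    boolean shouldFlush() {
        if (queued == 0) {
            return false;
        }
        return queued >= batchSize
            || clock.getAsLong() - lastFlush >= batchFreqMillis;
    }

    // Reset the counters after the caller has flushed.
    void onFlushed() {
        queued = 0;
        lastFlush = clock.getAsLong();
    }
}
```

With `new FlushPolicy(100, 60000L, System::currentTimeMillis)` this would match the "every 100 docs" and "every 60 sec" examples. Note that the sketch only checks the time when asked; to flush while no updates are arriving, something like a background timer would still have to call shouldFlush().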

Does the algorithm above, esp flush(), sound correct? It seems to work right for me and I can post this if people want to see it...

- Dave



Thanks in advance.

Cheers,

PA.


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]





