On Sat, 19 Mar 2005 22:43:44 +0300, Pasha Bizhan <[EMAIL PROTECTED]> wrote:
> Could you provide the code snippets for your process?
>
Sure (thanks for helping, btw).

I just realized that the way I described our process was a little off. Here's the process again:

1. grab all index Directories (index parts)
2. loop newest to oldest and make documents unique (by deleting
   older documents)
3. get list of documents from index parts to delete from our main index
4. delete documents from main index
5. add all documents from index parts into the main index

I apologize for the amount of code below.

Here is the code that loops through all the index parts, from newest to oldest, and deletes from every older part any document that also exists in the newer part. The unique ID we use as a Key Field is "ReceivedDate".

    IndexReader reader = null;
    IndexReader reader2 = null;
    try {
        /*
         *-------------------------------------------------------------
         * Loop backwards (latest to oldest) through parts
         *-------------------------------------------------------------
         */
        for ( int i = ( directories.length - 1 ); i >= 0; i-- ) {
            reader = IndexReader.open( FSDirectory.getDirectory( directories[i], false ) );
            // maxDoc() rather than numDocs(): numDocs() excludes deleted
            // documents, so it can be smaller than the highest document
            // number once an earlier pass has deleted from this part.
            int maxDoc = reader.maxDoc();

            /*
             *-------------------------------------------------------------
             * Loop forward (oldest to latest) up to the current part
             * being looked at.
             * Delete any messages from the older parts that exist in the
             * current part.
             *-------------------------------------------------------------
             */
            for ( int x = 0; x < i; x++ ) {
                String partName = directories[x].getName();
                reader2 = IndexReader.open( FSDirectory.getDirectory( directories[x], false ) );
                for ( int h = 0; h < maxDoc; h++ ) {
                    if ( !reader.isDeleted( h ) ) {
                        Document d = reader.document( h );
                        String receivedDate = d.get( "ReceivedDate" );
                        Term term = new Term( "ReceivedDate", receivedDate );
                        int num = reader2.delete( term );
                    }
                }
                reader2.close();
                reader2 = null;
            }
            reader.close();
            reader = null;
        }
    } catch ( Exception e ) {
        // log error
    } finally {
        try {
            if ( reader != null ) reader.close();
            if ( reader2 != null ) reader2.close();
        } catch ( IOException e ) {
            // log error
        }
    }

Here we build up a list of ReceivedDates to help us delete from the main.index. I just realized that we could build this list from the previous section.

    List list = new ArrayList();
    for ( int i = 0; i < directories.length; i++ ) {
        IndexReader r = null;
        try {
            r = IndexReader.open( directories[i] );
            // maxDoc() so that documents numbered past numDocs() are not
            // skipped when the part contains deletions
            int num = r.maxDoc();
            for ( int x = 0; x < num; x++ ) {
                if ( !r.isDeleted( x ) ) {
                    Map map = new HashMap();
                    Document d = r.document( x );
                    map.put( "ReceivedDate", d.get( "ReceivedDate" ) );
                    list.add( map );
                }
            }
        } catch ( Exception e ) {
            e.printStackTrace();
        } finally {
            if ( r != null ) try { r.close(); } catch ( Exception e ) {}
        }
    }
    return list;

Here we actually go through and delete the documents from the main index.

    IndexReader reader = null;
    Map message;
    try {
        reader = IndexReader.open( mainindex );
        Iterator it = indexList.iterator(); // returned from previous section

        /*
         *-------------------------------------------------------------
         * Loop through messages to clear from the index
         *-------------------------------------------------------------
         */
        while ( it.hasNext() ) {
            message = (Map)it.next();

            /*
             *-------------------------------------------------------------
             * Delete based on received date
             *-------------------------------------------------------------
             */
            String receivedDate = (String)message.get( "ReceivedDate" );
            Term term = new Term( "ReceivedDate", receivedDate );
            int num = reader.delete( term );
        }
        /*
         *-------------------------------------------------------------
         * End loop through messages to clear from the index
         *-------------------------------------------------------------
         */
    } catch ( Exception e ) {
        // log error
    } finally {
        try {
            if ( reader != null ) reader.close();
        } catch ( IOException e ) {
            // log close error
        }
    }

Now we loop through each index part and add each document separately, this time with most fields as Unstored.

    IndexReader r = null;
    try {
        /*
         *-------------------------------------------------------------
         * Open an index reader to an index part
         *-------------------------------------------------------------
         */
        r = IndexReader.open( directory );
        // maxDoc() rather than numDocs(): the part contains deletions by
        // now, and numDocs() would stop short of the last document numbers
        int num = r.maxDoc();

        /*
         *-------------------------------------------------------------
         * Loop through messages in the index part
         *-------------------------------------------------------------
         */
        for ( int i = 0; i < num; i++ ) {
            if ( !r.isDeleted( i ) ) {
                /*
                 *-------------------------------------------------------------
                 * Current document while looping through
                 *-------------------------------------------------------------
                 */
                Document d = r.document( i );
                if ( writer != null ) {
                    /*
                     *-------------------------------------------------------------
                     * New document to add into main index
                     *-------------------------------------------------------------
                     */
                    Document nd = new Document();

                    /*
                     *-------------------------------------------------------------
                     * Loop through document fields.
                     *-------------------------------------------------------------
                     */
                    for ( Enumeration e = d.fields(); e.hasMoreElements(); ) {
                        Field f = (Field)e.nextElement();
                        // re-add most fields as unstored
                    }
                    /*
                     *-------------------------------------------------------------
                     * End loop through document fields.
                     *-------------------------------------------------------------
                     */

                    /*
                     *-------------------------------------------------------------
                     * Add new document into main index.
                     *-------------------------------------------------------------
                     */
                    writer.addDocument( nd );
                }
            }
        }
        /*
         *-------------------------------------------------------------
         * End loop through messages in the index part
         *-------------------------------------------------------------
         */
    } finally {
        if ( r != null ) try { r.close(); } catch ( Exception e ) {}
    }

After that the index writer is closed.

Thanks for helping out!

Roy.
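
P.S. In case the Lucene calls obscure the intent, here is a minimal sketch of steps 1 to 3 using plain Java collections instead of index readers. It is only a model under assumptions, not our actual code: the class name PartMergeSketch and the method newestCopyPerKey are made up for illustration, each index part is assumed to be just a list of ReceivedDate strings with the oldest part first, and it uses generics and Map.putIfAbsent, which the snippets above do not.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartMergeSketch {

    /**
     * Hypothetical model of steps 1-3: walk the parts newest to oldest
     * and record, for each key, the index of the newest part holding it.
     * Copies of that key in older parts are the ones that would be deleted.
     */
    public static Map<String, Integer> newestCopyPerKey(List<List<String>> parts) {
        Map<String, Integer> survivors = new LinkedHashMap<>();
        for (int i = parts.size() - 1; i >= 0; i--) {
            for (String key : parts.get(i)) {
                // Only the first (newest) sighting of a key is kept.
                survivors.putIfAbsent(key, i);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Assumed sample data: three parts, oldest first.
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("b", "c"),
                Arrays.asList("c", "d"));
        System.out.println(newestCopyPerKey(parts)); // prints {c=2, d=2, b=1, a=0}
    }
}
```

The key set of the returned map is the same set of ReceivedDates that has to be deleted from the main index, which is why I think the second pass could be folded into the first.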