Check if Term present in Existing Index before Merging indexes from Directory.

2013-09-11 Thread Ankit Murarka

Hello

I have a peculiar problem to deal with, and I am sure there must be some 
way to handle it.


1. Indexes exist on the server for existing files.
2. Index generation is automated, so each newly generated file also 
gets an index generated for it.

3. I am merging the newly generated indexes and existing index.

The field of prime importance is fileName.

Merging is done with writer.addIndexes(Directory name), so if the same 
file is indexed again, it is added to the index twice, and a search 
returns more than one hit for the same file. The hits themselves are 
correct.


The problem is that the same file ends up indexed twice after merging.

I need to ensure that when I merge indexes, if a term, say File1, is 
already present, the existing entry is updated instead of a new one 
being added. This is supposed to happen during the indexing process.


Kindly advise how this can be achieved. The Javadoc does not seem to 
help me.


TIA.

--
Regards

Ankit Murarka

What lies behind us and what lies before us are tiny matters compared with what 
lies within us



Re: Check if Term present in Existing Index before Merging indexes from Directory.

2013-09-11 Thread Ian Lea
If you want to stick with the approach of multiple indexes you'll have
to add some logic to work round it.

Option 1.

Post merge, loop through all docs identifying duplicates and deleting
the one(s) you don't want.
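
The "identifying duplicates" step could look roughly like the sketch below. It is plain Java with no Lucene calls: it assumes you have already read the fileName value of every document in index order, and that a later document is the newer one (which depends on the order you merged in). The class and method names are made up for illustration, and how you then delete the superseded documents is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateFinder {

    /**
     * Given the fileName of each document in index order, returns the
     * positions of documents that are superseded by a later document
     * with the same fileName. The last occurrence of each name is kept.
     */
    static List<Integer> supersededPositions(List<String> fileNamesInDocOrder) {
        Map<String, Integer> latestPosition = new HashMap<>();
        List<Integer> superseded = new ArrayList<>();
        for (int i = 0; i < fileNamesInDocOrder.size(); i++) {
            // put() returns the previous position for this name, if any.
            Integer previous = latestPosition.put(fileNamesInDocOrder.get(i), i);
            if (previous != null) {
                superseded.add(previous);
            }
        }
        Collections.sort(superseded);
        return superseded;
    }
}
```

If the older copy should win instead, keep the first occurrence and collect the later positions.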


Option 2.

Pre merge, read all indexes in parallel, identifying and deleting as above.


Option 3.

When creating a new index, check the existing index first and delete
matches, or don't index the file, whichever makes sense.
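
A minimal sketch of the "delete matches" variant, done just before the merge. It assumes the indexing job can hand over the list of file names it has just indexed, and that fileName is indexed as a single untokenized term (for example a StringField), since deleting by Term only matches exact terms. The class name, method name and parameters are hypothetical; deleteDocuments, addIndexes and commit are the IndexWriter calls.

```java
import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

final class MergeWithoutDuplicates {

    /**
     * Removes any existing entries for the given file names from the main
     * index, then merges the newly generated index into it.
     */
    static void merge(IndexWriter mainWriter,
                      Directory newIndex,
                      Collection<String> newFileNames) throws IOException {
        for (String name : newFileNames) {
            // Drop the old entry (if any) for this file from the main index.
            mainWriter.deleteDocuments(new Term("fileName", name));
        }
        // The new index must not be open in another IndexWriter at this point.
        mainWriter.addIndexes(newIndex);
        mainWriter.commit();
    }
}
```

After the merge each file name should appear once, with the newly indexed copy replacing the old one.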


I'm sure there are other options as well, but no instant solutions.
One obvious option is to skip the merging altogether: if you want one
big index, why not just work directly with that, using updateDocument
with the file name as the Term?
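
A rough sketch of what that might look like, assuming a single IndexWriter on the big index and Lucene on the classpath. The class name, the method name and the "contents" field are made up for illustration; only fileName comes from the original question. fileName is added as a StringField so that it is indexed as one exact term, which is what the Term passed to updateDocument has to match.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

final class FileIndexer {

    static final String FILE_NAME_FIELD = "fileName";

    /**
     * Indexes one file. If a document with the same fileName already
     * exists it is deleted first, so the index keeps one entry per file.
     */
    static void indexFile(IndexWriter writer, String fileName, String contents)
            throws IOException {
        Document doc = new Document();
        // Untokenized, so the whole file name is a single term.
        doc.add(new StringField(FILE_NAME_FIELD, fileName, Field.Store.YES));
        // Hypothetical body field; replace with whatever you index today.
        doc.add(new TextField("contents", contents, Field.Store.NO));
        // Delete-then-add in one call, keyed on the file name.
        writer.updateDocument(new Term(FILE_NAME_FIELD, fileName), doc);
    }
}
```

Calling indexFile twice for File1 should leave a single File1 document in the index, so no merge step or duplicate clean-up is needed.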



--
Ian.


