RE: Concurrent searching & re-indexing
Ok, I will change my reindex method to delete all documents and then re-add them all, rather than using an IndexWriter to write a completely new index.

Thanks for the help on this, everyone.

Paul

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: 17 February 2005 22:26
To: Lucene Users List
Subject: Re: Concurrent searching & re-indexing

[...]

In the meantime, the safe way to rebuild an index from scratch while other processes are reading it is simply to delete all of its documents, then start adding new ones.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

_____________________________________________________________________
This e-mail has been scanned for viruses by MCI's Internet Managed Scanning Services - powered by MessageLabs. For further information visit http://www.mci.com

This e-mail and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the intended recipient, you should not copy, retransmit or use the e-mail and/or files transmitted with it and should not disclose their contents. In such a case, please notify [EMAIL PROTECTED] and delete the message from your own system. Any opinions expressed in this e-mail and/or files transmitted with it that do not relate to the official business of this company are those solely of the author and should not be interpreted as being endorsed by this company.
Re: Concurrent searching & re-indexing
Paul Mellor wrote:
> I've read from various sources on the Internet that it is perfectly safe to
> simultaneously search a Lucene index that is being updated from another
> Thread, as long as all write access to the index is synchronized. But does
> this apply only to updating the index (i.e. deleting and adding documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
[...]
> java.io.IOException: couldn't delete _a.f1
>     at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
[...]
> This is running on Windows 2000.

On Windows one cannot delete a file while it is still open. So, no, on Windows one cannot remove an index entirely while an IndexReader or Searcher is still open on it, since it is simply impossible to remove all the files in the index.

We might attempt to patch this by keeping a list of such files and attempting to delete them later (as is done when updating an index). But this could cause problems, as a new index will eventually try to use these same file names again, and it would then conflict with the open IndexReader. This is not a problem when updating an existing index, since filenames (except for a few which are not kept open, like "segments") are never reused in the lifetime of an index. So, in order for such a fix to work we would need to switch to globally unique segment names, e.g., long random strings, rather than increasing integers.

In the meantime, the safe way to rebuild an index from scratch while other processes are reading it is simply to delete all of its documents, then start adding new ones.

Doug
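
Doug's point about file names can be illustrated with a small stand-alone model. The sketch below is not Lucene code: the class SegmentNames and its methods are made up for illustration, and the base-36 counter is only assumed to resemble the "increasing integers" scheme he mentions. It shows that two indexes created from scratch with a counter end up using the same names (including "_a", the prefix seen in the exception above), while long random names, here taken from java.util.UUID, are not expected to repeat.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Simplified model of segment naming; an illustration, not Lucene's actual code. */
final class SegmentNames {

    /** Counter-based naming: every newly created index starts counting from 1 again. */
    static List<String> counterNames(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            // Base 36 is an assumption made so that the tenth name comes out as "_a".
            names.add("_" + Integer.toString(i, Character.MAX_RADIX));
        }
        return names;
    }

    /** Hypothetical alternative: long random names that should not repeat across rebuilds. */
    static List<String> uniqueNames(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("_" + UUID.randomUUID().toString().replace("-", ""));
        }
        return names;
    }

    /** Names used both by the old index (possibly still open in a reader) and by the new one. */
    static Set<String> reused(List<String> oldIndex, List<String> newIndex) {
        Set<String> overlap = new HashSet<>(oldIndex);
        overlap.retainAll(newIndex);
        return overlap;
    }

    public static void main(String[] args) {
        System.out.println("counter names reused after re-create: "
                + reused(counterNames(10), counterNames(10)).size());
        System.out.println("random names reused after re-create:  "
                + reused(uniqueNames(10), uniqueNames(10)).size());
    }
}
```

With counter names, every file name of the re-created index collides with a name the old reader may still hold open, which is the conflict a deferred-delete list would run into.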
RE: Concurrent searching & re-indexing
On Thu, 2005-02-17 at 04:44, Paul Mellor wrote:
> "on windows you cannot delete open files, so Lucene AFAIK (I don't use
> windows) postpones the deletion to a time, when the file is closed"
>
> If Lucene does not in fact postpone the deletion, that would explain the
> exception I'm seeing ("java.io.IOException: couldn't delete _a.f1") - the
> IndexWriter is attempting to delete the files but the IndexReader has them
> open.
>
> Does this then mean that re-indexing whilst searching is inherently unsafe,
> but only on Windows?

Using Lucene 1.3 final, I ran across what I believe to be this problem. Under heavy load on Windows, deleting the segments file would sometimes fail. I tried to duplicate the problem with an attached debugger, but I was unable to do so.

There are more details about my problem in this message:

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=11986

Any advice would still be appreciated. Currently, I'm catching the error and doing a retry in the finally block, but I am not confident in this solution due to the difficulty of reproducing the problem.

Regards,
Luke Francl
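
A catch-and-retry workaround of the kind Luke describes could look roughly like the following. This is a sketch only: RetryingIo, IoAction and runWithRetry are hypothetical names, and the attempt limit and pause are arbitrary choices rather than values from this thread. It bounds the number of attempts so that a permanent failure still surfaces instead of looping forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: retries an I/O action a bounded number of times. */
final class RetryingIo {

    /** An action that may fail with an IOException, e.g. an index operation. */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action until it succeeds or maxAttempts is reached.
     * Returns the number of attempts that were needed.
     */
    static int runWithRetry(IoAction action, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    // Give up and report the last failure to the caller.
                    throw new UncheckedIOException("gave up after " + attempt + " attempts", e);
                }
            }
            try {
                Thread.sleep(pauseMillis); // give the other thread time to close its files
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that fails twice before succeeding.
        int attempts = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("couldn't delete (simulated)");
            }
        }, 5, 10L);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

A retry like this only papers over the race between the writer and an open reader; it does not remove it, which matches Luke's own reservation about the approach.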
Re: Concurrent searching & re-indexing
It failed for me on Linux.

Paul Mellor wrote:
> "on windows you cannot delete open files, so Lucene AFAIK (I don't use
> windows) postpones the deletion to a time, when the file is closed"
>
> If Lucene does not in fact postpone the deletion, that would explain the
> exception I'm seeing ("java.io.IOException: couldn't delete _a.f1") - the
> IndexWriter is attempting to delete the files but the IndexReader has them
> open.
>
> Does this then mean that re-indexing whilst searching is inherently unsafe,
> but only on Windows?
Re: Concurrent searching & re-indexing
Hi, Paul,

I brought this point up a while back and didn't get a response. I've found that I frequently get a "file not found" exception when searching at the same time as an indexing and/or optimize operation is running. I fixed it by trapping the exception and looping until it didn't fail.

Jim.

Paul Mellor wrote:
> Otis,
>
> 1. If IndexReader takes a snapshot of the index state when opened and then
> reads the files when searching, what would happen if the files it takes a
> snapshot of are deleted before the search is performed (as would happen with
> a reindexing in the period between opening an IndexSearcher and using it to
> search)?
RE: Concurrent searching & re-indexing
"on windows you cannot delete open files, so Lucene AFAIK (I don't use windows) postpones the deletion to a time, when the file is closed"

If Lucene does not in fact postpone the deletion, that would explain the exception I'm seeing ("java.io.IOException: couldn't delete _a.f1") - the IndexWriter is attempting to delete the files but the IndexReader has them open.

Does this then mean that re-indexing whilst searching is inherently unsafe, but only on Windows?

-----Original Message-----
From: Morus Walter [mailto:[EMAIL PROTECTED]]
Sent: 17 February 2005 10:38
To: Lucene Users List
Subject: RE: Concurrent searching & re-indexing

[...]
RE: Concurrent searching & re-indexing
Paul Mellor writes:
>
> 1. If IndexReader takes a snapshot of the index state when opened and then
> reads the files when searching, what would happen if the files it takes a
> snapshot of are deleted before the search is performed (as would happen with
> a reindexing in the period between opening an IndexSearcher and using it to
> search)?
>
On unix, open files are still there even if they are deleted (that is, there is no link (filename) to the file anymore, but the file's content still exists); on windows you cannot delete open files, so Lucene AFAIK (I don't use windows) postpones the deletion to a time, when the file is closed.

> 2. Does a similar potential problem exist when optimising an index, if this
> combines all the segments into a single file?
>
AFAIK optimising creates new files. The only problem that might occur is opening a reader during an index change, but that's handled by a lock.

HTH
Morus
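
The difference Morus describes can be probed with plain java.io, outside Lucene. In this sketch (DeleteWhileOpen, Result and probe are illustrative names, not part of any library), a temporary file is opened, a delete is attempted while the stream is still open, and the stream is read afterwards. On Unix-like systems the delete is expected to succeed, and on Windows with a FileInputStream it is expected to be refused. In both cases the open stream should still return the file's content.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Probes what happens when a file is deleted while a stream is still open on it. */
final class DeleteWhileOpen {

    /** Whether the delete succeeded, and what the open stream returned afterwards. */
    static final class Result {
        final boolean deleted;
        final String content;

        Result(boolean deleted, String content) {
            this.deleted = deleted;
            this.content = content;
        }
    }

    static Result probe() {
        byte[] data = "still readable".getBytes(StandardCharsets.UTF_8);
        File file = null;
        try {
            file = File.createTempFile("segment", ".tmp");
            try (FileOutputStream out = new FileOutputStream(file)) {
                out.write(data);
            }
            try (FileInputStream in = new FileInputStream(file)) {
                // Attempt the delete while the input stream is still open.
                boolean deleted = file.delete();
                byte[] buf = new byte[data.length];
                int off = 0;
                while (off < buf.length) {
                    int n = in.read(buf, off, buf.length - off);
                    if (n < 0) {
                        break;
                    }
                    off += n;
                }
                return new Result(deleted, new String(buf, 0, off, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                file.delete(); // clean up in case the first delete was refused
            }
        }
    }

    public static void main(String[] args) {
        Result r = probe();
        System.out.println("deleted while open: " + r.deleted);
        System.out.println("read after delete attempt: " + r.content);
    }
}
```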
RE: Concurrent searching & re-indexing
Otis,

Looking at your reply again, I have a couple of questions -

"IndexSearcher (IndexReader, really) does take a snapshot of the index state when it is opened, so at that time the index segments listed in segments should be in a complete state. It also reads index files when searching, of course."

1. If IndexReader takes a snapshot of the index state when opened and then reads the files when searching, what would happen if the files it takes a snapshot of are deleted before the search is performed (as would happen with a reindexing in the period between opening an IndexSearcher and using it to search)?

2. Does a similar potential problem exist when optimising an index, if this combines all the segments into a single file?

Many thanks

Paul

-----Original Message-----
From: Paul Mellor [mailto:[EMAIL PROTECTED]]
Sent: 16 February 2005 17:37
To: 'Lucene Users List'
Subject: RE: Concurrent searching & re-indexing

[...]

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: 16 February 2005 17:30
To: Lucene Users List
Subject: Re: Concurrent searching & re-indexing

[...]
RE: Concurrent searching & re-indexing
But all write access to the index is synchronized, so that although multiple threads are creating an IndexWriter for the same directory and using it to totally recreate that index, only one thread is doing this at once.

I was concerned about the safety of using an IndexSearcher to perform queries on an index that is in the process of being recreated from scratch, but I guess that if the IndexSearcher takes a snapshot of the index when it is created (and in my code this creation is synchronized with the write operations as well so that the threads wait for the write operations to finish before instantiating an IndexSearcher, and vice versa) this can't be a problem.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: 16 February 2005 17:30
To: Lucene Users List
Subject: Re: Concurrent searching & re-indexing

[...]
Re: Concurrent searching & re-indexing
Hi Paul,

If I understand your setup correctly, it looks like you are running multiple threads that create IndexWriter for the same directory. That's a "no no".

This section (first hit) describes the various concurrency issues with regard to adds, updates, optimization, and searches:

http://www.lucenebook.com/search?query=concurrent

IndexSearcher (IndexReader, really) does take a snapshot of the index state when it is opened, so at that time the index segments listed in segments should be in a complete state. It also reads index files when searching, of course.

Otis

--- Paul Mellor <[EMAIL PROTECTED]> wrote:
> [...]
> I have a class which encapsulates all access to my index, so that writes can
> be synchronized. This class also exposes a method to obtain an
> IndexSearcher for the index. I'm running unit tests to test this which
> create many threads - each thread does a complete re-indexing and then
> obtains an IndexSearcher and does a query.
> [...]
Concurrent searching & re-indexing
Hi,

I've read from various sources on the Internet that it is perfectly safe to simultaneously search a Lucene index that is being updated from another Thread, as long as all write access to the index is synchronized. But does this apply only to updating the index (i.e. deleting and adding documents), or to a complete re-indexing (i.e. create a new IndexWriter with the 'create' argument true and then re-add all the documents)?

I have a class which encapsulates all access to my index, so that writes can be synchronized. This class also exposes a method to obtain an IndexSearcher for the index. I'm running unit tests to test this which create many threads - each thread does a complete re-indexing and then obtains an IndexSearcher and does a query.

I'm finding that with sufficiently high numbers of threads, I'm getting the occasional failure, with the following exception thrown when attempting to construct a new IndexWriter (during the reindexing) -

java.io.IOException: couldn't delete _a.f1
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:135)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
    ...

The exception occurs quite infrequently (usually for somewhere between 1-5% of the Threads).

Does the IndexSearcher take a 'snapshot' of the index at creation? Or does it access the filesystem whilst searching? I am also synchronizing creation of the IndexSearcher with the write lock, so that the IndexSearcher is not created whilst the index is being recreated (and vice versa). But do I need to ensure that the IndexSearcher cannot search whilst the index is being recreated as well?

Note that a similar unit test where the threads update the index (rather than recreate it from scratch) works fine, as expected.

This is running on Windows 2000.

Any help would be much appreciated!

Paul
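
One way to picture the answer to Paul's last question: since a searcher keeps reading index files after it has been created, guarding only its creation is not enough when the index is destructively re-created. The sketch below is a hypothetical bookkeeping class (IndexGate and its methods are invented here and are not part of Lucene) that refuses a re-create while any searcher is still open, and refuses new searchers while a re-create is running. It should not be needed for the delete-all-then-re-add approach recommended in the replies.

```java
/** Hypothetical gate that keeps destructive re-creation and open searchers apart. */
final class IndexGate {
    private int openSearchers = 0;
    private boolean recreating = false;

    /** Call before opening a searcher; returns false while a re-create is running. */
    synchronized boolean tryOpenSearcher() {
        if (recreating) {
            return false;
        }
        openSearchers++;
        return true;
    }

    /** Call after the searcher (and its underlying reader) has been closed. */
    synchronized void closeSearcher() {
        if (openSearchers == 0) {
            throw new IllegalStateException("no searcher is open");
        }
        openSearchers--;
    }

    /** Re-creation may only begin when no searcher holds index files open. */
    synchronized boolean tryBeginRecreate() {
        if (recreating || openSearchers > 0) {
            return false;
        }
        recreating = true;
        return true;
    }

    /** Call once the new index has been completely written. */
    synchronized void endRecreate() {
        recreating = false;
    }

    public static void main(String[] args) {
        IndexGate gate = new IndexGate();
        gate.tryOpenSearcher();
        System.out.println("re-create allowed with a searcher open: " + gate.tryBeginRecreate());
        gate.closeSearcher();
        System.out.println("re-create allowed after it is closed:   " + gate.tryBeginRecreate());
        gate.endRecreate();
    }
}
```

The non-blocking try methods are a design choice made to keep the sketch short; a caller would have to decide whether to wait, retry, or skip when a call returns false.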