Re: Disk space used by optimize
Bernhard Messer writes:
> > However, three times the space sounds a bit too much, or I made a mistake in the book. :)
> There already was a discussion about disk usage during index optimize. Please have a look at the developers list:
> http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
> where I made some measurements of the disk usage within Lucene. At that time I proposed a patch which reduced the total disk usage from 3 times to a little more than 2 times the final index size. Together with Christoph we implemented some improvements to the optimization patch and finally committed the changes.

Hmm. In the case that the index is in use (an open reader), I doubt your patch makes a difference. In that case the disk space used by the non-optimized index will still be held even after the files are deleted (on Unix/Linux).

What happens if disk space runs out during creation of the compound index? Will the non-compound files still form a usable index? Otherwise you risk losing the index.

Morus
Re[2]: Disk space used by optimize
Hello, Doug.

There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times the space, because the *.cfs file needs to be unpacked. Now I use the non-compound file format; it needs about twice as much disk space.

> Perhaps we should add something to the javadocs noting this?

Sure. I was a bit confused about optimizing the compound file format because I had no info about space usage during optimize. More info in the javadocs will save somebody's time :)

Yura Smolsky
Re: Optimize not deleting all files
Hi all,

We have the same problem. We guess the cause is that Windows locks the files. Our environment: Windows 2000, Tomcat 5.5.4.

Ernesto.

[EMAIL PROTECTED] wrote:
> Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct? We are running Lucene 1.4.2 on Windows. Any help is appreciated. Thanks!
Re: Disk space used by optimize - no space on disk corrupts index.
Hi all,

We have a big index and little disk space. When we optimize and all space is consumed, our index is corrupted: the segments file points to nonexistent files.

Environment: Java 1.4.2_04, Windows 2000 SP4, Tomcat 5.5.4.

Bye, Ernesto.

Yura Smolsky wrote:
> There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times the space, because the *.cfs file needs to be unpacked. Now I use the non-compound file format; it needs about twice as much disk space.
Re: Disk space used by optimize
> However, three times the space sounds a bit too much, or I made a mistake in the book. :)

There already was a discussion about disk usage during index optimize. Please have a look at the developers list:

http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569

where I made some measurements of the disk usage within Lucene. At that time I proposed a patch which reduced the total disk usage from 3 times to a little more than 2 times the final index size. Together with Christoph we implemented some improvements to the optimization patch and finally committed the changes.

Bernhard
Re: Optimize not deleting all files
Get and try Lucene 1.4.3. One of the older versions had a bug that left old index files undeleted.

Otis

--- [EMAIL PROTECTED] wrote:
> When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct? We are running Lucene 1.4.2 on Windows.
Re: Optimize not deleting all files
Ernesto, what version of Lucene are you running?

--- Lucene Users List lucene-user@jakarta.apache.org wrote:
> Hi all, We have the same problem. We guess the cause is that Windows locks the files. Our environment: Windows 2000, Tomcat 5.5.4. Ernesto.
Re: Optimize not deleting all files
Hi all, I'll answer on behalf of Ernesto. Our environment is: Lucene 1.4.2, Tomcat 5.5.4, Java 1.4.2_04, Windows 2000 SP4.

--p

[EMAIL PROTECTED] wrote:
> Ernesto, what version of Lucene are you running?
Re: Optimize not deleting all files
Hi Otis, we tried version 1.4.3 without success; old index files still remain in the directory. We also tried not calling optimize() and still get the same behaviour, so maybe our problem is not related to the optimize() call at all.

--p

Otis Gospodnetic wrote:
> Get and try Lucene 1.4.3. One of the older versions had a bug that left old index files undeleted.
Re: Optimize not deleting all files
Hi Patricio,

Is it the case that the old index files are not removed from session to session, or only within the same session? The discussion below pertains to the latter case, that is, where the old index files are used in the same process as the files replacing them.

I was having a similar problem, and tracked the source down to IndexReaders not being closed in my application. As far as I can tell, in order for IndexReaders to present a consistent view of an index while changes are being made to it, read-only copies of the index are kept around until all IndexReaders using them are closed. If any IndexReaders are open on the index, IndexWriters first make a copy, then operate on the copy. If you track down all of these open IndexReaders and close them before optimization, all of the old index files should be deleted. (Lucene gurus, please correct this if I have misrepresented the situation.)

In my application, I had a bad interaction between IndexReader caching, garbage collection, and incremental indexing, in which a new IndexReader was being opened on the index after each indexing increment, without closing the already-opened IndexReaders. On Windows, operating-system-level file locking caused by the IndexReaders left open was disallowing index re-creation, because the IndexWriter wasn't allowed to delete the index files opened by the abandoned IndexReaders.

In short, if you need to write to an index more than once in a single session, be sure to keep careful track of your IndexReaders.

Hope it helps,
Steve

Patricio Keilty wrote:
> Hi Otis, we tried version 1.4.3 without success; old index files still remain in the directory. We also tried not calling optimize() and still get the same behaviour, so maybe our problem is not related to the optimize() call at all.
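To make Steve's advice concrete, here is a minimal sketch of the idiom against the Lucene 1.4 API (the class and field names are invented for illustration, not taken from his application): hold one shared IndexReader, and always close it before a writer touches the same index.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    // Hypothetical reader cache: one shared reader per index, closed before
    // any write or optimize pass so the OS-level file locks are released
    // and the old segment files can actually be deleted.
    public class ReaderCache {
        private final String indexPath;
        private IndexReader reader;

        public ReaderCache(String indexPath) { this.indexPath = indexPath; }

        public synchronized IndexReader get() throws IOException {
            if (reader == null) {
                reader = IndexReader.open(indexPath);
            }
            return reader;
        }

        // Call before opening an IndexWriter for re-indexing or optimize().
        public synchronized void release() throws IOException {
            if (reader != null) {
                reader.close();  // releases the files it was holding open
                reader = null;
            }
        }
    }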
Re: Optimize not deleting all files
Yes, I believe my problem is related to open IndexReaders. The issue is that we can't shut down our live search application while we wait for a 10-minute optimization. Search is a major part of our application and removing the feature would significantly affect our end users (even though we run the optimize during the night).

After the optimize is completed, I close and re-open the readers so they start reading from the new index files. I'm thinking of adding code to delete all the old files at that point; I presume they will no longer be locked.
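A sketch of that nightly sequence against Lucene 1.4 (the method and parameter names are illustrative, not from the poster's code): optimize, then swap the searcher so queries move to the new segment files and stop pinning the old ones.

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;

    // Nightly maintenance: optimize, then swap the searcher so live queries
    // read the new segment files and the old ones become deletable.
    IndexSearcher nightlyOptimize(String indexPath, IndexSearcher current)
            throws IOException {
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.optimize();   // may run for minutes on a large index
        writer.close();

        IndexSearcher fresh = new IndexSearcher(indexPath);
        current.close();     // old files are released and can now be deleted
        return fresh;        // callers switch their queries to this searcher
    }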
Optimize not deleting all files
Hi,

When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct?

We are running Lucene 1.4.2 on Windows. Any help is appreciated. Thanks!
Re: Optimize not deleting all files
Your understanding is right! The old existing files should be deleted, but it will build new files.

On Thu, 03 Feb 2005 17:36:27 -0800 (PST), [EMAIL PROTECTED] wrote:
> When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct?
Re: Disk space used by optimize
Yura Smolsky wrote:
> There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times the space, because the *.cfs file needs to be unpacked. Now I use the non-compound file format; it needs about twice as much disk space.

Perhaps we should add something to the javadocs noting this?

Doug
Re[2]: Disk space used by optimize
Hello, Otis.

There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times the space, because the *.cfs file needs to be unpacked. Now I use the non-compound file format; it needs about twice as much disk space.

Otis Gospodnetic wrote:
> Have you tried using the multifile index format? Now I wonder if there is actually a difference in disk space consumed by optimize() when you use the multifile and compound index formats...

Yura Smolsky
Re: Disk space used by optimize
Morus, that description of 3 sets of index files is what I was imagining, too. I'll have to test and add to the book errata, it seems.

Thanks for the info,
Otis

--- Morus Walter [EMAIL PROTECTED] wrote:
> > However, three times the space sounds a bit too much, or I made a mistake in the book. :)
> I cannot explain why, but roughly three times the size of the final index is what I observed when I logged disk usage during optimize of an index in compound format. The test was on Linux; I simply ran 'du -s' every few seconds in parallel with the optimize. I didn't test the non-compound format.
>
> Probably optimizing a compound-format index requires storing the different parts of the compound file separately before joining them into the compound file (sounds reasonable, since otherwise you would need to know the sizes before creating the parts). In that case the disk usage peak would be the original index, plus the separate files, plus the new compound file. So IMHO the book is wrong.
>
> Morus
Disk space used by optimize
Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down to three files of about 47MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!

Regards, Leto
Re: Disk space used by optimize
Hello,

Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it. See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a mistake in the book. :) You said you end up with 3 files; .cfs is one of them, right?

Otis

--- Kauler, Leto S wrote:
> Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?
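For reference, the operation under discussion is only a few lines against the Lucene 1.4 API (the path and method wrapper are illustrative); the transient disk cost is the segment copy Otis describes, plus the unpacking of the .cfs when the compound format is in use:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    void optimizeIndex(String path) throws IOException {
        // false = open the existing index rather than create a new one
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        writer.setUseCompoundFile(true);  // compound (.cfs) format: about 3x peak
                                          // disk during optimize, per this thread,
                                          // vs. roughly 2x for the multifile format
        writer.optimize();                // copies all segments into one
        writer.close();
    }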
RE: Disk space used by optimize
Our copy of LIA is in the mail ;)

Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).

--Leto

Otis Gospodnetic wrote:
> Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it. You said you end up with 3 files; .cfs is one of them, right?
RE: Disk space used by optimize
Have you tried using the multifile index format? Now I wonder if there is actually a difference in disk space consumed by optimize() when you use the multifile and compound index formats...

Otis

--- Kauler, Leto S wrote:
> Our copy of LIA is in the mail ;) Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).
Re: how often to optimize?
> Are non-optimized indices causing you any problems (e.g. slow searches, a high number of open file handles)? If not, then you don't even need to optimize until those issues become... issues.

OK, I have changed the process to not call optimize() at all. So far so good. The number of files hovers between 10 and 40 during the indexing of 10,000 files. It seems Lucene is doing some kind of self-maintenance to keep things in order.

Is it right to say optimize() is a totally optional operation? I probably got the impression it is a natural step to end an incremental update from the IndexHTML example. Since it replicates the whole index, it might be overkill for many applications to do daily.
Re: how often to optimize?
Correct. The self-maintenance you are referring to is Lucene's periodic segment merging. The frequency of that can be controlled through IndexWriter's mergeFactor.

Otis

--- aurora wrote:
> Is it right to say optimize() is a totally optional operation?
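A sketch of those knobs on Lucene 1.4's IndexWriter (the path and values shown are only examples): mergeFactor controls how many segments of a size level accumulate before a merge, and minMergeDocs how many documents are buffered before a segment is written.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.mergeFactor = 10;     // merge once 10 segments of the same size level exist
    writer.minMergeDocs = 1000;  // buffer this many docs in RAM before writing a segment
    // ... addDocument() calls; merging happens automatically as segments accumulate
    writer.close();              // no optimize() call is required for correctness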
how often to optimize?
Right now I am incrementally adding about 100 documents to the index a day and then optimizing after that. I find that optimize essentially rebuilds the entire index into a single file, so the size of the disk write is proportional to the total index size, not to the size of the documents incrementally added.

So my question is: would it be overkill to optimize every day? Is there any guideline on how often to optimize? Every 1000 documents or more? Every week? Is there any concern if a lot of documents are added without optimizing?

Thanks.
Re: how often to optimize?
Hello,

I think some of these questions may be answered in the jGuru FAQ.

> So my question is: would it be overkill to optimize every day?

Only if lots of documents are being added/deleted and you end up with a lot of index segments.

> Is there any guideline on how often to optimize? Every 1000 documents or more?

Are non-optimized indices causing you any problems (e.g. slow searches, a high number of open file handles)? If not, then you don't even need to optimize until those issues become... issues.

> Every week? Is there any concern if a lot of documents are added without optimizing?

Possibly, see my answer above.

Otis
Re: finalize delete without optimize
Hello John,

Once you make your change locally, use 'cvs diff -u IndexWriter.java > indexwriter.patch' to make a patch. Then open a new Bugzilla entry and attach your patch to it. Note that document deletion is actually done from IndexReader, so your patch may have to be against IndexReader, not IndexWriter.

Thanks,
Otis

--- John Wang wrote:
> Hi Otis: Thanks for your reply. I am looking for more of an API call than a tool, e.g. IndexWriter.finalizeDelete(). If I implement this, how would I go about submitting a patch?
Re: finalize delete without optimize
Hello John,

I believe you didn't get any replies to this. What you are describing cannot be done using the public API, but may (no source code on this machine, so I can't double-check) be doable if you use some of the 'internal' methods. I don't have the need for this, but others might, so it may be worth developing a tool that purges Documents marked as deleted without the expensive segment merging, if that is indeed possible. If you put this tool under the appropriate org.apache.lucene... package, you'll get access to the 'internal' methods, of course.

If you end up creating this, we could stick it in the Sandbox, where we should really create a new section for handy command-line tools that manipulate the index.

Otis

--- John Wang wrote:
> Is there a way to finalize deletes, e.g. actually remove them from the segments and make sure the docIDs are contiguous again? The only explicit way to do this is by calling IndexWriter.optimize(), but this call does a lot more (it also merges all the segments), hence is very expensive. Is there a way to simply finalize the deletes without having to merge all the segments?
Re: finalize delete without optimize
Hi Otis:

Thanks for your reply. I am looking for more of an API call than a tool, e.g. IndexWriter.finalizeDelete(). If I implement this, how would I go about submitting a patch?

thanks
-John

On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic wrote:
> What you are describing cannot be done using the public API, but may be doable if you use some of the 'internal' methods. I don't have the need for this, but others might, so it may be worth developing a tool that purges Documents marked as deleted without the expensive segment merging, if that is indeed possible.
RE: finalize delete without optimize
Lucene's standard API does not support this kind of operation.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: John Wang
Sent: Wednesday, December 08, 2004 17:32
Subject: Re: finalize delete without optimize

Hi folks: I sent this out a few days ago without a response. Please help. Thanks in advance.

-John

On Mon, 6 Dec 2004 21:15:00 -0800, John Wang wrote:
> Is there a way to finalize deletes, e.g. actually remove them from the segments and make sure the docIDs are contiguous again? The only explicit way to do this is by calling IndexWriter.optimize(), but this call does a lot more (it also merges all the segments), hence is very expensive.
finalize delete without optimize
Hi:

Is there a way to finalize deletes, e.g. actually remove them from the segments and make sure the docIDs are contiguous again?

The only explicit way to do this is by calling IndexWriter.optimize(), but this call does a lot more (it also merges all the segments), hence is very expensive. Is there a way to simply finalize the deletes without having to merge all the segments?

If not, I'd be glad to submit an implementation of this feature if the Lucene devs agree it is useful.

Thanks
-John
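For contrast, a sketch of the only path the public Lucene 1.4 API offers (the field name and value in the Term are illustrative): deletions are marked through IndexReader and physically expunged only by the full merge John wants to avoid.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // Mark documents deleted; they still occupy space and leave docID gaps.
    IndexReader reader = IndexReader.open("/path/to/index");
    reader.delete(new Term("id", "doc-42"));
    reader.close();

    // The deletions are only physically removed by optimize(), which also
    // merges every segment: the expensive part.
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.optimize();
    writer.close();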
Re: JDBCDirectory to prevent optimize()?
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote:
> I'm wondering about the potential for a generic JDBCDirectory for keeping the lucene index within a database.

Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I don't know about its scalability.

Regards
Daniel

--
http://www.danielnaber.de
Re: JDBCDirectory to prevent optimize()?
Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB.

Erik

On Nov 22, 2004, at 6:06 PM, Kevin A. Burton wrote:
> It seems that when compared to other datastores, Lucene starts to fall down. For example, Lucene doesn't perform online index optimizations, so if you add 10 documents you have to run optimize() again, and this isn't exactly a fast operation. I'm wondering about the potential for a generic JDBCDirectory for keeping the Lucene index within a database.
Re: JDBCDirectory to prevent optimize()?
Erik Hatcher wrote:
> Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB.

I assume this would prevent prefix queries from working...

Kevin
Re: JDBCDirectory to prevent optimize()?
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote:
> Erik Hatcher wrote:
> > Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB.
> I assume this would prevent prefix queries from working...

Huh? Why would you assume that? As far as I know, and I've tested this some, a Lucene index inside Berkeley DB works the same as if it had been in RAM or on the filesystem.

Erik
JDBCDirectory to prevent optimize()?
It seems that when compared to other datastores, Lucene starts to fall down. For example, Lucene doesn't perform online index optimizations, so if you add 10 documents you have to run optimize() again, and this isn't exactly a fast operation.

I'm wondering about the potential for a generic JDBCDirectory for keeping the Lucene index within a database. It sounds somewhat unconventional but would allow you to perform live addDirectory updates without performing an optimize() again.

Has anyone looked at this? How practical would it be?

Kevin
using optimize and addDocument concurrently.
Hi,

My basic question is whether it is possible to continue to add documents to an index in one thread while running a long-running optimization of the index (approx 30 mins) in another thread. I'm using Lucene version 1.4.2.

The concurrency matrix at http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the same IndexWriter object you can do concurrent writes and optimization. When I try it in my program, the addDocument calls wait until the optimization has finished; so in this respect it is thread-safe, but the operations cannot be performed at the same time.

Our problem is that the index needs to be continually kept up to date with new news articles, but also needs to be regularly optimized to keep it fast. If I cannot update and optimize one index at the same time, the best way I can see of doing this is maintaining multiple identical indexes and offlining, optimizing, letting them catch up to date, and re-onlining them. Does that sound best to you?

Thanks a lot in advance
Steve
RE: using optimize and addDocument concurrently.
Steve,

The behavior that you describe is as expected. I have tackled a similar problem by creating a proxy object that acts as a gatekeeper for all IndexReader, IndexSearcher and IndexWriter operations. With fully synchronized access to all methods of the proxy you will not run into any problems. Every time I need to perform something with the writer, I close the searcher, etc. As for regular optimization, I tend to reindex now and again with a completely separate writer and replace the index by moving it to the new location. This, BTW, has also become a method in my proxy object.

Hope this helps,
Cheers,
Aad

> My basic question is whether it is possible to continue to add documents to an index in one thread while running a long-running optimization of the index (approx 30 mins) in another thread.
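A skeletal version of the gatekeeper Aad describes, assuming Lucene 1.4 (the class and all names are invented for illustration): every searcher and writer operation passes through one synchronized object, so a write never overlaps an open searcher.

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    // Hypothetical proxy: fully synchronized access to a single index.
    public class IndexProxy {
        private final String path;
        private IndexSearcher searcher;

        public IndexProxy(String path) { this.path = path; }

        public synchronized Hits search(Query q) throws IOException {
            if (searcher == null) searcher = new IndexSearcher(path);
            return searcher.search(q);
        }

        public synchronized void add(Document doc) throws IOException {
            if (searcher != null) { searcher.close(); searcher = null; }  // release the index
            IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
            writer.addDocument(doc);
            writer.close();  // the next search() re-opens a fresh searcher
        }
    }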
IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previous stored document
Crump, Michael wrote:
> You have to close the IndexReader after doing the delete, before opening the IndexWriter for the addition. See the information at this link: http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex

Recently I thought I observed that if I use this batch update idiom (first delete the changed docs, then add them), IndexReader.close() does not flush/commit the deletions - rather, IndexWriter.optimize() does. I may have been confused and should retest this, but regardless, the javadoc seems unclear. close() says it *saves* deletions to disk. What does it mean to save a deletion? Save a pending one, or commit it (commit = really delete it)?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()

Also, optimize() doesn't mention deletions:

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#optimize()

Suggestion: could the word "save" in the close() javadoc be elaborated on, and could optimize() get another comment regarding its effect on deletions?

thx,
Dave

-----Original Message-----
From: Paul Williams
Subject: problem with locks when updating the data of a previously stored document

Hi,

Using lucene-1.4.1.jar on WinXP, I am having trouble with locking when updating an existing Lucene document. I delete the old document from the index and then add the new document to the index writer. I am using minMergeDocs set to 100 (much quicker!!) and close the writer once the batch is done, so the documents are flushed to the filesystem.

The problem I am having is that I can't delete the old version of the document (after the first document has been added) using reader.delete(), because there is a lock on the index due to the IndexWriter being open. Am I doing this wrong, or is there a simple way round this?

Regards,
Paul

Code snippets of the update code (relevant lines cut and pasted from my app to give an idea):

    reader = IndexReader.open(location);
    // Delete old doc/term if present
    if (reader.docFreq(docNumberTerm) > 0) {
        reader.delete(docNumberTerm);
    }
    ...
    IndexWriter writer = null;
    // Get the writer from the hash table so the last few are cached
    // and don't have to be restarted
    synchronized (IndexWriterCache) {
        String dbstring = "" + ldb;
        writer = (IndexWriter) IndexWriterCache.get(dbstring);
        if (writer == null) {
            // Not in cache, so create one and add it to the cache for next time
            writer = new IndexWriter(location, new StandardAnalyzer(), new_index);
            writer.setUseCompoundFile(true);
            // Set the maximum number of entries per field. Default is 10,000
            writer.maxFieldLength = MaxFieldCount;
            // Set how many docs will be stored in memory before being saved to disk
            writer.minMergeDocs = (int) DocsInMemory;
            IndexWriterCache.remove(dbstring);
            IndexWriterCache.put(dbstring, writer);
        }
    }
    ...
    // Add the documents to the Lucene index
    writer.addDocument(doc);
    ...
    // Some time later, after a batch of docs has been added
    writer.close();
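The wiki idiom Michael points to reduces to the following ordering (a sketch for Lucene 1.4; the term value and variable names are illustrative). The key constraint is that the IndexReader that performed the deletes must be closed before the IndexWriter is opened, or the writer cannot acquire the lock:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    String location = "/path/to/index";

    // 1. Delete the old versions of the changed documents in one batch.
    IndexReader reader = IndexReader.open(location);
    reader.delete(new Term("docNumber", "1234"));
    reader.close();  // must happen before the writer is opened; per the javadoc
                     // this "saves" the deletions, the wording Dave asks about

    // 2. Re-add the new versions in a second batch.
    IndexWriter writer = new IndexWriter(location, new StandardAnalyzer(), false);
    writer.addDocument(newVersionOfDoc);  // Document assumed to be built elsewhere
    writer.close();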
Re: Way to repair an index broken during 1/2 optimize?
Kevin A. Burton wrote:
> > With the typical handful of fields, one should never see more than hundreds of files.
> We only have 13 fields... Though to be honest I'm worried that even if I COULD do the optimize, it would run out of file handles.

Optimization doesn't open all files at once. The most files that are ever opened by an IndexWriter is just:

    4 + (5 + numIndexedFields) * (mergeFactor-1)

This includes during optimization. However, when searching, an IndexReader must keep most files open. In particular, the maximum number of files an unoptimized, non-compound IndexReader can have open is:

    (5 + numIndexedFields) * (mergeFactor-1) * (log_base_mergeFactor(numDocs/minMergeDocs))

A compound IndexReader, on the other hand, should open at most just:

    (mergeFactor-1) * (log_base_mergeFactor(numDocs/minMergeDocs))

An optimized, non-compound IndexReader will open just (5 + numIndexedFields) files. And an optimized, compound IndexReader should only keep one file open.

Doug
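Plugging the numbers from this thread into Doug's formulas (a quick check assuming 13 indexed fields, mergeFactor 10, minMergeDocs 1000, and 14M documents, all taken from Kevin's messages):

    int numIndexedFields = 13;   // from Kevin's message
    int mergeFactor = 10;

    // Max files ever opened by an IndexWriter, including during optimize:
    int writerMax = 4 + (5 + numIndexedFields) * (mergeFactor - 1);        // = 166

    // Merge levels: log base 10 of (14M docs / minMergeDocs of 1000) is about 4.
    int levels = 4;

    // Max files an unoptimized, non-compound IndexReader keeps open:
    int readerMax = (5 + numIndexedFields) * (mergeFactor - 1) * levels;   // = 648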
Re: Way to repair an index broken during 1/2 optimize?
You might try merging the existing index into a new index located on a RAM disk. Once it is done, you can move the directory from the RAM disk back to your hard disk. I think this will work as long as the old index did not finish merging. You might run the strings command on the segments file to make sure the new (merged) segment is not in there, and if there's a deletable file, make sure there are no segments from the old index listed therein.

----- Original Message -----
From: Kevin A. Burton
Subject: Way to repair an index broken during 1/2 optimize?

> So.. the other day I sent an email about building an index with 14M documents. That went well, but the optimize() was taking FOREVER. It took 7 hours to generate the whole index, and as of 10AM it was still optimizing (6 hours later) and I needed the box back.
>
> So is it possible to fix this index now? Can I just delete the most recent segment that was created? I can find it with ls -alt.
>
> Also... what can I do to speed up this optimize? Ideally it wouldn't take 6 hours.
>
> Kevin
Re: Way to repair an index broken during 1/2 optimize?
Kevin A. Burton wrote:
> So is it possible to fix this index now? Can I just delete the most recent segment that was created? I can find it with ls -alt

Sorry, I forgot to answer your question: this should work fine. I don't think you should even have to delete that segment.

Also, to elaborate on my previous comment, a mergeFactor of 5000 not only delays the work until the end, but it also makes the disk workload more seek-dominated, which is not optimal. So I suspect a smaller merge factor, together with a larger minMergeDocs, will be much faster overall, including the final optimize(). Please tell us how it goes.

Doug
Re: Way to repair an index broken during 1/2 optimize?
Peter M Cipollone wrote:
> You might try merging the existing index into a new index located on a RAM disk. Once it is done, you can move the directory from the RAM disk back to your hard disk.

It's a HUGE index. It won't fit in memory ;) Right now it's at 8G...

Thanks though! :)

Kevin
Re: Way to repair an index broken during 1/2 optimize?
Doug Cutting wrote:
> > Also... what can I do to speed up this optimize? Ideally it wouldn't take 6 hours.
> Was this the index with the mergeFactor of 5000? If so, that's why it's so slow: you've delayed all of the work until the end. Indexing on a ramfs will make things faster in general, however, if you have enough RAM...

No... I changed the mergeFactor back to 10 as you suggested.

Kevin
Re: Way to repair an index broken during 1/2 optimize?
Doug Cutting wrote:
> Sorry, I forgot to answer your question: this should work fine. I don't think you should even have to delete that segment.

I'm worried about duplicate or missing content from the original index. I'd rather rebuild the index and waste another 6 hours (I've probably blown 100 hours of CPU time on this already) and have a correct index :)

During an optimize I assume Lucene starts writing to a new segment, leaves all the others in place until everything is done, and THEN deletes them?

> Also, to elaborate on my previous comment, a mergeFactor of 5000 not only delays the work until the end, but it also makes the disk workload more seek-dominated, which is not optimal.

The only settings I use are:

    targetIndex.mergeFactor = 10;
    targetIndex.minMergeDocs = 1000;

The resulting index has 230k files in it :-/ I assume this is contributing to all the disk seeks.

> So I suspect a smaller merge factor, together with a larger minMergeDocs, will be much faster overall, including the final optimize(). Please tell us how it goes.

This is what I did for this last round, but then I ended up with the highly fragmented index. Hm...

Thanks for all the help, btw!

Kevin
Re: Way to repair an index broken during 1/2 optimize?
Kevin A. Burton wrote: No... I changed the mergeFactor back to 10 as you suggested.

Then I am confused about why it should take so long. Did you by chance set the IndexWriter.infoStream to something, so that it logs merges? If so, it would be interesting to see that output, especially the last entry. Doug
Re: Way to repair an index broken during 1/2 optimize?
Doug Cutting wrote: Kevin A. Burton wrote: No... I changed the mergeFactor back to 10 as you suggested.

Then I am confused about why it should take so long. Did you by chance set the IndexWriter.infoStream to something, so that it logs merges? If so, it would be interesting to see that output, especially the last entry.

No I didn't, actually... If I run it again I'll be sure to do this. Kevin
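Enabling that merge logging is a one-liner against the 1.4-era API, where infoStream is a public field on IndexWriter (the path here is assumed):

    IndexWriter writer = new IndexWriter("/data/index",
                                         new StandardAnalyzer(), false);
    writer.infoStream = System.out; // print a line for each segment merge
    writer.optimize();
    writer.close();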
Re: Way to repair an index broken during 1/2 optimize?
Kevin A. Burton wrote: During an optimize I assume Lucene starts writing to a new segment and leaves all others in place until everything is done, and THEN deletes them?

That's correct.

The only settings I use are: targetIndex.mergeFactor=10; targetIndex.minMergeDocs=1000; and the resulting index has 230k files in it :-/

Something sounds very wrong for there to be that many files. The maximum number of files should be around:

    (7 + numIndexedFields) * (mergeFactor - 1) * log_mergeFactor(numDocs / minMergeDocs)

With 14M documents, log_10(14M/1000) is about 4, which gives, for you:

    (7 + numIndexedFields) * 36 = 230k
    7*36 + numIndexedFields*36 = 230k
    numIndexedFields = (230k - 7*36) / 36 =~ 6k

So you'd have to have around 6k unique field names to get 230k files. Or something else must be wrong. Are you running on Win32, where file deletion can be difficult? With the typical handful of fields, one should never see more than hundreds of files. Doug
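Doug's bound is easy to sanity-check in code. The helper below is purely illustrative (it is not part of Lucene) and simply restates the formula above:

    public class FileCountBound {
        // Rough upper bound on the number of index files, per Doug's formula.
        static long maxFiles(int numIndexedFields, int mergeFactor,
                             long numDocs, int minMergeDocs) {
            double levels = Math.log((double) numDocs / minMergeDocs)
                          / Math.log(mergeFactor);
            return Math.round((7 + numIndexedFields) * (mergeFactor - 1) * levels);
        }
    }

    // maxFiles(13, 10, 14000000L, 1000) =~ 746 -- hundreds of files,
    // nowhere near 230k, which is Doug's point.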
Re: Way to repair an index broken during 1/2 optimize?
Doug Cutting wrote: Something sounds very wrong for there to be that many files. [...] With the typical handful of fields, one should never see more than hundreds of files.

We only have 13 fields... though to be honest I'm worried that even if I COULD do the optimize, it would run out of file handles. This is very strange... I'm going to increase minMergeDocs to 1 and then run the full conversion on one box, and try to do an optimize (of the corrupt index) on another box. See which one finishes first. I assume the speed of optimize() can be increased the same way that indexing is increased... Kevin
addIndexes and optimize
Hey y'all again, Just wondering why the IndexWriter.addIndexes method calls optimize before and after it starts merging segments together. We would like to create an addIndexes method that doesn't optimize, and call optimize on the IndexWriter later. Roy.
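For context, a merge via addIndexes looks like this with the 1.4-era API (the paths are assumed); the two optimize() calls Roy mentions happen inside addIndexes itself:

    Directory[] sources = {
        FSDirectory.getDirectory("/data/index1", false),
        FSDirectory.getDirectory("/data/index2", false)
    };
    IndexWriter writer = new IndexWriter("/data/merged",
                                         new StandardAnalyzer(), true);
    writer.addIndexes(sources); // optimizes before and after merging
    writer.close();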
Re: optimize() is not merging into single file? !!!!!!
I rechecked the results. Here they are:

IndexWriter compiled with v1.4-rc2 generates after optimization:
    _36d.cfs  3779 KB

IndexWriter compiled with v1.4-rc3 generates after optimization:
    _36d.cfs  3778 KB
    _36c.cfs    31 KB
    _35z.cfs    14 KB
    _35o.cfs    14 KB
    ... etc.

In both cases the segments file contains _36d.cfs. Looks like the new version just forgets to clean up.

Iouli Golovatyi/X/GP/[EMAIL PROTECTED] wrote on 01.06.2004 17:22 (Subject: optimize() is not merging into single file?):

I optimize and close the index after that, but don't get just one .cfs file as promised in the docs. Instead I see something like small segments and a couple of big ones. This weird behavior seems to have started when I changed from v1.4-rc2 to 1.4-rc3. Before, I got just one .cfs segment. Any ideas? Thanks in advance, J.
RE : optimize() is not merging into single file? !!!!!!
Hello, I am running a two-week-old version of Lucene from the CVS HEAD and seeing the same behavior. Regards, RBP

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 2, 2004 13:53
To: Lucene Users List
Subject: Re: optimize() is not merging into single file? !!

[quoted message snipped -- see above]
Re: optimize fails with Negative seek offset
Hi, sorry for following up my own mail, but since no one has responded so far, I thought the stack trace might be of interest. The following exception always occurs when trying to optimize one of our indexes, which had always gone OK for about a year. I just tried with 1.4-rc3, but with the same result:

java.io.IOException: Negative seek offset
        at java.io.RandomAccessFile.seek(Native Method)
        at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:405)
        at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
        at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:222)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:63)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:238)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:483)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:362)
        at LuceneRPCHandler.optimize(LuceneRPCHandler.java:398)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.xmlrpc.Invoker.execute(Invoker.java:168)
        at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:123)
        at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:185)
        at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:151)
        at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
        at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:773)
        at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:656)
        at java.lang.Thread.run(Thread.java:534)

Any hint would be greatly appreciated. Thanks, Sascha
Re: optimize fails with Negative seek offset
Looks like the same error I got when I tried to use Lucene version 1.3 to search on an index I had created with Lucene version 1.4. The versions are not forward compatible. Did you by chance create the index with version 1.4 and are now searching with version 1.3? It's easy to get the dependencies out of sync for different apps, which is what happened to me. -vito

On Wed, 2004-05-12 at 04:59, Sascha Ottolski wrote: [quoted message and stack trace snipped -- see above]
Re: optimize fails with Negative seek offset
On Wednesday, 12 May 2004 18:54, Anthony Vito wrote: Did you by chance create the index with version 1.4 and are now searching with version 1.3? [...]

Hi vito, thanks for the reply, but no: we only upgraded so far, and did not downgrade. More than that, the failing index was rebuilt completely with 1.4-rc2 only two weeks ago. The problem started a short time afterwards (but not immediately). Greets, Sascha
Memory requirements for optimize() on compound index high?
Hi, I am working on an application which uses Lucene 1.3 Final with the compound index format, on a Win32 Sun JVM 1.4.1_02. I have set maxFieldLength for the index writer to 1,000,000, as I often have to index potentially very large documents whose information must be indexed. All other index writer parameters have their default values. The application loads all documents in a batch phase and then allows the user to perform searches. Typically, no new documents are added afterwards. Given the large value set for maxFieldLength, I have allocated 512 MB of memory to the JVM. For indexing 1,000,000 complex documents, with potentially around 30 fields each, this seems to work fine. I have noticed that when performing an optimize() on this index at the end of a batch load, the memory requirements seem to be much higher: I was receiving OutOfMemoryErrors with a 512 MB JVM. I increased the JVM size to 1 GB, and the optimize operation completed successfully. Task Manager reported a peak VM size of 810 MB during the optimize() operation, from a newly-created JVM. FWIW, the final index size was 11 gigabytes -- most document fields are stored in the index. Do people have similar experiences when calling optimize() on a compound index? Are there any ways I can reduce the amount of memory required, apart from making maxFieldLength smaller? Is there any way of determining in advance the kind of memory requirements optimize() will have? It's highly undesirable to receive OutOfMemoryErrors during optimize(). I guess the user can still search on an unoptimized index, which is better than nothing... -- Cheers, David
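For reference, the parameter David mentions is a public field on the 1.3-era IndexWriter; a minimal sketch, with the path assumed:

    IndexWriter writer = new IndexWriter("/data/index",
                                         new StandardAnalyzer(), false);
    // Default is 10,000 terms per field; raising it indexes more of each
    // large document at the cost of more memory during indexing and merging.
    writer.maxFieldLength = 1000000;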
optimize fails with Negative seek offset
Hi, I have no idea where to look, and I know almost nothing about Java :-( We've been using Lucene for quite a while now (about a year, I guess) and suddenly I've seen this when trying to optimize the index:

java.lang.Exception: java.io.IOException: Negative seek offset

The code throwing this was:

    public boolean optimize() throws IOException {
        IndexWriter writer = new IndexWriter(this.indexpath,
                                             new StandardAnalyzer(), false);
        writer.mergeFactor = this.mergeFactor;
        try {
            writer.optimize();
            writer.close();
        } finally {
            this.changedIndex();
        }
        return true;
    }

The index is about 8.8 GB now. However, when the exception occurred, the new temporary index files had only grown to 3.2 GB. All this with 1.4-rc2. Thanks in advance for any advice, Sascha
Preventing duplicate document insertion during optimize
Let's say you have two indexes, each with the same document literal: all the fields hash the same and the document is a binary duplicate of a document in the second index. What happens when you do a merge to create a 3rd index from the first two? I assume you then have two identical documents in one index. Is there any way to prevent this? It would be nice to have a way to flag a field as a primary key, so that a document whose key has already been added is simply skipped. Kevin
Re: Preventing duplicate document insertion during optimize
Kevin, I have a similar issue. The only solution I have been able to come up with is, after the merge, to open an IndexReader against the merged index, iterate over all the docs, and delete duplicate docs based on my primary key field. Jim

--- Kevin A. Burton [EMAIL PROTECTED] wrote: [quoted message snipped -- see above]
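Jim's approach might look roughly like the sketch below, against the 1.4-era API. The field name "key" and the index path are assumptions; the sketch keeps the first document for each key value and deletes the rest:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;

    public class DeleteDuplicates {
        public static void main(String[] args) throws IOException {
            IndexReader reader = IndexReader.open("/data/merged");
            // Position the term enumeration at the first term of the "key" field.
            TermEnum terms = reader.terms(new Term("key", ""));
            try {
                while (terms.term() != null
                        && "key".equals(terms.term().field())) {
                    if (terms.docFreq() > 1) {
                        // More than one document carries this key value:
                        // keep the first, delete the rest.
                        TermDocs docs = reader.termDocs(terms.term());
                        boolean first = true;
                        while (docs.next()) {
                            if (!first) reader.delete(docs.doc());
                            first = false;
                        }
                        docs.close();
                    }
                    if (!terms.next()) break;
                }
            } finally {
                terms.close();
                reader.close(); // deletions are flushed on close
            }
        }
    }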
Re: Will failed optimize corrupt an index?
The index should be fine. Lucene index updates are atomic. Doug

Dan Quaroni wrote: My index grew about 7 gigs larger than I projected it would, and it ran out of disk space during optimize. Does Lucene have transactions or anything that would prevent this from corrupting the index, or do I need to generate the index again? Thanks!
Will failed optimize corrupt an index?
My index grew about 7 gigs larger than I projected it would, and it ran out of disk space during optimize. Does Lucene have transactions or anything that would prevent this from corrupting the index, or do I need to generate the index again? Thanks!
RE: Will failed optimize corrupt an index?
Hi,

From: Dan Quaroni [mailto:[EMAIL PROTECTED]] My index grew about 7 gigs larger than I projected it would, and it ran out of disk space during optimize. Does Lucene have transactions or anything that would prevent this from corrupting the index, or do I need to generate the index again?

You must generate the index again. Pasha

Lucene.Net www.sourceforge.net/projects/lucenedotnet
RE: Files getting deleted when optimize is killed?
Upon further examination, what I found is this:

- Killing the process while optimize() is still working does NOT cause the index files to be deleted. HOWEVER --
- Once the index is opened again by a new process (now apparently in an unstable state due to the incomplete optimize()), at that point all existing files are deleted and only a file called segments remains.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Saturday, July 12, 2003 7:06 AM
To: Lucene Users List
Subject: Re: Files getting deleted when optimize is killed?

[quoted reply snipped -- see the next message]
Re: Files getting deleted when optimize is killed?
--- Steve Rajavuori [EMAIL PROTECTED] wrote: I've had a problem on several occasions where my entire index is deleted -- that is, EVERY file (except 'segments') is gone. There were many users on the system each time, so it's a little hard to tell for sure what was going on, but my theory is this: My code will automatically call optimize() periodically. Because the index is very large, it can take a long time. It looks like an administrator may have killed my process, and it's possible that it was killed while an optimize() was in progress. I have two questions: 1) Does anyone know if killing an optimize() in progress could wipe out all files like this? (New index created in temporary files that were not saved properly, while old index files were already deleted???)

I highly doubt it.

2) Does anyone know of any other way all files in an index could be inadvertently deleted (e.g. through killing a process)? For example, if you kill the process during an 'add', would that cause all files to be deleted?

Same as above. You can create an artificial, large index for testing purposes, call optimize once in a while, and then kill the process. I don't think Lucene will remove your files. Otis
Files getting deleted when optimize is killed?
I've had a problem on several occasions where my entire index is deleted -- that is, EVERY file (except 'segments') is gone. There were many users on the system each time, so it's a little hard to tell for sure what was going on, but my theory is this: My code will automatically call optimize() periodically. Because the index is very large, it can take a long time. It looks like an administrator may have killed my process, and it's possible that it was killed while an optimize() was in progress. I have two questions: 1) Does anyone know if killing an optimize() in progress could wipe out all files like this? (New index created in temporary files that were not saved properly, while old index files were already deleted???) 2) Does anyone know of any other way all files in an index could be inadvertently deleted (e.g. through killing a process)? For example, if you kill the process during an 'add', would that cause all files to be deleted? Steve Rajavuori
optimize()
How does it affect overall performance when I do not call optimize()? THX -g-
Re: optimize()
This was just mentioned a few days ago; check the archives. It's not needed for indexing, but it is good to do after you are done indexing, as the index reader then needs to open and search through fewer files. Otis

--- Leo Galambos [EMAIL PROTECTED] wrote: How does it affect overall performance when I do not call optimize()? THX -g-
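In other words, the usual pattern is a single optimize() at the end of the batch, before any searcher opens the index; a minimal sketch, with the path assumed:

    IndexWriter writer = new IndexWriter("/data/index",
                                         new StandardAnalyzer(), false);
    // ... addDocument() calls ...
    writer.optimize(); // merge everything into one segment
    writer.close();
    // A reader opened now has far fewer files to open and search through.
    IndexSearcher searcher = new IndexSearcher("/data/index");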
Re: optimize()
Did you try any tests in this area (figures, charts...)? AFAIK the reader reads an identical number of (giga)bytes. BTW, it could read segments in many threads. I do not see why it would be slower (unless you do many delete()s). Whether the reader opens 1 or 50 files is still nothing. -g-

On Tue, 26 Nov 2002, Otis Gospodnetic wrote: [quoted message snipped -- see above]
RE: optimize()
I don't know if this answers your question, but I had a lot of problems with Lucene bombing out with OutOfMemoryErrors. I was not calling optimize(); I tried it and, hey presto, no more problems.

-----Original Message-----
From: Leo Galambos [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 27 November 2002 5:22 AM
To: [EMAIL PROTECTED]
Subject: optimize()

[quoted message snipped -- see above]
large index - slow optimize()
Hello, I am building an index with a few million documents, and every X documents added to the index I call optimize() on the IndexWriter. I have noticed that as the index grows this call takes more and more time, even though the number of new segments that need to be merged is the same between every optimize() call. I suspect this is normal and not a bug, but is there no way around it? Do you know which part takes longer and longer as the index grows? Thanks, Otis
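The pattern described above is essentially the sketch below (names are illustrative). If, as the reply that follows suggests, optimize() copies the entire index into new files each time, then every later call copies more data, which would match the observed slowdown:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class PeriodicOptimize {
        // Add documents, optimizing every x docs. Each optimize() rewrites
        // the whole index so far, so its cost grows with total index size
        // even though the number of new segments per interval is constant.
        static void indexAll(IndexWriter writer, Document[] docs, int x)
                throws IOException {
            for (int i = 0; i < docs.length; i++) {
                writer.addDocument(docs[i]);
                if ((i + 1) % x == 0) writer.optimize();
            }
        }
    }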
RE: large index - slow optimize()
Note: this is not established fact, just what I think I know about how it works. My working assumption has been that it's just a matter of disk speed, since during optimize the entire index is copied into new files, and at the end the old ones are removed. So the more GB you have to copy, the longer it takes. This is also the reason you need double the size of your index available on the drive in order to perform an optimize, correct? Or does that only apply when you are merging indexes? Dan

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 22, 2002 12:52 PM
To: [EMAIL PROTECTED]
Subject: large index - slow optimize()

[quoted message snipped -- see above]
Re: optimize(), delete() calls on IndexWriter
No, they don't. Note that delete() is in IndexReader. Otis

--- Aruna Raghavan [EMAIL PROTECTED] wrote: Hi, Do calls like optimize() and delete() on the IndexWriter cause a separate thread to be kicked off? Thanks! Aruna.
RE: optimize(), delete() calls on IndexWriter
Yes, thanks.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 08, 2002 11:46 AM
To: Lucene Users List
Subject: Re: optimize(), delete() calls on IndexWriter

[quoted message snipped -- see above]