Re: Disk space used by optimize
Bernhard Messer writes:

>> However, three times the space sounds a bit too much, or I made a mistake in the book. :)
>
> There already was a discussion about disk usage during index optimize. Please have a look at the developers list:
> http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
> where I made some measurements of disk usage within Lucene. At that time I proposed a patch which reduced the total disk space used from 3 times to a little more than 2 times the final index size. Together with Christoph we implemented some improvements to the optimization patch and finally committed the changes.

Hmm. If the index is in use (open reader), I doubt your patch makes a difference. In that case the disk space used by the non-optimized index will still be occupied even after the files are deleted (on Unix/Linux).

What happens if disk space runs out during creation of the compound index? Will the non-compound files be a usable index? Otherwise you risk losing the index.

Morus
Re: Disk space used by optimize - running out of disk space corrupts index
Hi all,

We have a big index and little disk space. When we optimize and all the space is consumed, our index is corrupted: the segments file points to nonexistent files.

Environment: Java 1.4.2_04, Windows 2000 SP4, Tomcat 5.5.4

Bye,
Ernesto.

Yura Smolsky wrote:

> Hello, Otis.
>
> There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound file format, optimize took 3 times more space, because the *.cfs needs to be unpacked. Now I use the non-compound file format. It needs about twice as much disk space.
>
>> Have you tried using the multifile index format? Now I wonder if there is actually a difference in disk space consumed by optimize() when you use multifile and compound index format...
>>
>> Otis
>>
>> --- Kauler, Leto S [EMAIL PROTECTED] wrote:
>>
>>> Our copy of LIA is in the mail ;)
>>>
>>> Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).
>>>
>>> --Leto
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
>>>
>>>> Hello,
>>>>
>>>> Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it.
>>>>
>>>> See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
>>>>
>>>> However, three times the space sounds a bit too much, or I made a mistake in the book. :)
>>>>
>>>> You said you end up with 3 files - .cfs is one of them, right?
>>>>
>>>> Otis
>>>>
>>>> --- Kauler, Leto S [EMAIL PROTECTED] wrote:
>>>>
>>>>> Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?
>>>>>
>>>>> In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down to three files of about 47MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!
>>>>>
>>>>> Regards, Leto
>
> Yura Smolsky
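One way to avoid the failure Ernesto describes is to check free space before starting an optimize. The following is only a minimal sketch, not anything from Lucene itself: the class and method names are hypothetical, the peak factors (about 3x for compound, about 2x for multifile) are the rough figures reported in this thread, and the current index size is assumed to be a fair stand-in for the final optimized size. Note also that File.getUsableSpace() requires Java 6 or later, so it would not run on the Java 1.4.2 setup mentioned above.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene.
final class OptimizeSpaceCheck {

    /** Sums the sizes of the regular files directly inside the index directory. */
    static long indexSizeBytes(File indexDir) {
        long total = 0;
        File[] files = indexDir.listFiles(); // null if the directory does not exist
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
        }
        return total;
    }

    /**
     * Extra bytes needed on top of the existing index, assuming a peak of
     * roughly 3x (compound) or 2x (multifile) the index size, as reported
     * in this thread. These factors are observations, not guarantees.
     */
    static long extraBytesNeeded(long indexBytes, boolean compound) {
        int peakFactor = compound ? 3 : 2;
        return indexBytes * (peakFactor - 1);
    }

    /** True if the partition holding the index reports enough usable space. */
    static boolean hasRoomToOptimize(File indexDir, boolean compound) {
        long needed = extraBytesNeeded(indexSizeBytes(indexDir), compound);
        return indexDir.getUsableSpace() >= needed;
    }
}
```

An application could call hasRoomToOptimize() and skip or postpone the optimize when it returns false. The check is only advisory: other processes can consume disk space while the optimize is running.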
Re: Disk space used by optimize
> However, three times the space sounds a bit too much, or I made a mistake in the book. :)

There already was a discussion about disk usage during index optimize. Please have a look at the developers list:
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
where I made some measurements of disk usage within Lucene. At that time I proposed a patch which reduced the total disk space used from 3 times to a little more than 2 times the final index size. Together with Christoph we implemented some improvements to the optimization patch and finally committed the changes.

Bernhard
Re: Disk space used by optimize
Yura Smolsky wrote:

> There is a big difference between using the compound index format and multiple files. I have tested it on a big index (45 GB). When I used the compound file format, optimize took 3 times more space, because the *.cfs needs to be unpacked. Now I use the non-compound file format. It needs about twice as much disk space.

Perhaps we should add something to the javadocs noting this?

Doug
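To put numbers on Yura's observation, here is a small arithmetic sketch. The class and method names are made up for illustration, and the factors of 3 (compound) and 2 (non-compound) are the approximate figures reported in this thread, not documented guarantees.

```java
// Hypothetical illustration of the figures reported in this thread.
final class OptimizePeakEstimate {

    /** Estimated peak disk usage: about 3x the final size for compound, 2x for multifile. */
    static long estimatedPeak(long finalIndexSize, boolean compound) {
        int copies = compound ? 3 : 2;
        return finalIndexSize * copies;
    }

    /** Ratio of observed peak usage to the final index size. */
    static double observedRatio(long peakSize, long finalSize) {
        return (double) peakSize / finalSize;
    }

    public static void main(String[] args) {
        // Yura's 45 GB index under both formats.
        System.out.println("compound peak (GB):  " + estimatedPeak(45, true));
        System.out.println("multifile peak (GB): " + estimatedPeak(45, false));
        // Leto's measurement: about 145 MB at the peak, 47 MB when finished.
        System.out.println("Leto's ratio: " + observedRatio(145, 47));
    }
}
```

With these assumptions a 45 GB index would need roughly 135 GB at the peak in compound format and roughly 90 GB in multifile format. Leto's 145 MB peak against a 47 MB result works out to a little over 3, which is consistent with the compound-format figure.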
Re: Disk space used by optimize
Morus, that description of 3 sets of index files is what I was imagining, too. I'll have to test and add to the book errata, it seems.

Thanks for the info,
Otis

--- Morus Walter [EMAIL PROTECTED] wrote:

> Otis Gospodnetic writes:
>
>> Hello,
>>
>> Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it.
>>
>> See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
>>
>> However, three times the space sounds a bit too much, or I made a mistake in the book. :)
>
> I cannot explain why, but about three times the size of the final index is what I observed when I logged disk usage during optimize of an index in compound index format. The test was on Linux; I simply ran 'du -s' every few seconds in parallel with the optimize. I didn't test the non-compound format.
>
> Probably optimizing a compound-format index requires storing the different parts of the compound file separately before joining them into the compound file (sounds reasonable, otherwise you would need to know the sizes before creating the parts). In that case the disk usage peak consists of the original index, the separate files, and the new compound file.
>
> So IMHO the book is wrong.
>
> Morus
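Morus's measurement (running 'du -s' every few seconds while the optimize runs) can be approximated from inside the JVM. This is only a sketch under stated assumptions: the class is hypothetical and not part of Lucene, it requires Java 8 or later, it assumes the index directory is flat (no subdirectories), and it sums logical file lengths rather than allocated disk blocks, so its numbers can differ somewhat from what du reports.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sampler, not part of Lucene.
final class DiskUsageSampler implements Runnable {
    private final File dir;
    private final long intervalMillis;
    private final AtomicLong peakBytes = new AtomicLong();
    private volatile boolean running = true;

    DiskUsageSampler(File dir, long intervalMillis) {
        this.dir = dir;
        this.intervalMillis = intervalMillis;
    }

    /** Sums the lengths of the regular files directly inside dir (0 if dir is missing). */
    static long sizeOf(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    total += f.length();
                }
            }
        }
        return total;
    }

    /** Takes one measurement and remembers the largest value seen so far. */
    void sampleOnce() {
        peakBytes.accumulateAndGet(sizeOf(dir), Math::max);
    }

    /** Samples repeatedly until stop() is called or the thread is interrupted. */
    @Override
    public void run() {
        while (running) {
            sampleOnce();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() {
        running = false;
    }

    long peakBytes() {
        return peakBytes.get();
    }
}
```

To reproduce the experiment, one would start the sampler in its own thread just before calling optimize(), call stop() afterwards, and compare peakBytes() with the size of the finished index. A short sampling interval matters, since a brief peak can fall between two samples.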
Re: Disk space used by optimize
Hello,

Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it.

See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis

--- Kauler, Leto S [EMAIL PROTECTED] wrote:

> Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?
>
> In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down to three files of about 47MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!
>
> Regards, Leto
RE: Disk space used by optimize
Our copy of LIA is in the mail ;)

Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).

--Leto

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]

> Hello,
>
> Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it.
>
> See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
>
> However, three times the space sounds a bit too much, or I made a mistake in the book. :)
>
> You said you end up with 3 files - .cfs is one of them, right?
>
> Otis
RE: Disk space used by optimize
Have you tried using the multifile index format? Now I wonder if there is actually a difference in disk space consumed by optimize() when you use multifile and compound index format...

Otis

--- Kauler, Leto S [EMAIL PROTECTED] wrote:

> Our copy of LIA is in the mail ;)
>
> Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).
>
> --Leto