Re: Disk space used by optimize

2005-02-06 Thread Morus Walter
Bernhard Messer writes:
 
 However, three times the space sounds a bit too much, or I make a
 mistake in the book. :)
   
 
 There was already a discussion about disk usage during index optimize.
 Please have a look at the developers list:
 http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
 where I made some measurements of the disk usage within Lucene.
 At that time I proposed a patch which reduced the total disk space used
 from 3 times to a little more than 2 times the final index size.
 Together with Christoph we implemented some improvements to the
 optimization patch and finally committed the changes.
 
Hmm. If the index is in use (an open reader), I doubt your patch makes a
difference. In that case the disk space used by the non-optimized index
will still be occupied even after the files are deleted (on Unix/Linux),
until the reader closes them.
What happens if disk space runs out during creation of the compound index?
Will the non-compound files be a usable index?
Otherwise you risk losing the index.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize - no space on disk corrupts index.

2005-02-04 Thread Ernesto De Santis
Hi all
We have a big index and little disk space.
When we optimize and all the space is consumed, our index gets corrupted:
the segments file points to nonexistent files.
Environment:
java 1.4.2_04
W2000 SP4
Tomcat 5.5.4
Bye,
Ernesto.
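
A minimal sketch of one way to avoid starting an optimize without enough room, assuming a newer JVM than the 1.4.2 listed above (File.getUsableSpace() needs Java 6). The class OptimizeHeadroom, its methods, and the factor of 3 are illustrative assumptions based on the observations in this thread. None of it is Lucene API, and the current index size is used as a stand-in for the final index size.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene.
final class OptimizeHeadroom {

    // Assumption taken from this thread: peak usage of about 3x the index size
    // (compound format). This is an observation, not a documented guarantee.
    static final double ASSUMED_PEAK_FACTOR = 3.0;

    /** Sums the lengths of the regular files directly inside indexDir. */
    static long indexSize(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or it could not be read
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /**
     * True if usableBytes covers the extra space a merge may need on top of
     * the index already on disk, i.e. (peakFactor - 1) * indexBytes.
     */
    static boolean hasHeadroom(long usableBytes, long indexBytes, double peakFactor) {
        double extraNeeded = (peakFactor - 1.0) * indexBytes;
        return usableBytes >= extraNeeded;
    }

    public static void main(String[] args) {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        long size = indexSize(indexDir);
        long usable = indexDir.getUsableSpace(); // Java 6+
        if (hasHeadroom(usable, size, ASSUMED_PEAK_FACTOR)) {
            System.out.println("Enough free space to try optimize()");
        } else {
            System.out.println("Too little free space, skipping optimize()");
        }
    }
}
```

Such a check does not make the merge crash-safe. It only lowers the chance of running out of space halfway through.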
Yura Smolsky wrote:
Hello, Otis.
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used the
compound file format, optimize took 3 times more space, b/c the *.cfs
needs to be unpacked.
Now I use the non-compound file format. It needs about twice as much
disk space.
OG Have you tried using the multifile index format?  Now I wonder if there
OG is actually a difference in disk space consumed by optimize() when you
OG use multifile and compound index format...
OG Otis
Yura Smolsky,



Re: Disk space used by optimize

2005-02-04 Thread Bernhard Messer

However, three times the space sounds a bit too much, or I make a
mistake in the book. :)
 

There was already a discussion about disk usage during index optimize.
Please have a look at the developers list:
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
where I made some measurements of the disk usage within Lucene.
At that time I proposed a patch which reduced the total disk space used
from 3 times to a little more than 2 times the final index size.
Together with Christoph we implemented some improvements to the
optimization patch and finally committed the changes.

Bernhard


Re: Disk space used by optimize

2005-01-31 Thread Doug Cutting
Yura Smolsky wrote:
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used the
compound file format, optimize took 3 times more space, b/c the *.cfs
needs to be unpacked.
Now I use the non-compound file format. It needs about twice as much
disk space.

Perhaps we should add something to the javadocs noting this?
Doug
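
The numbers reported in this thread can be turned into a rough rule of thumb. The sketch below is only that: the factors 3 (compound format) and 2 (multifile format) are the observations quoted above, not documented Lucene behaviour, and OptimizePeakEstimate is a hypothetical name.

```java
// Hypothetical helper, not part of Lucene.
final class OptimizePeakEstimate {

    // Multipliers as reported in this thread (assumptions, not guarantees).
    static final int COMPOUND_FACTOR = 3;
    static final int MULTIFILE_FACTOR = 2;

    /** Rough peak disk usage during optimize, relative to the final index size. */
    static long estimatePeakBytes(long finalIndexBytes, boolean compoundFormat) {
        int factor = compoundFormat ? COMPOUND_FACTOR : MULTIFILE_FACTOR;
        return finalIndexBytes * factor;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        long index = 45L * gb; // the 45 GB index mentioned above
        System.out.println("compound:  " + estimatePeakBytes(index, true) / gb + " GB");
        System.out.println("multifile: " + estimatePeakBytes(index, false) / gb + " GB");
    }
}
```

For the 45 GB index this gives 135 GB for the compound format and 90 GB for the multifile format.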


Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus,

that description of 3 sets of index files is what I was imagining, too.
 I'll have to test and add to the book errata, it seems.

Thanks for the info,
Otis

--- Morus Walter [EMAIL PROTECTED] wrote:

 Otis Gospodnetic writes:
  Hello,
  
  Yes, that is how optimize works - copies all existing index segments
  into one unified index segment, thus optimizing it.
  
  see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
  
  However, three times the space sounds a bit too much, or I make a
  mistake in the book. :)
  
 I cannot explain why, but ~ three times the size of the final index is
 what I observed when I logged disk usage during optimize of an index
 in compound index format.
 The test was on Linux; I simply did a 'du -s' every few seconds in
 parallel to the optimize.
 I didn't test the non-compound format. Probably optimizing a compound
 format requires storing the different parts of the compound file
 separately before joining them into the compound file (sounds reasonable,
 otherwise you would need to know the sizes before creating the parts).
 In that case you would have the original index, the separate files and
 the new compound file at the disk usage peak.
 
 So IMHO the book is wrong.
 
 Morus
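
The measurement described above (running 'du -s' every few seconds while the optimize runs) could be approximated in Java roughly as follows. This is a sketch under assumptions: DiskUsageSampler is a hypothetical helper, it sums file lengths rather than allocated blocks as du does, and it only looks at files directly inside the directory, which fits a flat index directory.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene.
final class DiskUsageSampler {

    private long peakBytes = 0L;

    /** Sums the lengths of the regular files directly inside dir. */
    static long directorySize(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or it could not be read
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** Remembers the largest sample seen so far. */
    void record(long bytes) {
        if (bytes > peakBytes) {
            peakBytes = bytes;
        }
    }

    long peak() {
        return peakBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        int samples = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        DiskUsageSampler sampler = new DiskUsageSampler();
        for (int i = 0; i < samples; i++) {
            long size = directorySize(indexDir);
            sampler.record(size);
            System.out.println(size + " bytes (peak so far " + sampler.peak() + ")");
            Thread.sleep(2000L); // sample every two seconds
        }
    }
}
```

Run it in a second process against the index directory while optimize() is working. Files that are created and deleted between two samples are missed, so the reported peak is a lower bound.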
 



Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello,

Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.

see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I make a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis


--- Kauler, Leto S [EMAIL PROTECTED] wrote:

 
 Just a quick question:  after writing an index and then calling
 optimize(), is it normal for the index to expand to about three times
 the size before finally compressing?
 
 In our case the optimise grinds the disk, expanding the index into many
 files of about 145MB total, before compressing down to three files of
 about 47MB total.  That must be a lot of disk activity for the people
 with multi-gigabyte indexes!
 
 Regards,
 Leto
 



RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is in the mail ;)

Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).

--Leto



 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
 
 Hello,
 
 Yes, that is how optimize works - copies all existing index 
 segments into one unified index segment, thus optimizing it.
 
 see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
 
 However, three times the space sounds a bit too much, or I 
 make a mistake in the book. :)
 
 You said you end up with 3 files - .cfs is one of them, right?
 
 Otis
 
 

CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.




RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format?  Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use multifile and compound index format...

Otis

--- Kauler, Leto S [EMAIL PROTECTED] wrote:

 Our copy of LIA is in the mail ;)
 
 Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
 and segments (29 bytes).
 
 --Leto
 
 
 