Re: Disk space used by optimize

2005-02-06 Thread Morus Walter
Bernhard Messer writes:
 
 However, three times the space sounds a bit too much, or I make a
 mistake in the book. :)
   
 
 there already was a discussion about disk usage during index optimize. 
 Please have a look at the developers list: 
 http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
 where I made some measurements of the disk usage within Lucene.
 At that time I proposed a patch which reduced the total disk space used 
 during optimize from 3 times to a little more than 2 times the final index size. 
 Together with Christoph we implemented some improvements to the 
 optimization patch and finally committed the changes.
 
 
Hmm. In the case that the index is in use (open reader), I doubt your patch 
makes a difference. In that case the disk space used by the non-optimized 
index will still be used even if the files are deleted (on unix/linux).
What happens if disk space runs out during creation of the compound index?
Will the non-compound files still form a usable index?
Otherwise you risk losing the index.

Morus




Re[2]: Disk space used by optimize

2005-02-04 Thread Yura Smolsky
Hello, Doug.

 There is a big difference when you use compound index format or
 multiple files. I have tested it on the big index (45 Gb). When I used
 compound file then optimize takes 3 times more space, b/c *.cfs needs
 to be unpacked.
 
 Now I do use non compound file format. It needs like twice as much
 disk space.
DC Perhaps we should add something to the javadocs noting this?

Sure. I was a bit confused about optimizing the compound file format b/c I
had no info about space usage when optimizing.
More info in the javadocs will save somebody's time :)


Yura Smolsky







Re: Optimize not deleting all files

2005-02-04 Thread Ernesto De Santis
Hi all
We have the same problem.
We guess that the problem is that Windows locks the files.
Our environment:
Windows 2000
Tomcat 5.5.4
Ernesto.
[EMAIL PROTECTED] escribió:
Hi,
When I run an optimize in our production environment, old index are
left in the directory and are not deleted.  

My understanding is that an
optimize will create new index files and all existing index files should be
deleted.  Is this correct?
We are running Lucene 1.4.2 on Windows.  

Any help is appreciated.  Thanks!


Re: Disk space used by optimize - no space on disk corrupts index.

2005-02-04 Thread Ernesto De Santis
Hi all
We have a big index and little space on disk.
When we optimize and all space is consumed, our index is corrupted:
the segments file points to nonexistent files.
Environment:
java 1.4.2_04
W2000 SP4
Tomcat 5.5.4
Bye,
Ernesto.
Yura Smolsky escribió:
Hello, Otis.
There is a big difference when you use compound index format or
multiple files. I have tested it on the big index (45 Gb). When I used
compound file then optimize takes 3 times more space, b/c *.cfs needs
to be unpacked.
Now I do use non compound file format. It needs like twice as much
disk space.
OG Have you tried using the multifile index format?  Now I wonder if there
OG is actually a difference in disk space consumed by optimize() when you
OG use multifile and compound index format...
OG Otis
OG --- Kauler, Leto S [EMAIL PROTECTED] wrote:

Our copy of LIA is in the mail ;)
Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).
--Leto

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 

Hello,
Yes, that is how optimize works - copies all existing index 
segments into one unified index segment, thus optimizing it.
see hit #1:
http://www.lucenebook.com/search?query=optimize+disk+space
However, three times the space sounds a bit too much, or I 
made a mistake in the book. :)
You said you end up with 3 files - .cfs is one of them, right?
Otis
--- Kauler, Leto S [EMAIL PROTECTED] wrote:

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?
In our case the optimise grinds the disk, expanding the index into
many files of about 145MB total, before compressing down to three
files of about 47MB total.  That must be a lot of disk activity for
the people with multi-gigabyte indexes!
Regards,
Leto

Yura Smolsky,



Re: Disk space used by optimize

2005-02-04 Thread Bernhard Messer

However, three times the space sounds a bit too much, or I make a
mistake in the book. :)
 

there already was a discussion about disk usage during index optimize. 
Please have a look at the developers list: 
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
where I made some measurements of the disk usage within Lucene.
At that time I proposed a patch which reduced the total disk space used 
during optimize from 3 times to a little more than 2 times the final index size. 
Together with Christoph we implemented some improvements to the 
optimization patch and finally committed the changes.

Bernhard


Re: Optimize not deleting all files

2005-02-04 Thread Otis Gospodnetic
Get and try Lucene 1.4.3.  One of the older versions had a bug that was
not deleting old index files.

Otis

--- [EMAIL PROTECTED] wrote:

 Hi,
 
 When I run an optimize in our production environment, old index are
 left in the directory and are not deleted.  
 
 My understanding is that an
 optimize will create new index files and all existing index files
 should be
 deleted.  Is this correct?
 
 We are running Lucene 1.4.2 on Windows.  
 
 
 Any help is appreciated.  Thanks!
 



Re: Optimize not deleting all files

2005-02-04 Thread yahootintin . 1247688
Ernesto, what version of Lucene are you running?



--- Lucene Users List
lucene-user@jakarta.apache.org wrote:

 Hi all

 We have the same problem.
 We guess that the problem is that Windows locks the files.

 Our environment:
 Windows 2000
 Tomcat 5.5.4

 Ernesto.

 [EMAIL PROTECTED] escribió:
 Hi,

 When I run an optimize in our production environment, old index files are
 left in the directory and are not deleted.

 My understanding is that an optimize will create new index files and all
 existing index files should be deleted.  Is this correct?

 We are running Lucene 1.4.2 on Windows.

 Any help is appreciated.  Thanks!




Re: Optimize not deleting all files

2005-02-04 Thread Patricio Keilty
Hi all, I'll answer on behalf of Ernesto, our environment is:
Lucene 1.4.2
Tomcat 5.5.4
java 1.4.2_04
Windows 2000 SP4
--p
[EMAIL PROTECTED] wrote:
Ernesto, what version of Lucene are you running?
--- Lucene Users List lucene-user@jakarta.apache.org wrote:
Hi all
We have the same problem.
We guess that the problem is that Windows locks the files.
Our environment:
Windows 2000
Tomcat 5.5.4
Ernesto.
[EMAIL PROTECTED] escribió:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.
My understanding is that an optimize will create new index files and all
existing index files should be deleted.  Is this correct?
We are running Lucene 1.4.2 on Windows.
Any help is appreciated.  Thanks!





Re: Optimize not deleting all files

2005-02-04 Thread Patricio Keilty
Hi Otis, tried version 1.4.3 without success, old index files still 
remain in the directory.
Also tried not calling optimize(), and we still get the same behaviour, 
so maybe our problem is not related to the optimize() call at all.

--p
Otis Gospodnetic wrote:
Get and try Lucene 1.4.3.  One of the older versions had a bug that was
not deleting old index files.
Otis
--- [EMAIL PROTECTED] wrote:

Hi,
When I run an optimize in our production environment, old index are
left in the directory and are not deleted.  

My understanding is that an
optimize will create new index files and all existing index files
should be
deleted.  Is this correct?
We are running Lucene 1.4.2 on Windows.  

Any help is appreciated.  Thanks!


Re: Optimize not deleting all files

2005-02-04 Thread Steven Rowe
Hi Patricio,
Is it the case that the old index files are not removed from session to
session, or only within the same session?  The discussion below pertains to
the latter case, that is, where the old index files are used in the same
process as the files replacing them.
I was having a similar problem, and tracked the source down to IndexReaders
not being closed in my application.  

As far as I can tell, in order for IndexReaders to present a consistent
view of an index while changes are being made to it, read-only copies
of the index are kept around until all IndexReaders using them are
closed.  If any IndexReaders are open on the index, IndexWriters first
make a copy, then operate on the copy.  If you track down all of these
open IndexReaders and close them before optimization, all of the
old index files should be deleted.  (Lucene Gurus, please correct this
if I have misrepresented the situation).
In my application, I had a bad interaction between IndexReader caching,
garbage collection, and incremental indexing, in which a new IndexReader
was being opened on an index after each indexing increment, without
closing the already-opened IndexReaders.
On Windows, operating-system level file locking caused by IndexReaders
left open was disallowing index re-creation, because the IndexWriter
wasn't allowed to delete the index files opened by the abandoned
IndexReaders.
In short, if you need to write to an index more than once in a single
session, be sure to keep careful track of your IndexReaders.
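
To make that concrete, here is a minimal sketch of the sequence I mean
(Lucene 1.4-era API; the path and the cached searcher reference are
placeholders for whatever your application holds on to, not code from
your setup):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

public class OptimizeSafely {
    public static IndexSearcher optimizeAndReopen(String indexPath,
                                                  IndexSearcher cachedSearcher)
            throws java.io.IOException {
        // 1. Close every cached IndexSearcher/IndexReader first, so no old
        //    segment files remain open (and, on Windows, locked).
        if (cachedSearcher != null) {
            cachedSearcher.close();
        }

        // 2. With no readers open, the IndexWriter can delete the obsolete
        //    segment files once the optimize completes.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();

        // 3. Re-open a searcher on the freshly optimized index.
        return new IndexSearcher(IndexReader.open(indexPath));
    }
}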
Hope it helps,
Steve
Patricio Keilty wrote:
Hi Otis, tried version 1.4.3 without success, old index files still 
remain in the directory.
Also tried not calling optimize(), and still getting the same behaviour, 
maybe our problem is not related to optimize() call at all.

--p
Otis Gospodnetic wrote:
Get and try Lucene 1.4.3.  One of the older versions had a bug that was
not deleting old index files.
Otis
--- [EMAIL PROTECTED] wrote:

Hi,
When I run an optimize in our production environment, old index are
left in the directory and are not deleted. 
My understanding is that an
optimize will create new index files and all existing index files
should be
deleted.  Is this correct?

We are running Lucene 1.4.2 on Windows. 

Any help is appreciated.  Thanks!



Re: Optimize not deleting all files

2005-02-04 Thread yahootintin . 1247688
Yes, I believe my problem is related to open IndexReaders.  The issue is
that we can't shut down our live search application while we wait for a 10
minute optimization.  Search is a major part of our application and removing
the feature would significantly affect our end users (even though we run the
optimize during the night).



After the optimize is completed, I close and re-open the readers so they
start reading from the new index files.  I'm thinking of adding code to
delete all the old files at that point.  I presume they will no longer be
locked.
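
Roughly what I have in mind is the swap below (just a sketch against the
1.4-era API; the searcher field and the reopenAfterOptimize name are made
up for illustration, and it assumes org.apache.lucene.search.IndexSearcher
is imported):

    private IndexSearcher searcher;   // the searcher live queries currently use

    private synchronized void reopenAfterOptimize(String indexPath)
            throws java.io.IOException {
        IndexSearcher old = searcher;
        searcher = new IndexSearcher(indexPath);  // start serving from the new files
        if (old != null) {
            old.close();  // releases the old segment files so they can be removed
        }
    }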



--- Lucene Users List lucene-user@jakarta.apache.org
wrote:

 Hi Patricio,

 Is it the case that the old index files are not removed from session to
 session, or only within the same session?  The discussion below pertains to
 the latter case, that is, where the old index files are used in the same
 process as the files replacing them.

 I was having a similar problem, and tracked the source down to IndexReaders
 not being closed in my application.

 As far as I can tell, in order for IndexReaders to present a consistent
 view of an index while changes are being made to it, read-only copies
 of the index are kept around until all IndexReaders using them are
 closed.  If any IndexReaders are open on the index, IndexWriters first
 make a copy, then operate on the copy.  If you track down all of these
 open IndexReaders and close them before optimization, all of the
 old index files should be deleted.  (Lucene Gurus, please correct this
 if I have misrepresented the situation).

 In my application, I had a bad interaction between IndexReader caching,
 garbage collection, and incremental indexing, in which a new IndexReader
 was being opened on an index after each indexing increment, without
 closing the already-opened IndexReaders.

 On Windows, operating-system level file locking caused by IndexReaders
 left open was disallowing index re-creation, because the IndexWriter
 wasn't allowed to delete the index files opened by the abandoned
 IndexReaders.

 In short, if you need to write to an index more than once in a single
 session, be sure to keep careful track of your IndexReaders.

 Hope it helps,
 Steve

 Patricio Keilty wrote:
  Hi Otis, tried version 1.4.3 without success, old index files still
  remain in the directory.
  Also tried not calling optimize(), and still getting the same behaviour,
  maybe our problem is not related to the optimize() call at all.
  --p

  Otis Gospodnetic wrote:
  Get and try Lucene 1.4.3.  One of the older versions had a bug that was
  not deleting old index files.
  Otis

  --- [EMAIL PROTECTED] wrote:
  Hi,
  When I run an optimize in our production environment, old index files are
  left in the directory and are not deleted.
  My understanding is that an optimize will create new index files and all
  existing index files should be deleted.  Is this correct?
  We are running Lucene 1.4.2 on Windows.
  Any help is appreciated.  Thanks!

 

 




Optimize not deleting all files

2005-02-03 Thread yahootintin . 1247688
Hi,



When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.  



My understanding is that an
optimize will create new index files and all existing index files should be
deleted.  Is this correct?



We are running Lucene 1.4.2 on Windows.  



Any help is appreciated.  Thanks!




Re: Optimize not deleting all files

2005-02-03 Thread
Your understanding is right!

The old existing files should be deleted, but it will build new files!


On Thu, 03 Feb 2005 17:36:27 -0800 (PST),
[EMAIL PROTECTED] [EMAIL PROTECTED]
wrote:
 Hi,
 
 When I run an optimize in our production environment, old index are
 left in the directory and are not deleted.
 
 My understanding is that an
 optimize will create new index files and all existing index files should be
 deleted.  Is this correct?
 
 We are running Lucene 1.4.2 on Windows.
 
 Any help is appreciated.  Thanks!
 



Re: Disk space used by optimize

2005-01-31 Thread Doug Cutting
Yura Smolsky wrote:
There is a big difference when you use compound index format or
multiple files. I have tested it on the big index (45 Gb). When I used
compound file then optimize takes 3 times more space, b/c *.cfs needs
to be unpacked.
Now I do use non compound file format. It needs like twice as much
disk space.
Perhaps we should add something to the javadocs noting this?
Doug


Re[2]: Disk space used by optimize

2005-01-30 Thread Yura Smolsky
Hello, Otis.

There is a big difference when you use the compound index format or
multiple files. I have tested it on a big index (45 GB). When I used
the compound file format, optimize takes 3 times more space, b/c the *.cfs
needs to be unpacked.

Now I use the non-compound file format. It needs about twice as much
disk space.
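
In case it helps anyone wanting to try the same thing, the switch is just a
flag on the writer (a sketch only; the index path is a placeholder, and it
assumes the IndexWriter/StandardAnalyzer imports):

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.setUseCompoundFile(false);  // multi-file format: no .cfs to unpack during optimize
    // ... addDocument() calls ...
    writer.optimize();
    writer.close();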

OG Have you tried using the multifile index format?  Now I wonder if there
OG is actually a difference in disk space consumed by optimize() when you
OG use multifile and compound index format...

OG Otis

OG --- Kauler, Leto S [EMAIL PROTECTED] wrote:

 Our copy of LIA is in the mail ;)
 
 Yes the final three files are: the .cfs (46.8MB), deletable (4
 bytes),
 and segments (29 bytes).
 
 --Leto
 
 
 
  -Original Message-
  From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
  
  Hello,
  
  Yes, that is how optimize works - copies all existing index 
  segments into one unified index segment, thus optimizing it.
  
  see hit #1:
 http://www.lucenebook.com/search?query=optimize+disk+space
  
  However, three times the space sounds a bit too much, or I 
  make a mistake in the book. :)
  
  You said you end up with 3 files - .cfs is one of them, right?
  
  Otis
  
  
  --- Kauler, Leto S [EMAIL PROTECTED] wrote:
  
   
   Just a quick question:  after writing an index and then calling
   optimize(), is it normal for the index to expand to about 
  three times 
   the size before finally compressing?
   
   In our case the optimise grinds the disk, expanding the index
 into 
   many files of about 145MB total, before compressing down to three
 
   files of about 47MB total.  That must be a lot of disk activity
 for 
   the people with multi-gigabyte indexes!
   
   Regards,
   Leto
 
 
 




Yura Smolsky,







Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus,

that description of 3 sets of index files is what I was imagining, too.
 I'll have to test and add to the book errata, it seems.

Thanks for the info,
Otis

--- Morus Walter [EMAIL PROTECTED] wrote:

 Otis Gospodnetic writes:
  Hello,
  
  Yes, that is how optimize works - copies all existing index
 segments
  into one unified index segment, thus optimizing it.
  
  see hit #1:
 http://www.lucenebook.com/search?query=optimize+disk+space
  
  However, three times the space sounds a bit too much, or I make a
  mistake in the book. :)
  
 I cannot explain why, but ~ three times the size of the final index is
 what I observed when I logged disk usage during optimize of an index
 in compound index format.
 The test was on linux, I simply did a 'du -s' every few seconds parallel 
 to the optimize.
 I didn't test non-compound format. Probably optimizing a compound-format
 index requires storing the different parts of the compound file separately
 before joining them into the compound file (sounds reasonable, otherwise
 you would need to know the sizes before creating the parts). In that case 
 you have the original index, the separate files, and the new compound file 
 as the disk usage peak.
 
 So IMHO the book is wrong.
 
 Morus
 



Disk space used by optimize

2005-01-27 Thread Kauler, Leto S

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down to three files of
about 47MB total.  That must be a lot of disk activity for the people
with multi-gigabyte indexes!

Regards,
Leto




Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello,

Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.

see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis


--- Kauler, Leto S [EMAIL PROTECTED] wrote:

 
 Just a quick question:  after writing an index and then calling
 optimize(), is it normal for the index to expand to about three times
 the size before finally compressing?
 
 In our case the optimise grinds the disk, expanding the index into
 many
 files of about 145MB total, before compressing down to three files of
 about 47MB total.  That must be a lot of disk activity for the people
 with multi-gigabyte indexes!
 
 Regards,
 Leto
 



RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is in the mail ;)

Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).

--Leto



 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
 
 Hello,
 
 Yes, that is how optimize works - copies all existing index 
 segments into one unified index segment, thus optimizing it.
 
 see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
 
 However, three times the space sounds a bit too much, or I 
 make a mistake in the book. :)
 
 You said you end up with 3 files - .cfs is one of them, right?
 
 Otis
 
 
 --- Kauler, Leto S [EMAIL PROTECTED] wrote:
 
  
  Just a quick question:  after writing an index and then calling 
  optimize(), is it normal for the index to expand to about 
 three times 
  the size before finally compressing?
  
  In our case the optimise grinds the disk, expanding the index into 
  many files of about 145MB total, before compressing down to three 
  files of about 47MB total.  That must be a lot of disk activity for 
  the people with multi-gigabyte indexes!
  
  Regards,
  Leto




RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format?  Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use multifile and compound index format...

Otis

--- Kauler, Leto S [EMAIL PROTECTED] wrote:

 Our copy of LIA is in the mail ;)
 
 Yes the final three files are: the .cfs (46.8MB), deletable (4
 bytes),
 and segments (29 bytes).
 
 --Leto
 
 
 
  -Original Message-
  From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
  
  Hello,
  
  Yes, that is how optimize works - copies all existing index 
  segments into one unified index segment, thus optimizing it.
  
  see hit #1:
 http://www.lucenebook.com/search?query=optimize+disk+space
  
  However, three times the space sounds a bit too much, or I 
  make a mistake in the book. :)
  
  You said you end up with 3 files - .cfs is one of them, right?
  
  Otis
  
  
  --- Kauler, Leto S [EMAIL PROTECTED] wrote:
  
   
   Just a quick question:  after writing an index and then calling 
   optimize(), is it normal for the index to expand to about 
  three times 
   the size before finally compressing?
   
   In our case the optimise grinds the disk, expanding the index
 into 
   many files of about 145MB total, before compressing down to three
 
   files of about 47MB total.  That must be a lot of disk activity
 for 
   the people with multi-gigabyte indexes!
   
   Regards,
   Leto
 



Re: how often to optimize?

2004-12-28 Thread aurora
Are not optimized indices causing you any problems (e.g. slow searches,
high number of open file handles)?  If no, then you don't even need to
optimize until those issues become... issues.
OK, I have changed the process to not do optimize() at all. So far so
good. The number of files hovers from 10 to 40 during the indexing of
10,000 files. Seems Lucene is doing some kind of self-maintenance to keep
things in order.

Is it right to say optimize() is a totally optional operation? I probably
got the impression that it is a natural step to end an incremental update
from the IndexHTML example. Since it replicates the whole index it might be
overkill for many applications to do daily.




Re: how often to optimize?

2004-12-28 Thread Otis Gospodnetic
Correct.
The self-maintenance you are referring to is Lucene's periodic segment
merging.  The frequency of that can be controlled through IndexWriter's
mergeFactor.
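
For example (a sketch only; the values are illustrative, not recommendations,
and the path is a placeholder):

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.mergeFactor = 10;     // how many segments accumulate before they are merged
    writer.minMergeDocs = 1000;  // how many docs are buffered in memory before a segment is written
    // ... addDocument() calls ...
    writer.close();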

Otis

--- aurora [EMAIL PROTECTED] wrote:

  Are not optimized indices causing you any problems (e.g. slow
 searches,
  high number of open file handles)?  If no, then you don't even need
 to
  optimize until those issues become... issues.
 
 
 OK I have changed the process to not doing optimize() at all. So far
 so  
 good. The number of files hover from 10 to 40 during the indexing of 
 
 10,000 files. Seems Lucene is doing some kind of self maintenance to
 keep  
 things in order.
 
 Is it right to say optimize() is a totally optional operation? I
 probably  
 get the impression it is a natural step to end an incremental update
 from  
 the IndexHTML example. Since it replicates the whole index it might
 be an  
 overkill for many applications to do daily.
 
 
 
 



how often to optimize?

2004-12-21 Thread aurora
Right now I am incrementally adding about 100 documents to the index a day
and then optimizing after that. I find that optimize essentially rebuilds
the entire index into a single file. So the size of the disk write is
proportional to the total index size, not to the size of the documents
incrementally added.

So my question is: would it be overkill to optimize every day? Is there
any guideline on how often to optimize? Every 1000 documents or more?
Every week? Is there any concern if there are a lot of documents added
without optimizing?

Thanks.


Re: how often to optimize?

2004-12-21 Thread Otis Gospodnetic
Hello,

I think some of these questions may be answered in the jGuru FAQ.

 So my question is would it be an overkill to optimize everyday?

Only if lots of documents are being added/deleted, and you end up with
a lot of index segments.

 Is
 there  
 any guideline on how often to optimize? Every 1000 documents or more?

Are not optimized indices causing you any problems (e.g. slow searches,
high number of open file handles)?  If no, then you don't even need to
optimize until those issues become... issues.

 Every week? Is there any concern if there are a lot of documents
 added without optimizing?

Possibly, see my answer above.

Otis





Re: finalize delete without optimize

2004-12-14 Thread Otis Gospodnetic
Hello John,

Once you make your change locally, use 'cvs diff -u IndexWriter.java >
indexwriter.patch' to make a patch.
Then open a new Bugzilla entry.
Finally, attach your patch to that entry.

Note that Document deletion is actually done from IndexReader, so your
patch may have to be on IndexReader, not IndexWriter.
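
For reference, the existing deletion path looks roughly like this (a sketch
against the 1.4-era API; the field name and value are made up):

    IndexReader reader = IndexReader.open("/path/to/index");
    reader.delete(new Term("id", "42"));  // only marks the document as deleted
    reader.close();                       // saves the deletion marks to disk

    // The deleted document's data is only reclaimed when segments are merged,
    // e.g. by IndexWriter.optimize() -- which is exactly the expensive step
    // you want to avoid.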

Thanks,
Otis


--- John Wang [EMAIL PROTECTED] wrote:

 Hi Otis:
 
  Thanks for you reply.
 
  I am looking for more of an API call than a tool. e.g.
 IndexWriter.finalizeDelete()
 
  If I implement this, how would I go about submitting a patch?
 
 thanks
 
 -John
 
 
 On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
 [EMAIL PROTECTED] wrote:
  Hello John,
  
  I believe you didn't get any replies to this.  What you are
 describing
  cannot be done using the public, but maaay (no source code on this
  machine, so I can't double-check that) be doable if you use some of
 the
  'internal' methods.
  
  I don't have the need for this, but others might, so it may be
 worth
  developing a tool that purges Documents marked as deleted without
 the
  expensive segment merging, iff that is possible.  If you put this
 tool
  under the approprite org.apache.lucene... package, you'll get
 access to
  'internal' methods, of course.  If you end up creating this, we
 could
  stick it in the Sandbox, where we should really create a new
 section
  for handy command-line tools that manipulate the index.
  
  Otis
  
  
  
  
  --- John Wang [EMAIL PROTECTED] wrote:
  
   Hi:
  
  Is there a way to finalize delete, e.g. actually remove them
 from
   the segments and make sure the docIDs are contiguous again.
  
  The only explicit way to do this is by calling
   IndexWriter.optmize(). But this call does a lot more (also merges
 all
   the segments), hence is very expensive. Is there a way to simply
 just
   finalize the deletes without having to merge all the segments?
  
   If not, I'd be glad to submit an implementation of this
 feature
   if
   the Lucene devs agree this is useful.
  
   Thanks
  
   -John
  
  



Re: finalize delete without optimize

2004-12-14 Thread Otis Gospodnetic
Hello John,

I believe you didn't get any replies to this.  What you are describing
cannot be done using the public API, but maaay (no source code on this
machine, so I can't double-check that) be doable if you use some of the
'internal' methods.  

I don't have the need for this, but others might, so it may be worth
developing a tool that purges Documents marked as deleted without the
expensive segment merging, iff that is possible.  If you put this tool
under the appropriate org.apache.lucene... package, you'll get access to
'internal' methods, of course.  If you end up creating this, we could
stick it in the Sandbox, where we should really create a new section
for handy command-line tools that manipulate the index.

Otis


--- John Wang [EMAIL PROTECTED] wrote:

 Hi:
 
Is there a way to finalize delete, e.g. actually remove them from
 the segments and make sure the docIDs are contiguous again.
 
The only explicit way to do this is by calling
 IndexWriter.optmize(). But this call does a lot more (also merges all
 the segments), hence is very expensive. Is there a way to simply just
 finalize the deletes without having to merge all the segments?
 
 If not, I'd be glad to submit an implementation of this feature
 if
 the Lucene devs agree this is useful.
 
 Thanks
 
 -John
 


Re: finalize delete without optimize

2004-12-14 Thread John Wang
Hi Otis:

 Thanks for your reply.

 I am looking for more of an API call than a tool. e.g.
IndexWriter.finalizeDelete()

 If I implement this, how would I go about submitting a patch?

thanks

-John


On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Hello John,
 
 I believe you didn't get any replies to this.  What you are describing
 cannot be done using the public, but maaay (no source code on this
 machine, so I can't double-check that) be doable if you use some of the
 'internal' methods.
 
 I don't have the need for this, but others might, so it may be worth
 developing a tool that purges Documents marked as deleted without the
 expensive segment merging, iff that is possible.  If you put this tool
 under the approprite org.apache.lucene... package, you'll get access to
 'internal' methods, of course.  If you end up creating this, we could
 stick it in the Sandbox, where we should really create a new section
 for handy command-line tools that manipulate the index.
 
 Otis
 
 
 
 
 --- John Wang [EMAIL PROTECTED] wrote:
 
  Hi:
 
 Is there a way to finalize delete, e.g. actually remove them from
  the segments and make sure the docIDs are contiguous again.
 
 The only explicit way to do this is by calling
  IndexWriter.optmize(). But this call does a lot more (also merges all
  the segments), hence is very expensive. Is there a way to simply just
  finalize the deletes without having to merge all the segments?
 
  If not, I'd be glad to submit an implementation of this feature
  if
  the Lucene devs agree this is useful.
 
  Thanks
 
  -John
 


RE: finalize delete without optimize

2004-12-09 Thread Aviran
Lucene standard API does not support this kind of operation.

Aviran
http://www.aviransplace.com


-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 08, 2004 17:32 PM
To: [EMAIL PROTECTED]
Subject: Re: finalize delete without optimize


Hi folks:

I sent this out a few days ago without a response. 

Please help.

Thanks in advance

-John


On Mon, 6 Dec 2004 21:15:00 -0800, John Wang [EMAIL PROTECTED] wrote:
 Hi:
 
   Is there a way to finalize delete, e.g. actually remove them from 
 the segments and make sure the docIDs are contiguous again.
 
   The only explicit way to do this is by calling 
 IndexWriter.optmize(). But this call does a lot more (also merges all 
 the segments), hence is very expensive. Is there a way to simply just 
 finalize the deletes without having to merge all the segments?
 
If not, I'd be glad to submit an implementation of this feature if 
 the Lucene devs agree this is useful.
 
 Thanks
 
 -John





finalize delete without optimize

2004-12-06 Thread John Wang
Hi:

   Is there a way to finalize deletes, e.g. actually remove them from
the segments and make sure the docIDs are contiguous again?

   The only explicit way to do this is by calling
IndexWriter.optimize(). But this call does a lot more (it also merges all
the segments), hence is very expensive. Is there a way to simply
finalize the deletes without having to merge all the segments?

If not, I'd be glad to submit an implementation of this feature if
the Lucene devs agree this is useful.

Thanks

-John




Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Daniel Naber
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote:

 I'm wondering about the potential for a generic JDBCDirectory for
 keeping the lucene index within a database.

Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I 
don't know about its scalability.

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
Also, there is a DBDirectory in the sandbox to store a Lucene index 
inside Berkeley DB.

Erik
On Nov 22, 2004, at 6:06 PM, Kevin A. Burton wrote:
It seems that when compared to other datastores that Lucene starts to 
fall down.  For example lucene doesn't perform online index 
optimizations so if you add 10 documents you have to run optimize() 
again and this isn't exactly a fast operation.

I'm wondering about the potential for a generic JDBCDirectory for 
keeping the lucene index within a database.
It sounds somewhat unconventional would allow you to perform live 
addDirectory updates without performing an optimize() again.

Has anyone looked at this?  How practical would it be.
Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then 
you should work for Rojo!  If you recommend someone and we hire them 
you'll get a free iPod!
   Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412



Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Kevin A. Burton
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index 
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412



Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote:
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index 
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Huh?  Why would you assume that?  As far as I know, and I've tested 
this some, a Lucene index inside Berkeley DB works the same as if it 
had been in RAM or on the filesystem.

Erik


JDBCDirectory to prevent optimize()?

2004-11-22 Thread Kevin A. Burton
It seems that when compared to other datastores, Lucene starts to 
fall down.  For example, Lucene doesn't perform online index 
optimizations, so if you add 10 documents you have to run optimize() 
again, and this isn't exactly a fast operation.

I'm wondering about the potential for a generic JDBCDirectory for 
keeping the Lucene index within a database. 

It sounds somewhat unconventional but would allow you to perform live 
addDirectory updates without performing an optimize() again.

Has anyone looked at this?  How practical would it be?
Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412



using optimize and addDocument concurrently.

2004-10-19 Thread Stephen Halsey
Hi,

My basic question is whether it is possible to continue to add documents to an index 
in one Thread while running a long running optimization of the index (approx 30 mins) 
in another thread.  I'm using Lucene version 1.4.2.  The concurrency matrix at 
http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the same 
IndexWriter object you can do concurrent writes and optimization.  When I try it in my 
program the addDocuments wait until the optimization has finished, so in this respect 
it is Thread safe, but the operations cannot be performed at the same time.  Our 
problem is that the index needs to be continually kept up to date with new news 
articles, but also needs to be regularly optimized to keep it fast.  If I cannot 
update and optimize one index at the same time the best way I can see of doing this is 
maintaining multiple identical indexes and offlining, optimizing, letting them catch 
up-to-date and re-onlining them.  Does that sound best to you?

Thanks a lot in advance


Steve

RE: using optimize and addDocument concurrently.

2004-10-19 Thread Aad Nales
Steve,

The behavior that you describe is as expected. I have tackled a similar
problem to yours by creating a proxy object that acts as a gatekeeper to
all IndexReader, IndexSearcher and IndexWriter operations. With fully
synchronized access to all methods of the proxy you will not run into
any problems. Every time I need to perform something with the writer, I
close the searcher etc.

As to regular optimization, I tend to reindex now and again with a
completely separate writer and replace the index by moving it to the new
location. This BTW has also become a method in my proxy object.
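
Very roughly, the gatekeeper idea looks like this (a stripped-down sketch,
not my actual class; the names are made up and error handling is omitted):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class IndexGatekeeper {
    private final String indexPath;
    private IndexSearcher searcher;   // re-opened lazily after each write

    public IndexGatekeeper(String indexPath) { this.indexPath = indexPath; }

    public synchronized Hits search(Query query) throws java.io.IOException {
        if (searcher == null) {
            searcher = new IndexSearcher(indexPath);
        }
        return searcher.search(query);
    }

    public synchronized void add(Document doc) throws java.io.IOException {
        // Close the searcher before any write (make sure no Hits from it
        // are still in use elsewhere).
        if (searcher != null) {
            searcher.close();
            searcher = null;
        }
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        writer.addDocument(doc);
        writer.close();
    }
}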

Hope this helps,
Cheers,
Aad




Hi,

My basic question is whether it is possible to continue to add documents
to an index in one Thread while running a long running optimization of
the index (approx 30 mins) in another thread.  I'm using Lucene version
1.4.2.  The concurrency matrix at
http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the
same IndexWriter object you can do concurrent writes and optimization.
When I try it in my program the addDocuments wait until the optimization
has finished, so in this respect it is Thread safe, but the operations
cannot be performed at the same time.  Our problem is that the index
needs to be continually kept up to date with new news articles, but also
needs to be regularly optimized to keep it fast.  If I cannot update and
optimize one index at the same time the best way I can see of doing this
is maintaining multiple identical indexes and offlining, optimizing,
letting them catch up-to-date and re-onlining them.  Does that sounds
best to you?

Thanks a lot in advance


Steve






IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previously stored document

2004-09-16 Thread David Spencer
Crump, Michael wrote:
You have to close the IndexReader after doing the delete, before opening the 
IndexWriter for the addition.  See information at this link:
http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
Recently I thought I observed that if I use this batch update idiom (1st 
delete the changed docs, then add them), it seems that 
IndexReader.close() does not flush/commit the deletions - rather 
IndexWriter.optimize() does.

I may have been confused and should retest this, but regardless, the 
javadoc seems unclear. close() says it *saves* deletions to disk. What 
does it mean to save a deletion? Save a pending one, or commit it 
(commit - really delete it)?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()
Also optimize doesn't mention deletions.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#optimize()
Suggestion: could the word save in the close() jdoc be elaborated on, 
and possibly could optimize() get another comment wrt its effect on 
deletions?

thx,
 Dave

Regards,
Michael
-Original Message-
From:   Paul Williams [mailto:[EMAIL PROTECTED]
Sent:   Thu 9/16/2004 5:39 AM
To: 'Lucene Users List'
Cc: 
Subject: problem with locks when updating the data of a previously stored document
Hi,
Using lucene-1.4.1.jar on WinXP  

I am having trouble with locking and updating an existing Lucene document. I
delete the old document from the index and then add the new document to the
index writer. I am using minMergeDocs set to 100 (much quicker!!) and
close the writer once the batch is done, so the documents are flushed to the
filesystem.
The problem I am having is that I can't delete the old version of the document
(after the first document has been added) using reader.delete() because there
is a lock on the index due to the IndexWriter being open.
Am I doing this wrong, or is there a simple way round this?
Regards,
Paul
Code snippets of the update code (I have just cut and pasted the relevant
lines from my app to give an idea):

reader = IndexReader.open(location);
// Delete the old doc/term if present
if (reader.docFreq(docNumberTerm) > 0) {
    reader.delete(docNumberTerm);
}
.
.
.
IndexWriter writer = null;
// Get the writer from the hash table so the last few are cached and don't
// have to be restarted
synchronized (IndexWriterCache) {
    String dbstring = "" + ldb;
    writer = (IndexWriter) IndexWriterCache.get(dbstring);
    if (writer == null) {
        // Not in cache, so create one and add it to the cache for next time
        writer = new IndexWriter(location, new StandardAnalyzer(), new_index);
        writer.setUseCompoundFile(true);
        // Set the maximum number of entries per field. Default is 10,000
        writer.maxFieldLength = MaxFieldCount;
        // Set how many docs will be stored in memory before being saved to disk
        writer.minMergeDocs = (int) DocsInMemory;
        IndexWriterCache.remove(dbstring);
        IndexWriterCache.put(dbstring, writer);
    }
}
.
.
.
// Add the documents to the Lucene index
writer.addDocument(doc);

.
. Some time later, after a batch of docs has been added
.
writer.close();





Re: Way to repair an index broking during 1/2 optimize?

2004-07-09 Thread Doug Cutting
Kevin A. Burton wrote:
With the typical handful of fields, one should never see more than 
hundreds of files.

We only have 13 fields... Though to be honest I'm worried that even if I 
COULD do the optimize that it would run out of file handles.
Optimization doesn't open all files at once.  The most files that are 
ever opened by an IndexWriter is just:

4 + (5 + numIndexedFields) * (mergeFactor-1)
This includes during optimization.
However, when searching, an IndexReader must keep most files open.  In 
particular, the maximum number of files an unoptimized, non-compound 
IndexReader can have open is:

(5 + numIndexedFields) * (mergeFactor-1) * 
(log_base_mergeFactor(numDocs/minMergeDocs))

A compound IndexReader, on the other hand, should open at most, just:
(mergeFactor-1) * (log_base_mergeFactor(numDocs/minMergeDocs))
An optimized, non-compound IndexReader will open just (5 + 
numIndexedFields) files.

And an optimized, compound IndexReader should only keep one file open.
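
To make that concrete with the numbers from this thread (13 indexed fields,
the default mergeFactor of 10, minMergeDocs of 1000 and roughly 14M docs),
here is the back-of-envelope arithmetic; treat the figures as estimates,
and note the formula above doesn't say whether the log should be rounded up:

public class OpenFileEstimate {
    public static void main(String[] args) {
        int numIndexedFields = 13, mergeFactor = 10, minMergeDocs = 1000;
        long numDocs = 14000000L;

        // IndexWriter, worst case (including during optimization):
        int writerMax = 4 + (5 + numIndexedFields) * (mergeFactor - 1);   // = 166

        // Unoptimized, non-compound IndexReader, worst case:
        double levels = Math.log((double) numDocs / minMergeDocs) / Math.log(mergeFactor); // ~4.15
        double readerMax = (5 + numIndexedFields) * (mergeFactor - 1) * levels;            // ~670

        System.out.println("writer max open files: " + writerMax);
        System.out.println("reader max open files: ~" + Math.round(readerMax));
    }
}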
Doug


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Peter M Cipollone
You might try merging the existing index into a new index located on a ram
disk.  Once it is done, you can move the directory from ram disk back to
your hard disk.  I think this will work as long as the old index did not
finish merging.  You might do a strings command on the segments file to
make sure the new (merged) segment is not in there, and if there's a
deletable file, make sure there are no segments from the old index listed
therein.

- Original Message - 
From: Kevin A. Burton [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, July 08, 2004 2:02 PM
Subject: Way to repair an index broking during 1/2 optimize?


 So.. the other day I sent an email about building an index with 14M
 documents.

 That went well but the optimize() was taking FOREVER.  It took 7 hours
 to generate the whole index and when complete as of 10AM it was still
 optimizing (6 hours later) and I needed the box back.

 So is it possible to fix this index now?  Can I just delete the most
 recent segment that was created?  I can find this by ls -alt

 Also... what can I do to speed up this optimize?  Ideally it wouldn't
 take 6 hours.

 Kevin

 -- 

 Please reply using PGP.

 http://peerfear.org/pubkey.asc

 NewsMonster - http://www.newsmonster.org/

 Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator,  Web - http://peerfear.org/
 GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
   IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster





Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
So is it possible to fix this index now?  Can I just delete the most 
recent segment that was created?  I can find this by ls -alt
Sorry, I forgot to answer your question: this should work fine.  I don't 
think you should even have to delete that segment.

Also, to elaborate on my previous comment, a mergeFactor of 5000 not 
only delays the work until the end, but it also makes the disk workload 
more seek-dominated, which is not optimal.  So I suspect a smaller merge 
factor, together with a larger minMergeDocs, will be much faster 
overall, including the final optimize().  Please tell us how it goes.
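
For reference, a minimal sketch of what that combination of settings looks like
with the Lucene 1.4-era API; the path, field names and the document loop are only
placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/index/new", new StandardAnalyzer(), true);

        writer.mergeFactor = 10;      // keep the merge factor small
        writer.minMergeDocs = 1000;   // buffer more documents in RAM before each merge

        for (int i = 0; i < 1000; i++) {              // placeholder document loop
            Document doc = new Document();
            doc.add(Field.Keyword("id", String.valueOf(i)));
            doc.add(Field.Text("body", "document body " + i));
            writer.addDocument(doc);
        }

        writer.optimize();            // one final optimize at the very end
        writer.close();
    }
}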

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Peter M Cipollone wrote:
You might try merging the existing index into a new index located on a ram
disk.  Once it is done, you can move the directory from ram disk back to
your hard disk.  I think this will work as long as the old index did not
finish merging.  You might do a strings command on the segments file to
make sure the new (merged) segment is not in there, and if there's a
deletable file, make sure there are no segments from the old index listed
therein.
 

It's a HUGE index.  It won't fit in memory ;)  Right now it's at 8G...
Thanks though! :)
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Kevin A. Burton wrote:
Also... what can I do to speed up this optimize? Ideally it wouldn't 
take 6 hours.

Was this the index with the mergeFactor of 5000? If so, that's why 
it's so slow: you've delayed all of the work until the end. Indexing 
on a ramfs will make things faster in general, however, if you have 
enough RAM...
No... I changed the mergeFactor back to 10 as you suggested.
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Kevin A. Burton wrote:
So is it possible to fix this index now? Can I just delete the most 
recent segment that was created? I can find this by ls -alt

Sorry, I forgot to answer your question: this should work fine. I 
don't think you should even have to delete that segment.
I'm worried about duplicate or missing content from the original index. 
I'd rather rebuild the index and waste another 6 hours (I've probably 
blown 100 hours of CPU time on this already) and have a correct index :)

During an optimize I assume Lucene starts writing to a new segment and 
leaves all others in place until everything is done and THEN deletes them?

Also, to elaborate on my previous comment, a mergeFactor of 5000 not 
only delays the work until the end, but it also makes the disk 
workload more seek-dominated, which is not optimal. 
The only settings I use are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has 230k files in it :-/
I assume this is contributing to all the disk seeks.
So I suspect a smaller merge factor, together with a larger 
minMergeDocs, will be much faster overall, including the final 
optimize(). Please tell us how it goes.

This is what I did for this last round but then I ended up with the 
highly fragmented index.

hm...
Thanks for all the help btw!
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that 
it logs merges?  If so, it would be interesting to see that output, 
especially the last entry.
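
In case it helps anyone reproduce this, infoStream is just a public field on
IndexWriter in this version, so merge logging can be switched on with something
like the following sketch (the path is made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MergeLogging {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/index/new", new StandardAnalyzer(), true);
        writer.infoStream = System.out;   // print a line for every segment merge
        // ... addDocument() calls and the final optimize() go here as usual ...
        writer.close();
    }
}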

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.

Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that 
it logs merges? If so, it would be interesting to see that output, 
especially the last entry.

No I didn't actually... If I run it again I'll be sure to do this.
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
During an optimize I assume Lucene starts writing to a new segment and 
leaves all others in place until everything is done and THEN deletes them?
That's correct.
The only settings I uses are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has 230k files in it :-/
Something sounds very wrong for there to be that many files.
The maximum number of files should be around:
  (7 + numIndexedFields) * (mergeFactor-1) * 
(log_base_mergeFactor(numDocs/minMergeDocs))

With 14M documents, log_10(14M/1000) is 4, which gives, for you:
  (7 + numIndexedFields) * 36 = 230k
   7*36 + numIndexedFields*36 = 230k
   numIndexedFields = (230k - 7*36) / 36 =~ 6k
So you'd have to have around 6k unique field names to get 230k files. 
Or something else must be wrong.  Are you running on win32, where file 
deletion can be difficult?

With the typical handful of fields, one should never see more than 
hundreds of files.
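
As a sanity check, plugging the numbers from this thread into that formula
(13 indexed fields, mergeFactor 10, minMergeDocs 1000, ~14M documents) predicts
only a few hundred files; a throwaway sketch of the arithmetic:

public class MaxFileEstimate {
    public static void main(String[] args) {
        int fields = 13, mergeFactor = 10, minMergeDocs = 1000;
        long numDocs = 14000000L;
        double levels = Math.log((double) numDocs / minMergeDocs)
                        / Math.log(mergeFactor);                  // ~4 merge levels
        double maxFiles = (7 + fields) * (mergeFactor - 1) * levels;
        System.out.println("expected max files: " + Math.round(maxFiles)); // roughly 720-750
    }
}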

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Something sounds very wrong for there to be that many files.
The maximum number of files should be around:
(7 + numIndexedFields) * (mergeFactor-1) * 
(log_base_mergeFactor(numDocs/minMergeDocs))

With 14M documents, log_10(14M/1000) is 4, which gives, for you:
(7 + numIndexedFields) * 36 = 230k
7*36 + numIndexedFields*36 = 230k
numIndexedFields = (230k - 7*36) / 36 =~ 6k
So you'd have to have around 6k unique field names to get 230k files. 
Or something else must be wrong. Are you running on win32, where file 
deletion can be difficult?

With the typical handful of fields, one should never see more than 
hundreds of files.

We only have 13 fields... Though to be honest I'm worried that even if I 
COULD do the optimize that it would run out of file handles.

This is very strange...
I'm going to increase minMergeDocs to 1 and then run the full 
conversion on one box and then try to do an optimize (of the corrupt index) on 
another box. See which one finishes first.

I assume the speed of optimize() can be increased the same way that 
indexing is increased...

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


addIndexes and optimize

2004-07-07 Thread roy-lucene-user
Hey y'all again,

Just wondering why the IndexWriter.addIndexes method calls optimize() before and after 
it merges the segments together.

We would like to create an addIndexes method that doesn't optimize and call optimize 
on the IndexWriter later.
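
For reference, the behaviour being asked about - with the current API the implicit
optimizes cannot be skipped - amounts to something like this sketch (the paths are
made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CombineIndexes {
    public static void main(String[] args) throws Exception {
        Directory[] parts = {
            FSDirectory.getDirectory("/index/part1", false),
            FSDirectory.getDirectory("/index/part2", false)
        };
        IndexWriter writer =
            new IndexWriter("/index/combined", new StandardAnalyzer(), true);
        // as described above, this optimizes the target index both before
        // and after it merges in the segments from the other directories
        writer.addIndexes(parts);
        writer.close();
    }
}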

Roy.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize() is not merging into single file? !!!!!!

2004-06-02 Thread iouli . golovatyi
I rechecked  the results. Here they are:

IndexWriter compiled with v.1.4-rc2 generates after optimization
_36d.cfs3779 kb

IndexWriter compiled with v.1.4-rc3 generates after optimization

_36d.cfs   3778 kb
_36c.cfs31 kb
_35z.cfs14 kb
_35o.cfs   14  kb
.
etc.

In both cases the segments file contains _36d.cfs

Looks like the new version just forgot to clean up






Iouli Golovatyi/X/GP/[EMAIL PROTECTED]
01.06.2004 17:22
Please respond to Lucene Users List

 
To: [EMAIL PROTECTED]
cc: 
Subject: optimize() is not merging into single file?
Category: 



I optimize and close the index after that, but don't get just one .cfs 
file as promised in the docs. Instead I see several small 
segments and a couple of big ones.
This weird behavior seems to have started since I changed from v1.4-rc2 to 
1.4-rc3.
Before, I got just one .cfs segment. Any ideas?
Thanks in advance
J.



RE : optimize() is not merging into single file? !!!!!!

2004-06-02 Thread Rasik Pandey
Hello,

I am running a two-week-old version of Lucene from the CVS HEAD and am seeing the same 
behavior.

Regards,
RBP 

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, 2 June 2004 13:53
 To: Lucene Users List
 Subject: Re: optimize() is not merging into single file? !!
 
 I rechecked  the results. Here they are:
 
 IndexWriter compiled with v.1.4-rc2 generates after
 optimization
 _36d.cfs3779 kb
 
 IndexWriter compiled with v.1.4-rc3 generates after
 optimization
 
 _36d.cfs   3778 kb
 _36c.cfs31 kb
 _35z.cfs14 kb
 _35o.cfs   14  kb
 .
 etc.
 
 I both cases segment file contains _36d.cfs
 
 Looks like new version just foget to clean up
 
 
 
 
 
 
 Iouli Golovatyi/X/GP/[EMAIL PROTECTED]
 01.06.2004 17:22
 Please respond to Lucene Users List
 
 
 To: [EMAIL PROTECTED]
 cc:
 Subject:optimeze() is not merging into single
 file?
 Category:
 
 
 
 I optimize and close the index after that, but don't get just
 one .cvs
 file as it promised in doc. Instead of it I see something like
 small
 segments and a couple of big.
 This weird behavor seems started since i changed from v 1.4-rc2
 to
 1.4-rc3.
 Before I got just one cvs segment . Any ideas?
 Thanks in advance
 J.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize fails with Negative seek offset

2004-05-12 Thread Sascha Ottolski
Hi,

sorry for following up my own mail, but since no one responded so
far, I thought the stacktrace might be of interest. The following
exception always occurs when trying to optimize one of our indexes,
which always worked OK for about a year now. I just tried with 1.4-rc3,
but with the same result:

java.io.IOException: Negative seek offset
at java.io.RandomAccessFile.seek(Native Method)
at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:405)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at 
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:222)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:63)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:238)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:483)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:362)
at LuceneRPCHandler.optimize(LuceneRPCHandler.java:398)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at org.apache.xmlrpc.Invoker.execute(Invoker.java:168)
at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:123)
at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:185)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:151)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:773)
at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:656)
at java.lang.Thread.run(Thread.java:534)


Any hint would be greatly appreciated.


Thanks,

Sascha

-- 
Gallileus - the power of knowledge

Gallileus GmbHhttp://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany



++
CURRENT NOTICE (May 2004)

Literatur Alerts - literature searching (as if) in your sleep!

From now on, find out more at:
http://www.gallileus.info/gallileus/about/products/alerts/
++

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize fails with Negative seek offset

2004-05-12 Thread Anthony Vito
Looks like the same error I got when I tried to use Lucene version 1.3
to search on an index I had created with Lucene version 1.4. The
versions are not forward compatible. Did you by chance create the index
with version 1.4 and are now searching with version 1.3? It's easy to
get the dependencies out of sync for different apps, which is what
happened to me.

-vito

On Wed, 2004-05-12 at 04:59, Sascha Ottolski wrote:
 Hi,
 
 sorry for following up my own mail, but since no one responded so
 far, I thought the stacktrace might be of interested. The following
 exception always occurs when trying to optimize one of our indizes,
 which always went ok for about a year now. I just tried with 1.4-rc3,
 but with the same result:
 
 java.io.IOException: Negative seek offset
 at java.io.RandomAccessFile.seek(Native Method)
 at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:405)
 at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
 at 
 org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:222)
 at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
 at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
 at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
 at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:63)
 at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:238)
 at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
 at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:483)
 at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:362)
 at LuceneRPCHandler.optimize(LuceneRPCHandler.java:398)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:324)
 at org.apache.xmlrpc.Invoker.execute(Invoker.java:168)
 at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:123)
 at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:185)
 at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:151)
 at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
 at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:773)
 at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:656)
 at java.lang.Thread.run(Thread.java:534)
 
 
 Any hint would be greatly appreciated.
 
 
 Thanks,
 
 Sascha
 
 -- 
 Gallileus - the power of knowledge
 
 Gallileus GmbHhttp://www.gallileus.info/
 
 Pintschstrae 16  fon +49-(0)30-41 93 43 43
 10249 Berlin  fax +49-(0)30-41 93 43 45
 Germany
 
 
 
 ++
 AKTUELLER HINWEIS (Mai 2004)
 
 Literatur Alerts - Literatursuche (wie) im Schlaf!
 
 Ab jetzt mehr dazu unter:
 http://www.gallileus.info/gallileus/about/products/alerts/
 ++
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize fails with Negative seek offset

2004-05-12 Thread Sascha Ottolski
Am Mittwoch, 12. Mai 2004 18:54 schrieb Anthony Vito:
 Looks like the same error I got when I tried to use Lucene version
 1.3 to search on an index I had created with Lucene version 1.4. The
 versions are not forward compatible. Did you by chance create the
 index with version 1.4 and are now searching with version 1.3. It's
 easy to get the dependencies out of sync for different apps, which is
 what happened to me.

 -vito

Hi vito,

thanks for the reply, but no, we only upgraded so far, but did not 
downgrade. More than that, the failing index was just rebuilt completely 
with 1.4-rc2, only two weeks ago. The problem started a short time 
afterwards (but not immediately).


Greets,

Sascha

-- 
Gallileus - the power of knowledge

Gallileus GmbHhttp://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany



++
CURRENT NOTICE (May 2004)

Literatur Alerts - literature searching (as if) in your sleep!

From now on, find out more at:
http://www.gallileus.info/gallileus/about/products/alerts/
++

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Memory requirements for optimize() on compound index high?

2004-05-10 Thread David Sitsky
Hi,

I am working on an application which uses Lucene 1.3 Final with the 
compound index format on a Win32 Sun JVM 1.4.1_02.  I have set 
maxFieldLength for the index writer to 1,000,000, as I often have to index 
potentially very large documents which contain information that must be 
indexed.

All other index writer parameters have their default values.  The 
application loads all documents in a batch phase, and then allows the user 
to perform searches.  Typically, no new documents are added afterwards.

Given the large size set for maxFieldLength, I have allocated 512MB of 
memory to the JVM.  For indexing 1,000,000 complex documents, with 
potentially around 30 fields each, this seems to work fine.
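
For what it's worth, the batch-load setup described above boils down to something
like the sketch below (Lucene 1.3-era API; the path is invented, and the heap size
itself is a JVM option such as -Xmx1024m rather than anything in the code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class BatchLoad {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/index/batch", new StandardAnalyzer(), true);
        writer.maxFieldLength = 1000000;   // index up to 1M terms per field

        // ... add the ~1,000,000 documents, ~30 fields each, here ...

        writer.optimize();   // the step that needed the larger heap
        writer.close();
    }
}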

I have noticed that when performing an optimize() on this index at the end 
of a batch load, the memory requirements seem to be much higher.  I was 
receiving OutOfMemoryErrors for a 512MB JVM.  I increased the JVM size to 
1 GIG, and the optimize operation completed successfully.

Task manager reported a peak VM size of 810MB during the optimize() 
operation, from a newly-created JVM.  FWIW, the final index size was 11 
gigabytes - most document fields are stored in the index.

Do people have similar experiences to this when calling optimize() on a 
compound index?

Are there any ways I can reduce the amount of memory required, apart from 
making maxFieldLength smaller?

Is there any way of determining in advance the kind of memory 
optimize() will require?  It's highly undesirable to receive 
OutOfMemoryErrors during optimize().  I guess the user can still search on 
an unoptimized index, which is better than nothing...

-- 
Cheers,
David

This message is intended only for the named recipient.  If you are not the 
intended recipient you are notified that disclosing, copying, distributing 
or taking any action  in reliance on the contents of this information is 
strictly prohibited.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



optimize fails with Negative seek offset

2004-05-04 Thread Sascha Ottolski
Hi,

I have no idea where to look, and I know almost nothing about 
Java :-( We've been using Lucene for quite a while now (about a year I guess) 
and suddenly I've seen this when trying to optimize the index:

java.lang.Exception: java.io.IOException: Negative seek offset

The code throwing this was:

public boolean optimize() throws IOException {
    IndexWriter writer =
        new IndexWriter(this.indexpath, new StandardAnalyzer(), false);
    writer.mergeFactor = this.mergeFactor;
    try {
        writer.optimize();
        writer.close();
    }
    finally {
        this.changedIndex();
    }
    return true;
}

The index file is about 8.8 GB now. However, when the exception occurred, 
the new temporary index file had only grown to 3.2 GB. All this with 
1.4-rc2.


Thanks in advance for any advice,

Sascha


-- 
Gallileus - the power of knowledge

Gallileus GmbHhttp://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany



++
CURRENT NOTICE (May 2004)

Literatur Alerts - literature searching (as if) in your sleep!

From now on, find out more at:
http://www.gallileus.info/gallileus/about/products/alerts/
++

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Preventing duplicate document insertion during optimize

2004-04-30 Thread Kevin A. Burton
Let's say you have two indexes each with the same document literal.  All 
the fields hash the same and the document is a binary duplicate of a 
different document in the second index.

What happens when you do a merge to create a 3rd index from the first 
two?  I assume you now have two documents that are identical in one 
index.  Is there any way to prevent this?

It would be nice to figure out if there's a way to flag a field as a 
primary key, so that a document whose key has already been added is just skipped.

Kevin

--

Please reply using PGP.

   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster



signature.asc
Description: OpenPGP digital signature


Re: Preventing duplicate document insertion during optimize

2004-04-30 Thread James Dunn
Kevin,

I have a similar issue.  The only solution I have been
able to come up with is, after the merge, to open an
IndexReader against the merge index, iterate over all
the docs and delete duplicate docs based on my
primary key field.
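
A minimal sketch of that de-duplication pass, assuming the Lucene 1.4-era API;
the index path and the "id" primary-key field name are made up:

import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

public class DeleteDuplicates {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/index/merged");
        Set seen = new HashSet();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) continue;
            Document doc = reader.document(i);
            String key = doc.get("id");     // the primary key field
            if (!seen.add(key)) {
                reader.delete(i);           // mark later duplicates as deleted
            }
        }
        reader.close();                     // close() flushes the deletions
    }
}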

Jim

--- Kevin A. Burton [EMAIL PROTECTED] wrote:
 Let's say you have two indexes each with the same
 document literal.  All 
 the fields hash the same and the document is a
 binary duplicate of a 
 different document in the second index.
 
 What happens when you do a merge to create a 3rd
 index from the first 
 two?  I assume you now have two documents that are
 identical in one 
 index.  Is there any way to prevent this?
 
 It would be nice to figure out if there's a way to
 flag a field as a 
 primary key so that if it has already added it to
 just skip.
 
 Kevin
 
 -- 
 
 Please reply using PGP.
 
 http://peerfear.org/pubkey.asc
 
 NewsMonster - http://www.newsmonster.org/
 
 Kevin A. Burton, Location - San Francisco, CA, Cell
 - 415.595.9965
AIM/YIM - sfburtonator,  Web -
 http://peerfear.org/
 GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D
 8D04 99F1 4412
   IRC - freenode.net #infoanarchy | #p2p-hackers |
 #newsmonster
 
 

 ATTACHMENT part 2 application/pgp-signature
name=signature.asc






__
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Will failed optimize corrupt an index?

2003-08-20 Thread Doug Cutting
The index should be fine.  Lucene index updates are atomic.

Doug

Dan Quaroni wrote:
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize.  Does lucene have transactions or anything
that would prevent this from corrupting an index, or do I need to generate
the index again?
Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Will failed optimize corrupt an index?

2003-08-19 Thread Dan Quaroni
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize.  Does lucene have transactions or anything
that would prevent this from corrupting an index, or do I need to generate
the index again?

Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Will failed optimize corrupt an index?

2003-08-19 Thread Pasha Bizhan
HI,

 From: Dan Quaroni [mailto:[EMAIL PROTECTED] 
 
 My index grew about 7 gigs larger than I projected it would, 
 and it ran out of disk space during optimize.  Does lucene 
 have transactions or anything that would prevent this from 
 corrupting an index, or do I need to generate the index again?

You must generate the index again.

Pasha
Lucene.Net www.sourceforge.net/projects/lucenedotnet


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Files getting deleted when optimize is killed?

2003-07-14 Thread Steve Rajavuori
Upon further examination what I found is this:

- Killing the process while optimize() is still working does NOT cause the
index files to be deleted, HOWEVER --

- Once the index is opened again by a new process (now apparently in an
unstable state due to the incomplete optimize()), at that time all existing
files are deleted and only a file called segments remains.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 12, 2003 7:06 AM
To: Lucene Users List
Subject: Re: Files getting deleted when optimize is killed?



--- Steve Rajavuori [EMAIL PROTECTED] wrote:
 I've had a problem on several occasions where my entire index is
 deleted --
 that is, EVERY file (except 'segments') is gone. There were many
 users on
 the system each time, so its a little hard to tell for sure what was
 going
 on, but my theory is this:
 
 My code will automatically call optimize( ) periodically. Because the
 index
 is very large, it can take a long time. It looks like an
 administrator may
 have killed my process, and its possible that it was killed while an
 optimize( ) was in progress.
 
 I have two questions:
 
 1) Does anyone know if killing an optimize( ) in progress could wipe
 out all
 files like this? (New index created in temporary files that were not
 saved
 properly, while old index files were already deleted???)

I highly doubt it.

 2) Does anyone know of any other way all files in an index could be
 inadvertently deleted (e.g. through killing a process)? For example,
 if you
 kill the process during an 'add' would that cause all files to be
 deleted?

Same as above.  You can create an artificial, large index for testing
purposes.  Call optimize once in a while, and then kill the process.  I
don't think Lucene will remove your files.

Otis


__
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Files getting deleted when optimize is killed?

2003-07-12 Thread Otis Gospodnetic

--- Steve Rajavuori [EMAIL PROTECTED] wrote:
 I've had a problem on several occasions where my entire index is
 deleted --
 that is, EVERY file (except 'segments') is gone. There were many
 users on
 the system each time, so its a little hard to tell for sure what was
 going
 on, but my theory is this:
 
 My code will automatically call optimize( ) periodically. Because the
 index
 is very large, it can take a long time. It looks like an
 administrator may
 have killed my process, and its possible that it was killed while an
 optimize( ) was in progress.
 
 I have two questions:
 
 1) Does anyone know if killing an optimize( ) in progress could wipe
 out all
 files like this? (New index created in temporary files that were not
 saved
 properly, while old index files were already deleted???)

I highly doubt it.

 2) Does anyone know of any other way all files in an index could be
 inadvertently deleted (e.g. through killing a process)? For example,
 if you
 kill the process during an 'add' would that cause all files to be
 deleted?

Same as above.  You can create an artificial, large index for testing
purposes.  Call optimize once in a while, and then kill the process.  I
don't think Lucene will remove your files.

Otis


__
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Files getting deleted when optimize is killed?

2003-07-11 Thread Steve Rajavuori
I've had a problem on several occasions where my entire index is deleted --
that is, EVERY file (except 'segments') is gone. There were many users on
the system each time, so it's a little hard to tell for sure what was going
on, but my theory is this:

My code will automatically call optimize( ) periodically. Because the index
is very large, it can take a long time. It looks like an administrator may
have killed my process, and it's possible that it was killed while an
optimize( ) was in progress.

I have two questions:

1) Does anyone know if killing an optimize( ) in progress could wipe out all
files like this? (New index created in temporary files that were not saved
properly, while old index files were already deleted???)

2) Does anyone know of any other way all files in an index could be
inadvertently deleted (e.g. through killing a process)? For example, if you
kill the process during an 'add' would that cause all files to be deleted?

Steve Rajavuori

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



optimize()

2002-11-26 Thread Leo Galambos
How does it affect overall performance when I do not call optimize()?

THX

-g-



--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: optimize()

2002-11-26 Thread Otis Gospodnetic
This was just mentioned a few days ago. Check the archives.
Not needed for indexing, good to do after you are done indexing, as the
index reader needs to open and search through fewer files.

Otis

--- Leo Galambos [EMAIL PROTECTED] wrote:
 How does it affect overall performance, when I do not call
 optimize()?
 
 THX
 
 -g-
 
 
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: optimize()

2002-11-26 Thread Leo Galambos
Did you try any tests in this area? (figures, charts...)

AFAIK the reader reads an identical number of (giga)bytes. BTW, it could read
segments in many threads. I do not see why it would be slower (unless you
do many delete()s). If the reader opens 1 or 50 files, it is still nothing.

-g-

On Tue, 26 Nov 2002, Otis Gospodnetic wrote:

 This was just mentioned a few days ago. Check the archives.
 Not needed for indexing, good to do after you are done indexing, as the
 index reader needs to open and search through less files.
 
 Otis
 
 --- Leo Galambos [EMAIL PROTECTED] wrote:
  How does it affect overall performance, when I do not call
  optimize()?
  
  THX
  
  -g-
  
  
  
  --
  To unsubscribe, e-mail:  
  mailto:[EMAIL PROTECTED]
  For additional commands, e-mail:
  mailto:[EMAIL PROTECTED]
  
 
 
 __
 Do you Yahoo!?
 Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
 http://mailplus.yahoo.com
 
 --
 To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
 For additional commands, e-mail: mailto:[EMAIL PROTECTED]
 


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




RE: optimize()

2002-11-26 Thread Stephen Eaton
I don't know if this answers your question, but I had a lot of problems
with Lucene bombing out with out-of-memory errors.  I was not using
optimize(); I tried it and hey presto, no more problems.

-Original Message-
From: Leo Galambos [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 27 November 2002 5:22 AM
To: [EMAIL PROTECTED]
Subject: optimize()


How does it affect overall performance, when I do not call optimize()?

THX

-g-



--
To unsubscribe, e-mail:
mailto:[EMAIL PROTECTED]
For additional commands, e-mail:
mailto:[EMAIL PROTECTED]


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




large index - slow optimize()

2002-11-22 Thread Otis Gospodnetic
Hello,

I am building an index with a few million documents, and every X documents
added to the index I call optimize() on the IndexWriter.
I have noticed that as the index grows this call takes more and more
time, even though the number of new segments that need to be merged is
the same between every optimize() call.
I suspect this is normal and not a bug, but is there no way around
that?  Do you know which part is the part that takes longer and longer
as the index grows?

Thanks,
Otis


__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




RE: large index - slow optimize()

2002-11-22 Thread Armbrust, Daniel C.
Note - this is not a fact, this is what I think I know about how it works.

My working assumption has been that it's just a matter of disk speed, since during optimize, 
the entire index is copied into new files, and then at the end, the old one is 
removed.  So the more GB you have to copy, the longer it takes.

This is also the reason that you need double the size of your index available on the 
drive in order to perform an optimize, correct?  Or does this only apply when you are 
merging indexes?


Dan



-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] 
Sent: Friday, November 22, 2002 12:52 PM
To: [EMAIL PROTECTED]
Subject: large index - slow optimize()


Hello,

I am building an index with a few 1M documents, and every X documents
added to the index I call optimize() on the IndexWriter.
I have noticed that as the index grows this calls takes more and more
time, even though the number of new segments that need to be merged is
the same between every optimize() call.
I suspect this is normal and not a bug, but is there no way around
that?  Do you know which part is the part that takes longer and longer
as the index grows?

Thanks,
Otis


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Otis Gospodnetic

No they don't. Note that delete() is in IndexReader.
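
To make the split concrete, a tiny sketch (the path and the term are placeholders);
both calls run synchronously in the calling thread:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenOptimize {
    public static void main(String[] args) throws Exception {
        // deletes go through IndexReader ...
        IndexReader reader = IndexReader.open("/index/data");
        reader.delete(new Term("id", "42"));   // blocks until done
        reader.close();

        // ... while optimize() goes through IndexWriter, also in this thread
        IndexWriter writer =
            new IndexWriter("/index/data", new StandardAnalyzer(), false);
        writer.optimize();                     // blocks until done
        writer.close();
    }
}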

Otis

--- Aruna Raghavan [EMAIL PROTECTED] wrote:
 Hi,
 Do calls like optimize() and delete() on the Indexwriter cause a
 separate
 thread to be kicked off?
 Thanks!
 Aruna.
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 


__
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




RE: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Aruna Raghavan

Yes, thanks.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 08, 2002 11:46 AM
To: Lucene Users List
Subject: Re: optimize(), delete() calls on IndexWriter


No they don't. Note that delete() is in IndexReader.

Otis

--- Aruna Raghavan [EMAIL PROTECTED] wrote:
 Hi,
 Do calls like optimize() and delete() on the Indexwriter cause a
 separate
 thread to be kicked off?
 Thanks!
 Aruna.
 
 --
 To unsubscribe, e-mail:  
 mailto:[EMAIL PROTECTED]
 For additional commands, e-mail:
 mailto:[EMAIL PROTECTED]
 


__
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

--
To unsubscribe, e-mail:
mailto:[EMAIL PROTECTED]
For additional commands, e-mail:
mailto:[EMAIL PROTECTED]

--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]