optimize()

2002-11-26 Thread Leo Galambos
How does it affect overall performance, when I do not call optimize()?

THX

-g-



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-26 Thread Otis Gospodnetic
This was just mentioned a few days ago. Check the archives.
It's not needed for indexing; it's good to do after you are done indexing, as the
index reader then needs to open and search through fewer files.
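For illustration, the pattern is roughly this (just a sketch; the path, analyzer
and field below are made up, not from any real application):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexThenOptimize {
    public static void main(String[] args) throws Exception {
        // create the index and add documents without optimizing in between
        IndexWriter writer = new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);
        for (int i = 0; i < 1000; i++) {
            Document doc = new Document();
            doc.add(Field.Text("contents", "document number " + i));
            writer.addDocument(doc);
        }
        // one optimize() at the very end, once indexing is finished
        writer.optimize();
        writer.close();
    }
}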

Otis

--- Leo Galambos <[EMAIL PROTECTED]> wrote:
> How does it affect overall performance, when I do not call
> optimize()?
> 
> THX
> 
> -g-
> 
> 
> 
> --
> To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> 


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-26 Thread Leo Galambos
Did you try any tests in this area? (figures, charts...)

AFAIK the reader reads an identical number of (giga)bytes either way. BTW, it could read
segments in many threads. I do not see why it would be slower (unless you
do many delete()s). Whether the reader opens 1 file or 50, that is still nothing.

-g-

On Tue, 26 Nov 2002, Otis Gospodnetic wrote:

> This was just mentioned a few days ago. Check the archives.
> Not needed for indexing, good to do after you are done indexing, as the
> index reader needs to open and search through less files.
> 
> Otis
> 
> --- Leo Galambos <[EMAIL PROTECTED]> wrote:
> > How does it affect overall performance, when I do not call
> > optimize()?
> > 
> > THX
> > 
> > -g-
> > 
> > 
> > 
> > --
> > To unsubscribe, e-mail:  
> > <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> > <mailto:[EMAIL PROTECTED]>
> > 
> 
> 
> __
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> 


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-26 Thread Otis Gospodnetic
No tests, just intuition that it's faster to find something in 1 file
than in 100 of them.  If you do some tests, I'd love to hear the real
numbers :)

Otis

--- Leo Galambos <[EMAIL PROTECTED]> wrote:
> Did you try any tests in this area? (figures, charts...)
> 
> AFAIK reader reads identical number of (giga)bytes. BTW, it could
> read
> segments in many threads. I do not see why it would be slower (until
> you
> do many delete()-s). If reader opens 1 or 50 files, it is still
> nothing.
> 
> -g-
> 
> On Tue, 26 Nov 2002, Otis Gospodnetic wrote:
> 
> > This was just mentioned a few days ago. Check the archives.
> > Not needed for indexing, good to do after you are done indexing, as
> the
> > index reader needs to open and search through less files.
> > 
> > Otis
> > 
> > --- Leo Galambos <[EMAIL PROTECTED]> wrote:
> > > How does it affect overall performance, when I do not call
> > > optimize()?
> > > 
> > > THX
> > > 
> > > -g-
> > > 
> > > 
> > > 
> > > --
> > > To unsubscribe, e-mail:  
> > > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > 
> > 
> > 
> > __
> > Do you Yahoo!?
> > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > http://mailplus.yahoo.com
> > 
> > --
> > To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> > 
> 
> 
> --
> To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> 


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-26 Thread Leo Galambos
Hmmm. The question is what would I measure?

Otis, do you know which implementation is used in Lucene (I am lost in the
hierarchy of readers/writers):

a) single thread for solving query
b) more than one thread for a query

(a) would mean that Lucene could solve queries more than 50% slower
than in case (b). It would also mean that Lucene's index is in its optimal
state only when just one segment exists. And it also means that if you remove
half of the documents from a collection, you have to rebuild one big segment into
a smaller one, and so on... It would cost a lot of CPU/HDD time.

So it looks like I would measure the effect of random insert/remove
operations. The problem is: how often should I call optimize in the test?

Any thoughts?

-g-

On Tue, 26 Nov 2002, Otis Gospodnetic wrote:

> No tests, just intuition that it's faster to find something in 1 file
> than in 100 of them.  If you do some tests, I'd love to hear the real
> numbers :)
> 
> Otis
> 
> --- Leo Galambos <[EMAIL PROTECTED]> wrote:
> > Did you try any tests in this area? (figures, charts...)
> > 
> > AFAIK reader reads identical number of (giga)bytes. BTW, it could
> > read
> > segments in many threads. I do not see why it would be slower (until
> > you
> > do many delete()-s). If reader opens 1 or 50 files, it is still
> > nothing.
> > 
> > -g-
> > 
> > On Tue, 26 Nov 2002, Otis Gospodnetic wrote:
> > 
> > > This was just mentioned a few days ago. Check the archives.
> > > Not needed for indexing, good to do after you are done indexing, as
> > the
> > > index reader needs to open and search through less files.
> > > 
> > > Otis
> > > 
> > > --- Leo Galambos <[EMAIL PROTECTED]> wrote:
> > > > How does it affect overall performance, when I do not call
> > > > optimize()?
> > > > 
> > > > THX
> > > > 
> > > > -g-
> > > > 
> > > > 
> > > > 
> > > > --
> > > > To unsubscribe, e-mail:  
> > > > <mailto:[EMAIL PROTECTED]>
> > > > For additional commands, e-mail:
> > > > <mailto:[EMAIL PROTECTED]>
> > > > 
> > > 
> > > 
> > > __
> > > Do you Yahoo!?
> > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > http://mailplus.yahoo.com
> > > 
> > > --
> > > To unsubscribe, e-mail:  
> > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > <mailto:[EMAIL PROTECTED]>
> > > 
> > 
> > 
> > --
> > To unsubscribe, e-mail:  
> > <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> > <mailto:[EMAIL PROTECTED]>
> > 
> 
> 
> __
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> 


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




RE: optimize()

2002-11-26 Thread Stephen Eaton
I don't know if this answers your question, but I had a lot of problems
with Lucene bombing out with out-of-memory errors.  I was not using
optimize(); I tried it and, hey presto, no more problems.

-Original Message-
From: Leo Galambos [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 27 November 2002 5:22 AM
To: [EMAIL PROTECTED]
Subject: optimize()


How does it affect overall performance, when I do not call optimize()?

THX

-g-



--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-26 Thread Otis Gospodnetic
This should answer your a) or b) question:

[otis@linux2 java]$ pwd
/mnt/disk2/cvs-repositories/jakarta/jakarta-lucene/src/java

[otis@linux2 java]$ which ffjg
alias ffjg='find . -type f -name \*.java|xargs grep'
/usr/bin/find
/usr/bin/xargs

[otis@linux2 java]$ ffjg Thread
./org/apache/lucene/store/Lock.java:    Thread.sleep(sleepInterval);
[otis@linux2 java]$


An unoptimized index is not a problem for document additions; they take
constant time, regardless of the size of the index and regardless of
whether the index is optimized or not.
Searches of an unoptimized index take longer than searches of an optimized
index.
Here's a test:
Write a class that indexes X documents, where X is a substantial number.
Then make a copy of that index X and call it Y.  Optimize index Y.
Then do a search against one index, and against the other, and time it.
Then let us know which one is faster and by how much.
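A rough sketch of such a timing harness (the index paths and the query string
are placeholders; this assumes the 1.x IndexSearcher/Hits/QueryParser API):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CompareSearchTimes {
    public static void main(String[] args) throws Exception {
        Query query = QueryParser.parse("some test query", "contents", new StandardAnalyzer());
        System.out.println("unoptimized X: " + time("/tmp/index-x", query) + " ms");
        System.out.println("optimized Y:   " + time("/tmp/index-y", query) + " ms");
    }

    // run one search and return the elapsed wall-clock time
    static long time(String path, Query query) throws Exception {
        IndexSearcher searcher = new IndexSearcher(path);
        long start = System.currentTimeMillis();
        Hits hits = searcher.search(query);
        int n = hits.length();               // forces the search to complete
        long elapsed = System.currentTimeMillis() - start;
        searcher.close();
        System.out.println(path + ": " + n + " hits");
        return elapsed;
    }
}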

Good luck,
Otis



--- Leo Galambos <[EMAIL PROTECTED]> wrote:
> Hmmm. The question is what would I measure?
> 
> Otis, do you know what implementation is used in Lucene (I am lost in
> 
> hiearchy of readers/writers):
> 
> a) single thread for solving query
> b) more than one thread for a query
> 
> (a) would mean that Lucene could solve queries more than 50% slower
> than in case (b). It would also mean, that Lucene's index is in
> optimal
> state when just one segment exists. And it also means that if you
> remove
> half of documents from a collection you have to rebuild one big
> segment to
> a smaller one, and so on... It would cost a lot of CPU/HDD time.
> 
> So it looks like I would measure effect of random insert/remove 
> operations. The problem is, how often I would call optimize in the
> test?
> 
> Any thoughts?
> 
> -g-
> 
> On Tue, 26 Nov 2002, Otis Gospodnetic wrote:
> 
> > No tests, just intuition that it's faster to find something in 1
> file
> > than in 100 of them.  If you do some tests, I'd love to hear the
> real
> > numbers :)
> > 
> > Otis
> > 
> > --- Leo Galambos <[EMAIL PROTECTED]> wrote:
> > > Did you try any tests in this area? (figures, charts...)
> > > 
> > > AFAIK reader reads identical number of (giga)bytes. BTW, it could
> > > read
> > > segments in many threads. I do not see why it would be slower
> (until
> > > you
> > > do many delete()-s). If reader opens 1 or 50 files, it is still
> > > nothing.
> > > 
> > > -g-
> > > 
> > > On Tue, 26 Nov 2002, Otis Gospodnetic wrote:
> > > 
> > > > This was just mentioned a few days ago. Check the archives.
> > > > Not needed for indexing, good to do after you are done
> indexing, as
> > > the
> > > > index reader needs to open and search through less files.
> > > > 
> > > > Otis
> > > > 
> > > > --- Leo Galambos <[EMAIL PROTECTED]> wrote:
> > > > > How does it affect overall performance, when I do not call
> > > > > optimize()?
> > > > > 
> > > > > THX
> > > > > 
> > > > > -g-
> > > > > 
> > > > > 
> > > > > 
> > > > > --
> > > > > To unsubscribe, e-mail:  
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > > For additional commands, e-mail:
> > > > > <mailto:[EMAIL PROTECTED]>
> > > > > 
> > > > 
> > > > 
> > > > __
> > > > Do you Yahoo!?
> > > > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > > > http://mailplus.yahoo.com
> > > > 
> > > > --
> > > > To unsubscribe, e-mail:  
> > > <mailto:[EMAIL PROTECTED]>
> > > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > > 
> > > 
> > > 
> > > --
> > > To unsubscribe, e-mail:  
> > > <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail:
> > > <mailto:[EMAIL PROTECTED]>
> > > 
> > 
> > 
> > __
> > Do you Yahoo!?
> > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > http://mailplus.yahoo.com
> > 
> > --
> > To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> > 
> 
> 
> --
> To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> 


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-27 Thread Leo Galambos
> Unoptimized index is not a problem for document additions, they take
> constant time, regardless of the size of the index and regardless of
> whether the index is optimized or not.

IMHO it is not true. It would mean that O(log(n/M)) = O(1) (n = number of
documents in the index, M = max number of segments per level). I think that if
you were right, we would be able to sort an array in O(n) and not in O(n log n).

> Searches of unoptimized index take longer than searches of an optimized
> index.

Is there any limitation in the Lucene architecture that prevents using a
multithreaded algorithm for calculating hit lists? I think it would boost
performance. Otis, thank you for your proof that Lucene does not have it now
(you got me :-)). But what about the next releases?

> Then do a search against one, and against the other index, and time it.
> Then let us know which one is faster and by how much.

OK, I will.

I would like to compare Lucene to another engine. The test should be
precise, because I want to use it in an academic paper.

The aim of my question was how to configure Lucene to get maximum
performance for the test. It looks pretty hard, because:

- if I do not call optimize(), I can build the index at maximum speed, but
searches are slow, so it is not a configuration for a dynamic environment

- if I call optimize() regularly (as a real application would do), indexing
gets slower and slower as I add more and more documents to the collection

IMHO the second option describes a "real environment", so we get:

loop:
  K-times indexDoc()
  optimize()
end-of-loop

What *K* should I use? 1000, 1 or 10 or 100? Folks, what *K* do you use
in your applications? Thank you.
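For reference, a rough Java sketch of that loop (K, the path and the document
contents are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class PeriodicOptimize {
    static final int K = 1000;   // documents added between optimize() calls

    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);
        for (int i = 0; i < 100000; i++) {
            Document doc = new Document();
            doc.add(Field.Text("contents", "document number " + i));
            writer.addDocument(doc);
            if ((i + 1) % K == 0) {
                writer.optimize();   // the cost of this call grows with the total index size
            }
        }
        writer.optimize();
        writer.close();
    }
}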

-g-



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: optimize()

2002-11-27 Thread Scott Ganyo
We generally optimize only after a full index (re-)build or during 
periods where the index is not being unused.

Scott

Leo Galambos wrote:
> > Unoptimized index is not a problem for document additions, they take
> > constant time, regardless of the size of the index and regardless of
> > whether the index is optimized or not.
>
> IMHO It is not true. It would mean that O(log(n/M))=O(1).  (n-number of
> documents in index, M max number of segments per level). I think that if
> you are true, we are able to sort an array in O(n) and not in O(nlog n).
>
> > Searches of unoptimized index take longer than searches of an optimized
> > index.
>
> Is there any limitation in Lucene architecture, so that you cannot use
> multithread algorithm for calculation of hit lists? I think it would boost
> performance. Otis, thank you for your proof, that Lucene has not it now
> (you got me :-)). But what about next releases?
>
> > Then do a search against one, and against the other index, and time it.
> > Then let us know which one is faster and by how much.
>
> OK, I will.
>
> I would like to compare Lucene to another engine. The test would be
> precise, because I wanna use it in an academic paper.
>
> Aim of my question was, how could I configure Lucene to get maximum
> performance for test. It looks to be pretty hard, because:
>
> - if I do not call optimize(), I can build index at maximum speed, but
> searches are slow, so it is not configuration for dynamic environment
>
> - if I call optimize() regularly (as real application would do), indexing
> is slower and slower when I add more and more documents to the collection
>
> IMHO the second option describes "real environment", so we get:
>
> loop:
>   K-times indexDoc()
>   optimize()
> end-of-loop
>
> What *K* would I use? 1000, 1 or 10 or 100? Folks, what *K* do you use
> in your applications? Thank you.
>
> -g-



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

--
Brain: Pinky, are you pondering what I’m pondering?
Pinky: I think so, Brain, but calling it a pu-pu platter? Huh, what were 
they thinking?


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>



Optimize crash

2004-04-10 Thread Paul
The problem I have is that when I try to execute an optimize on my Lucene
index I get the following error thrown (see below).

If anyone can help, and the answer requires some digging, then I have the
very index tarred and gzipped for anon FTP access at ftp.catalyst.net.nz (in
the "pub" sub-directory). This is 462Mb, and unpacks to roughly twice that
size. There is also a README file there.

Here is the error I get very quickly when optimize runs:

--- CUT ---
java.lang.ArrayIndexOutOfBoundsException: 111 >= 23
    at java.util.Vector.elementAt(Vector.java(Compiled Code))
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java(Compiled Code))
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java(Compiled Code))
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java(Compiled Code))
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java(Compiled Code))
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:473)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:354)
    at nz.net.catalyst.lucene.server.Optimize.execute(Optimize.java:80)
    at nz.net.catalyst.lucene.server.Control.optimize(Control.java:87)
    at nz.net.catalyst.lucene.server.Control.execute(Control.java:49)
    at nz.net.catalyst.lucene.server.Dialogue.process(Dialogue.java:111)
    at nz.net.catalyst.lucene.server.Session.communicate(Session.java:125)
    at nz.net.catalyst.SocketClient.run(SocketClient.java:70)
    at java.lang.Thread.run(Thread.java:512)
--- CUT ---

This was actually thrown by Lucene v1.4-rc2, which I was testing to see if it
solved my problem. I am currently running v1.3-Final on my live site and this 
does the same thing. This is running on Debian Linux, Woody, and is using the
IBM Runtime Environment for Linux Java(TM) 2 Technology Edition, Version
1.3.1, JRE.

It should be noted that I have had this problem before, and I solved it by
completely re-indexing the article set from scratch (starting with no index at
all). After that process, the optimize worked fine. Then somewhere along the
line of many days indexing new articles, and doing an optimise every day at
about 3.30am, the problem has returned.

The articles being indexed are all homogeneous in terms of fields being 
indexed, details below:

FIELD DEFINITIONS
Field name      Field type      Stored?         Indexed?
----------      ----------      -------         --------
Domain          Text            STORED          INDEXED
Id              Id              STORED          INDEXED
date            Date            STORED          INDEXED
datetime        Date            STORED          INDEXED
added           Date            STORED          INDEXED
category        Text            STORED          INDEXED
subcategory     Text            STORED          INDEXED
source          Text            STORED          INDEXED
title           Text            STORED          NOT INDEXED
slug            Text            STORED          NOT INDEXED
type            Text            STORED          NOT INDEXED
sourcetype      Text            NOT STORED      INDEXED


Any help greatly appreciated.

Cheers,
Paul.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





delete / optimize question

2002-09-17 Thread Tom Mortimer

Hi all,

I'm new-ish to Lucene, and having a few problems with document deletion.  
In particular, the point at which a deleted document is no longer visible to
an IndexReader. Is the following scenario sane?

1. Open an IndexReader and delete all docs with Term("name", "Bob"), then
   close the reader.

2. Open an IndexWriter and add various "non-Bob" documents

3. then add a new document with a Term("name", "Bob").  

4. Call optimize()

5. Open an IndexReader and get docFreq for "Bob"


I'd expect the final doc freq to be 1, as all the "Bob" docs should have
been deleted except for the one added in step 3, but instead I'm getting
freq > 1.  
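In code, the sequence is roughly this (a condensed sketch, not my actual
application code; the path and the non-Bob document are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenAdd {
    public static void main(String[] args) throws Exception {
        String path = "/tmp/demo-index";

        // 1. delete all documents with name:Bob, then close the reader
        IndexReader reader = IndexReader.open(path);
        reader.delete(new Term("name", "Bob"));
        reader.close();

        // 2. add various non-Bob documents ...
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        Document alice = new Document();
        alice.add(Field.Keyword("name", "Alice"));
        writer.addDocument(alice);

        // 3. ... then one new Bob document
        Document bob = new Document();
        bob.add(Field.Keyword("name", "Bob"));
        writer.addDocument(bob);

        // 4. optimize and close
        writer.optimize();
        writer.close();

        // 5. expected docFreq for name:Bob is 1
        IndexReader check = IndexReader.open(path);
        System.out.println("docFreq(name:Bob) = " + check.docFreq(new Term("name", "Bob")));
        check.close();
    }
}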

Do I in fact need to do the optimize() step immediately after the deletions,
and before adding any more docs?  This could be expensive with a large
index, and document additions and deletions required in random order.

Thanks for any help!

Tom



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Re: Optimize crash

2004-04-19 Thread Paul
Dear all,

I hate to be insistent, but I have a large live website with a growing,
un-optimizable Lucene index, which therefore has its appointment
with destiny pencilled into The Diary of Doom on a date roughly
three weeks hence.

So if I'm doing something stupid, or there's a workaround, or someone
is already looking into this problem, *please* let me know. My alternative
is to spend two days re-indexing the archive, and then to just wait for the 
inevitable repeat of this problem, like Groundhog Day, which isn't a
particularly attractive option.

(NB: The original message is under the same subject line in the archive.)

Thanks.

Cheers,
Paul.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



addIndexes and optimize

2004-07-07 Thread roy-lucene-user
Hey y'all again,

Just wondering why the IndexWriter.addIndexes method calls optimize before and after 
it starts merging segments together.

We would like to create an addIndexes method that doesn't optimize, and then call
optimize on the IndexWriter later.

Roy.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: delete / optimize question

2002-09-17 Thread Joshua O'Madadhain

On Tue, 17 Sep 2002, Tom Mortimer wrote:

> I'm new-ish to Lucene, and having a few problems with document deletion.  
> In particular, the point at which a deleted document is no longer visible to
> an IndexReader. Is the following scenario sane?
> 
> 1. Open an IndexReader and delete all docs with Term("name", "Bob"), then
>close the reader.
> 
> 2. Open an IndexWriter and add various "non-Bob" documents
> 
> 3. then add a new document with a Term("name", "Bob").  
> 
> 4. Call optimize()
> 
> 5. Open an IndexReader and get docFreq for "Bob"
> 
> I'd expect the final doc freq to be 1, as all the "Bob" docs should have
> been deleted except for the one added in step 3, but instead I'm getting
> freq > 1.  
> 
> Do I in fact need to do the optimize() step immediately after the deletions,
> and before adding any more docs?  This could be expensive with a large
> index, and document additions and deletions required in random order.

I *think* that what you think should happen is what indeed should be
happening.  However, I have a few suggestions for sanity checks to make
sure that you're seeing what you think you're seeing:

(1) Call docFreq("Bob") before and after each step.  This will make sure
that (for instance) your supposed "non-Bob" documents are indeed
"non-Bob", which if false would be a subtle sort of "gotcha".

(2) Double-check that you use the same index in all cases.

(3) Are you closing the index before you call optimize()?  (According to
the docs, you shouldn't.)

(4) Are you closing the IndexWriter after you call optimize() and before
you call docFreq()?  (close() does flush changes, although I don't know
whether it should be necessary after optimize().)
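(For (1), a tiny probe like the following, run between steps, is enough; the
path and the term are placeholders:)

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class BobProbe {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/tmp/demo-index");
        // docFreq counts documents containing the term (deleted docs may still be
        // counted until segments are merged); numDocs excludes deleted documents
        System.out.println("docFreq(name:Bob) = " + reader.docFreq(new Term("name", "Bob")));
        System.out.println("numDocs           = " + reader.numDocs());
        reader.close();
    }
}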

Anyway, good luck.

Joshua

 [EMAIL PROTECTED] Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.




--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




large index -> slow optimize()

2002-11-22 Thread Otis Gospodnetic
Hello,

I am building an index with a few million documents, and every X documents
added to the index I call optimize() on the IndexWriter.
I have noticed that as the index grows this call takes more and more
time, even though the number of new segments that need to be merged is
the same between every optimize() call.
I suspect this is normal and not a bug, but is there no way around
that?  Do you know which part is the one that takes longer and longer
as the index grows?

Thanks,
Otis


__
Do you Yahoo!?
Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




JDBCDirectory to prevent optimize()?

2004-11-22 Thread Kevin A. Burton
It seems that, when compared to other datastores, Lucene starts to
fall down.  For example, Lucene doesn't perform online index
optimizations, so if you add 10 documents you have to run optimize()
again, and this isn't exactly a fast operation.

I'm wondering about the potential for a generic JDBCDirectory for
keeping the Lucene index within a database.

It sounds somewhat unconventional, but it would allow you to perform live
addDirectory updates without performing an optimize() again.

Has anyone looked at this?  How practical would it be?
Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


finalize delete without optimize

2004-12-06 Thread John Wang
Hi:

   Is there a way to finalize deletes, i.e. actually remove them from
the segments and make sure the docIDs are contiguous again?

   The only explicit way to do this is by calling
IndexWriter.optimize(). But this call does a lot more (it also merges all
the segments), and hence is very expensive. Is there a way to simply
finalize the deletes without having to merge all the segments?

   If not, I'd be glad to submit an implementation of this feature if
the Lucene devs agree this is useful.

Thanks

-John

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



how often to optimize?

2004-12-21 Thread aurora
Right now I am incrementally adding about 100 documents to the index a day
and then optimizing after that. I find that optimize essentially rebuilds
the entire index into a single file, so the amount of data written to disk is
proportional to the total index size, not to the size of the documents
incrementally added.

So my question is: would it be overkill to optimize every day? Is there
any guideline on how often to optimize? Every 1000 documents or more?
Every week? Is there any concern if a lot of documents are added
without optimizing?

Thanks.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


optimize(), delete() calls on IndexWriter

2002-03-08 Thread Aruna Raghavan

Hi,
Do calls like optimize() and delete() on the IndexWriter cause a separate
thread to be kicked off?
Thanks!
Aruna.

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




RE: large index -> slow optimize()

2002-11-22 Thread Armbrust, Daniel C.
Note - this is not a fact, this is what I think I know about how it works.

My working assumption has been that it's just a matter of disk speed, since during optimize
the entire index is copied into new files, and then at the end the old ones are
removed.  So the more GB you have to copy, the longer it takes.

This is also the reason that you need double the size of your index available on the 
drive in order to perform an optimize, correct?  Or does this only apply when you are 
merging indexes?


Dan



-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] 
Sent: Friday, November 22, 2002 12:52 PM
To: [EMAIL PROTECTED]
Subject: large index -> slow optimize()


Hello,

I am building an index with a few million documents, and every X documents
added to the index I call optimize() on the IndexWriter.
I have noticed that as the index grows this call takes more and more
time, even though the number of new segments that need to be merged is
the same between every optimize() call.
I suspect this is normal and not a bug, but is there no way around
that?  Do you know which part is the one that takes longer and longer
as the index grows?

Thanks,
Otis


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




using optimize and addDocument concurrently.

2004-10-19 Thread Stephen Halsey
Hi,

My basic question is whether it is possible to continue to add documents to an index
in one thread while running a long-running optimization of the index (approx 30 mins)
in another thread.  I'm using Lucene version 1.4.2.  The concurrency matrix at
http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the same
IndexWriter object you can do concurrent writes and optimization.  When I try it in my
program the addDocument calls wait until the optimization has finished, so in this
respect it is thread-safe, but the operations cannot be performed at the same time.

Our problem is that the index needs to be continually kept up to date with new news
articles, but also needs to be regularly optimized to keep it fast.  If I cannot
update and optimize one index at the same time, the best way I can see of doing this is
maintaining multiple identical indexes and offlining, optimizing, letting them catch
up to date and re-onlining them.  Does that sound best to you?

Thanks a lot in advance


Steve

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Daniel Naber
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote:

> I'm wondering about the potential for a generic JDBCDirectory for
> keeping the lucene index within a database.

Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I 
don't know about its scalability.

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
Also, there is a DBDirectory in the sandbox to store a Lucene index 
inside Berkeley DB.

Erik
On Nov 22, 2004, at 6:06 PM, Kevin A. Burton wrote:
It seems that when compared to other datastores that Lucene starts to 
fall down.  For example lucene doesn't perform online index 
optimizations so if you add 10 documents you have to run optimize() 
again and this isn't exactly a fast operation.

I'm wondering about the potential for a generic JDBCDirectory for 
keeping the lucene index within a database.
It sounds somewhat unconventional would allow you to perform live 
addDirectory updates without performing an optimize() again.

Has anyone looked at this?  How practical would it be.
Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then 
you should work for Rojo!  If you recommend someone and we hire them 
you'll get a free iPod!
   Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Kevin A. Burton
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index 
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Kevin
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote:
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index 
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Huh?  Why would you assume that?  As far as I know, and I've tested 
this some, a Lucene index inside Berkeley DB works the same as if it 
had been in RAM or on the filesystem.

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: finalize delete without optimize

2004-12-08 Thread John Wang
Hi folks:

I sent this out a few days ago without a response. 

Please help.

Thanks in advance

-John


On Mon, 6 Dec 2004 21:15:00 -0800, John Wang <[EMAIL PROTECTED]> wrote:
> Hi:
> 
>   Is there a way to finalize delete, e.g. actually remove them from
> the segments and make sure the docIDs are contiguous again.
> 
>   The only explicit way to do this is by calling
> IndexWriter.optmize(). But this call does a lot more (also merges all
> the segments), hence is very expensive. Is there a way to simply just
> finalize the deletes without having to merge all the segments?
> 
>If not, I'd be glad to submit an implementation of this feature if
> the Lucene devs agree this is useful.
> 
> Thanks
> 
> -John
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: finalize delete without optimize

2004-12-09 Thread Aviran
Lucene's standard API does not support this kind of operation.

Aviran
http://www.aviransplace.com


-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 08, 2004 17:32 PM
To: [EMAIL PROTECTED]
Subject: Re: finalize delete without optimize


Hi folks:

I sent this out a few days ago without a response. 

Please help.

Thanks in advance

-John


On Mon, 6 Dec 2004 21:15:00 -0800, John Wang <[EMAIL PROTECTED]> wrote:
> Hi:
> 
>   Is there a way to finalize delete, e.g. actually remove them from 
> the segments and make sure the docIDs are contiguous again.
> 
>   The only explicit way to do this is by calling 
> IndexWriter.optmize(). But this call does a lot more (also merges all 
> the segments), hence is very expensive. Is there a way to simply just 
> finalize the deletes without having to merge all the segments?
> 
>If not, I'd be glad to submit an implementation of this feature if 
> the Lucene devs agree this is useful.
> 
> Thanks
> 
> -John
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: finalize delete without optimize

2004-12-14 Thread Otis Gospodnetic
Hello John,

Once you make your change locally, use 'cvs diff -u IndexWriter.java >
indexwriter.patch' to make a patch.
Then open a new Bugzilla entry.
Finally, attach your patch to that entry.

Note that Document deletion is actually done from IndexReader, so your
patch may have to be on IndexReader, not IndexWriter.

Thanks,
Otis


--- John Wang <[EMAIL PROTECTED]> wrote:

> Hi Otis:
> 
>  Thanks for you reply.
> 
>  I am looking for more of an API call than a tool. e.g.
> IndexWriter.finalizeDelete()
> 
>  If I implement this, how would I go about submitting a patch?
> 
> thanks
> 
> -John
> 
> 
> On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> > Hello John,
> > 
> > I believe you didn't get any replies to this.  What you are
> describing
> > cannot be done using the public, but maaay (no source code on this
> > machine, so I can't double-check that) be doable if you use some of
> the
> > 'internal' methods.
> > 
> > I don't have the need for this, but others might, so it may be
> worth
> > developing a tool that purges Documents marked as deleted without
> the
> > expensive segment merging, iff that is possible.  If you put this
> tool
> > under the approprite org.apache.lucene... package, you'll get
> access to
> > 'internal' methods, of course.  If you end up creating this, we
> could
> > stick it in the Sandbox, where we should really create a new
> section
> > for handy command-line tools that manipulate the index.
> > 
> > Otis
> > 
> > 
> > 
> > 
> > --- John Wang <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi:
> > >
> > >Is there a way to finalize delete, e.g. actually remove them
> from
> > > the segments and make sure the docIDs are contiguous again.
> > >
> > >The only explicit way to do this is by calling
> > > IndexWriter.optmize(). But this call does a lot more (also merges
> all
> > > the segments), hence is very expensive. Is there a way to simply
> just
> > > finalize the deletes without having to merge all the segments?
> > >
> > > If not, I'd be glad to submit an implementation of this
> feature
> > > if
> > > the Lucene devs agree this is useful.
> > >
> > > Thanks
> > >
> > > -John
> > >
> > >
> -
> > > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > > For additional commands, e-mail:
> [EMAIL PROTECTED]
> > >
> > >
> > 
> >
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: finalize delete without optimize

2004-12-14 Thread Otis Gospodnetic
Hello John,

I believe you didn't get any replies to this.  What you are describing
cannot be done using the public API, but maaay (no source code on this
machine, so I can't double-check that) be doable if you use some of the
'internal' methods.

I don't have the need for this, but others might, so it may be worth
developing a tool that purges Documents marked as deleted without the
expensive segment merging, iff that is possible.  If you put this tool
under the appropriate org.apache.lucene... package, you'll get access to
'internal' methods, of course.  If you end up creating this, we could
stick it in the Sandbox, where we should really create a new section
for handy command-line tools that manipulate the index.

Otis


--- John Wang <[EMAIL PROTECTED]> wrote:

> Hi:
> 
>Is there a way to finalize delete, e.g. actually remove them from
> the segments and make sure the docIDs are contiguous again.
> 
>The only explicit way to do this is by calling
> IndexWriter.optmize(). But this call does a lot more (also merges all
> the segments), hence is very expensive. Is there a way to simply just
> finalize the deletes without having to merge all the segments?
> 
> If not, I'd be glad to submit an implementation of this feature
> if
> the Lucene devs agree this is useful.
> 
> Thanks
> 
> -John
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: finalize delete without optimize

2004-12-14 Thread John Wang
Hi Otis:

 Thanks for your reply.

 I am looking for more of an API call than a tool. e.g.
IndexWriter.finalizeDelete()

 If I implement this, how would I go about submitting a patch?

thanks

-John


On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Hello John,
> 
> I believe you didn't get any replies to this.  What you are describing
> cannot be done using the public, but maaay (no source code on this
> machine, so I can't double-check that) be doable if you use some of the
> 'internal' methods.
> 
> I don't have the need for this, but others might, so it may be worth
> developing a tool that purges Documents marked as deleted without the
> expensive segment merging, iff that is possible.  If you put this tool
> under the approprite org.apache.lucene... package, you'll get access to
> 'internal' methods, of course.  If you end up creating this, we could
> stick it in the Sandbox, where we should really create a new section
> for handy command-line tools that manipulate the index.
> 
> Otis
> 
> 
> 
> 
> --- John Wang <[EMAIL PROTECTED]> wrote:
> 
> > Hi:
> >
> >Is there a way to finalize delete, e.g. actually remove them from
> > the segments and make sure the docIDs are contiguous again.
> >
> >The only explicit way to do this is by calling
> > IndexWriter.optmize(). But this call does a lot more (also merges all
> > the segments), hence is very expensive. Is there a way to simply just
> > finalize the deletes without having to merge all the segments?
> >
> > If not, I'd be glad to submit an implementation of this feature
> > if
> > the Lucene devs agree this is useful.
> >
> > Thanks
> >
> > -John
> >
> > -
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
> 
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: how often to optimize?

2004-12-21 Thread Otis Gospodnetic
Hello,

I think some of these questions may be answered in the jGuru FAQ.

> So my question is would it be an overkill to optimize everyday?

Only if lots of documents are being added/deleted, and you end up with
a lot of index segments.

> Is
> there  
> any guideline on how often to optimize? Every 1000 documents or more?

Are the unoptimized indices causing you any problems (e.g. slow searches, a
high number of open file handles)?  If not, then you don't even need to
optimize until those issues become... issues.

> Every week? Is there any concern if there are a lot of documents
> added without optimizing?

Possibly, see my answer above.

Otis


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: how often to optimize?

2004-12-28 Thread aurora
> Are not optimized indices causing you any problems (e.g. slow searches,
> high number of open file handles)?  If no, then you don't even need to
> optimize until those issues become... issues.

OK, I have changed the process to not call optimize() at all. So far so
good. The number of files hovers between 10 and 40 during the indexing of
10,000 files. It seems Lucene is doing some kind of self-maintenance to keep
things in order.

Is it right to say optimize() is a totally optional operation? I probably
got the impression it is a natural step to end an incremental update from
the IndexHTML example. Since it replicates the whole index, it might be
overkill for many applications to do daily.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: how often to optimize?

2004-12-28 Thread Otis Gospodnetic
Correct.
The self-maintenance you are referring to is Lucene's periodic segment
merging.  The frequency of that can be controlled through IndexWriter's
mergeFactor.
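For example (just a sketch; in the 1.x API mergeFactor is a public field on
IndexWriter, and the path is made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MergeFactorDemo {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);
        // default is 10; a higher value defers segment merging (faster adds,
        // more files on disk), a lower value merges more aggressively
        writer.mergeFactor = 20;
        // ... addDocument() calls ...
        writer.close();
    }
}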

Otis

--- aurora <[EMAIL PROTECTED]> wrote:

> > Are not optimized indices causing you any problems (e.g. slow
> searches,
> > high number of open file handles)?  If no, then you don't even need
> to
> > optimize until those issues become... issues.
> >
> 
> OK I have changed the process to not doing optimize() at all. So far
> so  
> good. The number of files hover from 10 to 40 during the indexing of 
> 
> 10,000 files. Seems Lucene is doing some kind of self maintenance to
> keep  
> things in order.
> 
> Is it right to say optimize() is a totally optional operation? I
> probably  
> get the impression it is a natural step to end an incremental update
> from  
> the IndexHTML example. Since it replicates the whole index it might
> be an  
> overkill for many applications to do daily.
> 
> 
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Disk space used by optimize

2005-01-27 Thread Kauler, Leto S

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down to three files of
about 47MB total.  That must be a lot of disk activity for the people
with multi-gigabyte indexes!

Regards,
Leto

CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Optimize not deleting all files

2005-02-03 Thread yahootintin . 1247688
Hi,

When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.

My understanding is that an optimize will create new index files, and all
pre-existing index files should then be deleted.  Is this correct?

We are running Lucene 1.4.2 on Windows.

Any help is appreciated.  Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Otis Gospodnetic

No they don't. Note that delete() is in IndexReader.

Otis

--- Aruna Raghavan <[EMAIL PROTECTED]> wrote:
> Hi,
> Do calls like optimize() and delete() on the Indexwriter cause a
> separate
> thread to be kicked off?
> Thanks!
> Aruna.
> 
> --
> To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> 


__
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




RE: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Aruna Raghavan

Yes, thanks.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 08, 2002 11:46 AM
To: Lucene Users List
Subject: Re: optimize(), delete() calls on IndexWriter


No they don't. Note that delete() is in IndexReader.

Otis

--- Aruna Raghavan <[EMAIL PROTECTED]> wrote:
> Hi,
> Do calls like optimize() and delete() on the Indexwriter cause a
> separate
> thread to be kicked off?
> Thanks!
> Aruna.
> 
> --
> To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> 


__
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




What does it take to optimize?

2002-04-03 Thread Robert A. Decker

My index is larger than it should be. My deletable file has entries. I'm
trying to optimize the index, but it just doesn't seem to be doing
anything. Here's how I'm trying to optimize:

IndexWriter writer = new IndexWriter(pathToIndex,
 fragmentAnalyzer,
 false);
writer.close();
writer.optimize();



thanks,
rob

http://www.robdecker.com/
http://www.planetside.com/


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Will failed optimize corrupt an index?

2003-08-19 Thread Dan Quaroni
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize.  Does Lucene have transactions or anything
that would prevent this from corrupting an index, or do I need to generate
the index again?

Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Preventing duplicate document insertion during optimize

2004-04-30 Thread Kevin A. Burton
Let's say you have two indexes each with the same document literal.  All 
the fields hash the same and the document is a binary duplicate of a 
different document in the second index.

What happens when you do a merge to create a 3rd index from the first 
two?  I assume you now have two documents that are identical in one 
index.  Is there any way to prevent this?

It would be nice to figure out if there's a way to flag a field as a
primary key, so that a document whose key has already been added is simply skipped.

Kevin

--

Please reply using PGP.

   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster





optimize fails with "Negative seek offset"

2004-05-04 Thread Sascha Ottolski
Hi,

I have no idea where to look, and I know almost nothing about
Java :-( We've been using Lucene for quite a while now (about a year, I guess),
and suddenly I've seen this when trying to optimize the index:

java.lang.Exception: java.io.IOException: Negative seek offset

The code throwing this was:

public boolean optimize() throws IOException {
    IndexWriter writer = new IndexWriter(this.indexpath,
                                         new StandardAnalyzer(), false);
    writer.mergeFactor = this.mergeFactor;
    try {
        writer.optimize();
        writer.close();
    }
    finally {
        this.changedIndex();
    }
    return true;
}

The index file is about 8.8 GB now. However, when the exception occurred,
the new temporary index file had only grown to 3.2 GB. All this is with
1.4-rc2.


Thanks in advance for any advice,

Sascha


-- 
Gallileus - the power of knowledge

Gallileus GmbHhttp://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany



++
AKTUELLER HINWEIS (Mai 2004)

Literatur Alerts - Literatursuche (wie) im Schlaf!

Ab jetzt mehr dazu unter:
http://www.gallileus.info/gallileus/about/products/alerts/
++

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: using optimize and addDocument concurrently.

2004-10-19 Thread Aad Nales
Steve,

The behavior that you describe is as expected. I have tackled a similar
problem to yours by creating a proxy object that acts as a gatekeeper for
all IndexReader, IndexSearcher and IndexWriter operations. With fully
synchronized access to all methods of the proxy, you will not run into
any problems. Every time I need to do something with the writer, I
close the searcher, etc.

As to regular optimization, I tend to reindex now and again with a
completely separate writer and replace the index by moving it to the new
location. This, BTW, has also become a method in my proxy object.
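A stripped-down sketch of the idea (illustrative only, not the real class;
the names and the index path are invented):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// All index access goes through one synchronized object, so a write or an
// optimize can never overlap with a search on a closed searcher.
public class IndexGatekeeper {
    private final String path = "/tmp/demo-index";
    private IndexSearcher searcher;   // reopened lazily after writes

    public synchronized void addDocument(Document doc) throws Exception {
        closeSearcher();
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        writer.addDocument(doc);
        writer.close();
    }

    public synchronized void optimize() throws Exception {
        closeSearcher();
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }

    public synchronized Hits search(Query query) throws Exception {
        if (searcher == null) {
            searcher = new IndexSearcher(path);   // reopen to see the latest writes
        }
        return searcher.search(query);
    }

    private void closeSearcher() throws Exception {
        if (searcher != null) {
            searcher.close();
            searcher = null;
        }
    }
}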

Hope this helps,
Cheers,
Aad




Hi,

My basic question is whether it is possible to continue to add documents
to an index in one Thread while running a long running optimization of
the index (approx 30 mins) in another thread.  I'm using Lucene version
1.4.2.  The concurrency matrix at
http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the
same IndexWriter object you can do concurrent writes and optimization.
When I try it in my program the addDocuments wait until the optimization
has finished, so in this respect it is Thread safe, but the operations
cannot be performed at the same time.  Our problem is that the index
needs to be continually kept up to date with new news articles, but also
needs to be regularly optimized to keep it fast.  If I cannot update and
optimize one index at the same time the best way I can see of doing this
is maintaining multiple identical indexes and offlining, optimizing,
letting them catch up-to-date and re-onlining them.  Does that sounds
best to you?

Thanks a lot in advance


Steve



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello,

Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.

see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or perhaps I made a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis


--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

> 
> Just a quick question:  after writing an index and then calling
> optimize(), is it normal for the index to expand to about three times
> the size before finally compressing?
> 
> In our case the optimise grinds the disk, expanding the index into
> many
> files of about 145MB total, before compressing down to three files of
> about 47MB total.  That must be a lot of disk activity for the people
> with multi-gigabyte indexes!
> 
> Regards,
> Leto
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is "in the mail" ;)

Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).

--Leto



> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
> 
> Hello,
> 
> Yes, that is how optimize works - copies all existing index 
> segments into one unified index segment, thus optimizing it.
> 
> see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> 
> However, three times the space sounds a bit too much, or I 
> make a mistake in the book. :)
> 
> You said you end up with 3 files - .cfs is one of them, right?
> 
> Otis
> 
> 
> --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Just a quick question:  after writing an index and then calling 
> > optimize(), is it normal for the index to expand to about 
> three times 
> > the size before finally compressing?
> > 
> > In our case the optimise grinds the disk, expanding the index into 
> > many files of about 145MB total, before compressing down to three 
> > files of about 47MB total.  That must be a lot of disk activity for 
> > the people with multi-gigabyte indexes!
> > 
> > Regards,
> > Leto


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format?  Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use the multifile and compound index formats...
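
For anyone who wants to try the comparison, a minimal sketch (assuming the
1.4-era IndexWriter.setUseCompoundFile() switch; the path is a made-up
placeholder):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class CompareFormats {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; build the index in the multifile (non-compound)
        // format instead of the compound (.cfs) format.
        IndexWriter writer = new IndexWriter("/tmp/index-multifile",
                                             new StandardAnalyzer(), true);
        writer.setUseCompoundFile(false);  // false = multifile, true = compound
        // ... writer.addDocument(...) calls for the test data would go here ...
        writer.optimize();   // watch disk usage (e.g. 'du -s') while this runs
        writer.close();
    }
}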

Otis

--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

> Our copy of LIA is "in the mail" ;)
> 
> Yes the final three files are: the .cfs (46.8MB), deletable (4
> bytes),
> and segments (29 bytes).
> 
> --Leto
> 
> 
> 
> > -Original Message-
> > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
> > 
> > Hello,
> > 
> > Yes, that is how optimize works - copies all existing index 
> > segments into one unified index segment, thus optimizing it.
> > 
> > see hit #1:
> http://www.lucenebook.com/search?query=optimize+disk+space
> > 
> > However, three times the space sounds a bit too much, or I 
> > make a mistake in the book. :)
> > 
> > You said you end up with 3 files - .cfs is one of them, right?
> > 
> > Otis
> > 
> > 
> > --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > Just a quick question:  after writing an index and then calling 
> > > optimize(), is it normal for the index to expand to about 
> > three times 
> > > the size before finally compressing?
> > > 
> > > In our case the optimise grinds the disk, expanding the index
> into 
> > > many files of about 145MB total, before compressing down to three
> 
> > > files of about 47MB total.  That must be a lot of disk activity
> for 
> > > the people with multi-gigabyte indexes!
> > > 
> > > Regards,
> > > Leto
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize

2005-01-28 Thread Morus Walter
Otis Gospodnetic writes:
> Hello,
> 
> Yes, that is how optimize works - copies all existing index segments
> into one unified index segment, thus optimizing it.
> 
> see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> 
> However, three times the space sounds a bit too much, or I make a
> mistake in the book. :)
> 
I cannot explain why, but ~ three times the size of the final index is
what I observed when I logged disk usage during optimize of an index
in compound index format.
The test was on linux; I simply did a 'du -s' every few seconds in parallel
with the optimize.
I didn't test the non-compound format. Probably optimizing a compound format
requires storing the different parts of the compound file separately
before joining them into the compound file (sounds reasonable, otherwise
you would need to know the sizes before creating the parts). In that case
you have the original index, the separate files and the new compound file
as the disk usage peak.

So IMHO the book is wrong.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus,

that description of 3 sets of index files is what I was imagining, too.
 I'll have to test and add to the book errata, it seems.

Thanks for the info,
Otis

--- Morus Walter <[EMAIL PROTECTED]> wrote:

> Otis Gospodnetic writes:
> > Hello,
> > 
> > Yes, that is how optimize works - copies all existing index
> segments
> > into one unified index segment, thus optimizing it.
> > 
> > see hit #1:
> http://www.lucenebook.com/search?query=optimize+disk+space
> > 
> > However, three times the space sounds a bit too much, or I make a
> > mistake in the book. :)
> > 
> I cannot explain why, but ~ three times the size of the final index
> is
> what I observed, when I logged disk usage during optimize of an index
> in compound index format.
> The test was on linux, I simply did a 'du -s' every few seconds
> parallel 
> to the optimize.
> I didn't test noncompund format. Probably optimizing a compund format
> requires to store the different parts of the compound file separately
> before joining them to the compound file (sound reasonable, otherwise
> you would need to know the sizes before creating the parts). In that
> case 
> you had the original index, the separate files and the new compound
> file 
> as the disk usage peak.
> 
> So IMHO the book is wrong.
> 
> Morus
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize

2005-01-31 Thread Doug Cutting
Yura Smolsky wrote:
There is a big difference when you use compound index format or
multiple files. I have tested it on the big index (45 Gb). When I used
compound file then optimize takes 3 times more space, b/c *.cfs needs
to be unpacked.
Now I do use non compound file format. It needs like twice as much
disk space.
Perhaps we should add something to the javadocs noting this?
Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Optimize not deleting all files

2005-02-03 Thread åç
Your understanding is right!

The old existing files should be deleted, but it will build new files!


On Thu, 03 Feb 2005 17:36:27 -0800 (PST),
[EMAIL PROTECTED] <[EMAIL PROTECTED]>
wrote:
> Hi,
> 
> When I run an optimize in our production environment, old index are
> left in the directory and are not deleted.
> 
> My understanding is that an
> optimize will create new index files and all existing index files should be
> deleted.  Is this correct?
> 
> We are running Lucene 1.4.2 on Windows.
> 
> Any help is appreciated.  Thanks!
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Optimize not deleting all files

2005-02-04 Thread Ernesto De Santis
Hi all,
We have the same problem.
We guess that the problem is that Windows locks the files.
Our environment:
Windows 2000
Tomcat 5.5.4
Ernesto.
[EMAIL PROTECTED] escribió:
Hi,
When I run an optimize in our production environment, old index are
left in the directory and are not deleted.  

My understanding is that an
optimize will create new index files and all existing index files should be
deleted.  Is this correct?
We are running Lucene 1.4.2 on Windows.  

Any help is appreciated.  Thanks!
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Disk space used by optimize

2005-02-04 Thread Bernhard Messer

However, three times the space sounds a bit too much, or I make a
mistake in the book. :)
 

there already was a discussion about disk usage during index optimize.
Please have a look at the developers list at:
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1797569
where I made some measurements about the disk usage within Lucene.
At that time I proposed a patch which reduced the total disk size used
from 3 times to a little more than 2 times the final index size.
Together with Christoph we implemented some improvements to the
optimization patch and finally committed the changes.

Bernhard
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Optimize not deleting all files

2005-02-04 Thread Otis Gospodnetic
Get and try Lucene 1.4.3.  One of the older versions had a bug that was
not deleting old index files.

Otis

--- [EMAIL PROTECTED] wrote:

> Hi,
> 
> When I run an optimize in our production environment, old index are
> left in the directory and are not deleted.  
> 
> My understanding is that an
> optimize will create new index files and all existing index files
> should be
> deleted.  Is this correct?
> 
> We are running Lucene 1.4.2 on Windows.  
> 
> 
> Any help is appreciated.  Thanks!
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Optimize not deleting all files

2005-02-04 Thread yahootintin . 1247688
Ernestor, what version of Lucene are you running?



--- Lucene Users List"

> We have the same problem.
> We guess that the problem is that windows lock files.
> 
> Our enviroment:
> Windows 2000
> Tomcat 5.5.4
> 
> Ernesto.
> 
> [EMAIL PROTECTED] escribió:
> >Hi,
> >
> >When I run an optimize in our production environment, old index are
> >left in the directory and are not deleted.
> >
> >My understanding is that an
> >optimize will create new index files and all existing index files should be
> >deleted.  Is this correct?
> >
> >We are running Lucene 1.4.2 on Windows.
> >
> >Any help is appreciated.  Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Optimize not deleting all files

2005-02-04 Thread Patricio Keilty
Hi all, I'll answer on behalf of Ernesto; our environment is:
Lucene 1.4.2
Tomcat 5.5.4
java 1.4.2_04
Windows 2000 SP4
--p
[EMAIL PROTECTED] wrote:
Ernestor, what version of Lucene are you running?

--- Lucene Users List"

We have the same problem.
We guess that the problem is that windows lock files.

Our enviroment:
Windows 2000
Tomcat 5.5.4

Ernesto.

[EMAIL PROTECTED] escribió:

Hi,

When I run an optimize in our production environment, old index are
left in the directory and are not deleted.

My understanding is that an
optimize will create new index files and all existing index files should be
deleted.  Is this correct?

We are running Lucene 1.4.2 on Windows.

Any help is appreciated.  Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Optimize not deleting all files

2005-02-04 Thread Patricio Keilty
Hi Otis, tried version 1.4.3 without success, old index files still 
remain in the directory.
Also tried not calling optimize(), and still getting the same behaviour, 
maybe our problem is not related to optimize() call at all.

--p
Otis Gospodnetic wrote:
Get and try Lucene 1.4.3.  One of the older versions had a bug that was
not deleting old index files.
Otis
--- [EMAIL PROTECTED] wrote:

Hi,
When I run an optimize in our production environment, old index are
left in the directory and are not deleted.  

My understanding is that an
optimize will create new index files and all existing index files
should be
deleted.  Is this correct?
We are running Lucene 1.4.2 on Windows.  

Any help is appreciated.  Thanks!
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Optimize not deleting all files

2005-02-04 Thread Steven Rowe
Hi Patricio,
Is it the case that the "old index files" are not removed from session to
session, or only within the same session?  The discussion below pertains to
the latter case, that is, where the "old index files" are used in the same
process as the files replacing them.
I was having a similar problem, and tracked the source down to IndexReaders
not being closed in my application.  

As far as I can tell, in order for IndexReaders to present a consistent
view of an index while changes are being made to it, read-only copies
of the index are kept around until all IndexReaders using them are
closed.  If any IndexReaders are open on the index, IndexWriters first
make a copy, then operate on the copy.  If you track down all of these
open IndexReaders and close them before optimization, all of the
"old index files" should be deleted.  (Lucene Gurus, please correct this
if I have misrepresented the situation).
In my application, I had a bad interaction between IndexReader caching,
garbage collection, and incremental indexing, in which a new IndexReader
was being opened on an index after each indexing increment, without
closing the already-opened IndexReaders.
On Windows, operating-system level file locking caused by IndexReaders
left open was disallowing index re-creation, because the IndexWriter
wasn't allowed to delete the index files opened by the abandoned
IndexReaders.
In short, if you need to write to an index more than once in a single
session, be sure to keep careful track of your IndexReaders.
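
A minimal sketch of that kind of bookkeeping (not Steve's code; a Lucene
1.4-era API and pre-generics style are assumed, and the class and method
names are illustrative only):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.lucene.index.IndexReader;

// Hypothetical registry: every IndexReader the application opens is
// handed out from here, so they can all be closed before an optimize().
public class ReaderRegistry {
    private final List openReaders = new ArrayList();

    public synchronized IndexReader open(String indexPath) throws IOException {
        IndexReader reader = IndexReader.open(indexPath);
        openReaders.add(reader);
        return reader;
    }

    // Call this before IndexWriter.optimize(), so that no abandoned reader
    // is still holding (and, on Windows, locking) the old segment files.
    public synchronized void closeAll() throws IOException {
        for (Iterator it = openReaders.iterator(); it.hasNext();) {
            ((IndexReader) it.next()).close();
        }
        openReaders.clear();
    }
}
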
Hope it helps,
Steve
Patricio Keilty wrote:
Hi Otis, tried version 1.4.3 without success, old index files still 
remain in the directory.
Also tried not calling optimize(), and still getting the same behaviour, 
maybe our problem is not related to optimize() call at all.

--p
Otis Gospodnetic wrote:
Get and try Lucene 1.4.3.  One of the older versions had a bug that was
not deleting old index files.
Otis
--- [EMAIL PROTECTED] wrote:

Hi,
When I run an optimize in our production environment, old index are
left in the directory and are not deleted. 
My understanding is that an
optimize will create new index files and all existing index files
should be
deleted.  Is this correct?

We are running Lucene 1.4.2 on Windows. 

Any help is appreciated.  Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Optimize not deleting all files

2005-02-04 Thread yahootintin . 1247688
Yes, I believe my problem is related to open IndexReaders.  The issue is
that we can't shut down our live search application while we wait for a 10
minute optimization.  Search is a major part of our application and removing
the feature would significantly affect our end users (even though we run the
optimize during the night).



After the optimize is completed, I close and
re-open the readers so they start reading from the new index files.  I'm 
thinking
of adding code to delete all the old files at that point.  I presume they
will no longer be locked.



--- Lucene Users List"

> Is it the case that the "old index files" are not removed from session to
> session, or only within the same session?  The discussion below pertains to
> the latter case, that is, where the "old index files" are used in the same
> process as the files replacing them.
> 
> I was having a similar problem, and tracked the source down to IndexReaders
> not being closed in my application.
> 
> As far as I can tell, in order for IndexReaders to present a consistent
> view of an index while changes are being made to it, read-only copies
> of the index are kept around until all IndexReaders using them are
> closed.  If any IndexReaders are open on the index, IndexWriters first
> make a copy, then operate on the copy.  If you track down all of these
> open IndexReaders and close them before optimization, all of the
> "old index files" should be deleted.  (Lucene Gurus, please correct this
> if I have misrepresented the situation).
> 
> In my application, I had a bad interaction between IndexReader caching,
> garbage collection, and incremental indexing, in which a new IndexReader
> was being opened on an index after each indexing increment, without
> closing the already-opened IndexReaders.
> 
> On Windows, operating-system level file locking caused by IndexReaders
> left open was disallowing index re-creation, because the IndexWriter
> wasn't allowed to delete the index files opened by the abandoned
> IndexReaders.
> 
> In short, if you need to write to an index more than once in a single
> session, be sure to keep careful track of your IndexReaders.
> 
> Hope it helps,
> Steve
> 
> Patricio Keilty wrote:
> > Hi Otis, tried version 1.4.3 without success, old index files still
> > remain in the directory.
> > Also tried not calling optimize(), and still getting the same behaviour,
> > maybe our problem is not related to optimize() call at all.
> >
> > --p
> >
> > Otis Gospodnetic wrote:
> >
> >> Get and try Lucene 1.4.3.  One of the older versions had a bug that was
> >> not deleting old index files.
> >>
> >> Otis
> >>
> >> --- [EMAIL PROTECTED] wrote:
> >>
> >>> Hi,
> >>>
> >>> When I run an optimize in our production environment, old index are
> >>> left in the directory and are not deleted.
> >>> My understanding is that an
> >>> optimize will create new index files and all existing index files
> >>> should be
> >>> deleted.  Is this correct?
> >>>
> >>> We are running Lucene 1.4.2 on Windows.
> >>>
> >>> Any help is appreciated.  Thanks!
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Disk space used by optimize

2005-02-06 Thread Morus Walter
Bernhard Messer writes:
> 
> >However, three times the space sounds a bit too much, or I make a
> >mistake in the book. :)
> >  
> >
> there already was  a discussion about disk usage during index optimize. 
> Please have a look to the developers list at: 
> http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1797569 
> <http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1797569>
> where i made some measurements about the disk usage within lucene.
> At that time i proposed a patch which was reducing disk total used disk 
> size from 3 times to a little more than 2 times of the final index size. 
> Together with Christoph we implemented some improvements to the 
> optimization patch and finally commit the changes.
> 
Hmm. In the case that the index is in use (open reader), I doubt your patch
makes a difference. In that case the disk space used by the non-optimized
index will still be used even if the files are deleted (on unix/linux).
What happens if disk space runs out during creation of the compound index?
Will the non-compound files be a usable index?
Otherwise you risk losing the index.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: What does it take to optimize?

2002-04-03 Thread Aruna Raghavan

Hi,
I would call close after I call optimize...

-Original Message-
From: Robert A. Decker [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 03, 2002 1:15 PM
To: Lucene Users List
Subject: What does it take to optimize?


My index is larger than it should be. My deletable file has entries. I'm
trying to optimize the index, but it just doesn't seem to be doing
anything. Here's how I'm trying to optimize:

IndexWriter writer = new IndexWriter(pathToIndex,
 fragmentAnalyzer,
 false);
writer.close();
writer.optimize();



thanks,
rob

http://www.robdecker.com/
http://www.planetside.com/


--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>




Files getting deleted when optimize is killed?

2003-07-11 Thread Steve Rajavuori
I've had a problem on several occasions where my entire index is deleted --
that is, EVERY file (except 'segments') is gone. There were many users on
the system each time, so it's a little hard to tell for sure what was going
on, but my theory is this:

My code will automatically call optimize( ) periodically. Because the index
is very large, it can take a long time. It looks like an administrator may
have killed my process, and it's possible that it was killed while an
optimize( ) was in progress.

I have two questions:

1) Does anyone know if killing an optimize( ) in progress could wipe out all
files like this? (New index created in temporary files that were not saved
properly, while old index files were already deleted???)

2) Does anyone know of any other way all files in an index could be
inadvertently deleted (e.g. through killing a process)? For example, if you
kill the process during an 'add' would that cause all files to be deleted?

Steve Rajavuori

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Will failed optimize corrupt an index?

2003-08-19 Thread Pasha Bizhan
HI,

> From: Dan Quaroni [mailto:[EMAIL PROTECTED] 
> 
> My index grew about 7 gigs larger than I projected it would, 
> and it ran out of disk space during optimize.  Does lucene 
> have transactions or anything that would prevent this from 
> corrupting an index, or do I need to generate the index again?

You must generate the index again.

Pasha
Lucene.Net www.sourceforge.net/projects/lucenedotnet


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Will failed optimize corrupt an index?

2003-08-20 Thread Doug Cutting
The index should be fine.  Lucene index updates are atomic.

Doug

Dan Quaroni wrote:
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize.  Does lucene have transactions or anything
that would prevent this from corrupting an index, or do I need to generate
the index again?
Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: optimize fails with "Negative seek offset"

2004-05-12 Thread Sascha Ottolski
Hi,

sorry for following up my own mail, but since no one responded so
far, I thought the stacktrace might be of interest. The following
exception always occurs when trying to optimize one of our indices,
which always went ok for about a year now. I just tried with 1.4-rc3,
but with the same result:

java.io.IOException: Negative seek offset
at java.io.RandomAccessFile.seek(Native Method)
at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:405)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at 
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:222)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:63)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:238)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:483)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:362)
at LuceneRPCHandler.optimize(LuceneRPCHandler.java:398)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at org.apache.xmlrpc.Invoker.execute(Invoker.java:168)
at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:123)
at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:185)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:151)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:773)
at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:656)
at java.lang.Thread.run(Thread.java:534)


Any hint would be greatly appreciated.


Thanks,

Sascha

-- 
Gallileus - the power of knowledge

Gallileus GmbH    http://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize fails with "Negative seek offset"

2004-05-12 Thread Anthony Vito
Looks like the same error I got when I tried to use Lucene version 1.3
to search on an index I had created with Lucene version 1.4. The
versions are not forward compatible. Did you by chance create the index
with version 1.4 and are now searching with version 1.3. It's easy to
get the dependencies out of sync for different apps, which is what
happened to me.

-vito

On Wed, 2004-05-12 at 04:59, Sascha Ottolski wrote:
> Hi,
> 
> sorry for following up my own mail, but since no one responded so
> far, I thought the stacktrace might be of interested. The following
> exception always occurs when trying to optimize one of our indizes,
> which always went ok for about a year now. I just tried with 1.4-rc3,
> but with the same result:
> 
> java.io.IOException: Negative seek offset
> at java.io.RandomAccessFile.seek(Native Method)
> at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:405)
> at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
> at 
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:222)
> at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
> at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
> at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
> at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:63)
> at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:238)
> at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
> at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:483)
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:362)
> at LuceneRPCHandler.optimize(LuceneRPCHandler.java:398)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:324)
> at org.apache.xmlrpc.Invoker.execute(Invoker.java:168)
> at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:123)
> at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:185)
> at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:151)
> at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
> at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:773)
> at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:656)
> at java.lang.Thread.run(Thread.java:534)
> 
> 
> Any hint would be greatly appreciated.
> 
> 
> Thanks,
> 
> Sascha
> 
> -- 
> Gallileus - the power of knowledge
> 
> Gallileus GmbH    http://www.gallileus.info/
> 
> Pintschstraße 16  fon +49-(0)30-41 93 43 43
> 10249 Berlin  fax +49-(0)30-41 93 43 45
> Germany
> 
> 
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize fails with "Negative seek offset"

2004-05-12 Thread Sascha Ottolski
Am Mittwoch, 12. Mai 2004 18:54 schrieb Anthony Vito:
> Looks like the same error I got when I tried to use Lucene version
> 1.3 to search on an index I had created with Lucene version 1.4. The
> versions are not forward compatible. Did you by chance create the
> index with version 1.4 and are now searching with version 1.3. It's
> easy to get the dependencies out of sync for different apps, which is
> what happened to me.
>
> -vito

Hi vito,

thanks for the reply, but no, we have only upgraded so far, not
downgraded. More than that, the failing index was just rebuilt completely
with 1.4-rc2, only two weeks ago. The problem started a short time
afterwards (but not immediately).


Greets,

Sascha

-- 
Gallileus - the power of knowledge

Gallileus GmbH    http://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Preventing duplicate document insertion during optimize

2004-04-30 Thread James Dunn
Kevin,

I have a similar issue.  The only solution I have been
able to come up with is, after the merge, to open an
IndexReader against the merged index, iterate over all
the docs and delete duplicate docs based on my
"primary key" field.

Jim

--- "Kevin A. Burton" <[EMAIL PROTECTED]> wrote:
> Let's say you have two indexes each with the same
> document literal.  All 
> the fields hash the same and the document is a
> binary duplicate of a 
> different document in the second index.
> 
> What happens when you do a merge to create a 3rd
> index from the first 
> two?  I assume you now have two documents that are
> identical in one 
> index.  Is there any way to prevent this?
> 
> It would be nice to figure out if there's a way to
> flag a field as a 
> primary key so that if it has already added it to
> just skip.
> 
> Kevin
> 
> -- 
> 
> Please reply using PGP.
> 
> http://peerfear.org/pubkey.asc
> 
> NewsMonster - http://www.newsmonster.org/
> 
> Kevin A. Burton, Location - San Francisco, CA, Cell
> - 415.595.9965
>AIM/YIM - sfburtonator,  Web -
> http://peerfear.org/
> GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D
> 8D04 99F1 4412
>   IRC - freenode.net #infoanarchy | #p2p-hackers |
> #newsmonster
> 
> 

> ATTACHMENT part 2 application/pgp-signature
name=signature.asc






__
Do you Yahoo!?
Win a $20,000 Career Makeover at Yahoo! HotJobs  
http://hotjobs.sweepstakes.yahoo.com/careermakeover 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re[2]: Disk space used by optimize

2005-01-30 Thread Yura Smolsky
Hello, Otis.

There is a big difference when you use compound index format or
multiple files. I have tested it on the big index (45 Gb). When I used
compound file then optimize takes 3 times more space, b/c *.cfs needs
to be unpacked.

Now I do use non compound file format. It needs like twice as much
disk space.

OG> Have you tried using the multifile index format?  Now I wonder if there
OG> is actually a difference in disk space cosumed by optimize() when you
OG> use multifile and compound index format...

OG> Otis

OG> --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

>> Our copy of LIA is "in the mail" ;)
>> 
>> Yes the final three files are: the .cfs (46.8MB), deletable (4
>> bytes),
>> and segments (29 bytes).
>> 
>> --Leto
>> 
>> 
>> 
>> > -Original Message-
>> > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
>> > 
>> > Hello,
>> > 
>> > Yes, that is how optimize works - copies all existing index 
>> > segments into one unified index segment, thus optimizing it.
>> > 
>> > see hit #1:
>> http://www.lucenebook.com/search?query=optimize+disk+space
>> > 
>> > However, three times the space sounds a bit too much, or I 
>> > make a mistake in the book. :)
>> > 
>> > You said you end up with 3 files - .cfs is one of them, right?
>> > 
>> > Otis
>> > 
>> > 
>> > --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
>> > 
>> > > 
>> > > Just a quick question:  after writing an index and then calling
>> > > optimize(), is it normal for the index to expand to about 
>> > three times 
>> > > the size before finally compressing?
>> > > 
>> > > In our case the optimise grinds the disk, expanding the index
>> into 
>> > > many files of about 145MB total, before compressing down to three
>> 
>> > > files of about 47MB total.  That must be a lot of disk activity
>> for 
>> > > the people with multi-gigabyte indexes!
>> > > 
>> > > Regards,
>> > > Leto
>> 
>> CONFIDENTIALITY NOTICE AND DISCLAIMER
>> 
>> Information in this transmission is intended only for the person(s)
>> to whom it is addressed and may contain privileged and/or
>> confidential information. If you are not the intended recipient, any
>> disclosure, copying or dissemination of the information is
>> unauthorised and you should delete/destroy all copies and notify the
>> sender. No liability is accepted for any unauthorised use of the
>> information contained in this transmission.
>> 
>> This disclaimer has been automatically added.
>> 
>> -
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail:
>> [EMAIL PROTECTED]
>> 
>> 


OG> -
OG> To unsubscribe, e-mail: [EMAIL PROTECTED]
OG> For additional commands, e-mail:
OG> [EMAIL PROTECTED]


Yura Smolsky,




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re[2]: Disk space used by optimize

2005-02-04 Thread Yura Smolsky
Hello, Doug.

>> There is a big difference when you use compound index format or
>> multiple files. I have tested it on the big index (45 Gb). When I used
>> compound file then optimize takes 3 times more space, b/c *.cfs needs
>> to be unpacked.
>> 
>> Now I do use non compound file format. It needs like twice as much
>> disk space.
DC> Perhaps we should add something to the javadocs noting this?

Sure. I was a bit confused about optimizing the compound file format b/c I
had no info about space usage when optimizing.
More info in the javadocs will save somebody's time :)


Yura Smolsky




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Files getting deleted when optimize is killed?

2003-07-12 Thread Otis Gospodnetic

--- Steve Rajavuori <[EMAIL PROTECTED]> wrote:
> I've had a problem on several occasions where my entire index is
> deleted --
> that is, EVERY file (except 'segments') is gone. There were many
> users on
> the system each time, so its a little hard to tell for sure what was
> going
> on, but my theory is this:
> 
> My code will automatically call optimize( ) periodically. Because the
> index
> is very large, it can take a long time. It looks like an
> administrator may
> have killed my process, and its possible that it was killed while an
> optimize( ) was in progress.
> 
> I have two questions:
> 
> 1) Does anyone know if killing an optimize( ) in progress could wipe
> out all
> files like this? (New index created in temporary files that were not
> saved
> properly, while old index files were already deleted???)

I highly doubt it.

> 2) Does anyone know of any other way all files in an index could be
> inadvertently deleted (e.g. through killing a process)? For example,
> if you
> kill the process during an 'add' would that cause all files to be
> deleted?

Same as above.  You can create an artificial, large index for testing
purposes.  Call optimize once in a while, and then kill the process.  I
don't think Lucene will remove your files.

Otis


__
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Files getting deleted when optimize is killed?

2003-07-14 Thread Steve Rajavuori
Upon further examination what I found is this:

- Killing the process while optimize() is still working does NOT cause the
index files to be deleted, HOWEVER --

- Once the index is opened again by a new process (now apparently in an
unstable state due to the incomplete optimize()), at that time all existing
files are deleted and only a file called "segments" remains.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 12, 2003 7:06 AM
To: Lucene Users List
Subject: Re: Files getting deleted when optimize is killed?



--- Steve Rajavuori <[EMAIL PROTECTED]> wrote:
> I've had a problem on several occasions where my entire index is
> deleted --
> that is, EVERY file (except 'segments') is gone. There were many
> users on
> the system each time, so its a little hard to tell for sure what was
> going
> on, but my theory is this:
> 
> My code will automatically call optimize( ) periodically. Because the
> index
> is very large, it can take a long time. It looks like an
> administrator may
> have killed my process, and its possible that it was killed while an
> optimize( ) was in progress.
> 
> I have two questions:
> 
> 1) Does anyone know if killing an optimize( ) in progress could wipe
> out all
> files like this? (New index created in temporary files that were not
> saved
> properly, while old index files were already deleted???)

I highly doubt it.

> 2) Does anyone know of any other way all files in an index could be
> inadvertently deleted (e.g. through killing a process)? For example,
> if you
> kill the process during an 'add' would that cause all files to be
> deleted?

Same as above.  You can create an artificial, large index for testing
purposes.  Call optimize once in a while, and then kill the process.  I
don't think Lucene will remove your files.

Otis


__
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Memory requirements for optimize() on compound index high?

2004-05-10 Thread David Sitsky
Hi,

I am working on an application which uses Lucene 1.3 Final which uses the 
compound index format on Win32 Sun JVM 1.4.1_02.  I have set 
maxFieldLength for the index writer to 1,000,000, as often I have to index 
potentially very large documents, which contain information that must be 
indexed.

All other index writer parameters have their default values.  The 
application loads all documents in a batch phase, and then allows the user 
to perform searchers.  Typically, no new documents are added afterwards.

Given the large size set for maxFieldLength, I have allocated 512MB of 
memory to the JVM.  For indexing 1,000,000 complex documents, with 
potentially around 30 fields each, this seems to work fine.

I have noticed that when performing an optimize() on this index at the end 
of a batch load, the memory requirements seem to be much higher.  I was 
receiving OutOfMemoryErrors for a 512MB JVM.  I increased the JVM size to 
1 GIG, and the optimize operation completed successfully.

Task manager reported a peak VM size of 810MB during the optimize() 
operation, from a newly-created JVM.  FWIW, the final index size was 11 
gigabytes - most document fields are stored in the index.

Do people have similar experiences to this when calling optimize() on a 
compound index?

Are there any ways I can reduce the amount of memory required, apart from 
making maxFieldLength smaller?

Is there any way of determining in advance the kind of memory requirements 
optimize() will require?  It's highly undesirable to receive 
OutOfMemoryErrors during optimize().  I guess the user can still search on 
an unoptimized index which is better than nothing...

-- 
Cheers,
David

This message is intended only for the named recipient.  If you are not the 
intended recipient you are notified that disclosing, copying, distributing 
or taking any action  in reliance on the contents of this information is 
strictly prohibited.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: optimize() is not merging into single file? !!!!!!

2004-06-02 Thread iouli . golovatyi
I rechecked the results. Here they are:

IndexWriter compiled with v.1.4-rc2 generates after optimization:
_36d.cfs   3779 kb

IndexWriter compiled with v.1.4-rc3 generates after optimization:

_36d.cfs   3778 kb
_36c.cfs     31 kb
_35z.cfs     14 kb
_35o.cfs     14 kb
...
etc.

In both cases the segments file contains _36d.cfs

Looks like the new version just "forgot" to clean up






Iouli Golovatyi/X/GP/[EMAIL PROTECTED]
01.06.2004 17:22
Please respond to "Lucene Users List"

 
To: <[EMAIL PROTECTED]>
cc: 
Subject:optimeze() is not merging into single file?
    Category: 



I optimize and close the index after that, but don't get just one .cfs
file as promised in the docs. Instead of it I see several small
segments and a couple of big ones.
This weird behavior seems to have started since I changed from v 1.4-rc2 to
1.4-rc3.
Before, I got just one .cfs segment. Any ideas?
Thanks in advance
J.



RE : optimize() is not merging into single file? !!!!!!

2004-06-02 Thread Rasik Pandey
Hello,

I am running a two-week-old version of Lucene from the CVS HEAD and seeing the same
behavior.

Regards,
RBP 

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 2 June 2004 13:53
> To: Lucene Users List
> Subject: Re: optimize() is not merging into single file? !!
> 
> I rechecked  the results. Here they are:
> 
> IndexWriter compiled with v.1.4-rc2 generates after
> optimization
> _36d.cfs3779 kb
> 
> IndexWriter compiled with v.1.4-rc3 generates after
> optimization
> 
> _36d.cfs   3778 kb
> _36c.cfs31 kb
> _35z.cfs14 kb
> _35o.cfs   14  kb
> .
> etc.
> 
> I both cases segment file contains _36d.cfs
> 
> Looks like new version just "foget" to clean up
> 
> 
> 
> 
> 
> 
> Iouli Golovatyi/X/GP/[EMAIL PROTECTED]
> 01.06.2004 17:22
> Please respond to "Lucene Users List"
> 
> 
> To: <[EMAIL PROTECTED]>
> cc:
> Subject:optimeze() is not merging into single
> file?
> Category:
> 
> 
> 
> I optimize and close the index after that, but don't get just
> one .cvs
> file as it promised in doc. Instead of it I see something like
> small
> segments and a couple of big.
> This weird behavor seems started since i changed from v 1.4-rc2
> to
> 1.4-rc3.
> Before I got just one cvs segment . Any ideas?
> Thanks in advance
> J.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
So.. the other day I sent an email about building an index with 14M 
documents.

That went well but the optimize() was taking FOREVER.  It took 7 hours
to generate the whole index, and as of 10AM it was still
optimizing (6 hours later) and I needed the box back.

So is it possible to fix this index now?  Can I just delete the most 
recent segment that was created?  I can find this by ls -alt

Also... what can I do to speed up this optimize?  Ideally it wouldn't 
take 6 hours.

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Peter M Cipollone
You might try merging the existing index into a new index located on a ram
disk.  Once it is done, you can move the directory from ram disk back to
your hard disk.  I think this will work as long as the old index did not
finish merging.  You might do a "strings" command on the segments file to
make sure the new (merged) segment is not in there, and if there's a
"deletable" file, make sure there are no segments from the old index listed
therein.
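
A hedged sketch of the merge Peter suggests, using
IndexWriter.addIndexes(Directory[]); both paths, including the ram-disk
mount point, are assumptions for illustration:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeToNewIndex {
    public static void main(String[] args) throws Exception {
        // Hypothetical paths: the existing index on disk, and a fresh target
        // directory (here assumed to be on a ram disk) to merge it into.
        Directory oldIndex = FSDirectory.getDirectory("/data/index", false);
        IndexWriter writer = new IndexWriter("/mnt/ramdisk/index-new",
                                             new StandardAnalyzer(), true);
        writer.addIndexes(new Directory[] { oldIndex });  // merges and optimizes
        writer.close();
        // Afterwards, move /mnt/ramdisk/index-new back onto the hard disk
        // and point the application at it.
    }
}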

- Original Message - 
From: "Kevin A. Burton" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, July 08, 2004 2:02 PM
Subject: Way to repair an index broking during 1/2 optimize?


> So.. the other day I sent an email about building an index with 14M
> documents.
>
> That went well but the optimize() was taking FOREVER.  It took 7 hours
> to generate the whole index and when complete as of 10AM it was still
> optimizing (6 hours later) and I needed the box back.
>
> So is it possible to fix this index now?  Can I just delete the most
> recent segment that was created?  I can find this by ls -alt
>
> Also... what can I do to speed up this optimize?  Ideally it wouldn't
> take 6 hours.
>
> Kevin
>
> -- 
>
> Please reply using PGP.
>
> http://peerfear.org/pubkey.asc
>
> NewsMonster - http://www.newsmonster.org/
>
> Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
>AIM/YIM - sfburtonator,  Web - http://peerfear.org/
> GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
>   IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
>
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
Also... what can I do to speed up this optimize?  Ideally it wouldn't 
take 6 hours.
Was this the index with the mergeFactor of 5000?  If so, that's why it's 
so slow: you've delayed all of the work until the end.  Indexing on a 
ramfs will make things faster in general, however, if you have enough RAM...

Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
So is it possible to fix this index now?  Can I just delete the most 
recent segment that was created?  I can find this by ls -alt
Sorry, I forgot to answer your question: this should work fine.  I don't 
think you should even have to delete that segment.

Also, to elaborate on my previous comment, a mergeFactor of 5000 not 
only delays the work until the end, but it also makes the disk workload 
more seek-dominated, which is not optimal.  So I suspect a smaller merge 
factor, together with a larger minMergeDocs, will be much faster 
overall, including the final optimize().  Please tell us how it goes.
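
A sketch of that tuning (using the public mergeFactor/minMergeDocs fields
of the 1.4-era IndexWriter, as elsewhere in this thread; the path and the
values are illustrative only):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class BatchIndexing {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/data/index",   // hypothetical path
                                             new StandardAnalyzer(), true);
        writer.mergeFactor = 10;        // modest merge factor: merge segments as you go
        writer.minMergeDocs = 10000;    // buffer more docs in RAM per segment (value illustrative)
        // ... writer.addDocument(...) for the whole batch ...
        writer.optimize();   // the final merge is cheaper when merging happened along the way
        writer.close();
    }
}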

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Peter M Cipollone wrote:
You might try merging the existing index into a new index located on a ram
disk.  Once it is done, you can move the directory from ram disk back to
your hard disk.  I think this will work as long as the old index did not
finish merging.  You might do a "strings" command on the segments file to
make sure the new (merged) segment is not in there, and if there's a
"deletable" file, make sure there are no segments from the old index listed
therein.
 

It's a HUGE index.  It won't fit in memory ;)  Right now it's at 8G...
Thanks though! :)
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Kevin A. Burton wrote:
Also... what can I do to speed up this optimize? Ideally it wouldn't 
take 6 hours.

Was this the index with the mergeFactor of 5000? If so, that's why 
it's so slow: you've delayed all of the work until the end. Indexing 
on a ramfs will make things faster in general, however, if you have 
enough RAM...
No... I changed the mergeFactor back to 10 as you suggested.
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Kevin A. Burton wrote:
So is it possible to fix this index now? Can I just delete the most 
recent segment that was created? I can find this by ls -alt

Sorry, I forgot to answer your question: this should work fine. I 
don't think you should even have to delete that segment.
I'm worried about duplicate or missing content from the original index. 
I'd rather rebuild the index and waste another 6 hours (I've probably 
blown 100 hours of CPU time on this already) and have a correct index :)

During an optimize I assume Lucene starts writing to a new segment and 
leaves all others in place until everything is done and THEN deletes them?

Also, to elaborate on my previous comment, a mergeFactor of 5000 not 
only delays the work until the end, but it also makes the disk 
workload more seek-dominated, which is not optimal. 
The only settings I use are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has 230k files in it :-/
I assume this is contributing to all the disk seeks.
So I suspect a smaller merge factor, together with a larger 
minMergeDocs, will be much faster overall, including the final 
optimize(). Please tell us how it goes.

This is what I did for this last round but then I ended up with the 
highly fragmented index.

hm...
Thanks for all the help btw!
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that 
it logs merges?  If so, it would be interesting to see that output, 
especially the last entry.

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.

Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that 
it logs merges? If so, it would be interesting to see that output, 
especially the last entry.

No I didn't actually... If I run it again I'll be sure to do this.
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote:
During an optimize I assume Lucene starts writing to a new segment and 
leaves all others in place until everything is done and THEN deletes them?
That's correct.
The only settings I uses are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has 230k files in it :-/
Something sounds very wrong for there to be that many files.
The maximum number of files should be around:
  (7 + numIndexedFields) * (mergeFactor-1) * 
(log_base_mergeFactor(numDocs/minMergeDocs))

With 14M documents, log_10(14M/1000) is 4, which gives, for you:
  (7 + numIndexedFields) * 36 = 230k
   7*36 + numIndexedFields*36 = 230k
   numIndexedFields = (230k - 7*36) / 36 =~ 6k
So you'd have to have around 6k unique field names to get 230k files. 
Or something else must be wrong.  Are you running on win32, where file 
deletion can be difficult?

With the typical handful of fields, one should never see more than 
hundreds of files.
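As a sanity check, plugging Kevin's numbers (13 indexed fields, mergeFactor 10,
minMergeDocs 1000, ~14M docs) into that bound gives only hundreds of files; a
small sketch of the arithmetic, assuming the formula quoted above:

public class MaxFileEstimate {
    public static void main(String[] args) {
        int numIndexedFields = 13;   // Kevin's field count
        int mergeFactor = 10;
        int minMergeDocs = 1000;
        double numDocs = 14000000.0; // ~14M documents
        // log base mergeFactor of (numDocs / minMergeDocs)
        double levels = Math.ceil(Math.log(numDocs / minMergeDocs) / Math.log(mergeFactor));
        double maxFiles = (7 + numIndexedFields) * (mergeFactor - 1) * levels;
        // (7 + 13) * 9 * 5 = 900 -- hundreds of files, nowhere near 230k
        System.out.println("estimated max files: " + maxFiles);
    }
}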

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote:
Something sounds very wrong for there to be that many files.
The maximum number of files should be around:
(7 + numIndexedFields) * (mergeFactor-1) * 
(log_base_mergeFactor(numDocs/minMergeDocs))

With 14M documents, log_10(14M/1000) is 4, which gives, for you:
(7 + numIndexedFields) * 36 = 230k
7*36 + numIndexedFields*36 = 230k
numIndexedFields = (230k - 7*36) / 36 =~ 6k
So you'd have to have around 6k unique field names to get 230k files. 
Or something else must be wrong. Are you running on win32, where file 
deletion can be difficult?

With the typical handful of fields, one should never see more than 
hundreds of files.

We only have 13 fields... Though to be honest, I'm worried that even if I
COULD do the optimize, it would run out of file handles.

This is very strange...
I'm going to increase minMergeDocs to 1 and then run the full
conversion on one box, and then try to do an optimize (of the corrupt
index) on another box. See which one finishes first.

I assume the speed of optimize() can be increased the same way that 
indexing is increased...

Kevin

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Way to repair an index broking during 1/2 optimize?

2004-07-09 Thread Doug Cutting
Kevin A. Burton wrote:
With the typical handful of fields, one should never see more than 
hundreds of files.

We only have 13 fields... Though to be honest, I'm worried that even if I
COULD do the optimize, it would run out of file handles.
Optimization doesn't open all files at once.  The most files that are 
ever opened by an IndexWriter is just:

4 + (5 + numIndexedFields) * (mergeFactor-1)
This includes during optimization.
However, when searching, an IndexReader must keep most files open.  In 
particular, the maximum number of files an unoptimized, non-compound 
IndexReader can have open is:

(5 + numIndexedFields) * (mergeFactor-1) * 
(log_base_mergeFactor(numDocs/minMergeDocs))

A compound IndexReader, on the other hand, should open at most, just:
(mergeFactor-1) * (log_base_mergeFactor(numDocs/minMergeDocs))
An optimized, non-compound IndexReader will open just (5 + 
numIndexedFields) files.

And an optimized, compound IndexReader should only keep one file open.
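For a concrete feel of those bounds, here is the same arithmetic with Kevin's
13 fields, mergeFactor 10, and minMergeDocs 1000 (a sketch, using the formulas
above; the document count is the ~14M mentioned earlier):

public class OpenFileEstimate {
    public static void main(String[] args) {
        int fields = 13, mergeFactor = 10, minMergeDocs = 1000;
        double numDocs = 14000000.0;
        double levels = Math.ceil(Math.log(numDocs / minMergeDocs) / Math.log(mergeFactor));
        // IndexWriter, including during optimize()
        double writerMax = 4 + (5 + fields) * (mergeFactor - 1);       // 4 + 18*9 = 166
        // unoptimized, non-compound IndexReader
        double readerMax = (5 + fields) * (mergeFactor - 1) * levels;  // 18*9*5   = 810
        // unoptimized, compound IndexReader
        double compoundMax = (mergeFactor - 1) * levels;               // 9*5      = 45
        System.out.println(writerMax + " / " + readerMax + " / " + compoundMax);
    }
}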
Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Disk space used by optimize - non space in disk corrupts index.

2005-02-04 Thread Ernesto De Santis
Hi all,
We have a big index and little free disk space.
When we optimize and all the space is consumed, our index gets corrupted:
the segments file points to nonexistent files.
Environment:
java 1.4.2_04
W2000 SP4
Tomcat 5.5.4
Bye,
Ernesto.
Yura Smolsky wrote:
Hello, Otis.
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used
the compound format, optimize took about 3 times more space, because the
*.cfs needs to be unpacked.
Now I use the non-compound file format. It needs roughly twice as much
disk space.
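For reference, the format is chosen on the writer; a minimal sketch of
switching to the multifile (non-compound) format before an optimize
(1.4-era API, placeholder path), which per Yura's observation needs less
temporary space:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class OptimizeNonCompound {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        writer.setUseCompoundFile(false); // newly merged segments stay as separate files
        writer.optimize();                // merge everything into one (multifile) segment
        writer.close();
    }
}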
OG> Have you tried using the multifile index format?  Now I wonder if there
OG> is actually a difference in disk space consumed by optimize() when you
OG> use the multifile and compound index formats...
OG> Otis
OG> --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
 

Our copy of LIA is "in the mail" ;)
Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).
--Leto

 

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 

Hello,
Yes, that is how optimize works - copies all existing index 
segments into one unified index segment, thus optimizing it.

see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?
Otis
--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
   

Just a quick question: after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?
In our case the optimise grinds the disk, expanding the index into
many files of about 145MB total, before compressing down to three
files of about 47MB total. That must be a lot of disk activity for
the people with multi-gigabyte indexes!
Regards,
Leto
 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:
[EMAIL PROTECTED]
 


OG> -
OG> To unsubscribe, e-mail: [EMAIL PROTECTED]
OG> For additional commands, e-mail:
OG> [EMAIL PROTECTED]
Yura Smolsky,

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previous stored document

2004-09-16 Thread David Spencer
Crump, Michael wrote:
You have to close the IndexReader after doing the delete, before opening the 
IndexWriter for the addition.  See information at this link:
http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
Recently I thought I observed that if I use this batch update idiom (1st 
delete the changed docs, then add them), it seems that 
IndexReader.close() does not flush/commit the deletions - rather 
IndexWriter.optimize() does.

I may have been confused and should retest this, but regardless, the 
javadoc seems unclear. close() says it "*saves* deletions to disk". What 
does it mean to save a deletion? Save a pending one, or commit it 
(commit -> really delete it)?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()
Also optimize doesn't mention deletions.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#optimize()
Suggestion: could the word "save" in the close() jdoc be elaborated on, 
and possibly could optimize() get another comment wrt its effect on 
deletions?
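For concreteness, the batch-update idiom under discussion looks roughly like
this (a sketch against the 1.4-era API; the "id" field, its value, and the
path are illustrative, not from the original mails):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class BatchUpdate {
    public static void main(String[] args) throws Exception {
        String path = "/path/to/index";            // placeholder
        // Phase 1: mark the changed documents as deleted, then close the
        // reader so the write lock is released and the deletions are saved.
        IndexReader reader = IndexReader.open(path);
        reader.delete(new Term("id", "doc-42"));   // illustrative keyword field
        reader.close();
        // Phase 2: re-add the new versions with an IndexWriter.
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(Field.Keyword("id", "doc-42"));
        doc.add(Field.Text("body", "updated contents"));
        writer.addDocument(doc);
        writer.optimize(); // optional: physically removes the deleted copies
        writer.close();
    }
}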

thx,
 Dave

Regards,
Michael
-Original Message-
From:   Paul Williams [mailto:[EMAIL PROTECTED]
Sent:   Thu 9/16/2004 5:39 AM
To: 'Lucene Users List'
Cc: 
Subject:problem with locks when updating the data of a previous stored document
Hi,
Using lucene-1.4.1.jar on WinXP  

I am having trouble with locking and updating an existing Lucene document.
I delete the old document from the index and then add the new document to
the index writer. I am using minMergeDocs set to 100 (much quicker!!) and
close the writer once the batch is done, so the documents are flushed to
the filesystem.
The problem I am having is that I can't delete the old version of the
document (after the first document has been added) using reader.delete,
because there is a lock on the index due to the IndexWriter being open.
Am I doing this wrong, or is there a simple way round this?
Regards,
Paul
Code snippets of the update code (I have just cut and pasted the relevant
lines from my app to give an idea):
reader = IndexReader.open(location);
// Delete the old doc/term if present
if (reader.docFreq(docNumberTerm) > 0) {
    reader.delete(docNumberTerm);
.
.
.
IndexWriter writer = null;
// Get the writer from the hash table so the last few are cached and
// don't have to be restarted
synchronized (IndexWriterCache) {
    String dbstring = "" + ldb;
    writer = (IndexWriter) IndexWriterCache.get(dbstring);
    if (writer == null) {
        // Not in cache, so create one and add it to the cache for next time
        writer = new IndexWriter(location, new StandardAnalyzer(), new_index);
        writer.setUseCompoundFile(true);
        // Set the maximum number of entries per field. Default is 10,000
        writer.maxFieldLength = MaxFieldCount;
        // Set how many docs will be buffered in memory before being saved to disk
        writer.minMergeDocs = (int) DocsInMemory;
        IndexWriterCache.remove(dbstring);
        IndexWriterCache.put(dbstring, writer);
    }
}
.
.
.

// Add the documents to the Lucene index
writer.addDocument(doc);

.
. Some time later, after a batch of docs has been added

writer.close();





Re: IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previous stored document

2004-09-16 Thread Morus Walter
David Spencer writes:
> Crump, Michael wrote:
> 
> > You have to close the IndexReader after doing the delete, before opening the 
> > IndexWriter for the addition.  See information at this link:
> > 
> > http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
> 
> Recently I thought I observed that if I use this batch update idiom (1st 
> delete the changed docs, then add them), it seems that 
> IndexReader.close() does not flush/commit the deletions - rather 
> IndexWriter.optimize() does.
> 
> I may have been confused and should retest this, but regardless, the 
> javadoc seems unclear. close() says it "*saves* deletions to disk". What 
> does it mean to save a deletion? Save a pending one, or commit it 
> (commit -> really delete it)?
> 
My understanding is that saving makes sure that index searchers opened
after the close will take the deletions into account.
Deletion in Lucene is in fact a two-phase process:
a) delete -> the document is marked deleted but not removed from the index.
   Low-level APIs (such as listing terms) will still see the content
   from deleted documents.
   Search takes the delete flag into account and removes deleted
   documents from the result list.
   (I think this also means that the deleted documents still
   contribute to term frequencies.)
b) remove deleted documents
   This is done during optimize.
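A small sketch of the two phases Morus describes (1.4-era API; the term and
path are illustrative): after close(), a new searcher no longer returns the
document, while optimize() is what physically drops it from the segments:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class TwoPhaseDelete {
    public static void main(String[] args) throws Exception {
        String path = "/path/to/index";           // placeholder
        Term t = new Term("id", "doc-42");        // illustrative term
        // Phase a: mark the document deleted and save the deletion
        IndexReader reader = IndexReader.open(path);
        reader.delete(t);
        reader.close();
        // A new searcher now filters the deleted doc from results...
        IndexSearcher searcher = new IndexSearcher(path);
        Hits hits = searcher.search(new TermQuery(t));
        System.out.println("hits after delete: " + hits.length());      // expected 0
        // ...but docFreq may still count it until the segments are rewritten
        System.out.println("docFreq after delete: " + searcher.docFreq(t));
        searcher.close();
        // Phase b: optimize() merges segments and physically drops deleted docs
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }
}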

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]