RE: CFS file and file formats

2004-12-23 Thread Steve Rajavuori
I think there are several problems. 

1) First of all, there are both CFS files and standard (non-compound) files
in this directory, and all of them have recent update dates, so I assume
they are all being used. My code never explicitly sets the compound file
flag, so I don't know how this happened.

2) Is there a way to force all files into compound mode? For example, if I
set the compound setting, then call optimize, will that recreate everything
into the CFS format?

3) There are several other large .CFS files in this directory that I think
have somehow become detached from the index. They have recent update dates
-- however, the last time I ran optimize these were not touched, and they
are not being updated now. I know these segments have valid data, because
now when I search I am missing large chunks of data -- which I assume is in
these detached segments. So my thought is to edit the 'segments' file to
make Lucene recognize these again -- but I need to know the correct segment
size in order to do this. So how do I determine what the correct segment
size should be?

Steve

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: CFS file and file formats

2004-12-23 Thread Doug Cutting
Steve Rajavuori wrote:
> 1) First of all, there are both CFS files and standard (non-compound) files
> in this directory, and all of them have recent update dates, so I assume
> they are all being used. My code never explicitly sets the compound file
> flag, so I don't know how this happened.

This can happen if your application crashes while the index is being
updated.  In that case the leftover files were never entered into the
segments file and may be only partially written.

> 2) Is there a way to force all files into compound mode? For example, if I
> set the compound setting, then call optimize, will that recreate everything
> into the CFS format?

It should.  On Windows, however, not all old CFS files will be deleted
immediately; some may instead be listed in the 'deletable' file for a while.

> 3) There are several other large .CFS files in this directory that I think
> have somehow become detached from the index. They have recent update dates
> -- however, the last time I ran optimize these were not touched, and they
> are not being updated now. I know these segments have valid data, because
> now when I search I am missing large chunks of data -- which I assume is in
> these detached segments. So my thought is to edit the 'segments' file to
> make Lucene recognize these again -- but I need to know the correct segment
> size in order to do this. So how do I determine what the correct segment
> size should be?

These could also be the result of crashes.  In this case they may be
partially written.

The safest approach is to remove files not mentioned in the segments 
file and update the index with the missing documents.  How does your 
application recover if it crashes during an update?

Doug
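
A minimal sketch of the first half of that advice: listing the files that do not belong to any segment named in the segments file. It only reports candidates and deletes nothing. The class and method names are hypothetical, the set of live segment names has to be supplied by you (reading the segments file is not shown), and it assumes a file's segment name is everything before the first dot and that 'segments' and 'deletable' are the only index-level files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: reports index files whose
// segment name is not in a caller-supplied set of live segments.
class OrphanFiles {
    // Assumption: these two files belong to the index as a whole.
    static final Set<String> INDEX_LEVEL =
            new HashSet<>(Arrays.asList("segments", "deletable"));

    // "_a.cfs" -> "_a"; a name without a dot is returned unchanged.
    static String segmentOf(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Returns, in input order, the files that belong to no live segment.
    static List<String> findOrphans(List<String> fileNames, Set<String> liveSegments) {
        List<String> orphans = new ArrayList<>();
        for (String name : fileNames) {
            if (INDEX_LEVEL.contains(name)) {
                continue; // index-level file, never an orphan
            }
            if (!liveSegments.contains(segmentOf(name))) {
                orphans.add(name);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Made-up directory listing and live-segment set, for illustration only.
        List<String> files = Arrays.asList(
                "segments", "deletable", "_a.cfs", "_b.cfs", "_c.fnm", "_c.f0");
        Set<String> live = new HashSet<>(Arrays.asList("_a"));
        for (String orphan : findOrphans(files, live)) {
            System.out.println("not referenced by the segments file: " + orphan);
        }
    }
}
```

Review the reported names by hand, and back up the directory, before removing anything.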


RE: CFS file and file formats

2004-12-23 Thread Steve Rajavuori
Doug wrote:
> The safest approach is to remove files not mentioned in the segments
> file and update the index with the missing documents.  How does your
> application recover if it crashes during an update?

There are around 20 million documents in the orphaned segments, so it would
take a very long time to update the index. Is there an unsafe way to edit
the segments file to add these back? It seems like the missing piece of
information I need to do this is the correct segment size -- where can I
find that?

My application doesn't really have any recovery method if it crashes. Can
you tell me what the proper error handling procedure is? If, in fact, these
segments were corrupted because the application crashed, what could I have
done programmatically to recover once that had happened?




Re: CFS file and file formats

2004-12-23 Thread Doug Cutting
Steve Rajavuori wrote:
> There are around 20 million documents in the orphaned segments, so it would
> take a very long time to update the index. Is there an unsafe way to edit
> the segments file to add these back? It seems like the missing piece of
> information I need to do this is the correct segment size -- where can I
> find that?

Do the CFS and non-CFS segment names correspond?  If so, then it
probably crashed after the segment was complete, but perhaps before it
was packed into a CFS file.  So I'd trust the non-CFS stuff first.  And
it's easy to see the size of a non-CFS segment: it's just the number of
bytes in each of the .f* files.

Doug
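
That hint can be sketched as a small check. It assumes the 1.4-era layout in which the numbered files (.f0, .f1, ...) hold field norms at one byte per document, so every such file of a complete segment should have the same length. Files such as .fnm, .fdt, .fdx and .frq also match .f* but are not norm files and are skipped. The class and method names are hypothetical, and a result of -1 means only that this check could not find a consistent size.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: derives a segment's document
// count from the lengths of its <segment>.f<digits> files.
class NormSizes {
    // Maps each regular file in dir to its length in bytes.
    static Map<String, Long> listLengths(File dir) {
        Map<String, Long> lengths = new TreeMap<>();
        File[] files = dir.listFiles();
        if (files != null) { // null if dir is missing or unreadable
            for (File f : files) {
                if (f.isFile()) {
                    lengths.put(f.getName(), f.length());
                }
            }
        }
        return lengths;
    }

    // Returns the common length of all <segment>.f<digits> files,
    // or -1 if there are none or their lengths disagree.
    static long segmentSize(Map<String, Long> fileLengths, String segment) {
        String prefix = segment + ".f";
        long size = -1;
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(prefix)) {
                continue;
            }
            String suffix = name.substring(prefix.length());
            if (!suffix.matches("\\d+")) {
                continue; // skips .fnm, .fdt, .fdx, .frq
            }
            long len = e.getValue();
            if (size == -1) {
                size = len;
            } else if (size != len) {
                return -1; // inconsistent lengths: possibly a partial write
            }
        }
        return size;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: NormSizes <indexDir> <segmentName>");
            return;
        }
        long size = segmentSize(listLengths(new File(args[0])), args[1]);
        System.out.println(size < 0 ? "no consistent size found" : "segment size: " + size);
    }
}
```

Disagreeing lengths are treated as a failure rather than averaged, since a mismatch is exactly what a partially written segment would look like.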


RE: CFS file and file formats

2004-12-22 Thread Steve Rajavuori
Thanks. I am trying to repair a corrupted 'segments' file. I am attempting
to manually edit the file to add some missing segment names, but I need to
add the correct segment size for each. Can anyone tell me how to determine
the correct segment size (number of documents in the segment) by looking in
either a .CFS file, or in the appropriate file in a non-compound segment?

Steve

-Original Message-
From: Bernhard Messer [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 22, 2004 4:13 PM
To: Lucene Users List
Subject: Re: CFS file?


Steve Rajavuori wrote:

> Can someone tell me the purpose of the .CFS files? The Index File Formats
> page does not mention this type of file.

uuuh, you're right, it is not documented at fileformats.html.
Since Lucene 1.4, the individual index files are by default stored
in a single compound file with the file extension .cfs. You
can switch that behaviour off by setting the public static member
IndexWriter.useCompoundFile to false.

Bernhard





Re: CFS file and file formats

2004-12-22 Thread Daniel Naber
On Wednesday 22 December 2004 23:41, Steve Rajavuori wrote:

> Thanks. I am trying to repair a corrupted 'segments' file.

Why are you sure it's corrupted? Are the *.cfs files and the other file
types mixed in one directory? Then that's the problem: if you have *.cfs,
segments, and deletable, nothing else should exist in that directory or
Lucene will get confused.

Regards
 Daniel

-- 
http://www.danielnaber.de
