RE: CFS file and file formats
I think there are several problems.

1) First of all, there are both CFS files and standard (non-compound) files in this directory, and all of them have recent update dates, so I assume they are all being used. My code never explicitly sets the compound file flag, so I don't know how this happened.

2) Is there a way to force all files into compound mode? For example, if I set the compound setting and then call optimize, will that recreate everything in the CFS format?

3) There are several other large .CFS files in this directory that I think have somehow become detached from the index. They have recent update dates -- however, the last time I ran optimize these were not touched, and they are not being updated now. I know these segments have valid data, because now when I search I am missing large chunks of data -- which I assume is in these detached segments. So my thought is to edit the 'segments' file to make Lucene recognize these again -- but I need to know the correct segment size in order to do this. So how do I determine what the correct segment size should be?

Steve

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 22, 2004 4:50 PM
To: Lucene Users List
Subject: Re: CFS file and file formats

On Wednesday 22 December 2004 23:41, Steve Rajavuori wrote:
> Thanks. I am trying to repair a corrupted 'segments' file.

Why are you sure it's corrupted? Are the *.cfs files and the other file types mixed in one directory? Then that's the problem: if you have *.cfs, segments, and deletable, nothing else should exist in that directory or Lucene will get confused.

Regards
Daniel

--
http://www.danielnaber.de

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: CFS file and file formats
Steve Rajavuori wrote:
> 1) First of all, there are both CFS files and standard (non-compound) files in this directory, and all of them have recent update dates, so I assume they are all being used. My code never explicitly sets the compound file flag, so I don't know how this happened.

This can happen if your application crashes while the index is being updated. In this case these files were never entered into the segments file and may be only partially written.

> 2) Is there a way to force all files into compound mode? For example, if I set the compound setting, then call optimize, will that recreate everything into the CFS format?

It should. Except that on Windows not all old CFS files will be deleted immediately; they may instead be listed in the 'deletable' file for a while.

> 3) There are several other large .CFS files in this directory that I think have somehow become detached from the index. They have recent update dates -- however, the last time I ran optimize these were not touched, and they are not being updated now. I know these segments have valid data, because now when I search I am missing large chunks of data -- which I assume is in these detached segments. So my thought is to edit the 'segments' file to make Lucene recognize these again -- but I need to know the correct segment size in order to do this. So how do I determine what the correct segment size should be?

These could also be the result of crashes, in which case they may be partially written. The safest approach is to remove files not mentioned in the segments file and update the index with the missing documents.

How does your application recover if it crashes during an update?

Doug
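As a rough illustration of the advice above to remove files not mentioned in the segments file, the following Python sketch only lists candidate files; it deletes nothing. It assumes that the set of segment names referenced by the segments file has already been obtained by some other means, and that every per-segment file is named `<segment>.<extension>` (for example `_a.cfs`). The function name `unreferenced_files` and the `referenced_segments` argument are hypothetical and are not part of Lucene.

```python
from pathlib import Path

# Index-level files that do not belong to any single segment.
INDEX_LEVEL_FILES = {"segments", "deletable"}


def unreferenced_files(index_dir, referenced_segments):
    """Return names of files whose segment prefix is not referenced.

    referenced_segments is an iterable of segment names (such as "_a")
    that the caller has read from the 'segments' file by other means.
    """
    referenced = set(referenced_segments)
    orphans = []
    for path in sorted(Path(index_dir).iterdir()):
        if not path.is_file() or path.name in INDEX_LEVEL_FILES:
            continue
        # Assumption: the segment name is everything before the first dot.
        segment = path.name.split(".", 1)[0]
        if segment not in referenced:
            orphans.append(path.name)
    return orphans
```

Reviewing the returned list by hand, and backing up the directory first, seems prudent before removing anything, since a wrong set of referenced names would flag live files.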
RE: CFS file and file formats
Doug wrote:
> The safest approach is to remove files not mentioned in the segments file and update the index with the missing documents. How does your application recover if it crashes during an update?

There are around 20 million documents in the orphaned segments, so it would take a very long time to update the index. Is there an unsafe way to edit the segments file to add these back? It seems like the missing piece of information I need to do this is the correct segment size -- where can I find that?

My application doesn't really have any recovery method if it crashes. Can you tell me what the proper error handling procedure is? If, in fact, these segments were corrupted because the application crashed, what could I have done programmatically to recover once that had happened?

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 23, 2004 1:34 PM
To: Lucene Users List
Subject: Re: CFS file and file formats
Re: CFS file and file formats
Steve Rajavuori wrote:
> There are around 20 million documents in the orphaned segments, so it would take a very long time to update the index. Is there an unsafe way to edit the segments file to add these back? It seems like the missing piece of information I need to do this is the correct segment size -- where can I find that?

Do the CFS and non-CFS segment names correspond? If so, then it probably crashed after the segment was complete, but perhaps before it was packed into a CFS file. So I'd trust the non-CFS stuff first. And it's easy to see the size of a non-CFS segment: it's just the number of bytes in each of the .f* files.

Doug
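The size check described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: that ".f*" refers to the numbered per-field files of a non-compound segment (for example `_a.f0`, `_a.f1`), that each such file holds one byte per document as in the index format of that era, and that files such as `.fdt` or `.fnm` should be excluded. The helper names `norm_file_sizes` and `segment_size` are hypothetical.

```python
import re
from pathlib import Path


def norm_file_sizes(index_dir, segment):
    """Return {file name: size in bytes} for the segment's .f<N> files."""
    # Match only numbered files such as "_a.f0"; skip .fdt, .fdx, .fnm, .frq.
    pattern = re.compile(r"^" + re.escape(segment) + r"\.f\d+$")
    return {
        p.name: p.stat().st_size
        for p in sorted(Path(index_dir).iterdir())
        if p.is_file() and pattern.match(p.name)
    }


def segment_size(index_dir, segment):
    """Return the byte size shared by all .f<N> files, or None.

    None means no such files were found or their sizes disagree, which
    may indicate a partially written segment.
    """
    sizes = set(norm_file_sizes(index_dir, segment).values())
    return sizes.pop() if len(sizes) == 1 else None
```

Treating disagreement between the files as "unknown" rather than picking one value is a deliberate choice here, because a segment left behind by a crash may have been only partially written.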
RE: CFS file and file formats
Thanks. I am trying to repair a corrupted 'segments' file. I am attempting to manually edit the file to add some missing segment names, but I need to add the correct segment size for each. Can anyone tell me how to determine the correct segment size (number of documents in the segment) by looking in either a .CFS file, or in the appropriate file in a non-compound segment?

Steve

-----Original Message-----
From: Bernhard Messer [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 22, 2004 4:13 PM
To: Lucene Users List
Subject: Re: CFS file?

Steve Rajavuori wrote:
> Can someone tell me the purpose of the .CFS files? The Index File Formats page does not mention this type of file.

Uh, you're right, it is not documented in fileformats.html. Since Lucene 1.4, the individual index files are stored by default within one single compound file, which has the file extension .cfs. You can switch that behaviour off by setting the public static member IndexWriter.useCompoundFile to false.

Bernhard
Re: CFS file and file formats
On Wednesday 22 December 2004 23:41, Steve Rajavuori wrote:
> Thanks. I am trying to repair a corrupted 'segments' file.

Why are you sure it's corrupted? Are the *.cfs files and the other file types mixed in one directory? Then that's the problem: if you have *.cfs, segments, and deletable, nothing else should exist in that directory or Lucene will get confused.

Regards
Daniel

--
http://www.danielnaber.de