RE: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
Thanks to Erick. Got it.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, September 13, 2016 11:52 PM
To: dev@lucene.apache.org
Subject: Re: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
bq: So can we say that the FileNotFoundException on DEL file was caused by merge operation of Lucene

Not quite. What we're saying is that unpredictable things _may_ happen if your disk is full. You should check that you have adequate free space for all operations to succeed. Lucene tries very hard to deal properly with disk-full situations, but there are always edge cases to consider. Its first priority is always to keep the current index intact, and it will not delete files just to allow a merge to succeed.

So if you have adequate disk space, then we need to start looking for other culprits, which are usually outside of Solr/Lucene. If you do not have adequate disk space, then you need to get more free disk and see whether the problem recurs.

By the way, make sure you've configured your logging such that your Solr logs (and especially the CONSOLE log) do not grow indefinitely. That's sometimes a reason that the disk fills up. Ditto for the Zookeeper snapshots.

Best,
Erick

On Tue, Sep 13, 2016 at 8:30 AM, wenxzhen wrote:
> Thanks to Shawn.
>
> So can we say that the FileNotFoundException on DEL file was caused by
> a merge operation of Lucene? Note that our application is running on old
> Lucene Core v3.6.2. Does the rule below from Core 5.3.2 apply to Core 3.6.2?
>
> Now we have 2 indexes of 68G and 76G respectively, but only 143G of disk
> space is left.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Tuesday, September 13, 2016 8:49 PM
> To: dev@lucene.apache.org
> Subject: Re: Index partition corrupted during a regular flush due to
> FileNotFoundException on DEL file
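Erick's rule of thumb above (keep at least as much free space on the volume as the index occupies) can be checked programmatically before heavy indexing or a commit. Below is a minimal JDK-only sketch, not a Lucene or Solr API; the class name, the recursive size walk, and the headroom factor are all illustrative assumptions.

```java
import java.io.File;

public class DiskHeadroomCheck {

    /** Recursively sum the sizes of all files under dir (0 if dir is missing). */
    static long indexSizeBytes(File dir) {
        long total = 0;
        File[] entries = dir.listFiles();
        if (entries == null) return 0;
        for (File f : entries) {
            total += f.isDirectory() ? indexSizeBytes(f) : f.length();
        }
        return total;
    }

    /** True if the volume holding indexDir has at least factor x the index size free. */
    static boolean hasHeadroom(File indexDir, double factor) {
        long indexSize = indexSizeBytes(indexDir);
        long free = indexDir.getUsableSpace();   // usable bytes on the partition
        return free >= (long) (indexSize * factor);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("index size: " + indexSizeBytes(dir)
                + " bytes, usable: " + dir.getUsableSpace()
                + " bytes, 1x headroom ok: " + hasHeadroom(dir, 1.0));
    }
}
```

A check like this could run before triggering an optimize, or periodically alongside the log-rotation monitoring Erick mentions.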
RE: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
Thanks to Shawn.

So can we say that the FileNotFoundException on DEL file was caused by a merge operation of Lucene? Note that our application is running on old Lucene Core v3.6.2. Does the rule below from Core 5.3.2 apply to Core 3.6.2?

Now we have 2 indexes of 68G and 76G respectively, but only 143G of disk space is left.

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Tuesday, September 13, 2016 8:49 PM
To: dev@lucene.apache.org
Subject: Re: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
Re: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
On 9/12/2016 7:07 PM, 郑文兴 wrote:
> So you mean if there is no more than 10G free space, Lucene/Solr will
> delete some files to save the disk space? Or will it cause Lucene/Solr
> to misbehave?

If you do not have enough free disk space, and a segment merge takes place that requires more free space than you have, then the merge will fail. Merges are going to happen if you are adding/updating documents in your index.

The Solr operation called "optimize" is known as "forceMerge" in Lucene. It forces a merge of the entire index down to (usually) one segment. Here's some more information:

https://lucene.apache.org/core/5_3_2/core/org/apache/lucene/index/IndexWriter.html#forceMerge(int)

Lucene will not delete files to free up space. The only way that a Lucene index ever gets smaller is through segment merging, which will temporarily *increase* the space used until the merge is complete. Any deleted documents contained in the merging segments will be removed when the merge completes.

> Please note that we have several shards/partitions under the same root
> directory, so which of the following is true for us? Let's assume we
> have 2 partitions, A -> 10G and B -> 10G:
>
> - Do we have to make sure there are at least 20G of disk space available?
>
> - Or do we just need to make sure there are at least 10G of disk space
>   available?

The extra space required for merging is detailed in the link above. At worst your index size might temporarily increase by a factor of four. Usually a full optimize (forceMerge) on an index that's not in the compound file format will only *double* the index size while it's running, but in some situations it might require more.

My own recommendation: if you only have one large index on a server, you should have enough space available for that index to triple in size. If you have many large indexes, having enough space for *all* your indexes to double in size at the same time is probably sufficient, but if you can arrange for more free space, that would be advisable.

Thanks,
Shawn
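Shawn's sizing guidance above translates into simple arithmetic over the indexes sharing a volume. Here is a rough JDK-only sketch of that arithmetic; the 2x "all indexes double at once" factor comes from his message, while the class and method names are made up for illustration. Applied to the 68G and 76G indexes from this thread, doubling both would need 144G of headroom, slightly more than the 143G reported free.

```java
public class MergeHeadroom {

    /**
     * Free bytes needed so every index can grow to factor x its current
     * size during concurrent merges (extra space = size * (factor - 1)).
     */
    static long requiredFreeBytes(long[] indexSizes, double factor) {
        long needed = 0;
        for (long size : indexSizes) {
            needed += (long) (size * (factor - 1.0));
        }
        return needed;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long[] indexes = { 68 * gib, 76 * gib };   // index sizes from the thread
        long freeDisk = 143 * gib;                 // free space from the thread
        long needDouble = requiredFreeBytes(indexes, 2.0);  // 144 GiB extra
        System.out.println("need (2x all): " + needDouble / gib
                + " GiB, have: " + freeDisk / gib
                + " GiB, ok: " + (freeDisk >= needDouble));
    }
}
```

This also answers the 20G-vs-10G question in the quoted message: under the "all indexes may merge at once" assumption, two 10G partitions on one volume call for roughly 20G of free space.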
RE: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
BTW: From the log files, I can't see other abnormal logs related to the exception, but I found this first exception:

[2016-09-12 16:08:47,628][ERROR][qtp2107666786-40502][indexEngine ] index [so_blog] commit ERROR:java.io.FileNotFoundException: _p5tr_328.del
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:253)
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:240)
java.util.TimSort.binarySort(TimSort.java:265)
java.util.TimSort.sort(TimSort.java:208)
java.util.TimSort.sort(TimSort.java:173)
java.util.Arrays.sort(Arrays.java:659)
java.util.Collections.sort(Collections.java:217)
org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:279)
org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2775)
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2746)
org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2741)
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3402)
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3485)
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3467)
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3451)
org.apache.lucene.index.IndexEngine.flush(IndexEngine.java:409)

Thanks,
Wenxing

From: 郑文兴 [mailto:zhen...@csdn.net]
Sent: Tuesday, September 13, 2016 9:07 AM
To: dev@lucene.apache.org
Subject: RE: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
RE: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
Thanks to Erick. I will check the disk space first.

So you mean if there is no more than 10G free space, Lucene/Solr will delete some files to save the disk space? Or will it cause Lucene/Solr to misbehave?

Please note that we have several shards/partitions under the same root directory, so which of the following is true for us? Let's assume we have 2 partitions, A -> 10G and B -> 10G:

- Do we have to make sure there are at least 20G of disk space available?

- Or do we just need to make sure there are at least 10G of disk space available?

Best,
Wenxing

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, September 12, 2016 10:59 PM
To: dev@lucene.apache.org
Subject: Re: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
Re: Index partition corrupted during a regular flush due to FileNotFoundException on DEL file
The del file should be present for each segment, assuming it has any documents that have been updated or deleted. Of course, if some process external to Solr removed it, you'd get this error.

A less common reason is that your disk is full. Solr/Lucene requires that you have at least as much free space on your disk as the index occupies. Thus if you have 10G of total disk space used up by your index, you must have at least 10G of free space. Is it possible that you're running without enough disk space?

If anything like that is the case, you should see errors in your Solr logs, assuming they haven't been rolled over. Is there anything suspicious there? Look for ERROR (all caps) and/or "Caused by" as a start.

Best,
Erick

On Mon, Sep 12, 2016 at 3:31 AM, 郑文兴 wrote:
> Dear all,
>
> Today we found one of our index partitions was corrupted during the regular
> flush, due to a FileNotFoundException on a del file. The following is the
> call stack from the exception:
>
> [2016-09-12 16:40:01,801][ERROR][qtp2107666786-40854][indexEngine ] index
> [so_blog] commit ERROR:_oxep_7fa.del
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:284)
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:303)
> org.apache.lucene.index.TieredMergePolicy.size(TieredMergePolicy.java:635)
> org.apache.lucene.index.TieredMergePolicy.useCompoundFile(TieredMergePolicy.java:611)
> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:593)
> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
> org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3376)
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3485)
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3467)
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3451)
> org.apache.lucene.index.IndexEngine.flush(IndexEngine.java:409)
>
> My questions are:
>
> - Does anyone know what the situation is here? On the file system, I can't
>   find _oxep_7fa.del.
>
> - What is the life cycle of the del file?
>
> Note: The Lucene Core is version 3.6.2.
>
> I'd appreciate your kind advice.
>
> Best Regards,
> Wenxing
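When a per-segment file such as _oxep_7fa.del goes missing, one first diagnostic step is to list which files each segment still has on disk. Below is a JDK-only sketch that groups index file names by their leading segment token (Lucene 3.x names deletes files "_segname_gen.del"); the class and helper names are illustrative, and this is not a substitute for Lucene's own CheckIndex tool.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class SegmentFileLister {

    /** Group index file names by segment token: "_p5tr_328.del" and "_p5tr.fdt" -> "_p5tr". */
    static Map<String, TreeSet<String>> groupNames(String[] names) {
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        for (String name : names) {
            if (!name.startsWith("_")) continue;   // skip segments_N, write.lock, ...
            int cut = name.indexOf('_', 1);        // end of "_segname" in "_segname_gen.del"
            if (cut < 0) cut = name.indexOf('.');  // plain "_segname.ext" files
            String seg = cut > 0 ? name.substring(0, cut) : name;
            groups.computeIfAbsent(seg, k -> new TreeSet<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list();
        if (names == null) {
            System.err.println("not a directory: " + dir);
            return;
        }
        groupNames(names).forEach((seg, files) ->
                System.out.println(seg + " -> " + files));
    }
}
```

Running this over the index directory shows at a glance whether a segment mentioned in the exception still has any files at all, which helps distinguish an externally deleted file from a merge that was interrupted mid-way.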