Hi,

To find all errors in an index, you should pass -ea on the java command line to enable assertions.

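A minimal sketch of such an invocation, written in Python because the application in this thread is PyLucene-based. The helper names build_checkindex_command and run_checkindex are hypothetical. The jar names, the CheckIndex class and the -exorcise flag are taken from the command quoted further down in the thread, and the index path is the one from the stack trace. Note that the JVM classpath separator is ";" on Windows and ":" on Linux, so the sketch uses os.pathsep rather than a hard-coded ":".

```python
import os
import subprocess


def build_checkindex_command(index_dir, jars, exorcise=False):
    """Return the argv list for running Lucene's CheckIndex with assertions enabled."""
    # os.pathsep is ';' on Windows and ':' elsewhere, which is what java -cp expects.
    classpath = os.pathsep.join(jars)
    cmd = ["java", "-ea", "-cp", classpath,
           "org.apache.lucene.index.CheckIndex", index_dir]
    if exorcise:
        # -exorcise rewrites the commit without the problematic segments,
        # so their documents are lost. Run it on a copy of the index only.
        cmd.append("-exorcise")
    return cmd


def run_checkindex(index_dir, jars, exorcise=False):
    """Run CheckIndex and return its exit code (requires java on the PATH)."""
    return subprocess.run(build_checkindex_command(index_dir, jars, exorcise)).returncode


# Jar names as quoted in the thread; adjust to where the jars actually live.
jars = ["lucene-core-6.5.0.jar", "lucene-backward-codecs-6.5.0.jar"]
cmd = build_checkindex_command(r"D:\i\202204", jars)
print(cmd)
```

Running the check first without -exorcise (the default here) only reports problems and leaves the index files untouched.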
Uwe

On 5 May 2022 14:25:03 UTC, Michael McCandless <luc...@mikemccandless.com> wrote:
>Hi Antony,
>
>Sorry for the late reply.
>
>Indeed the file _14gb.si is missing, yet _14gb.cfs is present (interesting
>-- must have failed deletion because an IndexReader has it open). And yet
>when you run CheckIndex on this directory (without -exorcise), the index is
>fine? No errors reported? Can you post the full CheckIndex output?
>
>There are two segments_N files present, which is interesting. Are
>you using the default IndexDeletionPolicy (which deletes the old segments_N
>file as soon as the new segments_N+1 is done being committed)?
>
>Do you open near-real-time readers (passing IndexWriter to
>DirectoryReader.open)? Or filesystem based readers only (passing Directory
>to DirectoryReader.open)?
>
>How do you reopen/refresh those IndexReaders? Is it "every N seconds"? Or
>is it timed to after the IndexWriter.commit() has finished? How often are
>you calling IndexWriter.commit()?
>
>6.5.0 is quite old by now, and I poked around in our issue history
><https://jirasearch.mikemccandless.com/search.py?index=jira> to see if this
>might be a known issue. The only interesting issue I found was LUCENE-6835
><https://issues.apache.org/jira/browse/LUCENE-6835> which shifted
>responsibility of retrying file deletions down into Directory (instead of
>IndexWriter), but that landed in 6.0 and hopefully any bugs were ironed out
>by 6.5.0.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>
>On Wed, May 4, 2022 at 3:44 PM Antony Joseph <antony.dev.webm...@gmail.com>
>wrote:
>
>> Hi Michael,
>>
>> Any update?
>>
>> Regards,
>> Antony
>>
>> On Sun, 1 May 2022 at 19:35, Antony Joseph <antony.dev.webm...@gmail.com>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> Thank you for your reply. Please find responses to your questions below.
>>>
>>> Regards,
>>> Antony
>>>
>>> On Sat, 30 Apr 2022 at 18:59, Michael McCandless
>>> <luc...@mikemccandless.com> wrote:
>>>
>>>> Hi Antony,
>>>>
>>>> Hmm it looks like the root cause is this:
>>>>
>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>
>>>> Can you list all the files in the index directory at the time this
>>>> exception happens, and reply here? We need to figure out whether the file
>>>> is really missing or what.
>>>>
>>> Below the index directory file listing. Yes, file is missing
>>> (D:\i\202204\_14gb.si)
>>>
>>>>
>>>> Do you run any virus scanner / disk file tree utilities / etc.? In the
>>>> distant past sometimes such programs might cause strange transient errors
>>>> if they open a file for read exclusively or so, on windows.
>>>>
>>> There is no virus scanner running.
>>>
>>>>
>>>> What is the actual drive you are storing the index on (D:)? Is it a
>>>> local disk or remote SMBFS mount?
>>>>
>>> It's a local disk (D:).
>>>
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Sat, Apr 30, 2022 at 8:39 AM Antony Joseph
>>>> <antony.dev.webm...@gmail.com> wrote:
>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> *The full stack trace is included:*
>>>>>
>>>>> <super: <class 'JavaError'>, <JavaError object>>
>>>>> Java stacktrace:
>>>>> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index.
>>>>> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
>>>>>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>>>>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
>>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>>     at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
>>>>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
>>>>>     at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>>>>>     at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>>>>>     ... 2 more
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "index.py", line 112, in start
>>>>>     writer = IndexWriter(index_directory, iconfig)
>>>>> lucene.JavaError: <super: <class 'JavaError'>, <JavaError object>>
>>>>> Java stacktrace:
>>>>> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index.
>>>>> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
>>>>>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>>>>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
>>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>>     at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>     at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
>>>>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>     at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>     at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
>>>>>     at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>>>>>     at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>>>>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>>>>>     ... 2 more
>>>>>
>>>>>
>>>>> Regards,
>>>>> Antony
>>>>>
>>>>> On Sat, 30 Apr 2022 at 10:59, Robert Muir <rcm...@gmail.com> wrote:
>>>>>
>>>>> > The most helpful thing would be the full stacktrace of the exception.
>>>>> > This exception should be chaining the original exception and call
>>>>> > site, and maybe tell us more about this error you hit.
>>>>> >
>>>>> > To me, it looks like a windows-specific issue where the filesystem is
>>>>> > returning an unexpected error. So it would be helpful to see exactly
>>>>> > which one that is, and the full trace of where it comes from, to chase
>>>>> > it further
>>>>> >
>>>>> > On Thu, Apr 28, 2022 at 12:10 PM Antony Joseph
>>>>> > <antony.dev.webm...@gmail.com> wrote:
>>>>> > >
>>>>> > > Thank you for your reply.
>>>>> > >
>>>>> > > This isn't happening in a single environment. Our application is being used
>>>>> > > by various clients and this has been reported by multiple users - all of
>>>>> > > whom were running the earlier pylucene (v4.10) - without issues.
>>>>> > >
>>>>> > > One thing to mention is that our earlier version used Python 2.7.15 (with
>>>>> > > pylucene 4.10) and now we are using Python 3.8.10 with Pylucene 6.5.0 - the
>>>>> > > indexing logic is the same...
>>>>> > >
>>>>> > > One other thing to note is that the issue described has (so far!) only
>>>>> > > occurred on MS Windows - none of our Linux customers have complained about
>>>>> > > this.
>>>>> > >
>>>>> > > Any ideas?
>>>>> > >
>>>>> > > Regards,
>>>>> > > Antony
>>>>> > >
>>>>> > > On Thu, 28 Apr 2022 at 17:00, Adrien Grand <jpou...@gmail.com> wrote:
>>>>> > >
>>>>> > > > Hi Antony,
>>>>> > > >
>>>>> > > > This isn't something that you should try to fix programmatically,
>>>>> > > > corruptions indicate that something is wrong with the environment,
>>>>> > > > like a broken disk or corrupt RAM. I would suggest running a memtest
>>>>> > > > to check your RAM and looking at system logs in case they have
>>>>> > > > anything to tell about your disks.
>>>>> > > >
>>>>> > > > Can you also share the full stack trace of the exception?
>>>>> > > >
>>>>> > > > On Thu, Apr 28, 2022 at 10:26 AM Antony Joseph
>>>>> > > > <antony.dev.webm...@gmail.com> wrote:
>>>>> > > > >
>>>>> > > > > Hello,
>>>>> > > > >
>>>>> > > > > We are facing a strange situation in our application as described below:
>>>>> > > > >
>>>>> > > > > *Using*:
>>>>> > > > >
>>>>> > > > > - Python 3.8.10
>>>>> > > > > - Pylucene 6.5.0
>>>>> > > > > - Java 8 (1.8.0_181)
>>>>> > > > > - Runs on Linux and Windows (error seen on Windows)
>>>>> > > > >
>>>>> > > > > We suddenly get the following *error*:
>>>>> > > > >
>>>>> > > > > 2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index
>>>>> > > > > (D:\i\202202) writer, Exception:
>>>>> > > > > org.apache.lucene.index.CorruptIndexException: Unexpected file read error
>>>>> > > > > while reading index.
>>>>> > > > > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > After this, no further indexing happens - trying to open the index for
>>>>> > > > > writing throws the above error - and the index writer does not open.
>>>>> > > > >
>>>>> > > > > FYI, our code contains the following *settings*:
>>>>> > > > >
>>>>> > > > > index_path = "D:\i\202202"
>>>>> > > > > index_directory = FSDirectory.open(Paths.get(index_path))
>>>>> > > > > iconfig = IndexWriterConfig(wrapper_analyzer)
>>>>> > > > > iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
>>>>> > > > > iconfig.setRAMBufferSizeMB(16.0)
>>>>> > > > > writer = IndexWriter(index_directory, iconfig)
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Repairing*
>>>>> > > > > We tried 'repairing' the index with the following command / tool:
>>>>> > > > >
>>>>> > > > > java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar
>>>>> > > > > org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise
>>>>> > > > >
>>>>> > > > > This however returns saying "No problems found with the index."
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Work around*
>>>>> > > > > We have to manually delete the problematic segment file:
>>>>> > > > > D:\i\202202\segments_fo
>>>>> > > > > after which the application starts again... until the next corruption. We
>>>>> > > > > can't spot a specific pattern.
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Two questions:*
>>>>> > > > >
>>>>> > > > > 1. Can we handle this situation programmatically, so that no manual
>>>>> > > > >    intervention is needed?
>>>>> > > > > 2. Any reason why we are facing the corruption issue in the first place?
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > Before this we were using Pylucene 4.10 and we didn't face this problem -
>>>>> > > > > the application logic is the same.
>>>>> > > > >
>>>>> > > > > Also, while the application runs on both Linux and Windows, so far we have
>>>>> > > > > observed this situation only on various Windows platforms.
>>>>> > > > >
>>>>> > > > > Would really appreciate some assistance. Thanks in advance.
>>>>> > > > >
>>>>> > > > > Regards,
>>>>> > > > > Antony
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > --
>>>>> > > > Adrien
>>>>> > > >
>>>>> > > > ---------------------------------------------------------------------
>>>>> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>> > > >
>>>>> >
>>>>>
>>>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
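
For the directory listing Mike asks for in the thread above, a small Python sketch that flags the pattern he describes: a segment whose data files (such as _14gb.cfs) are present while its .si file is missing, plus any extra segments_N commit files. The function names and the sample listing are hypothetical (the real listing was not included in the thread; segments_10fi and the _14gc files are made up for illustration). The only assumption about Lucene is its file naming: per-segment files start with "_<name>" followed by "." or "_".

```python
import re
from collections import defaultdict

# Per-segment files look like _14gb.si, _14gb.cfs or _14gb_1.liv:
# an underscore, a base-36 segment name, then "." or "_".
_SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def segments_missing_si(file_names):
    """Return segment names that have files in the listing but no .si file."""
    by_segment = defaultdict(set)
    for name in file_names:
        match = _SEGMENT_FILE.match(name)
        if match:
            by_segment[match.group(1)].add(name)
    return sorted(seg for seg, files in by_segment.items()
                  if seg + ".si" not in files)


def commit_files(file_names):
    """Return the segments_N commit files found in the listing."""
    return sorted(n for n in file_names if n.startswith("segments_"))


# Hypothetical listing, shaped like the situation described in the thread.
listing = ["_14gb.cfs", "_14gb.cfe", "_14gc.si", "_14gc.cfs",
           "segments_10fi", "segments_10fj", "write.lock"]
print(segments_missing_si(listing))  # ['_14gb']
print(commit_files(listing))         # ['segments_10fi', 'segments_10fj']
```

On a real index the listing would come from something like os.listdir(r"D:\i\202204"), taken at the moment the exception occurs.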