Hi,

To find all errors in an index, you should pass -ea to the java command line to 
enable assertions.
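
As a rough sketch of what such an invocation could look like, here it is in Python, since the application is PyLucene based. The helper name build_checkindex_command is hypothetical. The jar names and index path are the ones quoted further down in this thread. The class path separator is taken from os.pathsep (';' on Windows, ':' on Linux) unless given explicitly.

```python
import os
from typing import List, Optional


def build_checkindex_command(
    index_dir: str,
    jars: List[str],
    java: str = "java",
    pathsep: Optional[str] = None,
) -> List[str]:
    """Build a CheckIndex command line with JVM assertions enabled.

    Hypothetical helper for illustration only; it builds the argument
    list and does not start a JVM itself.
    """
    # Use the platform's class path separator unless one is given.
    sep = os.pathsep if pathsep is None else pathsep
    return [
        java,
        "-ea",  # enable assertions; JVM options go before the main class
        "-cp",
        sep.join(jars),
        "org.apache.lucene.index.CheckIndex",
        index_dir,  # no -exorcise here: this only checks, it does not modify
    ]


if __name__ == "__main__":
    cmd = build_checkindex_command(
        r"D:\i\202204",
        ["lucene-core-6.5.0.jar", "lucene-backward-codecs-6.5.0.jar"],
    )
    print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd) to run the check and capture the full CheckIndex output for posting here.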

Uwe

On 5 May 2022 14:25:03 UTC, Michael McCandless
<luc...@mikemccandless.com> wrote:
>Hi Antony,
>
>Sorry for the late reply.
>
>Indeed the file _14gb.si is missing, yet _14gb.cfs is present (interesting
>-- must have failed deletion because an IndexReader has it open).  And yet
>when you run CheckIndex on this directory (without -exorcise), the index is
>fine?  No errors reported?  Can you post the full CheckIndex output?
>
>There are two segments_N files present, which is interesting.  Are
>you using the default IndexDeletionPolicy (which deletes the old segments_N
>file as soon as the new segments_N+1 is done being committed)?
>
>Do you open near-real-time readers (passing IndexWriter to
>DirectoryReader.open)?  Or filesystem based readers only (passing Directory
>to DirectoryReader.open)?
>
>How do you reopen/refresh those IndexReaders?  Is it "every N seconds"?  Or
>is it timed to after the IndexWriter.commit() has finished?  How often are
>you calling IndexWriter.commit()?
>
>6.5.0 is quite old by now, and I poked around in our issue history
><https://jirasearch.mikemccandless.com/search.py?index=jira> to see if this
>might be a known issue.  The only interesting issue I found was LUCENE-6835
><https://issues.apache.org/jira/browse/LUCENE-6835> which shifted
>responsibility of retrying file deletions down into Directory (instead of
>IndexWriter), but that landed in 6.0 and hopefully any bugs were ironed out
>by 6.5.0.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>
>On Wed, May 4, 2022 at 3:44 PM Antony Joseph <antony.dev.webm...@gmail.com>
>wrote:
>
>> Hi Michael,
>>
>> Any update?
>>
>> Regards,
>> Antony
>>
>> On Sun, 1 May 2022 at 19:35, Antony Joseph <antony.dev.webm...@gmail.com>
>> wrote:
>>
>>> Hi Michael,
>>>
>>> Thank you for your reply. Please find responses to your questions below.
>>>
>>> Regards,
>>> Antony
>>>
>>> On Sat, 30 Apr 2022 at 18:59, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>> Hi Antony,
>>>>
>>>> Hmm it looks like the root cause is this:
>>>>
>>>>       Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>
>>>> Can you list all the files in the index directory at the time this
>>>> exception happens, and reply here?  We need to figure out whether the file
>>>> is really missing or what.
>>>>
>>> Below is the index directory file listing. Yes, the file is missing
>>> (D:\i\202204\_14gb.si).
>>>
>>>>
>>>> Do you run any virus scanner / disk file tree utilities / etc.?  In the
>>>> distant past sometimes such programs might cause strange transient errors
>>>> if they open a file for read exclusively or so, on windows.
>>>>
>>> There is no virus scanner running.
>>>
>>>>
>>>> What is the actual drive you are storing the index on (D:)?  Is it a
>>>> local disk or remote SMBFS mount?
>>>>
>>> It's a local disk (D:).
>>>
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Sat, Apr 30, 2022 at 8:39 AM Antony Joseph <
>>>> antony.dev.webm...@gmail.com> wrote:
>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> *The full stack trace is included:*
>>>>>
>>>>> <super: <class 'JavaError'>, <JavaError object>>
>>>>>     Java stacktrace:
>>>>> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
>>>>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
>>>>>         at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>>>>>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
>>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>>         at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>>>>>         at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>         at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>         at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
>>>>>         at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>         at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>         at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
>>>>>         at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>>>>>         at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
>>>>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>>>>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>>>>>         ... 2 more
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "index.py", line 112, in start
>>>>>     writer = IndexWriter(index_directory, iconfig)
>>>>> lucene.JavaError: <super: <class 'JavaError'>, <JavaError object>>
>>>>>     Java stacktrace:
>>>>> org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202204\segments_10fj")))
>>>>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
>>>>>         at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
>>>>>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:972)
>>>>> Caused by: java.nio.file.NoSuchFileException: D:\i\202204\_14gb.si
>>>>>         at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>>>>>         at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>         at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>>>>>         at sun.nio.fs.WindowsFileSystemProvider.newFileChannel(Unknown Source)
>>>>>         at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>         at java.nio.channels.FileChannel.open(Unknown Source)
>>>>>         at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
>>>>>         at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
>>>>>         at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:89)
>>>>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>>>>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>>>>>         ... 2 more
>>>>>
>>>>>
>>>>> Regards,
>>>>> Antony
>>>>>
>>>>> On Sat, 30 Apr 2022 at 10:59, Robert Muir <rcm...@gmail.com> wrote:
>>>>>
>>>>> > The most helpful thing would be the full stacktrace of the exception.
>>>>> > This exception should be chaining the original exception and call
>>>>> > site, and maybe tell us more about this error you hit.
>>>>> >
>>>>> > To me, it looks like a windows-specific issue where the filesystem is
>>>>> > returning an unexpected error. So it would be helpful to see exactly
>>>>> > which one that is, and the full trace of where it comes from, to chase
>>>>> > it further
>>>>> >
>>>>> > On Thu, Apr 28, 2022 at 12:10 PM Antony Joseph
>>>>> > <antony.dev.webm...@gmail.com> wrote:
>>>>> > >
>>>>> > > Thank you for your reply.
>>>>> > >
>>>>> > > This isn't happening in a single environment. Our application is being used
>>>>> > > by various clients and this has been reported by multiple users - all of
>>>>> > > whom were running the earlier pylucene (v4.10) - without issues.
>>>>> > >
>>>>> > > One thing to mention is that our earlier version used Python 2.7.15 (with
>>>>> > > pylucene 4.10) and now we are using Python 3.8.10 with Pylucene 6.5.0 - the
>>>>> > > indexing logic is the same...
>>>>> > >
>>>>> > > One other thing to note is that the issue described has (so far!) only
>>>>> > > occurred on MS Windows - none of our Linux customers have complained about
>>>>> > > this.
>>>>> > >
>>>>> > > Any ideas?
>>>>> > >
>>>>> > > Regards,
>>>>> > > Antony
>>>>> > >
>>>>> > > On Thu, 28 Apr 2022 at 17:00, Adrien Grand <jpou...@gmail.com> wrote:
>>>>> > >
>>>>> > > > Hi Anthony,
>>>>> > > >
>>>>> > > > This isn't something that you should try to fix programmatically,
>>>>> > > > corruptions indicate that something is wrong with the environment,
>>>>> > > > like a broken disk or corrupt RAM. I would suggest running a memtest
>>>>> > > > to check your RAM and looking at system logs in case they have
>>>>> > > > anything to tell about your disks.
>>>>> > > >
>>>>> > > > Can you also share the full stack trace of the exception?
>>>>> > > >
>>>>> > > > On Thu, Apr 28, 2022 at 10:26 AM Antony Joseph
>>>>> > > > <antony.dev.webm...@gmail.com> wrote:
>>>>> > > > >
>>>>> > > > > Hello,
>>>>> > > > >
>>>>> > > > > We are facing a strange situation in our application as described below:
>>>>> > > > >
>>>>> > > > > *Using*:
>>>>> > > > >
>>>>> > > > >    - Python 3.8.10
>>>>> > > > >    - Pylucene 6.5.0
>>>>> > > > >    - Java 8 (1.8.0_181)
>>>>> > > > >    - Runs on Linux and Windows (error seen on Windows)
>>>>> > > > >
>>>>> > > > > We suddenly get the following *error*:
>>>>> > > > >
>>>>> > > > > 2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index
>>>>> > > > > (D:\i\202202) writer, Exception:
>>>>> > > > > org.apache.lucene.index.CorruptIndexException: Unexpected file read error
>>>>> > > > > while reading index.
>>>>> > > > > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > After this, no further indexing happens - trying to open the index for
>>>>> > > > > writing throws the above error - and the index writer does not open.
>>>>> > > > >
>>>>> > > > > FYI, our code contains the following *settings*:
>>>>> > > > >
>>>>> > > > > index_path = r"D:\i\202202"
>>>>> > > > > index_directory = FSDirectory.open(Paths.get(index_path))
>>>>> > > > > iconfig = IndexWriterConfig(wrapper_analyzer)
>>>>> > > > > iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
>>>>> > > > > iconfig.setRAMBufferSizeMB(16.0)
>>>>> > > > > writer = IndexWriter(index_directory, iconfig)
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Repairing*
>>>>> > > > > We tried 'repairing' the index with the following command / tool:
>>>>> > > > >
>>>>> > > > > java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar
>>>>> > > > > org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise
>>>>> > > > >
>>>>> > > > > This however returns saying "No problems found with the index."
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Work around*
>>>>> > > > > We have to manually delete the problematic segment file:
>>>>> > > > > D:\i\202202\segments_fo
>>>>> > > > > after which the application starts again... until the next corruption. We
>>>>> > > > > can't spot a specific pattern.
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > *Two questions:*
>>>>> > > > >
>>>>> > > > >    1. Can we handle this situation programmatically, so that no manual
>>>>> > > > >    intervention is needed?
>>>>> > > > >    2. Any reason why we are facing the corruption issue in the first
>>>>> > > > >    place?
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > Before this we were using Pylucene 4.10 and we didn't face this problem -
>>>>> > > > > the application logic is the same.
>>>>> > > > >
>>>>> > > > > Also, while the application runs on both Linux and Windows, so far we have
>>>>> > > > > observed this situation only on various Windows platforms.
>>>>> > > > >
>>>>> > > > > Would really appreciate some assistance. Thanks in advance.
>>>>> > > > >
>>>>> > > > > Regards,
>>>>> > > > > Antony
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > --
>>>>> > > > Adrien
>>>>> > > >
>>>>> > > >
>>>>> > > > ---------------------------------------------------------------------
>>>>> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>> > > >
>>>>> > > >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
