Re: Lucene (unexpected) fsync on existing segments

Uwe Schindler Mon, 15 Mar 2021 12:50:10 -0700

Correction: the Windows limitation only applies up to Windows Server 2012 /
Windows 8. So nowadays you can easily memory-map terabytes of data.


Uwe

On March 15, 2021 7:42:26 PM UTC, Uwe Schindler <[email protected]> wrote:
>Hi Mike,
>
>Windows unfortunately has a crazy limitation on address space: the
>number of address bits is limited to 43, see my blog post at
>https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
>That's 8 terabytes.
>
>On Linux this limit is 47 bits, and with later kernels and
>hardware it's even larger, so the universe fits into it. 😜
>
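The address-bit figures above translate into sizes by simple powers of two. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and only serve the illustration.

```java
// Illustrative arithmetic only; AddressSpace and its methods are hypothetical names.
final class AddressSpace {
    /** Number of bytes addressable with the given number of address bits. */
    static long bytesFor(int addressBits) {
        return 1L << addressBits;
    }

    /** The same quantity expressed in tebibytes (1 TiB = 2^40 bytes). */
    static long tebibytesFor(int addressBits) {
        return bytesFor(addressBits) >> 40;
    }

    public static void main(String[] args) {
        System.out.println("43 bits: " + tebibytesFor(43) + " TiB"); // 8 TiB
        System.out.println("47 bits: " + tebibytesFor(47) + " TiB"); // 128 TiB
    }
}
```

So the 43-bit limit gives the 8 terabytes mentioned above, and 47 bits gives 128.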
>Uwe
>
>On March 15, 2021 7:15:11 PM UTC, Michael McCandless <[email protected]> wrote:
>>Thanks Rahul.
>>
>>> primary reason being that memory mapping multi-terabyte indexes is
>>> not feasible through mmap
>>
>>Hmm, that is interesting -- are you using a 64-bit JVM?  If so, what
>>goes wrong with such large maps?  Lucene's MMapDirectory should chunk
>>the mapping to deal with ByteBuffer's int-only address space.
>>
>>SimpleFSDirectory usually has substantially worse performance than
>>MMapDirectory.
>>
>>Still, I suspect you would hit the same issue if you used other
>>FSDirectory implementations -- the fsync behavior should be the same.
>>
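To illustrate the chunking idea mentioned above: a ByteBuffer is indexed by int, so a file larger than 2 GiB has to be covered by several buffers, and a long file position is split into a chunk index plus an offset inside that chunk. The sketch below shows only that address arithmetic. It is a hypothetical standalone class, not Lucene's actual implementation, and the 1 GiB chunk size is an assumption chosen for the example.

```java
// Hypothetical sketch of chunked addressing; not Lucene's actual classes.
final class ChunkedAddressing {
    private final int chunkSizePower;  // each chunk holds 2^chunkSizePower bytes
    private final long chunkSizeMask;  // low bits that select the offset inside a chunk

    ChunkedAddressing(int chunkSizePower) {
        if (chunkSizePower < 1 || chunkSizePower > 30) {
            throw new IllegalArgumentException("chunkSizePower must be in [1, 30]");
        }
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
    }

    /** Number of chunks needed to cover a file of the given length (ceiling division). */
    long chunkCount(long fileLength) {
        return (fileLength + chunkSizeMask) >>> chunkSizePower;
    }

    /** Index of the chunk (that is, which buffer) holding the absolute position. */
    int chunkIndex(long position) {
        return (int) (position >>> chunkSizePower);
    }

    /** Offset inside that chunk; always fits in an int. */
    int chunkOffset(long position) {
        return (int) (position & chunkSizeMask);
    }

    public static void main(String[] args) {
        ChunkedAddressing addr = new ChunkedAddressing(30); // assumed 1 GiB chunks
        long threeTiB = 3L << 40;
        System.out.println(addr.chunkCount(threeTiB));      // 3072
        System.out.println(addr.chunkIndex(threeTiB - 1));  // 3071
        System.out.println(addr.chunkOffset(threeTiB - 1)); // 1073741823
    }
}
```

With this scheme a 3 TiB file needs 3072 mappings of 1 GiB each, so the int-only buffer index is not what limits the total mapped size.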
>>Mike McCandless
>>
>>http://blog.mikemccandless.com
>>
>>
>>On Fri, Mar 12, 2021 at 1:46 PM Rahul Goswami <[email protected]> wrote:
>>
>>> Thanks Michael. For your question...yes I am running Solr on Windows
>>> and running it with SimpleFSDirectoryFactory (primary reason being that
>>> memory mapping multi-terabyte indexes is not feasible through mmap). I
>>> will create a Jira later today with the details in this thread and
>>> assign it to myself. Will take a shot at the fix.
>>>
>>> Thanks,
>>> Rahul
>>>
>>> On Fri, Mar 12, 2021 at 10:00 AM Michael McCandless <[email protected]> wrote:
>>>
>>>> I think long ago we used to track which files were actually dirty (we
>>>> had written bytes to) and only fsync those.  But something went wrong
>>>> with that, and at some point we "simplified" this logic, I think on
>>>> the assumption that asking the OS to fsync a file that does in fact
>>>> exist, yet has not changed, would be harmless?  But somehow it is not
>>>> in your case?  Are you on Windows?
>>>>
>>>> I tried to do a bit of digital archaeology and remember what
>>>> happened here, and I came across this relevant-looking issue:
>>>> https://issues.apache.org/jira/browse/LUCENE-2328.  That issue moved
>>>> tracking of which files have been written but not yet fsync'd down
>>>> from IndexWriter into FSDirectory.
>>>>
>>>> But there was another change that then removed staleFiles from
>>>> FSDirectory entirely.... still trying to find that.  Aha, found it!
>>>> https://issues.apache.org/jira/browse/LUCENE-6150.  Phew, Uwe was
>>>> really quite upset in that issue ;)
>>>>
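The "track dirty files and only fsync those" idea described above can be sketched roughly as follows. This is a hypothetical standalone class written for illustration. It is not the staleFiles code that used to live in FSDirectory, and the file names in the demo are made up.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; neither the class nor its methods are Lucene APIs.
final class StaleFileTracker {
    private final Set<String> staleFiles = new HashSet<>();

    /** Record that a file has been written but not yet fsynced. */
    synchronized void onFileWritten(String name) {
        staleFiles.add(name);
    }

    /** Of the files a commit asks to sync, return only those still waiting for an fsync. */
    synchronized List<String> pending(Collection<String> requested) {
        List<String> toSync = new ArrayList<>();
        for (String name : requested) {
            if (staleFiles.contains(name)) {
                toSync.add(name);
            }
        }
        return toSync;
    }

    /** Call only after the fsync of these files has actually succeeded. */
    synchronized void markSynced(Collection<String> names) {
        staleFiles.removeAll(names);
    }

    public static void main(String[] args) {
        StaleFileTracker tracker = new StaleFileTracker();
        tracker.onFileWritten("_0.cfs");
        tracker.onFileWritten("_1.cfs");
        List<String> first = tracker.pending(List.of("_0.cfs", "_1.cfs"));
        tracker.markSynced(first);                 // first commit syncs both files

        tracker.onFileWritten("_2.cfs");
        // The second commit references all three files but only the new one is pending.
        System.out.println(tracker.pending(List.of("_0.cfs", "_1.cfs", "_2.cfs")));
    }
}
```

In this sketch a second commit would touch only `_2.cfs`, leaving segment files from the earlier commit alone.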
>>>> I also came across this delightful related issue, showing how a
>>>> massive hurricane (Irene) can lead to finding and fixing a bug in
>>>> Lucene!  https://issues.apache.org/jira/browse/LUCENE-3418
>>>>
>>>> > The assumption is that while the commit point is saved, no changes
>>>> > happen to the segment files in the saved generation.
>>>>
>>>> This assumption should really be true.  Lucene writes the files,
>>>> append-only, once, and then never changes them once they are closed.
>>>> Pulling a commit point from Solr should further ensure that, even as
>>>> indexing continues and new segments are written, the old segments
>>>> referenced in that commit point will not be deleted.  But apparently
>>>> this "harmless fsync" Lucene is doing is not so harmless in your use
>>>> case.  Maybe open an issue and pull out the details from this
>>>> discussion onto it?
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Fri, Mar 12, 2021 at 9:03 AM Michael Sokolov <[email protected]> wrote:
>>>>
>>>>> Also - I should have said - I think the first step here is to write
>>>>> a focused unit test that demonstrates the existence of the extra
>>>>> fsyncs that we want to eliminate. It would be awesome if you were
>>>>> able to create such a thing.
>>>>>
>>>>> On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov <[email protected]> wrote:
>>>>> >
>>>>> > Yes, please go ahead and open an issue. TBH I'm not sure why this
>>>>> > is happening - there may be a good reason?? But let's explore it
>>>>> > using an issue, thanks.
>>>>> >
>>>>> > On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <[email protected]> wrote:
>>>>> > >
>>>>> > > I can create a Jira and assign it to myself if that's ok (?). I think this can help improve commit performance.
>>>>> > > Also, to answer your question, we have indexes sometimes going into multiple terabytes. Using the replication handler for backup would mean requiring a disk capacity more than 2x the index size on the machine at all times, which might not be feasible. So we directly back the index up from the Solr node to a remote repository.
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Rahul
>>>>> > >
>>>>> > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <[email protected]> wrote:
>>>>> > >>
>>>>> > >> Well, it certainly doesn't seem necessary to fsync files that are
>>>>> > >> unchanged and have already been fsync'ed. Maybe there's an opportunity
>>>>> > >> to improve it? On the other hand, support for external processes
>>>>> > >> reading Lucene index files isn't likely to become a feature of Lucene.
>>>>> > >> You might want to consider using Solr replication to power your
>>>>> > >> backup?
>>>>> > >>
>>>>> > >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <[email protected]> wrote:
>>>>> > >> >
>>>>> > >> > Thanks Michael. I thought that since this discussion is closer to the code than most discussions on the solr-users list, this seemed like a more appropriate forum. Will be mindful going forward.
>>>>> > >> > On your point about new segments, I attached a debugger and tried to do a new commit (just pure Solr commit, no backup process running), and the code indeed does fsync on a pre-existing segment file. Hence I was a bit baffled since it challenged my fundamental understanding that segment files once written are immutable, no matter what (unless picked up for a merge of course). Hence I thought of reaching out, in case there are scenarios where this might happen which I might be unaware of.
>>>>> > >> >
>>>>> > >> > Thanks,
>>>>> > >> > Rahul
>>>>> > >> >
>>>>> > >> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <[email protected]> wrote:
>>>>> > >> >>
>>>>> > >> >> This isn't a support forum; solr-users@ might be more appropriate. On
>>>>> > >> >> that list someone might have a better idea about how the replication
>>>>> > >> >> handler gets its list of files. This would be a good list to try if
>>>>> > >> >> you wanted to propose a fix for the problem you're having. But since
>>>>> > >> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
>>>>> > >> >> files in the current segments being committed; look in
>>>>> > >> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm
>>>>> > >> >> looking at this code for the first time, and (2) things may have been
>>>>> > >> >> different in 7.7.2? Sorry I don't know for sure, but are you sure that
>>>>> > >> >> your backup process is not attempting to copy one of the new files?
>>>>> > >> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <[email protected]> wrote:
>>>>> > >> >> >
>>>>> > >> >> > Hello,
>>>>> > >> >> > Just wanted to follow up one more time to see if this is the right forum for my question? Or is this suitable for some other mailing list?
>>>>> > >> >> >
>>>>> > >> >> > Best,
>>>>> > >> >> > Rahul
>>>>> > >> >> >
>>>>> > >> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <[email protected]> wrote:
>>>>> > >> >> >>
>>>>> > >> >> >> Hello everyone,
>>>>> > >> >> >> Following up on my question in case anyone has any idea. The reason it's important to know this is that I am thinking of allowing the backup process to not hold any lock on the index files, which should allow the fsync during parallel commits. BUT, in case doing an fsync on existing segment files in a saved commit point DOES have an effect, it might render the backed up index in a corrupt state.
>>>>> > >> >> >>
>>>>> > >> >> >> Thanks,
>>>>> > >> >> >> Rahul
>>>>> > >> >> >>
>>>>> > >> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <[email protected]> wrote:
>>>>> > >> >> >>>
>>>>> > >> >> >>> Hello,
>>>>> > >> >> >>> We have a process which backs up the index (Solr 7.7.2) on a schedule. The way we do it is we first save a commit point on the index and then, using Solr's /replication handler, get the list of files in that generation. After the backup completes, we release the commit point. (Please note that this is a separate backup process outside of Solr and not the backup command of the /replication handler.)
>>>>> > >> >> >>> The assumption is that while the commit point is saved, no changes happen to the segment files in the saved generation.
>>>>> > >> >> >>>
>>>>> > >> >> >>> Now the issue... The backup process opens the index files in a shared READ mode, preventing writes. This is causing any parallel commits to fail, as they complain that the index files are locked by another process (the backup process). Upon debugging, I see that fsync is being called during commit on already existing segment files, which is not expected. So, my question is, is there any reason for Lucene to call fsync on already existing segment files?
>>>>> > >> >> >>>
>>>>> > >> >> >>> The line of code I am referring to is as below:
>>>>> > >> >> >>> try (final FileChannel file = FileChannel.open(fileToSync, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE))
>>>>> > >> >> >>>
>>>>> > >> >> >>> in method fsync(Path fileToSync, boolean isDir) of the class file
>>>>> > >> >> >>>
>>>>> > >> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
>>>>> > >> >> >>>
>>>>> > >> >> >>> Thanks,
>>>>> > >> >> >>> Rahul
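For readers who want to reproduce the behaviour outside Lucene, here is a minimal standalone sketch built around the quoted line. The class name is hypothetical, and the `force(true)` call is my assumption about what follows the open; it is not copied from IOUtils. The point it illustrates is that fsyncing a regular file this way requires opening it with WRITE access even though no bytes are changed, so the open can fail when another process holds the file in a mode that denies writers.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Standalone sketch mirroring the quoted line; not a copy of Lucene's IOUtils.
final class FsyncSketch {
    /** Opens the path (READ for directories, WRITE for regular files) and forces it to stable storage. */
    static void fsync(Path fileToSync, boolean isDir) throws IOException {
        try (FileChannel file = FileChannel.open(fileToSync,
                isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            // Assumed follow-up call: flush content and metadata; the file's bytes are not modified.
            file.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("segment", ".bin");
        try {
            Files.write(segment, new byte[] {1, 2, 3});
            fsync(segment, false);                    // needs write access to the file
            System.out.println(Files.size(segment));  // size is unchanged by the fsync
        } finally {
            Files.deleteIfExists(segment);
        }
    }
}
```

Running the demo while a second process holds the temp file with writes denied would be one way to check whether the failure occurs at the `FileChannel.open` call.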
>>>>> > >> >>
>>>>> > >> >> ---------------------------------------------------------------------
>>>>> > >> >> To unsubscribe, e-mail: [email protected]
>>>>> > >> >> For additional commands, e-mail: [email protected]
>>>>> > >> >>
>
>--
>Uwe Schindler
>Achterdiek 19, 28357 Bremen
>https://www.thetaphi.de

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
