Thanks, Michael. To answer your question: yes, I am running Solr on Windows,
with SimpleFSDirectoryFactory (the primary reason being that memory-mapping
multi-terabyte indexes is not feasible for us). I will create a Jira issue
later today with the details from this thread and assign it to myself. I
will take a shot at the fix.
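
For reference, the dirty-file tracking that Mike describes in the quoted message below could look roughly like this. It is a standalone sketch using only java.nio, not Lucene code: StaleFileTracker, markWritten and the file names are hypothetical, and a real fix would have to live wherever Lucene decides which files to sync.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not a Lucene class: fsyncs only files written since their last sync. */
class StaleFileTracker {
    private final Path dir;
    // Names of files that were written but not yet fsync'd.
    private final Set<String> stale = ConcurrentHashMap.newKeySet();

    StaleFileTracker(Path dir) {
        this.dir = dir;
    }

    /** Call once a file has been fully written and closed. */
    void markWritten(String name) {
        stale.add(name);
    }

    /** Fsyncs the requested names that are still stale and returns the names actually synced. */
    List<String> sync(Collection<String> names) throws IOException {
        List<String> synced = new ArrayList<>();
        for (String name : names) {
            if (!stale.contains(name)) {
                continue; // already synced earlier, so the file is never reopened
            }
            try (FileChannel ch = FileChannel.open(dir.resolve(name), StandardOpenOption.WRITE)) {
                ch.force(true);
            }
            stale.remove(name);
            synced.add(name);
        }
        return synced;
    }
}
```

With something like this, a commit that lists both old and new segment files would only reopen the new ones, so files held by the backup process would be left alone.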

Thanks,
Rahul

On Fri, Mar 12, 2021 at 10:00 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> I think long ago we used to track which files were actually dirty (we had
> written bytes to) and only fsync those ones.  But something went wrong with
> that, and at some point we "simplified" this logic, I think on the
> assumption that asking the OS to fsync a file that does exist but has
> not changed would be harmless?  But somehow it is not in your
> case?  Are you on Windows?
>
> I tried to do a bit of digital archaeology and remember what
> happened here, and I came across this relevant looking issue:
> https://issues.apache.org/jira/browse/LUCENE-2328.  That issue moved
> tracking of which files have been written but not yet fsync'd down from
> IndexWriter into FSDirectory.
>
> But there was another change that then removed staleFiles from FSDirectory
> entirely.... still trying to find that.  Aha, found it!
> https://issues.apache.org/jira/browse/LUCENE-6150.  Phew Uwe was really
> quite upset in that issue ;)
>
> I also came across this delightful related issue, showing how a massive
> hurricane (Irene) can lead to finding and fixing a bug in Lucene!
> https://issues.apache.org/jira/browse/LUCENE-3418
>
> > The assumption is that while the commit point is saved, no changes
> > happen to the segment files in the saved generation.
>
> This assumption should really be true.  Lucene writes the files, append
> only, once, and then never changes them, once they are closed.  Pulling a
> commit point from Solr should further ensure that, even as indexing
> continues and new segments are written, the old segments referenced in that
> commit point will not be deleted.  But apparently this "harmless fsync"
> Lucene is doing is not so harmless in your use case.  Maybe open an issue
> and pull out the details from this discussion onto it?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Mar 12, 2021 at 9:03 AM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> Also - I should have said - I think the first step here is to write a
>> focused unit test that demonstrates the existence of the extra fsyncs
>> that we want to eliminate. It would be awesome if you were able to
>> create such a thing.
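
One shape such a test could take is a recorder that counts how often each file name is handed to sync across commits, with an assertion that nothing is synced twice. The sketch below is standalone and only simulates two commits: SyncRecorder and the file names are hypothetical, and hooking it into Lucene (presumably through a Directory wrapper that overrides sync) is an assumption, not something shown here.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical recorder, not a Lucene class: counts how often each file name is passed to sync. */
class SyncRecorder {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Stand-in for the real sync call; it only records the request. */
    void sync(Collection<String> names) {
        for (String name : names) {
            counts.merge(name, 1, Integer::sum);
        }
    }

    /** Files that were synced more than once, in first-seen order. */
    List<String> syncedMoreThanOnce() {
        return counts.entrySet().stream()
            .filter(e -> e.getValue() > 1)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

If the first commit syncs "_0.cfs" and "segments_1" and the second syncs "_0.cfs", "_1.cfs" and "segments_2", syncedMoreThanOnce() reports "_0.cfs", which is the extra fsync the test would flag.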
>>
>> On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov <msoko...@gmail.com>
>> wrote:
>> >
>> > Yes, please go ahead and open an issue. TBH I'm not sure why this is
>> > happening - there may be a good reason?? But let's explore it using an
>> > issue, thanks.
>> >
>> > On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <rahul196...@gmail.com>
>> > wrote:
>> > >
>> > > I can create a Jira and assign it to myself if that's ok (?). I think
>> > > this can help improve commit performance.
>> > > Also, to answer your question, we have indexes sometimes going into
>> > > multiple terabytes. Using the replication handler for backup would mean
>> > > requiring a disk capacity more than 2x the index size on the machine at
>> > > all times, which might not be feasible. So we directly back the index up
>> > > from the Solr node to a remote repository.
>> > >
>> > > Thanks,
>> > > Rahul
>> > >
>> > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <msoko...@gmail.com>
>> > > wrote:
>> > >>
>> > >> Well, it certainly doesn't seem necessary to fsync files that are
>> > >> unchanged and have already been fsync'ed. Maybe there's an opportunity
>> > >> to improve it? On the other hand, support for external processes
>> > >> reading Lucene index files isn't likely to become a feature of Lucene.
>> > >> You might want to consider using Solr replication to power your
>> > >> backup?
>> > >>
>> > >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <rahul196...@gmail.com>
>> > >> wrote:
>> > >> >
>> > >> > Thanks Michael. I thought that since this discussion is closer to the
>> > >> > code than most discussions on the solr-users list, this seemed like a
>> > >> > more appropriate forum. Will be mindful going forward.
>> > >> > On your point about new segments, I attached a debugger and tried
>> > >> > to do a new commit (just pure Solr commit, no backup process running),
>> > >> > and the code indeed does fsync on a pre-existing segment file. Hence I
>> > >> > was a bit baffled, since it challenged my fundamental understanding
>> > >> > that segment files once written are immutable, no matter what (unless
>> > >> > picked up for a merge of course). Hence I thought of reaching out, in
>> > >> > case there are scenarios where this might happen which I might be
>> > >> > unaware of.
>> > >> >
>> > >> > Thanks,
>> > >> > Rahul
>> > >> >
>> > >> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <msoko...@gmail.com>
>> > >> > wrote:
>> > >> >>
>> > >> >> This isn't a support forum; solr-users@ might be more appropriate. On
>> > >> >> that list someone might have a better idea about how the replication
>> > >> >> handler gets its list of files. This would be a good list to try if
>> > >> >> you wanted to propose a fix for the problem you're having. But since
>> > >> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
>> > >> >> files in the current segments being committed; look in
>> > >> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm
>> > >> >> looking at this code for the first time, and (2) things may have been
>> > >> >> different in 7.7.2? Sorry I don't know for sure, but are you sure that
>> > >> >> your backup process is not attempting to copy one of the new files?
>> > >> >>
>> > >> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <rahul196...@gmail.com>
>> > >> >> wrote:
>> > >> >> >
>> > >> >> > Hello,
>> > >> >> > Just wanted to follow up one more time to see if this is the
>> > >> >> > right forum for my question? Or is this suitable for some other
>> > >> >> > mailing list?
>> > >> >> >
>> > >> >> > Best,
>> > >> >> > Rahul
>> > >> >> >
>> > >> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <rahul196...@gmail.com>
>> > >> >> > wrote:
>> > >> >> >>
>> > >> >> >> Hello everyone,
>> > >> >> >> Following up on my question in case anyone has any idea. This is
>> > >> >> >> important to know because I am thinking of allowing the backup
>> > >> >> >> process to not hold any lock on the index files, which should allow
>> > >> >> >> the fsync during parallel commits. BUT, in case doing an fsync on
>> > >> >> >> existing segment files in a saved commit point DOES have an effect,
>> > >> >> >> it might leave the backed-up index in a corrupt state.
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >> Rahul
>> > >> >> >>
>> > >> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <rahul196...@gmail.com>
>> > >> >> >> wrote:
>> > >> >> >>>
>> > >> >> >>> Hello,
>> > >> >> >>> We have a process which backs up the index (Solr 7.7.2) on a
>> > >> >> >>> schedule. The way we do it is we first save a commit point on the
>> > >> >> >>> index and then, using Solr's /replication handler, get the list of
>> > >> >> >>> files in that generation. After the backup completes, we release
>> > >> >> >>> the commit point. (Please note that this is a separate backup
>> > >> >> >>> process outside of Solr and not the backup command of the
>> > >> >> >>> /replication handler.)
>> > >> >> >>> The assumption is that while the commit point is saved, no
>> > >> >> >>> changes happen to the segment files in the saved generation.
>> > >> >> >>>
>> > >> >> >>> Now the issue... The backup process opens the index files in
>> > >> >> >>> a shared READ mode, preventing writes. This is causing any parallel
>> > >> >> >>> commits to fail, as they complain that the index files are locked
>> > >> >> >>> by another process (the backup process). Upon debugging, I see that
>> > >> >> >>> fsync is being called during commit on already existing segment
>> > >> >> >>> files, which is not expected. So, my question is: is there any
>> > >> >> >>> reason for Lucene to call fsync on already existing segment files?
>> > >> >> >>>
>> > >> >> >>> The line of code I am referring to is as below:
>> > >> >> >>> try (final FileChannel file = FileChannel.open(fileToSync,
>> > >> >> >>>     isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE))
>> > >> >> >>>
>> > >> >> >>> in method fsync(Path fileToSync, boolean isDir) of the class file
>> > >> >> >>>
>> > >> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
>> > >> >> >>>
>> > >> >> >>> Thanks,
>> > >> >> >>> Rahul
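
On whether an fsync of an unchanged file has any effect on its contents, a small experiment outside Lucene suggests it does not. The sketch below repeats the open-then-force pattern from the quoted line, assuming the channel is flushed with force(true) right after being opened (the quote does not show that part), and checks that the bytes are identical after two fsyncs. FsyncUnchangedFile and the file name are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical experiment, not Lucene code. */
class FsyncUnchangedFile {
    /** Same open pattern as the quoted line: directories are opened for READ, files for WRITE. */
    static void fsync(Path fileToSync, boolean isDir) throws IOException {
        try (FileChannel file = FileChannel.open(fileToSync,
                isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            file.force(true); // assumption: flush data and metadata to the storage device
        }
    }

    /** Writes a file, fsyncs it twice, and reports whether its bytes are unchanged. */
    static boolean contentSurvivesRepeatedFsync(Path dir) throws IOException {
        Path f = dir.resolve("_0.cfs"); // hypothetical segment file name
        byte[] data = {1, 2, 3, 4};
        Files.write(f, data);
        fsync(f, false); // first fsync after the write
        fsync(f, false); // second fsync: nothing was written in between
        return Arrays.equals(data, Files.readAllBytes(f));
    }
}
```

This only speaks to file contents. The WRITE open itself is what the report above says collides with a backup process that holds the file in a shared READ mode, so skipping the redundant fsync would avoid the conflict rather than change what ends up on disk.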
>> > >> >>
>> > >> >>
>> ---------------------------------------------------------------------
>> > >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> > >> >>
