Thanks Michael. To answer your question: yes, I am running Solr on Windows with SimpleFSDirectoryFactory (the primary reason being that memory-mapping multi-terabyte indexes is not feasible). I will create a Jira later today with the details from this thread and assign it to myself. Will take a shot at the fix.

Thanks,
Rahul

On Fri, Mar 12, 2021 at 10:00 AM Michael McCandless <luc...@mikemccandless.com> wrote:

> I think long ago we used to track which files were actually dirty (we had
> written bytes to) and only fsync those ones. But something went wrong with
> that, and at some point we "simplified" this logic, I think on the
> assumption that asking the OS to fsync a file that does in fact exist yet
> indeed has not changed would be harmless? But somehow it is not in your
> case? Are you on Windows?
>
> I tried to do a bit of digital archaeology and remember what
> happened here, and I came across this relevant looking issue:
> https://issues.apache.org/jira/browse/LUCENE-2328. That issue moved
> tracking of which files have been written but not yet fsync'd down from
> IndexWriter into FSDirectory.
>
> But there was another change that then removed staleFiles from FSDirectory
> entirely.... still trying to find that. Aha, found it!
> https://issues.apache.org/jira/browse/LUCENE-6150. Phew Uwe was really
> quite upset in that issue ;)
>
> I also came across this delightful related issue, showing how a massive
> hurricane (Irene) can lead to finding and fixing a bug in Lucene!
> https://issues.apache.org/jira/browse/LUCENE-3418
>
> > The assumption is that while the commit point is saved, no changes
> > happen to the segment files in the saved generation.
>
> This assumption should really be true. Lucene writes the files, append
> only, once, and then never changes them, once they are closed. Pulling a
> commit point from Solr should further ensure that, even as indexing
> continues and new segments are written, the old segments referenced in that
> commit point will not be deleted. But apparently this "harmless fsync"
> Lucene is doing is not so harmless in your use case. Maybe open an issue
> and pull out the details from this discussion onto it?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Mar 12, 2021 at 9:03 AM Michael Sokolov <msoko...@gmail.com> wrote:
>
>> Also - I should have said - I think the first step here is to write a
>> focused unit test that demonstrates the existence of the extra fsyncs
>> that we want to eliminate. It would be awesome if you were able to
>> create such a thing.
>>
>> On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov <msoko...@gmail.com> wrote:
>> >
>> > Yes, please go ahead and open an issue. TBH I'm not sure why this is
>> > happening - there may be a good reason?? But let's explore it using an
>> > issue, thanks.
>> >
>> > On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <rahul196...@gmail.com> wrote:
>> > >
>> > > I can create a Jira and assign it to myself if that's ok (?). I think this can help improve commit performance.
>> > > Also, to answer your question, we have indexes sometimes going into multiple terabytes. Using the replication handler for backup would mean requiring a disk capacity more than 2x the index size on the machine at all times, which might not be feasible. So we directly back the index up from the Solr node to a remote repository.
>> > >
>> > > Thanks,
>> > > Rahul
>> > >
>> > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <msoko...@gmail.com> wrote:
>> > >>
>> > >> Well, it certainly doesn't seem necessary to fsync files that are
>> > >> unchanged and have already been fsync'ed. Maybe there's an opportunity
>> > >> to improve it? On the other hand, support for external processes
>> > >> reading Lucene index files isn't likely to become a feature of Lucene.
>> > >> You might want to consider using Solr replication to power your
>> > >> backup?
>> > >>
>> > >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>> > >> >
>> > >> > Thanks Michael.
>> > >> > I thought since this discussion is closer to the code than most discussions on the solr-users list, it seemed like a more appropriate forum. Will be mindful going forward.
>> > >> > On your point about new segments, I attached a debugger and tried to do a new commit (just pure Solr commit, no backup process running), and the code indeed does fsync on a pre-existing segment file. Hence I was a bit baffled since it challenged my fundamental understanding that segment files once written are immutable, no matter what (unless picked up for a merge of course). Hence I thought of reaching out, in case there are scenarios where this might happen which I might be unaware of.
>> > >> >
>> > >> > Thanks,
>> > >> > Rahul
>> > >> >
>> > >> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <msoko...@gmail.com> wrote:
>> > >> >>
>> > >> >> This isn't a support forum; solr-users@ might be more appropriate. On
>> > >> >> that list someone might have a better idea about how the replication
>> > >> >> handler gets its list of files. This would be a good list to try if
>> > >> >> you wanted to propose a fix for the problem you're having. But since
>> > >> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
>> > >> >> files in the current segments being committed; look in
>> > >> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm
>> > >> >> looking at this code for the first time, and (2) things may have been
>> > >> >> different in 7.7.2? Sorry I don't know for sure, but are you sure that
>> > >> >> your backup process is not attempting to copy one of the new files?
>> > >> >>
>> > >> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>> > >> >> >
>> > >> >> > Hello,
>> > >> >> > Just wanted to follow up one more time to see if this is the right forum for my question? Or is this suitable for some other mailing list?
>> > >> >> >
>> > >> >> > Best,
>> > >> >> > Rahul
>> > >> >> >
>> > >> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>> > >> >> >>
>> > >> >> >> Hello everyone,
>> > >> >> >> Following up on my question in case anyone has any idea. This is important to know because I am thinking of allowing the backup process to not hold any lock on the index files, which should allow the fsync during parallel commits. BUT, in case doing an fsync on existing segment files in a saved commit point DOES have an effect, it might leave the backed-up index in a corrupt state.
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >> Rahul
>> > >> >> >>
>> > >> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <rahul196...@gmail.com> wrote:
>> > >> >> >>>
>> > >> >> >>> Hello,
>> > >> >> >>> We have a process which backs up the index (Solr 7.7.2) on a schedule. The way we do it is we first save a commit point on the index and then, using Solr's /replication handler, get the list of files in that generation. After the backup completes, we release the commit point. (Please note that this is a separate backup process outside of Solr and not the backup command of the /replication handler.)
>> > >> >> >>> The assumption is that while the commit point is saved, no changes happen to the segment files in the saved generation.
>> > >> >> >>>
>> > >> >> >>> Now the issue... The backup process opens the index files in a shared READ mode, preventing writes. This is causing any parallel commits to fail, as it seems to be complaining about the index files being locked by another process (the backup process). Upon debugging, I see that fsync is being called during commit on already existing segment files, which is not expected. So, my question is, is there any reason for Lucene to call fsync on already existing segment files?
>> > >> >> >>>
>> > >> >> >>> The line of code I am referring to is as below:
>> > >> >> >>> try (final FileChannel file = FileChannel.open(fileToSync, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE))
>> > >> >> >>>
>> > >> >> >>> in method fsync(Path fileToSync, boolean isDir) of the class file
>> > >> >> >>>
>> > >> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
>> > >> >> >>>
>> > >> >> >>> Thanks,
>> > >> >> >>> Rahul
>> > >> >>
>> > >> >> ---------------------------------------------------------------------
>> > >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
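
A minimal sketch of the idea discussed in this thread: remember which files have been written since their last fsync, and skip everything else at commit time. `StaleFileTracker`, `markWritten` and `sync` are hypothetical names chosen for illustration, not Lucene APIs, and this is not how IndexWriter or FSDirectory is implemented. It only shows the bookkeeping, reusing the `FileChannel.open(..., StandardOpenOption.WRITE)` call quoted in the thread for non-directory files.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not Lucene code): remembers which files still need an fsync. */
final class StaleFileTracker {
    private final Path dir;
    // Names of files written since their last successful fsync.
    private final Set<String> stale = ConcurrentHashMap.newKeySet();

    StaleFileTracker(Path dir) {
        this.dir = dir;
    }

    /** Call once a file has been written and closed. */
    void markWritten(String name) {
        stale.add(name);
    }

    /** Fsyncs only the requested files that are still stale; returns the names actually synced. */
    List<String> sync(Collection<String> names) throws IOException {
        List<String> synced = new ArrayList<>();
        for (String name : names) {
            if (!stale.contains(name)) {
                continue; // already durable: do not reopen the file at all
            }
            // Same open mode as the non-directory branch quoted in the thread.
            try (FileChannel channel = FileChannel.open(dir.resolve(name), StandardOpenOption.WRITE)) {
                channel.force(true); // flush file content and metadata to storage
            }
            stale.remove(name);
            synced.add(name);
        }
        return synced;
    }
}
```

With bookkeeping along these lines, a later commit that still references an already-synced segment file would not reopen it for writing, so a reader holding that file open in a mode that denies writers should no longer get in the way. Whether this is the right place for such tracking in Lucene is a question for the Jira issue.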