Also - I should have said - I think the first step here is to write a
focused unit test that demonstrates the existence of the extra fsyncs
that we want to eliminate. It would be awesome if you were able to
create such a thing.
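
A rough sketch of the bookkeeping such a test might use, assuming the
check boils down to recording every file name passed to a sync call and
flagging names that show up more than once. This is a JDK-only stand-in:
SyncRecorder and its methods are hypothetical names, not Lucene API, and
a real test would have to hook the recording in underneath IndexWriter
rather than call it directly. The WRITE open mirrors the
FileChannel.open(...) line quoted further down in this thread.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: fsyncs files and remembers their names.
class SyncRecorder {
    private final Path dir;
    private final List<String> synced = new ArrayList<>();

    SyncRecorder(Path dir) {
        this.dir = dir;
    }

    // Fsync each named file, opening it for WRITE as the quoted
    // FileChannel.open(...) line does for non-directory files.
    void sync(Collection<String> names) throws IOException {
        for (String name : names) {
            try (FileChannel channel =
                     FileChannel.open(dir.resolve(name), StandardOpenOption.WRITE)) {
                channel.force(true); // flush file content and metadata to storage
            }
            synced.add(name);
        }
    }

    // Names that were synced more than once; a test would assert this is empty.
    Set<String> repeatedSyncs() {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new TreeSet<>();
        for (String name : synced) {
            if (!seen.add(name)) {
                repeated.add(name);
            }
        }
        return repeated;
    }
}
```

Fed the file lists from two consecutive commits, repeatedSyncs() would
name any pre-existing segment files that were synced a second time.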

On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov <[email protected]> wrote:
>
> Yes, please go ahead and open an issue. TBH I'm not sure why this is
> happening - there may be a good reason?? But let's explore it using an
> issue, thanks.
>
> On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <[email protected]> wrote:
> >
> > I can create a Jira and assign it to myself if that's ok (?). I think this 
> > can help improve commit performance.
> > Also, to answer your question, we have indexes sometimes going into 
> > multiple terabytes. Using the replication handler for backup would mean 
> > requiring a disk capacity more than 2x the index size on the machine at all 
> > times, which might not be feasible. So we directly back the index up from 
> > the Solr node to a remote repository.
> >
> > Thanks,
> > Rahul
> >
> > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <[email protected]> wrote:
> >>
> >> Well, it certainly doesn't seem necessary to fsync files that are
> >> unchanged and have already been fsync'ed. Maybe there's an opportunity
> >> to improve it? On the other hand, support for external processes
> >> reading Lucene index files isn't likely to become a feature of Lucene.
> >> You might want to consider using Solr replication to power your
> >> backup?
> >>
> >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <[email protected]> 
> >> wrote:
> >> >
> >> > Thanks Michael. Since this discussion is closer to the code than most 
> >> > discussions on the solr-users list, I thought this was the more 
> >> > appropriate forum. Will be mindful going forward.
> >> > On your point about new segments, I attached a debugger and tried to do 
> >> > a new commit (just pure Solr commit, no backup process running), and the 
> >> > code indeed does fsync on a pre-existing segment file. Hence I was a bit 
> >> > baffled since it challenged my fundamental understanding that segment 
> >> > files once written are immutable, no matter what (unless picked up for a 
> >> > merge of course). Hence I thought of reaching out, in case there are 
> >> > scenarios where this might happen which I might be unaware of.
> >> >
> >> > Thanks,
> >> > Rahul
> >> >
> >> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <[email protected]> 
> >> > wrote:
> >> >>
> >> >> This isn't a support forum; solr-users@ might be more appropriate. On
> >> >> that list someone might have a better idea about how the replication
> >> >> handler gets its list of files. This would be a good list to try if
> >> >> you wanted to propose a fix for the problem you're having. But since
> >> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
> >> >> files in the current segments being committed; look in
> >> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm
> >> >> looking at this code for the first time, and (2) things may have been
> >> >> different in 7.7.2? Sorry I don't know for sure, but are you sure that
> >> >> your backup process is not attempting to copy one of the new files?
> >> >>
> >> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <[email protected]> 
> >> >> wrote:
> >> >> >
> >> >> > Hello,
> >> >> > Just wanted to follow up one more time to see if this is the right 
> >> >> > forum for my question? Or is this suitable for some other mailing list?
> >> >> >
> >> >> > Best,
> >> >> > Rahul
> >> >> >
> >> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <[email protected]> 
> >> >> > wrote:
> >> >> >>
> >> >> >> Hello everyone,
> >> >> >> Following up on my question in case anyone has any idea. This is 
> >> >> >> important to know because I am thinking of allowing the backup 
> >> >> >> process to not hold any lock on the index files, which should 
> >> >> >> allow the fsync during parallel commits. BUT, in case doing an fsync 
> >> >> >> on existing segment files in a saved commit point DOES have an 
> >> >> >> effect, it might leave the backed-up index in a corrupt state.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Rahul
> >> >> >>
> >> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <[email protected]> 
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> Hello,
> >> >> >>> We have a process which backs up the index (Solr 7.7.2) on a 
> >> >> >>> schedule. The way we do it is we first save a commit point on the 
> >> >> >>> index and then using Solr's /replication handler, get the list of 
> >> >> >>> files in that generation. After the backup completes, we release 
> >> >> >>> the commit point. (Please note that this is a separate backup 
> >> >> >>> process outside of Solr and not the backup command of the 
> >> >> >>> /replication handler.)
> >> >> >>> The assumption is that while the commit point is saved, no changes 
> >> >> >>> happen to the segment files in the saved generation.
> >> >> >>>
> >> >> >>> Now the issue... The backup process opens the index files in a 
> >> >> >>> shared READ mode, preventing writes. This causes any parallel 
> >> >> >>> commits to fail, apparently because the index files are locked by 
> >> >> >>> another process (the backup process). Upon debugging, I see that 
> >> >> >>> fsync is being called during commit on already existing segment 
> >> >> >>> files, which is not expected. So, my question is, is there any 
> >> >> >>> reason for Lucene to call fsync on already existing segment 
> >> >> >>> files?
> >> >> >>>
> >> >> >>> The line of code I am referring to is as below:
> >> >> >>> try (final FileChannel file = FileChannel.open(fileToSync, isDir ? 
> >> >> >>> StandardOpenOption.READ : StandardOpenOption.WRITE))
> >> >> >>>
> >> >> >>> in method fsync(Path fileToSync, boolean isDir) of the class file
> >> >> >>>
> >> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Rahul
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [email protected]
> >> >> For additional commands, e-mail: [email protected]
> >> >>
> >>
