Yes, please go ahead and open an issue. TBH I'm not sure why this is happening - there may be a good reason?? But let's explore it using an issue, thanks.
On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <[email protected]> wrote:
>
> I can create a Jira and assign it to myself if that's ok (?). I think this
> can help improve commit performance.
> Also, to answer your question, we have indexes sometimes going into multiple
> terabytes. Using the replication handler for backup would mean requiring a
> disk capacity more than 2x the index size on the machine at all times, which
> might not be feasible. So we directly back the index up from the Solr node to
> a remote repository.
>
> Thanks,
> Rahul
>
> On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <[email protected]> wrote:
>>
>> Well, it certainly doesn't seem necessary to fsync files that are
>> unchanged and have already been fsync'ed. Maybe there's an opportunity
>> to improve it? On the other hand, support for external processes
>> reading Lucene index files isn't likely to become a feature of Lucene.
>> You might want to consider using Solr replication to power your
>> backup?
>>
>> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <[email protected]> wrote:
>> >
>> > Thanks Michael. I thought since this discussion is closer to the code than
>> > most discussions on the solr-users list, it seemed like a more appropriate
>> > forum. Will be mindful going forward.
>> > On your point about new segments, I attached a debugger and tried to do a
>> > new commit (just a pure Solr commit, no backup process running), and the
>> > code indeed does fsync on a pre-existing segment file. Hence I was a bit
>> > baffled, since it challenged my fundamental understanding that segment
>> > files once written are immutable, no matter what (unless picked up for a
>> > merge, of course). So I thought of reaching out, in case there are
>> > scenarios where this might happen that I might be unaware of.
>> >
>> > Thanks,
>> > Rahul
>> >
>> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <[email protected]> wrote:
>> >>
>> >> This isn't a support forum; solr-users@ might be more appropriate. On
>> >> that list someone might have a better idea about how the replication
>> >> handler gets its list of files. This would be a good list to try if
>> >> you wanted to propose a fix for the problem you're having. But since
>> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
>> >> files in the current segments being committed; look in
>> >> IndexWriter.startCommit and SegmentInfos.files. Caveats: (1) I'm
>> >> looking at this code for the first time, and (2) things may have been
>> >> different in 7.7.2? Sorry, I don't know for sure, but are you sure that
>> >> your backup process is not attempting to copy one of the new files?
>> >>
>> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <[email protected]>
>> >> wrote:
>> >> >
>> >> > Hello,
>> >> > Just wanted to follow up one more time to see if this is the right forum
>> >> > for my question? Or is this suitable for some other mailing list?
>> >> >
>> >> > Best,
>> >> > Rahul
>> >> >
>> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> Hello everyone,
>> >> >> Following up on my question in case anyone has any idea. The reason it's
>> >> >> important to know this is that I am thinking of allowing the backup
>> >> >> process to not hold any lock on the index files, which should allow
>> >> >> the fsync during parallel commits. BUT, in case doing an fsync on
>> >> >> existing segment files in a saved commit point DOES have an effect, it
>> >> >> might render the backed-up index in a corrupt state.
>> >> >>
>> >> >> Thanks,
>> >> >> Rahul
>> >> >>
>> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <[email protected]>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hello,
>> >> >>> We have a process which backs up the index (Solr 7.7.2) on a
>> >> >>> schedule. The way we do it is we first save a commit point on the
>> >> >>> index and then, using Solr's /replication handler, get the list of
>> >> >>> files in that generation. After the backup completes, we release the
>> >> >>> commit point. (Please note that this is a separate backup process
>> >> >>> outside of Solr and not the backup command of the /replication
>> >> >>> handler.)
>> >> >>> The assumption is that while the commit point is saved, no changes
>> >> >>> happen to the segment files in the saved generation.
>> >> >>>
>> >> >>> Now the issue... The backup process opens the index files in a shared
>> >> >>> READ mode, preventing writes. This is causing any parallel commits to
>> >> >>> fail, as they seem to be complaining about the index files being locked
>> >> >>> by another process (the backup process). Upon debugging, I see that
>> >> >>> fsync is being called during commit on already existing segment files,
>> >> >>> which is not expected. So, my question is: is there any reason for
>> >> >>> Lucene to call fsync on already existing segment files?
>> >> >>>
>> >> >>> The line of code I am referring to is as below:
>> >> >>> try (final FileChannel file = FileChannel.open(fileToSync, isDir ?
>> >> >>> StandardOpenOption.READ : StandardOpenOption.WRITE))
>> >> >>>
>> >> >>> in the method fsync(Path fileToSync, boolean isDir) of the class file
>> >> >>>
>> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Rahul
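For reference, the IOUtils.fsync line quoted above boils down to the pattern sketched below. This is a simplified standalone sketch, not the actual Lucene source, and the segment file path is purely hypothetical. The point is that fsync opens the file for WRITE and forces it to disk, which is exactly the kind of open that an external backup process holding the file in a read-only share mode can block.

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class FsyncSketch {

      // Simplified version of the pattern in IOUtils.fsync: open the file for
      // WRITE (READ for a directory) and force its contents and metadata to
      // stable storage.
      static void fsync(Path fileToSync, boolean isDir) throws IOException {
        try (FileChannel file = FileChannel.open(
            fileToSync, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
          // This force() is what fails if another process holds the file open
          // without allowing write sharing (notably on Windows).
          file.force(true);
        }
      }

      public static void main(String[] args) throws IOException {
        // Hypothetical segment file path, for illustration only.
        fsync(Paths.get("/var/solr/data/core1/data/index/_0.cfs"), false);
      }
    }

On Windows in particular, opening a file for WRITE fails with a sharing violation if another process already has it open without write sharing, which would match the commit failures described in the original message.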

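At the Lucene level, the "save a commit point, copy its files, release it" procedure described in the original message roughly corresponds to SnapshotDeletionPolicy. The sketch below is only an illustration of that API against a standalone index; it assumes exclusive access to the index directory (so it is not something to run against a live Solr core), and the paths are hypothetical.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
    import org.apache.lucene.index.SnapshotDeletionPolicy;
    import org.apache.lucene.store.FSDirectory;

    public class SnapshotBackupSketch {
      public static void main(String[] args) throws Exception {
        Path indexPath = Paths.get("/path/to/index");    // hypothetical
        Path backupPath = Paths.get("/path/to/backup");  // hypothetical

        SnapshotDeletionPolicy snapshotter =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer())
            .setIndexDeletionPolicy(snapshotter);

        try (FSDirectory dir = FSDirectory.open(indexPath);
             IndexWriter writer = new IndexWriter(dir, iwc)) {

          writer.commit();                             // ensure there is a commit to snapshot
          IndexCommit commit = snapshotter.snapshot(); // pin the commit point
          try {
            Files.createDirectories(backupPath);
            for (String fileName : commit.getFileNames()) {
              // Plain reads only; no locks that would block the writer's fsync.
              Files.copy(indexPath.resolve(fileName),
                         backupPath.resolve(fileName),
                         StandardCopyOption.REPLACE_EXISTING);
            }
          } finally {
            snapshotter.release(commit);               // un-pin so files can be cleaned up later
            writer.deleteUnusedFiles();
          }
        }
      }
    }

While a snapshot is held, the files in that commit point are protected from deletion, so they can be copied with ordinary reads rather than a locking open mode.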