Yes, please go ahead and open an issue. TBH I'm not sure why this is
happening - there may be a good reason?? But let's explore it in an
issue, thanks.

On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami <[email protected]> wrote:
>
> I can create a Jira and assign it to myself if that's ok (?). I think this 
> can help improve commit performance.
> Also, to answer your question, we have indexes that sometimes run into
> multiple terabytes. Using the replication handler for backup would require
> disk capacity of more than 2x the index size on the machine at all times,
> which might not be feasible. So we back the index up directly from the Solr
> node to a remote repository.
>
> Thanks,
> Rahul
>
> On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov <[email protected]> wrote:
>>
>> Well, it certainly doesn't seem necessary to fsync files that are
>> unchanged and have already been fsync'ed. Maybe there's an opportunity
>> to improve it? On the other hand, support for external processes
>> reading Lucene index files isn't likely to become a feature of Lucene.
>> You might want to consider using Solr replication to power your
>> backup?
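>>
>> (For what it's worth, a replication handler backup is a single HTTP call,
>> roughly along these lines -- the core name, location and other parameters
>> below are just illustrative, so check the ref guide for your version:
>>
>> http://localhost:8983/solr/<core>/replication?command=backup&location=/backups&name=nightly&numberToKeep=3
>>
>> It snapshots a commit point and copies the files on the Solr side, so
>> nothing outside Solr needs to hold the index files open.)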
>>
>> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <[email protected]> wrote:
>> >
>> > Thanks Michael. I thought that since this discussion is closer to the
>> > code than most discussions on the solr-users list, this would be a more
>> > appropriate forum. I'll be mindful going forward.
>> > On your point about new segments: I attached a debugger and did a new
>> > commit (a pure Solr commit, no backup process running), and the code
>> > indeed does fsync on a pre-existing segment file. That baffled me a bit,
>> > since it challenged my fundamental understanding that segment files,
>> > once written, are immutable (unless picked up for a merge, of course).
>> > So I thought of reaching out, in case there are scenarios I'm unaware of
>> > where this can happen.
>> >
>> > Thanks,
>> > Rahul
>> >
>> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <[email protected]> wrote:
>> >>
>> >> This isn't a support forum; solr-users@ might be more appropriate. On
>> >> that list someone might have a better idea about how the replication
>> >> handler gets its list of files. This would be a good list to try if
>> >> you wanted to propose a fix for the problem you're having. But since
>> >> you're here -- it looks to me as if IndexWriter indeed syncs all "new"
>> >> files in the current segments being committed; look in
>> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1) I'm
>> >> looking at this code for the first time, and (2) things may have been
>> >> different in 7.7.2? Sorry I don't know for sure, but are you sure that
>> >> your backup process is not attempting to copy one of the new files?
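>> >>
>> >> If it helps, here is a quick, untested sketch (written against a recent
>> >> Lucene, so the 7.7.2 API may differ slightly; the class name is just for
>> >> illustration) that prints the generation and the exact file names each
>> >> commit point references -- more or less what SegmentInfos.files reports:
>> >>
>> >> import java.nio.file.Paths;
>> >> import org.apache.lucene.index.DirectoryReader;
>> >> import org.apache.lucene.index.IndexCommit;
>> >> import org.apache.lucene.store.FSDirectory;
>> >>
>> >> public class ListCommitFiles {
>> >>   public static void main(String[] args) throws Exception {
>> >>     // args[0] = path to the index directory
>> >>     try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]))) {
>> >>       for (IndexCommit commit : DirectoryReader.listCommits(dir)) {
>> >>         System.out.println("generation=" + commit.getGeneration());
>> >>         for (String file : commit.getFileNames()) {
>> >>           System.out.println("  " + file);
>> >>         }
>> >>       }
>> >>     }
>> >>   }
>> >> }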
>> >>
>> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <[email protected]> 
>> >> wrote:
>> >> >
>> >> > Hello,
>> >> > Just wanted to follow up one more time to see if this is the right
>> >> > forum for my question, or if it is better suited to some other mailing
>> >> > list?
>> >> >
>> >> > Best,
>> >> > Rahul
>> >> >
>> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <[email protected]> 
>> >> > wrote:
>> >> >>
>> >> >> Hello everyone,
>> >> >> Following up on my question in case anyone has any idea. The reason
>> >> >> this matters is that I am thinking of letting the backup process hold
>> >> >> no lock on the index files, which should allow the fsync during
>> >> >> parallel commits. BUT, in case doing an fsync on existing segment
>> >> >> files in a saved commit point DOES have an effect, it might leave the
>> >> >> backed-up index in a corrupt state.
>> >> >>
>> >> >> Thanks,
>> >> >> Rahul
>> >> >>
>> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <[email protected]> 
>> >> >> wrote:
>> >> >>>
>> >> >>> Hello,
>> >> >>> We have a process which backs up the index (Solr 7.7.2) on a
>> >> >>> schedule. The way we do it is: we first save a commit point on the
>> >> >>> index, then use Solr's /replication handler to get the list of files
>> >> >>> in that generation, and after the backup completes we release the
>> >> >>> commit point. (Please note that this is a separate backup process
>> >> >>> outside of Solr, not the backup command of the /replication handler.)
>> >> >>> The assumption is that while the commit point is saved, no changes
>> >> >>> happen to the segment files in the saved generation.
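>> >> >>>
>> >> >>> (For concreteness, the file list comes from the handler's standard
>> >> >>> commands, along these lines -- illustrative URLs only:
>> >> >>>
>> >> >>> /replication?command=commits                     -> available commit points / generations
>> >> >>> /replication?command=filelist&generation=<gen>   -> files belonging to that generation
>> >> >>>
>> >> >>> and the backup process then copies those files itself.)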
>> >> >>>
>> >> >>> Now the issue... The backup process opens the index files in a shared
>> >> >>> READ mode, preventing writes. This causes any parallel commit to fail,
>> >> >>> complaining that the index files are locked by another process (the
>> >> >>> backup process). Upon debugging, I see that fsync is being called
>> >> >>> during commit on already-existing segment files, which is not
>> >> >>> expected. So my question is: is there any reason for Lucene to call
>> >> >>> fsync on already-existing segment files?
>> >> >>>
>> >> >>> The line of code I am referring to is:
>> >> >>>
>> >> >>> try (final FileChannel file = FileChannel.open(fileToSync,
>> >> >>>     isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE))
>> >> >>>
>> >> >>> in the method fsync(Path fileToSync, boolean isDir) of
>> >> >>>
>> >> >>> lucene/core/src/java/org/apache/lucene/util/IOUtils.java
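>> >> >>>
>> >> >>> If I'm reading the source right, the body of that try block just
>> >> >>> forces the channel, roughly:
>> >> >>>
>> >> >>> try (final FileChannel file = FileChannel.open(fileToSync,
>> >> >>>     isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
>> >> >>>   file.force(true); // the actual fsync
>> >> >>> }
>> >> >>>
>> >> >>> So regular files are opened for WRITE (apparently needed for the
>> >> >>> fsync to take effect), and it is that WRITE open which collides with
>> >> >>> our backup process holding the files open without write sharing.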
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Rahul
>> >>
