Hi Mike,

Windows unfortunately has a hard limit on virtual address space: the number of 
address bits is limited to 43. See my blog post at 
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

That's 8 terabytes (2^43 bytes).

On Linux the limit is 47 bits, and with newer kernels and hardware 
it is even larger, so the universe fits into it. 😜
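
As a quick sanity check of those figures, here is a minimal Java sketch that converts a number of usable address bits into tebibytes. The class and method names are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: converts address bits to TiB.
final class AddressSpace {

    /** Returns the size of an address space with the given number of bits, in TiB (2^40 bytes). */
    static long tebibytes(int addressBits) {
        // 2^addressBits bytes divided by 2^40 bytes per TiB.
        return (1L << addressBits) >> 40;
    }

    public static void main(String[] args) {
        System.out.println("43 bits: " + tebibytes(43) + " TiB"); // the Windows limit mentioned above
        System.out.println("47 bits: " + tebibytes(47) + " TiB"); // the Linux limit mentioned above
    }
}
```

So 43 bits gives 8 TiB and 47 bits gives 128 TiB. The usable amount is somewhat less in practice, since the JVM heap, code and other mappings share the same address space.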

Uwe

On March 15, 2021 7:15:11 PM UTC, Michael McCandless 
<[email protected]> wrote:
>Thanks Rahul.
>
>> primary reason being that memory mapping multi-terabyte indexes is not
>> feasible through mmap
>
>Hmm, that is interesting -- are you using a 64 bit JVM?  If so, what goes
>wrong with such large maps?  Lucene's MMapDirectory should chunk the
>mapping to deal with ByteBuffer int only address space.
>
>SimpleFSDirectory usually has substantially worse performance than
>MMapDirectory.
>
>Still, I suspect you would hit the same issue if you used other FSDirectory
>implementations -- the fsync behavior should be the same.
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>
>On Fri, Mar 12, 2021 at 1:46 PM Rahul Goswami <[email protected]>
>wrote:
>
>> Thanks Michael. For your question...yes I am running Solr on Windows
>and
>> running it with SimpleFSDirectoryFactory (primary reason being that
>memory
>> mapping multi-terabyte indexes is not feasible through mmap). I will
>create
>> a Jira later today with the details in this thread and assign it to
>myself.
>> Will take a shot at the fix.
>>
>> Thanks,
>> Rahul
>>
>> On Fri, Mar 12, 2021 at 10:00 AM Michael McCandless <
>> [email protected]> wrote:
>>
>>> I think long ago we used to track which files were actually dirty
>(we had
>>> written bytes to) and only fsync those ones.  But something went
>wrong with
>>> that, and at some point we "simplified" this logic, I think on the
>>> assumption that asking the OS to fsync a file that does in fact
>exist yet
>>> indeed has not changed would be harmless?  But somehow it is not in
>your
>>> case?  Are you on Windows?
>>>
>>> I tried to do a bit of digital archaeology and remember what
>>> happened here, and I came across this relevant looking issue:
>>> https://issues.apache.org/jira/browse/LUCENE-2328.  That issue moved
>>> tracking of which files have been written but not yet fsync'd down
>from
>>> IndexWriter into FSDirectory.
>>>
>>> But there was another change that then removed staleFiles from
>>> FSDirectory entirely.... still trying to find that.  Aha, found it!
>>> https://issues.apache.org/jira/browse/LUCENE-6150.  Phew Uwe was
>really
>>> quite upset in that issue ;)
>>>
>>> I also came across this delightful related issue, showing how a
>massive
>>> hurricane (Irene) can lead to finding and fixing a bug in Lucene!
>>> https://issues.apache.org/jira/browse/LUCENE-3418
>>>
>>> > The assumption is that while the commit point is saved, no changes
>>> happen to the segment files in the saved generation.
>>>
>>> This assumption should really be true.  Lucene writes the files,
>append
>>> only, once, and then never changes them, once they are closed. 
>Pulling a
>>> commit point from Solr should further ensure that, even as indexing
>>> continues and new segments are written, the old segments referenced
>in that
>>> commit point will not be deleted.  But apparently this "harmless
>fsync"
>>> Lucene is doing is not so harmless in your use case.  Maybe open an
>issue
>>> and pull out the details from this discussion onto it?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Fri, Mar 12, 2021 at 9:03 AM Michael Sokolov <[email protected]>
>>> wrote:
>>>
>>>> Also - I should have said - I think the first step here is to write
>a
>>>> focused unit test that demonstrates the existence of the extra
>fsyncs
>>>> that we want to eliminate. It would be awesome if you were able to
>>>> create such a thing.
>>>>
>>>> On Fri, Mar 12, 2021 at 9:00 AM Michael Sokolov
><[email protected]>
>>>> wrote:
>>>> >
>>>> > Yes, please go ahead and open an issue. TBH I'm not sure why this
>is
>>>> > happening - there may be a good reason?? But let's explore it
>using an
>>>> > issue, thanks.
>>>> >
>>>> > On Fri, Mar 12, 2021 at 12:16 AM Rahul Goswami
><[email protected]>
>>>> wrote:
>>>> > >
>>>> > > I can create a Jira and assign it to myself if that's ok (?). I
>>>> think this can help improve commit performance.
>>>> > > Also, to answer your question, we have indexes sometimes going
>into
>>>> multiple terabytes. Using the replication handler for backup would
>mean
>>>> requiring a disk capacity more than 2x the index size on the
>machine at all
>>>> times, which might not be feasible. So we directly back the index
>up from
>>>> the Solr node to a remote repository.
>>>> > >
>>>> > > Thanks,
>>>> > > Rahul
>>>> > >
>>>> > > On Thu, Mar 11, 2021 at 4:09 PM Michael Sokolov
><[email protected]>
>>>> wrote:
>>>> > >>
>>>> > >> Well, it certainly doesn't seem necessary to fsync files that
>are
>>>> > >> unchanged and have already been fsync'ed. Maybe there's an
>>>> opportunity
>>>> > >> to improve it? On the other hand, support for external
>processes
>>>> > >> reading Lucene index files isn't likely to become a feature of
>>>> Lucene.
>>>> > >> You might want to consider using Solr replication to power
>your
>>>> > >> backup?
>>>> > >>
>>>> > >> On Thu, Mar 11, 2021 at 2:52 PM Rahul Goswami <
>>>> [email protected]> wrote:
>>>> > >> >
>>>> > >> > Thanks Michael. I thought since this discussion is closer to
>the
>>>> code than most discussions on the solr-users list, it seemed like a
>more
>>>> appropriate forum. Will be mindful going forward.
>>>> > >> > On your point about new segments, I attached a debugger and
>tried
>>>> to do a new commit (just pure Solr commit, no backup process
>running), and
>>>> the code indeed does fsync on a pre-existing segment file. Hence I
>was a
>>>> bit baffled since it challenged my fundamental understanding that
>segment
>>>> files once written are immutable, no matter what (unless picked up
>for a
>>>> merge of course). Hence I thought of reaching out, in case there
>are
>>>> scenarios where this might happen which I might be unaware of.
>>>> > >> >
>>>> > >> > Thanks,
>>>> > >> > Rahul
>>>> > >> >
>>>> > >> > On Thu, Mar 11, 2021 at 2:38 PM Michael Sokolov <
>>>> [email protected]> wrote:
>>>> > >> >>
>>>> > >> >> This isn't a support forum; solr-users@ might be more
>>>> appropriate. On
>>>> > >> >> that list someone might have a better idea about how the
>>>> replication
>>>> > >> >> handler gets its list of files. This would be a good list
>to try
>>>> if
>>>> > >> >> you wanted to propose a fix for the problem you're having.
>But
>>>> since
>>>> > >> >> you're here -- it looks to me as if IndexWriter indeed
>syncs all
>>>> "new"
>>>> > >> >> files in the current segments being committed; look in
>>>> > >> >> IndexWriter.startCommit and SegmentInfos.files. Caveat: (1)
>I'm
>>>> > >> >> looking at this code for the first time, and (2) things may
>have
>>>> been
>>>> > >> >> different in 7.7.2? Sorry I don't know for sure, but are
>you
>>>> sure that
>>>> > >> >> your backup process is not attempting to copy one of the
>new
>>>> files?
>>>> > >> >>
>>>> > >> >> On Thu, Mar 11, 2021 at 1:35 PM Rahul Goswami <
>>>> [email protected]> wrote:
>>>> > >> >> >
>>>> > >> >> > Hello,
>>>> > >> >> > Just wanted to follow up one more time to see if this is
>the
>>>> right forum for my question? Or is this suitable for some other
>mailing list?
>>>> > >> >> >
>>>> > >> >> > Best,
>>>> > >> >> > Rahul
>>>> > >> >> >
>>>> > >> >> > On Sat, Mar 6, 2021 at 3:57 PM Rahul Goswami <
>>>> [email protected]> wrote:
>>>> > >> >> >>
>>>> > >> >> >> Hello everyone,
>>>> > >> >> >> Following up on my question in case anyone has any idea.
>Why
>>>> it's important to know this is because I am thinking of allowing
>the backup
>>>> process to not hold any lock on the index files, which should allow
>the
>>>> fsync during parallel commits. BUT, in case doing an fsync on
>existing
>>>> segment files in a saved commit point DOES have an effect, it might
>render
>>>> the backed up index in a corrupt state.
>>>> > >> >> >>
>>>> > >> >> >> Thanks,
>>>> > >> >> >> Rahul
>>>> > >> >> >>
>>>> > >> >> >> On Fri, Mar 5, 2021 at 3:04 PM Rahul Goswami <
>>>> [email protected]> wrote:
>>>> > >> >> >>>
>>>> > >> >> >>> Hello,
>>>> > >> >> >>> We have a process which backs up the index (Solr 7.7.2)
>on a
>>>> schedule. The way we do it is we first save a commit point on the
>index and
>>>> then using Solr's /replication handler, get the list of files in
>that
>>>> generation. After the backup completes, we release the commit point
>(Please
>>>> note that this is a separate backup process outside of Solr and not
>the
>>>> backup command of the /replication handler)
>>>> > >> >> >>> The assumption is that while the commit point is saved,
>no
>>>> changes happen to the segment files in the saved generation.
>>>> > >> >> >>>
>>>> > >> >> >>> Now the issue... The backup process opens the index
>files in
>>>> a shared READ mode, preventing writes. This is causing any parallel
>commits
>>>> to fail as it seems to be complaining about the index files to be
>locked by
>>>> another process(the backup process). Upon debugging, I see that
>fsync is
>>>> being called during commit on already existing segment files which
>is not
>>>> expected. So, my question is, is there any reason for lucene to
>call fsync
>>>> on already existing segment files?
>>>> > >> >> >>>
>>>> > >> >> >>> The line of code I am referring to is as below:
>>>> > >> >> >>> try (final FileChannel file = FileChannel.open(fileToSync,
>>>> > >> >> >>>     isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE))
>>>> > >> >> >>>
>>>> > >> >> >>> in method fsync(Path fileToSync, boolean isDir) of the class file
>>>> > >> >> >>>
>>>> > >> >> >>> lucene\core\src\java\org\apache\lucene\util\IOUtils.java
>>>> > >> >> >>>
>>>> > >> >> >>> Thanks,
>>>> > >> >> >>> Rahul
>>>> > >> >>
>>>> > >> >>
>>>>
>---------------------------------------------------------------------
>>>> > >> >> To unsubscribe, e-mail: [email protected]
>>>> > >> >> For additional commands, e-mail: [email protected]
>>>> > >> >>
>>>> > >>
>>>> > >>
>>>> > >>
>>>>
>>>>
>>>>
>>>>
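
For anyone following the thread, the fsync call Rahul quotes can be sketched roughly as below. This is a simplified illustration built around the quoted FileChannel.open line, not the actual IOUtils code: the class name FsyncSketch and the demo() helper are hypothetical, and the real method has additional error handling that is omitted here. The point is that a regular file is opened with WRITE access before it is forced to disk, even if its content has not changed, which is why a lock held by another process can make the commit fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical, simplified sketch; not the real Lucene IOUtils implementation.
final class FsyncSketch {

    /** Opens a directory with READ or a regular file with WRITE, then forces it to stable storage. */
    static void fsync(Path fileToSync, boolean isDir) throws IOException {
        try (FileChannel file = FileChannel.open(fileToSync,
                isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            // force(true) flushes both content and metadata; it does not modify the file.
            file.force(true);
        }
    }

    /** Writes a small temp file, syncs it, and returns its size afterwards. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("segment", ".tmp");
            try {
                Files.write(tmp, new byte[] {1, 2, 3});
                fsync(tmp, false);
                return Files.size(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("size after fsync: " + demo());
    }
}
```

The demo only syncs a regular file. Opening a directory through FileChannel is not supported on every platform, so the isDir branch is shown for completeness only.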

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
