Direct IO does in fact have a restriction, which Dawid just named
imprecisely: block alignment is required on disk, not in memory. In short: you
can't read or write a single byte at an arbitrary position in a file; you need
a buffering layer in between that takes care of alignment. NativeUnixDirectory
does this.
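
To illustrate the arithmetic such a buffering layer has to do, here is a
minimal sketch. AlignedIo and its methods are hypothetical names made up for
this example, not NativeUnixDirectory's actual code, and the 4096-byte block
size used below is only an assumption (the real value depends on the file
system; FileStore.getBlockSize() can report it on JDK 10+).

```java
// Hypothetical helper, not part of Lucene: alignment arithmetic for direct IO.
final class AlignedIo {
    private AlignedIo() {}

    /** Rounds pos down to the nearest multiple of blockSize (pos >= 0, blockSize > 0). */
    static long alignDown(long pos, int blockSize) {
        if (pos < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("pos must be >= 0 and blockSize > 0");
        }
        return pos - (pos % blockSize);
    }

    /** Rounds pos up to the nearest multiple of blockSize. */
    static long alignUp(long pos, int blockSize) {
        return alignDown(pos + blockSize - 1, blockSize);
    }

    /**
     * Plans a block-aligned read that covers len bytes starting at pos.
     * Returns {alignedStart, alignedLength, offsetInBuffer}: read alignedLength
     * bytes at alignedStart, then the caller's data begins at offsetInBuffer
     * inside that buffer.
     */
    static long[] readPlan(long pos, int len, int blockSize) {
        long start = alignDown(pos, blockSize);
        long end = alignUp(pos + len, blockSize);
        return new long[] {start, end - start, pos - start};
    }
}
```

With a 4096-byte block size, asking for the single byte at position 5000
means reading the whole block at 4096 and picking the byte at offset 904 in
the buffer. A real implementation also has to deal with writes and with the
partial block at the end of the file.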

Uwe

On September 18, 2019 1:54:35 PM UTC, Dawid Weiss
<dawid.we...@gmail.com> wrote:
>Thanks for the explanation, Mike!
>
>D.
>
>On Wed, Sep 18, 2019 at 3:21 PM Michael McCandless
><luc...@mikemccandless.com> wrote:
>>
>> Dawid, it's confusing: direct IO is different from a direct
>ByteBuffer!
>>
>> Direct IO means you bypass all kernel "smarts", so the Linux buffer
>cache is not used, no IO scheduling, no write cache that the pdflush
>daemon must periodically move to disk, etc.  This is normally a bad
>idea, and better to use fadvise/madvise to give kernel hints about what
>you are doing, and use the buffer cache for what it's good at.  Linus
>hates that direct IO is even an option for us ...
>>
>> Back when I wrote NativeUnixDirectory, the idea was to prevent
>ongoing merges from so heavily impacting ongoing searches, when you are
>doing indexing and searching on one node.  We open the newly merged
>segments files using direct IO, and do our own buffering, and then all
>writes go straight to disk instead of using up precious hot pages that
>are in use for searching.  I think I ran some simple performance tests
>back then but I don't remember the results ... more testing is needed
>to see if it really helps.
>>
>> At Amazon, we are using segment based replication every 60 seconds to
>copy newly indexed segments out to all searchers, so we never have
>nodes doing both indexing and searching, it's either or ... but, copying
>out max sized newly merged segments to the searchers is causing some
>thrashing so we are exploring using direct IO for those writes, and
>then separately warming the new segments after the copy.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Sep 17, 2019 at 1:16 PM Uwe Schindler <u...@thetaphi.de>
>wrote:
>>>
>>> We already discussed this at Berlin Buzzwords (Mike and Michael). Yes,
>it's possible and may work for merges, where block IO is possible. But
>most of us said: it's fine not to use the IO cache for merging, but then
>merging won't make pages hot. Merges are invisible to the OS, so you have
>to warm merged segments if you write directly. If you read directly while
>merging, you won't pollute the cache with one-time reads, but you also
>won't benefit from the cache if the data is already cached.
>>> We had better make a proposal for f/madvise. The JDK people are
>open to that, and I am a JDK committer now, so I can make a prototype.
>>>
>>> Uwe
>>>
>>> On September 17, 2019 4:48:26 PM UTC, Dawid Weiss
><dawid.we...@gmail.com> wrote:
>>>>
>>>> Isn't that restricted to aligned block-only access though? I can
>>>> imagine this would complicate the implementation if somebody wanted
>to
>>>> use it directly.
>>>>
>>>> Dawid
>>>>
>>>> On Tue, Sep 17, 2019 at 5:37 PM Michael McCandless
>>>> <luc...@mikemccandless.com> wrote:
>>>>>
>>>>>
>>>>>  Whoa!  That would be awesome -- no more JNI to use Direct I/O?
>>>>>  Looks like you use it like this:
>>>>>
>>>>>  FileChannel fc = FileChannel.open(p, StandardOpenOption.WRITE,
>>>>>                                    ExtendedOpenOption.DIRECT);
>>>>>
>>>>>  But it looks like you need to enable the jdk.unsupported module,
>added with http://openjdk.java.net/jeps/260
>>>>>
>>>>>  Mike McCandless
>>>>>
>>>>>  http://blog.mikemccandless.com
>>>>>
>>>>>
>>>>>  On Mon, Sep 16, 2019 at 11:55 AM Michael Sokolov
><msoko...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>  https://bugs.openjdk.java.net/browse/JDK-8189192 makes it appear
>that
>>>>>>  Direct I/O is (or may be?) available now in JDKs since JDK 10.
>Should
>>>>>>  we try using that API in NativeUnixDirectory in order to avoid
>JNI
>>>>>>  calls?
>>>>>> ________________________________
>>>>>>  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>  For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>>
>>>>
>>>
>>> --
>>> Uwe Schindler
>>> Achterdiek 19, 28357 Bremen
>>> https://www.thetaphi.de
>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
