Direct IO does in fact have the problem Dawid mentioned, it was just named imprecisely: block alignment is needed, on disk and not in memory. In short: you can't read or write a single byte at an arbitrary position in the file; you need a buffering layer in between that takes care of alignment. NativeUnixDirectory does this.
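To illustrate what "takes care of alignment" means, here is a minimal sketch of the arithmetic such a buffering layer has to do: widen an arbitrary byte range to whole blocks before issuing the read or write. This is not the NativeUnixDirectory code; the class name AlignedRange, its methods, and the 4096-byte block size are assumptions for illustration (the real block size depends on the file system and device).

```java
public class AlignedRange {
    /** Rounds pos down to the nearest multiple of blockSize. */
    public static long alignDown(long pos, int blockSize) {
        return pos - (pos % blockSize);
    }

    /** Rounds pos up to the nearest multiple of blockSize. */
    public static long alignUp(long pos, int blockSize) {
        return alignDown(pos + blockSize - 1, blockSize);
    }

    /**
     * Returns {alignedStart, alignedLength}: the smallest block-aligned
     * range that fully covers the requested bytes [pos, pos + len).
     */
    public static long[] cover(long pos, long len, int blockSize) {
        long start = alignDown(pos, blockSize);
        long end = alignUp(pos + len, blockSize);
        return new long[] {start, end - start};
    }

    public static void main(String[] args) {
        int blockSize = 4096; // assumed value, not queried from the file system
        // Asking for the single byte at offset 5000 forces a full-block transfer.
        long[] r = cover(5000, 1, blockSize);
        System.out.println("offset=" + r[0] + " length=" + r[1]);
    }
}
```

The buffering layer would then transfer the whole covering range and copy only the requested bytes in or out of its buffer; for a partial-block write that implies reading the block first, modifying it, and writing it back.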

Uwe

On September 18, 2019 1:54:35 PM UTC, Dawid Weiss <dawid.we...@gmail.com> wrote:
> Thanks for the explanation, Mike!
>
> D.
>
> On Wed, Sep 18, 2019 at 3:21 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>>
>> Dawid, it's confusing: direct IO is different from a direct ByteBuffer!
>>
>> Direct IO means you bypass all kernel "smarts", so the Linux buffer cache is not used, no IO scheduling, no write cache that the pdflush daemon must periodically move to disk, etc. This is normally a bad idea, and it is better to use fadvise/madvise to give the kernel hints about what you are doing, and use the buffer cache for what it's good at. Linus hates that direct IO is even an option for us ...
>>
>> Back when I wrote NativeUnixDirectory, the idea was to prevent ongoing merges from so heavily impacting ongoing searches, when you are doing indexing and searching on one node. We open the newly merged segment files using direct IO, and do our own buffering, and then all writes go straight to disk instead of using up precious hot pages that are in use for searching. I think I ran some simple performance tests back then but I don't remember the results ... more testing is needed to see if it really helps.
>>
>> At Amazon, we are using segment-based replication every 60 seconds to copy newly indexed segments out to all searchers, so we never have nodes doing both indexing and searching, it's either/or ... but copying out max-sized newly merged segments to the searchers is causing some thrashing, so we are exploring using direct IO for those writes, and then separately warming the new segments after the copy.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Sep 17, 2019 at 1:16 PM Uwe Schindler <u...@thetaphi.de> wrote:
>>>
>>> We discussed this already at Berlin Buzzwords (Mike and Michael). Yes, it's possible and may work for merges, where block IO is possible. But most of us said: it's fine not to use the IO cache for merging, but it won't make pages hot. Merges are then invisible to the OS, so you have to warm merged segments if you write directly. If you read directly while merging, you won't pollute the cache with one-time reads, but you also won't benefit from the cache if the data is already cached.
>>>
>>> We had better make a proposal for f/madvise. The JDK people are open to that, and I am a JDK committer now, so I can make a prototype.
>>>
>>> Uwe
>>>
>>> On September 17, 2019 4:48:26 PM UTC, Dawid Weiss <dawid.we...@gmail.com> wrote:
>>>>
>>>> Isn't that restricted to aligned block-only access though? I can
>>>> imagine this would complicate the implementation if somebody wanted to
>>>> use it directly.
>>>>
>>>> Dawid
>>>>
>>>> On Tue, Sep 17, 2019 at 5:37 PM Michael McCandless <luc...@mikemccandless.com> wrote:
>>>>>
>>>>> Whoa! That would be awesome -- no more JNI to use Direct I/O?
>>>>> Looks like you use it like this:
>>>>>
>>>>>     FileChannel fc = FileChannel.open(p, StandardOpenOption.WRITE,
>>>>>         ExtendedOpenOption.DIRECT);
>>>>>
>>>>> But it looks like you need to enable the jdk.unsupported module, added with http://openjdk.java.net/jeps/260
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Mon, Sep 16, 2019 at 11:55 AM Michael Sokolov <msoko...@gmail.com> wrote:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8189192 makes it appear that
>>>>>> Direct I/O is (or may be?) available now in JDKs since JDK 10. Should
>>>>>> we try using that API in NativeUnixDirectory in order to avoid JNI
>>>>>> calls?
>>>>>> ________________________________
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>> --
>>> Uwe Schindler
>>> Achterdiek 19, 28357 Bremen
>>> https://www.thetaphi.de