Thanks a lot Uwe!!! Do we get any benefit from using MMapDirectory over
NIOFSDir during indexing? During merging? Is it OK to switch to
MMapDirectory for searching alone?

--
Kumaran R


On Nov 24, 2016 11:27 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
> Thanks Uwe!
>
>
>
>
> On Thu, Nov 24, 2016 at 9:41 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> > Hi Kumaran, hi Erick,
> >
> >> Not really, as I don't know that code well, Uwe and company
> >> are the masters of that realm ;)....
> >>
> >> Sorry I can't be more help there....
> >
> > I can help!
> >
> >> On Thu, Nov 24, 2016 at 7:29 AM, Kumaran Ramasubramanian
> >> <kums....@gmail.com> wrote:
> >> > Erick, thanks a lot for sharing an excellent post...
> >> >
> >> > Btw, I am using NIOFSDirectory. Could you please elaborate on the
> >> > lines mentioned below, or give any further pointers?
> >> >> NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our
> >> >> code has to do a lot of syscalls to the O/S kernel to copy blocks of
> >> >> data between the disk or filesystem cache and our buffers residing in
> >> >> Java heap. This needs to be done on every search request, over and
> >> >> over again.
> >
> > The blog post says it simply: you should use MMapDirectory and avoid
> > SimpleFSDir or NIOFSDir! The blog post explains why: SimpleFSDir and
> > NIOFSDir extend BufferedIndexInput. This class uses an on-heap buffer
> > (16 KiB) for reading index files. For some parts of the index, like doc
> > values, this is not ideal. For example, if you sort on a doc values field
> > and it needs to access a sort value (e.g. a short, integer or byte, which
> > is very small), it will ask the buffer for just those few bytes (e.g. 4).
> > When sorting, the buffer will usually not contain those bytes, because
> > sorting requires random access over a huge file, so the buffer is
> > unlikely to help. BufferedIndexInput then seeks the NIO/Simple file
> > pointer and reads 16 KiB into the buffer. This requires a syscall to the
> > OS kernel, which is expensive, and when sorting search results it can
> > happen millions or billions of times. In addition, it copies chunks of
> > memory between the Java heap and the operating system cache over and over.
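To make the refill cost concrete, here is a small stdlib-only Java sketch. `ToyBufferedInput` is a hypothetical class, not Lucene's BufferedIndexInput: it only imitates the pattern described above (a 16 KiB on-heap buffer, refilled with a positional `FileChannel.read` whenever the requested bytes are not already in it) and counts the refills. The temp file of ints standing in for a doc values file is likewise an assumption for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

public class BufferedRandomReads {

    /** Hypothetical toy reader imitating the refill-on-miss pattern; NOT Lucene's BufferedIndexInput. */
    static final class ToyBufferedInput implements AutoCloseable {
        static final int BUFFER_SIZE = 16 * 1024; // 16 KiB, the buffer size quoted in the thread
        private final FileChannel channel;
        private final ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE); // on-heap buffer
        private long bufferStart = -1; // file offset of buffer[0]; -1 means empty
        private int bufferLength = 0;  // number of valid bytes in the buffer
        long refills = 0;              // each refill issues at least one read() syscall

        ToyBufferedInput(Path path) throws IOException {
            channel = FileChannel.open(path, StandardOpenOption.READ);
        }

        /** Reads a big-endian int at an absolute file offset. */
        int readInt(long pos) throws IOException {
            boolean hit = bufferStart >= 0 && pos >= bufferStart && pos + 4 <= bufferStart + bufferLength;
            if (!hit) {
                refill(pos);
                if (bufferLength < 4) throw new EOFException("read past end of file at " + pos);
            }
            return buffer.getInt((int) (pos - bufferStart));
        }

        private void refill(long pos) throws IOException {
            buffer.clear();
            int n = 0;
            while (buffer.hasRemaining()) {
                int r = channel.read(buffer, pos + n); // positional read: kernel copies into the heap buffer
                if (r < 0) break;                      // end of file
                n += r;
            }
            bufferStart = pos;
            bufferLength = n;
            refills++;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    /** Writes 0, 1, 2, ... as big-endian ints; the file stands in for a doc values file. */
    static Path writeToyFile(int numValues) throws IOException {
        Path file = Files.createTempFile("toy-docvalues", ".bin");
        ByteBuffer data = ByteBuffer.allocate(numValues * 4);
        for (int i = 0; i < numValues; i++) data.putInt(i);
        data.flip();
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (data.hasRemaining()) out.write(data);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        int numValues = 4 * 1024 * 1024; // 16 MiB of ints
        Path file = writeToyFile(numValues);
        Random random = new Random(42);
        int lookups = 100_000;
        try (ToyBufferedInput in = new ToyBufferedInput(file)) {
            for (int i = 0; i < lookups; i++) {
                int doc = random.nextInt(numValues); // random access, as when sorting
                if (in.readInt(4L * doc) != doc) throw new AssertionError("bad value for doc " + doc);
            }
            System.out.printf("%d random lookups -> %d buffer refills%n", lookups, in.refills);
        } finally {
            Files.delete(file);
        }
    }
}
```

With 4-byte lookups spread randomly over a 16 MiB file, the 16 KiB buffer covers only about 0.1% of the file, so nearly every lookup misses and the refill count ends up close to the number of lookups. Each refill is a kernel call plus a copy into the Java heap.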
> >
> > With MMapDirectory no buffering is done: the Lucene code directly
> > accesses the file system cache, which is much more efficient.
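For comparison, a sketch of the same kind of random lookups through a memory mapping, using only `FileChannel.map` from the JDK. This is not Lucene's MMapDirectory code; it only shows that once a file is mapped, each lookup is a plain memory read (`getInt` at an absolute offset) with no per-lookup syscall and no copy into a heap buffer. It assumes the file is smaller than 2 GiB, since a single `MappedByteBuffer` is indexed by an `int`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

public class MappedRandomReads {

    /** Maps a whole file read-only; assumes it is smaller than 2 GiB (a single MappedByteBuffer). */
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        int numValues = 4 * 1024 * 1024; // 16 MiB of big-endian ints 0, 1, 2, ...
        Path file = Files.createTempFile("toy-docvalues", ".bin");
        file.toFile().deleteOnExit(); // best effort; Windows refuses to delete a still-mapped file
        ByteBuffer data = ByteBuffer.allocate(numValues * 4);
        for (int i = 0; i < numValues; i++) data.putInt(i);
        data.flip();
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (data.hasRemaining()) out.write(data);
        }

        MappedByteBuffer mapped = mapReadOnly(file);
        Random random = new Random(42);
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            int doc = random.nextInt(numValues);
            // A plain memory read: no read() syscall and no copy into a heap buffer.
            // If the page is not yet in the OS page cache, a page fault loads it.
            int value = mapped.getInt(doc * 4);
            if (value != doc) throw new AssertionError("bad value for doc " + doc);
            sum += value;
        }
        System.out.println("checksum of looked-up values: " + sum);
    }
}
```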
> >
> > So for fast index access:
> > - avoid SimpleFSDir or NIOFSDir (those are only there for legacy 32-bit
> >   operating systems and JVMs)
> > - configure your operating system kernel as described in the blog post
> >   and use MMapDirectory
> > - ask your sysadmin to learn how to read the output of Linux commands
> >   such as free and top (or their Windows counterparts).
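Lucene's `FSDirectory.open(Path)` already applies the first point: per its documentation it returns an MMapDirectory on most 64-bit JREs, SimpleFSDirectory on other Windows JREs, and NIOFSDirectory elsewhere. The plain-Java sketch below only mirrors that decision to show why bitness matters; `looksLike64BitJvm` and `suggestedDirectory` are hypothetical helpers, the bitness check is a heuristic based on the `sun.arch.data.model` and `os.arch` system properties, and Lucene's extra check that unmapping is supported is ignored.

```java
public class DirectoryChoice {

    /** Hypothetical heuristic: does this JVM look 64-bit? */
    static boolean looksLike64BitJvm() {
        String model = System.getProperty("sun.arch.data.model"); // HotSpot-specific, may be absent
        if ("64".equals(model)) return true;
        if ("32".equals(model)) return false;
        return System.getProperty("os.arch", "").contains("64");
    }

    /** Hypothetical helper mirroring the documented choice of FSDirectory.open (minus the unmap check). */
    static String suggestedDirectory(boolean is64BitJvm, boolean isWindows) {
        if (is64BitJvm) {
            return "MMapDirectory"; // plenty of virtual address space to map large index files
        }
        // A 32-bit process has a few GB of address space at most, too little to map a large index.
        return isWindows ? "SimpleFSDirectory" : "NIOFSDirectory";
    }

    public static void main(String[] args) {
        boolean is64 = looksLike64BitJvm();
        boolean windows = System.getProperty("os.name", "").startsWith("Windows");
        System.out.println("64-bit JVM: " + is64 + ", suggested: " + suggestedDirectory(is64, windows));
    }
}
```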
> >
> > Uwe
> >
> >> > --
> >> > Kumaran R
> >> >
> >> >
> >> >
> >> > On Wed, Nov 23, 2016 at 9:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >
> >> >> see Uwe's blog:
> >> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >> >>
> >> >> Short form: files are read into the OS's memory as needed. The whole
> >> >> file isn't read at once.
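This short form also answers the 4 GB question: mapping a file only reserves virtual address space, and the OS pages data in lazily as it is touched. Because a single Java `MappedByteBuffer` is indexed by an `int` (at most 2 GiB), a large file has to be mapped as several chunks; MMapDirectory does this internally (to my knowledge with 1 GiB chunks by default on 64-bit JVMs). `ChunkedMappedFile` below is a hypothetical toy illustrating that idea, not Lucene's implementation.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical toy, not Lucene's MMapDirectory: maps a file as several fixed-size chunks. */
public final class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final int chunkShift; // chunk size = 1 << chunkShift bytes
    private final long chunkMask;
    private final long length;

    public ChunkedMappedFile(Path path, int chunkShift) throws IOException {
        if (chunkShift < 1 || chunkShift > 30) {
            throw new IllegalArgumentException("chunkShift must be in [1, 30]");
        }
        this.chunkShift = chunkShift;
        this.chunkMask = (1L << chunkShift) - 1;
        long chunkSize = 1L << chunkShift;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            int n = (int) ((size + chunkSize - 1) >>> chunkShift);
            MappedByteBuffer[] mapped = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = (long) i * chunkSize;
                // Mapping reserves address space only; no file data is read here.
                mapped[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, Math.min(chunkSize, size - start));
            }
            this.length = size;
            this.chunks = mapped;
        }
    }

    public long length() {
        return length;
    }

    public byte readByte(long pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos " + pos);
        // The first touch of a page triggers a page fault, and the OS loads just that page.
        return chunks[(int) (pos >>> chunkShift)].get((int) (pos & chunkMask));
    }
}
```

With `chunkShift = 30`, a 4 GiB file becomes four 1 GiB mappings, and reading one byte near the end brings only the touched page into memory.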
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Wed, Nov 23, 2016 at 12:04 AM, Kumaran Ramasubramanian
> >> >> <kums....@gmail.com> wrote:
> >> >> > Hi All,
> >> >> >
> >> >> > How does Lucene read large index files?
> >> >> > For example, if one file (e.g. a .dat file) is 4 GB, does
> >> >> > Lucene read only part of the file into RAM? Or is the approach
> >> >> > different for different Lucene file formats?
> >> >> >
> >> >> >
> >> >> > Related Link:
> >> >> > How do applications (and OS) handle very big files?
> >> >> > http://superuser.com/a/361201
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Kumaran R
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>
> >> >>