Re: how do lucene read large index files?

Kumaran Ramasubramanian Tue, 29 Nov 2016 20:02:53 -0800

Thanks Mike. We are planning to move to MMapDirectory for both indexing and
searching. Regarding the ulimit change and reads during merging, I was just
trying to understand the impact of MMapDirectory during indexing.


-
Kumaran R


On Nov 30, 2016 4:18 AM, "Michael McCandless" <luc...@mikemccandless.com>
wrote:
>
> It's OK to use NIOFSDirectory for indexing only, in that nothing will break.
>
> But MMapDirectory already uses normal IO for writing
> (java.io.FileOutputStream), and indexing does sometimes need to
> read (for merging segments), though that's largely sequential reading,
> so perhaps NIOFSDirectory won't be much slower.
>
> Why not use MMapDirectory for both indexing and searching?
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 28, 2016 at 7:20 AM, Kumaran Ramasubramanian
> <kums....@gmail.com> wrote:
> > Thanks a lot Uwe!!! Do we get any benefit on using MMapDirectory over
> > NIOFSDir during indexing? During merging? Is it ok to change to
> > MMapDirectory during search alone?
> >
> > --
> > Kumaran R
> >
> >
> > On Nov 24, 2016 11:27 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> >>
> >> Thanks Uwe!
> >>
> >>
> >>
> >>
> >> On Thu, Nov 24, 2016 at 9:41 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >> > Hi Kumaran, hi Erick,
> >> >
> >> >> Not really, as I don't know that code well, Uwe and company
> >> >> are the masters of that realm ;)....
> >> >>
> >> >> Sorry I can't be more help there....
> >> >
> >> > I can help!
> >> >
> >> >> On Thu, Nov 24, 2016 at 7:29 AM, Kumaran Ramasubramanian
> >> >> <kums....@gmail.com> wrote:
> >> >> > Erick, Thanks a lot for sharing an excellent post...
> >> >> >
> >> >> > Btw, I am using NIOFSDirectory. Could you please elaborate on the
> >> >> > lines quoted below, or share any further pointers?
> >> >> >
> >> >> >> NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our
> >> >> >> code has to do a lot of syscalls to the O/S kernel to copy blocks of
> >> >> >> data between the disk or filesystem cache and our buffers residing in
> >> >> >> Java heap. This needs to be done on every search request, over and
> >> >> >> over again.
> >> >
> >> > The blog post puts it simply: you should use MMapDirectory and avoid
> >> > SimpleFSDir or NIOFSDir! The blog post explains why: SimpleFSDir and
> >> > NIOFSDir extend BufferedIndexInput. This class uses an on-heap buffer
> >> > (16 KiB) for reading index files. For some parts of the index (like
> >> > doc values), this is not ideal. E.g. if you sort against a doc values
> >> > field and it needs to access a sort value (e.g. a short, integer or
> >> > byte, which is very small), it will ask the buffer for just those few
> >> > bytes (say, 4). In most cases when sorting, the buffer will not contain
> >> > those bytes, as sorting requires random access over a huge file (so it
> >> > is unlikely that the buffer will help). BufferedIndexInput will then
> >> > seek the NIO/Simple file pointer and read 16 KiB into the buffer. This
> >> > requires a syscall to the OS kernel, which is expensive. While sorting
> >> > search results this can happen millions or billions of times. In
> >> > addition, it will copy chunks of memory between the Java heap and the
> >> > operating system cache over and over.
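
To make the cost described above concrete, here is a minimal sketch in plain JDK code. It is not Lucene's BufferedIndexInput; the class name BufferedReadSketch, its helper methods and the file layout (a flat file of big-endian ints) are all assumptions made for illustration. It keeps a 16 KiB on-heap buffer and counts how often a 4-byte read misses the buffer and forces a refill from the file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not Lucene code. */
public class BufferedReadSketch {
    static final int BUFFER_SIZE = 16 * 1024; // buffer size quoted in the thread

    private final FileChannel channel;
    private final ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE); // on-heap
    private long bufferStart = -1; // file offset of buffer[0]; -1 means empty
    private int bufferLength = 0;  // number of valid bytes in the buffer
    long refills = 0;              // each refill costs at least one read call

    BufferedReadSketch(FileChannel channel) {
        this.channel = channel;
    }

    /** Reads a 4-byte int at the given file offset, refilling on a miss. */
    int readInt(long pos) throws IOException {
        boolean hit = bufferStart >= 0 && pos >= bufferStart
                && pos + Integer.BYTES <= bufferStart + bufferLength;
        if (!hit) {
            refill(pos);
        }
        return buffer.getInt((int) (pos - bufferStart));
    }

    /** Copies up to 16 KiB from the file into the heap buffer. */
    private void refill(long pos) throws IOException {
        buffer.clear();
        long filePos = pos;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, filePos);
            if (n < 0) break; // end of file
            filePos += n;
        }
        bufferStart = pos;
        bufferLength = buffer.position();
        refills++;
        if (bufferLength < Integer.BYTES) {
            throw new EOFException("read past end of file at offset " + pos);
        }
    }

    /** Writes the ints 0..count-1 to a temporary file. */
    static Path writeInts(int count) {
        try {
            Path file = Files.createTempFile("buffered-sketch", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer data = ByteBuffer.allocate(count * Integer.BYTES);
            for (int i = 0; i < count; i++) data.putInt(i);
            Files.write(file, data.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads `count` ints starting at startIndex with the given stride; returns the refill count. */
    static long countRefills(Path file, int startIndex, int stride, int count) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            BufferedReadSketch in = new BufferedReadSketch(ch);
            for (int i = 0; i < count; i++) {
                long index = startIndex + (long) i * stride;
                int value = in.readInt(index * Integer.BYTES);
                if (value != index) throw new IllegalStateException("bad value at " + index);
            }
            return in.refills;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeInts(100_000); // 400,000 bytes
        System.out.println("sequential refills: " + countRefills(file, 0, 1, 8192));
        System.out.println("strided refills:    " + countRefills(file, 0, 5000, 20));
    }
}
```

In this sketch, 8192 sequential reads need only 2 refills, while 20 reads spaced 20,000 bytes apart need 20 refills, i.e. 16 KiB is copied to the heap for every 4 bytes actually used. That is the access pattern the paragraph above attributes to sorting on doc values.
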
> >> >
> >> > With MMapDirectory no buffering is done; the Lucene code directly
> >> > accesses the file system cache, which is much more efficient.
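
As a rough illustration of the memory-mapped approach, the sketch below uses the JDK's FileChannel.map directly. It is not Lucene's MMapDirectory, which is more elaborate; the class name MmapReadSketch, its helpers and the file layout (a flat file of big-endian ints) are assumptions made for this example. After the file is mapped, each getInt is a plain memory access against the mapping, with no explicit read call and no copy into a heap buffer; the OS pages data in as it is touched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not Lucene code. */
public class MmapReadSketch {

    /** Writes the ints 0..count-1 to a temporary file. */
    static Path writeInts(int count) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer data = ByteBuffer.allocate(count * Integer.BYTES);
            for (int i = 0; i < count; i++) data.putInt(i);
            Files.write(file, data.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the whole file read-only and sums `count` ints at the given stride. */
    static long sumMapped(Path file, int startIndex, int stride, int count) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // One map call up front. A single MappedByteBuffer cannot exceed
            // Integer.MAX_VALUE bytes, so very large files would need several mappings.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            for (int i = 0; i < count; i++) {
                int index = startIndex + i * stride;
                // Absolute read straight from the mapped region.
                sum += map.getInt(index * Integer.BYTES);
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeInts(100_000); // 400,000 bytes
        // Touches 20 widely spaced values; only the pages holding them are needed.
        System.out.println("strided sum: " + sumMapped(file, 0, 5000, 20));
    }
}
```

Only the pages that are actually touched have to be resident, which matches Erick's short form elsewhere in this thread: the file is read into the OS's memory as needed, not all at once.
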
> >> >
> >> > So for fast index access:
> >> > - avoid SimpleFSDir or NIOFSDir (those are only there for legacy
> >> >   32-bit operating systems and JVMs)
> >> > - configure your operating system kernel as described in the blog
> >> >   post and use MMapDirectory
> >> > - ask your sysadmin to learn how to interpret the output of Linux
> >> >   commands such as free/top/... (or their Windows equivalents).
> >> >
> >> > Uwe
> >> >
> >> >> > --
> >> >> > Kumaran R
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Nov 23, 2016 at 9:17 PM, Erick Erickson
> >> >> <erickerick...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> see Uwe's blog:
> >> >> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >> >> >>
> >> >> >> Short form: files are read into the OS's memory as needed. The
> >> >> >> whole file isn't read at once.
> >> >> >>
> >> >> >> Best,
> >> >> >> Erick
> >> >> >>
> >> >> >> On Wed, Nov 23, 2016 at 12:04 AM, Kumaran Ramasubramanian
> >> >> >> <kums....@gmail.com> wrote:
> >> >> >> > Hi All,
> >> >> >> >
> >> >> >> > How does Lucene read large index files?
> >> >> >> > For example, if one file (e.g. a .dat file) is 4 GB,
> >> >> >> > does Lucene read only part of the file into RAM? Or
> >> >> >> > is the approach different for different Lucene file formats?
> >> >> >> >
> >> >> >> >
> >> >> >> > Related Link:
> >> >> >> > How do applications (and OS) handle very big files?
> >> >> >> > http://superuser.com/a/361201
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Kumaran R
> >> >> >>
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >> >>
> >> >> >>
