Thanks Mike. We are planning to move to MMapDirectory for both indexing and searching. Regarding the ulimit change and the reads during merging, I just wanted to understand the impact of MMapDirectory during indexing.
- Kumaran R

On Nov 30, 2016 4:18 AM, "Michael McCandless" <luc...@mikemccandless.com> wrote:
>
> It's OK to use NIOFSDirectory for indexing only, in that nothing will break.
>
> But MMapDirectory already uses normal IO for writing
> (java.io.FileOutputStream), and indexing does sometimes need to
> read (for merging segments), though that's largely sequential reading,
> so perhaps NIOFSDirectory won't be much slower.
>
> Why not use MMapDirectory for both indexing and searching?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 28, 2016 at 7:20 AM, Kumaran Ramasubramanian
> <kums....@gmail.com> wrote:
> > Thanks a lot Uwe!!! Do we get any benefit from using MMapDirectory over
> > NIOFSDir during indexing? During merging? Is it ok to change to
> > MMapDirectory during search alone?
> >
> > --
> > Kumaran R
> >
> >
> > On Nov 24, 2016 11:27 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> >>
> >> Thanks Uwe!
> >>
> >>
> >> On Thu, Nov 24, 2016 at 9:41 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >> > Hi Kumaran, hi Erick,
> >> >
> >> >> Not really, as I don't know that code well, Uwe and company
> >> >> are the masters of that realm ;)....
> >> >>
> >> >> Sorry I can't be more help there....
> >> >
> >> > I can help!
> >> >
> >> >> On Thu, Nov 24, 2016 at 7:29 AM, Kumaran Ramasubramanian
> >> >> <kums....@gmail.com> wrote:
> >> >> > Erick, Thanks a lot for sharing an excellent post...
> >> >> >
> >> >> > Btw, am using NIOFSDirectory, could you please elaborate on below
> >> >> > mentioned lines? or any further pointers?
> >> >> >
> >> >> > NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our code
> >> >> >> has to do a lot of syscalls to the O/S kernel to copy blocks of data
> >> >> >> between the disk or filesystem cache and our buffers residing in Java heap.
> >> >> >> This needs to be done on every search request, over and over again.
> >> >
> >> > the blog post puts it simply: You should use MMapDirectory and
> >> > avoid SimpleFSDir or NIOFSDir! The blog post explains why: SimpleFSDir
> >> > and NIOFSDir extend BufferedIndexInput. This class uses an on-heap buffer
> >> > for reading index files (which is 16 KiB). For some parts of the index (like
> >> > doc values), this is not ideal. E.g. if you sort against a doc values field
> >> > and it needs to access a sort value (e.g. a short, integer or byte, which
> >> > is very small), it will ask the buffer for something like 4 bytes. In most
> >> > cases when sorting, the buffer will not contain those bytes, as sorting
> >> > requires random access over a huge file (so it is unlikely that the buffer
> >> > will help). Then BufferedIndexInput will seek the NIO/Simple file pointer and
> >> > read 16 KiB into the buffer. This requires a syscall to the OS kernel,
> >> > which is expensive. While sorting search results this can happen millions or
> >> > billions of times. In addition it will copy chunks of memory between Java
> >> > heap and operating system cache over and over.
> >> >
> >> > With MMapDirectory no buffering is done, the Lucene code directly
> >> > accesses the file system cache and this is much more optimized.
> >> >
> >> > So for fast index access:
> >> > - avoid SimpleFSDir or NIOFSDir (those are only there for legacy 32 bit
> >> >   operating systems and JVMs)
> >> > - configure your operating system kernel as described in the blog post
> >> >   and use MMapDirectory
> >> > - ask the sysadmin to get familiar with the output of Linux
> >> >   commands such as free/top/... (or their Windows equivalents).
> >> >
> >> > Uwe
> >> >
> >> >> > --
> >> >> > Kumaran R
> >> >> >
> >> >> >
> >> >> > On Wed, Nov 23, 2016 at 9:17 PM, Erick Erickson
> >> >> > <erickerick...@gmail.com> wrote:
> >> >> >
> >> >> >> see Uwe's blog:
> >> >> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >> >> >>
> >> >> >> Short form: files are read into the OS's memory as needed. The whole
> >> >> >> file isn't read at once.
> >> >> >>
> >> >> >> Best,
> >> >> >> Erick
> >> >> >>
> >> >> >> On Wed, Nov 23, 2016 at 12:04 AM, Kumaran Ramasubramanian
> >> >> >> <kums....@gmail.com> wrote:
> >> >> >> > Hi All,
> >> >> >> >
> >> >> >> > How does Lucene read large index files?
> >> >> >> > For example, if one file (e.g. a .dat file) is 4 GB,
> >> >> >> > does Lucene read only part of the file into RAM? Or
> >> >> >> > is there a different approach for different Lucene file formats?
> >> >> >> >
> >> >> >> >
> >> >> >> > Related Link:
> >> >> >> > How do applications (and OS) handle very big files?
> >> >> >> > http://superuser.com/a/361201
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Kumaran R
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
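
Note on Uwe's explanation quoted in the thread: the two access patterns he contrasts can be sketched with plain java.nio, without Lucene. This is only an illustration under stated assumptions, not Lucene's actual code. The class RandomAccessSketch, its two methods and the 4 MiB random test file are hypothetical, and the 16 KiB buffer size is simply the figure mentioned in the thread. sumBuffered refills an on-heap buffer with a positional FileChannel read whenever a requested offset falls outside it (roughly the buffered pattern described for SimpleFSDir/NIOFSDir), while sumMapped reads the same 4-byte values straight from a memory-mapped view of the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

public class RandomAccessSketch {
    // Assumption: buffer size taken from the figure quoted in the thread.
    static final int BUFFER_SIZE = 16 * 1024;

    /**
     * Buffered style: keep one on-heap buffer and refill it from the channel
     * whenever the requested 4 bytes are not inside it.
     * Returns {sum of the ints read, number of buffer refills}.
     */
    static long[] sumBuffered(Path file, long[] offsets) throws IOException {
        long sum = 0;
        long refills = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(BUFFER_SIZE);
            long bufStart = -1; // file offset of buf[0]; -1 means "empty"
            int bufLen = 0;     // number of valid bytes in buf
            for (long off : offsets) {
                boolean miss = bufStart < 0 || off < bufStart
                        || off + 4 > bufStart + bufLen;
                if (miss) {
                    buf.clear();
                    // Positional read: goes to the OS and copies into the heap.
                    while (buf.hasRemaining()) {
                        int n = ch.read(buf, off + buf.position());
                        if (n <= 0) break; // end of file
                    }
                    bufStart = off;
                    bufLen = buf.position();
                    refills++;
                }
                sum += buf.getInt((int) (off - bufStart));
            }
        }
        return new long[] {sum, refills};
    }

    /**
     * Mapped style: map the file once and read values directly from the
     * mapping. Assumes the file is smaller than 2 GiB (one mapping).
     */
    static long sumMapped(Path file, long[] offsets) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            for (long off : offsets) {
                sum += map.getInt((int) off);
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test data: 4 MiB of random bytes in a temp file.
        Path file = Files.createTempFile("sketch", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[4 * 1024 * 1024];
        Random rnd = new Random(42);
        rnd.nextBytes(data);
        Files.write(file, data);

        // Random 4-byte lookups, similar in spirit to sorting on a doc values field.
        long[] offsets = new long[100_000];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = rnd.nextInt(data.length - 4);
        }

        long[] buffered = sumBuffered(file, offsets);
        long mapped = sumMapped(file, offsets);
        System.out.println("same result: " + (buffered[0] == mapped));
        System.out.println("buffer refills: " + buffered[1]
                + " for " + offsets.length + " lookups");
    }
}
```

Both methods return the same sum. The refill counter gives a rough feel for how often random lookups miss a small heap buffer, each miss costing a read call plus a copy into the Java heap. Actual timings depend on the OS, the file system cache and the JVM, so the sketch deliberately prints no timings.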