Robert,

Three exchanges ago in this thread, you made the incorrect assumption that the motivation behind using mmap was read speed, and that memory mapping was being waved around as some sort of magic wand:

    Is there something that I am missing? I see lots of references to using "memory mapped" files to "dramatically" improve performance. I don't think this is the case at all. At the lowest levels, it is somewhat more efficient from a CPU standpoint, but with a decent OS cache the IO performance difference is going to negligible.

In response, I indicated that the mmap design had been discussed in JIRA, and pointed you at a particular issue.

    There have been substantial discussions about this design in JIRA, notably LUCENE-1458. The "dramatic" improvement is WRT to opening/reopening an IndexReader.

Apparently, you did not go back to read that JIRA thread, because you subsequently offered a critique of a purely invented design you assumed we must have arrived at, and continued to argue with a straw man about read speed:

    1. with "fixed" size terms, the additional IO (larger pages) probably offsets a lot of the random access benefit. This is why "compressed" disks on a fast machine (CPU) are often faster than "uncompressed" - more data is read during every IO access.

While my reply did not specifically point back to LUCENE-1458 again, I hoped that having your foolish assumption exposed would motivate you to go back and read it, so that you could offer an informed critique of the *actual* design. I also linked to a specific comment in LUCENE-831 which explained how mmap applied to sort caches.

    Additionally, sort caches would be written at index time in three files, and memory mapped as laid out in <https://issues.apache.org/jira/browse/LUCENE-831?focusedCommentId=12656150#action_12656150>.

Apparently you still didn't go back and read up, because you subsequently made a third incorrect assumption, this time about plans to do away with the term dictionary index. In response I griped about JIRA again, using slightly stronger but still intentionally indirect language.

    No. That idea was entertained briefly and quickly discarded.

    There seems to be an awful lot of irrelevant noise in the current thread arising due to lack of familiarity with the ongoing discussions in JIRA.

Unfortunately, this must not have worked either, because you have now offered a fourth message based on incorrect assumptions which would have been remedied by bringing yourself up to date with the relevant JIRA threads.

> That could very well be, but I was referencing your statement:
>
> "1) Design index formats that can be memory mapped rather than slurped,
> bringing the cost of opening/reopening an IndexReader down to a
> negligible level."
>
> The only reason to do this (or have it happen) is if you perform a binary
> search on the term index.

No. As discussed in LUCENE-1458, LUCENE-1483, the specific link I pointed you towards in LUCENE-831, the message where I provided you with that link, and elsewhere in this thread... loading the term dictionary index is important, but the cost pales in comparison to the cost of loading sort caches.

> Using a 2 file system is going to be WAY slower - I'll bet lunch. It might be
> workable if the files were on a striped drive, or put each file on a different
> drive/controller, but requiring such specially configured hardware is not a
> good idea. In the common case (single drive), you are going to be seeking all
> over the place.

Mike McCandless and I had an extensive debate about the pros and cons of depending on the OS cache to hold the term dictionary index under LUCENE-1458. The concerns you express here were fully addressed, and even resolved under an "agree to disagree" design.

> Also, the mmap is only suitable for 64 bit platforms, since there is no way
> in Java to unmap, you are going to run out of address space as segments are
> rewritten.

The discussion of how the mmap design translates from Lucy to Lucene is an important one, but I despair of having it if we have to rehash all of LUCENE-1458, LUCENE-831, and possibly LUCENE-1476 and LUCENE-1483 because you cannot be troubled to bring yourself up to speed before commenting.

You are obviously knowledgeable on the subject of low level memory issues. Me and Mike McCandless ain't exactly chopped liver, though, and neither are a lot of other people around here who *are* bothering to keep up with the threads in JIRA. I request that you show the rest of us more respect. Our time is valuable, too.

Marvin Humphrey