RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index
Hi,

Lucene provides different storage options for in-memory indexes. I found three that would qualify for the task:

* RAMDirectory (which I currently use for prototyping, though I wonder whether it is the ideal choice for my task)
* MemoryIndex, which claims better performance and resource use for small documents
* MMapDirectory, which should outperform RAMDirectory for huge indices (what counts as "huge"?)

My plan is to periodically index some properties (string codes, longs, lat/lng points) of a larger database with Lucene, for quicker lookups than slow SQL queries allow.

What would be the most efficient (or intended) storage option for such an index in terms of lookup speed and CPU/memory use? Below [1] is a brief summary of the index contents. I hope these figures are sufficient for a recommendation, but I am also happy to study more detailed documentation on the matter.

- Matthias

[1]: Summary of index contents and intended use
* Total documents: 500,000 to 1,000,000; may grow to 10,000,000 records in the medium term.
* Document fields (all single-valued):
  * String (9x), usually 1-10 characters long, mostly recurring values (5% distinct)
  * LongPoint (4x): two fields contain mostly distinct values, one mostly recurring values (5-10% distinct), and one acts as a primary key
  * LatLonPoint (1x), 30% distinct
* Refresh interval: 1-5 minutes (I currently create a fresh index instance on each update and discard the old one)
* Most queries are range queries and exact matches on several properties; sometimes I need to retrieve the property fields of a single document by its primary-key value.
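For illustration, the field types listed in [1] could be indexed and queried roughly as below. This is a minimal sketch, not the poster's code: it assumes Lucene 9.5 or later (for `IndexSearcher.storedFields()`) with lucene-core on the classpath, and the class `SchemaSketch`, the field names `code`, `pk`, `value`, `loc` and the sample records are all hypothetical. The in-heap ByteBuffersDirectory is used only so the demo needs no folder; the schema and queries would be the same for any Directory.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LatLonPoint;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SchemaSketch {
    // Hypothetical record: one string code, the primary key, one long property, one location.
    static Document toDocument(String code, long pk, long value, double lat, double lon) {
        Document doc = new Document();
        // StringField: a single untokenized term, suited to exact matches on short codes.
        doc.add(new StringField("code", code, Field.Store.YES));
        // Points are indexed for exact/range queries but not stored,
        // so a StoredField with the same name keeps the value retrievable.
        doc.add(new LongPoint("pk", pk));
        doc.add(new StoredField("pk", pk));
        doc.add(new LongPoint("value", value));
        doc.add(new StoredField("value", value));
        doc.add(new LatLonPoint("loc", lat, lon));
        return doc;
    }

    static Directory buildIndex() throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            writer.addDocument(toDocument("A1", 1L, 10L, 52.52, 13.40));
            writer.addDocument(toDocument("A1", 2L, 25L, 48.14, 11.58));
            writer.addDocument(toDocument("B7", 3L, 30L, 50.11, 8.68));
        }
        return dir;
    }

    // Exact match on "code" AND an inclusive range on "value"; FILTER clauses skip scoring.
    static Query codeAndRange(String code, long min, long max) {
        return new BooleanQuery.Builder()
            .add(new TermQuery(new Term("code", code)), BooleanClause.Occur.FILTER)
            .add(LongPoint.newRangeQuery("value", min, max), BooleanClause.Occur.FILTER)
            .build();
    }

    public static void main(String[] args) throws IOException {
        try (Directory dir = buildIndex(); DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Only pk 2 has code A1 and a value within [20, 40].
            System.out.println(searcher.count(codeAndRange("A1", 20L, 40L))); // prints 1
            // Documents within 50 km of the first record's location.
            System.out.println(searcher.count(
                LatLonPoint.newDistanceQuery("loc", 52.52, 13.40, 50_000))); // prints 1
            // Primary-key lookup, then read the stored fields of the single hit.
            TopDocs top = searcher.search(LongPoint.newExactQuery("pk", 3L), 1);
            Document hit = searcher.storedFields().document(top.scoreDocs[0].doc);
            System.out.println(hit.get("code")); // prints B7
        }
    }
}
```

The design choice worth noting is the duplicated field name for longs: the point field serves the range and exact queries, while the StoredField is what a primary-key lookup reads back.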
Re: RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index
Use MMapDirectory on a temporary location, Matthias. If you really need in-memory indexes, a new Directory implementation is coming (RAMDirectory will be deprecated, then removed), but the difference compared to MMapDirectory is typically not worth the hassle. See this issue for more discussion:

https://issues.apache.org/jira/browse/LUCENE-8438

Dawid

On Tue, Sep 25, 2018 at 10:44 AM Matthias Müller wrote:
> [...]
--
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
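Following Dawid's suggestion, the "fresh index on every refresh" approach from the question could be combined with MMapDirectory roughly as sketched here: each refresh writes a new index into its own temporary folder, opens it, publishes it atomically, then closes and deletes the previous generation. This is an illustration only, assuming Lucene 9.x and Java 17+; the class `MMapRefresh`, its methods and the `pk`-only documents are hypothetical. As far as I know, the in-heap Directory that came out of LUCENE-8438 is ByteBuffersDirectory, which could stand in for MMapDirectory here (without the temp folder).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Stream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.MMapDirectory;

public class MMapRefresh implements AutoCloseable {
    // One immutable index generation: its temp folder, the mmap'ed directory and an open reader.
    record Snapshot(Path path, MMapDirectory dir, DirectoryReader reader) implements AutoCloseable {
        @Override
        public void close() throws IOException {
            reader.close();          // releases the mapped index files
            dir.close();
            deleteRecursively(path); // remove this generation's temp folder
        }
    }

    private final AtomicReference<Snapshot> current = new AtomicReference<>();

    // Hypothetical refresh: index the given primary keys into a brand-new temp folder,
    // publish it, then close and delete the previous generation.
    public void refresh(List<Long> primaryKeys) throws IOException {
        Path path = Files.createTempDirectory("lucene-snapshot-");
        MMapDirectory dir = new MMapDirectory(path);
        Snapshot fresh;
        try {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
                for (long pk : primaryKeys) {
                    Document doc = new Document();
                    doc.add(new LongPoint("pk", pk));
                    writer.addDocument(doc);
                }
            }
            fresh = new Snapshot(path, dir, DirectoryReader.open(dir));
        } catch (IOException | RuntimeException e) {
            dir.close();             // do not leak a half-built generation
            deleteRecursively(path);
            throw e;
        }
        Snapshot old = current.getAndSet(fresh);
        if (old != null) {
            old.close(); // assumes no search is still using the old reader
        }
    }

    // Exact-match lookup against whichever generation is currently published.
    public int countPk(long pk) throws IOException {
        Snapshot snap = current.get();
        if (snap == null) {
            throw new IllegalStateException("refresh() has not been called yet");
        }
        return new IndexSearcher(snap.reader()).count(LongPoint.newExactQuery("pk", pk));
    }

    @Override
    public void close() throws IOException {
        Snapshot snap = current.getAndSet(null);
        if (snap != null) {
            snap.close();
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order deletes files before the folder that contains them.
            for (Path p : paths.sorted(Comparator.reverseOrder()).toList()) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (MMapRefresh index = new MMapRefresh()) {
            index.refresh(List.of(1L, 2L, 3L));
            System.out.println(index.countPk(2L)); // prints 1
            index.refresh(List.of(4L, 5L));
            System.out.println(index.countPk(2L)); // prints 0: the old generation is gone
        }
    }
}
```

The sketch closes the old generation as soon as the new one is published, which is only safe when no search is still running against it. With concurrent searches, reference counting (`DirectoryReader.incRef()`/`decRef()`) or Lucene's `SearcherManager` would be the more robust choice.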
Re: RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index
Thanks Dawid, glad I asked!

On Tuesday, 25.09.2018, 10:46 +0200, Dawid Weiss wrote:
> [...]