I know I'm late to this thread, but the mention of "reverse geocoding" caught my attention. I recently did this on a public project with Solr, which you may find of interest:
https://github.com/cga-harvard/hhypermap-bop/tree/master/enrich/solr-geo-admin
I'm super pleased with the performance.
~ David

On Wed, May 17, 2017 at 10:59 PM Tom Hirschfeld <tomhirschf...@gmail.com> wrote:

> Hey!
>
> I am working on a Lucene-based service for reverse geocoding. We have a
> large index with lots of unique terms (550 million), and it appears that
> we're running into memory issues on our leaf servers because the term
> dictionary for the entire index is being loaded into heap space. If we
> allocate >65g heap space, our queries return relatively quickly (10s-100s
> of ms), but if we drop below ~65g heap space on the leaf nodes, query
> performance drops dramatically, with query times quickly hitting 20+
> seconds (our test harness drops queries at 20s).
>
> I did some research and found that in past versions of Lucene, one could
> split the loading of the terms dictionary using the 'termInfosIndexDivisor'
> option in the DirectoryReader class. That option was deprecated in Lucene
> 5.0.0
> <https://abi-laboratory.pro/java/tracker/changelog/lucene/5.0.0/log.html>
> in favor of using codecs to achieve similar functionality. Looking at the
> available experimental codecs, I see the BlockTreeTermsWriter
> <https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html#BlockTreeTermsWriter(org.apache.lucene.index.SegmentWriteState,%20org.apache.lucene.codecs.PostingsWriterBase,%20int,%20int)>
> that seems like it could be used for a similar purpose, breaking down the
> term dictionary so that we don't load the whole thing into heap space.
>
> Has anyone run into this problem before and found an effective solution?
> Does changing the codec used seem appropriate for this issue? If so, how do
> I go about loading an alternative codec and configuring it to my needs?
> I'm having trouble finding docs/examples of how this is used in the real
> world, so even if you point me to a repo or docs somewhere I'd appreciate
> it.
>
> Thanks!
>
> Best,
> Tom Hirschfeld

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com