[ https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Khludnev updated LUCENE-7863:
-------------------------------------
    Attachment: benchmark-1m.out

I've run the benchmark on 1M wiki docs [^benchmark-1m.out]. It turns out that memory consumption for derived terms (or at least for the current implementation) is huge, so I couldn't even run the 4M benchmark on a 16 GB laptop. Therefore, using {{BytesRefHash}} is absolutely necessary (the current code is pretty dumb). Also, I've realized that terms are derived again on every merge, but I have no idea how to avoid it.

Here is the comparison on 1M wiki docs, with URL terms excluded:
||round||indexing, min||search, req/sec||RAM total, GB||index size, GB||
|EdgeNGram|25|81.04|1.9|6.3|
|derived edges|18|35.31|10.2|2.0|

So far, search results don't match side by side, but I'm not sure whether they are expected to match in the benchmark. A good randomized test is necessary (FWIW, the existing test actually tests nothing).

> Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7863
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7863
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Mikhail Khludnev
>         Attachments: benchmark-1m.out, LUCENE-7863.hazard, LUCENE-7863.patch,
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch,
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch
>
>
> h2. Context
> \*suffix and \*infix\* searches on large indexes.
> h2. Problem
> Obviously applying {{ReversedWildcardFilter}} doubles the index size, and I'm
> shuddering to think about EdgeNGrams...
> h2. Proposal
> _DRY_

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
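The DRY idea behind this issue can be illustrated with a toy, in-memory sketch in plain Java (no Lucene classes): each edge n-gram ("derived") term is stored once and points to the original terms it was derived from, and its postings are merged from theirs on demand instead of being copied. The class and method names ({{DerivedEdgesSketch}}, {{addTerm}}, {{derivedPostings}}) are hypothetical, the {{HashMap}} merely stands in for a unique-term hash such as {{BytesRefHash}}, and this is an assumption-laden illustration, not how the attached patch works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: derived (edge n-gram) terms
// reference their source terms instead of storing copies of their postings.
final class DerivedEdgesSketch {

    // Original terms -> sorted doc ids; the only postings actually stored.
    private final Map<String, int[]> postings = new TreeMap<>();

    // Each distinct derived term is kept once, mapped to its source terms.
    // The HashMap stands in for a unique-term hash such as BytesRefHash.
    private final Map<String, List<String>> derived = new HashMap<>();

    private final int minGram;

    DerivedEdgesSketch(int minGram) {
        this.minGram = minGram;
    }

    // Indexes an original term (call once per distinct term) and registers
    // its front edges of length minGram .. length-1 as derived terms.
    void addTerm(String term, int... sortedDocs) {
        postings.put(term, sortedDocs);
        for (int len = minGram; len < term.length(); len++) {
            derived.computeIfAbsent(term.substring(0, len), k -> new ArrayList<>()).add(term);
        }
    }

    // Postings of a derived term, merged on demand from its sources' postings.
    int[] derivedPostings(String edge) {
        TreeSet<Integer> union = new TreeSet<>();
        for (String source : derived.getOrDefault(edge, Collections.emptyList())) {
            for (int doc : postings.get(source)) {
                union.add(doc);
            }
        }
        return union.stream().mapToInt(Integer::intValue).toArray();
    }

    int derivedTermCount() {
        return derived.size();
    }

    // Postings entries actually stored (original terms only).
    int storedPostings() {
        return postings.values().stream().mapToInt(docs -> docs.length).sum();
    }

    // Postings entries an EdgeNGram-style field would store again, one copy per derived term.
    int copiedPostings() {
        int total = 0;
        for (String edge : derived.keySet()) {
            total += derivedPostings(edge).length;
        }
        return total;
    }

    public static void main(String[] args) {
        DerivedEdgesSketch index = new DerivedEdgesSketch(2);
        index.addTerm("lucene", 1, 3);
        index.addTerm("lucky", 2, 3);
        index.addTerm("lunar", 4);
        System.out.println(Arrays.toString(index.derivedPostings("lu")));  // [1, 2, 3, 4]
        System.out.println(Arrays.toString(index.derivedPostings("luc"))); // [1, 2, 3]
        System.out.println(index.storedPostings() + " stored vs "
                + index.copiedPostings() + " copied");                    // 5 stored vs 15 copied
    }
}
```

With three terms, the seven derived terms are served from 5 stored postings entries instead of 15 copied ones. The price in this toy model is a merge at query time plus the memory held by the derived-term map; it illustrates the trade-off only and is not an explanation of the benchmark numbers above.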