[ https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Khludnev updated LUCENE-7863:
-------------------------------------
    Attachment: bench-byte-array-long.out

[^bench-byte-array-long.out] is the log of the long test, which evaluates a larger RAM buffer for derivative terms. Here is the summary:
* derivative terms are indexed 25% slower than EdgeNGrams (see below)
* they significantly reduce index size. In a typical case the gain would be bigger, since here we have multi-language docs, which make postings shorter
* derivative terms roughly double RAM consumption during indexing (see below)
* searching for derivative terms is 30..60% slower, since it requires gathering randomly distributed postings.

Indexing can be optimized by using BytesRefHash to collect the multivalue mapping: {code} EdgeNGram -> {postingOffset} {code}. It also allows appending to each EdgeNGram only the minimum number of bytes needed to make it a unique entry. Currently, every EdgeNGram is wastefully extended by 5 bytes.

> Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7863
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7863
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Mikhail Khludnev
>         Attachments: bench-byte-array2.out, benchmark-1m.out,
> LUCENE-7863.hazard, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch,
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch,
> LUCENE-7863.patch, LUCENE-7863.patch
>
>
> h2. Context
> \*suffix and \*infix\* searches on large indexes.
> h2. Problem
> Obviously applying {{ReversedWildcardFilter}} doubles the index size, and I'm
> shuddering to think about EdgeNGrams...
> h2. Proposal
> _DRY_
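
For illustration only, the multivalue mapping EdgeNGram -> {postingOffset} mentioned in the comment can be pictured with a small stand-alone sketch. It is not taken from the attached patches and does not use BytesRefHash: a plain java.util map stands in for it, and the class name DerivedTermMap, its method names, and the offsets are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; a HashMap-style stand-in for a BytesRefHash-backed structure. */
class DerivedTermMap {
    // edge n-gram -> offsets of the postings of the original terms it was derived from
    private final Map<String, List<Integer>> offsetsByGram = new LinkedHashMap<>();

    /** Registers every prefix of term with length >= minGram as pointing at postingOffset. */
    void addTerm(String term, int postingOffset, int minGram) {
        for (int len = minGram; len <= term.length(); len++) {
            String gram = term.substring(0, len);
            offsetsByGram.computeIfAbsent(gram, g -> new ArrayList<>()).add(postingOffset);
        }
    }

    /** Returns the posting offsets shared by all original terms that start with gram. */
    List<Integer> offsetsFor(String gram) {
        return offsetsByGram.getOrDefault(gram, Collections.emptyList());
    }

    public static void main(String[] args) {
        DerivedTermMap map = new DerivedTermMap();
        map.addTerm("lucene", 0, 2);   // assumed: postings of "lucene" start at offset 0
        map.addTerm("lucid", 128, 2);  // assumed: postings of "lucid" start at offset 128
        System.out.println(map.offsetsFor("luc"));   // [0, 128]
        System.out.println(map.offsetsFor("luce"));  // [0]
    }
}
```

The point of the sketch is that each derived term stores only references to postings that already exist, rather than a copy of them. That matches the trade-off reported above: a smaller index, but searches that must gather postings from scattered offsets.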