[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724955#comment-13724955 ]
Han Jiang commented on LUCENE-3069:
-----------------------------------

Performance results after the last patch (intersect) is applied.

On wiki 33M data, between TempFST (with intersect) and TempFSTOrd (with intersect):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                PKLookup      232.47      (1.0%)      205.28      (2.0%)  -11.7% ( -14% -   -8%)
                 Prefix3       26.93      (1.2%)       28.40      (1.4%)    5.5% (   2% -    8%)
                Wildcard        6.75      (2.1%)        7.37      (1.5%)    9.2% (   5% -   13%)
                  Fuzzy1       29.86      (1.8%)       51.87      (3.7%)   73.7% (  67% -   80%)
                  Fuzzy2       30.82      (1.6%)       53.82      (2.7%)   74.7% (  69% -   80%)
                 Respell       27.30      (1.2%)       49.55      (2.6%)   81.5% (  76% -   86%)
{noformat}

So the decoding of outputs is really what hurts the most.

And now we should start to compare it with trunk (base=Lucene41, comp=TempFSTOrd):

Hmm, I must have done something wrong on the wildcard query here.

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                Wildcard       19.21      (2.1%)        7.30      (0.3%)  -62.0% ( -63% -  -60%)
                 Prefix3       33.69      (1.2%)       28.18      (0.9%)  -16.4% ( -18% -  -14%)
                  Fuzzy1       61.59      (2.1%)       52.36      (0.8%)  -15.0% ( -17% -  -12%)
                  Fuzzy2       60.94      (1.0%)       54.15      (1.3%)  -11.1% ( -13% -   -8%)
                 Respell       54.21      (2.8%)       49.54      (1.2%)   -8.6% ( -12% -   -4%)
                PKLookup      148.40      (1.0%)      208.07      (3.6%)   40.2% (  35% -   45%)
{noformat}

I'll commit the current version so we can iterate on it.

> Lucene should have an entirely memory resident term dictionary
> --------------------------------------------------------------
>
>                 Key: LUCENE-3069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3069
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Simon Willnauer
>            Assignee: Han Jiang
>              Labels: gsoc2013
>             Fix For: 5.0, 4.5
>
>         Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch,
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch,
> LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a
> delta codec file for scanning to terms.
> Some environments have enough memory
> available to keep the entire FST based term dict in memory. We should add a
> TermDictionary implementation that encodes all needed information for each
> term into the FST (custom fst.Output) and builds a FST from the entire term
> not just the delta.
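
As a sanity check on the benchmark tables in the comment above, the central figure in the Pct diff column is consistent with the plain relative change in mean QPS from base to comp. The sketch below assumes that definition; the helper name pct_diff is made up for illustration, and it does not try to reproduce the range shown in parentheses.

```python
def pct_diff(qps_base: float, qps_comp: float) -> float:
    """Relative change of comp over base, in percent (assumed definition)."""
    return (qps_comp - qps_base) / qps_base * 100.0


# (task, QPS base, QPS comp) rows copied from the two tables in the comment.
rows = [
    ("PKLookup (TempFST -> TempFSTOrd)", 232.47, 205.28),
    ("Fuzzy1 (TempFST -> TempFSTOrd)", 29.86, 51.87),
    ("Wildcard (Lucene41 -> TempFSTOrd)", 19.21, 7.30),
    ("PKLookup (Lucene41 -> TempFSTOrd)", 148.40, 208.07),
]

for task, base, comp in rows:
    # One decimal place, matching the precision used in the tables.
    print(f"{task:<36} {pct_diff(base, comp):+6.1f}%")
```

Read this way, a negative value means comp is slower than base, so PKLookup slows down going from TempFST to TempFSTOrd but speeds up relative to Lucene41.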