[ https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024522#comment-17024522 ]
ASF subversion and git services commented on LUCENE-4702: --------------------------------------------------------- Commit 666bdac64d68c3f247760d0a2a1c7a441502af1e in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=666bdac ] LUCENE-4702: CHANGES entry. > Terms dictionary compression > ---------------------------- > > Key: LUCENE-4702 > URL: https://issues.apache.org/jira/browse/LUCENE-4702 > Project: Lucene - Core > Issue Type: Wish > Reporter: Adrien Grand > Assignee: Adrien Grand > Priority: Trivial > Attachments: LUCENE-4702.patch, LUCENE-4702.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I've done a quick test with the block tree terms dictionary by replacing a > call to IndexOutput.writeBytes to write suffix bytes with a call to > LZ4.compressHC to test the peformance hit. Interestingly, search performance > was very good (see comparison table below) and the tim files were 14% smaller > (from 150432 bytes overall to 129516). > {noformat} > TaskQPS baseline StdDevQPS compressed StdDev > Pct diff > Fuzzy1 111.50 (2.0%) 78.78 (1.5%) > -29.4% ( -32% - -26%) > Fuzzy2 36.99 (2.7%) 28.59 (1.5%) > -22.7% ( -26% - -18%) > Respell 122.86 (2.1%) 103.89 (1.7%) > -15.4% ( -18% - -11%) > Wildcard 100.58 (4.3%) 94.42 (3.2%) > -6.1% ( -13% - 1%) > Prefix3 124.90 (5.7%) 122.67 (4.7%) > -1.8% ( -11% - 9%) > OrHighLow 169.87 (6.8%) 167.77 (8.0%) > -1.2% ( -15% - 14%) > LowTerm 1949.85 (4.5%) 1929.02 (3.4%) > -1.1% ( -8% - 7%) > AndHighLow 2011.95 (3.5%) 1991.85 (3.3%) > -1.0% ( -7% - 5%) > OrHighHigh 155.63 (6.7%) 154.12 (7.9%) > -1.0% ( -14% - 14%) > AndHighHigh 341.82 (1.2%) 339.49 (1.7%) > -0.7% ( -3% - 2%) > OrHighMed 217.55 (6.3%) 216.16 (7.1%) > -0.6% ( -13% - 13%) > IntNRQ 53.10 (10.9%) 52.90 (8.6%) > -0.4% ( -17% - 21%) > MedTerm 998.11 (3.8%) 994.82 (5.6%) > -0.3% ( -9% - 9%) > MedSpanNear 60.50 (3.7%) 60.36 (4.8%) > -0.2% ( -8% - 8%) > HighSpanNear 19.74 (4.5%) 19.72 (5.1%) > -0.1% ( -9% - 9%) > LowSpanNear 101.93 (3.2%) 101.82 (4.4%) > -0.1% ( -7% - 7%) > AndHighMed 366.18 (1.7%) 366.93 (1.7%) > 0.2% ( -3% - 3%) > PKLookup 237.28 (4.0%) 237.96 (4.2%) > 0.3% ( -7% - 8%) > MedPhrase 173.17 (4.7%) 174.69 (4.7%) > 0.9% ( -8% - 10%) > LowSloppyPhrase 180.91 (2.6%) 182.79 (2.7%) > 1.0% ( -4% - 6%) > LowPhrase 374.64 (5.5%) 379.11 (5.8%) > 1.2% ( -9% - 13%) > HighTerm 253.14 (7.9%) 256.97 (11.4%) > 1.5% ( -16% - 22%) > HighPhrase 19.52 (10.6%) 19.83 (11.0%) > 1.6% ( -18% - 25%) > MedSloppyPhrase 141.90 (2.6%) 144.11 (2.5%) > 1.6% ( -3% - 6%) > HighSloppyPhrase 25.26 (4.8%) 25.97 (5.0%) > 2.8% ( -6% - 13%) > {noformat} > Only queries which are very terms-dictionary-intensive got a performance hit > (Fuzzy, Fuzzy2, Respell, Wildcard), other queries including Prefix3 behaved > (surprisingly) well. > Do you think of it as something worth exploring? -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org