[ https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266976#comment-17266976 ]
Jaison.Bi commented on LUCENE-9663:
-----------------------------------

Thanks for the comment, [~mikemccand]. I added a benchmark that compares the cost of building an OrdinalMap before and after the change. It uses the same data as in my previous comment, and each index contains 4 segments.

Index directory size:
||Before||After||
|6.23 GB|5.38 GB|

(I did not count the .dvd file size separately because the segments are stored in compound files.)

See the results below (Score is the average time per operation; lower is better):
||Benchmark||Mode||Cnt||Score||Error||Units||
|BuildOrdinalMapBenchmark.buildOrdinalMap_extend_After|avgt|15|2120.204|± 111.956|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_extend_Before|avgt|15|1217.172|± 57.555|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_host_After|avgt|15|4.775|± 0.260|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_host_Before|avgt|15|4.667|± 0.154|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_obj_After|avgt|15|670.785|± 52.170|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_obj_Before|avgt|15|557.300|± 80.592|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_reqid_After|avgt|15|876.092|± 112.798|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_reqid_Before|avgt|15|515.775|± 61.233|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_uploadtime_After|avgt|15|167.986|± 5.600|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_uploadtime_Before|avgt|15|162.752|± 1.934|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_url_After|avgt|15|667.657|± 18.655|ms/op|
|BuildOrdinalMapBenchmark.buildOrdinalMap_url_Before|avgt|15|524.013|± 27.244|ms/op|

> Adding compression to terms dict from SortedSet/Sorted DocValues
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9663
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9663
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Jaison.Bi
>            Priority: Trivial
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Elasticsearch keyword field uses SortedSet DocValues. In our applications,
> “keyword” is the most frequently used field type.
> LUCENE-7081 added prefix compression to the doc values terms dict. We can do
> better by replacing prefix compression with LZ4. In one of our applications,
> the .dvd files were ~41% smaller with this change (from 1.95 GB to 1.15 GB).
> I ran simple tests on real application data, comparing the write and merge
> time and the on-disk *.dvd file size (after merging into 1 segment).
> || ||Before||After||
> |Write time cost (ms)|591972|618200|
> |Merge time cost (ms)|270661|294663|
> |*.dvd file size (GB)|1.95|1.15|
> This feature is intended only for high-cardinality fields.
> I'm running a benchmark based on luceneutil and will attach the report and
> the patch when it is done.
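The before/after tables above are easier to compare as relative changes. The following is a minimal Java sketch that only reuses numbers copied from those tables; the class `RelativeChange` and its methods are hypothetical helpers, not part of Lucene or the patch.

```java
// Hypothetical helper: recomputes relative changes from the numbers in the tables above.
class RelativeChange {
    // Percentage change from 'before' to 'after'; negative means smaller (or faster).
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    // How many times larger 'after' is than 'before'; for ms/op, > 1 means slower.
    static double ratio(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Inputs copied from the tables; units cancel out in each ratio.
        System.out.printf("*.dvd file size:   %+.1f%%%n", percentChange(1.95, 1.15));
        System.out.printf("index dir size:    %+.1f%%%n", percentChange(6.23, 5.38));
        System.out.printf("write time:        %+.1f%%%n", percentChange(591972, 618200));
        System.out.printf("merge time:        %+.1f%%%n", percentChange(270661, 294663));
        System.out.printf("OrdinalMap extend: %.2fx%n", ratio(1217.172, 2120.204));
        System.out.printf("OrdinalMap reqid:  %.2fx%n", ratio(515.775, 876.092));
    }
}
```

For example, the *.dvd change works out to about -41%, matching the figure in the description, while building the OrdinalMap for `extend` takes roughly 1.74x as long after the change.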
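To make the difference between prefix compression and a general-purpose block codec concrete, here is a minimal Java sketch under simplifying assumptions. It is not Lucene's on-disk terms dict layout and not the layout used by the patch: the class and method names are hypothetical, lengths are stored as single bytes (so terms must be shorter than 256 bytes), and `java.util.zip.Deflater` is swapped in for LZ4 because the JDK has no LZ4 codec. Prefix coding only drops the leading bytes a term shares with the previous term, so a repeated tail such as `/detail.html` is still stored once per term; a block compressor can remove that redundancy as well, at the cost of decompressing a whole block when terms are read, which is one plausible source of the extra OrdinalMap build time reported above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; this is not Lucene's terms dict format.
class TermsDictCompressionSketch {
    // Prefix coding: each term becomes (sharedLen, suffixLen, suffix bytes) relative to
    // the previous term. One-byte lengths are a simplification: terms must be < 256 bytes.
    static byte[] prefixEncode(List<String> sortedTerms) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        for (String term : sortedTerms) {
            byte[] cur = term.getBytes(StandardCharsets.UTF_8);
            if (cur.length > 255) {
                throw new IllegalArgumentException("term too long for this sketch: " + term);
            }
            int shared = 0;
            int max = Math.min(prev.length, cur.length);
            while (shared < max && prev[shared] == cur[shared]) {
                shared++;
            }
            int suffixLen = cur.length - shared;
            out.write(shared);
            out.write(suffixLen);
            out.write(cur, shared, suffixLen);
            prev = cur;
        }
        return out.toByteArray();
    }

    // Inverse of prefixEncode: rebuilds each term from the previous term's prefix plus its suffix.
    static List<String> prefixDecode(byte[] data) {
        List<String> terms = new ArrayList<>();
        byte[] prev = new byte[0];
        int pos = 0;
        while (pos < data.length) {
            int shared = data[pos++] & 0xFF;
            int suffixLen = data[pos++] & 0xFF;
            byte[] cur = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, cur, 0, shared);
            System.arraycopy(data, pos, cur, shared, suffixLen);
            pos += suffixLen;
            terms.add(new String(cur, StandardCharsets.UTF_8));
            prev = cur;
        }
        return terms;
    }

    // Compresses a whole block with Deflater (a stand-in for LZ4), which can also
    // exploit substrings repeated inside the suffixes.
    static byte[] compressBlock(byte[] block) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompressBlock(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        try {
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0) throw new IllegalStateException("truncated block");
                off += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // 64 sorted URL-like terms; zero-padded numbers keep lexicographic order.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 64; i++) {
            terms.add(String.format(Locale.ROOT, "https://example.com/item/%05d/detail.html", i * 37));
        }
        byte[] prefixCoded = prefixEncode(terms);
        System.out.println("prefix-coded bytes: " + prefixCoded.length);
        System.out.println("deflated bytes:     " + compressBlock(prefixCoded).length);
    }
}
```

The printed sizes depend on the synthetic terms and on Deflater, so they say nothing precise about the .dvd savings reported in this issue; the sketch only shows where the two approaches find redundancy.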