[ https://issues.apache.org/jira/browse/LUCENE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162392#comment-13162392 ]
Robert Muir commented on LUCENE-3584:
-------------------------------------

Yonik, where is the code for your benchmark? I don't trust it. Hotspot likes to change how it compiles readVInt, so be sure to use lots of JVM iterations. I tested this change with luceneutil (lots of iterations; it takes an hour to run) and everything was the same, with disjunction queries looking better every time I ran it. I think everything is just fine. (Sketches of the vInt decode, a delta decode, and a bare-bones DISI implementation appear after the quoted issue below.)

||Task||QPS trunk||StdDev trunk||QPS patch||StdDev patch||Pct diff||
|IntNRQ|10.44|0.69|9.80|0.88|-19% - 9%|
|Wildcard|24.93|0.41|24.23|0.44|-6% - 0%|
|Prefix3|48.83|1.14|47.45|1.09|-7% - 1%|
|TermBGroup1M1P|43.29|1.08|42.28|1.31|-7% - 3%|
|PKLookup|187.88|4.49|186.43|5.07|-5% - 4%|
|AndHighHigh|15.10|0.25|14.99|0.54|-5% - 4%|
|SpanNear|15.96|0.43|15.87|0.43|-5% - 4%|
|TermBGroup1M|32.30|0.87|32.14|0.64|-4% - 4%|
|SloppyPhrase|14.53|0.50|14.47|0.55|-7% - 7%|
|TermGroup1M|24.07|0.54|24.01|0.48|-4% - 4%|
|Respell|87.11|3.74|86.91|4.05|-8% - 9%|
|Fuzzy1|94.79|3.18|94.58|4.05|-7% - 7%|
|Fuzzy2|48.13|1.92|48.10|2.45|-8% - 9%|
|Phrase|9.10|0.41|9.11|0.41|-8% - 9%|
|Term|135.52|4.74|137.26|2.91|-4% - 7%|
|AndHighMed|51.64|0.92|53.20|1.90|-2% - 8%|
|OrHighHigh|10.75|0.62|11.79|0.60|-1% - 22%|
|OrHighMed|12.20|0.75|13.40|0.71|-1% - 23%|

> bulk postings should be codec private
> -------------------------------------
>
>          Key: LUCENE-3584
>          URL: https://issues.apache.org/jira/browse/LUCENE-3584
>      Project: Lucene - Java
>   Issue Type: Task
>     Reporter: Robert Muir
>     Assignee: Robert Muir
>      Fix For: 4.0
>
>  Attachments: LUCENE-3584.patch
>
>
> In LUCENE-2723, a lot of work was done to speed up Lucene's bulk postings read API.
> There were some upsides:
> * You could specify things like "I don't care about frequency data" up front. This made things like multitermquery->filter and other consumers that don't care about freqs faster. But this is unrelated to 'bulkness', and we now have a separate patch for it on LUCENE-2929.
> * The buffer size for the standard codec was increased to 128, improving performance for TermQueries, but this was unrelated too.
> But there were serious downsides/nocommits:
> * The API was hairy because it tried to be one-size-fits-all. This made consumer code crazy.
> * The API could not really be specialized to your codec: e.g. it could never take advantage of the fact that docs and freqs are aligned.
> * The API forced codecs to implement delta encoding for things like documents and positions. But how to encode is totally up to the codec! Some codecs might not use delta encoding.
> * Using such an API for positions was only theoretical; it would have been super complicated, and I doubt it would ever have been performant or maintainable.
> * There was a regression with advance(), probably because the API forced you to do both a linear scan through the remaining buffer, then a refill...
> I think a cleaner approach is to let codecs do whatever they want in order to implement the DISI contract. This gives codecs the freedom to implement whatever compression/buffering they want for the best performance, and keeps consumers simple. If a codec uses delta encoding, or if it wants to defer it to the last possible minute or do it at decode time, that's its own business. Maybe a codec doesn't want to do any buffering at all.
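
As promised above, first the vInt decode: this tiny loop sits on the hot path of every query, which is why hotspot's compilation choices for it can dominate short benchmarks. A minimal, self-contained sketch in the style of Lucene's DataInput.readVInt(), not the exact trunk code:

{code:java}
// Hypothetical, self-contained sketch of vInt decoding in the style of
// Lucene's DataInput.readVInt(); not the exact trunk implementation.
final class VIntReader {
  private final byte[] buf;
  private int pos;

  VIntReader(byte[] buf) { this.buf = buf; }

  // Each byte carries 7 payload bits; the high bit means "more bytes follow".
  int readVInt() {
    byte b = buf[pos++];
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = buf[pos++];
      value |= (b & 0x7F) << shift;
    }
    return value;
  }
}
{code}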
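
Second, the delta-encoding point from the quoted description: a codec typically stores each doc ID as the gap from the previous one, but nothing forces that choice on every codec. A sketch reusing the hypothetical VIntReader above; decodeDocDeltas is an illustrative name, not a trunk API:

{code:java}
// Hypothetical sketch: turning vInt-encoded doc-ID gaps back into absolute IDs.
// Whether, when, and how to do this should be the codec's own business.
final class DeltaDecode {
  static int[] decodeDocDeltas(VIntReader in, int count) {
    int[] docs = new int[count];
    int doc = 0;
    for (int i = 0; i < count; i++) {
      doc += in.readVInt(); // the stored value is the gap from the previous doc
      docs[i] = doc;        // the absolute doc ID a consumer actually wants
    }
    return docs;
  }
}
{code}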
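
Finally, the DISI contract itself: DocIdSetIterator only asks for docID()/nextDoc()/advance(), so a codec can hide whatever buffering or compression it likes behind those three methods. A bare-bones skeleton, assuming the 4.0-era DocIdSetIterator signatures and a pre-decoded doc array; a real codec would decode incrementally and consult skip data in advance():

{code:java}
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

// Minimal skeleton of a codec-private iterator. The codec is free to buffer,
// delta-decode, or skip however it likes, as long as the DISI contract holds.
final class SketchDocsIterator extends DocIdSetIterator {
  private final int[] docs; // pretend decodeDocDeltas() filled this
  private int i = -1;

  SketchDocsIterator(int[] docs) { this.docs = docs; }

  @Override
  public int docID() {
    if (i < 0) return -1;
    return i < docs.length ? docs[i] : NO_MORE_DOCS;
  }

  @Override
  public int nextDoc() throws IOException {
    return ++i < docs.length ? docs[i] : NO_MORE_DOCS;
  }

  @Override
  public int advance(int target) throws IOException {
    // A real codec would use skip data here; a linear scan keeps the sketch honest.
    int doc;
    while ((doc = nextDoc()) < target) { }
    return doc;
  }
}
{code}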