[ https://issues.apache.org/jira/browse/LUCENE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302450#comment-17302450 ]
Michael McCandless commented on LUCENE-9843:
--------------------------------------------
{quote}There is a more obvious one to fix immediately: {{SORTED}}. Why is the
codec option available on the {{SORTED}} terms dictionary? The option is not
necessary: it does not impact the speed of per-document ordinals. And the terms
dictionary (for lookupOrd) is block-compressed, prefix-coded, etc., regardless of
what you supply. So let's please remove the option there.
{quote}
+1. I agree that use cases should not rely on super-fast ord lookup, so
hardwired compression is the right choice here.
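To make the lookupOrd point concrete, here is a minimal Java sketch of a block-based, prefix-coded terms dictionary. It is not Lucene's actual format: the class name PrefixBlockTermsSketch, the block size of 16, and storing terms as Java Strings instead of bytes are all assumptions made for illustration. Per-document ordinals stay plain integers, so reading them never touches the dictionary; only lookupOrd has to decode the terms that precede the requested one in its block.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical prefix-coded terms dictionary split into fixed-size blocks (not Lucene's format). */
public class PrefixBlockTermsSketch {
    static final int BLOCK_SIZE = 16; // assumption: number of terms per block

    // Each block stores its first term in full, then (shared prefix length, suffix) pairs.
    private final List<String> blockFirstTerms = new ArrayList<>();
    private final List<int[]> blockPrefixLengths = new ArrayList<>();
    private final List<String[]> blockSuffixes = new ArrayList<>();

    /** Builds the dictionary from terms that are already sorted and unique. */
    public PrefixBlockTermsSketch(List<String> sortedTerms) {
        for (int start = 0; start < sortedTerms.size(); start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, sortedTerms.size());
            int n = end - start - 1; // entries stored after the block's first term
            int[] prefixLens = new int[n];
            String[] suffixes = new String[n];
            String prev = sortedTerms.get(start);
            blockFirstTerms.add(prev);
            for (int i = 0; i < n; i++) {
                String term = sortedTerms.get(start + 1 + i);
                int p = sharedPrefix(prev, term);
                prefixLens[i] = p;
                suffixes[i] = term.substring(p);
                prev = term;
            }
            blockPrefixLengths.add(prefixLens);
            blockSuffixes.add(suffixes);
        }
    }

    private static int sharedPrefix(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Returns the term for an ordinal by decoding its block from the start up to that term. */
    public String lookupOrd(int ord) {
        int block = ord / BLOCK_SIZE;
        int offset = ord % BLOCK_SIZE;
        String term = blockFirstTerms.get(block);
        int[] prefixLens = blockPrefixLengths.get(block);
        String[] suffixes = blockSuffixes.get(block);
        for (int i = 0; i < offset; i++) {
            // Each term is rebuilt from the previous one, so cost grows with position in the block.
            term = term.substring(0, prefixLens[i]) + suffixes[i];
        }
        return term;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "applet", "apply", "banana", "band", "bandana");
        PrefixBlockTermsSketch dict = new PrefixBlockTermsSketch(terms);
        int[] docOrds = {3, 0, 5}; // per-document ordinals: plain ints, no decoding needed
        for (int ord : docOrds) {
            System.out.println(ord + " -> " + dict.lookupOrd(ord));
        }
    }
}
```

Decoding cost depends only on where the term sits inside its block, which is why a codec option on this dictionary would not change the speed of per-document ordinal access.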
{quote}For the {{BINARY}}, I personally think it is wrong to compress by
default, in the default codec. The user wants a per-document byte[] (with their
custom encoding), we should make it fast and just plumb it through. It's like a
catch-all type when no other type (numeric, string, etc.) is truly suitable.
Sure, maybe some users are putting "yuge" stuff in there, where compression
might not hurt their speed and would save some disk: we could supply a different
codec in the {{codecs/}} package for such users. But I don't think it makes
sense at all to support it in the default codec with backwards compatibility.
{quote}
Yeah, +1.
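The {{BINARY}} trade-off can be made concrete with a small Java sketch comparing two hypothetical layouts for per-document byte[] values: a plain buffer with offsets, where a lookup is a single array copy, and block compression with java.util.zip.Deflater, where reading any one document means inflating its whole block. This is not Lucene's codec code; the class names, the 32-document block size, and the choice of DEFLATE are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical comparison of plain vs block-compressed per-document byte[] values. */
public class BinaryValuesSketch {
    static final int DOCS_PER_BLOCK = 32; // assumption, not Lucene's block size

    /** Plain layout: concatenated bytes plus offsets; a lookup is one array copy. */
    static final class PlainBinary {
        final byte[] data;
        final int[] offsets; // value of doc d spans [offsets[d], offsets[d + 1])

        PlainBinary(byte[][] values) {
            offsets = new int[values.length + 1];
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int doc = 0; doc < values.length; doc++) {
                out.write(values[doc], 0, values[doc].length);
                offsets[doc + 1] = out.size();
            }
            data = out.toByteArray();
        }

        byte[] get(int doc) {
            return Arrays.copyOfRange(data, offsets[doc], offsets[doc + 1]);
        }
    }

    /** Block-compressed layout: reading one value requires inflating its whole block. */
    static final class CompressedBinary {
        final byte[][] blocks;    // DEFLATE-compressed bytes of each block
        final int[] rawLengths;   // uncompressed size of each block
        final int[] starts, ends; // each doc's value range inside its uncompressed block
        long bytesInflated = 0;   // total bytes decompressed by get(), to expose the cost

        CompressedBinary(byte[][] values) {
            int numBlocks = (values.length + DOCS_PER_BLOCK - 1) / DOCS_PER_BLOCK;
            blocks = new byte[numBlocks][];
            rawLengths = new int[numBlocks];
            starts = new int[values.length];
            ends = new int[values.length];
            for (int b = 0; b < numBlocks; b++) {
                ByteArrayOutputStream raw = new ByteArrayOutputStream();
                int last = Math.min((b + 1) * DOCS_PER_BLOCK, values.length);
                for (int doc = b * DOCS_PER_BLOCK; doc < last; doc++) {
                    starts[doc] = raw.size();
                    raw.write(values[doc], 0, values[doc].length);
                    ends[doc] = raw.size();
                }
                byte[] rawBytes = raw.toByteArray();
                rawLengths[b] = rawBytes.length;
                blocks[b] = deflate(rawBytes);
            }
        }

        byte[] get(int doc) {
            int b = doc / DOCS_PER_BLOCK;
            byte[] raw = inflate(blocks[b], rawLengths[b]); // whole block, even for one doc
            bytesInflated += raw.length;
            return Arrays.copyOfRange(raw, starts[doc], ends[doc]);
        }
    }

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[rawLength];
        try {
            int off = 0;
            while (off < rawLength) {
                int n = inflater.inflate(raw, off, rawLength - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    throw new IllegalStateException("corrupt or truncated block");
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[][] values = new byte[100][];
        for (int doc = 0; doc < values.length; doc++) {
            // Highly redundant values: the case where compression saves the most disk.
            values[doc] = ("{\"color\":\"red\",\"size\":\"large\",\"doc\":" + doc + "}")
                    .getBytes(StandardCharsets.UTF_8);
        }
        PlainBinary plain = new PlainBinary(values);
        CompressedBinary compressed = new CompressedBinary(values);
        int compressedSize = Arrays.stream(compressed.blocks).mapToInt(blk -> blk.length).sum();
        System.out.println("plain size: " + plain.data.length + ", compressed size: " + compressedSize);
        compressed.get(42);
        System.out.println("bytes inflated to read doc 42: " + compressed.bytesInflated);
    }
}
```

With redundant values the compressed layout is smaller, but bytesInflated shows that every get() pays to decompress all values in the block, which is the search-time cost being weighed against the disk savings.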
> Remove compression option on doc values
> ---------------------------------------
>
> Key: LUCENE-9843
> URL: https://issues.apache.org/jira/browse/LUCENE-9843
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Options on file formats add complexity and put a big tax on
> backward-compatibility testing. I'm the one who introduced it in LUCENE-9378,
> but I would now like to think about what we can do to remove this option.
> For the record, compression was initially introduced because some binary
> fields have so much redundancy that it's wasteful not to compress them at
> all. Unfortunately, this slowed down some search workloads, so we decided
> to introduce this option as a way to let users choose the trade-off they want.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)