[
https://issues.apache.org/jira/browse/LUCENE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960553#comment-14960553
]
Uwe Schindler commented on LUCENE-6841:
---------------------------------------
bq. or some way to pick and choose which fields get compressed? Perhaps a
minimum length of a field could be specified before it's compressed? I'm not
sure if that's possible.
Compression works on blocks, not on individual fields. To read one stored
field, Lucene has to load and decompress the whole block, which contains
multiple fields (and possibly multiple documents). If you do this for many
small fields and are only interested in some of them, you may be using the
wrong type of storage. Since Lucene 4 there is an alternative, column-based
store called docvalues. If you want to use field data during scoring, or to
read a single field sequentially for all documents, it is better to use
column-based docvalues (which can also be numeric, e.g. useful as scoring
factors).
The use case for stored fields is loading a few documents for search result
display, e.g. the 10 or 20 top-ranking documents. They are not made for
processing millions of documents, for example to retrieve scoring factors.
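To illustrate the difference between the two layouts, here is a minimal
sketch in plain Java. It is not Lucene code and does not reproduce Lucene's
file format: the class and method names (BlockVsColumnSketch, StoredBlock,
sumViaStoredBlock, sumViaColumn) are hypothetical, and java.util.zip's
Deflater/Inflater stands in for LZ4 purely as an assumption to keep the
example self-contained. It only shows that reading one small value per
document out of a compressed block inflates the whole block every time,
while a column of values is read directly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockVsColumnSketch {
    // Running total of bytes inflated, to make the hidden cost visible.
    static long bytesDecompressed = 0;

    // Compress a whole block at once (Deflater is a stand-in for LZ4 here).
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate the whole block; there is no way to inflate just one field.
    static byte[] decompress(byte[] compressed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[rawLength];
        int off = 0;
        try {
            while (off < rawLength) {
                int n = inflater.inflate(raw, off, rawLength - off);
                if (n == 0) break; // nothing more to read
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        bytesDecompressed += off;
        return raw;
    }

    // Row-style store: all fields of all documents share one compressed block.
    static final class StoredBlock {
        final byte[] compressed;
        final int rawLength;

        StoredBlock(String[] titles, long[] boosts) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < titles.length; i++) {
                sb.append(titles[i]).append('\t').append(boosts[i]).append('\n');
            }
            byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
            this.rawLength = raw.length;
            this.compressed = compress(raw);
        }

        // Reading one small field still pays for inflating the entire block.
        long boost(int doc) {
            String all = new String(decompress(compressed, rawLength), StandardCharsets.UTF_8);
            String line = all.split("\n")[doc];
            return Long.parseLong(line.substring(line.indexOf('\t') + 1));
        }
    }

    // Visits every document through the block store (the anti-pattern).
    static long sumViaStoredBlock(StoredBlock block, int docCount) {
        long sum = 0;
        for (int doc = 0; doc < docCount; doc++) {
            sum += block.boost(doc);
        }
        return sum;
    }

    // Column-style store: one value per document, read without decompression.
    static long sumViaColumn(long[] boosts) {
        long sum = 0;
        for (long b : boosts) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        int docCount = 100;
        String[] titles = new String[docCount];
        long[] boosts = new long[docCount];
        for (int i = 0; i < docCount; i++) {
            titles[i] = "document title " + i;
            boosts[i] = i;
        }
        StoredBlock block = new StoredBlock(titles, boosts);
        System.out.println("sum via stored block = " + sumViaStoredBlock(block, docCount)
                + ", bytes decompressed = " + bytesDecompressed);
        System.out.println("sum via column = " + sumViaColumn(boosts)
                + ", bytes decompressed = 0");
    }
}
```

In this toy model the block-based path inflates docCount times the raw
block size to read docCount small values, which is the pattern that shows
up as decompression time in a profiler. Fetching only the handful of
top-ranking documents keeps that cost small.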
> LZ4 compression using too much CPU time
> ---------------------------------------
>
> Key: LUCENE-6841
> URL: https://issues.apache.org/jira/browse/LUCENE-6841
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Affects Versions: 5.3.1
> Environment: Linux, Java 8
> Reporter: Karl von Randow
>
> I am using Lucene for search indexing, including storing a large number of
> small fields, and some larger plain text fields, and searching using both
> exact matches and analyzed queries.
> LZ4 (specifically the decompress method) is using nearly exactly 50% of the
> application's CPU time.
> It seems to me that LZ4 is inappropriate for my use case. I note that I can
> choose BEST_SPEED or BEST_COMPRESSION.
> Would it be palatable to add a NO_COMPRESSION option, or some way to pick and
> choose which fields get compressed? Perhaps a minimum length of a field could
> be specified before it's compressed? I'm not sure if that's possible.
> If this approach, or similar is palatable, I would be happy to contribute a
> patch (or to consume and test a patch).