jainankitk opened a new issue, #12317:
URL: https://github.com/apache/lucene/issues/12317

   ### Description
   
   While working on a customer issue, I noticed that memory allocations for the 
recently added [term dictionary 
compression](https://github.com/apache/lucene-solr/commit/33a7af9cbfb9f668b4aee433906ee93d55e1e709)
 are significant. After disabling the compression with a patch, I observed a 
noticeable reduction in memory allocation.
   
   Generally the cost of storage is significantly lower than that of memory/CPU, 
so compression can be worthwhile once a segment/index is being archived. But 
during live data ingestion, when segments merge frequently, the cost of 
compression/decompression is paid more than once.
   
   Wondering about a couple of things here:
   
   * Should we expose an option to disable term dictionary compression?
   * Does it make sense to initialize the HighCompressionHashTable lazily in 
TermsWriter, since some code paths (non-compression) never end up using it?
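
   To illustrate the second point, here is a minimal sketch of the 
lazy-initialization idea. `HighCompressionHashTable` and `TermsWriter` are the 
Lucene classes mentioned above, but this is a self-contained stand-in: the 
class, field, and method names below are illustrative, not the actual Lucene 
API.

   ```java
   // Sketch: defer allocating an expensive helper until a code path needs it.
   class TermsWriterSketch {
       // Stand-in for LZ4.HighCompressionHashTable: pays a large allocation
       // cost up front in its constructor.
       static final class HighCompressionHashTable {
           final int[] table = new int[1 << 15];
       }

       private HighCompressionHashTable hashTable; // null until first use

       // Allocate the table only when a compression path actually asks for it.
       private HighCompressionHashTable getHashTable() {
           if (hashTable == null) {
               hashTable = new HighCompressionHashTable();
           }
           return hashTable;
       }

       boolean isAllocated() {
           return hashTable != null;
       }

       void writeCompressed(byte[] terms) {
           HighCompressionHashTable t = getHashTable();
           // ... compression would use t here ...
       }

       void writeUncompressed(byte[] terms) {
           // Non-compression path never touches the hash table,
           // so it never triggers the allocation.
       }
   }
   ```

   With this shape, a writer that only ever takes the non-compression path 
never pays for the hash table at all.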
   
   For context, the customer workload is running on an instance with 32G of 
memory, 16G of which is allocated to the heap. Attaching the memory allocation 
profile below:
   
   ![Screenshot 2023-05-19 at 5 27 53 
PM](https://github.com/apache/lucene/assets/8193480/bfa964b4-76d4-4903-89a7-7164f993ea3e)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
