jainankitk commented on issue #12317: URL: https://github.com/apache/lucene/issues/12317#issuecomment-1579572674
> In general, we don't like adding options to file formats and prefer to have full control to keep file formats easy to reason about and to test.
>
I'm just wondering if disabling this compression is something that users would actually be interested in, as I question how it might impact query performance. Since I don't have concrete evidence of performance degradation, it looks reasonable not to add an option, to keep testing overhead limited.

> So I wouldn't generally expect it to be a big contributor to a heap profile unless there are many small segments getting written, which could happen if you do frequent refreshes, have many fields, or many indices. In that case it's possible that LZ4 compression never gets used on some fields/segments because of the checks on prefix length and average suffix length, so your idea to lazily allocate this compression hash table might help?

Allocating per field per segment looks fairly high to me, given that each of these allocates 256k (128k for the short[] and 128k for the int[]). I have seen index mappings with up to 1500 fields, although not all of them are text fields. But for these very large documents, we are talking about a couple hundred MB. And due to the tiered merge policy, every segment might get merged a few times. Hence, it does make sense to lazily allocate this compression hash table.
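
A rough sketch of the lazy allocation idea, for illustration only. `LazyCompressionTables` and its members are hypothetical names, not the actual Lucene classes, and the array lengths are assumptions chosen only to match the 128k + 128k figures above:

```java
/** Hypothetical holder (not a Lucene class): defers table allocation until first use. */
final class LazyCompressionTables {
  // Assumed lengths, picked only so each array is 128 KB as discussed above.
  static final int INT_TABLE_LENGTH = 32 * 1024;   // 32K ints   * 4 bytes = 128 KB
  static final int SHORT_TABLE_LENGTH = 64 * 1024; // 64K shorts * 2 bytes = 128 KB

  // Both arrays stay null until compression is actually attempted.
  private int[] hashTable;
  private short[] chainTable;

  boolean isAllocated() {
    return hashTable != null;
  }

  int[] hashTable() {
    ensureAllocated();
    return hashTable;
  }

  short[] chainTable() {
    ensureAllocated();
    return chainTable;
  }

  // Allocates both arrays on first access; later calls are no-ops.
  private void ensureAllocated() {
    if (hashTable == null) {
      hashTable = new int[INT_TABLE_LENGTH];
      chainTable = new short[SHORT_TABLE_LENGTH];
    }
  }

  /** Payload bytes of the two arrays once allocated (ignores object headers). */
  static long bytesPerInstance() {
    return (long) INT_TABLE_LENGTH * Integer.BYTES
        + (long) SHORT_TABLE_LENGTH * Short.BYTES;
  }
}
```

With these assumed sizes, `bytesPerInstance()` is 262,144 bytes, so 1500 fields that each eagerly allocate would come to roughly 375 MB as an upper bound. A field or segment that never passes the prefix and suffix length checks would never call `hashTable()` and would pay nothing.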
