jainankitk commented on issue #12317: URL: https://github.com/apache/lucene/issues/12317#issuecomment-1557736895
@gsmiller - Thank you for reviewing and providing your comments.

> it looks like you're primarily looking at an indexing-related performance issue and concerned with the memory usage during writing. Is that correct?

I am looking at higher GC activity in recent versions (8.10+) compared to the previous version (7.x). It is not specific to indexing.

> When you disabled the patch, did you notice query-time performance changes?

I did not notice any degradation in performance. The index is small, so it fits in memory with or without compression.

> Compression isn't only useful for saving disk space; it's useful for keeping index pages hot in the OS cache and getting better data locality, which translates to better query-time performance.

I am not sure I understand this completely. As I understand it, a file is just an array of bytes, and the Lucene reader works with that array directly. If we compress those bytes before storing them, the offsets into the array change and the reader can no longer use it directly. So even if the compressed file stays hot in the OS cache, some intermediate logic has to decode that sequence of bytes (decompression), and the decompressed bytes need to be stored somewhere, whether in a byte buffer on heap or in native memory. Although we decode only the blocks that the Lucene reader needs, we could have read those same blocks directly into native memory from an uncompressed file.

@jpountz Thoughts?
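
To make the question concrete, here is a rough sketch of the two read paths I have in mind. It is not Lucene code: `BlockReadSketch`, `BLOCK_SIZE`, and the three helper methods are hypothetical names, and `java.util.zip` DEFLATE is only a stand-in for whatever codec the index format actually uses. The compressed path has to inflate into a separate scratch buffer, while the uncompressed path can hand back a view over bytes that are already mapped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BlockReadSketch {
    // Hypothetical block size, chosen only for illustration.
    static final int BLOCK_SIZE = 4096;

    // Compress one block with DEFLATE (a stand-in codec, not Lucene's).
    static byte[] compressBlock(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Compressed path: the reader cannot index into the stored bytes,
    // so the block is first decoded into an extra on-heap buffer.
    static byte[] readCompressedBlock(byte[] compressed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] scratch = new byte[rawLength]; // the additional allocation
        int off = 0;
        try {
            while (off < rawLength) {
                int n = inflater.inflate(scratch, off, rawLength - off);
                if (n == 0) {
                    break; // no more output: input exhausted or stream finished
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        if (off != rawLength) {
            throw new IllegalStateException("short block: " + off);
        }
        return scratch;
    }

    // Uncompressed path: offsets are stable, so a block is just a view
    // over the existing bytes and nothing is copied.
    static ByteBuffer readUncompressedBlock(ByteBuffer file, int blockIndex) {
        ByteBuffer view = file.duplicate();
        int start = blockIndex * BLOCK_SIZE;
        view.position(start);
        view.limit(Math.min(file.limit(), start + BLOCK_SIZE));
        return view.slice();
    }

    public static void main(String[] args) {
        byte[] raw = new byte[BLOCK_SIZE];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i % 7); // repetitive data, so it compresses well
        }
        byte[] packed = compressBlock(raw);
        byte[] decoded = readCompressedBlock(packed, raw.length);
        ByteBuffer view = readUncompressedBlock(ByteBuffer.wrap(raw), 0);
        System.out.println("stored compressed bytes: " + packed.length);
        System.out.println("scratch buffer bytes:    " + decoded.length);
        System.out.println("uncompressed view bytes: " + view.remaining());
    }
}
```

The sketch only shows where the extra buffer comes from in the compressed path. It does not measure anything, and a real reader could reuse the scratch buffer across blocks instead of allocating one per read.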
