markharwood commented on issue #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#issuecomment-583449275

There was a suggestion from @jimczi that we fall back to writing raw data if content doesn't compress well. I'm not sure this logic is worth developing, for the reasons outlined below.

I wrote a [compression buffer](https://gist.github.com/markharwood/91cc8d96d6611ad97df11f244b1b1d0f) to see what the compression algorithm outputs before deciding whether to write the compressed or raw data to disk. I tested it with the most incompressible content I could imagine:

```java
public static void fillRandom(byte[] buffer, int length) {
    for (int i = 0; i < length; i++) {
        buffer[i] = (byte) (Math.random() * Byte.MAX_VALUE);
    }
}
```

The LZ4-compressed versions of this content were only marginally bigger than their raw counterparts, adding about 0.4% overhead to the original content (e.g. 96,921 compressed bytes vs 96,541 raw bytes). On that basis I'm not sure it's worth doubling the memory cost of the indexing logic (we would need a temporary output buffer at least as large as the raw data being compressed) plus the additional byte shuffling of a compress-then-compare fallback (sketched below).
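For illustration only, here is a minimal, self-contained sketch of what such a fallback might look like. It is not the PR's codec code: it uses the standalone lz4-java library rather than Lucene's internal LZ4 implementation, and the class and method names (`FallbackSketch`, `compressOrRaw`) are hypothetical. The point it makes concrete is that deciding between compressed and raw output requires holding a second, roughly raw-sized buffer per value at index time.

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

public class FallbackSketch {

  private static final LZ4Compressor COMPRESSOR = LZ4Factory.fastestInstance().fastCompressor();

  /** Fill the buffer with hard-to-compress random bytes, as in the test above. */
  static void fillRandom(byte[] buffer, int length) {
    for (int i = 0; i < length; i++) {
      buffer[i] = (byte) (Math.random() * Byte.MAX_VALUE);
    }
  }

  /**
   * The suggested fallback: compress into a temporary buffer first, then keep
   * whichever of the two representations is smaller. The temporary buffer is
   * what roughly doubles the transient memory needed per value at index time.
   */
  static byte[] compressOrRaw(byte[] raw) {
    // Second buffer, at least raw-sized when the input is incompressible.
    byte[] compressed = COMPRESSOR.compress(raw);
    return compressed.length < raw.length ? compressed : raw;
  }

  public static void main(String[] args) {
    // Reproduce the experiment: compress ~96 KB of random bytes and report the overhead.
    byte[] raw = new byte[96_541];
    fillRandom(raw, raw.length);
    byte[] compressed = COMPRESSOR.compress(raw);
    System.out.printf("raw=%,d compressed=%,d overhead=%.2f%%%n",
        raw.length, compressed.length,
        100.0 * (compressed.length - raw.length) / raw.length);
  }
}
```

On incompressible input the measured saving from falling back to raw is only the small LZ4 framing overhead (~0.4% in the test above), which is the trade-off being weighed against the extra buffer and byte copying.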