[ 
https://issues.apache.org/jira/browse/LUCENE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508292#comment-13508292
 ] 

Uwe Schindler commented on LUCENE-4584:
---------------------------------------

I agree with Robert here. We don't need to test random data against the original 
implementation; for Lucene, only 2 things are important:
- When you compress random data and decompress it again, the exact same bytes 
must come back. This should be tested, and it needs no external C code. This is 
the "doesn't corrupt™" property Robert is talking about.
- The compressed content should never get significantly bigger than the input.
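The two properties above could be checked with a self-contained round-trip test along these lines. This is only a minimal sketch: Lucene's LZ4 API is not shown in this thread, so java.util.zip.Deflater/Inflater stand in for the compressor, and the class name RoundTripCheck and the "not significantly bigger" bound (n + n/100 + 64 bytes) are hypothetical choices, not Lucene constants.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RoundTripCheck {

    // Stand-in compressor (java.util.zip.Deflater), NOT Lucene's LZ4.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inverse of compress(): throws on malformed input, returns fewer bytes if truncated.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        try {
            while (!inflater.finished() && off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated stream: stop instead of looping forever
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(result, off);
    }

    // Hypothetical bound for "never significantly bigger"; not a Lucene constant.
    static int maxCompressedLength(int n) {
        return n + n / 100 + 64;
    }

    // Property 1: the exact bytes come back. Property 2: bounded expansion.
    static boolean check(byte[] data) {
        byte[] compressed = compress(data);
        byte[] restored = decompress(compressed, data.length);
        return Arrays.equals(data, restored)
                && compressed.length <= maxCompressedLength(data.length);
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so failures are reproducible
        for (int size : new int[] {0, 1, 100, 65536, 1 << 20}) {
            byte[] data = new byte[size];
            random.nextBytes(data);
            if (!check(data)) {
                throw new AssertionError("round trip failed for size " + size);
            }
        }
        System.out.println("all round trips OK");
    }
}
```

Note that nothing here compares against the C implementation: both checks only look at what our own code produces, which is the point of the comment above.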

There is no reason at all that Lucene's LZ4 must return the same compressed 
output as the original implementation. E.g. if we find an algorithm that 
performs better in Hotspot but compresses to a different byte array, we are 
perfectly fine.

If we want to assert for now that both implementations create the same 
compressed output, we should have random byte files of three different sizes 
(e.g. generated from /dev/urandom) as test resources, along with their 
C-compressed counterparts, and then compare the results. We should document how 
the test data was created. But keep in mind: we may change the algorithm to 
produce different bytes, so this is not mandatory. I think we should only assert 
that the compression percentage on the random data is identical, not the actual 
bytes.
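A size-only comparison against such reference files could look roughly like the sketch below. It rests on assumptions: the class and method names are hypothetical, the reference bytes would in practice be the checked-in C-compressed resources described above, and java.util.zip.Deflater again stands in for Lucene's LZ4 so the example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.function.UnaryOperator;
import java.util.zip.Deflater;

public class ReferenceRatioCheck {

    // Compressed size as a percentage of the original size.
    static double compressionPercentage(int originalLength, int compressedLength) {
        if (originalLength <= 0) {
            throw new IllegalArgumentException("original must be non-empty");
        }
        return 100.0 * compressedLength / originalLength;
    }

    // Compares only the compression percentage, never the compressed bytes,
    // so a different but equally effective encoding still passes.
    static boolean sameCompressionPercentage(byte[] original, byte[] referenceCompressed,
            UnaryOperator<byte[]> compressor, double tolerancePercent) {
        byte[] ours = compressor.apply(original);
        double oursPct = compressionPercentage(original.length, ours.length);
        double refPct = compressionPercentage(original.length, referenceCompressed.length);
        return Math.abs(oursPct - refPct) <= tolerancePercent;
    }

    // Stand-in compressor (java.util.zip.Deflater), NOT Lucene's LZ4.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = new byte[64 * 1024];
        new Random(7).nextBytes(original);
        // In a real test this would be the checked-in C-compressed resource for `original`;
        // here the stand-in generates it so the sketch runs on its own.
        byte[] reference = deflate(original);
        boolean same = sameCompressionPercentage(original, reference, ReferenceRatioCheck::deflate, 0.0);
        System.out.println("same compression percentage: " + same);
    }
}
```

A tolerance of 0.0 corresponds to the "identical percentage" assertion suggested above.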
                
> Compare the LZ4 implementation in Lucene against the original impl
> ------------------------------------------------------------------
>
>                 Key: LUCENE-4584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4584
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: 4.1
>
>
> We should add tests to make sure that the LZ4 impl in Lucene compresses data 
> the exact same way as the original impl.
