[ 
https://issues.apache.org/jira/browse/LUCENE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4196:
--------------------------------

    Attachment: LUCENE-4196.patch

Hi Uwe... tricky issue.

I think in general we were doing pretty good actually: often times various 
things like assert value >= 0 after a readVLong (well, thats ok to be an assert 
since readVLong does a real check anyways).

But I think we were missing a few key checks that cost nothing and would detect 
issues:
* when reading things like #fields or #segments, check thats not negative. 
otherwise the loop does nothing and it acts like zero.
* things like docCount in the .si really need to be checked they are positive: 
this file isn't checksummed (the information used to be in sis which is) and 
had basically no checks.
* check that we consume entire .si file just like we do with .fnm
* the asserts for checking duplicate field names/numbers in FieldInfos and 
terms dicts were a good idea, but a real check for duplicate values is 
basically free anyway, since Map.put returns the previous value and its just a 
null check.
* statistics are a wonderful cheap check in term dicts impls at startup...

Anyway I think this is a good start. I'll wait for review but I'd like to 
commit this patch tomorrow.
                
> Turn asserts in I/O related code into hard checks
> -------------------------------------------------
>
>                 Key: LUCENE-4196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4196
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/index
>    Affects Versions: 4.0-ALPHA
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-4196.patch
>
>
> In lots of codecs we only assert, that e.g. some things inside files are 
> correctly in bounds, which leads to security problems (ok, not as bad as 
> C-Style buffer overflows), but e.g. allocating a large array after reading a 
> VInt from a file header and then OOM, is a security issue. So we have to 
> check all those contracts for files as hard checks, especially as a simply 
> check in most cases dont cost anything (and it costs not more than the assert 
> itsself, as the assert also takes CPU power, because it needs a check one 
> time on a static final class field).
> Of course we cannot check values we read when reading postings, but the 
> simple checks that any postings file has correct header and something like a 
> positive number of elements, or number of elements < file size,..., a 
> bit-fireld only contains valid bits in StoredFieldsReader, or non-duplicate 
> filenames (CFS) are very important. We had those checks in 3.x, but in 4.0, 
> Mike changed all of those to asserts during the flex development (in my 
> opinion with no real reason).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to