[ https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476908#comment-13476908 ]

Michael McCandless commented on LUCENE-4484:
--------------------------------------------

bq. Can uncache() be changed to return the still-open newly created IndexOutput?

I think we'd have to wrap the RAMOutputStream ... then we could 1) know when too 
many bytes have been written, 2) close the wrapped RAMOutputStream and call 
uncache to move it to disk, 3) fix uncache to not close the IndexOutput (return 
it instead), and 4) cut the wrapper over to the new on-disk IndexOutput.  And 
all of this would have to happen inside a single writeByte/writeBytes call (from 
the caller's standpoint) ... it seems hairy.
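The four steps above can be sketched with plain java.io types (a minimal illustration of the spill-on-write idea, not Lucene's actual IndexOutput/RAMOutputStream API; all names here are made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Buffers writes in RAM and transparently spills to disk once too many
// bytes have been written, all inside a single write() call from the
// caller's standpoint.
class SpillingOutput extends OutputStream {
  private final long maxCachedBytes;
  private final File spillFile;
  private ByteArrayOutputStream ram = new ByteArrayOutputStream();
  private OutputStream disk;   // non-null once we have spilled
  private long written;

  SpillingOutput(long maxCachedBytes, File spillFile) {
    this.maxCachedBytes = maxCachedBytes;
    this.spillFile = spillFile;
  }

  @Override public void write(int b) throws IOException {
    if (disk == null && written + 1 > maxCachedBytes) {
      // steps 2+3: move the cached bytes to disk, keeping the new handle open
      disk = new FileOutputStream(spillFile);
      ram.writeTo(disk);
      ram = null;
    }
    // step 4: once spilled, all further writes go to the on-disk output
    (disk != null ? disk : ram).write(b);
    written++;
  }

  @Override public void close() throws IOException {
    if (disk != null) disk.close();
  }

  boolean spilled() { return disk != null; }
}
```

The hairiness in the real directory comes from doing this atomically against a live IndexOutput while readers may still be opening the cached copy.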

We could also just leave it be, i.e. advertise this limitation.  NRTCachingDir 
is already hairy enough...  The purpose of this directory is an NRT setting 
where reopens are relatively frequent compared to the indexing rate, which 
naturally keeps the files plenty small.  It's also quite unusual to index only 
stored fields in an NRT setting (which is what this test does).

Yet another option would be to somehow let the indexer flush based on the size 
of the stored fields / term vectors files ... today of course we completely 
exclude these from the RAM accounting since we write their bytes directly to 
disk.  Maybe ... the app could pass the indexer an AtomicInt/Long recording 
"bytes held elsewhere in RAM", and the indexer would add that into its logic 
for when to trigger a flush...
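The shared-counter idea might look roughly like this (a sketch only; FlushPolicySketch and its methods are hypothetical, not an existing Lucene API):

```java
import java.util.concurrent.atomic.AtomicLong;

// The app shares an AtomicLong with the indexer; the flush check adds those
// "bytes held elsewhere in RAM" (e.g. cached by NRTCachingDir) to the RAM
// the indexer already tracks itself.
class FlushPolicySketch {
  private final AtomicLong bytesHeldElsewhere;  // updated by the app/directory
  private final long ramBufferBytes;            // flush trigger threshold
  private long indexerRamBytes;                 // what the indexer tracks today

  FlushPolicySketch(AtomicLong bytesHeldElsewhere, long ramBufferBytes) {
    this.bytesHeldElsewhere = bytesHeldElsewhere;
    this.ramBufferBytes = ramBufferBytes;
  }

  void bytesUsed(long delta) { indexerRamBytes += delta; }

  boolean shouldFlush() {
    // stored-fields/vectors bytes cached elsewhere now count against
    // the flush trigger too, bounding total RAM use:
    return indexerRamBytes + bytesHeldElsewhere.get() >= ramBufferBytes;
  }
}
```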
                
> NRTCachingDir can't handle large files
> --------------------------------------
>
>                 Key: LUCENE-4484
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4484
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>
> I dug into this OOME, which easily repros for me on rev 1398268:
> {noformat}
> ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test 
> -Dtests.seed=2D89DD229CD304F5 -Dtests.multiplier=3 -Dtests.nightly=true 
> -Dtests.slow=true 
> -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
> -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok 
> -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
> {noformat}
> The problem is the test got NRTCachingDir ... which cannot handle large files 
> because it decides up front (when createOutput is called) whether the file 
> will live in the RAMDir vs the wrapped dir ... so if that file turns out to be 
> immense (which this test produces, since stored fields files can grow 
> arbitrarily huge w/o any flush happening) then it takes unbounded RAM.
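The one-shot decision the quoted report describes can be sketched as follows (illustrative only; the real NRTCachingDirectory bases its choice on merge/flush context, and these names are simplified):

```java
// At createOutput time the directory guesses, from an up-front size
// estimate alone, whether the whole file should live in RAM; the guess is
// never revisited, so a file that grows far past the estimate stays cached
// and RAM use is unbounded.
class CachingDecisionSketch {
  private final long maxCachedFileBytes;

  CachingDecisionSketch(long maxCachedFileBytes) {
    this.maxCachedFileBytes = maxCachedFileBytes;
  }

  boolean doCacheWrite(long estimatedFileBytes) {
    // one-shot guess; nothing uncaches the file later if it is wrong
    return estimatedFileBytes <= maxCachedFileBytes;
  }
}
```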

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
