[ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891123#action_12891123
 ] 

Michael McCandless commented on LUCENE-2537:
--------------------------------------------

Nice results Shai!

bq. I think, given these results, we can use the FileChannel method w/ a chunk 
size of 4 (or even 2) MB, to be on the safe side and not eat up too much RAM?

+1

> FSDirectory.copy() impl is unsafe
> ---------------------------------
>
>                 Key: LUCENE-2537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2537
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: FileCopyTest.java
>
>
> There are a couple of issues with it:
> # FileChannel.transferFrom documents that it may not copy the number of bytes 
> requested, but we don't check the return value. The code needs to call it 
> in a loop until all bytes have been copied.
> # When calling addIndexes() w/ very large segments (a few hundred MB in size), 
> I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
> cryptic):
> {code}
> Exception in thread "main" java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
>     at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
>     at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
>     ... 7 more
> {code}
> I changed the impl to something like this:
> {code}
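> long numWritten = 0;
> long numToWrite = input.size();
> long bufSize = 1 << 26; // 64 MB requested per transferFrom call
> while (numWritten < numToWrite) {
>   // transferFrom may copy fewer bytes than requested, so advance by its return value
>   numWritten += output.transferFrom(input, numWritten, bufSize);
> }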
> {code}
> With this change the code adds the indexes successfully. It copies in chunks 
> of 64MB, which might be too large for some applications, so we definitely 
> need a smaller chunk size. The question is how small it can be without 
> hurting performance. It'd be great to make it configurable, but since this 
> API is called by other APIs, such as addIndexes, I'm not sure it's easily 
> controllable.
> Also, I read somewhere (can't remember now where) that on Linux the native 
> impl is better and does copy in chunks. So perhaps we should make a 
> Linux-specific impl?
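
For illustration, the chunked loop from the issue description could be fleshed out roughly as below. This is only a sketch, not the patch that was committed: the class name ChunkedFileCopy, the copy() helper, the 2 MB CHUNK_SIZE (the lower figure floated in the comment above) and the choice to throw when a transfer makes no progress are all assumptions, and it uses java.nio.file, which is newer than the Java 1.5/1.6 releases mentioned in the issue.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not Lucene's FSDirectory.copy().
class ChunkedFileCopy {
    // Assumed chunk size of 2 MB; the thread discusses 2 to 4 MB.
    static final long CHUNK_SIZE = 1L << 21;

    /** Copies source to target in bounded chunks and returns the bytes copied. */
    public static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long copied = 0;
            while (copied < size) {
                // Never request more than one chunk, or more than what remains.
                long count = Math.min(CHUNK_SIZE, size - copied);
                // transferFrom may move fewer bytes than requested,
                // so advance by what it reports rather than by count.
                long n = out.transferFrom(in, copied, count);
                if (n <= 0) {
                    // Assumed policy: fail instead of spinning forever,
                    // e.g. if the source was truncated while copying.
                    throw new IOException(
                            "Copy stalled at byte " + copied + " of " + size);
                }
                copied += n;
            }
            return copied;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("copy-src", ".bin");
        Path dst = Files.createTempFile("copy-dst", ".bin");
        try {
            // A little over 5 MB, so the loop runs for several chunks.
            byte[] data = new byte[5 * 1024 * 1024 + 123];
            new Random(42).nextBytes(data);
            Files.write(src, data);

            long copied = copy(src, dst);
            boolean same = Arrays.equals(data, Files.readAllBytes(dst));
            System.out.println("copied=" + copied + " identical=" + same);
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
        }
    }
}
```

Capping each request at CHUNK_SIZE bounds how much a single transferFrom call is asked to move, which is the concern behind the "Map failed" error above. Whether 2 MB or 4 MB is the better trade-off is left to the measurements discussed in this thread.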


