[ https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-2537:
-------------------------------

    Attachment: LUCENE-2537.patch

Patch copies the files in chunks of 2MB. All core tests pass. I'll wait a day or two in case someone wants to suggest a different approach or chunk size limit before I commit.

> FSDirectory.copy() impl is unsafe
> ---------------------------------
>
>                 Key: LUCENE-2537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2537
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: FileCopyTest.java, LUCENE-2537.patch
>
>
> There are a couple of issues with it:
> # FileChannel.transferFrom documents that it may not copy the number of bytes
> requested, however we don't check the return value. So we need to fix the code
> to read in a loop until all bytes were copied.
> # When calling addIndexes() w/ very large segments (few hundred MBs in size),
> I ran into the following exception (Java 1.6 -- Java 1.5's exception was
> cryptic):
> {code}
> Exception in thread "main" java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
>     at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
>     at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
>     ... 7 more
> {code}
> I changed the impl to something like this:
> {code}
> long numWritten = 0;
> long numToWrite = input.size();
> long bufSize = 1 << 26; // 64MB per transferFrom call
> while (numWritten < numToWrite) {
>   numWritten += output.transferFrom(input, numWritten, bufSize);
> }
> {code}
> And the code successfully adds the indexes.
> This code uses chunks of 64MB,
> however that might be too large for some applications, so we definitely need
> a smaller one. The question is how small it can be without affecting
> performance. It'd be great if we could make it configurable, however since
> that API is called by other APIs, such as addIndexes, I'm not sure it's easily
> controllable.
> Also, I read somewhere (can't remember now where) that on Linux the native
> impl is better and does copy in chunks. So perhaps we should make a
> Linux-specific impl?
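
For reference, the chunked loop from the description can be sketched as a self-contained helper. This is only an illustration and not the code in the attached patch: the class name ChunkedFileCopy, the copy() signature, and the decision to throw when a transfer makes no progress are all assumptions made here. The 2MB default is the chunk size mentioned in the patch comment. It uses try-with-resources, so it needs a newer Java than the 1.5/1.6 discussed in the issue.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

/** Hypothetical helper for illustration; not taken from LUCENE-2537.patch. */
final class ChunkedFileCopy {

  /** 2MB, the chunk size mentioned in the patch comment. */
  static final long DEFAULT_CHUNK_SIZE = 1L << 21;

  /**
   * Copies source to target in bounded chunks and returns the number of
   * bytes copied. Each transferFrom call is asked for at most chunkSize
   * bytes, and its return value is checked because it may copy fewer
   * bytes than requested.
   */
  static long copy(File source, File target, long chunkSize) throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
    }
    try (FileInputStream in = new FileInputStream(source);
         FileOutputStream out = new FileOutputStream(target);
         FileChannel input = in.getChannel();
         FileChannel output = out.getChannel()) {
      long numToWrite = input.size();
      long numWritten = 0;
      while (numWritten < numToWrite) {
        // Never request more than what is left, nor more than one chunk.
        long request = Math.min(chunkSize, numToWrite - numWritten);
        // Reads from input's current position, writes at offset numWritten.
        long n = output.transferFrom(input, numWritten, request);
        if (n <= 0) {
          // Assumption: treat "no progress" as an error instead of spinning,
          // e.g. if the source was truncated while being copied.
          throw new IOException("no progress copying " + source + " at offset " + numWritten);
        }
        numWritten += n;
      }
      return numWritten;
    }
  }
}
```

A caller would use something like ChunkedFileCopy.copy(src, dst, ChunkedFileCopy.DEFAULT_CHUNK_SIZE). Passing the chunk size as a parameter is one way to keep it adjustable, though it does not settle how a caller such as addIndexes would expose it.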