[ 
https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887935#action_12887935
 ] 

Shai Erera edited comment on LUCENE-2537 at 7/22/10 8:09 AM:
-------------------------------------------------------------

Oh .. I found the thread where we discussed this on the list. I was actually 
the last to post there, w/ the following text:

{quote}
I Googled around a bit and came across this: 
http://markmail.org/message/l67bierbmmedrfw5. Apparently, there's a 
long-standing bug filed against Sun in May 2006 
(http://bugs.sun.com/view_bug.do?bug_id=6431344) that is still open and reports 
exactly the behavior I'm seeing.

If I understand correctly, this might be a Windows limitation, and the call is 
expected to work well on Linux. I'll give it a try. But it makes me wonder 
whether we should keep the current behavior for Linux-based directories and 
fall back to the chunked approach for Windows ones. Since I'll eventually be 
running on Linux, I don't want to lose performance ...

This isn't the first time we've run into the "write once, run everywhere" 
misconception of Java :). I'm wondering whether, in general, we should have 
Windows/Linux FSDirectory impls, or handlers, to prepare for future cases as 
well. Mike already started this with LUCENE-2500 (DirectIOLinuxDirectory). 
Instead of writing a Directory, perhaps we could have a handler object or 
something similar, or a generic LinuxDirectory that implements some things the 
'Linux' way. In FSDirectory we already have code that detects the OS and JRE in 
use to decide between the Simple, NIO and MMAP Directories ...
{quote}
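
To make the per-OS idea above concrete, here is a minimal sketch of how a copy strategy could be picked from the `os.name` system property. The class, enum and method names are hypothetical and are not part of Lucene. Treating every non-Linux platform as "chunked" is my own conservative assumption, not something established in the thread.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: chooses how a file copy should be
// performed based on the operating system name.
final class CopyStrategySelector {

    // SINGLE_TRANSFER: one transfer call for the whole file (the current behavior).
    // CHUNKED: repeated transfer calls with a bounded byte count.
    enum Strategy { SINGLE_TRANSFER, CHUNKED }

    // Assumption: only Linux keeps the single-transfer path; everything else,
    // including Windows and unknown platforms, falls back to chunked copying.
    static Strategy forOsName(String osName) {
        String os = (osName == null) ? "" : osName.toLowerCase(Locale.ROOT);
        return os.startsWith("linux") ? Strategy.SINGLE_TRANSFER : Strategy.CHUNKED;
    }

    // Convenience wrapper that reads the standard "os.name" system property.
    static Strategy forCurrentPlatform() {
        return forOsName(System.getProperty("os.name"));
    }
}
```

Keeping the decision in a small handler like this, rather than in a separate Directory class, would let FSDirectory reuse its existing OS detection without adding another Directory impl.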

> FSDirectory.copy() impl is unsafe
> ---------------------------------
>
>                 Key: LUCENE-2537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2537
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>
> There are a couple of issues with it:
> # FileChannel.transferFrom documents that it may not copy the number of bytes 
> requested, however we don't check the return value. So we need to fix the 
> code to call it in a loop until all bytes have been copied.
> # When calling addIndexes() w/ very large segments (a few hundred MB in size), 
> I ran into the following exception (Java 1.6 -- Java 1.5's exception was 
> cryptic):
> {code}
> Exception in thread "main" java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
>     at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
>     at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
>     ... 7 more
> {code}
> I changed the impl to something like this:
> {code}
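> // Copy in bounded chunks; transferFrom may transfer fewer bytes than requested.
> long numWritten = 0;
> long numToWrite = input.size();
> long bufSize = 1 << 26; // 64 MB per call
> while (numWritten < numToWrite) {
>   numWritten += output.transferFrom(input, numWritten, bufSize);
> }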
> {code}
> With that change, the code successfully adds the indexes. It uses chunks of 
> 64MB, which might be too large for some applications, so we definitely need 
> a smaller chunk size. The question is how small it can be without affecting 
> performance. It'd be great if we could make it configurable, but since this 
> API is called by other APIs, such as addIndexes, I'm not sure it's easily 
> controllable.
> Also, I read somewhere (can't remember where now) that on Linux the native 
> impl is better and does copy in chunks. So perhaps we should make a 
> Linux-specific impl?
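
As a rough illustration of the loop described in the issue, the sketch below copies a file in bounded chunks and checks the return value of `transferFrom` on every call. It assumes a JDK with `java.nio.file` (Java 7 or later), which is newer than the JREs mentioned above. The class name, the `chunkSize` parameter and the 2 MB default are hypothetical choices for illustration, not the values used in the actual patch.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not the actual FSDirectory.copy() implementation.
final class ChunkedCopy {

    // Illustrative default only; the issue uses 64 MB (1 << 26) and asks for less.
    static final long DEFAULT_CHUNK = 1L << 21; // 2 MB

    // Copies src to dst, asking for at most chunkSize bytes per transferFrom call.
    // Returns the total number of bytes written.
    static long copy(Path src, Path dst, long chunkSize) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long written = 0;
            while (written < size) {
                // Never ask for more than what is left in the source file.
                long count = Math.min(chunkSize, size - written);
                // transferFrom may copy fewer bytes than requested, so use its result.
                long n = out.transferFrom(in, written, count);
                if (n <= 0) {
                    // Guard against spinning forever if the source stops yielding bytes.
                    throw new IOException("transferFrom made no progress at offset " + written);
                }
                written += n;
            }
            return written;
        }
    }
}
```

Taking the chunk size as a parameter is one way to make it configurable, although callers such as addIndexes would still need some way to pass it through.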


