[jira] [Commented] (LUCENE-5561) NativeUnixDirectory is broken

Robert Muir (JIRA) Thu, 03 Apr 2014 16:44:19 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959436#comment-13959436
 ]


Robert Muir commented on LUCENE-5561:
-------------------------------------

{code}
// NativeUnixDirectory only works on Unix:
Assume.assumeTrue(Constants.WINDOWS == false);
{code}

Can this be a "positive test" (e.g. LINUX | SOLARIS | MACOSX) rather than have 
tests failing on any unknown platform? It's better to just be safe...
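
To illustrate the "positive test" idea, here is a minimal sketch of an allow-list check. The class name PlatformAllowList, the helper isKnownUnix, and the os.name prefixes it matches are assumptions made for illustration; they are not taken from the patch.

```java
import java.util.Locale;

/** Hypothetical helper: accepts only platforms that are explicitly listed. */
final class PlatformAllowList {
  private PlatformAllowList() {}

  /**
   * Returns true only when the OS name starts with one of the listed prefixes.
   * The prefixes are assumed typical values of the "os.name" system property.
   */
  static boolean isKnownUnix(String osName) {
    String name = osName.toLowerCase(Locale.ROOT);
    return name.startsWith("linux")
        || name.startsWith("sunos")
        || name.startsWith("mac os x");
  }
}
```

A test could then call Assume.assumeTrue(PlatformAllowList.isKnownUnix(System.getProperty("os.name"))), so that an unknown platform is skipped instead of failing.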

{code}
private final static class NativeUnixIndexOutput extends IndexOutput {
...
private final CRC32 crc = new CRC32();
...
public void writeByte(byte b) throws IOException {
  ...
  crc.update(b);
  ...
}
{code}

This will be excruciatingly slow, because the CRC is updated one byte at a 
time. Buffering is needed here for this computation to be efficient. 
So I have a hard time understanding why you don't do this in dump().
If you won't do it there, then use new BufferedChecksum(new CRC32()) instead.
Here's a comparison of different buffer sizes to illustrate:
||CRC32/bufferSize||throughput||
|CRC32/1|20.46 MB/s|
|CRC32/2|41.16 MB/s|
|CRC32/4|81.14 MB/s|
|CRC32/8|148.64 MB/s|
|CRC32/16|259.04 MB/s|
|CRC32/32|428.32 MB/s|
|CRC32/64|606.56 MB/s|
|CRC32/128|765.93 MB/s|
|CRC32/256|879.79 MB/s|
|CRC32/512|952.74 MB/s|
|CRC32/1024|991.43 MB/s|
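
For illustration, the buffering idea can be sketched roughly as follows. This is not Lucene's BufferedChecksum; the class BufferedCrcSketch and its bufferSize parameter are hypothetical and only show how single-byte updates can be batched before they reach CRC32.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical wrapper: collects bytes and feeds them to CRC32 in bulk. */
final class BufferedCrcSketch implements Checksum {
  private final Checksum in = new CRC32();
  private final byte[] buffer;
  private int upto; // number of buffered bytes not yet passed to the CRC

  BufferedCrcSketch(int bufferSize) {
    if (bufferSize < 1) {
      throw new IllegalArgumentException("bufferSize must be >= 1");
    }
    this.buffer = new byte[bufferSize];
  }

  @Override
  public void update(int b) {
    if (upto == buffer.length) {
      flush();
    }
    buffer[upto++] = (byte) b;
  }

  @Override
  public void update(byte[] b, int off, int len) {
    if (len >= buffer.length) {
      // Large writes bypass the buffer after pending bytes are flushed.
      flush();
      in.update(b, off, len);
    } else {
      if (upto + len > buffer.length) {
        flush();
      }
      System.arraycopy(b, off, buffer, upto, len);
      upto += len;
    }
  }

  @Override
  public long getValue() {
    flush(); // pending bytes must be included in the result
    return in.getValue();
  }

  @Override
  public void reset() {
    upto = 0;
    in.reset();
  }

  private void flush() {
    if (upto > 0) {
      in.update(buffer, 0, upto);
    }
    upto = 0;
  }
}
```

The checksum value is the same as with an unbuffered CRC32; only the number of calls into the underlying update changes.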

{code}
            <!-- TODO: generalize this for non-unix -->
            <!-- Add any native extensions to LD_LIBRARY_PATH: -->
            <env key="LD_LIBRARY_PATH" value="/l/fixnativeunix/lucene/build/native/"></env>
{code}

This hardcoded path will not work except on your computer. Also, any existing 
stuff in LD_LIBRARY_PATH should be preserved, so you have to use some 'path' 
stuff in ant to do it correctly.

Otherwise it looks good. 

Long term, if dirs like this need a fixed-size buffer, it would be cool to 
have a BufferedOutput that works a little differently in that way. I am not 
really sure, but I think the HDFSDirectory stuff has code duplication for this 
same reason. You'd have to handle the "end" specially though, I guess...

Also long term it would be nice if the JNI could be removed, and it was 
possible to do just some evil reflected call to open with O_DIRECT and get 
pageSize and so on. This would actually be safer, too.


> NativeUnixDirectory is broken
> -----------------------------
>
>                 Key: LUCENE-5561
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5561
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.8, 5.0
>
>         Attachments: LUCENE-5561.patch, LUCENE-5561.patch
>
>
> Several things:
>   * It assumed ByteBuffer.allocateDirect would be page-aligned, but
>     that's no longer true in Java 1.7
>   * It failed to throw FNFE if a file didn't exist (throw IOExc
>     instead)
>   * It didn't have a default ctor taking File (so it was hard to run
>     all tests against it)
>   * It didn't have a test case
>   * Some Javadocs problems
>   * I cutover to FilterDirectory
> I tried to cutover to BufferedIndexOutput since this is essentially
> all that NativeUnixIO is doing ... but it's not simple because BIO
> sometimes flushes non-full (non-aligned) buffers even before the end
> of the file (its writeBytes method).
> I also factored out a BaseDirectoryTestCase, and tried to fold in
> "generic" Directory tests, and added/cutover explicit tests for the
> core directory impls.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
