Adrien Grand created LUCENE-5583:
------------------------------------

             Summary: Should BufferedChecksumIndexInput have its own buffer?
                 Key: LUCENE-5583
                 URL: https://issues.apache.org/jira/browse/LUCENE-5583
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 4.8
            Reporter: Adrien Grand


I was playing with on-the-fly checksum verification and this made me stumble 
upon an issue with {{BufferedChecksumIndexInput}}.

I have some code that skips over a {{DataInput}} by reading bytes into a 
throwaway buffer (effectively /dev/null), e.g.:
{code}
  private static final byte[] SKIP_BUFFER = new byte[1024];

  private static void skipBytes(DataInput in, long numBytes) throws IOException {
    assert numBytes >= 0;
    for (long skipped = 0; skipped < numBytes; ) {
      final int toRead = (int) Math.min(numBytes - skipped, SKIP_BUFFER.length);
      in.readBytes(SKIP_BUFFER, 0, toRead);
      skipped += toRead;
    }
  }
{code}

It is fine to read into this static buffer, even from multiple threads, since 
the content that is read doesn't matter here. However, it breaks with 
{{BufferedChecksumIndexInput}} because of the way that it updates the checksum:

{code}
  @Override
  public void readBytes(byte[] b, int offset, int len)
    throws IOException {
    main.readBytes(b, offset, len);
    digest.update(b, offset, len);
  }
{code}

If a concurrent call to {{skipBytes}} starts overwriting {{b}} after 
{{main.readBytes}} returns but before {{digest.update(b, offset, len)}} 
finishes, the digest is computed over bytes that did not come from this 
input, and the checksum will be wrong.
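
To make the interleaving concrete, here is a minimal single-threaded replay of 
it. This is only a sketch: {{SharedBufferRaceDemo}}, {{crcOf}} and {{replay}} 
are hypothetical names, the two byte arrays stand in for two unrelated inputs, 
and {{java.util.zip.CRC32}} is assumed as the digest.

```java
import java.util.zip.CRC32;

/** Hypothetical demo: replays the problematic interleaving on a single thread. */
final class SharedBufferRaceDemo {

  /** CRC32 of a whole array. */
  static long crcOf(byte[] bytes) {
    CRC32 crc = new CRC32();
    crc.update(bytes, 0, bytes.length);
    return crc.getValue();
  }

  /** Returns {checksum thread A expected, checksum thread A actually computed}. */
  static long[] replay() {
    byte[] shared = new byte[4];   // plays the role of the static SKIP_BUFFER
    byte[] inputA = {1, 2, 3, 4};  // bytes thread A reads from its input
    byte[] inputB = {9, 9, 9, 9};  // bytes thread B reads from another input

    // Thread A: main.readBytes(b, offset, len)
    System.arraycopy(inputA, 0, shared, 0, shared.length);
    // Thread B: its own readBytes into the same buffer, before A updates its digest
    System.arraycopy(inputB, 0, shared, 0, shared.length);
    // Thread A: digest.update(b, offset, len) now sees thread B's bytes
    long observed = crcOf(shared);

    return new long[] {crcOf(inputA), observed};
  }
}
```

In this replay the two returned values differ, because thread A ends up 
checksumming thread B's bytes.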

I think we should make {{BufferedChecksumIndexInput}} read into a private 
buffer first instead of relying on the user-provided buffer.
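
A rough sketch of what this could look like, written against 
{{java.io.InputStream}} and {{java.util.zip.CRC32}} rather than the real 
Lucene classes so that it stands alone. The class name 
{{PrivateBufferChecksumInput}} and the 256-byte buffer size are assumptions 
made for illustration, not the actual patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical stand-in for a checksumming input that never digests the caller's buffer. */
final class PrivateBufferChecksumInput {
  private final InputStream main;
  private final Checksum digest = new CRC32();
  // Owned by this instance only, so no other reader can modify it mid-update.
  private final byte[] privateBuffer = new byte[256];

  PrivateBufferChecksumInput(InputStream main) {
    this.main = main;
  }

  void readBytes(byte[] b, int offset, int len) throws IOException {
    while (len > 0) {
      int chunk = Math.min(len, privateBuffer.length);
      int read = main.read(privateBuffer, 0, chunk);
      if (read < 0) {
        throw new EOFException("read past EOF");
      }
      // Update the checksum from the private copy, then hand the bytes to the caller.
      digest.update(privateBuffer, 0, read);
      System.arraycopy(privateBuffer, 0, b, offset, read);
      offset += read;
      len -= read;
    }
  }

  long getChecksum() {
    return digest.getValue();
  }
}
```

The trade-off is one extra copy per read. In exchange, the checksum no longer 
depends on what happens to the user-provided buffer. An instance is still 
meant to be used by a single thread, as before.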


