[GitHub] avro pull request #198: AVRO-1857: GenericDatumWriter.write using BufferedBinaryEncoder leaves ByteBuffer in indeterminate state

2017-05-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/avro/pull/198


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (AVRO-1857) GenericDatumWriter.write using BufferedBinaryEncoder leaves ByteBuffer in indeterminate state

2017-05-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024355#comment-16024355
 ] 

ASF subversion and git services commented on AVRO-1857:
---

Commit c9f2c49241397fb3967bd281225f7cce353c32fc in avro's branch 
refs/heads/master from [~nkollar]
[ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=c9f2c49 ]

AVRO-1857: GenericDatumWriter.write using BufferedBinaryEncoder leaves 
ByteBuffer in indeterminate state

This closes #198


> GenericDatumWriter.write using BufferedBinaryEncoder leaves ByteBuffer in 
> indeterminate state
> -
>
> Key: AVRO-1857
> URL: https://issues.apache.org/jira/browse/AVRO-1857
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
>Reporter: Matt Grimwade
>Assignee: Nandor Kollar
>
> Calling {{GenericDatumWriter.write(Object, Encoder)}} using a 
> BufferedBinaryEncoder leaves any ByteBuffers within the object (representing 
> BYTES or FIXED types) in an indeterminate state. Specifically, each buffer 
> may be left either in its initial state, with (position, remaining) = (0, N) 
> or in its "consumed" state of (N, 0).
> Although I cannot find it documented, I believe the correct behaviour is that 
> the state of the object being written should be unmodified.
> This is an actual problem in our project where we save a copy of an object to 
> disk before performing some action on it. This later action fails 
> indeterminately because some of the ByteBuffers in the object are "consumed" 
> and some are not.
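For readers less familiar with NIO buffer state, here is a minimal standalone sketch (not from the Avro codebase; class and variable names are illustrative) of the two states described above: a freshly wrapped buffer sits at (position, remaining) = (0, N), and a relative bulk read leaves it "consumed" at (N, 0):

```java
import java.nio.ByteBuffer;

public class BufferStateDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});
        // Initial state: (position, remaining) = (0, 4)
        System.out.println(buf.position() + "," + buf.remaining());
        buf.get(new byte[buf.remaining()]); // relative bulk get advances position
        // "Consumed" state: (position, remaining) = (4, 0)
        System.out.println(buf.position() + "," + buf.remaining());
    }
}
```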
> I think the fault lies in 
> {{org.apache.avro.io.BufferedBinaryEncoder#writeFixed(java.nio.ByteBuffer)}}, 
> wherein the first branch advances the buffer's position but the second does 
> not:
> {code}
>   @Override
>   public void writeFixed(ByteBuffer bytes) throws IOException {
> if (!bytes.hasArray() && bytes.remaining() > bulkLimit) {
>   flushBuffer();
>   sink.innerWrite(bytes); // bypass the buffer
> } else {
>   super.writeFixed(bytes);
> }
>   }
> {code}
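One way to make the bulk-write branch state-preserving is to write through ByteBuffer.duplicate(), which shares the content but carries its own position and limit. The following is a self-contained sketch of that general technique (not necessarily the fix that was committed; the class and method names here are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class DuplicateWriteDemo {
    // Writes the buffer's remaining bytes without disturbing the caller's position.
    static void writePreservingState(ByteBuffer src, WritableByteChannel ch) throws IOException {
        ByteBuffer view = src.duplicate(); // shares content, independent position/limit
        while (view.hasRemaining()) {
            ch.write(view); // advances only the duplicate's position
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(new byte[]{10, 20, 30});
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePreservingState(buf, Channels.newChannel(out));
        // Caller's buffer is untouched: still (position, remaining) = (0, 3)
        System.out.println(buf.position() + "/" + buf.remaining());
    }
}
```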
> Here is a failing test case:
> {code}
> import static java.util.Arrays.asList;
> import static org.hamcrest.Matchers.equalTo;
> import static org.hamcrest.Matchers.is;
> import static org.junit.Assert.assertThat;
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.MappedByteBuffer;
> import java.nio.channels.FileChannel;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.StandardOpenOption;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.io.Encoder;
> import org.apache.avro.io.EncoderFactory;
> import org.junit.Test;
>
> public class BugTest {
>
>     private static final int ENCODER_BUFFER_SIZE = 32;
>     private static final int EXAMPLE_DATA_SIZE = 17;
>
>     @Test
>     public void testArrayBackedByteBuffer() throws IOException {
>         ByteBuffer buffer = ByteBuffer.wrap(someBytes(EXAMPLE_DATA_SIZE));
>         doTest(buffer);
>     }
>
>     @Test
>     public void testMappedByteBuffer() throws IOException {
>         Path file = Files.createTempFile("test", "data");
>         Files.write(file, someBytes(EXAMPLE_DATA_SIZE));
>         MappedByteBuffer buffer = FileChannel.open(file, StandardOpenOption.READ)
>                 .map(FileChannel.MapMode.READ_ONLY, 0, EXAMPLE_DATA_SIZE);
>         doTest(buffer);
>     }
>
>     private static void doTest(ByteBuffer buffer) throws IOException {
>         assertThat(asList(buffer.position(), buffer.remaining()), is(asList(0, EXAMPLE_DATA_SIZE)));
>         ByteArrayOutputStream output = new ByteArrayOutputStream(EXAMPLE_DATA_SIZE * 2);
>         EncoderFactory encoderFactory = new EncoderFactory();
>         encoderFactory.configureBufferSize(ENCODER_BUFFER_SIZE);
>         Encoder encoder = encoderFactory.binaryEncoder(output, null);
>         new GenericDatumWriter(Schema.create(Schema.Type.BYTES)).write(buffer, encoder);
>         encoder.flush();
>         assertThat(output.toByteArray(), equalTo(avroEncoded(someBytes(EXAMPLE_DATA_SIZE))));
>         // fails if buffer is not array-backed and buffer overflow occurs
>         assertThat(asList(buffer.position(), buffer.remaining()), is(asList(0, EXAMPLE_DATA_SIZE)));
>     }
>
>     private static byte[] someBytes(int size) {
>         byte[] result = new byte[size];
>         for (int i = 0; i < size; i++) {
>             result[i] = (byte) i;
>         }
>         return result;
>     }
>
>     private static byte[] avroEncoded(byte[] bytes) {
>         assert bytes.length < 64;
>         byte[] result = new byte[1 + bytes.length];
>         result[0] = (byte) (bytes.length * 2); // zig-zag encoding
>         System.arraycopy(bytes, 0, result, 1, bytes.length);
>         return result;
>     }
> }
> {code}

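The `bytes.length * 2` in the avroEncoded helper is the zig-zag length prefix Avro's binary encoding uses for BYTES values: zigZag(n) = (n << 1) ^ (n >> 31), which equals 2n for small non-negative n, and the `length < 64` assertion keeps the encoded length within a single varint byte. A quick standalone sketch (class name is illustrative):

```java
public class ZigZagDemo {
    // Zig-zag mapping used by Avro's varint encoding: small magnitudes get small codes.
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    public static void main(String[] args) {
        System.out.println(zigZag(0));  // 0
        System.out.println(zigZag(-1)); // 1
        System.out.println(zigZag(1));  // 2
        System.out.println(zigZag(17)); // 34, the single length byte for 17 bytes of data
    }
}
```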
[jira] [Commented] (AVRO-1857) GenericDatumWriter.write using BufferedBinaryEncoder leaves ByteBuffer in indeterminate state

2017-05-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024356#comment-16024356
 ] 

ASF GitHub Bot commented on AVRO-1857:
--

Github user asfgit closed the pull request at:

https://github.com/apache/avro/pull/198


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AVRO-1857) GenericDatumWriter.write using BufferedBinaryEncoder leaves ByteBuffer in indeterminate state

2017-05-25 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky updated AVRO-1857:
---
   Resolution: Fixed
Fix Version/s: 1.9.0
   Status: Resolved  (was: Patch Available)



[jira] [Created] (AVRO-2037) Use std::any where available

2017-05-25 Thread Darryl Green (JIRA)
Darryl Green created AVRO-2037:
--

 Summary: Use std::any where available
 Key: AVRO-2037
 URL: https://issues.apache.org/jira/browse/AVRO-2037
 Project: Avro
  Issue Type: Improvement
  Components: c++
Reporter: Darryl Green


The use of boost::any to hold union types causes a significant performance hit, 
especially for small types - in particular when using [null, primitive] for 
optional primitive elements of a schema. Most (all?) implementations of 
std::any include a small-value optimisation that avoids allocation overhead for 
scalars and other small types. It's a little unfortunate that a C++ binding of 
a notionally high-performance serialization format performs so poorly in this 
case (note: I had previously proposed using boost::variant, which would address 
this problem but would fail to support recursive types or truly huge numbers of 
distinct types in a union). Obviously this requires C++17 but could fall back 
to boost::any for older compilers.





[jira] [Updated] (AVRO-2037) Use std::any where available

2017-05-25 Thread Darryl Green (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darryl Green updated AVRO-2037:
---
Description: The use of boost::any to hold union types causes a significant 
performance hit especially for small types - in particular the when using 
[null,primitive] for optional primitive type elements of a schema. Most (all?) 
implementations of std::any include a small value optimisation that avoids 
allocation overhead for scalars and other small types. Its a little unfortunate 
that the performance of a C++ binding of a notionally high performance 
serialization format performs so poorly in this case (note - I had previously 
proposed using boost::variant which would address this problem but would fail 
to support recursive types or truly huge numbers of distinct types in a union). 
Obviously this requires C++ 17 but could fall back to boost::any for older 
compilers.  (was: The use of boost::any to hold union types causes a 
significant performance hit especially for small types - in particular the when 
using [null,primitive] for optional primitive type elements of a schema. Most 
(all?) implementations of std::any include a small value optimisation that 
avoids allocation overhead for scalars and other small types. Its a little 
unfortunate that the performance of a C++ binding of r a notionally high 
performance serialization format performs so poorly in this case (note - I had 
previously proposed using boost::variant which would address this problem but 
would fail to support recursive types or truly huge numbers of distinct types 
in a union). Obviously this requires C++ 17 but could fall back to boost::any 
for older compilers.)
