[ https://issues.apache.org/jira/browse/AVRO-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831568#action_12831568 ]
Doug Cutting commented on AVRO-295:
-----------------------------------

I am not convinced that we should automatically flush after each object is serialized, as this may adversely affect performance when writing many small objects. Mightn't it be better to add a flush() method to DatumWriter and leave control of when things are flushed to the application?

> JsonEncoder is not flushed after writing using ReflectDatumWriter
> ------------------------------------------------------------------
>
>                 Key: AVRO-295
>                 URL: https://issues.apache.org/jira/browse/AVRO-295
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.3.0
>            Reporter: Jonathan Hsieh
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-295-test.patch, AVRO-295.patch
>
>
> JsonEncoder needs to be flushed, otherwise data may be left in its buffers.
> Ideally, behavior should be the same regardless of what kind of Encoder is
> passed in. Here is some example code:
> {code}
> class A {
>   long timestamp;
> }
>
> public void testEventSchemaSerializeBinary() throws IOException {
>   A e = new A();
>   e.timestamp = 1234;
>   ReflectData reflectData = ReflectData.get();
>   Schema schm = reflectData.getSchema(A.class);
>   System.out.println(schm);
>   ReflectDatumWriter writer = new ReflectDatumWriter(schm);
>   ByteArrayOutputStream out = new ByteArrayOutputStream();
>   Encoder binary = new BinaryEncoder(out);
>   writer.write(e, binary); // only one call
>   byte[] bs = out.toByteArray();
>   int len = bs.length; // length is 2, which is reasonable.
>   System.out.println("output size: " + len);
> }
>
> public void testSerializeJson() throws IOException {
>   A a = new A();
>   a.timestamp = 1234;
>   ReflectData reflectData = ReflectData.get();
>   Schema schm = reflectData.getSchema(A.class);
>   ReflectDatumWriter writer = new ReflectDatumWriter(schm);
>   ByteArrayOutputStream out = new ByteArrayOutputStream();
>   JsonEncoder json = new JsonEncoder(schm, out);
>   writer.write(a, json); // only one call
>   // did not flush
>   byte[] bs = out.toByteArray();
>   int len = bs.length; // len == 0; this is unexpected!
>   System.out.println("output size: " + len);
>
>   // flushed this time. this is a bit unwieldy
>   json.flush();
>   bs = out.toByteArray();
>   len = bs.length; // len == 18; this is better!
>   System.out.println("output size: " + len);
> }
> {code}
> One way to deal with it is to have all Encoders provide a flush method so
> that the DatumWriter can always flush them, and potentially to add a flush
> method to DatumWriter as well.
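To make the trade-off in the comment above concrete, here is a minimal, self-contained sketch of application-controlled flushing. It does not use Avro: BufferingEncoder and SketchDatumWriter are hypothetical stand-ins, and a java.io.BufferedOutputStream merely plays the role of the encoder's internal buffer (the 8192-byte size is an assumption). The writer exposes flush() but never calls it itself, so the application can write many small objects and flush once.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushControlSketch {

    /** Hypothetical stand-in for a buffering encoder; not Avro's JsonEncoder. */
    static final class BufferingEncoder {
        private final BufferedOutputStream buffered;

        BufferingEncoder(ByteArrayOutputStream sink) {
            // Assumed buffer size; bytes stay here until flush() is called
            // or the buffer fills up.
            this.buffered = new BufferedOutputStream(sink, 8192);
        }

        void writeLongField(String name, long value) throws IOException {
            String json = "{\"" + name + "\":" + value + "}";
            buffered.write(json.getBytes(StandardCharsets.UTF_8));
        }

        void flush() throws IOException {
            buffered.flush();
        }
    }

    /** Hypothetical writer that exposes flush() instead of flushing per object. */
    static final class SketchDatumWriter {
        private final BufferingEncoder encoder;

        SketchDatumWriter(BufferingEncoder encoder) {
            this.encoder = encoder;
        }

        void write(long timestamp) throws IOException {
            encoder.writeLongField("timestamp", timestamp); // no flush here
        }

        void flush() throws IOException {
            encoder.flush();
        }
    }

    /** Returns the sink size before and after a single flush of 100 records. */
    static int[] demo() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            SketchDatumWriter writer = new SketchDatumWriter(new BufferingEncoder(out));
            for (int i = 0; i < 100; i++) {
                writer.write(1234L); // many small objects, all buffered
            }
            int before = out.size(); // nothing has reached the sink yet
            writer.flush();          // the application decides when to flush
            int after = out.size();  // 100 records of 18 bytes each
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = demo();
        System.out.println("before flush: " + sizes[0]);
        System.out.println("after flush: " + sizes[1]);
    }
}
```

In this sketch the 100 writes reach the underlying stream in one flush rather than 100, which is the performance concern with flushing after every object. The cost is the one the issue reports: an application that forgets to flush sees an empty stream.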