Scratch the idea of reusing a single ReflectDatumWriter; it depends on
internal state (the schema). When I don't reuse it, writing at least
succeeds without complaining. Reading, however, gets me:

java.io.IOException: Invalid sync!
        at org.apache.avro.file.DataFileReader.skipSync(DataFileReader.java:129)
        at org.apache.avro.file.DataFileReader.next(DataFileReader.java:113)
        at 
org.apache.avro.TestDataFileReflect.testMultiReflect(TestDataFileReflect.java:71)
...
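
For reference, here's roughly the non-reusing variant I tried (a
sketch, using the same FILE and FooEvent/BarEvent test classes as in
the message below):

    FileOutputStream fos = new FileOutputStream(FILE);

    // A fresh ReflectDatumWriter per DataFileWriter, instead of one
    // shared instance.
    Schema fooSchema = ReflectData.getSchema(FooEvent.class);
    DataFileWriter<Object> fooWriter = new DataFileWriter<Object>(fooSchema,
        fos, new ReflectDatumWriter());

    Schema barSchema = ReflectData.getSchema(BarEvent.class);
    DataFileWriter<Object> barWriter = new DataFileWriter<Object>(barSchema,
        fos, new ReflectDatumWriter());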

Eelco



On Sat, Aug 15, 2009 at 6:45 PM, Eelco Hillenius <[email protected]> wrote:
> Hi all,
>
> I'd like to adopt Avro to do audit logging. For this, I have a
> hierarchy of audit events, like:
>
> AuditEvent
>  |-- UserEvent
>        |-- UserSessionStartedEvent
>        |-- UserSessionEndedEvent
>        |-- WorkspaceEvent
>               |-- WorkspaceAccessedEvent
>
> etc. And I would like to write instances of these events to log files
> (and then later move them to HDFS so that we can fire MR jobs at
> them).
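>
> The event classes themselves are plain classes, roughly like this
> (a sketch; the fields are made up for illustration):
>
>    class AuditEvent { long timestamp; }  // hypothetical fields
>    class UserEvent extends AuditEvent { String userId; }
>    class UserSessionStartedEvent extends UserEvent { }
>    class UserSessionEndedEvent extends UserEvent { }
>    class WorkspaceEvent extends UserEvent { String workspaceId; }
>    class WorkspaceAccessedEvent extends WorkspaceEvent { }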
>
> There are two show-stoppers for me right now: AVRO-93 and AVRO-95.
> My question is about the latter, which concerns mixing multiple
> types in one data file using reflection. I submitted a unit test
> that demonstrates the bug, but I'm wondering whether I'm using the
> API as it is intended.
>
> Basically, I assume I can reuse the OutputStream and
> ReflectDatumWriter instances for different types:
>
>    FileOutputStream fos = new FileOutputStream(FILE);
>    ReflectDatumWriter dout = new ReflectDatumWriter();
>
> Then, for every type, I have:
>
>    Schema fooSchema = ReflectData.getSchema(FooEvent.class);
>    DataFileWriter<Object> fooWriter = new DataFileWriter<Object>(fooSchema,
>        fos, dout);
>
>    Schema barSchema = ReflectData.getSchema(BarEvent.class);
>    DataFileWriter<Object> barWriter = new DataFileWriter<Object>(barSchema,
>        fos, dout);
>
> I don't know up front which events will end up in a given file, and I
> would like to write only the schemas I'm actually using. So I'm
> assuming (hoping) I can get a schema on the fly, create a new writer
> for it, and then cache that writer for as long as the file is open,
> reusing it for the same type. I'd end up with a writer for every type
> that's in the file so far.
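>
> Concretely, something like this (a sketch; the cache map is my own
> bookkeeping, not Avro API, and fos/dout are the instances from above):
>
>    Map<Class<?>, DataFileWriter<Object>> writers =
>        new HashMap<Class<?>, DataFileWriter<Object>>();
>
>    DataFileWriter<Object> writerFor(Class<?> type) throws IOException {
>      DataFileWriter<Object> writer = writers.get(type);
>      if (writer == null) {
>        // First time we see this type: get its schema on the fly and
>        // open a new writer against the shared output stream.
>        Schema schema = ReflectData.getSchema(type);
>        writer = new DataFileWriter<Object>(schema, fos, dout);
>        writers.put(type, writer);
>      }
>      return writer;
>    }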
>
> Appending to the file then looks like:
>
>    barWriter.append(new BarEvent("Cheers mate!"));
>    fooWriter.append(new FooEvent(30));
>
> Then, when I'm about to roll over to a new file, I flush the writers
> and close the output stream.
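>
> Roughly (a sketch, assuming the writers map from above):
>
>    for (DataFileWriter<Object> writer : writers.values()) {
>      writer.flush();
>    }
>    fos.close();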
>
> I'd then hope to be able to read records in again like this:
>
>    GenericDatumReader<Object> din = new GenericDatumReader<Object>();
>    SeekableFileInput sin = new SeekableFileInput(FILE);
>    DataFileReader<Object> reader = new DataFileReader<Object>(sin, din);
>
> and with every reader.next call get a proper object magically
> instantiated and populated back, of course! :-)
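>
> In a loop, something like (a sketch; assuming I know the record
> count, as in the unit test):
>
>    Object datum = null;
>    for (int i = 0; i < count; i++) {
>      // Hoping each call hands back a populated event instance.
>      datum = reader.next(datum);
>    }
>    sin.close();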
>
> Would using Avro like this be about right, so that it's just a matter
> of having these bugs fixed, or am I making some wrong assumptions
> and/or should I go about this differently?
>
> See the attachment with https://issues.apache.org/jira/browse/AVRO-93
> for the full unit test.
>
> Cheers,
>
> Eelco
>
