Re: Avro schema in Ruby API
Hey Harsh,

thanks. I can confirm that the first one works. Let me try the second one.

Tomas

On Sun, Feb 16, 2014 at 8:07 AM, Harsh J <ha...@cloudera.com> wrote:

Hi,

For (1), I believe you could call Schema.parse(meta['avro.schema']) to obtain the schema as an object from the meta entry of the file.

For (2), as defined in the spec at http://avro.apache.org/docs/current/spec.html#Object+Container+Files, the schema is stored only in the header of the file, so a simply initialised reader will efficiently read just that. The file's data blocks are read only upon enumerating over the reader.

On Sun, Feb 16, 2014 at 4:52 AM, Tomas Svarovsky <svarovsky.to...@gmail.com> wrote:

Hey,

I wanted to ask a couple of questions.

1) Let's assume I have 2 Avro files. I would like to grab the schemas of both, compare them, and decide what to do. The only way I found to get to the schema in a reader is through

    dr = Avro::DataFile::Reader.new(file, Avro::IO::DatumReader.new)
    dr.meta

and that is still stringified JSON. Is this the only way? Is this use case even supported, or should I do it differently?

2) Also, is it possible to read just the schema? Sometimes it is useful to look at a file without actually reading the whole file, let's say from S3.

Regards
Tomas

--
Harsh J
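For anyone curious what "reading just the header" amounts to at the byte level, here is a minimal sketch based on the Object Container Files section of the spec linked in the thread. It is plain Java with no Avro library, meant only as an illustration of the file layout; it is not how the Ruby reader is implemented. The class and method names (AvroHeaderPeek, readMeta, sampleHeader) are made up for this example. It checks the 4-byte magic and decodes the metadata map, stopping before the sync marker and the data blocks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any Avro library.
final class AvroHeaderPeek {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Decodes one zigzag, variable-length long (the spec's encoding for int/long).
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Reads a length-prefixed byte sequence (used for both strings and bytes).
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Reads the magic and the metadata map; leaves the stream at the sync marker.
    static Map<String, byte[]> readMeta(InputStream in) throws IOException {
        byte[] magic = new byte[4];
        new DataInputStream(in).readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");
        Map<String, byte[]> meta = new LinkedHashMap<>();
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in); // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    // Writes one zigzag, variable-length long; only used to build the sample below.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Builds an in-memory header holding a single "avro.schema" entry.
    static byte[] sampleHeader(String schemaJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        byte[] key = "avro.schema".getBytes(StandardCharsets.UTF_8);
        byte[] value = schemaJson.getBytes(StandardCharsets.UTF_8);
        writeLong(out, 1);              // one key/value pair in this map block
        writeLong(out, key.length);
        out.write(key, 0, key.length);
        writeLong(out, value.length);
        out.write(value, 0, value.length);
        writeLong(out, 0);              // end of map
        out.write(new byte[16], 0, 16); // sync marker (all zeros in this sample)
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String schema = "{\"type\":\"record\",\"name\":\"Example\",\"fields\":[]}";
        Map<String, byte[]> meta = readMeta(new ByteArrayInputStream(sampleHeader(schema)));
        System.out.println(new String(meta.get("avro.schema"), StandardCharsets.UTF_8));
    }
}
```

Because only the header bytes are consumed, a streaming or ranged read of the start of the file should be enough to get at the schema, assuming the header fits in whatever range is fetched.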
General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords
Hi all,

Here's my use case: I've got a bunch of different Java objects generated from Avro schema files. So the class definition headers look something like this:

    public class MyObject extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord

I've got many other types than MyObject too. I need to write a method which can serialize (from MyObject or another class to byte[]) and deserialize (from byte[] to MyObject or another class) in memory (not writing to disk).

I couldn't figure out how to write one method to handle it for SpecificRecord, so I tried serializing/deserializing these things as GenericRecord instead:

    public static byte[] serializeFromAvro(GenericRecord gr) {
        try {
            DatumWriter<GenericRecord> writer2 = new GenericDatumWriter<GenericRecord>(gr.getSchema());
            ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
            BinaryEncoder encoder2 = EncoderFactory.get().directBinaryEncoder(bao2, null);
            writer2.write(gr, encoder2);
            byte[] avroBytes2 = bao2.toByteArray();
            return avroBytes2;
        } catch (IOException e) {
            LOG.debug(e);
            return null;
        }
    }

    // Here I use a DataType enum and the AvroSchemaFactory to quickly
    // retrieve a Schema object for a supported DataType.
    public static GenericRecord deserializeFromAvro(byte[] avroBytes, DataType dataType) {
        try {
            Schema schema = AvroSchemaFactory.getInstance().getSchema(dataType);
            DatumReader<GenericRecord> reader2 = new GenericDatumReader<GenericRecord>(schema);
            ByteArrayInputStream bai2 = new ByteArrayInputStream(avroBytes);
            BinaryDecoder decoder2 = DecoderFactory.get().directBinaryDecoder(bai2, null);
            GenericRecord gr2 = reader2.read(null, decoder2);
            return gr2;
        } catch (Exception e) {
            LOG.debug(e);
            return null;
        }
    }

And I use them like this:

    // Remember MyObject is the SpecificRecord implementing class.
    MyObject x = new MyObject();
    byte[] avroBytes = serializeFromAvro(x);
    MyObject x2 = (MyObject) deserializeFromAvro(avroBytes, DataType.MyObject);

Which results in this:

    java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to datatypes.generated.avro.MyObject

Is there an easier way to achieve my use case, or some way I can fix my methods to allow the sort of behavior I want?

Thanks,
Gary
RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords
Here are some utility functions we've used for serialization to and from JSON. Something similar should work for binary.

    public <T> String avroEncodeAsJson(Class<T> clazz, Object object) {
        String avroEncodedJson = null;
        try {
            if (object == null || !(object instanceof SpecificRecord)) {
                return null;
            }
            T record = (T) object;
            Schema schema = ((SpecificRecord) record).getSchema();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder e = EncoderFactory.get().jsonEncoder(schema, out);
            SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
            w.write(record, e);
            e.flush();
            avroEncodedJson = new String(out.toByteArray());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return avroEncodedJson;
    }

    public <T> T jsonDecodeToAvro(String inputString, Class<T> className, Schema schema) {
        T returnObject = null;
        try {
            JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(schema, inputString);
            SpecificDatumReader<T> reader = new SpecificDatumReader<T>(className);
            returnObject = reader.read(null, jsonDecoder);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return returnObject;
    }

Dave

From: flaming.ze...@gmail.com [mailto:flaming.ze...@gmail.com] On Behalf Of Gary Steelman
Sent: Tuesday, February 18, 2014 4:21 PM
To: user@avro.apache.org
Subject: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Hi all,

Here's my use case: I've got a bunch of different Java objects generated from Avro schema files. So the class definition headers look something like this:

    public class MyObject extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord

I've got many other types than MyObject too. I need to write a method which can serialize (from MyObject or another class to byte[]) and deserialize (from byte[] to MyObject or another class) in memory (not writing to disk).
Using snappy codec with Avro C++
I am trying to read an Avro file compressed with snappy using Avro C++, but Avro C++ does not come with snappy support out of the box, and I am unsure how to add it. Can anyone offer any practical advice to get it working?

Thanks.
RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords
That's great Gary. Thanks for the follow up.

Dave

From: flaming.ze...@gmail.com [mailto:flaming.ze...@gmail.com] On Behalf Of Gary Steelman
Sent: Tuesday, February 18, 2014 5:15 PM
To: Gary Steelman
Cc: user@avro.apache.org
Subject: Re: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Hey all,

I've adapted Dave's solution to serialize to/from byte[] rather than JSON. Thanks a lot! The two methods are below:

    @SuppressWarnings("unchecked")
    public static <T> byte[] avroSerialize(Class<T> clazz, Object object) {
        byte[] ret = null;
        try {
            if (object == null || !(object instanceof SpecificRecord)) {
                return null;
            }
            T record = (T) object;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder e = EncoderFactory.get().directBinaryEncoder(out, null);
            SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
            w.write(record, e);
            e.flush();
            ret = out.toByteArray();
        } catch (IOException e) {
            LOG.debug(e);
        }
        return ret;
    }

    public static <T> T avroDeserialize(byte[] avroBytes, Class<T> clazz, Schema schema) {
        T ret = null;
        try {
            ByteArrayInputStream in = new ByteArrayInputStream(avroBytes);
            Decoder d = DecoderFactory.get().directBinaryDecoder(in, null);
            SpecificDatumReader<T> reader = new SpecificDatumReader<T>(clazz);
            ret = reader.read(null, d);
        } catch (IOException e) {
            LOG.debug(e);
        }
        return ret;
    }

And they're called like so:

    MyObject x = new MyObject();
    byte[] avroBytes = avroSerialize(x.getClass(), x);
    MyObject y = avroDeserialize(avroBytes, MyObject.class, MyObject.SCHEMA$);

Thanks,
Gary

On Tue, Feb 18, 2014 at 6:49 PM, Gary Steelman <gary.steelma...@gmail.com> wrote:

Thank you Dave, I appreciate it. I'll give those a shot and let you know how it goes.

-Gary

On Feb 18, 2014 6:45 PM, Dave McAlpin <dmcal...@inome.com> wrote:

Here are some utility functions we've used for serialization to and from JSON. Something similar should work for binary.

    public <T> String avroEncodeAsJson(Class<T> clazz, Object object) {
        String avroEncodedJson = null;
        try {
            if (object == null || !(object instanceof SpecificRecord)) {
                return null;
            }
            T record = (T) object;
            Schema schema = ((SpecificRecord) record).getSchema();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder e = EncoderFactory.get().jsonEncoder(schema, out);
            SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
            w.write(record, e);
            e.flush();
            avroEncodedJson = new String(out.toByteArray());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return avroEncodedJson;
    }

    public <T> T jsonDecodeToAvro(String inputString, Class<T> className, Schema schema) {
        T returnObject = null;
        try {
            JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(schema, inputString);
            SpecificDatumReader<T> reader = new SpecificDatumReader<T>(className);
            returnObject = reader.read(null, jsonDecoder);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return returnObject;
    }

Dave

From: flaming.ze...@gmail.com [mailto:flaming.ze...@gmail.com] On Behalf Of Gary Steelman
Sent: Tuesday, February 18, 2014 4:21 PM
To: user@avro.apache.org
Subject: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Hi all,

Here's my use case: I've got a bunch of different Java objects generated from Avro schema files. So the class definition headers look something like this:

    public class MyObject extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord

I've got many other types than MyObject too. I need to write a method which can serialize (from MyObject or another class to byte[]) and deserialize (from byte[] to MyObject or another class) in memory (not writing to disk).

I couldn't figure out how to write one method to handle it for SpecificRecord, so I tried serializing/deserializing these things as GenericRecord instead:

    public static byte[] serializeFromAvro(GenericRecord gr) {
        try {
            DatumWriter<GenericRecord> writer2 = new GenericDatumWriter<GenericRecord>(gr.getSchema());
            ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
            BinaryEncoder encoder2 = EncoderFactory.get().directBinaryEncoder(bao2, null);
            writer2.write(gr, encoder2);
            byte[] avroBytes2 = bao2.toByteArray();
            return avroBytes2;
        } catch (IOException e) {
            LOG.debug(e);
            return null;
        }
    }

    // Here I use a DataType enum and the AvroSchemaFactory to quickly
    // retrieve a Schema object for a supported DataType.
    public static GenericRecord deserializeFromAvro(byte[] avroBytes, DataType dataType) {