Re: Avro schema in Ruby API

2014-02-18 Thread Tomas Svarovsky
Hey Harsh,

Thanks. I can confirm that the first one works. Let me try the second one.

Tomas


On Sun, Feb 16, 2014 at 8:07 AM, Harsh J ha...@cloudera.com wrote:

 Hi,

 For (1), I believe you could call Avro::Schema.parse(dr.meta['avro.schema'])
 to obtain the schema as an object from the meta entry of the file.

 For (2), as defined in the spec at
 http://avro.apache.org/docs/current/spec.html#Object+Container+Files,
 since the schema is stored only in the header of the file, using a
 simple initialised reader will be efficient in reading just that. The
 file's data blocks are read only upon enumerating over the reader.

 On Sun, Feb 16, 2014 at 4:52 AM, Tomas Svarovsky
 svarovsky.to...@gmail.com wrote:
  Hey,
 
  I wanted to ask a couple of questions.
 
  1) Let's assume I have two Avro files. I would like to grab the schemas of
  both, compare them, and decide what to do. The only way I found to get to
  the schema in a reader is through
 
  dr = Avro::DataFile::Reader.new(file, Avro::IO::DatumReader.new)
  dr.meta
 
  and that is still stringified JSON. Is this the only way? Is this use case
  even supported, or should I do it differently?
 
  2) Also, is it possible to read just the schema? Sometimes it is useful to
  look at a file without actually reading the whole file, let's say from S3.
 
  Regards Tomas



 --
 Harsh J
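
Point (2) above rests on the container-file layout: the schema sits in the file header, ahead of any data block. The following is a minimal sketch of reading only that header, written in Java with nothing but the JDK (it does not use the Avro library, and it is not the Ruby API discussed in this thread). The class name AvroHeaderPeek and all of its methods are made up for illustration, and sampleHeader is a hypothetical helper that fabricates a header in memory so the sketch runs without a real file. The layout it assumes (the magic bytes "Obj" plus 0x01, then a metadata map of string keys to byte values, then a 16-byte sync marker) is my reading of the Object Container Files section of the spec linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class AvroHeaderPeek {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Avro long: zigzag-encoded, then written as a base-128 varint, low bits first.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) throw new IOException("unexpected end of header");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return (raw >>> 1) ^ -(raw & 1);
        }
        throw new IOException("varint too long");
    }

    static void writeLong(ByteArrayOutputStream out, long n) {
        long raw = (n << 1) ^ (n >> 63);
        while ((raw & ~0x7FL) != 0) {
            out.write((int) ((raw & 0x7F) | 0x80));
            raw >>>= 7;
        }
        out.write((int) raw);
    }

    // Avro bytes/string: a long length followed by that many bytes.
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Reads the magic and the metadata map, then stops; data blocks are never touched.
    static Map<String, byte[]> readMeta(InputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        new DataInputStream(in).readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");
        Map<String, byte[]> meta = new LinkedHashMap<>();
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {      // negative count: the block's byte size follows
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    static String readSchemaJson(InputStream in) throws IOException {
        byte[] schema = readMeta(in).get("avro.schema");
        if (schema == null) throw new IOException("no avro.schema entry in header");
        return new String(schema, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: builds a header in memory so the sketch needs no file.
    static byte[] sampleHeader(String schemaJson) {
        byte[] key = "avro.schema".getBytes(StandardCharsets.UTF_8);
        byte[] value = schemaJson.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 1);                    // one map entry in this block
        writeLong(out, key.length);
        out.write(key, 0, key.length);
        writeLong(out, value.length);
        out.write(value, 0, value.length);
        writeLong(out, 0);                    // end of map
        out.write(new byte[16], 0, 16);       // sync marker (all zeros in this sample)
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader("{\"type\":\"string\"}");
        System.out.println(readSchemaJson(new ByteArrayInputStream(header)));
    }
}
```

Because readMeta consumes the stream only up to the end of the metadata map, the same approach should work on a stream that fetches just the first bytes of a remote file, though how many bytes that is depends on the schema size.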



General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

2014-02-18 Thread Gary Steelman
Hi all,

Here's my use case: I've got a bunch of different Java objects generated
from Avro schema files. So the class definition headers look something like
this: public class MyObject extends
org.apache.avro.specific.SpecificRecordBase implements
org.apache.avro.specific.SpecificRecord. I've got many other types than
MyObject too. I need to write a method which can serialize (from MyObject
or another class to byte[]) and deserialize (from byte[] to MyObject or
another class) in memory (not writing to disk).

I couldn't figure out how to write one method to handle it for
SpecificRecord, so I tried serializing/deserializing these things as
GenericRecord instead:

  public static byte[] serializeFromAvro(GenericRecord gr) {
    try {
      DatumWriter<GenericRecord> writer2 =
          new GenericDatumWriter<GenericRecord>(gr.getSchema());
      ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
      BinaryEncoder encoder2 =
          EncoderFactory.get().directBinaryEncoder(bao2, null);
      writer2.write(gr, encoder2);
      byte[] avroBytes2 = bao2.toByteArray();
      return avroBytes2;
    } catch (IOException e) {
      LOG.debug(e);
      return null;
    }
  }

  // Here I use a DataType enum and the AvroSchemaFactory to quickly
  // retrieve a Schema object for a supported DataType.
  public static GenericRecord deserializeFromAvro(byte[] avroBytes,
      DataType dataType) {
    try {
      Schema schema = AvroSchemaFactory.getInstance().getSchema(dataType);
      DatumReader<GenericRecord> reader2 =
          new GenericDatumReader<GenericRecord>(schema);
      ByteArrayInputStream bai2 = new ByteArrayInputStream(avroBytes);
      BinaryDecoder decoder2 =
          DecoderFactory.get().directBinaryDecoder(bai2, null);
      GenericRecord gr2 = reader2.read(null, decoder2);
      return gr2;
    } catch (Exception e) {
      LOG.debug(e);
      return null;
    }
  }

And I use them like this:

// Remember MyObject is the SpecificRecord implementing class.
MyObject x = new MyObject();
byte[] avroBytes = serializeFromAvro(x);
MyObject x2 = (MyObject) deserializeFromAvro(avroBytes, DataType.MyObject);

Which results in this:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record
cannot be cast to datatypes.generated.avro.MyObject

Is there an easier way to achieve my use case, or some way I can fix my
methods to allow the sort of behavior I want?

Thanks,
Gary
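
The ClassCastException reported above is what one would expect from a generic reader: GenericDatumReader builds GenericData.Record instances, and such an instance is not a MyObject even when both were produced from the same schema. The sketch below reproduces that situation without the Avro library, so it runs on a bare JDK. Everything in it is a stand-in invented for illustration: CastDemo, its Record interface, GenericRecordImpl, the nested MyObject, and the readGeneric/readSpecific methods only mimic the roles of the Avro types, and "decoding" is reduced to copying fields out of a map.

```java
import java.util.HashMap;
import java.util.Map;

final class CastDemo {
    /** Stand-in for a record interface that both kinds of record implement. */
    public interface Record {
        Object get(String field);
        void put(String field, Object value);
    }

    /** Stand-in for GenericData.Record: map-backed, unrelated to any generated class. */
    public static final class GenericRecordImpl implements Record {
        private final Map<String, Object> values = new HashMap<>();
        public Object get(String field) { return values.get(field); }
        public void put(String field, Object value) { values.put(field, value); }
    }

    /** Stand-in for a generated class such as MyObject. */
    public static final class MyObject implements Record {
        private String name;
        public Object get(String field) { return "name".equals(field) ? name : null; }
        public void put(String field, Object value) {
            if ("name".equals(field)) name = (String) value;
        }
    }

    /** Mimics a generic reader: it always builds the generic implementation. */
    public static Record readGeneric(Map<String, Object> fields) {
        Record r = new GenericRecordImpl();
        fields.forEach(r::put);
        return r;
    }

    /** Mimics a specific reader: it instantiates the class it was handed. */
    public static <T extends Record> T readSpecific(Map<String, Object> fields, Class<T> clazz)
            throws ReflectiveOperationException {
        T r = clazz.getDeclaredConstructor().newInstance();
        fields.forEach(r::put);
        return r;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> fields = new HashMap<>();
        fields.put("name", "example");

        // Works: the reader was told which class to build.
        MyObject ok = readSpecific(fields, MyObject.class);
        System.out.println(ok.get("name"));

        // Fails at runtime: the object really is a GenericRecordImpl.
        try {
            MyObject bad = (MyObject) readGeneric(fields);
            System.out.println(bad);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException from the generic path");
        }
    }
}
```

The design point is the Class<T> parameter: a reader can only return the generated type if something tells it which class to instantiate. In Avro that role is played by the specific reader and writer classes rather than the generic ones.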


RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

2014-02-18 Thread Dave McAlpin
Here are some utility functions we've used for serialization to and from JSON. 
Something similar should work for binary.

public <T> String avroEncodeAsJson(Class<T> clazz, Object object) {
    String avroEncodedJson = null;
    try {
        if (object == null || !(object instanceof SpecificRecord)) {
            return null;
        }
        T record = (T) object;
        Schema schema = ((SpecificRecord) record).getSchema();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder e = EncoderFactory.get().jsonEncoder(schema, out);
        SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
        w.write(record, e);
        e.flush();
        avroEncodedJson = new String(out.toByteArray());
    } catch (IOException e) {
        e.printStackTrace();
    }

    return avroEncodedJson;
}

public <T> T jsonDecodeToAvro(String inputString, Class<T> className,
        Schema schema) {
    T returnObject = null;
    try {
        JsonDecoder jsonDecoder =
                DecoderFactory.get().jsonDecoder(schema, inputString);
        SpecificDatumReader<T> reader = new SpecificDatumReader<T>(className);
        returnObject = reader.read(null, jsonDecoder);
    } catch (IOException e) {
        e.printStackTrace();
    }

    return returnObject;
}

Dave

From: flaming.ze...@gmail.com [mailto:flaming.ze...@gmail.com] On Behalf Of 
Gary Steelman
Sent: Tuesday, February 18, 2014 4:21 PM
To: user@avro.apache.org
Subject: General-Purpose Serialization and Deserialization for Avro-Generated 
SpecificRecords





Using snappy codec with Avro C++

2014-02-18 Thread irodens
I am trying to read an Avro file compressed with Snappy using Avro C++, but
Avro C++ does not come with Snappy out of the box, and I am unsure how to
add it.

Can anyone offer any practical advice to get it working?

Thanks.





RE: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

2014-02-18 Thread Dave McAlpin
That's great Gary. Thanks for the follow up.

Dave

From: flaming.ze...@gmail.com [mailto:flaming.ze...@gmail.com] On Behalf Of 
Gary Steelman
Sent: Tuesday, February 18, 2014 5:15 PM
To: Gary Steelman
Cc: user@avro.apache.org
Subject: Re: General-Purpose Serialization and Deserialization for 
Avro-Generated SpecificRecords

Hey all, I've adapted Dave's solution to serialize to/from byte[] rather than 
JSON. Thanks a lot! The two methods are below:

  @SuppressWarnings("unchecked")
  public static <T> byte[] avroSerialize(Class<T> clazz, Object object) {
    byte[] ret = null;
    try {
      if (object == null || !(object instanceof SpecificRecord)) {
        return null;
      }

      T record = (T) object;
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      Encoder e = EncoderFactory.get().directBinaryEncoder(out, null);
      SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
      w.write(record, e);
      e.flush();
      ret = out.toByteArray();
    } catch (IOException e) {
      LOG.debug(e);
    }

    return ret;
  }

  public static <T> T avroDeserialize(byte[] avroBytes, Class<T> clazz,
      Schema schema) {
    T ret = null;
    try {
      ByteArrayInputStream in = new ByteArrayInputStream(avroBytes);
      Decoder d = DecoderFactory.get().directBinaryDecoder(in, null);
      SpecificDatumReader<T> reader = new SpecificDatumReader<T>(clazz);
      ret = reader.read(null, d);
    } catch (IOException e) {
      LOG.debug(e);
    }

    return ret;
  }
And they're called like so:
MyObject x = new MyObject();
byte[] avroBytes = avroSerialize(x.getClass(), x);
MyObject y = avroDeserialize(avroBytes, MyObject.class, MyObject.SCHEMA$);
Thanks,
Gary

On Tue, Feb 18, 2014 at 6:49 PM, Gary Steelman 
gary.steelma...@gmail.com wrote:

Thank you Dave, I appreciate it. I'll give those a shot and let you know how it 
goes.

-Gary
On Feb 18, 2014 6:45 PM, Dave McAlpin 
dmcal...@inome.com wrote:

From: flaming.ze...@gmail.com [mailto:flaming.ze...@gmail.com] On Behalf Of 
Gary Steelman
Sent: Tuesday, February 18, 2014 4:21 PM
To: user@avro.apache.org
Subject: General-Purpose Serialization and Deserialization for Avro-Generated 
SpecificRecords

Hi all,
Here's my use case: I've got a bunch of different Java objects generated from 
Avro schema files. So the class definition headers look something like this: 
public class MyObject extends org.apache.avro.specific.SpecificRecordBase 
implements org.apache.avro.specific.SpecificRecord. I've got many other types 
than MyObject too. I need to write a method which can serialize (from MyObject 
or another class to byte[]) and deserialize (from byte[] to MyObject or another 
class) in memory (not writing to disk).
I couldn't figure out how to write one method to handle it for SpecificRecord, 
so I tried serializing/deserializing these things as GenericRecord instead:

  public static byte[] serializeFromAvro(GenericRecord gr) {
    try {
      DatumWriter<GenericRecord> writer2 =
          new GenericDatumWriter<GenericRecord>(gr.getSchema());
      ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
      BinaryEncoder encoder2 =
          EncoderFactory.get().directBinaryEncoder(bao2, null);
      writer2.write(gr, encoder2);
      byte[] avroBytes2 = bao2.toByteArray();
      return avroBytes2;
    } catch (IOException e) {
      LOG.debug(e);
      return null;
    }
  }
  // Here I use a DataType enum and the AvroSchemaFactory to quickly retrieve a
  // Schema object for a supported DataType.
  public static GenericRecord deserializeFromAvro(byte[] avroBytes,
      DataType dataType) {