[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905437#action_12905437 ]
Jeff Zhang commented on PIG-794:
--------------------------------

Scott & Doug, thanks for your review.

Scott,
It seems we are both working on integrating Avro into Pig. It looks like you put all the reader/writer logic in PigDataAssembly.java. BTW, you missed one type, InternalMap, which Pig uses when there is an ORDER BY statement in the Pig script. The key of an InternalMap can be any type that Pig can handle, such as a tuple.

Doug,
{quote}
is there a reason you subclass GenericDatumReader and GenericDatumWriter, overriding readRecord() and writeRecord()?
{quote}
Actually, at first I tried to extend GenericDatumReader and GenericDatumWriter, but I found that this requires overriding many methods (AvroStorage_4.patch has the implementation code). One weird thing is that it cannot handle InternalMap; the main method of PigData illustrates the problem (maybe I am not using the Avro API correctly).

{quote}
your writeMap() doesn't appear to write a valid Avro map, writeArray() doesn't write a valid Avro array
{quote}
Do you mean I should follow the steps of writeArray() in GenericDatumWriter, like the following?
{code}
out.writeArrayStart();
out.setItemCount(size);
for (Iterator<? extends Object> it = getArrayElements(datum); it.hasNext();) {
  out.startItem();
  write(element, it.next(), out);
}
out.writeArrayEnd();
{code}

{quote}
my guess is that a lot of time is spent in findSchemaIndex().
{quote}
Yes, I have optimized this in AvroStorage_3.patch.

{quote}
I don't see where this specifies an Avro schema for Pig data.
{quote}
I construct Pig's Avro schema in PigData.java, using the Avro API to build the schema rather than parsing it from JSON.

> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch, AvroStorage_3.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs instead of the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly better compared to BinStorage on our benchmarks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
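
On the writeArray() question in the comment: the call sequence matters because the Avro binary encoding frames an array as one or more blocks, each starting with an item count, followed by a zero count that ends the array. The sketch below is a hand-rolled illustration of that framing using only the Java standard library; it is not the Avro API and not code from any of the patches. The class and method names (AvroArraySketch, writeLong, writeLongArray, hex) are hypothetical, and it assumes an array of longs written as a single block.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class AvroArraySketch {

    // Avro encodes int/long as zigzag, then as a variable-length
    // sequence of 7-bit groups, least significant group first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Single-block array framing: item count, the items, then a zero
    // count marking the end of the array. An empty array is just the
    // zero end marker.
    static byte[] writeLongArray(List<Long> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!items.isEmpty()) {
            writeLong(out, items.size());
            for (long item : items) {
                writeLong(out, item);
            }
        }
        writeLong(out, 0);
        return out.toByteArray();
    }

    // Helper for display only: space-separated lowercase hex bytes.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Array of longs [3, 27]
        System.out.println(hex(writeLongArray(Arrays.asList(3L, 27L))));
        // prints "04 06 36 00"
    }
}
```

A writer that emits the items without the leading count or the trailing zero produces bytes that an Avro array reader cannot frame, which is presumably what the "doesn't write a valid Avro array" remark refers to.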