Hey Ryan, Any chance you were able to look into this (My email on Jun 6)?
Thanks SG On Thu, Jun 16, 2016 at 2:05 PM, S G <sg.online.em...@gmail.com> wrote: > Doug Cutting, Ryan Blue, > > I have worked with both of you on this ticket and would appreciate your > thoughts regarding circular references. > > If you think the current circular references' implementation in Avro is > indeed easy for clients to use (i.e. they do not have to write all those > extra classes), then please help me by providing some test-case/code > reference. > > Thanks > Sachin > > > On Mon, Jun 6, 2016 at 9:18 AM, S G <sg.online.em...@gmail.com> wrote: > >> >> As far as I have understood the attached testcase, I think the solution >> in https://github.com/apache/avro/pull/23/files (JIRA: >> https://issues.apache.org/jira/browse/AVRO-695) is more seamless from a >> user's perspective because the user does not have to write any classes >> extending LogicalType or Conversion. >> >> Writing those classes could become a bit of a challenge: >> 1) If there are several circular references in the code (something very >> common in the Hibernate world). >> 2) It is a pain to change these classes to reflect any changes in the >> classes being serialized. >> 3) When the user decides to write a conversion/logical-type class for >> some record, he has to handle serialization for all fields of the class >> even though all he wanted to handle was circular references (This is what I >> have understood but I could be wrong here). >> >> In my pull request for AVRO-695, the only thing a user need to do is to >> call setResolvingCircularRefs(true) on ReflectData. Then every record is >> converted into a union of the record-type and the newly added class >> CircularRef. This new class stores the actual reference ID on encountering >> a circular ref. This is really convenient for the user and there is no >> penalty on performance. Even if there is a problem, the user can always go >> back to writing LogicalType and Conversion classes. >> >> My request would be to consider this solution for handling circular >> references instead of asking the user to write a bunch of other classes. >> If the solution sounds acceptable, then I can bring the patch up-to-date >> with the latest master branch and reopen the pull request. >> >> Cheers, >> SG >> >> On Wed, Jun 1, 2016 at 5:36 PM, S G <sg.online.em...@gmail.com> wrote: >> >>> Thanks RB, >>> >>> But I am not sure I follow that example (Probably because there is no >>> actual datum class there?). >>> >>> Consider a complex code running in production with lots of circular >>> references. >>> Something like: >>> >>> class Home { >>> String address; >>> Integer zipCode; >>> List <Person> residents; >>> } >>> >>> class Person { >>> String name; >>> Person spouse; >>> House home; >>> Map <Car, Dealer> vehiclesOwned; >>> } >>> >>> class Car { >>> String make; >>> Integer year; >>> Dealer soldBy; >>> Person ownedBy; >>> } >>> >>> class Dealer { >>> String name; >>> String address; >>> List<Car> vehiclesSold; >>> } >>> >>> The above is a made-up example, intentionally using easy-to-understand >>> references. >>> But it does show that circular references can become very complex and >>> across multiple chains. >>> >>> If users have to write Conversion classes for all of the above along >>> with hash-map(s) that finds out IDs for the objects seen already, writes >>> them out as separate fields and then reads them back to replace them with >>> actual objects, then it would be expecting too much from our clients IMO. >>> >>> Perhaps it would be more helpful if there was a more seamless way of >>> getting this done automatically. >>> >>> SG >>> >>> >>> >>> On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rb...@netflix.com.invalid> >>> wrote: >>> >>>> Those aren't the datum classes, those are logical types that are added >>>> to >>>> the schema for your datum classes. The "referenceable" logical type is >>>> applied to the class that gets replaced with an ID reference and points >>>> to >>>> the field to use for that ID. The "reference" logical type is applied to >>>> the class that contains a referencable datum. Then there are conversions >>>> that replace a referenceable with its ID. >>>> >>>> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg.online.em...@gmail.com> wrote: >>>> >>>> > RB, >>>> > >>>> > The test TestCircularReferences shows that the classes having circular >>>> > reference need to extend LogicalType. >>>> > If every circular reference class has to do this, then I think this >>>> would >>>> > be a big limitation for actual classes used in production code >>>> because they >>>> > would be already extending other classes plus asking several other >>>> teams to >>>> > change their base-class would be difficult. >>>> > Is there a way to do this without making the classes extend the >>>> LogicalType >>>> > ? >>>> > >>>> > SG >>>> > >>>> > >>>> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid> >>>> > wrote: >>>> > >>>> > > SG, >>>> > > >>>> > > The example/test I built uses logical types to remove the circular >>>> > > reference when writing and restore it when reading. It doesn't look >>>> like >>>> > > your test adds logical types, so that's probably the issue. >>>> > > >>>> > > rb >>>> > > >>>> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg.online.em...@gmail.com> >>>> wrote: >>>> > > >>>> > > > Hey RB, >>>> > > > >>>> > > > I was trying out the circular refs with latest 1.8.1 version of >>>> Avro >>>> > and >>>> > > it >>>> > > > doesn't seem to be working out of the box for me. >>>> > > > Perhaps I am missing something and would appreciate your help. >>>> > > > >>>> > > > Thanks, >>>> > > > SG >>>> > > > >>>> > > > >>>> > > > Here is my test code: >>>> > > > >>>> > > > public class CircularRefsTest >>>> > > > { >>>> > > > @Test >>>> > > > public void testSerialization() throws Exception >>>> > > > { >>>> > > > // Create a circular linked-list >>>> > > > ListNode first = new ListNode("first"); >>>> > > > ListNode second = new ListNode("second"); >>>> > > > second.next = first; >>>> > > > first.next = second; >>>> > > > >>>> > > > ReflectData rdata = ReflectData.AllowNull.get(); >>>> > > > >>>> > > > // Print Schema >>>> > > > Schema schema = rdata.getSchema(ListNode.class); >>>> > > > System.out.println(schema); >>>> > > > >>>> > > > // Serialize >>>> > > > DatumWriter<ListNode> datumWriter = new >>>> > > > ReflectDatumWriter<ListNode>(ListNode.class); >>>> > > > ByteArrayOutputStream byteArrayOutputStream = new >>>> > > > ByteArrayOutputStream(); >>>> > > > Encoder encoder = >>>> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null); >>>> > > > datumWriter.write(first, encoder); >>>> > > > encoder.flush(); >>>> > > > byte[] bytes = byteArrayOutputStream.toByteArray(); >>>> > > > >>>> > > > assertTrue( bytes != null ); >>>> > > > } >>>> > > > } >>>> > > > >>>> > > > class ListNode >>>> > > > { >>>> > > > String data; >>>> > > > ListNode next; >>>> > > > >>>> > > > public ListNode() { >>>> > > > } >>>> > > > public ListNode(String data) { >>>> > > > this.data = data; >>>> > > > } >>>> > > > } >>>> > > > >>>> > > > >>>> > > > >>>> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <b...@cloudera.com> >>>> wrote: >>>> > > > >>>> > > > > SG, >>>> > > > > >>>> > > > > The data ends up looking like this: >>>> > > > > >>>> > > > > {"id":1,"p":"parent data!","child":{"c":"child >>>> > > > data!","parent":{"long":1}}} >>>> > > > > >>>> > > > > I produced that with avro-tools 1.7.6 and tojson. >>>> > > > > >>>> > > > > Here's the schema: { >>>> > > > > "type" : "record", >>>> > > > > "name" : "Parent", >>>> > > > > "fields" : [ { >>>> > > > > "name" : "id", >>>> > > > > "type" : "long" >>>> > > > > }, { >>>> > > > > "name" : "p", >>>> > > > > "type" : "string" >>>> > > > > }, { >>>> > > > > "name" : "child", >>>> > > > > "type" : { >>>> > > > > "type" : "record", >>>> > > > > "name" : "Child", >>>> > > > > "fields" : [ { >>>> > > > > "name" : "c", >>>> > > > > "type" : "string" >>>> > > > > }, { >>>> > > > > "name" : "parent", >>>> > > > > "type" : [ "null", "long", "Parent" ] >>>> > > > > } ], >>>> > > > > "logicalType" : "reference", >>>> > > > > "ref-field-name" : "parent" >>>> > > > > } >>>> > > > > } ], >>>> > > > > "logicalType" : "referenceable", >>>> > > > > "id-field-name" : "id" >>>> > > > > } >>>> > > > > >>>> > > > > rb >>>> > > > > >>>> > > > > >>>> > > > > On 05/28/2015 09:50 AM, S G wrote: >>>> > > > > >>>> > > > >> RB, >>>> > > > >> >>>> > > > >> Could you please attach the schema and the JSON serialized >>>> output >>>> > from >>>> > > > >> your >>>> > > > >> test-code as well? >>>> > > > >> My build environment is currently broken as I am grappling >>>> with some >>>> > > > Java >>>> > > > >> 8 >>>> > > > >> update issues. >>>> > > > >> >>>> > > > >> Thanks >>>> > > > >> SG >>>> > > > >> >>>> > > > >> >>>> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <b...@cloudera.com> >>>> > wrote: >>>> > > > >> >>>> > > > >> SG, >>>> > > > >>> >>>> > > > >>> Now that logical types are in, I had some time to look at this >>>> > issue. >>>> > > > >>> Thanks for your patience on this. >>>> > > > >>> >>>> > > > >>> When I started looking at the use case, this began to look >>>> very >>>> > much >>>> > > > like >>>> > > > >>> a logical type issue. (I know, I've been saying that a lot.) >>>> When >>>> > you >>>> > > > >>> write, you replace any referenced object with its id. When you >>>> > read, >>>> > > > you >>>> > > > >>> replace those ids with the correct object. I went ahead and >>>> > > implemented >>>> > > > >>> this using 2 logical types: Referenceable for an object with >>>> an id, >>>> > > and >>>> > > > >>> Reference for a record with a field that references another >>>> object >>>> > by >>>> > > > id. >>>> > > > >>> >>>> > > > >>> There were some trade-offs to this approach. For example, I >>>> had to >>>> > > use >>>> > > > a >>>> > > > >>> logical type for the object containing the reference rather >>>> than >>>> > for >>>> > > > the >>>> > > > >>> reference itself because I used a callback to set the object >>>> once >>>> > it >>>> > > is >>>> > > > >>> found. That happens because children with references to a >>>> parent >>>> > are >>>> > > > >>> deserialized completely first. The parent is the last object >>>> to be >>>> > > > >>> assembled and passed to the logical type conversion. >>>> > > > >>> >>>> > > > >>> A bigger issue was that logical types are currently >>>> conservative >>>> > and >>>> > > > >>> don't >>>> > > > >>> overlap with reflect types or anything that sets java-class. >>>> That >>>> > > means >>>> > > > >>> that this currently only works with generic types. But, I'd >>>> rather >>>> > > make >>>> > > > >>> logical types work for reflect than add more custom code to >>>> support >>>> > > > this. >>>> > > > >>> Does that sound reasonable? >>>> > > > >>> >>>> > > > >>> I'm attaching a diff with the working test code so you can >>>> take a >>>> > > look >>>> > > > at >>>> > > > >>> the approach. Let me know what you are thinking. >>>> > > > >>> >>>> > > > >>> rb >>>> > > > >>> >>>> > > > >>> On 05/20/2015 12:05 PM, S G wrote: >>>> > > > >>> >>>> > > > >>> I am requesting some help with AVRO-695. >>>> > > > >>>> Here are some pieces from the last conversation. >>>> > > > >>>> >>>> > > > >>>> >>>> > > > >>>> Doug Cutting >>>> > > > >>>> < >>>> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting >>>> > >>>> > > > >>>> added >>>> > > > >>>> a comment - 02/Oct/14 21:19 >>>> > > > >>>> >>>> > > > >>>> Here's a modified version of the patch. It moves all >>>> significant >>>> > > > changes >>>> > > > >>>> to >>>> > > > >>>> reflect. Since reflect is the only implementation that can >>>> > generate >>>> > > an >>>> > > > >>>> appropriate schema, changes should be confined to there. >>>> > > > >>>> >>>> > > > >>>> The tests need to be updated, as they still assume generic. >>>> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#> >>>> > > > >>>> < >>>> > > > >>>> >>>> > > > >>>> >>>> > > > >>>> > > >>>> > https://issues.apache.org/jira/browse/AVRO-695? >>>> focusedCommentId=14286370&page=com.atlassian.jira. >>>> plugin.system.issuetabpanels:comment-tabpanel#comment-14286370 >>>> > > > >>>> >>>> > > > >>>>> >>>> > > > >>>>> Sachin Goyal >>>> > > > >>>> < >>>> > > > >>>> >>>> > > > >>>> > https://issues.apache.org/jira/secure/ViewProfile.jspa? >>>> name=sachingoyal >>>> > > > >>>> > >>>> > > > >>>> added >>>> > > > >>>> a comment - 21/Jan/15 21:56 >>>> > > > >>>> >>>> > > > >>>> Here is a patch with the updated test-cases. >>>> > > > >>>> I also confirm that all my changes are there in the patch. >>>> > > > >>>> >>>> > > > >>>> >>>> > > > >>>> The patch was submitted in June 2014 and was very hot till >>>> October >>>> > > > 2014. >>>> > > > >>>> Since then, there has been no action on this even though I >>>> have >>>> > sent >>>> > > > >>>> many >>>> > > > >>>> reminders in this group. >>>> > > > >>>> >>>> > > > >>>> I understand that everyone is very busy with their own stuff >>>> but I >>>> > > > would >>>> > > > >>>> really appreciate if someone could help a fellow engineer in >>>> > getting >>>> > > > his >>>> > > > >>>> patch accepted. >>>> > > > >>>> It would encourage more participation as well as help people >>>> > wanting >>>> > > > to >>>> > > > >>>> use >>>> > > > >>>> Avro for circular references. >>>> > > > >>>> >>>> > > > >>>> Regards >>>> > > > >>>> SG >>>> > > > >>>> >>>> > > > >>>> >>>> > > > >>> >>>> > > > >>> -- >>>> > > > >>> Ryan Blue >>>> > > > >>> Software Engineer >>>> > > > >>> Cloudera, Inc. >>>> > > > >>> >>>> > > > >>> >>>> > > > >> >>>> > > > > >>>> > > > > -- >>>> > > > > Ryan Blue >>>> > > > > Software Engineer >>>> > > > > Cloudera, Inc. >>>> > > > > >>>> > > > >>>> > > >>>> > > >>>> > > >>>> > > -- >>>> > > Ryan Blue >>>> > > Software Engineer >>>> > > Netflix >>>> > > >>>> > >>>> >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Software Engineer >>>> Netflix >>>> >>> >>> >> >