[ https://issues.apache.org/jira/browse/AVRO-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717799#comment-16717799 ]
ASF subversion and git services commented on AVRO-1167: ------------------------------------------------------- Commit 1f00b1a7a2f76921325ba073de3212a4a22524de in avro's branch refs/heads/branch-1.8 from John Gill [ https://gitbox.apache.org/repos/asf?p=avro.git;h=1f00b1a ] AVRO-1167, AVRO-766: Fix c AVRO_LINK memory leaks (#217) * AVRO-1167: Enhance avro_schema_copy for AVRO_LINK - Add hash of named schemas found during copy - Find saved named schema for copy of AVRO_LINK * AVRO-766: Correct memory leaks in AVRO_LINK copy - Adds test cases for AVRO-766 & AVRO-1167 - Corrects reference counting for avro_schema_copy * Enable TEST_AVRO_1167 in test_avro_766 This ensures that both fixes work together and that no valgrind errors are produced from a recrusive schema. > Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly. > ------------------------------------------------------------- > > Key: AVRO-1167 > URL: https://issues.apache.org/jira/browse/AVRO-1167 > Project: Apache Avro > Issue Type: Bug > Components: c > Affects Versions: 1.7.1 > Environment: Ubuntu Linux 11.10 > Reporter: Vivek Nadkarni > Priority: Major > Attachments: AVRO-1167-TEST.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a > new_schema, it sets the target of the new_link to the target of the old_link > in the old_schema. Thus, the AVRO_LINK in the new_schema points to an element > in the old_schema. > While this is currently safe, since the reference count of the target in the > old_schema is incremented, we are not really making a "copy" of the schema. > There is a "TODO" in the code, which says that we should make a > avro_schema_copy() of the target in old_schema instead of linking directly to > it. However, this solution of making a copy would result in a few problems: > 1. Avro schemas are intended to be self-contained. That implies that > AVRO_LINKs are intended to be internal links inside a self-contained schema. > The code introduces unnecessary (and potentially disallowed) external > dependencies in an Avro schema. > 2. The purpose of copying a schema that we want to decouple the old_schema > from the new_schema. The two copies may have different owners, we may want to > deallocate old schema etc. > 3. If the schema is recursive, then the code would enter an infinite > recursion loop. > It appears to me that the "correct" solution would be to replicate the entire > structure of the current schema, including the internal links. This means > that if old_link_A points to old_target_B, then new_link_A should point to > new_target_B in the new schema. Note that there should only be one copy of > new_target_B in the new schema, even if there are multiple links pointing to > new_target_B - i.e. we should not make a new copy for each link. > In order to implement this proper copying of links, we would need to keep a > lookup table of pairs of old and new schemas as they are being created, as > well as a list of all the AVRO_LINKs that are copied. Then as a post-copy > step, we would go and fix up all the AVRO_LINKs to point to the appropriate > targets. This is the way the schema is constructed in the first place in > avro_schema_from_json(). > An inefficient way to obtain the correct result from avro_schema_copy() would > be to perform an avro_schema_to_json() followed by an avro_schema_from_json(). > Note: I have not implemented a fix for this issue, but I am documenting this > issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766 > can be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)