Vivek Nadkarni created AVRO-1167:
------------------------------------

             Summary: Avro-C: avro_schema_copy() does not copy AVRO_LINKs 
properly.
                 Key: AVRO-1167
                 URL: https://issues.apache.org/jira/browse/AVRO-1167
             Project: Avro
          Issue Type: Bug
          Components: c
    Affects Versions: 1.7.1
         Environment: Ubuntu Linux 11.10
            Reporter: Vivek Nadkarni


When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a 
new_schema, it sets the target of the new_link to the target of the old_link in 
the old_schema. Thus, the AVRO_LINK in the new_schema points to an element in 
the old_schema. 

While this is currently safe, since the reference count of the target in the 
old_schema is incremented, we are not really making a "copy" of the schema.

There is a "TODO" in the code, which says that we should make a 
avro_schema_copy() of the target in old_schema instead of linking directly to 
it. However, this solution of making a copy would result in a few problems:

1. Avro schemas are intended to be self-contained. That implies that AVRO_LINKs 
are intended to be internal links inside a self-contained schema. The code 
introduces unnecessary (and potentially disallowed) external dependencies in an 
Avro schema. 

2. The purpose of copying a schema that we want to decouple the old_schema from 
the new_schema. The two copies may have different owners, we may want to 
deallocate old schema etc.

3. If the schema is recursive, then the code would enter an infinite recursion 
loop.

It appears to me that the "correct" solution would be to replicate the entire 
structure of the current schema, including the internal links. This means that 
if old_link_A points to old_target_B, then new_link_A should point to 
new_target_B in the new schema. Note that there should only be one copy of 
new_target_B in the new schema, even if there are multiple links pointing to 
new_target_B - i.e. we should not make a new copy for each link.

In order to implement this proper copying of links, we would need to keep a 
lookup table of pairs of old and new schemas as they are being created, as well 
as a list of all the AVRO_LINKs that are copied. Then as a post-copy step, we 
would go and fix up all the AVRO_LINKs to point to the appropriate targets. 
This is the way the schema is constructed in the first place in 
avro_schema_from_json().

An inefficient way to obtain the correct result from avro_schema_copy() would 
be to perform an avro_schema_to_json() followed by an avro_schema_from_json().

Note: I have not implemented a fix for this issue, but I am documenting this 
issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766 can 
be fixed.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to