[ 
https://issues.apache.org/jira/browse/ATLAS-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168154#comment-15168154
 ] 

Suma Shivaprasad edited comment on ATLAS-535 at 2/26/16 12:19 AM:
------------------------------------------------------------------

*Modelling DELETE cascades across entities*


*Background*

Currently, the typesystem models relationship behaviour between types through 
attribute flags. The isComposite flag on an attribute declares a "composition" 
relationship between the current type and the attribute's type: the referred 
instance is loaded or deleted whenever the current instance is loaded or 
deleted. For example, hive_table.columns is marked isComposite, so whenever a 
table is loaded or deleted, its columns are loaded or deleted as well.

*API changes*

The deleteEntity API should take an additional flag to indicate cascading 
deletes.


*Modelling/Repository changes*

*Option 1:*

Add an attribute array<hive_table> in hive_db 

*Pros:*

** Works out of the box and does not need any code changes.
** Since the entity being deleted is the parent entity, i.e. the source from 
which the delete cascade begins, we know exactly which edges (the ones with 
label __typeName.attributeName) and vertices are to be deleted.

*Cons:*

** The current support for such an attribute flag is limited in some cases, 
e.g. Database -> Table and Table -> Partitions. Every added partition requires 
updating the Table entity to append it to array<partition>, which could have 
issues at scale. Taking hourly and daily partitions over five years as the 
worst case, a table could have around 50000 partition entries. Not sure what 
average number of tables we should support for a Database?
** Will have to implement another flag, isVisible/lazyFetch, on an attribute so 
that the tables are not loaded/displayed, or are fetched lazily, when a 
database is loaded. This attribute is added for internal reasons and should not 
be displayed when a database is viewed. If we add lazyFetch, should we load all 
the entries in the array?
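
The partition estimate above can be checked with quick arithmetic. This is only 
a sketch: it assumes one partition per hour plus one per day for the same 
table, and it ignores leap days. Neither assumption is stated in the comment.

```python
# Worst-case partition count for one table over five years.
# Assumptions (not from the comment): one partition per hour plus one per day,
# 365-day years.
YEARS = 5
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24

daily_partitions = YEARS * DAYS_PER_YEAR
hourly_partitions = daily_partitions * HOURS_PER_DAY
total_partitions = daily_partitions + hourly_partitions

print(daily_partitions, hourly_partitions, total_partitions)
```

Under these assumptions the total lands a little under 50000, so each 
array<partition> update would touch a list of that order of magnitude.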

*Option 2:*

Add an attribute flag called isInverseComposite on hive_table.db.

In this case, whenever an instance of hive_db needs to be deleted, look at all 
the incoming edges whose label starts with __hive_db, look up the type 
definition of the vertex at the other end, and check whether the 
isInverseComposite flag is set on the corresponding attribute. If it is set, 
remove the corresponding vertices and edges.

Get and update behaviour is not affected by this flag.
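
A minimal sketch of this traversal, using an in-memory graph rather than the 
real repository. Everything here is hypothetical and for illustration only: the 
Graph and Edge classes, the TYPE_DEFS registry, the "report" type, and the 
function name are not Atlas APIs. It assumes edge labels have the form 
__typeName.attributeName, where typeName is the type that owns the attribute. 
It checks the flag on every incoming edge instead of filtering by label prefix 
first, and it follows the cascade transitively.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    label: str   # assumed form: "__<typeName>.<attributeName>"
    source: str  # id of the vertex holding the reference (e.g. a hive_table)
    target: str  # id of the referenced vertex (e.g. a hive_db)


@dataclass
class Graph:
    vertex_types: dict = field(default_factory=dict)  # vertex id -> type name
    edges: list = field(default_factory=list)

    def incoming(self, vertex_id):
        return [e for e in self.edges if e.target == vertex_id]

    def remove_vertex(self, vertex_id):
        # Drop the vertex and every edge that touches it.
        self.vertex_types.pop(vertex_id, None)
        self.edges = [e for e in self.edges
                      if vertex_id not in (e.source, e.target)]


# Hypothetical type registry: (typeName, attributeName) -> attribute flags.
TYPE_DEFS = {
    ("hive_table", "db"): {"isInverseComposite": True},
    ("report", "db"): {"isInverseComposite": False},  # made-up type
}


def delete_with_inverse_cascade(graph, vertex_id):
    """Delete vertex_id plus every vertex that refers to it through an
    attribute flagged isInverseComposite (followed transitively)."""
    to_delete, pending = set(), [vertex_id]
    while pending:
        current = pending.pop()
        if current in to_delete:
            continue  # already visited; also guards against cycles
        to_delete.add(current)
        for edge in graph.incoming(current):
            type_name, _, attr_name = edge.label[len("__"):].partition(".")
            flags = TYPE_DEFS.get((type_name, attr_name), {})
            if flags.get("isInverseComposite"):
                pending.append(edge.source)
    for doomed in to_delete:
        graph.remove_vertex(doomed)
    return to_delete


g = Graph(
    vertex_types={"db1": "hive_db", "t1": "hive_table",
                  "t2": "hive_table", "r1": "report"},
    edges=[Edge("__hive_table.db", "t1", "db1"),
           Edge("__hive_table.db", "t2", "db1"),
           Edge("__report.db", "r1", "db1")],
)
deleted = delete_with_inverse_cascade(g, "db1")
```

In this example both tables are removed together with the database, while the 
"report" vertex survives because its db attribute does not carry the flag. Only 
its edge to the deleted database is dropped.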

*Pros:*

Simple approach that doesn't need intrusive code changes.

*Cons:*

** An additional flag that users need to define in the type definition.
** Need to iterate over all the edges (potentially a large number) and check 
which ones have labels starting with that typeName prefix. However, on average 
there will mostly be one, or at most two, such attributes with a potentially 
large number of edges, so the scan would mostly go through the vertices that 
need to be deleted anyway.

*Option 3:*

There is currently no way to model associations between any two types/classes. 
The proposal is to model this generically, so that various association rules 
between types, which are not attribute specific, can be represented. For 
example, Database to Table is a composition relationship.

Define a new generic internal type:

*AssociationRule*

attributes:
 String targetType   // the type with which the association rule is being defined
 String name   // the name of this rule
 String description

Note: Typesystem will enforce a typecheck on the targetType using existing 
types. 

A type definition will have a Collection<AssociationRule> along with the 
existing attribute definitions, traits etc

*CascadeRule extends AssociationRule*

*DeleteCascadeRule extends CascadeRule*

Currently the only cascade type supported is DELETE.

Going forward, it could be extended to other types, like the JPA cascade types 
(for updates, gets, etc.): 
https://docs.oracle.com/javaee/6/api/javax/persistence/CascadeType.html

Also going forward, AssociationRule(s) could be attached at the attribute 
level, i.e. isComposite on an attribute could be replaced by a 
DeleteCascadeRule. The same set of association rules could then apply at both 
the type and attribute levels.

When a delete with cascade is issued on an entity whose type contains a 
DeleteCascadeRule, delete any references from this entity which are of the 
targetType. For example, when a hive_db entity is deleted, all the hive_table 
entities associated with it are deleted. To find the vertices to delete, follow 
all edges whose label starts with the typeName __hive_table (the targetType) 
and delete the referred vertices. This should work for all the complex and 
collection types: array, map, struct and class references.
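
The rule hierarchy and the cascade lookup could be sketched roughly as follows. 
This is an illustration under stated assumptions, not the proposed 
implementation. The TYPE_RULES registry, the cascade_delete function, the rule 
name "db_owns_tables" and the tuple-based edge representation are all 
hypothetical. Edge labels are assumed to have the form 
__typeName.attributeName. An edge is followed in either direction as long as it 
touches the deleted vertex, and only one level of cascade is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    target_type: str       # the type with which the rule is defined
    name: str              # the name of this rule
    description: str = ""


class CascadeRule(AssociationRule):
    pass


class DeleteCascadeRule(CascadeRule):
    pass


# Hypothetical registry: type name -> association rules attached to the type.
TYPE_RULES = {
    "hive_db": [DeleteCascadeRule("hive_table", "db_owns_tables")],
}


def cascade_delete(vertex_types, edges, vertex_id):
    """Return (vertices, edges) left after deleting vertex_id with cascade.

    vertex_types maps vertex id -> type name; edges is a list of
    (label, source_id, target_id) tuples.
    """
    rules = [r for r in TYPE_RULES.get(vertex_types[vertex_id], [])
             if isinstance(r, DeleteCascadeRule)]
    doomed = {vertex_id}
    for rule in rules:
        prefix = "__" + rule.target_type + "."
        for label, source, target in edges:
            # Follow edges of the target type that touch the deleted vertex.
            if label.startswith(prefix) and vertex_id in (source, target):
                doomed.add(source if target == vertex_id else target)
    remaining_vertices = {v: t for v, t in vertex_types.items()
                          if v not in doomed}
    remaining_edges = [e for e in edges
                       if e[1] not in doomed and e[2] not in doomed]
    return remaining_vertices, remaining_edges


vertices = {"db1": "hive_db", "t1": "hive_table",
            "t2": "hive_table", "c1": "hive_column"}
edges = [("__hive_table.db", "t1", "db1"),
         ("__hive_table.db", "t2", "db1"),
         ("__hive_table.columns", "t1", "c1")]
left_vertices, left_edges = cascade_delete(vertices, edges, "db1")
```

In this sketch the column vertex is left in place, because removing a table's 
columns is the job of the existing isComposite handling, which is not modelled 
here.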

*Pros:*

** Generic: can be used to define any association between two types and used in 
any aspect of Atlas, e.g. during entity mutation (updates, gets, delete 
behaviour, etc.).
** The current hive model of the Table -> Database reference does not need to 
change, so there are no extra updates whenever a table is added, unlike 
Option 1.

*Cons:*

** More intrusive: needs changes in the type system in addition to entity 
mutation.
** Need to iterate over all the edges (potentially a large number) and check 
which ones have labels starting with that typeName prefix. However, on average 
there will mostly be one, or at most two, such attributes with a potentially 
large number of edges, so the scan would mostly go through the vertices that 
need to be deleted anyway. Also, deletes in general could be a less used 
operation than creates/updates.


Due to its simplicity and non-intrusive code changes, I am leaning towards 
Option 2. Thoughts?



> Support delete cascade efficently
> ---------------------------------
>
>                 Key: ATLAS-535
>                 URL: https://issues.apache.org/jira/browse/ATLAS-535
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Suma Shivaprasad
>             Fix For: 0.7-incubating
>
>
> Currently there are some limitation in the typesystem and modelling to 
> support delete cascades at scale through the isComposite flag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
