-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73081/#review222341
-----------------------------------------------------------


Ship it!

- Ashutosh Mestry


On Dec. 11, 2020, 9:47 p.m., Deep Singh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/73081/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2020, 9:47 p.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Nikhil Bonte, 
> and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-4076
>     https://issues.apache.org/jira/browse/ATLAS-4076
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Observations:
> =============
> Create a Hive table and attach a classification to it in Atlas. Enable 
> propagation on the attached classification.
> When you derive a new table from this Hive table, the new table has the 
> propagated classification, as expected.
> However, the entity audits of the newly derived table have multiple 
> "Propagated Classification Added" entries.
> 
> If table derivation is done using Hive Beeline, there are 5 such entries per 
> propagated classification.
> Using Spark-shell, 3 such entries were observed per propagated classification.
> 
> Expected behaviour is to have just 1 entry per propagated classification.
> 
> Analysis:
> =========
> After detecting a relationship and creating the relationship edge, the 
> propagated entities (classifications) are reported to entityChangeListener 
> through entityChangeNotifier. However, the details of the propagated 
> entities are not passed directly to the notifier; they go through the 
> request context (buffered in the addedPropagation list).
> 
> After processing every edge, the AtlasRelationshipStore manager sends a 
> notification to entityChangeListener, which simply reads all the items in 
> the request context buffer list.
> 
> In this issue, Hive sends an event that contains multiple relationships, of 
> which only one has propagated entities. Because each relationship triggers 
> its own notification (which is correct), the same buffer list is processed 
> multiple times (which is wrong).
> 
> The following relationships were created:
> Created relationship edge from [hive_table] --> [hive_storagedesc] using edge 
> label: [__hive_table.sd] 
> Created relationship edge from [hive_table] --> [hive_column] using edge 
> label: [__hive_table.columns] 
> Created relationship edge from [hive_table] --> [hive_table_ddl] using edge 
> label: [r:hive_table_ddl_queries] 
> Created relationship edge from [hive_table] --> [hive_db] using edge label: 
> [__hive_table.db] 
> Created relationship edge from [hive_process] --> [hive_process_execution] 
> using edge label: [r:hive_process_process_executions] 
> Created relationship edge from [hive_process] --> [hive_table] using edge 
> label: [__Process.outputs]
> Created relationship edge from [hive_process] --> [hive_table] using edge 
> label: [__Process.inputs]
> ===================================================================================================
> Created relationship edge from [hive_column_lineage] --> [hive_column] using 
> edge label: [__Process.outputs] 
> Created relationship edge from [hive_column_lineage] --> [hive_column] using 
> edge label: [__Process.inputs] 
> Created relationship edge from [hive_column_lineage] --> [hive_process] using 
> edge label: [__hive_column_lineage.query] 
> 
> In the above list, the highlighted relationship has the propagated 
> classification, but the 3 subsequent relationships send 3 more 
> notifications, resulting in 3 extra entries for the same classification in 
> the entity audits.
> 
> At the end, entityChangeNotifier, while processing mutated entities, 
> explicitly notifies for any pending propagated entities, so the buffer list 
> in the request context is processed once again. This results in a 4th extra 
> entry in the audits.
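
The counting in the analysis above can be reproduced with a small simulation. This is only a sketch under stated assumptions: DemoRequestContext, DuplicateAuditDemo, and the "PII" classification name are hypothetical stand-ins, not the actual Atlas classes or data. It assumes 10 relationship edges, as in the list above, with the 7th one adding the propagated classification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the per-request buffer; not the real RequestContext API.
class DemoRequestContext {
    private final List<String> addedPropagations = new ArrayList<>();

    void recordAddedPropagation(String classification) {
        addedPropagations.add(classification);
    }

    List<String> getAddedPropagations() {
        return addedPropagations;
    }
}

class DuplicateAuditDemo {
    // Simulates one request and returns the number of audit entries written.
    static int simulate(int relationshipCount, int propagatingIndex) {
        DemoRequestContext context = new DemoRequestContext();
        List<String> audits = new ArrayList<>();

        for (int i = 0; i < relationshipCount; i++) {
            if (i == propagatingIndex) {
                // Only this edge adds a propagated classification to the buffer.
                context.recordAddedPropagation("PII");
            }
            // One notification per edge; each one re-reads the whole buffer.
            notifyListener(context, audits);
        }
        // Final notification sent while processing the mutated entities.
        notifyListener(context, audits);
        return audits.size();
    }

    private static void notifyListener(DemoRequestContext context, List<String> audits) {
        for (String classification : context.getAddedPropagations()) {
            audits.add("Propagated Classification Added: " + classification);
        }
    }

    public static void main(String[] args) {
        // 10 edges; the 7th (index 6) carries the propagation.
        System.out.println(simulate(10, 6)); // prints 5
    }
}
```

With these inputs the buffer is read once by the propagating edge, three times by the edges after it, and once at the end, which gives the 5 entries observed with Hive Beeline.
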
> 
> Fix:
> ====
> 
> One option was to send the details of the propagated entities directly to 
> the notifier and not rely on the request context. That would require a lot of 
> code changes.
> The other option was to clear the buffer in the request context after 
> processing it in entityChangeNotifier.
> 
> This review request takes the second approach.
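
A minimal sketch of the second approach, assuming the notifier takes the pending items and empties the buffer in one step. ClearingRequestContext, drainAddedPropagations, and ClearAfterNotifyDemo are hypothetical names for illustration; the actual patch to RequestContext and AtlasEntityChangeNotifier may be shaped differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the per-request buffer; not the real RequestContext API.
class ClearingRequestContext {
    private final List<String> addedPropagations = new ArrayList<>();

    void recordAddedPropagation(String classification) {
        addedPropagations.add(classification);
    }

    // Returns the pending items and clears the buffer so they are reported once.
    List<String> drainAddedPropagations() {
        List<String> pending = new ArrayList<>(addedPropagations);
        addedPropagations.clear();
        return pending;
    }
}

class ClearAfterNotifyDemo {
    // Simulates one request and returns the number of audit entries written.
    static int simulate(int relationshipCount, int propagatingIndex) {
        ClearingRequestContext context = new ClearingRequestContext();
        List<String> audits = new ArrayList<>();

        for (int i = 0; i < relationshipCount; i++) {
            if (i == propagatingIndex) {
                context.recordAddedPropagation("PII");
            }
            // Still one notification per edge, but the buffer is emptied each time.
            notifyListener(context, audits);
        }
        // The final notification now finds an empty buffer.
        notifyListener(context, audits);
        return audits.size();
    }

    private static void notifyListener(ClearingRequestContext context, List<String> audits) {
        for (String classification : context.drainAddedPropagations()) {
            audits.add("Propagated Classification Added: " + classification);
        }
    }

    public static void main(String[] args) {
        // Same scenario as in the analysis: 10 edges, the 7th carries the propagation.
        System.out.println(simulate(10, 6)); // prints 1
    }
}
```

The number of notifications is unchanged; only the first one after the propagation finds anything in the buffer, so the audit gets a single entry per propagated classification.
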
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java 32ad65e7a 
>   server-api/src/main/java/org/apache/atlas/RequestContext.java 32ffddde1 
> 
> 
> Diff: https://reviews.apache.org/r/73081/diff/1/
> 
> 
> Testing
> -------
> 
> Manual testing was done using both Hive and Spark.
> Precommit tests passed:
> https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/263/console
> 
> 
> Thanks,
> 
> Deep Singh
> 
>
