----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/73081/ -----------------------------------------------------------
Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Nikhil Bonte, and Sarath Subramanian. Bugs: ATLAS-4076 https://issues.apache.org/jira/browse/ATLAS-4076 Repository: atlas Description ------- Observations: ============= Have a hive table and attach classification to it on Atlas. Enable propagation on the attached classification. When you drive a new table from this hive table, the new table will have the propagated classification, as expected. However, the entity audits of the newly derived table has multiple "Propagated Classification Added" enteries. If table derivation is done using Hive Beeline, there are 5 such entries per propagated classification. Using Spark-shell, 3 such entries were observed per propagated classification. Expected behaviour is to have just 1 entry per propagated classification. Analysis: ========= After detecting relationship and creating relationship edge, the propagated enteties(classifications) are notified to entityChangeListner through entityChangeNotifier. However details of the propagated enteties are not passed directly to notifier, but through request context (buffered into addedPropagation list). After processing every edge, AtlasRelationshipStore manager sends notification to entityChangeListner, which simply gets all the items in request context buffer list. In this issue, Hive sends and event which has multiple relationships, and only one relationship has propagated entities, but due to multiple notifications(which is correct) same buffer list is processed multipletimes (which is wrong). Following are the list of created relationships Created relationship edge from [hive_table] --> [hive_storagedesc] using edge label: [__hive_table.sd] Created relationship edge from [hive_table] --> [hive_column] using edge label: [__hive_table.columns] Created relationship edge from [hive_table] --> [hive_table_ddl] using edge label: [r:hive_table_ddl_queries] Created relationship edge from [hive_table] --> [hive_db] using edge label: [__hive_table.db] Created relationship edge from [hive_process] --> [hive_process_execution] using edge label: [r:hive_process_process_executions] Created relationship edge from [hive_process] --> [hive_table] using edge label: [__Process.outputs] Created relationship edge from [hive_process] --> [hive_table] using edge label: [__Process.inputs] =================================================================================================== Created relationship edge from [hive_column_lineage] --> [hive_column] using edge label: [__Process.outputs] Created relationship edge from [hive_column_lineage] --> [hive_column] using edge label: [__Process.inputs] Created relationship edge from [hive_column_lineage] --> [hive_process] using edge label: [__hive_column_lineage.query] In the above list the highlited one has propagated classificatin, but subscequent 3 relationships sends 3 more notifications, resulting 3 extra entries for same classification in entity audits. At the end entityChangeNotifier, while processing mutated entities, explicetly notify for any pending propagated entities and once again buffer list in request context is processed. Resulting in 4th extra entry in audits. Fix: ==== One option was to send the details of propagated entities directly to notifier and not rely on the request context. It required lot of code change. Other option was to clear the buffer in the request context after processing it in entityChangeNotifier. This review request is with the second aproach. Diffs ----- repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java 32ad65e7a server-api/src/main/java/org/apache/atlas/RequestContext.java 32ffddde1 Diff: https://reviews.apache.org/r/73081/diff/1/ Testing ------- Manual testing was done using both hive and spark. precommit test were success https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/263/console Thanks, Deep Singh