----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/73081/#review222342 -----------------------------------------------------------
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java Lines 362 (patched) <https://reviews.apache.org/r/73081/#comment311407> Refactor: Can you move the clearAddedPropagations to notifyProagatedEntities. repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java Lines 364 (patched) <https://reviews.apache.org/r/73081/#comment311408> Will removing all proapgations after notification is done work? - Ashutosh Mestry On Dec. 11, 2020, 9:47 p.m., Deep Singh wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/73081/ > ----------------------------------------------------------- > > (Updated Dec. 11, 2020, 9:47 p.m.) > > > Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Nikhil Bonte, > and Sarath Subramanian. > > > Bugs: ATLAS-4076 > https://issues.apache.org/jira/browse/ATLAS-4076 > > > Repository: atlas > > > Description > ------- > > Observations: > ============= > Have a hive table and attach classification to it on Atlas. Enable > propagation on the attached classification. > When you drive a new table from this hive table, the new table will have the > propagated classification, as expected. > However, the entity audits of the newly derived table has multiple > "Propagated Classification Added" enteries. > > If table derivation is done using Hive Beeline, there are 5 such entries per > propagated classification. > Using Spark-shell, 3 such entries were observed per propagated classification. > > Expected behaviour is to have just 1 entry per propagated classification. > > Analysis: > ========= > After detecting relationship and creating relationship edge, the propagated > enteties(classifications) are notified to entityChangeListner through > entityChangeNotifier. However details of the propagated enteties are not > passed directly to notifier, but through request context (buffered into > addedPropagation list). > > After processing every edge, AtlasRelationshipStore manager sends > notification to entityChangeListner, which simply gets all the items in > request context buffer list. > > In this issue, Hive sends event which has multiple relationships, and only > one relationship has propagated entities, but due to multiple > notifications(which is correct) same buffer list is processed multipletimes > (which is wrong). > > Following are the list of created relationships > Created relationship edge from [hive_table] --> [hive_storagedesc] using edge > label: [__hive_table.sd] > Created relationship edge from [hive_table] --> [hive_column] using edge > label: [__hive_table.columns] > Created relationship edge from [hive_table] --> [hive_table_ddl] using edge > label: [r:hive_table_ddl_queries] > Created relationship edge from [hive_table] --> [hive_db] using edge label: > [__hive_table.db] > Created relationship edge from [hive_process] --> [hive_process_execution] > using edge label: [r:hive_process_process_executions] > Created relationship edge from [hive_process] --> [hive_table] using edge > label: [__Process.outputs] > Created relationship edge from [hive_process] --> [hive_table] using edge > label: [__Process.inputs] > =================================================================================================== > Created relationship edge from [hive_column_lineage] --> [hive_column] using > edge label: [__Process.outputs] > Created relationship edge from [hive_column_lineage] --> [hive_column] using > edge label: [__Process.inputs] > Created relationship edge from [hive_column_lineage] --> [hive_process] using > edge label: [__hive_column_lineage.query] > > In the above list the highlited one has propagated classificatin, but > subscequent 3 relationships sends 3 more notifications, resulting 3 extra > entries for same classification in entity audits. > > At the end entityChangeNotifier, while processing mutated entities, > explicetly notify for any pending propagated entities and once again buffer > list in request context is processed. Resulting in 4th extra entry in audits. > > Fix: > ==== > > One option was to send the details of propagated entities directly to > notifier and not rely on the request context. It required lot of code change. > Other option was to clear the buffer in the request context after processing > it in entityChangeNotifier. > > This review request is with the second aproach. > > > Diffs > ----- > > > repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java > 32ad65e7a > server-api/src/main/java/org/apache/atlas/RequestContext.java 32ffddde1 > > > Diff: https://reviews.apache.org/r/73081/diff/1/ > > > Testing > ------- > > Manual testing was done using both hive and spark. > precommit test were success > https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/263/console > > > Thanks, > > Deep Singh > >