-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72636/
-----------------------------------------------------------
Review request for atlas, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
Bugs: ATLAS-3874
https://issues.apache.org/jira/browse/ATLAS-3874
Repository: atlas
Description
-------
**Background**
Please see bug description.
**Approach**
At a high level: introduce a mechanism by which the individual consumers are
aware of the entities being processed by each other. If no entity is being
processed by more than one consumer at the same time, everything proceeds as
usual (the way it did before this change). If the same entity is being
processed by multiple consumers, then one consumer waits for the other to
finish before proceeding.
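The waiting behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the class name `InFlightKeys` and the methods `tryClaim`, `claim`, and `release` are hypothetical, and keys are assumed to be plain strings.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a checker shared by all consumers; not the class from the patch. */
public class InFlightKeys {
    // Keys of entities currently being processed by some consumer.
    private final Set<String> inFlight = new HashSet<>();

    /** Claims all keys if none is in flight; otherwise claims nothing and returns false. */
    public synchronized boolean tryClaim(Collection<String> keys) {
        for (String key : keys) {
            if (inFlight.contains(key)) {
                return false;
            }
        }
        inFlight.addAll(keys);
        return true;
    }

    /** Blocks until none of the keys is in flight, then claims them all. */
    public synchronized void claim(Collection<String> keys) throws InterruptedException {
        while (!tryClaim(keys)) {
            wait(); // woken by notifyAll() in release()
        }
    }

    /** Releases the keys and wakes every consumer waiting in claim(). */
    public synchronized void release(Collection<String> keys) {
        inFlight.removeAll(keys);
        notifyAll();
    }
}
```

A consumer would call `claim(keys)` before processing a message and `release(keys)` in a `finally` block afterwards, so the keys are freed even when processing fails. The sketch is not re-entrant: a consumer that claims overlapping keys twice without releasing would wait on itself.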
Classes:
New *UniqueKeysExtractor*: Extracts values of unique keys from
*AtlasEntitiesWithExtInfo*. It navigates *relationshipAttributes* and
*attributes* that have *objectRef* set.
New *UniquenessChecker*: Maintains a set of unique keys provided by
*UniqueKeysExtractor*. It detects the presence of duplicates and waits until
duplicates are resolved.
Modified *NotificationHookConsumer.createOrUpdate*: Updates *UniquenessChecker*
with output from *UniqueKeysExtractor*. Clears the keys at the end of entity
creation.
Modified *NotificationHookConsumer*: Accepts a shared instance of
*UniquenessChecker*.
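To illustrate the extraction step, here is a minimal sketch that walks a nested structure and collects one key per referenced entity. It is an assumption-laden stand-in, not the patch's *UniqueKeysExtractor*: entities are modelled as plain `Map<String, Object>` values rather than Atlas model classes, the class name `KeyExtractorSketch` is hypothetical, and the `typeName:qualifiedName` key format is chosen only for illustration.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: entities are plain maps, not Atlas model objects. */
public class KeyExtractorSketch {

    /** Returns "typeName:qualifiedName" keys for the entity and every entity it references. */
    public static Set<String> extractKeys(Map<String, Object> entity) {
        Set<String> keys = new LinkedHashSet<>();
        collect(entity, keys);
        return keys;
    }

    // Recursively visits maps and collections; assumes the structure has no cycles.
    private static void collect(Object value, Set<String> keys) {
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            Object typeName = map.get("typeName");
            Object qualifiedName = map.get("qualifiedName");
            if (typeName != null && qualifiedName != null) {
                keys.add(typeName + ":" + qualifiedName);
            }
            for (Object nested : map.values()) {
                collect(nested, keys);
            }
        } else if (value instanceof Collection) {
            for (Object nested : (Collection<?>) value) {
                collect(nested, keys);
            }
        }
    }
}
```

Returning a set means a reference that appears under both the attributes and the relationship attributes of an entity contributes a single key.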
Diffs
-----
webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java
3f1ea05e1
webapp/src/main/java/org/apache/atlas/notification/UniqueKeysExtractor.java
PRE-CREATION
webapp/src/main/java/org/apache/atlas/notification/UniquenessChecker.java
PRE-CREATION
webapp/src/test/java/org/apache/atlas/notification/UniquenessCheckerTest.java
PRE-CREATION
Diff: https://reviews.apache.org/r/72636/diff/1/
Testing
-------
**Unit tests**
Tests added to verify the new classes.
*UniquenessCheckerTest*
Performs worst-case checking by adding hundreds of duplicate keys and
verifying the output for them.
**Functional tests**
Used Spark hook to verify.
Start Spark shell using:
```
sudo -u hdfs spark-shell
```
Spark SQL commands:
```
spark.sql("create table default.t1_1381104676(col1 int)")
spark.sql("create table default.t2_1381104676(col2 int)")
spark.sql("select * from t1_1381104676, t2_1381104676 where col1=col2").write.saveAsTable("t3_1381104676")
```
**Volume test**
Medium-size Kafka dump added.
**Pre-commit Build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/2003/
Thanks,
Ashutosh Mestry