-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70463/#review214687
-----------------------------------------------------------


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchProcessor.java
Lines 206 (patched)
<https://reviews.apache.org/r/70463/#comment300893>

    Please verify whether AtlasSchemaViolationException is thrown on commit or 
when the property value is set.
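
The distinction matters for where the exception handling belongs. Below is a minimal sketch of how the same duplicate value could surface at either point. It uses hypothetical stand-in types: SchemaViolation and ToyTransaction are illustrations only, not Atlas or graph-database classes, and the actual behaviour still needs to be checked against the real store.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a schema violation; NOT the Atlas exception class.
class SchemaViolation extends RuntimeException {
    SchemaViolation(String message) { super(message); }
}

// Toy transaction over a unique index. The 'eager' flag selects whether
// uniqueness is checked when the property is set or only at commit time.
class ToyTransaction {
    private final Set<String>  committed;
    private final List<String> pending = new ArrayList<>();
    private final boolean      eager;

    ToyTransaction(Set<String> committed, boolean eager) {
        this.committed = committed;
        this.eager     = eager;
    }

    void setUniqueProperty(String value) {
        if (eager && (committed.contains(value) || pending.contains(value))) {
            throw new SchemaViolation("duplicate on set: " + value);
        }
        pending.add(value);
    }

    void commit() {
        // Validate everything first so a failed commit leaves the index untouched.
        Set<String> seen = new HashSet<>(committed);
        for (String value : pending) {
            if (!seen.add(value)) {
                pending.clear();
                throw new SchemaViolation("duplicate on commit: " + value);
            }
        }
        committed.addAll(pending);
        pending.clear();
    }
}

class UniqueViolationSketch {
    // Reports which call raised the violation: "set", "commit", or "none".
    static String whereThrown(boolean eager) {
        Set<String> index = new HashSet<>();
        index.add("db1@cluster"); // value already present in the unique index

        ToyTransaction tx = new ToyTransaction(index, eager);
        try {
            tx.setUniqueProperty("db1@cluster");
        } catch (SchemaViolation e) {
            return "set";
        }
        try {
            tx.commit();
        } catch (SchemaViolation e) {
            return "commit";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("eager check:    thrown on " + whereThrown(true));
        System.out.println("deferred check: thrown on " + whereThrown(false));
    }
}
```

If the real exception is deferred to commit, a try/catch placed only around the property-setting call would never see it, and one failure could affect the whole batch being committed.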


- Madhan Neethiraj


On April 16, 2019, 6:15 a.m., Ashutosh Mestry wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70463/
> -----------------------------------------------------------
> 
> (Updated April 16, 2019, 6:15 a.m.)
> 
> 
> Review request for atlas, Kapildeo Nayak, Madhan Neethiraj, Nikhil Bonte, 
> Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3132
>     https://issues.apache.org/jira/browse/ATLAS-3132
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Approach**
> - Refactored existing implementation for new design.
> - Renamed 'Java Patch Framework' to 'Data Patch Framework', since its purpose 
> is essentially to modify the structure of existing data.
> - New _DataPatchService_: Modified the order in which services are called. 
> _DataPatchService_ is called before other services are invoked, giving it a 
> chance to complete before new data is accepted.
> - New _DataPatchRegistry_: Data access (CRUD) operations for data patches.
> - New _UniqueAttributePatchHandler_: Current implementation for adding the 
> new property to data vertices. Implemented rudimentary caching to prevent 
> repetitive look-ups.
> - New REST Endpoint to query status of patches.
> - Duplicate entities are detected during the patch application process. (See 
> below.)
> 
> **Performance**
> Since data patching is a high-volume operation, performance has been 
> treated as a priority. 
> - New _NewPropertyDataHandler_ uses the database in bulk-loading mode for 
> rapid processing. This scales with resources. Additional properties:
> - _atlas.processing.batchSize_: Size of batch.
> - _atlas.processing.numWorkers_: Number of worker threads to be employed. 
> - Leverages existing PC framework.
> 
> Processing speed:
> - 300K vertices: ~5 mins (8 threads, batch size: 3000)
> - 3.2 M vertices: ~39 mins (12 threads, batch size: 300, memory: 8192 MB)
> - 4.2 M entities: ~45 mins (from: 2019-04-12 04:44:50 to 2019-04-12 
> 05:29:04), (4 threads, batch size: 300)
> 
> **Duplicate Detection**
> Once the patch has run, the user can run fgrep on application.log to get a 
> dump of all the duplicates detected in the process:
> _fgrep "Duplicates detected" /var/log/atlas/application.log_
> 
> **Memory & CPU**
> The more memory available, the more threads can be spawned.
> 
> 
> Diffs
> -----
> 
>   intg/src/main/java/org/apache/atlas/pc/WorkItemConsumer.java b7eb4d89c 
>   intg/src/main/java/org/apache/atlas/pc/WorkItemManager.java 0e7d3f22d 
>   notification/src/main/java/org/apache/atlas/kafka/EmbeddedKafkaServer.java 32b597fb6
>   notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 1d0a2734b
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasJavaPatchHandler.java 9153d497b
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchHandler.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchRegistry.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchService.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/PatchContext.java a60422b80
>   repository/src/main/java/org/apache/atlas/repository/patches/TypeNameAttributeCache.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatch.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchHandler.java f2238f1b0
>   repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchProcessor.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/store/bootstrap/AtlasTypeDefStoreInitializer.java 78f3faf99
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 80141b4f1
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 03d2c066b
>   repository/src/test/java/org/apache/atlas/patches/AtlasPatchRegistryTest.java PRE-CREATION
>   webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java ce2d76f11
>   webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java c5ceb9d6d
>   webapp/src/test/java/org/apache/atlas/web/resources/AdminResourceTest.java 223a90a9c
> 
> 
> Diff: https://reviews.apache.org/r/70463/diff/4/
> 
> 
> Testing
> -------
> 
> **Unit tests**
> Additional tests added.
> 
> **Volume tests**
> Verification with large datasets: 
> - 4M entities
> - 3.2M entities
> - 16K entities.
> 
> **Performance tests**
> CPU usage, memory usage and disk IO.
> 
> **Pre-commit build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1031/
> 
> **Gremlin Queries for Verification**
> Check entities that do not have the new attribute:
> ```
> g.V().has('__typeName', 
> within('hive_db','hive_table','hive_column')).hasNot('Referenceable.__u_qualifiedName').valueMap('__guid')
> ```
> 
> Drop the new attribute from entities that have it:
> ```
> g.V().has('__typeName', 
> within('hive_db','hive_table','hive_column')).has('Referenceable.__u_qualifiedName').properties('Referenceable.__u_qualifiedName').drop()
> ```
> 
> Re-run patch:
> ```
> g.V().has('__patch.id', 
> 'JAVA_PATCH_0000_001').property('__patch.state','FAILED');
> ```
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>
