-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70463/#review214687
-----------------------------------------------------------
Fix it, then Ship it!


repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchProcessor.java
Lines 206 (patched)
<https://reviews.apache.org/r/70463/#comment300893>

    Please verify whether AtlasSchemaViolationException is thrown on commit or when the property value is set.


- Madhan Neethiraj


On April 16, 2019, 6:15 a.m., Ashutosh Mestry wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70463/
> -----------------------------------------------------------
>
> (Updated April 16, 2019, 6:15 a.m.)
>
>
> Review request for atlas, Kapildeo Nayak, Madhan Neethiraj, Nikhil Bonte,
> Nixon Rodrigues, and Sarath Subramanian.
>
>
> Bugs: ATLAS-3132
>     https://issues.apache.org/jira/browse/ATLAS-3132
>
>
> Repository: atlas
>
>
> Description
> -------
>
> **Approach**
> - Refactored the existing implementation for the new design.
> - Renamed 'Java Patch Framework' to 'Data Patch Framework', the rationale
> being that it essentially modifies the structure of existing data.
> - New _DataPatchService_: Modified the order in which services are called.
> _DataPatchService_ is called before other services are invoked, giving it a
> chance to complete before new data is accepted.
> - New _DataPatchRegistry_: Data access (CRUD) operations for data patches.
> - New _UniqueAttributePatchHandler_: Current implementation for adding the
> new property to data vertices. Implemented rudimentary caching to prevent
> repetitive look-ups.
> - New REST endpoint to query the status of patches.
> - Duplicate entities are detected during the patch application process. (See
> below.)
>
> **Performance**
> Since data patching is a high-volume operation, its performance has been
> treated as a priority.
> - New _NewPropertyDataHandler_ uses the database in bulk-loading mode for
> rapid processing. This scales with resources.
> Additional properties:
> - _atlas.processing.batchSize_: Size of each batch.
> - _atlas.processing.numWorkers_: Number of worker threads to employ.
> - Leverages the existing PC framework.
>
> Processing speed:
> - 300K vertices: ~5 mins (8 threads, batch size: 3000)
> - 3.2M vertices: ~39 mins (12 threads, batch size: 300, memory: 8192 MB)
> - 4.2M entities: ~45 mins (from 2019-04-12 04:44:50 to 2019-04-12
> 05:29:04; 4 threads, batch size: 300)
>
> **Duplicates Detection**
> Once the patch has run, the user can fgrep application.log to get a
> dump of all the duplicates detected in the process:
> _fgrep "Duplicates detected" /var/log/atlas/application.log_
>
> **Memory & CPU**
> The more memory available, the more threads can be spawned.
>
>
> Diffs
> -----
>
>   intg/src/main/java/org/apache/atlas/pc/WorkItemConsumer.java b7eb4d89c
>   intg/src/main/java/org/apache/atlas/pc/WorkItemManager.java 0e7d3f22d
>   notification/src/main/java/org/apache/atlas/kafka/EmbeddedKafkaServer.java 32b597fb6
>   notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 1d0a2734b
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasJavaPatchHandler.java 9153d497b
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchHandler.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchRegistry.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchService.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/PatchContext.java a60422b80
>   repository/src/main/java/org/apache/atlas/repository/patches/TypeNameAttributeCache.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatch.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchHandler.java f2238f1b0
>   repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchProcessor.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/store/bootstrap/AtlasTypeDefStoreInitializer.java 78f3faf99
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 80141b4f1
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 03d2c066b
>   repository/src/test/java/org/apache/atlas/patches/AtlasPatchRegistryTest.java PRE-CREATION
>   webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java ce2d76f11
>   webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java c5ceb9d6d
>   webapp/src/test/java/org/apache/atlas/web/resources/AdminResourceTest.java 223a90a9c
>
>
> Diff: https://reviews.apache.org/r/70463/diff/4/
>
>
> Testing
> -------
>
> **Unit tests**
> Additional tests added.
>
> **Volume tests**
> Verification with large datasets:
> - 4M entities
> - 3.2M entities
> - 16K entities
>
> **Performance tests**
> CPU usage, memory usage and disk IO.
>
> **Pre-commit build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1031/
>
> **Gremlin Queries for Verification**
> Check entities that do not have the new attribute:
> ```
> g.V().has('__typeName', within('hive_db','hive_table','hive_column')).hasNot('Referenceable.__u_qualifiedName').valueMap('__guid')
> ```
>
> Drop the new attribute from entities that have it:
> ```
> g.V().has('__typeName', within('hive_db','hive_table','hive_column')).has('Referenceable.__u_qualifiedName').properties('Referenceable.__u_qualifiedName').drop()
> ```
>
> Re-run patch:
> ```
> g.V().has('__patch.id', 'JAVA_PATCH_0000_001').property('__patch.state','FAILED');
> ```
>
>
> Thanks,
>
> Ashutosh Mestry
>
>
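
For readers following the thread: the rudimentary caching of repeated look-ups mentioned in the description could be sketched roughly as below. This is only an illustration under stated assumptions, not the TypeNameAttributeCache class from the diff. The class name `LookupCache`, its methods, and the loader function are all hypothetical; the sketch assumes that resolving the unique attributes of a type name is expensive and safe to memoize while worker threads process batches concurrently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration only; not the TypeNameAttributeCache from the patch.
final class LookupCache<K, V> {
    // Thread-safe map so that several worker threads can share one cache.
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    // The expensive look-up that the cache is meant to avoid repeating.
    private final Function<K, V> loader;
    // Counts how often the loader actually ran (for demonstration only).
    private final AtomicInteger loads = new AtomicInteger();

    LookupCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only on the first request for a key.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // Assumed loader: pretends every type has a single unique attribute.
        LookupCache<String, List<String>> cache =
                new LookupCache<>(typeName -> Arrays.asList("qualifiedName"));

        cache.get("hive_table");
        cache.get("hive_table");   // served from the cache, loader not called again
        cache.get("hive_column");

        System.out.println("loader calls: " + cache.loadCount());
    }
}
```

With a cache of this shape, each distinct type name triggers one look-up no matter how many vertices of that type a batch contains, which is where the saving would come from on multi-million-vertex runs.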