-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73010/
-----------------------------------------------------------
(Updated Nov. 10, 2020, 5:48 p.m.)

Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.

Changes
-------

Updates include: Addressed review comments.

Bugs: ATLAS-4015
    https://issues.apache.org/jira/browse/ATLAS-4015

Repository: atlas

Description
-------

**Background**

Please see JIRA.

Until now, re-indexing in Atlas was implemented as an external tool. Using this tool posed a number of challenges, the biggest being its throughput: for a medium-sized Atlas repository, the tool could take days to finish.

This implementation addresses these problems (see results below).

**Approach**

Re-indexing is now implemented as a JAVA_PATCH that is applied only when the property _atlas.patch.reindex.enabled_ is set to true.

*Modified* AtlasJanusGraphManagement: New method _reindex_ implements the re-indexing logic.

*New* _ReIndexPatch_: a JAVA_PATCH that implements the re-indexing logic. It uses the PC (producer-consumer) framework to enumerate vertices and edges. While the patch is being applied, it logs messages indicating progress.

**Configuration**

_atlas.patch.reindex.enabled=true_

Diffs (updated)
-----

  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphManagement.java f7d2e273c
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphManagement.java 2a2ef92a7
  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 1c7915859
  repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java b142a2a4a
  repository/src/main/java/org/apache/atlas/repository/patches/ConcurrentPatchProcessor.java c6f0e6438
  repository/src/main/java/org/apache/atlas/repository/patches/ReIndexPatch.java PRE-CREATION

Diff: https://reviews.apache.org/r/73010/diff/2/

Changes: https://reviews.apache.org/r/73010/diff/1-2/

Testing
-------

**Test Setup**

Start with a known Atlas setup with known data. Ascertain that basic search yields results.
Use these curl commands to delete the Solr indexes:

curl "http://<host>:8983/solr/vertex_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>b2d_t:*</query></delete>'
curl "http://<host>:8983/solr/edge_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>1151_t:*</query></delete>'
curl "http://<host>:8983/solr/fulltext_index/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>14at_t:*</query></delete>'

This deletes the Solr indexes. A basic search performed from the web UI will now return no results.

Now set the configuration parameter _atlas.patch.reindex.enabled=true_ and restart Atlas. The server-side logs will indicate that the patch has run.

**Volume Testing**

Vertices: ~16M; duration: ~5 hrs.
Edges: ~122M; duration: ~6 hrs.

Configuration parameters:

atlas.patch.reindex.enabled=true
atlas.patch.numWorkers=14
atlas.patch.batchSize=1000

Node configuration:

Atlas: heap size 6 GB.
Solr: heap size 12 GB.

**PC (PreCommit) Build**

https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/177/

Thanks,

Ashutosh Mestry
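The description states that _ReIndexPatch_ uses the PC framework to enumerate vertices and edges, and the volume test tunes it with _atlas.patch.numWorkers_ and _atlas.patch.batchSize_. Below is a minimal sketch of that producer-consumer pattern, assuming plain java.util.concurrent classes stand in for the PC framework. ReIndexSketch, reindexAll, isEnabled and the reindexBatch callback are hypothetical names, not the patch's code, and treating a missing _atlas.patch.reindex.enabled_ as false is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch, NOT the ReIndexPatch source: a bounded producer-consumer
// loop that hands element IDs to worker threads in fixed-size batches.
final class ReIndexSketch {
    // Sentinel batch that tells a worker to exit; compared by reference.
    private static final List<Long> POISON = new ArrayList<>();

    // Gate on the property named in the review. java.util.Properties stands in
    // for Atlas's own configuration; defaulting to "false" is an assumption.
    static boolean isEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty("atlas.patch.reindex.enabled", "false"));
    }

    // reindexBatch is a hypothetical stand-in for the call that re-indexes one batch.
    static long reindexAll(Iterable<Long> elementIds, int numWorkers, int batchSize,
                           Consumer<List<Long>> reindexBatch) {
        BlockingQueue<List<Long>> queue = new ArrayBlockingQueue<>(numWorkers * 2);
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);

        // Consumers: take batches until the poison pill arrives.
        for (int i = 0; i < numWorkers; i++) {
            pool.execute(() -> {
                try {
                    for (List<Long> batch = queue.take(); batch != POISON; batch = queue.take()) {
                        try {
                            reindexBatch.accept(batch);
                            long done = processed.addAndGet(batch.size());
                            System.out.println("re-indexed " + done + " elements so far");
                        } catch (RuntimeException e) {
                            // Keep the worker alive so the producer never blocks on a full queue.
                            System.err.println("batch failed: " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Producer: enumerate IDs and enqueue them in batches of batchSize.
            List<Long> batch = new ArrayList<>(batchSize);
            for (Long id : elementIds) {
                batch.add(id);
                if (batch.size() == batchSize) {
                    queue.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                queue.put(batch); // trailing partial batch
            }
            for (int i = 0; i < numWorkers; i++) {
                queue.put(POISON); // one pill per worker
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("re-index interrupted", e);
        }
        return processed.get();
    }
}
```

Calling reindexAll(ids, 14, 1000, callback) would mirror the volume-test settings. The bounded queue blocks the producer when workers fall behind, which keeps heap use flat while enumerating millions of elements.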
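If the test setup is scripted from Java rather than with curl, an equivalent delete-by-query request can be built with the JDK's java.net.http client (Java 11+). SolrDeleteByQuery is a hypothetical helper, not part of the patch, and like the curl commands it places the query into the XML body without escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper mirroring the curl commands in the test setup.
// The query string is inserted into the XML body as-is (no escaping).
final class SolrDeleteByQuery {
    static HttpRequest build(String host, String collection, String query) {
        URI uri = URI.create("http://" + host + ":8983/solr/" + collection + "/update?commit=true");
        String body = "<delete><query>" + query + "</query></delete>";
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body)) // same as curl --data-binary
                .build();
    }
}
```

Sending it would look like HttpClient.newHttpClient().send(SolrDeleteByQuery.build(host, "vertex_index", "b2d_t:*"), HttpResponse.BodyHandlers.ofString()), repeated for edge_index and fulltext_index with their respective queries.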
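As a rough reading of the volume-test figures, the approximate counts and durations work out to the average rates below. This assumes each duration is the wall-clock time for that phase alone; since the inputs are approximate, the rates are only indicative.

```java
// Back-of-the-envelope throughput from the approximate volume-test figures
// (~16M vertices in ~5 hrs, ~122M edges in ~6 hrs).
final class ReIndexThroughput {
    // Average elements per second, treating the duration as wall-clock time.
    static double perSecond(long elements, double hours) {
        return elements / (hours * 3600.0);
    }

    public static void main(String[] args) {
        System.out.printf("vertices: ~%.0f/s%n", perSecond(16_000_000L, 5));  // ~889/s
        System.out.printf("edges:    ~%.0f/s%n", perSecond(122_000_000L, 6)); // ~5648/s
    }
}
```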