-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73010/
-----------------------------------------------------------

(Updated Nov. 10, 2020, 5:48 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and 
Sarath Subramanian.


Changes
-------

Updates include: Addressed review comments.


Bugs: ATLAS-4015
    https://issues.apache.org/jira/browse/ATLAS-4015


Repository: atlas


Description
-------

**Background**
Please see JIRA.
Until now, re-indexing within Atlas was implemented as an external tool. Using 
this tool posed a number of challenges, the biggest being its throughput: for a 
medium-sized Atlas repository, the tool could take days to finish.

This implementation addresses these problems. (See the results below.)

**Approach**
Re-indexing is now implemented as a JAVA_PATCH that is applied only when the 
property _atlas.patch.reindex.enabled_ is set to true.

*Modified* AtlasJanusGraphManagement: New method _reindex_ implements the 
re-indexing logic.
*New* _ReIndexPatch_ is a JAVA_PATCH that implements the re-indexing logic. It 
uses the PC framework to enumerate vertices and edges. While the patch is being 
applied, it logs messages indicating progress.
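
To illustrate the shape of this approach, here is a minimal sketch, assuming 
that "PC framework" refers to a producer-consumer work queue. It is not the 
code in _ReIndexPatch_ or _ConcurrentPatchProcessor_: the class 
_ReindexSketch_, the interface _BatchIndexer_, and the use of plain long ids in 
place of graph elements are all hypothetical stand-ins. One producer enumerates 
element ids, and a pool of workers re-indexes them in batches, but only when 
the enabling flag is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a flag-gated, producer-consumer re-index run.
final class ReindexSketch {
    // Sentinel that tells a worker no more ids will arrive.
    private static final Long POISON = Long.MIN_VALUE;

    /** Hypothetical stand-in for the real per-batch re-index call. */
    interface BatchIndexer {
        void reindex(List<Long> elementIds);
    }

    /** Returns the number of elements handed to the indexer (0 when disabled). */
    static long run(boolean enabled, long elementCount, int numWorkers,
                    int batchSize, BatchIndexer indexer) {
        if (!enabled) {
            return 0L; // patch is skipped unless explicitly enabled
        }
        if (numWorkers < 1 || batchSize < 1) {
            throw new IllegalArgumentException("numWorkers and batchSize must be >= 1");
        }
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(numWorkers * batchSize);
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);

        // Consumers: collect ids into a batch and flush each full batch.
        for (int i = 0; i < numWorkers; i++) {
            pool.submit(() -> {
                List<Long> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        Long id = queue.take();
                        if (id.equals(POISON)) {
                            break;
                        }
                        batch.add(id);
                        if (batch.size() == batchSize) {
                            flush(batch, indexer, processed);
                        }
                    }
                    flush(batch, indexer, processed); // final partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: enumerate ids, then send one sentinel per worker.
        // Error handling for a failing indexer is omitted in this sketch.
        try {
            for (long id = 0; id < elementCount; id++) {
                queue.put(id);
            }
            for (int i = 0; i < numWorkers; i++) {
                queue.put(POISON);
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    private static void flush(List<Long> batch, BatchIndexer indexer, AtomicLong processed) {
        if (batch.isEmpty()) {
            return;
        }
        indexer.reindex(new ArrayList<>(batch));
        processed.addAndGet(batch.size());
        batch.clear();
    }
}
```

The bounded queue keeps the producer from running far ahead of the workers, 
which is one plausible way to keep memory use flat on a large repository.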

**Configuration**
_atlas.patch.reindex.enabled=true_
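
As a sketch of how such settings might be read, the following uses only 
_java.util.Properties_. This is an assumption made for illustration: the class 
_PatchSettings_ is hypothetical, Atlas's own configuration classes are not 
shown, and the fallback values are placeholders rather than Atlas's real 
defaults. The two additional keys are the ones listed under Volume Testing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for the patch-related settings named in this review.
final class PatchSettings {
    final boolean reindexEnabled;
    final int numWorkers;
    final int batchSize;

    PatchSettings(Properties p) {
        // Fallbacks below are placeholders for this sketch, not Atlas defaults.
        reindexEnabled = Boolean.parseBoolean(
                p.getProperty("atlas.patch.reindex.enabled", "false").trim());
        numWorkers = Integer.parseInt(
                p.getProperty("atlas.patch.numWorkers", "1").trim());
        batchSize = Integer.parseInt(
                p.getProperty("atlas.patch.batchSize", "100").trim());
    }

    /** Parses settings from properties-file text. */
    static PatchSettings parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not read settings", e);
        }
        return new PatchSettings(p);
    }
}
```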


Diffs (updated)
-----

  
  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraphManagement.java f7d2e273c 
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/AtlasJanusGraphManagement.java 2a2ef92a7 
  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 1c7915859 
  repository/src/main/java/org/apache/atlas/repository/patches/AtlasPatchManager.java b142a2a4a 
  repository/src/main/java/org/apache/atlas/repository/patches/ConcurrentPatchProcessor.java c6f0e6438 
  repository/src/main/java/org/apache/atlas/repository/patches/ReIndexPatch.java PRE-CREATION 


Diff: https://reviews.apache.org/r/73010/diff/2/

Changes: https://reviews.apache.org/r/73010/diff/1-2/


Testing
-------

**Test Setup**
Start with a known Atlas setup with known data. Ascertain that basic search 
yields results.

Use these CURL commands to delete Solr indexes:

curl http://<host>:8983/solr/vertex_index/update?commit=true \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>b2d_t:*</query></delete>'

curl http://<host>:8983/solr/edge_index/update?commit=true \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>1151_t:*</query></delete>'

curl http://ve0128.halxg.cloudera.com:8983/solr/fulltext_index/update?commit=true \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>14at_t:*</query></delete>'

This deletes the Solr indexes. A basic search performed from the web UI will 
then return no results.

Now set the configuration parameter _atlas.patch.reindex.enabled=true_ and 
restart Atlas.

The server-side logs will indicate that the patch has run.

**Volume Testing**
Vertices: ~16M; duration: ~5 hrs.
Edges: ~122M; duration: ~6 hrs.

Configuration parameters:
atlas.patch.reindex.enabled=true
atlas.patch.numWorkers=14
atlas.patch.batchSize=1000

Node configuration:
Atlas: Heap size: 6 GB.
Solr: Heap size: 12 GB.
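
For a rough sense of scale, the volume figures above can be converted into 
average rates. This is simple arithmetic on the approximate numbers reported 
here, assuming the durations are wall-clock times: about 890 vertices per 
second and about 5,650 edges per second. The class name _ThroughputEstimate_ is 
hypothetical.

```java
// Converts the approximate volume-test figures into average elements per second.
final class ThroughputEstimate {
    static double perSecond(double elements, double hours) {
        return elements / (hours * 3600.0);
    }

    public static void main(String[] args) {
        // ~16M vertices in ~5 hrs, ~122M edges in ~6 hrs (figures from this review).
        System.out.printf("vertices/s ~ %.0f%n", perSecond(16_000_000, 5));
        System.out.printf("edges/s    ~ %.0f%n", perSecond(122_000_000, 6));
    }
}
```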

**PC Build**
https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/177/


Thanks,

Ashutosh Mestry
