rkundam opened a new pull request, #671:
URL: https://github.com/apache/atlas/pull/671

   To support high-volume metadata ingestion while preserving lineage accuracy 
and data consistency, this patch for 
https://issues.apache.org/jira/browse/ATLAS-5320 introduces a distributed 
parallel processing architecture. Rather than relying on a single-threaded 
sequential pipeline, the system partitions entity workloads deterministically 
and processes independent entity families concurrently.
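
   The deterministic partitioning described above can be sketched roughly as 
follows. This is only an illustration, not the code in this patch: the 
`db.table.column@cluster` name shape, the helpers `family_key`, 
`partition_for` and `assign`, and the choice of SHA-256 are all assumptions 
made for the example. The idea is that every entity in a family (a table and 
its columns) maps to the same stable key, so the whole family lands on one 
partition while unrelated families can be processed concurrently.

```python
import hashlib
from collections import defaultdict


def family_key(qualified_name: str) -> str:
    """Hypothetical family key: the table-level part of a qualified name.

    Assumes names shaped like 'db.table.column@cluster', so a column
    resolves to its parent table and the family shares one key.
    """
    name, _, cluster = qualified_name.partition("@")
    parts = name.split(".")
    return ".".join(parts[:2]) + "@" + cluster


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to get the same result on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign(qualified_names, num_partitions=3):
    """Group entity names by the partition of their family key."""
    buckets = defaultdict(list)
    for qn in qualified_names:
        buckets[partition_for(family_key(qn), num_partitions)].append(qn)
    return dict(buckets)
```

   Because the partition depends only on the family key, reprocessing the 
same input yields the same assignment, and a table is never split from its 
columns across workers.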
   
   How was this patch tested?
   Tested with several datasets on clusters and compared against serial 
processing of the same datasets.
   Example: for the dataset below, serial processing took around 5 hrs, while 
distributed parallel processing with 3 metadata topics and 3 lineage topics 
took 1.5 hrs (roughly 3.3x faster).
   Tables: 6.5K
   Columns: 130K
   Lineage: 1.7K


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
