[ https://issues.apache.org/jira/browse/ATLAS-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090708#comment-18090708 ]
ASF subversion and git services commented on ATLAS-5320:
--------------------------------------------------------
Commit 3fce2f7378cc48d8afca5f222a5c628e47181b29 in atlas's branch
refs/heads/master from Radhika Kundam
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=3fce2f737 ]
ATLAS-5320: Distributed Notification Processing (#671)
> Distributed Notification Processing
> -----------------------------------
>
> Key: ATLAS-5320
> URL: https://issues.apache.org/jira/browse/ATLAS-5320
> Project: Atlas
> Issue Type: New Feature
> Reporter: Radhika Kundam
> Assignee: Radhika Kundam
> Priority: Major
> Attachments: Apache Atlas - Distributed Notification Processing.pdf
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Current entity and lineage processing in Atlas is mostly single-threaded to
> maintain message order, which limits scalability. HMS messages create
> entities, and HS2 messages create lineage based on those entities. To ensure
> correct lineage, we currently serialize all processing, which becomes a
> performance bottleneck.
> As a solution, introduce a proof-of-concept for scalable message
> processing in Apache Atlas by using multiple Kafka topics based on a key
> ({{dbName.tableName}}). This will enable parallel processing of HMS and
> HS2 messages, improve throughput, and reduce bottlenecks caused by
> single-threaded lineage creation.
> Implementation details:
> * HMS messages are routed to Kafka partitions based on
> {{dbName.tableName}}.
> * HS2 messages are routed to *all* relevant partitions based on input/output
> tables.
> * Messages are processed in parallel by multiple consumer threads.
> * Deduplication and shell entity handling are incorporated.
> The architectural design document is attached.
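
The routing rules in the description can be illustrated with a minimal Java sketch. It is not taken from the commit: the class and method names (RoutingSketch, routingKey, partitionFor, partitionsForLineage) are hypothetical, and a plain byte-array hash stands in for whatever partitioner the actual implementation uses. It only shows the idea that an HMS message maps to one partition by its dbName.tableName key, while an HS2 message maps to the set of partitions covering all of its input and output tables.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the Atlas code base.
class RoutingSketch {

    // Builds the routing key described in the issue: dbName.tableName.
    static String routingKey(String dbName, String tableName) {
        return dbName + "." + tableName;
    }

    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: a simple deterministic hash; the real partitioner may differ.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // An HS2 (lineage) message goes to every partition that owns one of its
    // input or output tables, so each partition sees lineage for its own tables.
    static SortedSet<Integer> partitionsForLineage(List<String> inputTableKeys,
                                                   List<String> outputTableKeys,
                                                   int numPartitions) {
        SortedSet<Integer> partitions = new TreeSet<>();
        for (String key : inputTableKeys) {
            partitions.add(partitionFor(key, numPartitions));
        }
        for (String key : outputTableKeys) {
            partitions.add(partitionFor(key, numPartitions));
        }
        return partitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8; // assumed partition count for the example
        String hmsKey = routingKey("sales", "orders");
        System.out.println("HMS message for " + hmsKey + " -> partition "
                + partitionFor(hmsKey, numPartitions));

        SortedSet<Integer> targets = partitionsForLineage(
                List.of(routingKey("sales", "orders")),
                List.of(routingKey("sales", "orders_summary")),
                numPartitions);
        System.out.println("HS2 lineage message -> partitions " + targets);
    }
}
```

Because the same key always hashes to the same partition, messages for one table stay ordered relative to each other, while different tables can be consumed in parallel. Sending an HS2 message to several partitions means the same lineage event can be seen more than once, which is presumably why the description lists deduplication as part of the change.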
--
This message was sent by Atlassian Jira
(v8.20.10#820010)