[
https://issues.apache.org/jira/browse/ATLAS-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089788#comment-18089788
]
ASF subversion and git services commented on ATLAS-5320:
--------------------------------------------------------
Commit 419a392c76263832f88289fd732b3ec30893e6ab in atlas's branch
refs/heads/atlas-5320 from Radhika Kundam
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=419a392c7 ]
ATLAS-5320: Distributed Notification Processing - fixing failed tests
> Distributed Notification Processing
> -----------------------------------
>
> Key: ATLAS-5320
> URL: https://issues.apache.org/jira/browse/ATLAS-5320
> Project: Atlas
> Issue Type: New Feature
> Reporter: Radhika Kundam
> Assignee: Radhika Kundam
> Priority: Major
> Attachments: Apache Atlas - Distributed Notification Processing.pdf
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Current entity and lineage processing in Atlas is mostly single-threaded to
> maintain message order, which limits scalability. HMS messages create
> entities, and HS2 messages create lineage based on those entities. To ensure
> correct lineage, we currently serialize all processing, which becomes a
> performance bottleneck.
> To address this, introduce a proof-of-concept for scalable message
> processing in Apache Atlas that uses multiple Kafka topics based on a key
> ({{dbName.tableName}}). This will enable parallel processing of HMS and
> HS2 messages, improve throughput, and reduce bottlenecks caused by
> single-threaded lineage creation.
> Implementation details:
> * HMS messages are routed to Kafka partitions based on
> {{dbName.tableName}}.
> * HS2 messages are routed to *all* relevant partitions based on input/output
> tables.
> * Messages are processed in parallel by multiple consumer threads.
> * Deduplication and shell entity handling are incorporated.
> The architectural design document is attached.
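
The routing rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in the commit: the function names, the partition count, and the use of CRC32 as the hash are all assumptions made for the example (Kafka's default partitioner uses murmur2 on the serialized key).

```python
import zlib

NUM_PARTITIONS = 4  # assumption: illustrative partition count


def partition_for(db_name, table_name, num_partitions=NUM_PARTITIONS):
    """Map a dbName.tableName key to a partition using a stable hash.

    CRC32 is a stand-in chosen for this sketch; it is not Kafka's hash.
    """
    key = f"{db_name}.{table_name}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


def route_hms(db_name, table_name, num_partitions=NUM_PARTITIONS):
    """An HMS message goes to the single partition that owns its table key."""
    return partition_for(db_name, table_name, num_partitions)


def route_hs2(inputs, outputs, num_partitions=NUM_PARTITIONS):
    """An HS2 message goes to every partition owning one of its tables.

    inputs and outputs are lists of (db_name, table_name) tuples.
    Returns the distinct target partitions in sorted order.
    """
    tables = list(inputs) + list(outputs)
    return sorted({partition_for(db, tbl, num_partitions) for db, tbl in tables})


if __name__ == "__main__":
    # Hypothetical table names, used only to exercise the functions.
    print(route_hms("sales", "orders"))
    print(route_hs2([("sales", "orders")], [("sales", "daily_summary")]))
```

Because an HS2 message may be delivered to more than one partition in this scheme, more than one consumer thread can see the same lineage event, which is presumably why deduplication appears in the list above.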
--
This message was sent by Atlassian Jira
(v8.20.10#820010)