[
https://issues.apache.org/jira/browse/SENTRY-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525670#comment-16525670
]
kalyan kumar kalvagadda commented on SENTRY-2249:
-------------------------------------------------
*Here are two options I had in mind.*
*Option-1:* Persist the snapshot entities in batches. This may significantly
reduce the number of DB operations. Currently there is one DB operation per
entry in the snapshot, which does not scale.
*Option-2:* Break the full snapshot into batches and persist all of them in
parallel in different transactions. As we are using the repeatable_read
isolation level, we should be able to have parallel writes on the same table.
This raises an issue if persisting any of the batches fails: the approach
needs additional logic to clean up the partially persisted snapshot. I’m
evaluating this option.
> Persist HMS Full Snapshot in batches.
> -------------------------------------
>
> Key: SENTRY-2249
> URL: https://issues.apache.org/jira/browse/SENTRY-2249
> Project: Sentry
> Issue Type: Improvement
> Components: Sentry
> Affects Versions: 2.1.0
> Reporter: kalyan kumar kalvagadda
> Assignee: kalyan kumar kalvagadda
> Priority: Major
> Attachments: SENTRY-2249.001.patch
>
>
> Currently each entry in the full snapshot of HMS is persisted one entry at a
> time. Instead, it could be optimized by persisting the entries in batches. DB
> operations are expensive, so reducing the number of database operations
> should help. This would significantly decrease the time to persist the
> snapshot into the database.
> The size of the batch could be configurable.
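
The batching described in the issue can be sketched roughly as follows. This is an illustration only, not the attached patch: the `snapshot_path` table, the `persist_snapshot` function, and the default batch size of 500 are assumptions, and Python's `sqlite3` module stands in for the real database.

```python
import sqlite3
from itertools import islice

def batched(entries, batch_size):
    """Yield successive lists of at most batch_size entries."""
    it = iter(entries)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

def persist_snapshot(conn, entries, batch_size=500):
    """Persist (auth_obj, path) pairs with one statement call per batch.

    Returns the number of batch calls issued. batch_size is the
    configurable knob mentioned in the issue (hypothetical default).
    """
    calls = 0
    with conn:  # single transaction: commit on success, roll back on error
        for chunk in batched(entries, batch_size):
            conn.executemany(
                "INSERT INTO snapshot_path (auth_obj, path) VALUES (?, ?)", chunk
            )
            calls += 1
    return calls

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot_path (auth_obj TEXT, path TEXT)")

entries = [(f"db{i}.tbl", f"/warehouse/db{i}/tbl") for i in range(1050)]
calls = persist_snapshot(conn, entries, batch_size=500)
rows = conn.execute("SELECT COUNT(*) FROM snapshot_path").fetchone()[0]
```

With 1050 entries and a batch size of 500, the sketch issues three batch calls instead of 1050 single-row operations, and because everything runs in one transaction, a failure leaves no partial snapshot behind.
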
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)