[
https://issues.apache.org/jira/browse/SENTRY-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671868#comment-16671868
]
kalyan kumar kalvagadda commented on SENTRY-2249:
-------------------------------------------------
[~spena] [~LinaAtAustin]
*What is the issue?*
Persisting the snapshot is slow. One reason is that entries are inserted into
the AUTHZ_PATH table one at a time, sequentially, instead of being sent to the
datastore in a batch. This is inefficient, especially when the application is
performing a bulk insert. While persisting a snapshot, the application has to
write a huge number of entries into the AUTHZ_PATH table, on the order of
millions.
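To make the cost difference concrete, here is a minimal sketch in plain Java.
It is not Sentry or DataNucleus code: BatchInsertSketch, CountingStore,
persistOneAtATime and persistInBatches are hypothetical names, and the
"datastore" only counts round trips. It assumes that each call to the store
costs one round trip regardless of how many rows it carries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Sentry.
class BatchInsertSketch {

    /** Stand-in for the datastore: counts round trips and rows written. */
    static class CountingStore {
        int roundTrips = 0;
        int rowsWritten = 0;

        // Assumption: one call equals one round trip, whatever the batch size.
        void insertBatch(List<String> paths) {
            roundTrips++;
            rowsWritten += paths.size();
        }
    }

    /** Current behaviour as described above: one insert per path entry. */
    static void persistOneAtATime(List<String> paths, CountingStore store) {
        for (String path : paths) {
            store.insertBatch(Collections.singletonList(path));
        }
    }

    /** Batched behaviour: send up to batchSize entries per round trip. */
    static void persistInBatches(List<String> paths, int batchSize, CountingStore store) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < paths.size(); start += batchSize) {
            int end = Math.min(start + batchSize, paths.size());
            store.insertBatch(paths.subList(start, end));
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            paths.add("/warehouse/db/table" + i);
        }

        CountingStore sequential = new CountingStore();
        persistOneAtATime(paths, sequential);

        CountingStore batched = new CountingStore();
        persistInBatches(paths, 100, batched);

        // Same rows written either way; only the number of round trips differs.
        System.out.println("one at a time: " + sequential.roundTrips + " round trips");
        System.out.println("batches of 100: " + batched.roundTrips + " round trips");
    }
}
```

With 1050 entries and a batch size of 100, the sequential version makes 1050
round trips and the batched version makes 11 (ten full batches plus one of 50).
In the real code the batching would have to happen in the datastore layer, so
this only illustrates the arithmetic, not the fix itself.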
*Root cause of the Issue:*
DataNucleus does not perform a batch insert when the JDO entity has a
foreign-key mapping. In our case, MPath has an FK mapping to
MAuthzPathsMapping. This relation is needed for DataNucleus to fetch the
snapshot.
*Solution Approach:*
Have two mappings for the AUTHZ_PATH table: one to insert the data and the
other to fetch the data. This approach raises the question of how to make sure
contributors follow this guidance.
To that end, I changed MAuthzPathsMapping so that MPath entities are not
persisted during snapshots.
> Enable batch insert of HMS paths in Full Snapshot.
> ---------------------------------------------------
>
> Key: SENTRY-2249
> URL: https://issues.apache.org/jira/browse/SENTRY-2249
> Project: Sentry
> Issue Type: Improvement
> Components: Sentry
> Affects Versions: 2.1.0
> Reporter: kalyan kumar kalvagadda
> Assignee: kalyan kumar kalvagadda
> Priority: Major
> Attachments: SENTRY-2249.001.patch, SENTRY-2249.002.patch,
> SENTRY-2249.003.patch, SENTRY-2249.004.patch, SENTRY-2249.005.patch
>
>
> Currently, each entry in the full snapshot of HMS is persisted one at a
> time. Instead, this could be optimized by persisting the path entries in
> batches. DB operations are expensive, so reducing the number of database
> operations and the round-trip time will help. This would significantly
> decrease the time it takes to persist the snapshot into the database.
> The batch size could be configurable.