[ 
https://issues.apache.org/jira/browse/SENTRY-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671868#comment-16671868
 ] 

kalyan kumar kalvagadda commented on SENTRY-2249:
-------------------------------------------------

[~spena] [~LinaAtAustin] 

*What is the issue?*

Persisting the snapshot is slow. One reason is that entries are inserted into 
the AUTHZ_PATH table sequentially, one at a time, instead of being sent to the 
datastore in batches. This is inefficient, especially when the application is 
performing a bulk insert. While persisting a snapshot, the application has to 
write a huge number of entries into the AUTHZ_PATH table, on the order of 
millions.
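
To make the cost difference concrete, here is a minimal sketch in plain Java. 
It is not Sentry or DataNucleus code: CountingStore, insertOneByOne and 
insertInBatches are hypothetical names, the store only counts how many times it 
is called, and the batch size of 1000 is an assumed value. It only illustrates 
how grouping entries reduces the number of round trips to the datastore.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchInsertSketch {

    /** Hypothetical stand-in for the datastore; it only counts round trips. */
    static class CountingStore {
        int roundTrips = 0;
        int rowsWritten = 0;

        // One call models one command (or one batch of commands) sent to the datastore.
        void write(List<String> paths) {
            roundTrips++;
            rowsWritten += paths.size();
        }
    }

    /** Sends every path in its own round trip (the sequential behavior). */
    static void insertOneByOne(CountingStore store, List<String> paths) {
        for (String path : paths) {
            store.write(Collections.singletonList(path));
        }
    }

    /** Sends the paths in chunks of at most batchSize entries. */
    static void insertInBatches(CountingStore store, List<String> paths, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < paths.size(); start += batchSize) {
            int end = Math.min(start + batchSize, paths.size());
            store.write(new ArrayList<>(paths.subList(start, end)));
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            paths.add("/warehouse/db/table" + i);
        }

        CountingStore sequential = new CountingStore();
        insertOneByOne(sequential, paths);

        CountingStore batched = new CountingStore();
        insertInBatches(batched, paths, 1_000); // assumed batch size

        System.out.println("sequential round trips: " + sequential.roundTrips); // 10000
        System.out.println("batched round trips:    " + batched.roundTrips);    // 10
    }
}
```

With millions of entries the same ratio applies, which is why the number of 
round trips matters more than the amount of data written.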

 

*Root cause of the Issue:*

DataNucleus does not perform a batch insert when the JDO entity has a foreign 
key mapping. In our case, MPath has an FK mapping to MAuthzPathsMapping. This 
relation is needed for DataNucleus to fetch the snapshot.

 

*Solution Approach:*

Have two mappings for the AUTHZ_PATH table: one to insert the data and the 
other to fetch the data. This approach raises the question of how to make sure 
contributors follow this guidance. 

To that end, I changed MAuthzPathsMapping so that MPath entities are not 
persisted during snapshots.

 

 

> Enable batch insert of HMS paths in  Full Snapshot.
> ---------------------------------------------------
>
>                 Key: SENTRY-2249
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2249
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: kalyan kumar kalvagadda
>            Assignee: kalyan kumar kalvagadda
>            Priority: Major
>         Attachments: SENTRY-2249.001.patch, SENTRY-2249.002.patch, 
> SENTRY-2249.003.patch, SENTRY-2249.004.patch, SENTRY-2249.005.patch
>
>
> Currently, each entry in the full snapshot of HMS is persisted one at a 
> time. This could be optimized by persisting the path entries in 
> batches. DB operations are expensive, so reducing the number of database 
> operations and the round-trip time will help. This would significantly 
> decrease the time to persist the snapshot into the database.
> The batch size could be configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
