-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/
-----------------------------------------------------------
(Updated Oct. 10, 2018, 5:38 p.m.)
Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
Changes
-------
Changed the summary.
Summary (updated)
-----------------
SENTRY-2305: Optimize time taken for persistence HMS snapshot by persisting in
parallel
Bugs: SENTRY-2305
https://issues.apache.org/jira/browse/SENTRY-2305
Repository: sentry
Description
-------
I have considered multiple options. Persisting in batches is not an option with
out changing the schema as the data nucleus does not persist row in batches for
tables which have foreign key on other tables.
I see that best option is to persist the paths in parallel. It gave good
results.
Solution Approach:
I have used a thread pool to persist the snapshot. Size of this thread pool is
configurable. Paths for each object database/table are submitted to this thread
pool. If for reason some of the paths are not pesisted, snapshot is removed and
exception is throw back.
This patch along with SENTRY-2423 gave was 5 times faster when tested with
below.
Object Type Count
Databases 209
Tables 2100
Partitions 200004
Diffs
-----
sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
092060c450c6a906850630cb10454737157af5fe
sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
7a736ca9604eb0bb182a159b5a2aed274275c16e
sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
4a9afe303672baff39be01d4f190034b2bfb75fe
Diff: https://reviews.apache.org/r/68973/diff/1/
Testing
-------
Added new test and also made sure that existing tests passed.
Thanks,
kalyan kumar kalvagadda