> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote: > > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java > > Lines 153 (patched) > > <https://reviews.apache.org/r/68973/diff/4/?file=2100402#file2100402line153> > > > > is there an optimal value higher than 2 to make it faster? > > kalyan kumar kalvagadda wrote: > I used 2 because it is minimum value. We can increase the number but the > optimal number depends on the hardware used and the cpu usage.
any smaller number(2-3) should be good. It's not good idea to have a bigger value as we don't have a control on kind of hardware where is sentry is run. > On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote: > > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > > Lines 3476-3479 (original), 3491-3494 (patched) > > <https://reviews.apache.org/r/68973/diff/4/?file=2100403#file2100403line3501> > > > > I wonder how much memory do we expect in the worst case scenarios to be > > hold in memory for all these objects in the pool? > > > > In the previous patchthe objects were persisted (1 at a time), and GC > > could collected them. > > kalyan kumar kalvagadda wrote: > That is the case even now GC should collect them. Only difference is that > there is additional memory usage for short period of time. It is not possible > to come up with exact memory usage. In the worst case it could be size of the > fullsnapshot. > I can make code changes to limit the queue length to avaoid additional > memory usage. This appraach reduces memory usage signifiacantly as we are not holding up the data in L-1 cache. > On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote: > > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > > Line 3478 (original), 3493 (patched) > > <https://reviews.apache.org/r/68973/diff/4/?file=2100403#file2100403line3503> > > > > This is executed in a different transaction? > > Arjun Mishra wrote: > I believe its all in a single transaction > > kalyan kumar kalvagadda wrote: > it is not possible to perform concurrent inserts using a single > transaction. SnapshotUpdater is executed as seperate transaction. In the > event if failure we need to perform explicit cleanup. Yes, This is different transaction. > On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote: > > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > > Lines 3510 (patched) > > <https://reviews.apache.org/r/68973/diff/4/?file=2100403#file2100403line3522> > > > > Aren't these deleted during the rollback? > > Arjun Mishra wrote: > Does executeTransactionWithRetry include roll back on failure? > > kalyan kumar kalvagadda wrote: > it is not possible to perform concurrent inserts using a single > transaction. SnapshotUpdater is executed as seperate transaction. In the > event if failure we need to perform explicit cleanup. As we are persisinting the paths for a hive table/database is seperate transaction we need to have an explcitiy logic to remove the other paths in the event of failure. - kalyan kumar ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68973/#review209865 ----------------------------------------------------------- On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68973/ > ----------------------------------------------------------- > > (Updated Oct. 22, 2018, 2:42 p.m.) > > > Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena. > > > Bugs: SENTRY-2305 > https://issues.apache.org/jira/browse/SENTRY-2305 > > > Repository: sentry > > > Description > ------- > > I have considered multiple options. Persisting in batches is not an option > with out changing the schema as the data nucleus does not persist row in > batches for tables which have foreign key on other tables. > > I see that best option is to persist the paths in parallel. It gave good > results. > > Solution Approach: > > I have used a thread pool to persist the snapshot. Size of this thread pool > is configurable. Paths for each object database/table are submitted to this > thread pool. If for reason some of the paths are not pesisted, snapshot is > removed and exception is throw back. > > This patch along with SENTRY-2423 was 5 times faster when tested with below. > > > > Object Type Count > Databases 209 > Tables 2100 > Partitions 200004 > > > Diffs > ----- > > > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java > 092060c450c6a906850630cb10454737157af5fe > > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > b387a22e40b8958395e1c12af12272b89a778726 > > sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java > 0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 > > sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java > 4a9afe303672baff39be01d4f190034b2bfb75fe > > > Diff: https://reviews.apache.org/r/68973/diff/5/ > > > Testing > ------- > > Added new test and also made sure that existing tests passed. > > > Thanks, > > kalyan kumar kalvagadda > >