> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
> > Lines 153 (patched)
> > <https://reviews.apache.org/r/68973/diff/4/?file=2100402#file2100402line153>
> >
> >     is there an optimal value higher than 2 to make it faster?
> 
> kalyan kumar kalvagadda wrote:
>     I used 2 because it is minimum value. We can increase the number but the 
> optimal number depends on the hardware used and the cpu usage.

any smaller number(2-3) should be good. It's not good idea to have a bigger 
value as we don't have a control on kind of hardware where is sentry is run.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3476-3479 (original), 3491-3494 (patched)
> > <https://reviews.apache.org/r/68973/diff/4/?file=2100403#file2100403line3501>
> >
> >     I wonder how much memory do we expect in the worst case scenarios to be 
> > hold in memory for all these objects in the pool?
> >     
> >     In the previous patchthe objects were persisted (1 at a time), and GC 
> > could collected them.
> 
> kalyan kumar kalvagadda wrote:
>     That is the case even now GC should collect them. Only difference is that 
> there is additional memory usage for short period of time. It is not possible 
> to come up with exact memory usage. In the worst case it could be size of the 
> fullsnapshot.
>     I can make code changes to limit the queue length to avaoid additional 
> memory usage.

This appraach reduces memory usage signifiacantly as we are not holding up the 
data in L-1 cache.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Line 3478 (original), 3493 (patched)
> > <https://reviews.apache.org/r/68973/diff/4/?file=2100403#file2100403line3503>
> >
> >     This is executed in a different transaction?
> 
> Arjun Mishra wrote:
>     I believe its all in a single transaction
> 
> kalyan kumar kalvagadda wrote:
>     it is not possible to perform concurrent inserts using a single 
> transaction. SnapshotUpdater is executed as seperate transaction. In the 
> event if failure we need to perform explicit cleanup.

Yes, This is different transaction.


> On Oct. 22, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 3510 (patched)
> > <https://reviews.apache.org/r/68973/diff/4/?file=2100403#file2100403line3522>
> >
> >     Aren't these deleted during the rollback?
> 
> Arjun Mishra wrote:
>     Does executeTransactionWithRetry include roll back on failure?
> 
> kalyan kumar kalvagadda wrote:
>     it is not possible to perform concurrent inserts using a single 
> transaction. SnapshotUpdater is executed as seperate transaction. In the 
> event if failure we need to perform explicit cleanup.

As we are persisinting the paths for a hive table/database is seperate 
transaction we need to have an explcitiy logic to remove the other paths in the 
event of failure.


- kalyan kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68973/#review209865
-----------------------------------------------------------


On Oct. 22, 2018, 2:42 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68973/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2018, 2:42 p.m.)
> 
> 
> Review request for sentry, Arjun Mishra, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2305
>     https://issues.apache.org/jira/browse/SENTRY-2305
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> I have considered multiple options. Persisting in batches is not an option 
> with out changing the schema as the data nucleus does not persist row in 
> batches for tables which have foreign key on other tables.
> 
> I see that best option is to persist the paths in parallel. It gave good 
> results.
> 
> Solution Approach:
> 
> I have used a thread pool to persist the snapshot. Size of this thread pool 
> is configurable. Paths for each object database/table are submitted to this 
> thread pool. If for reason some of the paths are not pesisted, snapshot is 
> removed and exception is throw back.
> 
>   This patch along with SENTRY-2423 was 5 times faster when tested with below.
> 
>  
> 
> Object Type   Count
> Databases     209
> Tables        2100
> Partitions    200004
> 
> 
> Diffs
> -----
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/service/common/ServiceConstants.java
>  092060c450c6a906850630cb10454737157af5fe 
>   
> sentry-service/sentry-service-server/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  b387a22e40b8958395e1c12af12272b89a778726 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollower.java
>  0d62941a7bd45e0d8a67b5d95fdb39a8801f5a26 
>   
> sentry-service/sentry-service-server/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  4a9afe303672baff39be01d4f190034b2bfb75fe 
> 
> 
> Diff: https://reviews.apache.org/r/68973/diff/5/
> 
> 
> Testing
> -------
> 
> Added new test and also made sure that existing tests passed.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>

Reply via email to