> On Jan. 7, 2021, 6:17 a.m., Madhan Neethiraj wrote: > > repository/src/main/java/org/apache/atlas/repository/audit/HBaseBasedAuditRepository.java > > Lines 302 (patched) > > <https://reviews.apache.org/r/73119/diff/2/?file=2243934#file2243934line302> > > > > Does use of second scan result in significant performance improvements > > i.e. avoiding population of event.details and event.entity during the first > > scan? If not, it will be simper to move #320 - #326 to #293 and avoid this > > second scan.
Yes, there is significant performance improvement with the two scan approach. The first scan is a full scan and it is done with limited columns. The second scan is a limited row scan with all the columns. If we want to avoid the second scan then the first scan has to be a full scan with all the columns which would be very expensive. On a setup with 1 million audit entries, spread across a thousand entities, the two scan approach was 5X faster than the single scan approach. > On Jan. 7, 2021, 6:17 a.m., Madhan Neethiraj wrote: > > webapp/src/main/java/org/apache/atlas/web/rest/EntityREST.java > > Lines 802 (patched) > > <https://reviews.apache.org/r/73119/diff/2/?file=2243937#file2243937line802> > > > > Why not add 'sortBy' query-param to existing REST API method, instead > > of adding a new method? Also, consider adding 'sortOrder' query-param - > > with default value as 'asc'. Added the 'sortorder' as a query-param in the new API. There are fundamental differences between the two APIs - Offset in the existing rest API is achieved by startkey parameter where hbase row key is expected. New API takes offset in the form of an integer. - Performance and mechanism are different. New API is done with in-memory sort and therefore is 10X slower than the old one. If we mix the APIs it would be very difficult and confusing to present the benchmarking. - Old API support filter by column, but the new one does not. Also, to differentiate the behavior based on parameters would be very confusing from docs perspective. - Deep ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/73119/#review222416 ----------------------------------------------------------- On Jan. 6, 2021, 7:06 p.m., Deep Singh wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/73119/ > ----------------------------------------------------------- > > (Updated Jan. 6, 2021, 7:06 p.m.) > > > Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Sarath > Subramanian, and Sidharth Mishra. > > > Bugs: ATLAS-4094 > https://issues.apache.org/jira/browse/ATLAS-4094 > > > Repository: atlas > > > Description > ------- > > Created a new entity audit API with Sorting functionality. > > /api/atlas/v2/entity/{entity-guid}/audit/sortby/{sort-column}?offset=2&count=5 > > sort-column may have 3 values "user", "action", or "timestamp". > > HBase does not support query with sorted results. To support this API > inmemory sort has to be performed. > Audit entry can potentially have entire entity dumped into it. Loading entire > audit entries for an entity can be memory intensive. Therefore we load audit > entries with limited columns first, perform sort on this light weight list, > then get the relevant section by removing offsets and reducing to limits. > With this reduced list we create MultiRowRangeFilter and then again scan the > table to get all the columns from the table this time. > > > Diffs > ----- > > > repository/src/main/java/org/apache/atlas/repository/audit/CassandraBasedAuditRepository.java > 8a453fd43 > > repository/src/main/java/org/apache/atlas/repository/audit/EntityAuditRepository.java > 07784d1c4 > > repository/src/main/java/org/apache/atlas/repository/audit/HBaseBasedAuditRepository.java > 9fca74470 > > repository/src/main/java/org/apache/atlas/repository/audit/InMemoryEntityAuditRepository.java > 900df0205 > > repository/src/main/java/org/apache/atlas/repository/audit/NoopEntityAuditRepository.java > ef9e259ea > webapp/src/main/java/org/apache/atlas/web/rest/EntityREST.java 0d6d0c845 > > > Diff: https://reviews.apache.org/r/73119/diff/2/ > > > Testing > ------- > > Manual testing was done. > > > Thanks, > > Deep Singh > >