+1 for this change

I tested the patch on S3; it is as fast as Peter mentioned and cuts
down a lot of chore time when a user is actively using a snapshot and
the archive is growing very fast.

-Stephen

On Thu, Jan 26, 2023 at 8:36 AM Peter Somogyi
<[email protected]> wrote:
>
> Hi,
>
> I was doing performance testing of the CleanerChores using an S3 root
> directory with HBase 2.4. When the archive directory was large, the HFile
> cleaner threads were blocked on acquiring the snapshot write lock. The flame
> graph [1] showed that inside SnapshotFileCache.getUnreferencedFiles there
> were a lot of listing operations to S3. The lock was held for 45 seconds
> for a directory containing 1000 files.
> I've also seen one case where a snapshot creation failed (timed out). I
> assume that can also be related to the long lock hold time.
>
> The locking time can be drastically reduced by changing the
> Iterable<FileStatus> parameter of FileCleanerDelegate.getDeletableFiles
> to List<FileStatus>. With this change, the lock was
> released in approximately 100 ms; however, the overall time for file
> listing, evaluation, and deletion stayed the same (45 sec). Since the
> different cleaner threads are no longer blocked on the snapshot lock, the
> overall deletion speed can be increased significantly.
> CleanerChore.checkAndDeleteFiles already receives a List<FileStatus>
> parameter and later converts it to Iterable<FileStatus>, so I don't expect
> a drastic change in the cleaners' memory usage.
>
> I've done some testing with HDFS as well.
> Cleanup of 100k files took 63518 ms with the Iterable implementation; the
> lock was held for ~130 ms per 1000 files.
> Cleanup of 100k files took 64411 ms with the List implementation; the lock
> was held for ~2 ms per 1000 files.
>
> Do you have any concerns about making this change?
>
> Thanks,
> Peter
>
> [1] https://issues.apache.org/jira/browse/HBASE-27590
> [2] https://github.com/apache/hbase/pull/4995
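
For readers not familiar with the pattern Peter describes: a lazily filtered Iterable evaluates its predicate while the caller iterates, so any slow per-file work (such as an S3 listing) can end up running inside the lock; returning a materialized List lets the lock be released before the slow iteration happens. The following is a minimal standalone sketch of that general pattern, not the actual HBase code -- all class and method names here (LockHoldSketch, unreferencedLazy, unreferencedEager, slowCheck) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of why materializing a lazy Iterable into a
// List shortens the window during which a shared lock must be held.
public class LockHoldSketch {
    static final ReentrantLock snapshotLock = new ReentrantLock();

    // Stand-in for a slow per-file check (e.g. a remote listing), ~1 ms each.
    static boolean slowCheck(String file) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;
    }

    // Lazy variant: filtering happens only when the caller iterates,
    // so the slow checks run while the lock is still held.
    static Iterable<String> unreferencedLazy(List<String> files) {
        return () -> files.stream().filter(LockHoldSketch::slowCheck).iterator();
    }

    // Eager variant: all slow checks run up front; the caller only needs
    // the lock long enough to hand over the already-built list.
    static List<String> unreferencedEager(List<String> files) {
        List<String> out = new ArrayList<>();
        for (String f : files) {
            if (slowCheck(f)) {
                out.add(f);
            }
        }
        return out;
    }

    // Runs the given work under the lock and returns the hold time in ns.
    static long timeUnderLock(Runnable work) {
        snapshotLock.lock();
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            snapshotLock.unlock();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            files.add("hfile-" + i);
        }

        // Lazy: iteration (and every slow check) happens inside the lock.
        long lazyNs = timeUnderLock(() -> {
            for (String f : unreferencedLazy(files)) { /* consume */ }
        });

        // Eager: the slow work happens before taking the lock.
        List<String> eager = unreferencedEager(files);
        long eagerNs = timeUnderLock(() -> {
            for (String f : eager) { /* consume */ }
        });

        System.out.println("lazy lock-hold ms:  " + lazyNs / 1_000_000);
        System.out.println("eager lock-hold ms: " + eagerNs / 1_000_000);
    }
}
```

With 50 files and a ~1 ms check, the lazy variant holds the lock for tens of milliseconds while the eager variant holds it for a fraction of a millisecond, which mirrors the ~130 ms vs ~2 ms per 1000 files that Peter measured on HDFS.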