[ 
https://issues.apache.org/jira/browse/KUDU-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731894#comment-17731894
 ] 

ASF subversion and git services commented on KUDU-1625:
-------------------------------------------------------

Commit 89e8d3daaf0fe9765cf5e229f52809f6f2a85d3e in kudu's branch 
refs/heads/branch-1.17.x from kedeng
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=89e8d3daa ]

[tablet] GC ancient, fully deleted rowsets without live row count stats

We added a background op to GC ancient, fully deleted rowsets for
KUDU-1625 based on the live row count. That patch is very useful, but it
does not work for older versions (earlier than 1.10) that do not
support live row count stats. Moreover, when upgrading from an older
version to a newer one, the live row count feature cannot be enabled
for already existing data.

I submitted this patch to resolve this issue on clusters running an older
version of Kudu. Its main purpose is to replace the use of the live row count.
However, lacking a more accurate counting method, this patch may release only
part of the storage space held by ancient, fully deleted rows. It can therefore
relieve storage pressure on older versions only to a certain extent.
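
As an illustration only, the conservative check described above could be
sketched as follows. This is not the code from the patch: RowSetSnapshot,
its fields, and IsAncientFullyDeleted are hypothetical names, and the sketch
assumes each row is deleted at most once (no reinserts). It counts only
deletes that have already been flushed, so a rowset whose remaining deletes
are still in memory is skipped.

```cpp
#include <cstdint>

// Hypothetical, simplified view of one rowset. These are NOT Kudu's real
// classes; the names exist only for this sketch.
struct RowSetSnapshot {
  int64_t base_row_count;           // rows written to the rowset's base data
  int64_t flushed_delete_count;     // DELETEs recorded in flushed delta files
  int64_t unflushed_delete_count;   // DELETEs still held in memory (ignored)
  int64_t newest_delete_timestamp;  // timestamp of the latest flushed DELETE
};

// Returns true only when the flushed deletes alone cover every base row and
// all of them happened before the ancient history mark. In-memory deletes
// are deliberately not counted, so the check can miss rowsets that are in
// fact fully deleted; it errs on the side of keeping data.
bool IsAncientFullyDeleted(const RowSetSnapshot& rs,
                           int64_t ancient_history_mark) {
  const bool fully_deleted = rs.flushed_delete_count >= rs.base_row_count;
  const bool ancient = rs.newest_delete_timestamp < ancient_history_mark;
  return fully_deleted && ancient;
}
```

Because the sketch needs no per-rowset live row count, it can run against
data written before that statistic existed, at the cost of reclaiming less
space than an exact count would allow.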

To enable this feature, set the flag
--enable_gc_deleted_rowsets_without_live_row_count and restart the tservers.

There is still room for improvement in this implementation: currently, it
ignores delete operations in the DMS (DeltaMemStore). I will resolve this
issue in a follow-up patch.

I ran this on a real cluster: the storage space of deleted rowsets that was not
previously freed was GCed as expected. I also added a unit test case to ensure
it works as intended.

Change-Id: Iacdff107b8b07cbd56f47f296a93f4bcfbf56b41
Reviewed-on: http://gerrit.cloudera.org:8080/19670
Tested-by: Kudu Jenkins
Reviewed-by: Yingchun Lai <laiyingc...@apache.org>
Reviewed-by: Yuqi Du <shenxingwuy...@gmail.com>
(cherry picked from commit e89dfe9a7d615b9cdb45e7222ac4dacb0e0916d3)
Reviewed-on: http://gerrit.cloudera.org:8080/20054
Tested-by: Alexey Serbin <ale...@apache.org>


> Schedule compaction on rowsets with high percentage of deleted data
> -------------------------------------------------------------------
>
>                 Key: KUDU-1625
>                 URL: https://issues.apache.org/jira/browse/KUDU-1625
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tablet
>    Affects Versions: 1.0.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Although with KUDU-236 we can now remove rows that were deleted prior to the 
> ancient history mark, we don't actively schedule compactions based on deleted 
> rows. So, if for example we have a fully compacted table and issue a DELETE 
> for every row, the data size actually does not change, because no compactions 
> are triggered.
> We need some way to notice the fact that the ratio of deletes to rows is high 
> and decide to compact those rowsets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
