[ https://issues.apache.org/jira/browse/KUDU-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley resolved KUDU-2001.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0
Fixed in commit 1aa0ebb91fc072b4cd3ed629721b8263a6ba1ffe.
> Metric on_disk_size does not include UNDO deltas
> ------------------------------------------------
>
> Key: KUDU-2001
> URL: https://issues.apache.org/jira/browse/KUDU-2001
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 1.3.1
> Reporter: Mike Percy
> Assignee: Will Berkeley
> Priority: Minor
> Fix For: 1.4.0
>
>
> Kudu has a (misleadingly named) metric called {{on_disk_size}} defined in
> tablet.cc with the metric description "Tablet size on disk".
> As of 1.3.1, this metric counts only the bytes in the base data and the REDO
> deltas of the DiskRowSets, plus the data in the MemRowSet. It does not
> include UNDO deltas, nor does it include the data in the WALs and other
> metadata files.
> The easy way to improve this is to change the description of the metric to
> "Space used by this tablet's data blocks" and add UNDO deltas to the count.
> However, that would be a two-step process, because the same size estimate
> is also used elsewhere.
> The metric is currently tied to Tablet::EstimateOnDiskSize(). If you trace
> that down to the DiskRowSet you will end up at a function in DiskRowSet:
> {code}
> uint64_t DiskRowSet::EstimateOnDiskSize() const {
>   DCHECK(open_);
>   shared_lock<rw_spinlock> l(component_lock_);
>   return base_data_->EstimateOnDiskSize() +
>          delta_tracker_->EstimateOnDiskSize();
> }
> {code}
> In the DeltaTracker, you can see that we are only counting REDO deltas, not
> UNDO deltas:
> {code}
> uint64_t DeltaTracker::EstimateOnDiskSize() const {
>   shared_lock<rw_spinlock> lock(component_lock_);
>   uint64_t size = 0;
>   for (const shared_ptr<DeltaStore>& ds : redo_delta_stores_) {
>     size += ds->EstimateSize();
>   }
>   return size;
> }
> {code}
> However, DeltaTracker::EstimateOnDiskSize() is also used by the maintenance
> manager (MM) op MajorDeltaCompactionOp::UpdateStats(), which eventually calls
> DiskRowSet::DeltaStoresCompactionPerfImprovementScore(). That function calls
> EstimateDeltaDiskSize(), which has the following implementation:
> {code}
> uint64_t DiskRowSet::EstimateDeltaDiskSize() const {
>   DCHECK(open_);
>   shared_lock<rw_spinlock> l(component_lock_);
>   return delta_tracker_->EstimateOnDiskSize();
> }
> {code}
> So, to avoid breaking that estimate, we will need to separate the two:
> provide a way to estimate the REDO delta size separately from the size of
> all of the deltas in a RowSet.
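
The separation described above might look roughly like the following. This is a minimal, self-contained sketch and not the code from the actual fix: the classes are simplified stand-ins for Kudu's, locking (component_lock_) is omitted, and the names EstimateRedoOnDiskSize(), EstimateUndoOnDiskSize(), EstimateRedoDeltaDiskSize(), undo_delta_stores_ as modeled here, and the MakeStores() helper are all hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for a delta store; only the size estimate is modeled.
struct DeltaStore {
  uint64_t size_bytes;
  uint64_t EstimateSize() const { return size_bytes; }
};

using DeltaStoreList = std::vector<std::shared_ptr<DeltaStore>>;

// Hypothetical helper that builds a list of stores from byte sizes.
DeltaStoreList MakeStores(std::initializer_list<uint64_t> sizes) {
  DeltaStoreList stores;
  for (uint64_t s : sizes) {
    stores.push_back(std::make_shared<DeltaStore>(DeltaStore{s}));
  }
  return stores;
}

// Simplified DeltaTracker (no locking) exposing separate estimates.
class DeltaTracker {
 public:
  DeltaTracker(DeltaStoreList redos, DeltaStoreList undos)
      : redo_delta_stores_(std::move(redos)),
        undo_delta_stores_(std::move(undos)) {}

  // REDO-only estimate, for callers such as compaction scoring.
  uint64_t EstimateRedoOnDiskSize() const { return SumSizes(redo_delta_stores_); }

  // UNDO-only estimate.
  uint64_t EstimateUndoOnDiskSize() const { return SumSizes(undo_delta_stores_); }

  // All deltas, for the on-disk size metric.
  uint64_t EstimateOnDiskSize() const {
    return EstimateRedoOnDiskSize() + EstimateUndoOnDiskSize();
  }

 private:
  static uint64_t SumSizes(const DeltaStoreList& stores) {
    uint64_t size = 0;
    for (const auto& ds : stores) {
      size += ds->EstimateSize();
    }
    return size;
  }

  DeltaStoreList redo_delta_stores_;
  DeltaStoreList undo_delta_stores_;
};

// Simplified DiskRowSet: base data is modeled as a plain byte count.
class DiskRowSet {
 public:
  DiskRowSet(uint64_t base_data_bytes, DeltaTracker tracker)
      : base_data_bytes_(base_data_bytes), delta_tracker_(std::move(tracker)) {}

  // Base data plus REDO and UNDO deltas.
  uint64_t EstimateOnDiskSize() const {
    assert(open_);
    return base_data_bytes_ + delta_tracker_.EstimateOnDiskSize();
  }

  // REDO deltas only, so the compaction score keeps its current meaning.
  uint64_t EstimateRedoDeltaDiskSize() const {
    assert(open_);
    return delta_tracker_.EstimateRedoOnDiskSize();
  }

 private:
  bool open_ = true;
  uint64_t base_data_bytes_;
  DeltaTracker delta_tracker_;
};
```

With this split, the metric would read DiskRowSet::EstimateOnDiskSize(), while DeltaStoresCompactionPerfImprovementScore() would switch to the REDO-only estimate, so adding UNDO deltas to the metric does not change compaction scoring.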