Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13426 )

Change subject: KUDU-2797: the master aggregates tablet metrics
......................................................................


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/13426/6//COMMIT_MSG
Commit Message:

PS6:
Could you update this to use "stats" too?

Could you also update this to describe the behavior of collecting stats from 
all replicas, but only displaying one. Do you plan on using the follower stats 
at all?


http://gerrit.cloudera.org:8080/#/c/13426/6//COMMIT_MSG@15
PS6, Line 15: And on this basis, we can expose this information to Impala
Question about how you intend on using this:
This patch currently displays this in the master web UI, which is useful to 
operators. But these also only get updated every 5 minutes by default. In your 
experience, is this sufficient for Impala queries? Should we also expose an API 
to compute and aggregate these at runtime so Impala can get the most up-to-date 
stats?


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/master/catalog_manager.cc@5257
PS6, Line 5257:     const auto it = replica_stats_.find(replica_id);
              :     if (it != replica_stats_.end()) {
              :       return it->second;
              :     }
nit: This may be simpler if you use FindOrNull, e.g.

{
  <lock>
  const auto* stats = FindOrNull(replica_stats_, replica_id);
  if (stats) {
    return *stats;
  }
}
return ReportedTabletStatsPB();


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/heartbeater.h
File src/kudu/tserver/heartbeater.h:

PS6:
nit: now that this hasn't changed, could you revert this file?


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/heartbeater.cc
File src/kudu/tserver/heartbeater.cc:

http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/heartbeater.cc@559
PS6, Line 559:     // Update the tablet statistics if necessary.
nit: maybe consider putting this by L442. DoHeartbeat seems to contain the 
logic for putting the heartbeat request together (and sending them), whereas 
RunThread focuses more on scheduling the heartbeats and handling the RPC 
response.

Also it seems like there is at least some other code in DoHeartbeat that 
updates server state.


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/ts_tablet_manager.cc@1289
PS6, Line 1289:     StatsMap::const_iterator it = 
stats_map_.find(replica->tablet_id());
              :     if (it != stats_map_.end()) {
              :       reported_tablet->mutable_stats()->CopyFrom(*it->second);
              :     }
FindOrNull might be convenient here too.

Also I think we should be able to do something like:

  *reported_tablet->mutable_stats() = *tablet_stats;

instead of CopyFrom().


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/ts_tablet_manager.cc@1545
PS6, Line 1545:   // Check if the time is up.
              :   {
              :     shared_lock<rw_spinlock> l(lock_stats_);
              :     if (MonoTime::Now() < next_update_time_) {
              :       return;
              :     }
              :   }
              :
              :   // Only one thread is allowed to update at the same time.
              :   if (!lock_stats_.try_lock()) {
              :     return;
              :   }
Rather than locking twice, what do you think about something like:

if (!lock_stats_.try_lock()) {
  return;
}

if (MonoTime::Now() < next_update_time_) {
  return;
}

?


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/ts_tablet_manager.cc@1568
PS6, Line 1568: unique_ptr<ReportedTabletStatsPB> pb(new 
ReportedTabletStatsPB())
Do these need to be heap-allocated? Could we create the ReportedTableStatsPB on 
stack and std::move() them into the map?


http://gerrit.cloudera.org:8080/#/c/13426/6/src/kudu/tserver/ts_tablet_manager.cc@1600
PS6, Line 1600: for (const auto& tablet_id : dirty_tablets) {
              :     MarkTabletDirty(tablet_id, "The tablet statistics have been 
changed");
              :   }
> I think it's necessary to add a new function to support batch dirty tablets
Hrm, indeed.

Alternatively, perhaps these could use:

  server_->heartbeater()->MarkTabletDirty(tablet_id, reason);

to not trigger ASAP? Would that be sufficient?


http://gerrit.cloudera.org:8080/#/c/13426/6/www/table.mustache
File www/table.mustache:

http://gerrit.cloudera.org:8080/#/c/13426/6/www/table.mustache@35
PS6, Line 35:     <tr><td>Disk Size:</td><td>{{table_disk_size}}</td></tr>
Since this is also a UI change, if you can, could you manually test on a small 
cluster that these look correct in case we have:
- a live tablet
- a tombstoned tablet
- a deleted tablet

?



--
To view, visit http://gerrit.cloudera.org:8080/13426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74406ab7cca7c22fda455c328b8ee9989a6b2d99
Gerrit-Change-Number: 13426
Gerrit-PatchSet: 6
Gerrit-Owner: helifu <hzhel...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: helifu <hzhel...@corp.netease.com>
Gerrit-Comment-Date: Wed, 12 Jun 2019 01:17:47 +0000
Gerrit-HasComments: Yes

Reply via email to