For a few weeks, one of our clusters has been experiencing large, simultaneous 
CPU spikes across most or all nodes. Since the spikes began, our metrics 
collection job has intermittently seen long response times when calling the 
following API endpoints:

/nifi-api/process-groups/root
/nifi-api/flow/process-groups/root/status?recursive=False&nodewise=True
/nifi-api/system-diagnostics?nodewise=true

There has been no change in response time for these endpoints:

/nifi-api/access/token
/nifi-api/flow/cluster/summary
/nifi-api/tenants/users/
/nifi-api/flow/process-groups/root/controller-services?includeAncestorGroups=False&includeDescendantGroups=True

All calls are node-local, so response times are usually sub-second. When the 
slowness sets in, however, some of the problem calls can take 45 to 60 seconds. 
Sometimes only one call is slow, sometimes several (they are made 
synchronously, back to back).
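In case it helps anyone reproduce this, here is a minimal sketch of how we time each call individually rather than the job as a whole, so a single slow endpoint stands out. The endpoint paths are the ones above; NIFI_URL and TOKEN are placeholders for illustration, not our real values.

```python
import time
import urllib.request

# Placeholders -- set NIFI_URL (e.g. "https://node1.example.com:8443")
# to actually exercise the calls; left as None here.
NIFI_URL = None
TOKEN = None

# The problem endpoints, called back to back as in our metrics job.
ENDPOINTS = [
    "/nifi-api/process-groups/root",
    "/nifi-api/flow/process-groups/root/status?recursive=False&nodewise=True",
    "/nifi-api/system-diagnostics?nodewise=true",
]

def measure(fn, *args, **kwargs):
    """Time one call with a monotonic clock; return (result, seconds)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

def get(url, token=None, timeout=90):
    """Issue one GET, optionally with a bearer token, and return the status."""
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()
        return resp.status

if NIFI_URL:
    for path in ENDPOINTS:
        status, secs = measure(get, NIFI_URL + path, TOKEN)
        print(f"{path}: HTTP {status} in {secs:.2f}s")
```

Logging per-call timings like this is what let us see that the slowness hits one call at a time rather than the whole batch.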

Any thoughts on properties/settings we could look at, or component logging we 
could enable, to help troubleshoot the slowness?
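For reference, the kind of thing we were imagining on the logging side is raising the level for the web API package in conf/logback.xml, roughly like the fragment below. The logger name is an assumption on our part (it matches the NiFi web API package, but we have not confirmed which logger best surfaces request handling times), so treat it as a starting point rather than a known-good setting.

```xml
<!-- Hypothetical addition to conf/logback.xml on each node; the logger
     name org.apache.nifi.web.api is our guess at the relevant package. -->
<logger name="org.apache.nifi.web.api" level="DEBUG"/>
```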

We have a second cluster, geographically isolated, that runs (to the best of 
our knowledge) the exact same jobs with the exact same data. This cluster has 
no issues.

Thanks,
  Peter
