GitHub user phrocker commented on the issue:
https://github.com/apache/accumulo/pull/260
@ivakegg Yields are not aware of other yields, so they are completely
independent and cannot cooperate with any scheduling mechanism. My old
operating systems book calls this "uncooperative yielding," but I can see how
that can be confusing. Let's call it isolated yielding.
To your point that "they do it to themselves": an iterator is one among a
stack, and the system could be multi-user. If one iterator would skip just
five more keys before completing but is pre-empted because of another
iterator, you can get a yield when one is not desired. The only way to combat
this is solid metrics. Without them, you don't know how many additional RPC
calls the yields cause, and RPCs can increase simply because you set the key
yield incorrectly. You also don't know the I/O load or how many skipped keys
is reasonable. Further, one key is not the same as another key: parts of a
table could have much smaller keys. So again, these metrics tell us what we
need to know: how much time was spent before a yield, the size of the keys
skipped, and so on.
Hence those metrics would be useful to show whether this mechanism works as
intended in production.
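To make the metrics idea concrete, here is a rough sketch of what could be recorded per yield. Everything in it is hypothetical: `YieldMetrics` and its method names are made up for illustration and are not part of the Accumulo API or of this PR. It also assumes the caller can measure the elapsed time and the number and size of keys skipped before each yield.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical yield metrics for a scan; not an Accumulo class. */
final class YieldMetrics {
  private final LongAdder yields = new LongAdder();
  private final LongAdder nanosBeforeYield = new LongAdder();
  private final LongAdder keysSkipped = new LongAdder();
  private final LongAdder bytesSkipped = new LongAdder();

  /** Record one yield: time spent before it, and how much was skipped. */
  void recordYield(long elapsedNanos, long keys, long keyBytes) {
    yields.increment();
    nanosBeforeYield.add(elapsedNanos);
    keysSkipped.add(keys);
    bytesSkipped.add(keyBytes);
  }

  /** Number of yields recorded so far. */
  long yieldCount() {
    return yields.sum();
  }

  /** Average number of keys skipped before each yield. */
  double meanKeysPerYield() {
    long n = yields.sum();
    return n == 0 ? 0.0 : (double) keysSkipped.sum() / n;
  }

  /** Average size in bytes of a skipped key, since keys are not all alike. */
  double meanKeyBytes() {
    long k = keysSkipped.sum();
    return k == 0 ? 0.0 : (double) bytesSkipped.sum() / k;
  }

  /** Average time in milliseconds spent before each yield. */
  double meanMillisBeforeYield() {
    long n = yields.sum();
    return n == 0 ? 0.0 : (double) nanosBeforeYield.sum() / n / 1_000_000.0;
  }
}
```

Comparing the yield count against the means would show whether a given yield setting mostly produces short, cheap skips (and therefore extra round trips for little benefit) or long ones.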
Then, after metrics, a nice-to-have would be a mechanism that allows the
entire scan to stop. If you are going to set a limit and "yield," you must
have a cessation point. I agree that long-running scans can happen, but the
RPC increase and context switching are a problem that we cannot stop with the
current solution. You also need a point at which you have yielded enough and
thus must stop entirely.
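As a sketch of the cessation point, one option would be a per-scan yield budget. This is only an illustration under assumptions: `YieldBudget`, `Decision`, and `onYieldRequested` are hypothetical names, not Accumulo API, and how a STOP would be surfaced to the client is left open.

```java
/** Hypothetical cap on the number of yields per scan; not an Accumulo class. */
final class YieldBudget {
  /** What the scan should do when an iterator asks to yield. */
  enum Decision { YIELD, STOP }

  private final int maxYields;
  private int yieldsSoFar;

  YieldBudget(int maxYields) {
    if (maxYields < 0) {
      throw new IllegalArgumentException("maxYields must be >= 0");
    }
    this.maxYields = maxYields;
  }

  /** Grants the yield while budget remains; afterwards the scan must stop. */
  Decision onYieldRequested() {
    if (yieldsSoFar >= maxYields) {
      return Decision.STOP;
    }
    yieldsSoFar++;
    return Decision.YIELD;
  }

  /** Number of yields granted so far. */
  int yieldsSoFar() {
    return yieldsSoFar;
  }
}
```

With a budget of 2, the first two requests return YIELD and every later request returns STOP, which bounds the extra RPCs a single scan can generate.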