Github user ivakegg commented on the issue:

    https://github.com/apache/accumulo/pull/260
  
    @phrocker 
    > Yields are not aware of other yields and thus are completely independent and thus cannot cooperate with any scheduling mechanism. My old Operating System book calls this "uncooperative yielding." But I can see how this can be confusing. Let's call it isolated yielding.
    
    Taking a quote from: https://en.wikipedia.org/wiki/Cooperative_multitasking:
    
    > Cooperative multitasking, also known as non-preemptive multitasking, is a style of computer multitasking in which the operating system never initiates a context switch from a running process to another process. Instead, processes voluntarily yield control periodically or when idle in order to enable multiple applications to be run simultaneously.
    
    I believe this is (at least in part) the capability I am providing here: the ability for the scan (a process in some sense) to voluntarily yield control.  This is why I used the word cooperative.  Granted, the scheduler in this case does not require all iterators to be able to yield, so in that sense we have a mix of cooperative and preemptive/uncooperative multitasking.  I submit that I am stretching the definition, so "isolated yielding" it is.
    
    > To your point that "they do it to themselves." Well, since an iterator is one amongst a stack and you could have a multi-user system, if you had one iterator that would skip just five more keys before completing, but is pre-empted due to another iterator, you have the potential for a yield when one is not desired.
    
    One iterator cannot pre-empt another unless the first iterator explicitly enables it to do so (see enableYielding(callback) on SKVI).  If an iterator enables yielding on the iterator/source below it, then it must be able to deal with that source yielding after any seek or next call.  That is part of the contract.  Note that the only way to actually yield the scan is for the top level iterator to do so.  No iterator is required to call enableYielding on the iterator below it.  The Tablet will only call enableYielding on the top level iterator.  Since the top level iterator may be the SourceSwitchingIterator, I made sure that it can handle this and passes the callback on to the iterator below it.
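    
    To make that contract concrete, here is a minimal sketch of the propagation idea. The types below (`YieldCallback`, `YieldingIterator`, `PassThroughIterator`) are simplified stand-ins invented for illustration, not the actual Accumulo classes:
    
    ```java
    import java.util.concurrent.atomic.AtomicReference;
    
    /** Stand-in for the callback the tablet would hand to the top level iterator. */
    class YieldCallback<K> {
      private final AtomicReference<K> position = new AtomicReference<>();
    
      /** Called by an iterator that wants to give up control at the given position. */
      void yield(K pos) { position.set(pos); }
    
      boolean hasYielded() { return position.get() != null; }
    
      /** Returns the yield position and clears it so the scan can be resumed later. */
      K getPositionAndReset() { return position.getAndSet(null); }
    }
    
    /** Stand-in for the parts of the iterator contract that matter for yielding. */
    interface YieldingIterator<K> {
      void enableYielding(YieldCallback<K> cb);
      void next();
      boolean hasTop();
      K getTopKey();
    }
    
    /**
     * An iterator that chooses to let its source yield. Because it called
     * enableYielding on the source, it must check for a yield after every call
     * into the source (next here; a real iterator would do the same after seek)
     * and propagate the yield instead of using an undefined top key/value.
     */
    class PassThroughIterator<K> implements YieldingIterator<K> {
      private final YieldingIterator<K> source;
      private YieldCallback<K> callback; // callback handed to us from above, if any
    
      PassThroughIterator(YieldingIterator<K> source) {
        this.source = source;
      }
    
      @Override
      public void enableYielding(YieldCallback<K> cb) {
        this.callback = cb;
        // Opting in on behalf of the source: from here on we must tolerate the
        // source yielding after any seek or next call.
        source.enableYielding(cb);
      }
    
      @Override
      public void next() {
        source.next();
        if (callback != null && callback.hasYielded()) {
          // The source gave up control; return immediately so the yield propagates
          // up the stack to the tablet, which will reschedule the scan later.
          return;
        }
        // ... normal processing of source.getTopKey() would go here ...
      }
    
      @Override
      public boolean hasTop() {
        // While a yield is pending we report no top, mirroring the contract that
        // a yielding iterator is in an undefined state.
        return (callback == null || !callback.hasYielded()) && source.hasTop();
      }
    
      @Override
      public K getTopKey() {
        return source.getTopKey();
      }
    }
    ```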
    
    > The only way to combat this would be solid metrics. You don't know how many increased RPC calls there are. This can increase RPCs if you simply set the key yield incorrectly. You don't know I/O load and how many keys being skipped is reasonable without these metrics. Further, one key is not the same as another key. Parts of a table could have much smaller keys, so again, these metrics prove everything by telling us: how much time spent before yield, size of keys skipped, etc, etc
    > Hence those metrics would be useful to show if this mechanism works as intended in production.
    
    OK, I will see if I can add a metric to the Accumulo metrics mechanism.
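    
    For context, a metric along those lines would presumably track counts like the ones sketched below; this is illustrative only and not tied to the actual Accumulo metrics classes:
    
    ```java
    import java.util.concurrent.atomic.AtomicLong;
    
    /** Illustrative only: the kind of numbers a yield metric could expose. */
    class YieldMetricsSketch {
      private final AtomicLong yieldCount = new AtomicLong();
      private final AtomicLong busyNanosBeforeYield = new AtomicLong();
    
      /** Record one yield and how long the scan ran before giving up control. */
      void recordYield(long nanosBusyBeforeYield) {
        yieldCount.incrementAndGet();
        busyNanosBeforeYield.addAndGet(nanosBusyBeforeYield);
      }
    
      long getYieldCount() {
        return yieldCount.get();
      }
    
      /** Average time a scan ran before yielding, in nanoseconds. */
      long getAverageNanosBeforeYield() {
        long count = yieldCount.get();
        return count == 0 ? 0 : busyNanosBeforeYield.get() / count;
      }
    }
    ```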
    
    > Then, after metrics, a nice to have would be a mechanism that allows the entire scan to stop. If you are going to put a limit and "yield." You must have a cessation point. Agree that long running scans can happen, but the RPC increase and context switching is a problem that we cannot stop with the current solution. You also need a point at which you have yielded enough and thus must stop entirely.
    
    Sounds like a reasonable feature.  Please write a ticket.
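    
    For whoever picks up that ticket, one possible shape for such a cessation point (purely a sketch of the idea; nothing in this PR implements it) would be a yield budget checked each time the scan is about to yield:
    
    ```java
    /** Illustrative only: a cut-off after which a scan stops yielding and fails instead. */
    class YieldBudget {
      private final int maxYields;
      private int yields = 0;
    
      YieldBudget(int maxYields) {
        this.maxYields = maxYields;
      }
    
      /**
       * Called each time the scan is about to yield; once the budget is exhausted
       * the server would terminate the scan rather than reschedule it.
       */
      boolean allowAnotherYield() {
        return ++yields <= maxYields;
      }
    }
    ```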

