I came across some MS logs showing that commands originating locally are 
getting starved: they time out and are then cancelled because they never get 
a chance to execute, while commands forwarded from another MS get executed 
even though the local commands were scheduled earlier. A large number of 
forwarded commands arrived in quick succession. Their sequence numbers were 
much lower than those of the local commands, so the forwarded commands were 
always added ahead of the local ones.
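
To illustrate the behaviour, here is a minimal, self-contained sketch. It is 
not the actual agent code: Req, addBySequence, addFifo and the starting 
sequence values are made up for the example, and the binary search only 
approximates what findRequest/addRequest do.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. All names and values here are hypothetical and
// are not taken from the CloudStack source.
class SequenceQueueDemo {

    static final class Req {
        final long seq;       // sequence assigned by the originating MS
        final String origin;  // "local" or "forwarded"
        Req(long seq, String origin) { this.seq = seq; this.origin = origin; }
    }

    // Sorted insert: the position is decided by the sequence number,
    // regardless of when the request arrived.
    static void addBySequence(List<Req> queue, Req req) {
        int lo = 0, hi = queue.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (queue.get(mid).seq < req.seq) lo = mid + 1; else hi = mid;
        }
        queue.add(lo, req);
    }

    // FIFO insert: the position is decided by arrival order only.
    static void addFifo(List<Req> queue, Req req) {
        queue.add(req);
    }

    // One local command arrives first, then three forwarded commands whose
    // sequence numbers are much lower. Returns the resulting queue order.
    static List<String> executionOrder(boolean fifo) {
        List<Req> queue = new ArrayList<>();
        long localStart = 9000L << 48;  // assumed local starting sequence
        long peerStart = 100L << 48;    // assumed (lower) peer starting sequence

        List<Req> arrivals = new ArrayList<>();
        arrivals.add(new Req(localStart, "local"));
        for (int i = 0; i < 3; i++) {
            arrivals.add(new Req(peerStart + i, "forwarded"));
        }
        for (Req r : arrivals) {
            if (fifo) addFifo(queue, r); else addBySequence(queue, r);
        }

        List<String> order = new ArrayList<>();
        for (Req r : queue) order.add(r.origin);
        return order;
    }

    public static void main(String[] args) {
        System.out.println("by sequence: " + executionOrder(false));
        System.out.println("fifo:        " + executionOrder(true));
    }
}
```

With the sorted insert the local command ends up behind every forwarded 
command even though it arrived first, and it stays there for as long as 
forwarded commands with lower sequences keep arriving. With the FIFO insert 
it stays at the head of the queue.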
  
On 24-Oct-2013, at 9:34 PM, Alex Huang <alex.hu...@citrix.com> wrote:

> I only took a quick look here.  I think when commands originate from the 
> management server, they don't go through this code.  This path is for 
> commands that come from the agent, so it doesn't matter whether there are 
> multiple management servers.
> 
> --Alex
> 
>> -----Original Message-----
>> From: Koushik Das
>> Sent: Thursday, October 24, 2013 2:11 AM
>> To: <dev@cloudstack.apache.org>
>> Cc: Alex Huang
>> Subject: Re: Command sequence logic in agent code
>> 
>> Created https://issues.apache.org/jira/browse/CLOUDSTACK-4944 to track
>> this issue.
>> 
>> Alex, any reason for adding requests based on sequence and not doing FIFO?
>> Do you see any issues if requests always get added to the end of the queue?
>> 
>> 
>> On 23-Oct-2013, at 6:26 PM, Koushik Das <koushik....@citrix.com> wrote:
>> 
>>> I was looking at the command sequencing logic in the agent code.
>>> 
>>> Each agent maintains a sequence that gets initialised based on the
>>> following logic:
>>> 
>>>   private static final Random s_rand = new Random(System.currentTimeMillis());
>>>   _nextSequence = s_rand.nextInt(Short.MAX_VALUE) << 48;
>>> 
>>> For every command that gets processed by the agent the sequence is
>>> incremented by 1. If commands are to be executed in sequence, they are
>>> queued up based on this sequence:
>>> 
>>>   protected synchronized void addRequest(Request req) {
>>>       int index = findRequest(req);
>>>       assert (index < 0) : "How can we get index again? " + index + ":" + req.toString();
>>>       _requests.add(-index - 1, req);
>>>   }
>>> 
>>> The above works fine in a single-MS scenario. In a clustered MS setup
>>> things change slightly.
>>> 
>>> A command can originate at any MS and, based on the ownership of the
>>> agent, it gets forwarded to the correct MS, which then handles the command.
>>> Command sequences are local to the individual agents in an MS, so in this
>>> case the originating MS agent tags the request with its own sequence. The
>>> request is forwarded to the owning MS and, if the 'executedInSequence'
>>> flag is set, gets added to the list based on that sequence number.
>>> 
>>> Here lies the problem: commands are not inserted in the order in which
>>> they arrive but based on the sequence number, and for a forwarded command
>>> the sequence is different from the local sequence. If the starting
>>> sequence of forwarded commands is much lower than that of the locally
>>> generated commands, local commands can be starved when there is a steady
>>> arrival of forwarded commands. It can also happen the other way round.
>>> Also, if the starting sequences for an agent in the local and peer MS are
>>> not spread far apart, there may be overlaps and a new request will
>>> override the old one.
>>> 
>>> Not sure if anyone has encountered any issues due to this. The correct
>>> approach looks to be implementing a queue model rather than doing an add
>>> based on the above code.
>>> 
>>> Comments?
>>> 
>>> -Koushik
>>> 
> 
