Re: [vbox-dev] Race condition?

2013-02-25 Thread Maxime Dor
Hello Klaus,

Thank you very much for the insight, that makes it perfectly clear.
I'll make sure this is handled properly then.

Thank you very much for your time.

Best regards,
Max

On 25/02/2013 17:38, Klaus Espenlaub wrote:
> You're not missing anything, and it's not a bug either. It's an 
> inherent API race, since there is nothing preventing session state 
> changes between the if statement and the unlockMachine() call (and 
> there can't be anything preventing it). The cause is that the 
> operations you're doing (saveState/powerDown) are terminating the VM, 
> and at the very end of those operations the VM session will be taken 
> down (which implies that the associated shared session locks are gone 
> too). Your API client code is racing with exactly this session cleanup 
> which can extend slightly after the operation already signaled 
> completion through the corresponding IProgress object. So it's up to 
> well-written code to be prepared for such spurious session state 
> changes (which might happen for other reasons actually, e.g. if the VM 
> process is terminated concurrently by some other API client, which can 
> happen even if your code doesn't). Your code has to handle it. Klaus 


___
vbox-dev mailing list
vbox-dev@virtualbox.org
https://www.virtualbox.org/mailman/listinfo/vbox-dev


Re: [vbox-dev] Race condition?

2013-02-25 Thread Klaus Espenlaub
Hi Max,

On 23.02.2013 03:17, Maxime Dor wrote:
> Hello,
>
> Given the following Java code snipet :
[...]
> I am running this using the 4.2 API (from the 4.2.4 SDK to be precise)
> against Web Services running on a Debian Testing box.
>
> When running my Unit tests, I keep having an intermitent error on the
> following part :
>> if (session.getState().equals(SessionState.Locked)) {
>> session.unlockMachine();
>> }
> /session.unlockMachine()/ throws an exception with : "/The session is
> not locked (session state: Unlocked) (0x8000)/". This exception is
> thrown about 1 out of 5 times I use this method or a copy of this one
> with only/.saveState()/ instead of/.powerDown()/ on the machine's
> console object.
>
> Am I missing something? From what I see here, it looks like the session
> state changes between the IF statement and the actual unlockMachine()
> call, looking like a race condition - or a bug?

You're not missing anything, and it's not a bug either. It's an inherent 
API race, since there is nothing preventing session state changes 
between the if statement and the unlockMachine() call (and there can't 
be anything preventing it).

The cause is that the operations you're doing (saveState/powerDown) are 
terminating the VM, and at the very end of those operations the VM 
session will be taken down (which implies that the associated shared 
session locks are gone too). Your API client code is racing with exactly 
this session cleanup which can extend slightly after the operation 
already signaled completion through the corresponding IProgress object.

So it's up to well-written code to be prepared for such spurious session 
state changes (which might happen for other reasons actually, e.g. if 
the VM process is terminated concurrently by some other API client, 
which can happen even if your code doesn't). Your code has to handle it.


Klaus

>
> Thank you for your insight.
>
> Best regards,
> Max
>
>
> ___
> vbox-dev mailing list
> vbox-dev@virtualbox.org
> https://www.virtualbox.org/mailman/listinfo/vbox-dev


-- 
Oracle 
Dr. Klaus Espenlaub | Software Development Director
Phone: +49 7151 60405 205 
Oracle VM VirtualBox

ORACLE Deutschland B.V. & Co. KG | Werkstr. 24 | 71384 Weinstadt

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603
Geschäftsführer: Jürgen Kunz

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher

Green Oracle  Oracle is committed to
developing practices and products that help protect the environment


___
vbox-dev mailing list
vbox-dev@virtualbox.org
https://www.virtualbox.org/mailman/listinfo/vbox-dev


[vbox-dev] Race condition?

2013-02-22 Thread Maxime Dor

Hello,

Given the following Java code snipet :

   @Override
   public void powerOff() throws MachineException {
  Logger.track();

  IMachine machine = ConnectionManager.findMachine(getUuid());
  ISession session = ConnectionManager.getSession();
  try {
 machine.lockMachine(session, LockType.Shared);
 IProgress p = session.getConsole().powerDown();
 while (!p.getCompleted() || p.getCanceled()) {
try {
Thread.sleep(Math.abs(p.getTimeRemaining()) * waitingCoef);
} catch (InterruptedException e) {
   Logger.exception(e);
}
 }
 Logger.debug("Return code : " + p.getResultCode());
  } catch (VBoxException e) {
 throw new MachineException(e.getWrapped().getMessage());
  } finally {
 if (session.getState().equals(SessionState.Locked)) {
session.unlockMachine();
 }
 machine.releaseRemote();
  }
   }


I am running this using the 4.2 API (from the 4.2.4 SDK to be precise) 
against Web Services running on a Debian Testing box.


When running my Unit tests, I keep having an intermitent error on the 
following part :

 if (session.getState().equals(SessionState.Locked)) {
session.unlockMachine();
 }
/session.unlockMachine()/ throws an exception with : "/The session is 
not locked (session state: Unlocked) (0x8000)/". This exception is 
thrown about 1 out of 5 times I use this method or a copy of this one 
with only/.saveState()/ instead of/.powerDown()/ on the machine's 
console object.


Am I missing something? From what I see here, it looks like the session 
state changes between the IF statement and the actual unlockMachine() 
call, looking like a race condition - or a bug?


Thank you for your insight.

Best regards,
Max
___
vbox-dev mailing list
vbox-dev@virtualbox.org
https://www.virtualbox.org/mailman/listinfo/vbox-dev


Re: [vbox-dev] Race condition in DrvNAT.cpp (was: svn: NAT dropping all external packets)

2009-12-01 Thread Robin Green
Yes, that worked. Thanks!
-- 
Robin

2009/11/30 Vasily Levchenko 

> But I think root cause of the problem isn't here (race), because earlier or
> later urgent channel will be waken up, by ICMP message from host and will
> wakeup Recv thread. But for me problem is located at mbuf zone water line
> mechanism some basic implementation of rfc 896. Have you tried first today's
> suggestion?
>
___
vbox-dev mailing list
vbox-dev@virtualbox.org
http://vbox.innotek.de/mailman/listinfo/vbox-dev


Re: [vbox-dev] Race condition in DrvNAT.cpp (was: svn: NAT dropping all external packets)

2009-11-30 Thread Vasily Levchenko

On Nov 30, 2009, at 7:23 PM, Robin Green wrote:

> Hi Vasily,
> I am sure that the patched code is still not correct. A thread switch could 
> occur between the unlocking of one lock and the locking of another - right?

Need investigate that. 

But I think root cause of the problem isn't here (race), because earlier or 
later urgent channel will be waken up, by ICMP message from host and will 
wakeup Recv thread. But for me problem is located at mbuf zone water line 
mechanism some basic implementation of rfc 896. Have you tried first today's 
suggestion?  

> So, you would need to lock the *same* lock that is currently locked by 
> RTSemEventWait (on systems that use semevent-posix.cpp, anyway), so that that 
> lock will be temporarily released while it waits.
> 
> However, there is a problem here. With systems like Mac OS X that use 
> semevent-posix.cpp, the implementation of RTSemEventWait in there uses a 
> lock. So, OK, it should be possible to lock that lock (but you might have to 
> prevent locking the same lock twice). But on Linux, semevent-linux.cpp is 
> presumably used, and the implementation of RTSemEventWait in there does *not* 
> use a lock! I think this is wrong - you can't use the code in RTSemEventWait 
> in semevent-linux.cpp to implement a correctly-working condition variable or 
> semaphore for what you're trying to do, because to implement it correctly you 
> need to hold a lock to avoid race conditions, and atomically release the lock 
> while you wait.
> 
> I know this stuff can be confusing - I find it confusing myself, and I might 
> have misunderstood it - so please, do not hesitate to ask for further 
> explanation.
> 
> -- 
> Robin
> 
> 2009/11/30 Vasily Levchenko 
> Hi Robin,
> Could you please check, that patch solves problem for you?
> 
> 
> On Nov 28, 2009, at 7:30 PM, Robin Green wrote:
> 
>> Further to my earlier report on this list of NAT adapters stopping working 
>> with svn and 3.1.0 beta versions, I believe I have identified the problem.
>> 
>> r23462 introduced the bug, by introducing a new thread to process "urgent" 
>> packets, the communication of which with the existing thread isn't properly 
>> synchronised. Basically, it's the classic bug of sending a signal to a 
>> thread (in this case, the existing, non-urgent thread) without using a 
>> critical section to ensure that the target thread is waiting for the signal 
>> when it's sent. So occasionally the non-urgent thread misses the signal, and 
>> thus (for all practical purposes) never wakes up, because it's waiting 
>> indefinitely.
>> 
>> There *is* a lock taken inside RTSemEventSignal and RTSemEventWait, but that 
>> doesn't prevent this because it only covers the waiting and signalling 
>> itself, and not the real work. This (locking but making the critical section 
>> too small) is also a classic bug.
>> 
>> I haven't tried to fix this because I'm not familiar with the code, but this 
>> is my best guess at what's going wrong.
>> -- 
>> Robin
>> 
>> 2009/11/26 Robin Green 
>> I think I've found something important. The responses stop appearing in the 
>> pcap log file immediately after VirtualBox sends the first ICMP type 4 
>> packet (wireshark says this is "Source quench (flow control)"). The packet 
>> is also malformed in the .pcap file, according to wireshark, which says it 
>> has a wrong header checksum - not for the overall packet, but in a nested IP 
>> header inside the ICMP packet.
>> 
>> -- 
>> Robin
>> 
>> 2009/11/26 Robin Green 
>> 
>> OK, this time I fortunately managed to reproduce the bug very quickly, *and* 
>> I got a host log file as well using wireshark. This time I used r24972.
>> 
>> I had a quick look at both .pcap files in wireshark. If you look for example 
>> at my attempt to ping 137.44.1.27, you'll see that the ping is sent and the 
>> reply is received on the host, but the reply does *not* appear in 
>> VirtualBox's pcap log file. I was trying to ping this IP address (one of our 
>> DNS servers) because I could not access twitter.com in firefox in the guest.
>> 
>> Thanks,
>> -- 
>> Robin
>> 
>> 2009/11/21 Vasily Levchenko 
>> 
>> On Nov 20, 2009, at 7:43 PM, Robin Green wrote:
>> 
>>> I've attached the log file. What do you mean by host network file?
>>> 
>> 
>> It's actually output of wireshark or any other traffic dumping tool working 
>> on host. Rather useful to check how nat send/modifyreceived from the 
>> guest packets to remote host.  
>> 
>>> 2009/11/19 Vasily Levchenko 
>>> Hello Robin,
>>> Thanks for feedback.
>>> Could you please attach pcap file NAT<->guest (please see 
>>> http://www.virtualbox.org/wiki/Network_tips) and host network and log file?
>>> On Nov 19, 2009, at 7:23 PM, Robin Green wrote:
>>> 
 Bisection reveals that this bug was introduced at some point between 
 r23223 and r23530 inclusive (I can't be more specific because all of those 
 revisions except r23530 either failed to build or failed to boot).
 -- 
 Robin
 

[vbox-dev] Race condition in DrvNAT.cpp (was: svn: NAT dropping all external packets)

2009-11-28 Thread Robin Green
Further to my earlier report on this list of NAT adapters stopping working
with svn and 3.1.0 beta versions, I believe I have identified the problem.

r23462 introduced the bug, by introducing a new thread to process "urgent"
packets, the communication of which with the existing thread isn't properly
synchronised. Basically, it's the classic bug of sending a signal to a
thread (in this case, the existing, non-urgent thread) without using a
critical section to ensure that the target thread is waiting for the signal
when it's sent. So occasionally the non-urgent thread misses the signal, and
thus (for all practical purposes) never wakes up, because it's waiting
indefinitely.

There *is* a lock taken inside RTSemEventSignal and RTSemEventWait, but that
doesn't prevent this because it only covers the waiting and signalling
itself, and not the real work. This (locking but making the critical section
too small) is also a classic bug.

I haven't tried to fix this because I'm not familiar with the code, but this
is my best guess at what's going wrong.
-- 
Robin

2009/11/26 Robin Green 

> I think I've found something important. The responses stop appearing in the
> pcap log file immediately after VirtualBox sends the first ICMP type 4
> packet (wireshark says this is "Source quench (flow control)"). The packet
> is also malformed in the .pcap file, according to wireshark, which says it
> has a wrong header checksum - not for the overall packet, but in a nested IP
> header inside the ICMP packet.
>
> --
> Robin
>
> 2009/11/26 Robin Green 
>
> OK, this time I fortunately managed to reproduce the bug very quickly,
>> *and* I got a host log file as well using wireshark. This time I used
>> r24972.
>>
>> I had a quick look at both .pcap files in wireshark. If you look for
>> example at my attempt to ping 137.44.1.27, you'll see that the ping is sent
>> and the reply is received on the host, but the reply does *not* appear in
>> VirtualBox's pcap log file. I was trying to ping this IP address (one of our
>> DNS servers) because I could not access twitter.com in firefox in the
>> guest.
>>
>> Thanks,
>> --
>> Robin
>>
>> 2009/11/21 Vasily Levchenko 
>>
>>>
>>> On Nov 20, 2009, at 7:43 PM, Robin Green wrote:
>>>
>>> I've attached the log file. What do you mean by host network file?
>>>
>>>
>>> It's actually output of wireshark or any other traffic dumping tool
>>> working on host. Rather useful to check how nat send/modifyreceived from
>>> the guest packets to remote host.
>>>
>>> 2009/11/19 Vasily Levchenko 
>>>
 Hello Robin,
 Thanks for feedback.
 Could you please attach pcap file NAT<->guest (please see
 http://www.virtualbox.org/wiki/Network_tips) and host network and log
 file?
 On Nov 19, 2009, at 7:23 PM, Robin Green wrote:

 Bisection reveals that this bug was introduced at some point between
 r23223 and r23530 inclusive (I can't be more specific because all of those
 revisions except r23530 either failed to build or failed to boot).
 --
 Robin

 2009/11/18 Robin Green 

> I tried 3.1 beta 1 and the NAT interfaces on both my VMs very quickly
> stopped working. So I tried building from SVN, and with this (r24735) I
> can't connect to, or even ping, anything other than the host. Guest is
> Fedora rawhide (F13), but with Fedora 12 kernel (due to an uninteresting 
> bug
> in the rawhide kernel).
>
> --
> Robin
>

 ___
 vbox-dev mailing list
 vbox-dev@virtualbox.org
 http://vbox.innotek.de/mailman/listinfo/vbox-dev



>>> 
>>>
>>>
>>>
>>
>
___
vbox-dev mailing list
vbox-dev@virtualbox.org
http://vbox.innotek.de/mailman/listinfo/vbox-dev