On Nov 30, 2009, at 7:23 PM, Robin Green wrote:
Hi Vasily,
I am sure that the patched code is still not correct. A thread switch could
occur between the unlocking of one lock and the locking of another - right?
Need investigate that.
But I think root cause of the problem isn't here (race), because earlier or
later urgent channel will be waken up, by ICMP message from host and will
wakeup Recv thread. But for me problem is located at mbuf zone water line
mechanism some basic implementation of rfc 896. Have you tried first today's
suggestion?
So, you would need to lock the *same* lock that is currently locked by
RTSemEventWait (on systems that use semevent-posix.cpp, anyway), so that that
lock will be temporarily released while it waits.
However, there is a problem here. With systems like Mac OS X that use
semevent-posix.cpp, the implementation of RTSemEventWait in there uses a
lock. So, OK, it should be possible to lock that lock (but you might have to
prevent locking the same lock twice). But on Linux, semevent-linux.cpp is
presumably used, and the implementation of RTSemEventWait in there does *not*
use a lock! I think this is wrong - you can't use the code in RTSemEventWait
in semevent-linux.cpp to implement a correctly-working condition variable or
semaphore for what you're trying to do, because to implement it correctly you
need to hold a lock to avoid race conditions, and atomically release the lock
while you wait.
I know this stuff can be confusing - I find it confusing myself, and I might
have misunderstood it - so please, do not hesitate to ask for further
explanation.
--
Robin
2009/11/30 Vasily Levchenko vasily.levche...@sun.com
Hi Robin,
Could you please check, that patch solves problem for you?
On Nov 28, 2009, at 7:30 PM, Robin Green wrote:
Further to my earlier report on this list of NAT adapters stopping working
with svn and 3.1.0 beta versions, I believe I have identified the problem.
r23462 introduced the bug, by introducing a new thread to process urgent
packets, the communication of which with the existing thread isn't properly
synchronised. Basically, it's the classic bug of sending a signal to a
thread (in this case, the existing, non-urgent thread) without using a
critical section to ensure that the target thread is waiting for the signal
when it's sent. So occasionally the non-urgent thread misses the signal, and
thus (for all practical purposes) never wakes up, because it's waiting
indefinitely.
There *is* a lock taken inside RTSemEventSignal and RTSemEventWait, but that
doesn't prevent this because it only covers the waiting and signalling
itself, and not the real work. This (locking but making the critical section
too small) is also a classic bug.
I haven't tried to fix this because I'm not familiar with the code, but this
is my best guess at what's going wrong.
--
Robin
2009/11/26 Robin Green gree...@gmail.com
I think I've found something important. The responses stop appearing in the
pcap log file immediately after VirtualBox sends the first ICMP type 4
packet (wireshark says this is Source quench (flow control)). The packet
is also malformed in the .pcap file, according to wireshark, which says it
has a wrong header checksum - not for the overall packet, but in a nested IP
header inside the ICMP packet.
--
Robin
2009/11/26 Robin Green gree...@gmail.com
OK, this time I fortunately managed to reproduce the bug very quickly, *and*
I got a host log file as well using wireshark. This time I used r24972.
I had a quick look at both .pcap files in wireshark. If you look for example
at my attempt to ping 137.44.1.27, you'll see that the ping is sent and the
reply is received on the host, but the reply does *not* appear in
VirtualBox's pcap log file. I was trying to ping this IP address (one of our
DNS servers) because I could not access twitter.com in firefox in the guest.
Thanks,
--
Robin
2009/11/21 Vasily Levchenko vasily.levche...@sun.com
On Nov 20, 2009, at 7:43 PM, Robin Green wrote:
I've attached the log file. What do you mean by host network file?
It's actually output of wireshark or any other traffic dumping tool working
on host. Rather useful to check how nat send/modifyreceived from the
guest packets to remote host.
2009/11/19 Vasily Levchenko vasily.levche...@sun.com
Hello Robin,
Thanks for feedback.
Could you please attach pcap file NAT-guest (please see
http://www.virtualbox.org/wiki/Network_tips) and host network and log file?
On Nov 19, 2009, at 7:23 PM, Robin Green wrote:
Bisection reveals that this bug was introduced at some point between
r23223 and r23530 inclusive (I can't be more specific because all of those
revisions except r23530 either failed to build or failed to boot).
--
Robin
2009/11/18 Robin Green gree...@gmail.com
I tried 3.1 beta 1 and the NAT interfaces on both my