Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On 29.09.2011, at 18:06, Amit Shah wrote: > On (Wed) 03 Aug 2011 [13:24:22], Jan Kiszka wrote: >> From: Fabien Chouteau >> >> In the current implementation, if Slirp tries to send an IP packet to a >> client >> with an unknown hardware address, the packet is simply dropped and an ARP >> request is sent (if_encap in slirp/slirp.c). >> >> With this patch, Slirp will send the ARP request, re-queue the packet and try >> to send it later. The packet is dropped after one second if the ARP reply is >> not received. > > This patch causes a segfault when guests wake up from hibernate. > > Recipe: > 1. Start guest with -net user -net nic,model=virtio > 2. (guest) ping 10.0.2.2 > 3. (guest) echo "disk" > /sys/power/state > 4. Re-start guest with same command line > 5. Ping has stopped receiving replies. > 6. Kill that ping process and start a new one. qemu segfaults. > > This needs the not-upstream-yet virtio S4 handling patches, found at > http://thread.gmane.org/gmane.linux.kernel/1197141 > > The backtrace is: > > (gdb) bt > #0 0x77e421f7 in slirp_insque (a=0x0, b=0x78f95d50) at > /home/amit/src/qemu/slirp/misc.c:27 > #1 0x77e40738 in if_start (slirp=0x78a9cdf0) at > /home/amit/src/qemu/slirp/if.c:194 > #2 0x77e44828 in slirp_select_poll (readfds=0x7fffd930, > writefds=0x7fffd9b0, xfds=0x7fffda30, select_error=0) >at /home/amit/src/qemu/slirp/slirp.c:588 > #3 0x77e110f1 in main_loop_wait (nonblocking=) > at /home/amit/src/qemu/vl.c:1549 > #4 0x77d7dc47 in main_loop () at > /home/amit/src/qemu/vl.c:1579 > #5 main (argc=, argv=, envp= out>) at /home/amit/src/qemu/vl.c:3574 > > > Reverting the patch keeps the ping going on after resume. I get the same thing with yesterday's HEAD (close to 1.0-rc3), but without hibernation. I'm running KVM Autotest on PPC machines to check my ppc-next queue and every single test failed for me because of segmentation faults in the slirp code. Reverting this patch (and the follow-up patch which fixes the struct mbuf definition) makes all tests not segfault for me, so I'm fairly sure this is the offending one :). I'm not saying that the patch is actually wrong - maybe it only exposes another bug that was only hidden so far. Either way, the breakage looks pretty much like memory corruption to me. Also, I'm having a hard time reproducing the problem manually. It triggers every time in Autotest, but never when I try to trigger it manually. Essentially Autotest is merely trying to connect to the guest using ssh every couple of seconds, so I don't know why I can't reproduce it without it. Please fix or revert this for 1.0. Alex
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On (Thu) 29 Sep 2011 [20:19:42], Jan Kiszka wrote: > On 2011-09-29 20:05, Amit Shah wrote: > > On (Thu) 29 Sep 2011 [19:53:47], Jan Kiszka wrote: > Can't reproduce, I'm not getting stable hibernation here even without > any network configured. > >>> > >>> With virtio devices and the patches applied? Can you tell me what > >>> you're seeing? > >> > >> No, I didn't patch my guest. I was using standard IDE with an emulated > >> NIC (or without) against a 3.1-rc3 (or so) guest. > > > > Strange, using qemu.git and an F14 guest (2.6.38) using this cmd line: > > > > ./x86_64-softmmu/qemu-system-x86_64 -m 512 /guests/f14-suspend.qcow2 -net > > none -enable-kvm -smp 2 > > > > I could successfully hibernate and resume. > > -cpu qemu64,-kvmclock makes it work. Would have to check a different > kernel version to find out if it's a guest kernel or qemu/kvm issue. Aha, yes. I disable kvmclock in my kernels as that has known issues with hibernate. I haven't checked if the 2.6.38 f14 kernel had it disabled, or maybe I didn't hit its problem point soon enough. Amit
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On 2011-09-29 20:05, Amit Shah wrote: > On (Thu) 29 Sep 2011 [19:53:47], Jan Kiszka wrote: Can't reproduce, I'm not getting stable hibernation here even without any network configured. >>> >>> With virtio devices and the patches applied? Can you tell me what >>> you're seeing? >> >> No, I didn't patch my guest. I was using standard IDE with an emulated >> NIC (or without) against a 3.1-rc3 (or so) guest. > > Strange, using qemu.git and an F14 guest (2.6.38) using this cmd line: > > ./x86_64-softmmu/qemu-system-x86_64 -m 512 /guests/f14-suspend.qcow2 -net > none -enable-kvm -smp 2 > > I could successfully hibernate and resume. -cpu qemu64,-kvmclock makes it work. Would have to check a different kernel version to find out if it's a guest kernel or qemu/kvm issue. > Could you check if the recent pull request [1] changes the picture for you? >>> >>> Thanks, that series fixes the problem. >> >> Perfect! Right in time. :) > > And people say slirp is neglected and unmaintainable :-) Who says this? I think slirp just needed some attention. It already presented two fairly old bugs to me, but strangely right after I adopted it... ;) Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On (Thu) 29 Sep 2011 [19:53:47], Jan Kiszka wrote: > >> Can't reproduce, I'm not getting stable hibernation here even without > >> any network configured. > > > > With virtio devices and the patches applied? Can you tell me what > > you're seeing? > > No, I didn't patch my guest. I was using standard IDE with an emulated > NIC (or without) against a 3.1-rc3 (or so) guest. Strange, using qemu.git and an F14 guest (2.6.38) using this cmd line: ./x86_64-softmmu/qemu-system-x86_64 -m 512 /guests/f14-suspend.qcow2 -net none -enable-kvm -smp 2 I could successfully hibernate and resume. > >> Could you check if the recent pull request [1] changes the picture for you? > > > > Thanks, that series fixes the problem. > > Perfect! Right in time. :) And people say slirp is neglected and unmaintainable :-) Amit
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On 2011-09-29 19:50, Amit Shah wrote: > On (Thu) 29 Sep 2011 [19:41:33], Jan Kiszka wrote: >> On 2011-09-29 18:06, Amit Shah wrote: >>> On (Wed) 03 Aug 2011 [13:24:22], Jan Kiszka wrote: From: Fabien Chouteau In the current implementation, if Slirp tries to send an IP packet to a client with an unknown hardware address, the packet is simply dropped and an ARP request is sent (if_encap in slirp/slirp.c). With this patch, Slirp will send the ARP request, re-queue the packet and try to send it later. The packet is dropped after one second if the ARP reply is not received. >>> >>> This patch causes a segfault when guests wake up from hibernate. >>> >>> Recipe: >>> 1. Start guest with -net user -net nic,model=virtio >>> 2. (guest) ping 10.0.2.2 >>> 3. (guest) echo "disk" > /sys/power/state >>> 4. Re-start guest with same command line >>> 5. Ping has stopped receiving replies. >>> 6. Kill that ping process and start a new one. qemu segfaults. >> >> Can't reproduce, I'm not getting stable hibernation here even without >> any network configured. > > With virtio devices and the patches applied? Can you tell me what > you're seeing? No, I didn't patch my guest. I was using standard IDE with an emulated NIC (or without) against a 3.1-rc3 (or so) guest. > >> Could you check if the recent pull request [1] changes the picture for you? > > Thanks, that series fixes the problem. Perfect! Right in time. :) Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On (Thu) 29 Sep 2011 [19:41:33], Jan Kiszka wrote: > On 2011-09-29 18:06, Amit Shah wrote: > > On (Wed) 03 Aug 2011 [13:24:22], Jan Kiszka wrote: > >> From: Fabien Chouteau > >> > >> In the current implementation, if Slirp tries to send an IP packet to a > >> client > >> with an unknown hardware address, the packet is simply dropped and an ARP > >> request is sent (if_encap in slirp/slirp.c). > >> > >> With this patch, Slirp will send the ARP request, re-queue the packet and > >> try > >> to send it later. The packet is dropped after one second if the ARP reply > >> is > >> not received. > > > > This patch causes a segfault when guests wake up from hibernate. > > > > Recipe: > > 1. Start guest with -net user -net nic,model=virtio > > 2. (guest) ping 10.0.2.2 > > 3. (guest) echo "disk" > /sys/power/state > > 4. Re-start guest with same command line > > 5. Ping has stopped receiving replies. > > 6. Kill that ping process and start a new one. qemu segfaults. > > Can't reproduce, I'm not getting stable hibernation here even without > any network configured. With virtio devices and the patches applied? Can you tell me what you're seeing? > Could you check if the recent pull request [1] changes the picture for you? Thanks, that series fixes the problem. Amit
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On 2011-09-29 18:06, Amit Shah wrote: > On (Wed) 03 Aug 2011 [13:24:22], Jan Kiszka wrote: >> From: Fabien Chouteau >> >> In the current implementation, if Slirp tries to send an IP packet to a >> client >> with an unknown hardware address, the packet is simply dropped and an ARP >> request is sent (if_encap in slirp/slirp.c). >> >> With this patch, Slirp will send the ARP request, re-queue the packet and try >> to send it later. The packet is dropped after one second if the ARP reply is >> not received. > > This patch causes a segfault when guests wake up from hibernate. > > Recipe: > 1. Start guest with -net user -net nic,model=virtio > 2. (guest) ping 10.0.2.2 > 3. (guest) echo "disk" > /sys/power/state > 4. Re-start guest with same command line > 5. Ping has stopped receiving replies. > 6. Kill that ping process and start a new one. qemu segfaults. Can't reproduce, I'm not getting stable hibernation here even without any network configured. Could you check if the recent pull request [1] changes the picture for you? Thanks, Jan [1] http://thread.gmane.org/gmane.comp.emulators.qemu/118992 -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 3/3] Delayed IP packets
On (Wed) 03 Aug 2011 [13:24:22], Jan Kiszka wrote: > From: Fabien Chouteau > > In the current implementation, if Slirp tries to send an IP packet to a client > with an unknown hardware address, the packet is simply dropped and an ARP > request is sent (if_encap in slirp/slirp.c). > > With this patch, Slirp will send the ARP request, re-queue the packet and try > to send it later. The packet is dropped after one second if the ARP reply is > not received. This patch causes a segfault when guests wake up from hibernate. Recipe: 1. Start guest with -net user -net nic,model=virtio 2. (guest) ping 10.0.2.2 3. (guest) echo "disk" > /sys/power/state 4. Re-start guest with same command line 5. Ping has stopped receiving replies. 6. Kill that ping process and start a new one. qemu segfaults. This needs the not-upstream-yet virtio S4 handling patches, found at http://thread.gmane.org/gmane.linux.kernel/1197141 The backtrace is: (gdb) bt #0 0x77e421f7 in slirp_insque (a=0x0, b=0x78f95d50) at /home/amit/src/qemu/slirp/misc.c:27 #1 0x77e40738 in if_start (slirp=0x78a9cdf0) at /home/amit/src/qemu/slirp/if.c:194 #2 0x77e44828 in slirp_select_poll (readfds=0x7fffd930, writefds=0x7fffd9b0, xfds=0x7fffda30, select_error=0) at /home/amit/src/qemu/slirp/slirp.c:588 #3 0x77e110f1 in main_loop_wait (nonblocking=) at /home/amit/src/qemu/vl.c:1549 #4 0x77d7dc47 in main_loop () at /home/amit/src/qemu/vl.c:1579 #5 main (argc=, argv=, envp=) at /home/amit/src/qemu/vl.c:3574 Reverting the patch keeps the ping going on after resume. Leaving the patch in context: > > Signed-off-by: Fabien Chouteau > Signed-off-by: Jan Kiszka > --- > slirp/if.c| 28 +++--- > slirp/main.h |2 +- > slirp/mbuf.c |2 + > slirp/mbuf.h |2 + > slirp/slirp.c | 72 +++- > 5 files changed, 69 insertions(+), 37 deletions(-) > > diff --git a/slirp/if.c b/slirp/if.c > index 0f04e13..2d79e45 100644 > --- a/slirp/if.c > +++ b/slirp/if.c > @@ -6,6 +6,7 @@ > */ > > #include > +#include "qemu-timer.h" > > #define ifs_init(ifm) ((ifm)->ifs_next = (ifm)->ifs_prev = (ifm)) > > @@ -105,6 +106,9 @@ if_output(struct socket *so, struct mbuf *ifm) > ifs_init(ifm); > insque(ifm, ifq); > > +/* Expiration date = Now + 1 second */ > +ifm->expiration_date = qemu_get_clock_ns(rt_clock) + 10ULL; > + > diddit: > slirp->if_queued++; > > @@ -153,6 +157,9 @@ diddit: > void > if_start(Slirp *slirp) > { > +int requeued = 0; > +uint64_t now; > + > struct mbuf *ifm, *ifqt; > > DEBUG_CALL("if_start"); > @@ -165,6 +172,8 @@ if_start(Slirp *slirp) > if (!slirp_can_output(slirp->opaque)) > return; > > +now = qemu_get_clock_ns(rt_clock); > + > /* >* See which queue to get next packet from >* If there's something in the fastq, select it immediately > @@ -199,11 +208,22 @@ if_start(Slirp *slirp) > ifm->ifq_so->so_nqueued = 0; > } > > - /* Encapsulate the packet for sending */ > -if_encap(slirp, (uint8_t *)ifm->m_data, ifm->m_len); > - > -m_free(ifm); > +if (ifm->expiration_date < now) { > +/* Expired */ > +m_free(ifm); > +} else { > +/* Encapsulate the packet for sending */ > +if (if_encap(slirp, ifm)) { > +m_free(ifm); > +} else { > +/* re-queue */ > +insque(ifm, ifqt); > +requeued++; > +} > +} > > if (slirp->if_queued) > goto again; > + > +slirp->if_queued = requeued; > } > diff --git a/slirp/main.h b/slirp/main.h > index 0dd8d81..028df4b 100644 > --- a/slirp/main.h > +++ b/slirp/main.h > @@ -42,5 +42,5 @@ extern int tcp_keepintvl; > #define PROTO_PPP 0x2 > #endif > > -void if_encap(Slirp *slirp, const uint8_t *ip_data, int ip_data_len); > +int if_encap(Slirp *slirp, struct mbuf *ifm); > ssize_t slirp_send(struct socket *so, const void *buf, size_t len, int > flags); > diff --git a/slirp/mbuf.c b/slirp/mbuf.c > index ce2eb84..c699c75 100644 > --- a/slirp/mbuf.c > +++ b/slirp/mbuf.c > @@ -70,6 +70,8 @@ m_get(Slirp *slirp) > m->m_len = 0; > m->m_nextpkt = NULL; > m->m_prevpkt = NULL; > +m->arp_requested = false; > +m->expiration_date = (uint64_t)-1; > end_error: > DEBUG_ARG("m = %lx", (long )m); > return m; > diff --git a/slirp/mbuf.h b/slirp/mbuf.h > index b74544b..55170e5 100644 > --- a/slirp/mbuf.h > +++ b/slirp/mbuf.h > @@ -86,6 +86,8 @@ struct mbuf { > charm_dat_[1]; /* ANSI don't like 0 sized arrays */ > char*m_ext_; > } M_dat; > +bool arp_requested; > +uint64_t expiration_date; > }; > > #define m_next m_hdr.mh_next > diff --gi
[Qemu-devel] [PATCH 3/3] Delayed IP packets
From: Fabien Chouteau In the current implementation, if Slirp tries to send an IP packet to a client with an unknown hardware address, the packet is simply dropped and an ARP request is sent (if_encap in slirp/slirp.c). With this patch, Slirp will send the ARP request, re-queue the packet and try to send it later. The packet is dropped after one second if the ARP reply is not received. Signed-off-by: Fabien Chouteau Signed-off-by: Jan Kiszka --- slirp/if.c| 28 +++--- slirp/main.h |2 +- slirp/mbuf.c |2 + slirp/mbuf.h |2 + slirp/slirp.c | 72 +++- 5 files changed, 69 insertions(+), 37 deletions(-) diff --git a/slirp/if.c b/slirp/if.c index 0f04e13..2d79e45 100644 --- a/slirp/if.c +++ b/slirp/if.c @@ -6,6 +6,7 @@ */ #include +#include "qemu-timer.h" #define ifs_init(ifm) ((ifm)->ifs_next = (ifm)->ifs_prev = (ifm)) @@ -105,6 +106,9 @@ if_output(struct socket *so, struct mbuf *ifm) ifs_init(ifm); insque(ifm, ifq); +/* Expiration date = Now + 1 second */ +ifm->expiration_date = qemu_get_clock_ns(rt_clock) + 10ULL; + diddit: slirp->if_queued++; @@ -153,6 +157,9 @@ diddit: void if_start(Slirp *slirp) { +int requeued = 0; +uint64_t now; + struct mbuf *ifm, *ifqt; DEBUG_CALL("if_start"); @@ -165,6 +172,8 @@ if_start(Slirp *slirp) if (!slirp_can_output(slirp->opaque)) return; +now = qemu_get_clock_ns(rt_clock); + /* * See which queue to get next packet from * If there's something in the fastq, select it immediately @@ -199,11 +208,22 @@ if_start(Slirp *slirp) ifm->ifq_so->so_nqueued = 0; } - /* Encapsulate the packet for sending */ -if_encap(slirp, (uint8_t *)ifm->m_data, ifm->m_len); - -m_free(ifm); +if (ifm->expiration_date < now) { +/* Expired */ +m_free(ifm); +} else { +/* Encapsulate the packet for sending */ +if (if_encap(slirp, ifm)) { +m_free(ifm); +} else { +/* re-queue */ +insque(ifm, ifqt); +requeued++; +} +} if (slirp->if_queued) goto again; + +slirp->if_queued = requeued; } diff --git a/slirp/main.h b/slirp/main.h index 0dd8d81..028df4b 100644 --- a/slirp/main.h +++ b/slirp/main.h @@ -42,5 +42,5 @@ extern int tcp_keepintvl; #define PROTO_PPP 0x2 #endif -void if_encap(Slirp *slirp, const uint8_t *ip_data, int ip_data_len); +int if_encap(Slirp *slirp, struct mbuf *ifm); ssize_t slirp_send(struct socket *so, const void *buf, size_t len, int flags); diff --git a/slirp/mbuf.c b/slirp/mbuf.c index ce2eb84..c699c75 100644 --- a/slirp/mbuf.c +++ b/slirp/mbuf.c @@ -70,6 +70,8 @@ m_get(Slirp *slirp) m->m_len = 0; m->m_nextpkt = NULL; m->m_prevpkt = NULL; +m->arp_requested = false; +m->expiration_date = (uint64_t)-1; end_error: DEBUG_ARG("m = %lx", (long )m); return m; diff --git a/slirp/mbuf.h b/slirp/mbuf.h index b74544b..55170e5 100644 --- a/slirp/mbuf.h +++ b/slirp/mbuf.h @@ -86,6 +86,8 @@ struct mbuf { charm_dat_[1]; /* ANSI don't like 0 sized arrays */ char*m_ext_; } M_dat; +bool arp_requested; +uint64_t expiration_date; }; #define m_next m_hdr.mh_next diff --git a/slirp/slirp.c b/slirp/slirp.c index 4a9a4d5..a86cc6e 100644 --- a/slirp/slirp.c +++ b/slirp/slirp.c @@ -692,55 +692,63 @@ void slirp_input(Slirp *slirp, const uint8_t *pkt, int pkt_len) } } -/* output the IP packet to the ethernet device */ -void if_encap(Slirp *slirp, const uint8_t *ip_data, int ip_data_len) +/* Output the IP packet to the ethernet device. Returns 0 if the packet must be + * re-queued. + */ +int if_encap(Slirp *slirp, struct mbuf *ifm) { uint8_t buf[1600]; struct ethhdr *eh = (struct ethhdr *)buf; uint8_t ethaddr[ETH_ALEN]; -const struct ip *iph = (const struct ip *)ip_data; +const struct ip *iph = (const struct ip *)ifm->m_data; -if (ip_data_len + ETH_HLEN > sizeof(buf)) -return; +if (ifm->m_len + ETH_HLEN > sizeof(buf)) { +return 1; +} if (!arp_table_search(slirp, iph->ip_dst.s_addr, ethaddr)) { uint8_t arp_req[ETH_HLEN + sizeof(struct arphdr)]; struct ethhdr *reh = (struct ethhdr *)arp_req; struct arphdr *rah = (struct arphdr *)(arp_req + ETH_HLEN); -/* If the client addr is not known, there is no point in - sending the packet to it. Normally the sender should have - done an ARP request to get its MAC address. Here we do it - in place of sending the packet and we hope that the sender - will retry sending its packet. */ -memset(reh->h_dest, 0xff, ETH_ALEN);