Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-09-18 Thread Andreas Färber
Am 31.08.2012 07:30, schrieb Stefan Hajnoczi:
 On Fri, Aug 31, 2012 at 4:40 AM, Bo Yang boy...@suse.com wrote:


 Stefan Hajnoczi stefa...@gmail.com 08/30/12 7:42 PM 
 I'm happy with this patch.  I have merged it into the net tree:
 
 https://github.com/stefanha/qemu/commit/363ed8f3012462d409de8fcbfe746b7336a2c437

Thanks, applied to stable-0.15.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-31 Thread Bo Yang


 Stefan Hajnoczi stefa...@gmail.com 08/30/12 7:42 PM 
On Thu, Aug 30, 2012 at 9:38 AM, Bo Yang boy...@suse.com wrote:
 On 08/30/2012 04:04 PM, Stefan Hajnoczi wrote:
 On Wed, Aug 29, 2012 at 09:17:43PM +0200, Stefan Weil wrote:
 Am 29.08.2012 13:26, schrieb Bo Yang:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from 
 install
 server to local harddisk, the network becomes stall because of running out 
 of
 receive descriptor.

 Signed-off-by: Bo Yangboy...@suse.com
 ---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 50d117e..52a18ad 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State * s, 
 uint8_t val)
  }
  set_ru_state(s, ru_ready);
  s-ru_offset = e100_read_reg4(s, SCBPointer);
 +   qemu_flush_queued_packets(s-nic-nc);
 +   qemu_notify_event();

 What would happen if the above changes were omitted?

 In the worst case the guest code would be unable to make progress since
 packet reception is disabled.

 The QEMU net subsystem needs to be kicked when rx buffers become
 available again so that any queued packets can be delivered and we can
 restart the event loop.

 The event loop needs to be restarted because net clients (like tap) use
 qemu_set_fd_handler2() with a read_poll() handler that returns false
 when the NIC is unable to receive.  Imagine this scenario:

 1. NIC runs out of rx buffers.
 2. Event loop iteration starts and calls tap's read_poll() handler,
which sees the NIC cannot receive and therefore does not add the tap
file descriptor to select(2).
 3. NIC gets new rx buffers but does not kick net subsystem/event loop.
 4. Event loop still sitting in select(2) without the tap file
descriptor.  Therefore incoming packets are not picked up by QEMU!

 In practice the event loop tends to iterate due to timers, etc.  But in
 the worst case we can go completely starved here.

 Yes. The fd will be added to read set in the next iteration. The delay
 depends on the select timeout. it is possible to go starved here.


 Would the network show less performance? How much
 would the test scenario (Linux installation) take longer?

 Yes, the lack of kicks causes reduced network performance.  This is
 especially true with -netdev tap and a guest driver that runs out of rx
 buffers.  If you're lucky you might not hit this depending on the
 -netdev and availability of rx buffers.

 What about the other nic emulations in QEMU?
 I observe hanging network rather often with the
 ARM versatilepb emulation.

 virtio-net has been correct for some time.

 e1000, xen, usb, and eepro100 are now fixed in the net tree:
 http://github.com/stefanha/qemu/commits/net

 Other NICs may or may not be okay.  Really all of them need to be
 audited.

  TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
  break;
  case RX_RESUME:
 @@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc, const 
 uint8_t * buf, size_t size)
  if (rfd_command  COMMAND_EL) {
  /* EL bit is set, so this was the last frame. */
  logout(receive: Running out of frames\n);
 -set_ru_state(s, ru_suspended);
 +set_ru_state(s, ru_no_resources);
 +   eepro100_rnr_interrupt(s);

 Adding the interrupt here is correct (I have similar code in
 http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
 which is an improved version of hw/eepro100.c).

 Setting ru_no_resources looks also good, but I am not
 sure whether removing ru_suspended is ok. Maybe it should
 be ru_no_resources | ru_suspended.

 I think the datasheet talks about setting the RU to no resources and the
 CU to suspended.  So there are two state machines and we only track one
 here.

 I don't think I understand this. If we run out of rx descriptor, why do
 we suspend tx unit too? maybe there are reasons I am unaware of.. I
 don't know.

I was wrong.  The datasheet Table 55. CU Activities Performed at the
End of Execution shows that the EL and S bit cause the CU to enter
the Idle State.

In terms of hw/eepro100.c I don't think we care about the CU state.
RU state No Resources is correct.

I think we've done here, do we? sorry for the format of the mail, it is 
sent from web.. Thanks for reviewing and suggestions.


Stefan






Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-30 Thread Stefan Hajnoczi
On Thu, Aug 30, 2012 at 07:47:38AM +0800, Bo Yang wrote:
 On 08/29/2012 11:19 PM, Stefan Hajnoczi wrote:
  On Wed, Aug 29, 2012 at 07:26:11PM +0800, Bo Yang wrote:
  This is reported by QA. When installing os with pxe, after the initial
  kernel and initrd are loaded, the procedure tries to copy files from 
  install
  server to local harddisk, the network becomes stall because of running out 
  of
  receive descriptor.
 
  Signed-off-by: Bo Yang boy...@suse.com
  ---
   hw/eepro100.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)
  
  Thanks, applied to the net patches tree:
  https://github.com/stefanha/qemu/commits/net
  
  I have reproduced the bug and tested that your patch fixes it.  Your
  patch also has tabs instead of spaces, I fixed that up when merging.
 
 Thanks for verifying and merging the patch. Sorry for my mistake in
 coding style. I'll be careful next time. I use vim, it looks like set
 tabstop=4 is not enough to get it work as expected.

set tabstop=4  display tabs as 4 spaces
set expandtab  use spaces instead of tabs
set shiftwidth=4   indent by 4 spaces



Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-30 Thread Stefan Hajnoczi
On Wed, Aug 29, 2012 at 09:17:43PM +0200, Stefan Weil wrote:
 Am 29.08.2012 13:26, schrieb Bo Yang:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from install
 server to local harddisk, the network becomes stall because of running out of
 receive descriptor.
 
 Signed-off-by: Bo Yangboy...@suse.com
 ---
   hw/eepro100.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)
 
 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 50d117e..52a18ad 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State * s, 
 uint8_t val)
   }
   set_ru_state(s, ru_ready);
   s-ru_offset = e100_read_reg4(s, SCBPointer);
 +qemu_flush_queued_packets(s-nic-nc);
 +qemu_notify_event();
 
 What would happen if the above changes were omitted?

In the worst case the guest code would be unable to make progress since
packet reception is disabled.

The QEMU net subsystem needs to be kicked when rx buffers become
available again so that any queued packets can be delivered and we can
restart the event loop.

The event loop needs to be restarted because net clients (like tap) use
qemu_set_fd_handler2() with a read_poll() handler that returns false
when the NIC is unable to receive.  Imagine this scenario:

1. NIC runs out of rx buffers.
2. Event loop iteration starts and calls tap's read_poll() handler,
   which sees the NIC cannot receive and therefore does not add the tap
   file descriptor to select(2).
3. NIC gets new rx buffers but does not kick net subsystem/event loop.
4. Event loop still sitting in select(2) without the tap file
   descriptor.  Therefore incoming packets are not picked up by QEMU!

In practice the event loop tends to iterate due to timers, etc.  But in
the worst case we can go completely starved here.

 Would the network show less performance? How much
 would the test scenario (Linux installation) take longer?

Yes, the lack of kicks causes reduced network performance.  This is
especially true with -netdev tap and a guest driver that runs out of rx
buffers.  If you're lucky you might not hit this depending on the
-netdev and availability of rx buffers.

 What about the other nic emulations in QEMU?
 I observe hanging network rather often with the
 ARM versatilepb emulation.

virtio-net has been correct for some time.

e1000, xen, usb, and eepro100 are now fixed in the net tree:
http://github.com/stefanha/qemu/commits/net

Other NICs may or may not be okay.  Really all of them need to be
audited.

   TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
   break;
   case RX_RESUME:
 @@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc, const 
 uint8_t * buf, size_t size)
   if (rfd_command  COMMAND_EL) {
   /* EL bit is set, so this was the last frame. */
   logout(receive: Running out of frames\n);
 -set_ru_state(s, ru_suspended);
 +set_ru_state(s, ru_no_resources);
 +eepro100_rnr_interrupt(s);
 
 Adding the interrupt here is correct (I have similar code in
 http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
 which is an improved version of hw/eepro100.c).
 
 Setting ru_no_resources looks also good, but I am not
 sure whether removing ru_suspended is ok. Maybe it should
 be ru_no_resources | ru_suspended.

I think the datasheet talks about setting the RU to no resources and the
CU to suspended.  So there are two state machines and we only track one
here.

Stefan



Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-30 Thread Bo Yang
On 08/30/2012 04:04 PM, Stefan Hajnoczi wrote:
 On Wed, Aug 29, 2012 at 09:17:43PM +0200, Stefan Weil wrote:
 Am 29.08.2012 13:26, schrieb Bo Yang:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from install
 server to local harddisk, the network becomes stall because of running out 
 of
 receive descriptor.

 Signed-off-by: Bo Yangboy...@suse.com
 ---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 50d117e..52a18ad 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State * s, 
 uint8_t val)
  }
  set_ru_state(s, ru_ready);
  s-ru_offset = e100_read_reg4(s, SCBPointer);
 +   qemu_flush_queued_packets(s-nic-nc);
 +   qemu_notify_event();

 What would happen if the above changes were omitted?
 
 In the worst case the guest code would be unable to make progress since
 packet reception is disabled.
 
 The QEMU net subsystem needs to be kicked when rx buffers become
 available again so that any queued packets can be delivered and we can
 restart the event loop.
 
 The event loop needs to be restarted because net clients (like tap) use
 qemu_set_fd_handler2() with a read_poll() handler that returns false
 when the NIC is unable to receive.  Imagine this scenario:
 
 1. NIC runs out of rx buffers.
 2. Event loop iteration starts and calls tap's read_poll() handler,
which sees the NIC cannot receive and therefore does not add the tap
file descriptor to select(2).
 3. NIC gets new rx buffers but does not kick net subsystem/event loop.
 4. Event loop still sitting in select(2) without the tap file
descriptor.  Therefore incoming packets are not picked up by QEMU!
 
 In practice the event loop tends to iterate due to timers, etc.  But in
 the worst case we can go completely starved here.

Yes. The fd will be added to read set in the next iteration. The delay
depends on the select timeout. it is possible to go starved here.

 
 Would the network show less performance? How much
 would the test scenario (Linux installation) take longer?
 
 Yes, the lack of kicks causes reduced network performance.  This is
 especially true with -netdev tap and a guest driver that runs out of rx
 buffers.  If you're lucky you might not hit this depending on the
 -netdev and availability of rx buffers.
 
 What about the other nic emulations in QEMU?
 I observe hanging network rather often with the
 ARM versatilepb emulation.
 
 virtio-net has been correct for some time.
 
 e1000, xen, usb, and eepro100 are now fixed in the net tree:
 http://github.com/stefanha/qemu/commits/net
 
 Other NICs may or may not be okay.  Really all of them need to be
 audited.
 
  TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
  break;
  case RX_RESUME:
 @@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc, const 
 uint8_t * buf, size_t size)
  if (rfd_command  COMMAND_EL) {
  /* EL bit is set, so this was the last frame. */
  logout(receive: Running out of frames\n);
 -set_ru_state(s, ru_suspended);
 +set_ru_state(s, ru_no_resources);
 +   eepro100_rnr_interrupt(s);

 Adding the interrupt here is correct (I have similar code in
 http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
 which is an improved version of hw/eepro100.c).

 Setting ru_no_resources looks also good, but I am not
 sure whether removing ru_suspended is ok. Maybe it should
 be ru_no_resources | ru_suspended.
 
 I think the datasheet talks about setting the RU to no resources and the
 CU to suspended.  So there are two state machines and we only track one
 here.

I don't think I understand this. If we run out of rx descriptor, why do
we suspend tx unit too? maybe there are reasons I am unaware of.. I
don't know.

 
 Stefan
 




Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-30 Thread Stefan Hajnoczi
On Thu, Aug 30, 2012 at 9:38 AM, Bo Yang boy...@suse.com wrote:
 On 08/30/2012 04:04 PM, Stefan Hajnoczi wrote:
 On Wed, Aug 29, 2012 at 09:17:43PM +0200, Stefan Weil wrote:
 Am 29.08.2012 13:26, schrieb Bo Yang:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from 
 install
 server to local harddisk, the network becomes stall because of running out 
 of
 receive descriptor.

 Signed-off-by: Bo Yangboy...@suse.com
 ---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 50d117e..52a18ad 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State * s, 
 uint8_t val)
  }
  set_ru_state(s, ru_ready);
  s-ru_offset = e100_read_reg4(s, SCBPointer);
 +   qemu_flush_queued_packets(s-nic-nc);
 +   qemu_notify_event();

 What would happen if the above changes were omitted?

 In the worst case the guest code would be unable to make progress since
 packet reception is disabled.

 The QEMU net subsystem needs to be kicked when rx buffers become
 available again so that any queued packets can be delivered and we can
 restart the event loop.

 The event loop needs to be restarted because net clients (like tap) use
 qemu_set_fd_handler2() with a read_poll() handler that returns false
 when the NIC is unable to receive.  Imagine this scenario:

 1. NIC runs out of rx buffers.
 2. Event loop iteration starts and calls tap's read_poll() handler,
which sees the NIC cannot receive and therefore does not add the tap
file descriptor to select(2).
 3. NIC gets new rx buffers but does not kick net subsystem/event loop.
 4. Event loop still sitting in select(2) without the tap file
descriptor.  Therefore incoming packets are not picked up by QEMU!

 In practice the event loop tends to iterate due to timers, etc.  But in
 the worst case we can go completely starved here.

 Yes. The fd will be added to read set in the next iteration. The delay
 depends on the select timeout. it is possible to go starved here.


 Would the network show less performance? How much
 would the test scenario (Linux installation) take longer?

 Yes, the lack of kicks causes reduced network performance.  This is
 especially true with -netdev tap and a guest driver that runs out of rx
 buffers.  If you're lucky you might not hit this depending on the
 -netdev and availability of rx buffers.

 What about the other nic emulations in QEMU?
 I observe hanging network rather often with the
 ARM versatilepb emulation.

 virtio-net has been correct for some time.

 e1000, xen, usb, and eepro100 are now fixed in the net tree:
 http://github.com/stefanha/qemu/commits/net

 Other NICs may or may not be okay.  Really all of them need to be
 audited.

  TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
  break;
  case RX_RESUME:
 @@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc, const 
 uint8_t * buf, size_t size)
  if (rfd_command  COMMAND_EL) {
  /* EL bit is set, so this was the last frame. */
  logout(receive: Running out of frames\n);
 -set_ru_state(s, ru_suspended);
 +set_ru_state(s, ru_no_resources);
 +   eepro100_rnr_interrupt(s);

 Adding the interrupt here is correct (I have similar code in
 http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
 which is an improved version of hw/eepro100.c).

 Setting ru_no_resources looks also good, but I am not
 sure whether removing ru_suspended is ok. Maybe it should
 be ru_no_resources | ru_suspended.

 I think the datasheet talks about setting the RU to no resources and the
 CU to suspended.  So there are two state machines and we only track one
 here.

 I don't think I understand this. If we run out of rx descriptor, why do
 we suspend tx unit too? maybe there are reasons I am unaware of.. I
 don't know.

I was wrong.  The datasheet Table 55. CU Activities Performed at the
End of Execution shows that the EL and S bit cause the CU to enter
the Idle State.

In terms of hw/eepro100.c I don't think we care about the CU state.
RU state No Resources is correct.

Stefan



Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-30 Thread Stefan Hajnoczi
On Fri, Aug 31, 2012 at 4:40 AM, Bo Yang boy...@suse.com wrote:


 Stefan Hajnoczi stefa...@gmail.com 08/30/12 7:42 PM 
 On Thu, Aug 30, 2012 at 9:38 AM, Bo Yang boy...@suse.com wrote:
 On 08/30/2012 04:04 PM, Stefan Hajnoczi wrote:
 On Wed, Aug 29, 2012 at 09:17:43PM +0200, Stefan Weil wrote:
 Am 29.08.2012 13:26, schrieb Bo Yang:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from 
 install
 server to local harddisk, the network becomes stall because of running 
 out of
 receive descriptor.

 Signed-off-by: Bo Yangboy...@suse.com
 ---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 50d117e..52a18ad 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State * s, 
 uint8_t val)
  }
  set_ru_state(s, ru_ready);
  s-ru_offset = e100_read_reg4(s, SCBPointer);
 +   qemu_flush_queued_packets(s-nic-nc);
 +   qemu_notify_event();

 What would happen if the above changes were omitted?

 In the worst case the guest code would be unable to make progress since
 packet reception is disabled.

 The QEMU net subsystem needs to be kicked when rx buffers become
 available again so that any queued packets can be delivered and we can
 restart the event loop.

 The event loop needs to be restarted because net clients (like tap) use
 qemu_set_fd_handler2() with a read_poll() handler that returns false
 when the NIC is unable to receive.  Imagine this scenario:

 1. NIC runs out of rx buffers.
 2. Event loop iteration starts and calls tap's read_poll() handler,
which sees the NIC cannot receive and therefore does not add the tap
file descriptor to select(2).
 3. NIC gets new rx buffers but does not kick net subsystem/event loop.
 4. Event loop still sitting in select(2) without the tap file
descriptor.  Therefore incoming packets are not picked up by QEMU!

 In practice the event loop tends to iterate due to timers, etc.  But in
 the worst case we can go completely starved here.

 Yes. The fd will be added to read set in the next iteration. The delay
 depends on the select timeout. it is possible to go starved here.


 Would the network show less performance? How much
 would the test scenario (Linux installation) take longer?

 Yes, the lack of kicks causes reduced network performance.  This is
 especially true with -netdev tap and a guest driver that runs out of rx
 buffers.  If you're lucky you might not hit this depending on the
 -netdev and availability of rx buffers.

 What about the other nic emulations in QEMU?
 I observe hanging network rather often with the
 ARM versatilepb emulation.

 virtio-net has been correct for some time.

 e1000, xen, usb, and eepro100 are now fixed in the net tree:
 http://github.com/stefanha/qemu/commits/net

 Other NICs may or may not be okay.  Really all of them need to be
 audited.

  TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
  break;
  case RX_RESUME:
 @@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc, 
 const uint8_t * buf, size_t size)
  if (rfd_command  COMMAND_EL) {
  /* EL bit is set, so this was the last frame. */
  logout(receive: Running out of frames\n);
 -set_ru_state(s, ru_suspended);
 +set_ru_state(s, ru_no_resources);
 +   eepro100_rnr_interrupt(s);

 Adding the interrupt here is correct (I have similar code in
 http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
 which is an improved version of hw/eepro100.c).

 Setting ru_no_resources looks also good, but I am not
 sure whether removing ru_suspended is ok. Maybe it should
 be ru_no_resources | ru_suspended.

 I think the datasheet talks about setting the RU to no resources and the
 CU to suspended.  So there are two state machines and we only track one
 here.

 I don't think I understand this. If we run out of rx descriptor, why do
 we suspend tx unit too? maybe there are reasons I am unaware of.. I
 don't know.

 I was wrong.  The datasheet Table 55. CU Activities Performed at the
 End of Execution shows that the EL and S bit cause the CU to enter
 the Idle State.

 In terms of hw/eepro100.c I don't think we care about the CU state.
 RU state No Resources is correct.

 I think we've done here, do we? sorry for the format of the mail, it is
 sent from web.. Thanks for reviewing and suggestions.

I'm happy with this patch.  I have merged it into the net tree:

https://github.com/stefanha/qemu/commit/363ed8f3012462d409de8fcbfe746b7336a2c437

Stefan



Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-29 Thread Stefan Hajnoczi
On Wed, Aug 29, 2012 at 07:26:11PM +0800, Bo Yang wrote:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from install
 server to local harddisk, the network becomes stall because of running out of
 receive descriptor.
 
 Signed-off-by: Bo Yang boy...@suse.com
 ---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

Thanks, applied to the net patches tree:
https://github.com/stefanha/qemu/commits/net

I have reproduced the bug and tested that your patch fixes it.  Your
patch also has tabs instead of spaces, I fixed that up when merging.

Stefan



Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-29 Thread Stefan Weil

Am 29.08.2012 13:26, schrieb Bo Yang:

This is reported by QA. When installing os with pxe, after the initial
kernel and initrd are loaded, the procedure tries to copy files from install
server to local harddisk, the network becomes stall because of running out of
receive descriptor.

Signed-off-by: Bo Yangboy...@suse.com
---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 50d117e..52a18ad 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State * s, 
uint8_t val)
  }
  set_ru_state(s, ru_ready);
  s-ru_offset = e100_read_reg4(s, SCBPointer);
+   qemu_flush_queued_packets(s-nic-nc);
+   qemu_notify_event();


What would happen if the above changes were omitted?
Would the network show less performance? How much
would the test scenario (Linux installation) take longer?

What about the other nic emulations in QEMU?
I observe hanging network rather often with the
ARM versatilepb emulation.


  TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
  break;
  case RX_RESUME:
@@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc, const 
uint8_t * buf, size_t size)
  if (rfd_command  COMMAND_EL) {
  /* EL bit is set, so this was the last frame. */
  logout(receive: Running out of frames\n);
-set_ru_state(s, ru_suspended);
+set_ru_state(s, ru_no_resources);
+   eepro100_rnr_interrupt(s);


Adding the interrupt here is correct (I have similar code in
http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
which is an improved version of hw/eepro100.c).

Setting ru_no_resources looks also good, but I am not
sure whether removing ru_suspended is ok. Maybe it should
be ru_no_resources | ru_suspended.


  }
  if (rfd_command  COMMAND_S) {
  /* S bit is set. */


It's interesting to hear that your QA tests with eepro100.c.
Which nic model do they use in their tests?

Regards,

Stefan W.




Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-29 Thread Bo Yang
On 08/29/2012 11:19 PM, Stefan Hajnoczi wrote:
 On Wed, Aug 29, 2012 at 07:26:11PM +0800, Bo Yang wrote:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from install
 server to local harddisk, the network becomes stall because of running out of
 receive descriptor.

 Signed-off-by: Bo Yang boy...@suse.com
 ---
  hw/eepro100.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)
 
 Thanks, applied to the net patches tree:
 https://github.com/stefanha/qemu/commits/net
 
 I have reproduced the bug and tested that your patch fixes it.  Your
 patch also has tabs instead of spaces, I fixed that up when merging.

Thanks for verifying and merging the patch. Sorry for my mistake in
coding style. I'll be careful next time. I use vim, it looks like set
tabstop=4 is not enough to get it work as expected.
 
 Stefan
 




Re: [Qemu-devel] [PATCH v2] Fix buffer run out in eepro100.

2012-08-29 Thread Bo Yang
On 08/30/2012 03:17 AM, Stefan Weil wrote:
 Am 29.08.2012 13:26, schrieb Bo Yang:
 This is reported by QA. When installing os with pxe, after the initial
 kernel and initrd are loaded, the procedure tries to copy files from
 install
 server to local harddisk, the network becomes stall because of running
 out of
 receive descriptor.

 Signed-off-by: Bo Yangboy...@suse.com
 ---
   hw/eepro100.c |5 -
   1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/hw/eepro100.c b/hw/eepro100.c
 index 50d117e..52a18ad 100644
 --- a/hw/eepro100.c
 +++ b/hw/eepro100.c
 @@ -1036,6 +1036,8 @@ static void eepro100_ru_command(EEPRO100State *
 s, uint8_t val)
   }
   set_ru_state(s, ru_ready);
   s-ru_offset = e100_read_reg4(s, SCBPointer);
 +qemu_flush_queued_packets(s-nic-nc);
 +qemu_notify_event();
 
 What would happen if the above changes were omitted?
 Would the network show less performance? How much
 would the test scenario (Linux installation) take longer?

It is probably to show less performance. But I don't know how much it
will be.

 
 What about the other nic emulations in QEMU?
 I observe hanging network rather often with the
 ARM versatilepb emulation.
 
   TRACE(OTHER, logout(val=0x%02x (rx start)\n, val));
   break;
   case RX_RESUME:
 @@ -1770,7 +1772,8 @@ static ssize_t nic_receive(NetClientState *nc,
 const uint8_t * buf, size_t size)
   if (rfd_command  COMMAND_EL) {
   /* EL bit is set, so this was the last frame. */
   logout(receive: Running out of frames\n);
 -set_ru_state(s, ru_suspended);
 +set_ru_state(s, ru_no_resources);
 +eepro100_rnr_interrupt(s);
 
 Adding the interrupt here is correct (I have similar code in
 http://repo.or.cz/w/qemu/ar7.git/blob/HEAD:/hw/eepro100.c
 which is an improved version of hw/eepro100.c).
 
 Setting ru_no_resources looks also good, but I am not
 sure whether removing ru_suspended is ok. Maybe it should
 be ru_no_resources | ru_suspended.

I am not sure..

 
   }
   if (rfd_command  COMMAND_S) {
   /* S bit is set. */
 
 It's interesting to hear that your QA tests with eepro100.c.
 Which nic model do they use in their tests?

It is i82559er.

 
 Regards,
 
 Stefan W.