Re: UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-18 Thread Bruce Evans

On Fri, 18 Jul 2014, Adrian Chadd wrote:


On 18 July 2014 13:40, Bruce Evans  wrote:

On Fri, 18 Jul 2014, hiren panchasara wrote:


On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd  wrote:


Hi!

So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
udp_output() -> ip_output()

udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
can also return ENOBUFS.

it doesn't look like the socket code (eg sosend_dgram()) is doing any
buffering - it's just copying the frame and stuffing it up to the
driver. No queuing involved before the NIC.


Right. Thanks for confirming.


Most buffering should be in ifq above the NIC.  For UDP, I think
udp_output() puts buffers on the ifq and calls the driver for every
one, but the driver shouldn't do anything for most calls.  The
driver can't possibly do anything if its ring buffer is full, and
shouldn't do anything if it is nearly full.  Buffers accumulate in
the ifq until the driver gets around to them or the queue fills up.
Most ENOBUFS errors are for when it fills up.  It can very easily
fill up, especially since it is too small in most configurations.
Just loop calling sendto().  This will fill the ifq almost
instantly unless the hardware is faster than the software.


For if_transmit() drivers, there's no ifp queue. The queuing is being
done in the driver.

For drivers with if_transmit(), they may end up doing direct DMA ring
dispatch or they may have a buf_ring in front of it. There's no ifq
anymore. It upsets the ALTQ people too.


Ah, a new source of bugs.  Most drivers don't use this yet.  Most still
use ifq with the bogus size of (tx_ring_size - 1):

Ones converted to the indirect API:
% dev/bge/if_bge.c: if_setsendqlen(ifp, BGE_TX_RING_CNT - 1);
% dev/bxe/bxe.c:if_setsendqlen(ifp, sc->tx_ring_size);

bxe is one of the few without the silly subtraction of 1.

% dev/e1000/if_em.c:if_setsendqlen(ifp, adapter->num_tx_desc - 1);
% dev/e1000/if_lem.c:   if_setsendqlen(ifp, adapter->num_tx_desc - 1);
% dev/fxp/if_fxp.c: if_setsendqlen(ifp, FXP_NTXCB - 1);
% dev/nfe/if_nfe.c: if_setsendqlen(ifp, NFE_TX_RING_COUNT - 1);

Ones not converted:
% dev/ae/if_ae.c:   ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ae/if_ae.c:   IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);

The double setting is related to ALTQ.  I grepped for maxlen to find both.
I might have missed alternative spellings.

ifqmaxlen is usually 50, so all drivers using it have very little buffering.
Even if their tx ring is tiny, this 50 is too small above 1 or 10 Mbps.

% dev/age/if_age.c: ifp->if_snd.ifq_drv_maxlen = AGE_TX_RING_CNT - 1;
% dev/age/if_age.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/alc/if_alc.c: ifp->if_snd.ifq_drv_maxlen = ALC_TX_RING_CNT - 1;
% dev/alc/if_alc.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/ale/if_ale.c: ifp->if_snd.ifq_drv_maxlen = ALE_TX_RING_CNT - 1;
% dev/ale/if_ale.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/an/if_an.c:   IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/an/if_an.c:   ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/asmc/asmc.c:  uint8_t maxlen;
% dev/asmc/asmc.c:  maxlen = type[0];

Grepping for maxlen unfortunately found related things.  I deleted most
after this.

% dev/ath/if_ath.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ath/if_ath.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/bce/if_bce.c: ifp->if_snd.ifq_drv_maxlen = USABLE_TX_BD_ALLOC;
% dev/bce/if_bce.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/bfe/if_bfe.c: ifp->if_snd.ifq_drv_maxlen = BFE_TX_QLEN;
% dev/bm/if_bm.c:   ifp->if_snd.ifq_drv_maxlen = BM_MAX_TX_PACKETS;
% dev/bwi/if_bwi.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/bwi/if_bwi.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/bwn/if_bwn.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/bwn/if_bwn.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/cadence/if_cgem.c:ifp->if_snd.ifq_drv_maxlen = IFQ_MAXLEN;
% dev/cas/if_cas.c: ifp->if_snd.ifq_drv_maxlen = CAS_TXQUEUELEN;
% dev/ce/if_ce.c:   d->queue.ifq_maxlen  = ifqmaxlen;
% dev/ce/if_ce.c:   d->hi_queue.ifq_maxlen   = ifqmaxlen;
% dev/ce/if_ce.c:   d->rqueue.ifq_maxlen = ifqmaxlen;
% dev/ce/if_ce.c:   d->rqueue.ifq_maxlen = ifqmaxlen;

Seems silly to have many tiny queues, especially when their length is
only nominal and can be changed by tunables, if not sysctls, so that it is
not actually tiny.  But small queues are good for latency.

% dev/cm/smc90cx6.c:ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c:   d->queue.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c:   d-&

Re: UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-18 Thread Jim Thompson

> On Jul 18, 2014, at 23:34, Adrian Chadd  wrote:
> 
> It upsets the ALTQ people too.

I'm an ALTQ person (pfSense, so maybe one of the biggest) and I'm not upset.

That cr*p needs to die in a fire. 


Re: UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-18 Thread Adrian Chadd
Hi,


On 18 July 2014 13:40, Bruce Evans  wrote:
> On Fri, 18 Jul 2014, hiren panchasara wrote:
>
>> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd  wrote:
>>>
>>> Hi!
>>>
>>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
>>> udp_output() -> ip_output()
>>>
>>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
>>> can also return ENOBUFS.
>>>
>>> it doesn't look like the socket code (eg sosend_dgram()) is doing any
>>> buffering - it's just copying the frame and stuffing it up to the
>>> driver. No queuing involved before the NIC.
>>
>>
>> Right. Thanks for confirming.
>
>
> Most buffering should be in ifq above the NIC.  For UDP, I think
> udp_output() puts buffers on the ifq and calls the driver for every
> one, but the driver shouldn't do anything for most calls.  The
> driver can't possibly do anything if its ring buffer is full, and
> shouldn't do anything if it is nearly full.  Buffers accumulate in
> the ifq until the driver gets around to them or the queue fills up.
> Most ENOBUFS errors are for when it fills up.  It can very easily
> fill up, especially since it is too small in most configurations.
> Just loop calling sendto().  This will fill the ifq almost
> instantly unless the hardware is faster than the software.

For if_transmit() drivers, there's no ifp queue. The queuing is being
done in the driver.

For drivers with if_transmit(), they may end up doing direct DMA ring
dispatch or they may have a buf_ring in front of it. There's no ifq
anymore. It upsets the ALTQ people too.
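
For illustration, the if_transmit()/buf_ring shape looks roughly like the
sketch below. The "foo" names (foo_softc, foo_br, foo_tx_mtx,
foo_start_locked()) are invented and real drivers differ in detail; the
point is just that ENOBUFS now comes back from drbr_enqueue() when the
per-driver ring is full rather than from an ifq above the driver.

/* Sketch only; invented names, locking and completion handling omitted. */
static int
foo_transmit(struct ifnet *ifp, struct mbuf *m)
{
        struct foo_softc *sc = ifp->if_softc;
        int error;

        /* Enqueue on the driver-owned buf_ring; fails when the ring is full. */
        error = drbr_enqueue(ifp, sc->foo_br, m);
        if (error != 0)
                return (error);         /* typically ENOBUFS */

        /* Opportunistically drain the buf_ring onto the hardware TX ring. */
        if (mtx_trylock(&sc->foo_tx_mtx)) {
                foo_start_locked(sc);   /* drbr_dequeue() -> DMA descriptors */
                mtx_unlock(&sc->foo_tx_mtx);
        }
        return (0);
}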



-a


Re: UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-18 Thread Bruce Evans

On Fri, 18 Jul 2014, hiren panchasara wrote:


On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd  wrote:

Hi!

So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
udp_output() -> ip_output()

udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
can also return ENOBUFS.

it doesn't look like the socket code (eg sosend_dgram()) is doing any
buffering - it's just copying the frame and stuffing it up to the
driver. No queuing involved before the NIC.


Right. Thanks for confirming.


Most buffering should be in ifq above the NIC.  For UDP, I think
udp_output() puts buffers on the ifq and calls the driver for every
one, but the driver shouldn't do anything for most calls.  The
driver can't possibly do anything if its ring buffer is full, and
shouldn't do anything if it is nearly full.  Buffers accumulate in
the ifq until the driver gets around to them or the queue fills up.
Most ENOBUFS errors are for when it fills up.  It can very easily
fill up, especially since it is too small in most configurations.
Just loop calling sendto().  This will fill the ifq almost
instantly unless the hardware is faster than the software.
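
As a concrete illustration, a minimal userland version of that experiment
could look like the sketch below (the 192.0.2.1 destination and the 64-byte
payload are arbitrary placeholders, error handling trimmed):

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        struct sockaddr_in dst;
        char payload[64];
        long sent = 0, nobufs = 0;
        int i, s;

        s = socket(AF_INET, SOCK_DGRAM, 0);
        memset(payload, 'x', sizeof(payload));
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9);                        /* discard port */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr); /* placeholder */

        for (i = 0; i < 1000000; i++) {
                if (sendto(s, payload, sizeof(payload), 0,
                    (struct sockaddr *)&dst, sizeof(dst)) == -1) {
                        if (errno == ENOBUFS)
                                nobufs++;
                } else
                        sent++;
        }
        printf("%ld sent, %ld ENOBUFS\n", sent, nobufs);
        return (0);
}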


So a _well behaved_ driver will return ENOBUFS _and_ not queue the
frame. However, it's entirely plausible that the driver isn't well
behaved - the intel drivers screwed up here and there with transmit
queue and failure to queue vs failure to transmit.


No, the driver doesn't have much control over the ifq.


So yeah, try tweaking the tx ring descriptor for the driver you're
using and see how big a bufring it's allocating.


Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet,
i.e. bce(4).

I bumped up tx_pages from 2 (default) to 8 where each page is 255
buffer descriptors.

I am seeing quite nice improvement on stable/10 where I can send
*more* stuff :-)


255 is not many.  I am most familiar with bge where there is a single
tx ring with 511 or 512 buffer descriptors (some bge's have more, but
this is unportable and was not supported last time I looked.  The
extras might be only for input).  One of my bge's can do 640 kpps with
tiny packets (only 80 kpps with normal packets) and the other only 200
(?) kpps (both should be limited mainly by the PCI bus, but the slow
one is limited by it being a dumbed down 5705 "plus").  At 640 kpps,
it takes 800 microseconds to transmit 512 packets.  (There is 1 packet
per buffer descriptor for small packets.)

Considerable buffering in ifq is needed to prevent the transmitter
running dry whenever the application stops generating packets for more
than 800 microseconds for some reason, but the default buffering is
stupidly small.  The default is given by net.inet.ifqmaxlen and some
corresponding macros, and is still just 50.  50 was enough for 1 Mbps
ethernet and perhaps even for 10 Mbps, but is now too small.  Most
drivers don't use it, but use their own too-small value.  bge uses
just its own ring buffer size of 511.  I use 1 or 4 depending
on hz:

% diff -u2 if_bge.c~ if_bge.c
% --- if_bge.c~ 2012-03-13 02:13:48.144002000 +
% +++ if_bge.c  2012-03-13 02:13:50.123023000 +
% @@ -3315,5 +3316,6 @@
%   ifp->if_start = bge_start;
%   ifp->if_init = bge_init;
% - ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1;
% + ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT +
% + imax(4 * tick, 1) / 1;
%   IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
%   IFQ_SET_READY(&ifp->if_snd);

4 is what is needed for 4 tick's worth of buffering at hz = 100.
4 is far too large where 50 is far too small, but something like
it is needed when hz is large due to another problem: select() on
the ENOBUFS condition is broken (unsupported), so when sendto()
returns ENOBUFS there is no way for the application to tell how
long it should wait before retrying.  If it wants to burn CPU then
it can spin calling sendto().  Otherwise, it should sleep, but
with a sleep granularity of 1 tick this requires several ticks worth
of buffering to avoid the transmitter running dry.  Large queue lengths
give a large latency for packets at the end of the queue and give no
chance of the working set fitting in an Ln cache for small n.
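
In practice the application's choices reduce to something like the sketch
below; the 2000 microsecond figure is only an example, since with hz = 100
the real sleep granularity is about 10 ms, which at 640 kpps corresponds to
roughly 6400 packets of buffering:

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/* Sketch: retry sendto() on ENOBUFS, since select() cannot wait for it. */
static ssize_t
sendto_retry(int s, const void *buf, size_t len, int flags,
    const struct sockaddr *to, socklen_t tolen)
{
        ssize_t n;

        for (;;) {
                n = sendto(s, buf, len, flags, to, tolen);
                if (n != -1 || errno != ENOBUFS)
                        return (n);
                /* Either spin (burn CPU) or sleep; granularity is ~1 tick. */
                usleep(2000);
        }
}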

The precise stupidly small value of (tx_ring_count - 1) for the ifq
length seems to be for no good reason.  Subtracting 1 is apparently
to increase the chance that all packets in the ifq can be fitted into
the tx ring.  But this is silly since the ifq packet count is in
different units to the buffer descriptor count.  For normal-size
packets, there are 2 descriptors per packet.  So in the usual case
where the ifq is full, only about half of it can be moved to the tx
ring.  And this is good since it gives a little more buffering.
Otherwise, the effective buffering is just what is in the tx ring,
since none is left in the ifq after transferring everything.

Re: UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-18 Thread hiren panchasara
On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd  wrote:
> Hi!
>
> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
> udp_output() -> ip_output()
>
> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
> can also return ENOBUFS.
>
> it doesn't look like the socket code (eg sosend_dgram()) is doing any
> buffering - it's just copying the frame and stuffing it up to the
> driver. No queuing involved before the NIC.

Right. Thanks for confirming.
>
> So a _well behaved_ driver will return ENOBUFS _and_ not queue the
> frame. However, it's entirely plausible that the driver isn't well
> behaved - the intel drivers screwed up here and there with transmit
> queue and failure to queue vs failure to transmit.
>
> So yeah, try tweaking the tx ring descriptor for the driver you're
> using and see how big a bufring it's allocating.

Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet,
i.e. bce(4).

I bumped up tx_pages from 2 (default) to 8 where each page is 255
buffer descriptors.

I am seeing quite nice improvement on stable/10 where I can send
*more* stuff :-)

cheers,
Hiren


Re: UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-16 Thread Adrian Chadd
Hi!

So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
udp_output() -> ip_output()

udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
can also return ENOBUFS.
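
Roughly, the failing step is the usual M_PREPEND() pattern; simplified
sketch, not the verbatim udp_output() code:

        M_PREPEND(m, sizeof(struct udpiphdr), M_NOWAIT);
        if (m == NULL) {                /* no mbuf space for the headers */
                error = ENOBUFS;
                goto release;           /* bubbles back up to sendto() */
        }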

it doesn't look like the socket code (eg sosend_dgram()) is doing any
buffering - it's just copying the frame and stuffing it up to the
driver. No queuing involved before the NIC.

So a _well behaved_ driver will return ENOBUFS _and_ not queue the
frame. However, it's entirely plausible that the driver isn't well
behaved - the intel drivers screwed up here and there with transmit
queue and failure to queue vs failure to transmit.

So yeah, try tweaking the tx ring descriptor for the driver you're
using and see how big a bufring it's allocating.


-a




On 16 July 2014 01:58, hiren panchasara  wrote:
> Return values in sendto() manpage says:
>
>  [ENOBUFS]  The system was unable to allocate an internal buffer.
> The operation may succeed when buffers become avail-
> able.
>
>  [ENOBUFS]  The output queue for a network interface was full.
> This generally indicates that the interface has
> stopped sending, but may be caused by transient con-
> gestion.
>
> If I hit the first condition, it should reflect as failures in
> "netstat -m". Is that a correct assumption?
>
> I want to understand what happens when/if we hit the second condition.
> And how to prevent that from happening.
> Is it just application's job to rate-limit data it sends to the n/w
> interface card so that it doesn't saturate?
> Does kernel do any sort of queuing in the case of ENOBUFS? OR does the
> message just gets dropped?
>
> For an application sending a lot of UDP data and returning ENOBUFS,
> what all udp and other tunables I should tweak? I can only think of:
> - number of tx ring descriptors - increasing this will get us more txds.
> - kern.ipc.maxsockbuf:  Increasing this will increase buffer size
> allocated for sockets.
>
> what else?
>
> Any comments/suggestions/corrections?
>
> cheers,
> Hiren
> ___
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


UDP sendto() returning ENOBUFS - "No buffer space available"

2014-07-16 Thread hiren panchasara
Return values in sendto() manpage says:

 [ENOBUFS]  The system was unable to allocate an internal buffer.
The operation may succeed when buffers become avail-
able.

 [ENOBUFS]  The output queue for a network interface was full.
This generally indicates that the interface has
stopped sending, but may be caused by transient con-
gestion.

If I hit the first condition, it should reflect as failures in
"netstat -m". Is that a correct assumption?

I want to understand what happens when/if we hit the second condition.
And how to prevent that from happening.
Is it just application's job to rate-limit data it sends to the n/w
interface card so that it doesn't saturate?
Does kernel do any sort of queuing in the case of ENOBUFS? OR does the
message just gets dropped?

For an application sending a lot of UDP data and returning ENOBUFS,
what all udp and other tunables I should tweak? I can only think of:
- number of tx ring descriptors - increasing this will get us more txds.
- kern.ipc.maxsockbuf:  Increasing this will increase buffer size
allocated for sockets.

what else?

Any comments/suggestions/corrections?

cheers,
Hiren


ENOBUFS and DNS...

2003-12-15 Thread Garrett Wollman
< said:

> If I were to tweak the sysctl net.inet.ip.intr_queue_maxlen from its
> default of 50 up, would that possibly help named?

No, it will not have any effect on your problem.  The IP input queue
is only on receive, and your problem is on transmit.

The only thing that could possibly help your problem is increasing
your output queue length, and it is already quite substantial; doing
this will probably hurt as much as it helps, since the output queue is
serviced in strict FIFO order and there is no way to ``call back'' a
packet once it makes it there.  Something like ALTQ might help if you
are able to use a WFQ discipline and assign a high weight to DNS
traffic.

-GAWollman



ENOBUFS and DNS...

2003-12-15 Thread Barry Bouwsma
[Drop hostname part of IPv6-only address above to obtain IPv4-capable e-mail,
 or just drop me from the recipients and I'll catch up from the archives]

Hello, "%s"!

I've read in this list from a couple years ago, several discussions about
ENOBUFS being returned to UDP-using applications.  This is what I'm
experiencing with BIND when I get hit with lots of queries over a slow
link.

I'm serving DNS info for my subdomain, with an off-site secondary.  I'm
on a dial-in now (no comments please); I don't ever remember seeing this
with a cable modem connection (about 2-4x upstream speed than now, with
downstream speed higher still).

When I send a mail to the FreeBSD lists, shortly after, I get hit with
lots of DNS queries to verify my address(es).  My modem is saturated
both down- and upstream for some minutes.  For a minute or two, `named'
spits out syslog messages about insufficient resources, as the replies
it tries to make return ENOBUFS.

If I were to tweak the sysctl net.inet.ip.intr_queue_maxlen from its
default of 50 up, would that possibly help named?  Or might that cause
problems elsewhere?  Or should I ignore this, or would the best possible
solution be for me simply not to send any more mail to the lists?


I can think of a few possibilities for this being made worse over my
thin pipe.  Comments about my thoughts below are welcome, to help me
improve my understanding of things.

I'm usually filling the downstream pipe even without the queries
coming in (pay-per-minute so I'm trying to maximize use of pipe).
This alone may worsen things, as incoming queries see a high latency,
causing them to be repeated before a response is received, possibly
causing other nameservers to initiate queries to me, resulting in many
more queries coming in than if I returned answers promptly.

The size of the outgoing responses is larger than the queries, so it
takes more time to push out responses than it does for them to come in.
These factors combined with the timeouts/retries that resolvers and
nameservers have, mean that no matter what I do, things won't get a
lot better for me.

(As a note, when I sent mails over the cable modem, a different
mailing list software was used by FreeBSD.  Still, I'd see heaps of
queries shortly after, just as now.  This in the event the current
software makes the deliveries faster at the same time, causing more
simultaneous queries to me.  Also, perhaps more sites are doing not
only sender validation but also validation of the from address due
to spam growth the last year.)

I suspect that not all sites are able to successfully query me, as
after the initial couple minutes of ENOBUFS problems and as the
incoming queries taper off, some time later I'll see a repeat of
the ENOBUFS problem, as I'm assuming another round of attempts is
made to dispose of the queue built up at freebsd.org.  If I'm still
online when that happens, to be queried, of course.

I haven't looked to see whether BIND does anything special when an
ENOBUFS pops up in order not to drop the response.  Perhaps if it
were to do so, queueing responses, things would only get worse as
the backlog continues to increase, so by the time responses get
sent, the requester has already given up (after sending a few more
queries to increase the backlog further).  Thus in such a case the
better thing is to drop random responses in order to get fewer of
them out in a more timely fashion.

Or perhaps I shouldn't worry, trusting that the sites which fail to
receive a response from me directly after a few tries might poke the
offsite secondary nameserver, and that the error-recovery is handled
by DNS, so I shouldn't do anything to UDP to try to help.


Anyway, just for fun, I'm going to double the above sysctl value for
this message and see how things change.  Later I'll think about
suspending my downloads to speed up incoming queries.  Also, I just
remembered that userland ppp allows me to prioritize certain traffic
so I should try that too, though normally the downloads I do only
snarf a few hundred bytes/sec from the outgoing pipe, so that might
help little

As noted, comments about my ideas are welcome.

Thanks,
Barry Bouwsma



Re: mpd: two links make one disconnect (ENOBUFS, LCP no reply)

2003-12-10 Thread Michael Bretterklieber
Hi,

On Wed, 10 Dec 2003, Giovanni P. Tirloni wrote:
>  common:
>  set bundle disable multilink
>  set bundle enable compression
>  set bundle yes encryption
 ^^^ please remove this line
You don't need ECP for MPPE (Microsoft Point to Point Encryption)
Maybe this option is confusing the windoze clients.

>  set ccp yes mppc
>  set ccp yes mpp-e40
>  set ccp yes mpp-e56
>  set ccp yes mpp-e128
>  set ccp yes mpp-stateless
>  set ipcp enable vjcomp
>  set iface enable proxy-arp
>  set iface route 192.168.1.253/24
>  set ipcp dns 1.2.3.4
>  set link deny pap chap
>  set link enable chap-md5 chap-msv1 chap-msv2
BTW: you can just enable chap-msv1 and chap-msv2, because when using MPPE
MS-CHAP is mandatory.

Could you please post (in private) more of your logfile and your
mpd.links?

bye,
--
--- --
Michael Bretterklieber  - http://www.bretterklieber.com
A-Quadrat Automation GmbH   - http://www.a-quadrat.at
Tel: ++43-(0)3172-41679 - GSM: ++43-(0)699 12861847
--- --
"...the number of UNIX installations has grown to 10, with more
expected..." - Dennis Ritchie and Ken Thompson, June 1972


mpd: two links make one disconnect (ENOBUFS, LCP no reply)

2003-12-10 Thread Giovanni P. Tirloni
Hi,

 The behaviour I'm having with mpd-3.15 is that it establishes the first
 connection in ng0 and when I try to open another connection it works
 but drops the first one after sometime because it stops answering the
 LCP echos. 

 When both are established I can ping the last one but the ping to the
 first IP returns ENOBUFS (probably because the link is being dropped).
 Anything related to the PPTP output window?

 Here is the log entries after both links are established (they show as
 connected in the win2k and winxp boxes and pptp0 was answering the LCP
 echos):

Dec 10 11:02:22 servidor mpd: [pptp1] exec: command returned 256 
Dec 10 11:02:22 servidor mpd: [pptp1] IFACE: Up event 
Dec 10 11:02:24 servidor mpd: [pptp1] ECP: SendConfigReq #4 
Dec 10 11:02:24 servidor mpd: [pptp1] LCP: rec'd Protocol Reject #9 link 0 (Opened) 
Dec 10 11:02:24 servidor mpd: [pptp1] LCP: protocol ECP was rejected 
Dec 10 11:02:24 servidor mpd: [pptp1] ECP: protocol was rejected by peer 
Dec 10 11:02:24 servidor mpd: [pptp1] ECP: state change Req-Sent --> Stopped 
Dec 10 11:02:24 servidor mpd: [pptp1] ECP: LayerFinish 
Dec 10 11:03:20 servidor mpd: [pptp0] LCP: no reply to 1 echo request(s) 
Dec 10 11:03:25 servidor mpd: [pptp0] LCP: no reply to 2 echo request(s) 
Dec 10 11:03:30 servidor mpd: [pptp0] LCP: no reply to 3 echo request(s) 
Dec 10 11:03:35 servidor mpd: [pptp0] LCP: no reply to 4 echo request(s) 
Dec 10 11:03:40 servidor mpd: [pptp0] LCP: no reply to 5 echo request(s) 
Dec 10 11:03:45 servidor mpd: [pptp0] LCP: no reply to 6 echo request(s) 
Dec 10 11:03:50 servidor mpd: [pptp0] LCP: no reply to 7 echo request(s) 
Dec 10 11:03:50 servidor mpd: [pptp0] LCP: peer not responding to echo requests 
Dec 10 11:03:50 servidor mpd: [pptp0] LCP: LayerFinish 
Dec 10 11:03:50 servidor mpd: [pptp0] LCP: LayerStart 
Dec 10 11:03:50 servidor mpd: [pptp0] LCP: state change Opened --> Starting 
Dec 10 11:03:50 servidor mpd: [pptp0] LCP: phase shift NETWORK --> DEAD 
Dec 10 11:03:50 servidor mpd: [pptp0] setting interface ng0 MTU to 1500 bytes 
Dec 10 11:03:50 servidor mpd: [pptp0] up: 0 links, total bandwidth 9600 bps 
Dec 10 11:03:50 servidor mpd: [pptp0] IPCP: Down event 
Dec 10 11:03:50 servidor mpd: [pptp0] IPCP: state change Opened --> Starting 
Dec 10 11:03:50 servidor mpd: [pptp0] IPCP: LayerDown 
Dec 10 11:03:50 servidor mpd: [pptp0] IFACE: Down event 
Dec 10 11:03:50 servidor mpd: [pptp0] exec: /sbin/route delete 192.168.1.253 -iface lo0 
Dec 10 11:03:50 servidor mpd: [pptp0] exec: /usr/sbin/arp -d 192.168.1.220 
Dec 10 11:03:50 servidor mpd: [pptp0] exec: /sbin/ifconfig ng0 down delete -link0 
Dec 10 11:03:50 servidor mpd: [pptp0] CCP: Down event 
Dec 10 11:03:50 servidor mpd: [pptp0] CCP: state change Opened --> Starting 
Dec 10 11:03:50 servidor mpd: [pptp0] CCP: LayerDown 
Dec 10 11:03:50 servidor mpd: [pptp0] CCP: Close event 
Dec 10 11:03:50 servidor mpd: [pptp0] CCP: state change Starting --> Initial 
Dec 10 11:03:50 servidor mpd: [pptp0] CCP: LayerFinish 
Dec 10 11:03:50 servidor mpd: [pptp0] ECP: Down event 
Dec 10 11:03:50 servidor mpd: [pptp0] ECP: state change Stopped --> Starting 
Dec 10 11:03:50 servidor mpd: [pptp0] ECP: LayerStart 
Dec 10 11:03:50 servidor mpd: [pptp0] ECP: Close event 

 # netstat -m
 mbuf usage:
 GEN cache:  0/0 (in use/in pool)
 CPU #0 cache:   2/256 (in use/in pool)
 Total:  2/256 (in use/in pool)
 Mbuf cache high watermark: 512
 Maximum possible: 27136
 Allocated mbuf types:
   2 mbufs allocated to data
 0% of mbuf map consumed
 mbuf cluster usage:
 GEN cache:  0/80 (in use/in pool)
 CPU #0 cache:   0/128 (in use/in pool)
 Total:  0/208 (in use/in pool)
 Cluster cache high watermark: 128
 Maximum possible: 13568
 1% of cluster map consumed
 480 KBytes of wired memory reserved (0% in use)
 0 requests for memory denied
 0 requests for memory delayed
 0 calls to protocol drain routines

 After much tweaking here is my mpd.conf:

  mpd.conf ---

 default:
 load pptp0
 load pptp1

 common:
 set bundle disable multilink
 set bundle enable compression
 set bundle yes encryption
 set ccp yes mppc
 set ccp yes mpp-e40
 set ccp yes mpp-e56
 set ccp yes mpp-e128
 set ccp yes mpp-stateless
 set ipcp enable vjcomp
 set iface enable proxy-arp
 set iface route 192.168.1.253/24
 set ipcp dns 1.2.3.4
 set link deny pap chap
 set link enable chap-md5 chap-msv1 chap-msv2
 set ipcp nbns 192.168.1.254


 pptp0:
 new -i ng0 pptp0 pptp0
 set ipcp ranges 192.168.1.253/32 192.168.1.220/24
 load common
 
 pptp1:
 new -i ng1 pptp1 pptp1
 set ipcp ranges 192.168.1.253/32 192.168.1.221/24
 load common

  mpd.conf -

 Thanks in adv

RE: bug in bge driver with ENOBUFS on 4.7

2002-11-12 Thread Don Bowman
> From: Don Bowman [mailto:don@;sandvine.com]
> In bge_rxeof(), there can end up being a condition which causes
> the driver to endlessly interrupt.
> 
> if (bge_newbuf_std(sc, sc->bge_std, NULL) == ENOBUFS) {
> ifp->if_ierrors++;
> bge_newbuf_std(sc, sc->bge_std, m);
> continue;
> }
> 
> happens. Now, bge_newbuf_std returns ENOBUFS. 'm' is also NULL.
> This causes the received packet to not be dequeued, and the driver
> will then go straight back into interrupt as the chip will 
> reassert the interrupt as soon as we return.

More information... It would appear that we're looping here
in the rx interrupt, the variable 'stdcnt' which counts
the number of standard-sized packets pulled off per iteration
is huge (indicating we've overrun the ring multiple times).

while(sc->bge_rx_saved_considx !=
sc->bge_rdata->bge_status_block.bge_idx[0].bge_rx_prod_idx) {

is the construct that controls when we exit the loop. Clearly
in my case this is never becoming false.
I see 'sc->bge_rx_saved_considx' as 201, and the RHS of the 
expression as 38442. This doesn't seem correct, I think that
both numbers must be <= BGE_SSLOTS. 

(kgdb) p/x *cur_rx
$10 = {bge_addr = {bge_addr_hi = 0x0, bge_addr_lo = 0xca2d802}, 
  bge_len = 0x4a, bge_idx = 0xc8, bge_flags = 0x7004, bge_type = 0x0, 
  bge_tcp_udp_csum = 0x9992, bge_ip_csum = 0x, bge_vlan_tag = 0x0, 
  bge_error_flag = 0x0, bge_rsvd = 0x0, bge_opaque = 0x0}

Any suggestions anyone?





bug in bge driver with ENOBUFS on 4.7

2002-11-09 Thread Don Bowman
In bge_rxeof(), there can end up being a condition which causes
the driver to endlessly interrupt.

if (bge_newbuf_std(sc, sc->bge_std, NULL) == ENOBUFS) {
ifp->if_ierrors++;
bge_newbuf_std(sc, sc->bge_std, m);
continue;
}

happens. Now, bge_newbuf_std returns ENOBUFS. 'm' is also NULL.
This causes the received packet to not be dequeued, and the driver
will then go straight back into interrupt as the chip will 
reassert the interrupt as soon as we return.

Suggestions on a fix? 
I'm not sure why I ran out of mbufs, I have
kern.ipc.nmbclusters: 9
kern.ipc.nmbufs: 28

(kgdb) p/x mbstat
$11 = {m_mbufs = 0x3a0, m_clusters = 0x39c, m_spare = 0x0, m_clfree = 0x212,

  m_drops = 0x0, m_wait = 0x0, m_drain = 0x0, m_mcfail = 0x0, m_mpfail =
0x0, 
  m_msize = 0x100, m_mclbytes = 0x800, m_minclsize = 0xd5, m_mlen = 0xec, 
  m_mhlen = 0xd4}

but bge_newbuf_std() does this:
if (m == NULL) {
        MGETHDR(m_new, M_DONTWAIT, MT_DATA);
        if (m_new == NULL) {
                return(ENOBUFS);
        }
and then returns ENOBUFS.

This is with 4.7-RELEASE.
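
For comparison, the usual defensive RX refill pattern is roughly the sketch
below; the helper names are invented and this is neither the bge code nor a
claimed fix, just the general idea of recycling the old mbuf and always
advancing the consumer index so the same descriptor is not reprocessed
forever:

        /* Generic sketch, invented helpers; not bge code. */
        while (considx != prodidx) {
                struct mbuf *m = ring[considx].slot_mbuf;

                if (alloc_and_map_new_mbuf(ring, considx) != 0) {
                        /* No replacement: recycle old mbuf, drop the packet. */
                        ifp->if_ierrors++;
                        reuse_old_mbuf(ring, considx, m);
                } else {
                        (*ifp->if_input)(ifp, m);
                }
                considx = (considx + 1) % ringsize;     /* always advance */
        }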


--don ([EMAIL PROTECTED] www.sandvine.com)




Performance of em driver (Was: ENOBUFS)

2002-10-30 Thread Kelly Yancey
On Fri, 18 Oct 2002, Kelly Yancey wrote:

>   Hmm.  Might that explain the abysmal performance of the em driver with
> packets smaller than 333 bytes?
>
>   Kelly
>

  This is just a follow-up to report that thanks to Luigi and Prafulla we
were able to track down the cause of the problems I was seeing with the em
driver/hardware.  In our test environment we had left the IP packet queue
(net.inet.ip.intr_queue_maxlen) at its default value of 50 which, when using
the em card, was overflowing causing the dropped packets.  While it is
curious that it was not overflowing using the bge card, clearly 50 packets
is a restrictive maximum queue size for any decent amount of traffic.
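
For reference, the enqueue path of that era is roughly the sketch below
(reconstructed from memory rather than verbatim 4.x source); the point is
that a full ipintrq is a silent drop, visible only in the queue's drop
counter:

        /* Fragment sketch of handing a received packet to the IP input queue. */
        s = splimp();
        if (IF_QFULL(&ipintrq)) {
                IF_DROP(&ipintrq);              /* bumps ipintrq.ifq_drops */
                m_freem(m);
        } else {
                IF_ENQUEUE(&ipintrq, m);
                schednetisr(NETISR_IP);
        }
        splx(s);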

  Below are some of the results from our testing.  First, a note about the
methodology: traffic was generated using 7 10/100 ethernet ports of a
SmartBits 600 (each port was set to generate 14.25Mbps of traffic for a
aggregate of 99.75Mbps, slightly higher than the theoretical maximum
wirespeed).  The traffic was then VLAN tagged before being passed to a
1.8Ghz Pentium 4 running FreeBSD 4.5p19 where it was untagged and passed
back to the SmartBits.  The numbers quoted below are the actual amount of
traffic that was delivered back to the SmartBits.  The kernel involved
included a number of modifications proprietary to NTTMCL so the numbers are
going to differ from a stock kernel and I only present them for comparative
purposes between the different network configurations.  Also note that all
interfaces were configured for 100base-TX full-duplex.

  Frame Size
NICs  queue  ipfw   64  128  192
bge->fxp 50 0   79.708   97.325   98.124 Mbps
bge->fxp   1000 0   80.172   97.325   98.124 Mbps
em->fxp1000 0   77.590   97.325   98.124 Mbps
bge->fxp 5032   39.097   97.325   98.124 Mbps
bge->fxp   100032   62.011   97.325   98.124 Mbps
em->fxp100032   63.651   97.325   98.124 Mbps

  The numbers in the ipfw column are the number of non-matching rules in the
ruleset before an "allow all from any to any" rule.

  Kelly

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org} -- [EMAIL PROTECTED]
"And say, finally, whether peace is best preserved by giving energy to the
 government or information to the people.  This last is the most certain and
 the most legitimate engine of government."
-- Thomas Jefferson to James Madison, 1787.





Re: ENOBUFS

2002-10-18 Thread Petri Helenius
> In special cases, the error induced by having interrupts blocked
> causes errors which are much larger than polling alone.

Which conditions block interrupts for longer than, say, a millisecond?

Disk errors / wakeups? Anything occurring in "normal" conditions?

Pete






Re: ENOBUFS

2002-10-18 Thread Kelly Yancey
On Fri, 18 Oct 2002, Kelly Yancey wrote:

> > You should definitely clarify how fast the smartbits unit is pushing
> > out traffic, and whether its speed depends on the measured RTT.
> >
>
>   It doesn't sound like the box is that smart.  As it was explained to me, the
> test setup includes a desired 'load' to put on the wire: it is measured as a
> percentage of the wire speed.  Since our SmartBit unit only supports
> 100base-T and doesn't understand vlans, we have to use 7 separate outbound
> ports, each configured for 14.25% load.  To the GigE interface, this should
> appear as 99.75 megabits of data (including all headers/framing).
>

  Oops.  That was actually the explanation of the SmartBits 'desired ILoad'
which I didn't quote in the posted numbers.  The actual number of packets
transmitted is based on RTT.  Sorry for the confusion,

  Kelly

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
"No nation is permitted to live in ignorance with impunity."
-- Thomas Jefferson, 1821.





Re: ENOBUFS

2002-10-18 Thread Kelly Yancey
On Fri, 18 Oct 2002, Luigi Rizzo wrote:

> Oh, I *thought* the numbers you reported were pps but now i see that
> nowhere you mentioned that.
>

  Sorry.  I just checked with our tester.  Those are the total number of
packets sent during the test.  Each test lasted 10 seconds, so divide by 10 to
get pps.

> But if things are as you say, i am seriously puzzled on what you
> are trying to measure -- the output interface (fxp) is a 100Mbit/s
> card which cannot possibly support the load you are trying to offer
> to saturate the input link.
>

  We don't want to saturate the input link, only saturate the outbound link
(100Mbps).  Oddly enough, the em card cannot do this with any packets less than
333 bytes and drops ~50% of the packets.  But clearly this isn't a bottleneck
issue because the drop-off isn't smooth.  332 byte packets cause ~50% packet
loss; 333 byte packets cause 0% packet loss.

> You should definitely clarify how fast the smartbits unit is pushing
> out traffic, and whether its speed depends on the measured RTT.
>

  It doesn't sound like the box is that smart.  As it was explained to me, the
test setup includes a desired 'load' to put on the wire: it is measured as a
percentage of the wire speed.  Since our SmartBit unit only supports
100base-T and doesn't understand vlans, we have to use 7 separate outbound
ports, each configured for 14.25% load.  To the GigE interface, this should
appear as 99.75 megabits of data (including all headers/framing).

> It might well be that what you are
> seeing is saturation of ipintrq, which happens because of some
> strange timing issue -- nothing to do with the board.
>

  I don't understand why it would only happen with the em card and not with
the bge under the exact same traffic (or even more demanding traffic, i.e.
64byte frames).  Also, wouldn't packet loss gradually subside as we approached the
333 byte magic limit rather than the sudden drop-off we are seeing?

> In any case, at least in my experience, a 1GHz box with two em
> cards can easily forward between 350 and 400kpps (64-byte frames) with a
> 4.6-ish kernel, and a 2.4GHz box goes above 650kpps.
>

  We expect our kernel to be slower than that (we typically see ~120kpps for
64-byte frames using the bge driver and a 5701-based card) because we are
using an fxp card for outbound traffic and have added additional code to the
ip_input() processing.  The point isn't absolute numbers, though, but trying
to figure out why when using the em driver (and only with the em driver!) we
see ~50% packet loss with packets smaller than 333 bytes (no matter what size,
just that it is smaller).  That is, 64 byte frames: ~50% packet loss; 332 byte
frames: ~50% packet loss; 333 byte frames: 0% packet loss.  That sort of
sudden drop doesn't look like a bottleneck to me.
  We've mostly written the em driver off because of this.  The bge driver
works just fine performance wise; it was the sporadic watchdog timeouts
that led us to investigate the Intel cards to begin with.  I only mentioned it
on-list because earlier Jim McGrath alluded to similar performance issues with
the Intel GigE cards and small frames.

  Actually, at this point, I'm hoping that your polling patches for the em
driver workaround whatever problem is causing the packet loss and am eagerly
awaiting them to be committed. :)

  Thanks,

  Kelly

>
> On Fri, Oct 18, 2002 at 11:13:54AM -0700, Kelly Yancey wrote:
> > On Fri, 18 Oct 2002, Luigi Rizzo wrote:
> >
> > > How is the measurement done, does the box under test act as a router
> > > with the smartbit pushing traffic in and expecting it back ?
> > >
> >   The box has 2 interfaces, a fxp and a em (or bge).  The GigE interface is
> > configured with 7 VLANs.  The SmartBit produces X byte UDP datagrams that go
> > through a Foundry ServerIron switch for VLAN tagging and then to the GigE
> > interface (where they are untagged).  The box is acting as a router and all
> > traffic is directed out the fxp interface where it returns to the SmartBit.
> >
> > > The numbers are strange, anyways.
> > >
> > > A frame of N bytes takes (N*8+160) nanoseconds on the wire, which
> > > for 330-byte frames should amount to 10^9/(330*8+160) ~= 357kpps,
> > > not the 249 or so you are seeing. Looks as if the times were 40% off.
> > >
> >
> >   Yeah, I've never made too much sense of the actual numbers myself.  Our
> > resident SmartBit expert runs the tests and provides me with the results.  I
> > use them more for getting an idea of the relative performance of one
> > configuration over another and not as absolute numbers themselves.  I'll check
> > with our resident expert and see if he can explain how it calculates those
> > numbers.  The point being, though, that there is an undeniable drop-off with
> > 332 byte or smaller packets.  We have never seen any such drop-off using the
> > bge driver.
> >
> >   Thanks,
> >
> >   Kelly
> >
> > >   cheers
> > >   luigi
> > >
> > > On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yanc

Re: ENOBUFS

2002-10-18 Thread Kelly Yancey
On Fri, 18 Oct 2002, Prafulla Deuskar wrote:

> FYI. 82543 doesn't support PCI-X protocol.
> For PCI-X support use 82544, 82545 or 82546 based cards.
>
> -Prafulla
>

  That is alright, we aren't expecting PCI-X speeds.  It is just that our only
PCI slot on the motherboard (1U rack-mount system) is a PCI-X slot.  Shouldn't
the 82543 still function normally but only at PCI speeds?

  Thanks,

  Kelly

>
> Kelly Yancey [[EMAIL PROTECTED]] wrote:
> > On Fri, 18 Oct 2002, Luigi Rizzo wrote:
> >
> > > On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote:
> > > ...
> > > >   Hmm.  Might that explain the abysmal performance of the em driver with
> > > > packets smaller than 333 bytes?
> > >
> > > what do you mean ? it works great for me. even on -current i
> > > can push out over 400kpps (64byte frames) on a 2.4GHz box.
> > >
> > >   luigi
> > >
> >
> >   Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card
> > plugged into PCI-X bus:
> >
> > FrameSize   TxFramesRxFramesLostFrames  Lost (%)
> > 330 249984  129518  120466  48.19
> > 331 249144  127726  121418  48.73
> > 332 248472  140817  107655  43.33
> > 333 247800  247800  0   0
> >
> >   It has no trouble handling frames 333 bytes or larger.  But for any frame
> > 332 bytes or smaller we consistently see ~50% packet loss.  This same machine
> > easily pushes ~100Mbps with the very same frame sizes using a bge card rather
> > than em.
> >
> >   I've gotten the same results with both em driver version 1.3.14 and 1.3.15
> > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is).
> >
> >   Kelly
> >
> > --
> > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
> > FreeBSD, The Power To Serve: http://www.freebsd.org/
> >
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
>

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
"No nation is permitted to live in ignorance with impunity."
-- Thomas Jefferson, 1821.





Re: ENOBUFS

2002-10-18 Thread Luigi Rizzo
Oh, I *thought* the numbers you reported were pps but now i see that
nowhere you mentioned that.

But if things are as you say, i am seriously puzzled on what you
are trying to measure -- the output interface (fxp) is a 100Mbit/s
card which cannot possibly support the load you are trying to offer
to saturate the input link.

You should definitely clarify how fast the smartbits unit is pushing
out traffic, and whether its speed depends on the measured RTT.

It might well be that what you are
seeing is saturation of ipintrq, which happens because of some
strange timing issue -- nothing to do with the board.

In any case, at least in my experience, a 1GHz box with two em
cards can easily forward between 350 and 400kpps (64-byte frames) with a
4.6-ish kernel, and a 2.4GHz box goes above 650kpps.

cheers
luigi

On Fri, Oct 18, 2002 at 11:13:54AM -0700, Kelly Yancey wrote:
> On Fri, 18 Oct 2002, Luigi Rizzo wrote:
> 
> > How is the measurement done, does the box under test act as a router
> > with the smartbit pushing traffic in and expecting it back ?
> >
>   The box has 2 interfaces, a fxp and a em (or bge).  The GigE interface is
> configured with 7 VLANs.  The SmartBit produces X byte UDP datagrams that go
> through a Foundry ServerIron switch for VLAN tagging and then to the GigE
> interface (where they are untagged).  The box is acting as a router and all
> traffic is directed out the fxp interface where it returns to the SmartBit.
> 
> > The numbers are strange, anyways.
> >
> > A frame of N bytes takes (N*8+160) nanoseconds on the wire, which
> > for 330-byte frames should amount to 10^9/(330*8+160) ~= 357kpps,
> > not the 249 or so you are seeing. Looks as if the times were 40% off.
> >
> 
>   Yeah, I've never made too much sense of the actual numbers myself.  Our
> resident SmartBit expert runs the tests and provides me with the results.  I
> use them more for getting an idea of the relative performance of one
> configuration over another and not as absolute numbers themselves.  I'll check
> with our resident expert and see if he can explain how it calculates those
> numbers.  The point being, though, that there is an undeniable drop-off with
> 332 byte or smaller packets.  We have never seen any such drop-off using the
> bge driver.
> 
>   Thanks,
> 
>   Kelly
> 
> > cheers
> > luigi
> >
> > On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yancey wrote:
> > ...
> > > > can push out over 400kpps (64byte frames) on a 2.4GHz box.
> > > >
> > > > luigi
> > > >
> > >
> > >   Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card
> > > plugged into PCI-X bus:
> > >
> > > FrameSize   TxFramesRxFramesLostFrames  Lost (%)
> > > 330 249984  129518  120466  48.19
> > > 331 249144  127726  121418  48.73
> > > 332 248472  140817  107655  43.33
> > > 333 247800  247800  0   0
> > >
> > >   It has no trouble handling frames 333 bytes or larger.  But for any frame
> > > 332 bytes or smaller we consistently see ~50% packet loss.  This same machine
> > > easily pushes ~100Mbps with the very same frame sizes using a bge card rather
> > > than em.
> > >
> > >   I've gotten the same results with both em driver version 1.3.14 and 1.3.15
> > > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is).
> > >
> > >   Kelly
> > >
> > > --
> > > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
> > > FreeBSD, The Power To Serve: http://www.freebsd.org/
> > >
> > >
> > > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > > with "unsubscribe freebsd-net" in the body of the message
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
> >
> 
> --
> Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
> Join distributed.net Team FreeBSD: http://www.posi.net/freebsd/Team-FreeBSD/
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message




Re: ENOBUFS

2002-10-18 Thread Kelly Yancey
On Fri, 18 Oct 2002, Luigi Rizzo wrote:

> How is the measurement done, does the box under test act as a router
> with the smartbit pushing traffic in and expecting it back ?
>
  The box has 2 interfaces, a fxp and a em (or bge).  The GigE interface is
configured with 7 VLANs.  The SmartBit produces X byte UDP datagrams that go
through a Foundry ServerIron switch for VLAN tagging and then to the GigE
interface (where they are untagged).  The box is acting as a router and all
traffic is directed out the fxp interface where it returns to the SmartBit.

> The numbers are strange, anyways.
>
> A frame of N bytes takes (N*8+160) nanoseconds on the wire, which
> for 330-byte frames should amount to 10^9/(330*8+160) ~= 357kpps,
> not the 249 or so you are seeing. Looks as if the times were 40% off.
>

  Yeah, I've never made too much sense of the actual numbers myself.  Our
resident SmartBit expert runs the tests and provides me with the results.  I
use them more for getting an idea of the relative performance of one
configuration over another and not as absolute numbers themselves.  I'll check
with our resident expert and see if he can explain how it calculates those
numbers.  The point being, though, that there is an undeniable drop-off with
332 byte or smaller packets.  We have never seen any such drop-off using the
bge driver.

  Thanks,

  Kelly

>   cheers
>   luigi
>
> On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yancey wrote:
> ...
> > > can push out over 400kpps (64byte frames) on a 2.4GHz box.
> > >
> > >   luigi
> > >
> >
> >   Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card
> > plugged into PCI-X bus:
> >
> > FrameSize   TxFramesRxFramesLostFrames  Lost (%)
> > 330 249984  129518  120466  48.19
> > 331 249144  127726  121418  48.73
> > 332 248472  140817  107655  43.33
> > 333 247800  247800  0   0
> >
> >   It has no trouble handling frames 333 bytes or larger.  But for any frame
> > 332 bytes or smaller we consistently see ~50% packet loss.  This same machine
> > easily pushes ~100Mps with the very same frame sizes using a bge card rather
> > than em.
> >
> >   I've gotten the same results with both em driver version 1.3.14 and 1.3.15
> > on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is).
> >
> >   Kelly
> >
> > --
> > Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
> > FreeBSD, The Power To Serve: http://www.freebsd.org/
> >
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
>

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
Join distributed.net Team FreeBSD: http://www.posi.net/freebsd/Team-FreeBSD/





Re: ENOBUFS

2002-10-18 Thread Prafulla Deuskar
FYI. 82543 doesn't support PCI-X protocol.
For PCI-X support use 82544, 82545 or 82546 based cards.

-Prafulla


Kelly Yancey [[EMAIL PROTECTED]] wrote:
> On Fri, 18 Oct 2002, Luigi Rizzo wrote:
> 
> > On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote:
> > ...
> > >   Hmm.  Might that explain the abysmal performance of the em driver with
> > > packets smaller than 333 bytes?
> >
> > what do you mean ? it works great for me. even on -current i
> > can push out over 400kpps (64byte frames) on a 2.4GHz box.
> >
> > luigi
> >
> 
>   Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card
> plugged into PCI-X bus:
> 
> FrameSize   TxFramesRxFramesLostFrames  Lost (%)
> 330 249984  129518  120466  48.19
> 331 249144  127726  121418  48.73
> 332 248472  140817  107655  43.33
> 333 247800  247800  0   0
> 
>   It has no trouble handling frames 333 bytes or larger.  But for any frame
> 332 bytes or smaller we consistently see ~50% packet loss.  This same machine
> easily pushes ~100Mbps with the very same frame sizes using a bge card rather
> than em.
> 
>   I've gotten the same results with both em driver version 1.3.14 and 1.3.15
> on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is).
> 
>   Kelly
> 
> --
> Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
> FreeBSD, The Power To Serve: http://www.freebsd.org/
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message




Re: ENOBUFS

2002-10-18 Thread Luigi Rizzo
How is the measurement done, does the box under test act as a router
with the smartbit pushing traffic in and expecting it back ?

The numbers are strange, anyways.

A frame of N bytes takes (N*8+160) nanoseconds on the wire, which
for 330-byte frames should amount to 10^9/(330*8+160) ~= 357kpps,
not the 249 or so you are seeing. Looks as if the times were 40% off.
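
A tiny sketch evaluating that formula for a few frame sizes (8 ns per byte
at gigabit speed plus 160 ns of preamble and inter-frame gap):

#include <stdio.h>

int
main(void)
{
        int sizes[] = { 64, 330, 333, 1500 };
        int i;

        for (i = 0; i < 4; i++) {
                double ns = sizes[i] * 8 + 160; /* per-frame wire time, ns */
                printf("%4d bytes: %.0f kpps\n", sizes[i], 1e6 / ns);
        }
        return (0);
}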

cheers
luigi

On Fri, Oct 18, 2002 at 10:45:13AM -0700, Kelly Yancey wrote:
...
> > can push out over 400kpps (64byte frames) on a 2.4GHz box.
> >
> > luigi
> >
> 
>   Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card
> plugged into PCI-X bus:
> 
> FrameSize   TxFramesRxFramesLostFrames  Lost (%)
> 330 249984  129518  120466  48.19
> 331 249144  127726  121418  48.73
> 332 248472  140817  107655  43.33
> 333 247800  247800  0   0
> 
>   It has no trouble handling frames 333 bytes or larger.  But for any frame
> 332 bytes or smaller we consistently see ~50% packet loss.  This same machine
> easily pushes ~100Mbps with the very same frame sizes using a bge card rather
> than em.
> 
>   I've gotten the same results with both em driver version 1.3.14 and 1.3.15
> on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is).
> 
>   Kelly
> 
> --
> Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
> FreeBSD, The Power To Serve: http://www.freebsd.org/
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message




Re: ENOBUFS

2002-10-18 Thread Kelly Yancey
On Fri, 18 Oct 2002, Luigi Rizzo wrote:

> On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote:
> ...
> >   Hmm.  Might that explain the abysmal performance of the em driver with
> > packets smaller than 333 bytes?
>
> what do you mean ? it works great for me. even on -current i
> can push out over 400kpps (64byte frames) on a 2.4GHz box.
>
>   luigi
>

  Using a SmartBit to push traffic across a 1.8Ghz P4; 82543 chipset card
plugged into PCI-X bus:

FrameSize   TxFramesRxFramesLostFrames  Lost (%)
330 249984  129518  120466  48.19
331 249144  127726  121418  48.73
332 248472  140817  107655  43.33
333 247800  247800  0   0

  It has no trouble handling frames 333 bytes or larger.  But for any frame
332 bytes or smaller we consistently see ~50% packet loss.  This same machine
easily pushes ~100Mbps with the very same frame sizes using a bge card rather
than em.

  I've gotten the same results with both em driver version 1.3.14 and 1.3.15
on both FreeBSD 4.5 and 4.7 (all 4 combinations, that is).

  Kelly

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
FreeBSD, The Power To Serve: http://www.freebsd.org/





Re: ENOBUFS

2002-10-18 Thread Luigi Rizzo
On Fri, Oct 18, 2002 at 06:21:37PM +0300, Petri Helenius wrote:
...
> Luigi's polling work would be useful here. That would lead to incorrect
> timestamps
> on the packets, though?

polling introduce an extra uncertainty which might be as large as
an entire clock tick, yes.

But even with interrupts, you cannot trust the time when the interrupt
driver is run -- there are cases where an ISR is delayed by 10ms or more.
And even when it runs, it might take quite a bit of time (up to a
few 100's of microseconds) to drain the receive queue from packets
received earlier.

in normal cases, timestamps are reasonably accurate in both cases.
In special cases, the error induced by having interrupts blocked
causes errors which are much larger than polling alone.

cheers
luigi




Re: ENOBUFS

2002-10-18 Thread Luigi Rizzo
On Fri, Oct 18, 2002 at 10:27:04AM -0700, Kelly Yancey wrote:
...
>   Hmm.  Might that explain the abysmal performance of the em driver with
> packets smaller than 333 bytes?

what do you mean ? it works great for me. even on -current i
can push out over 400kpps (64byte frames) on a 2.4GHz box.

luigi




Re: ENOBUFS

2002-10-18 Thread Kelly Yancey
On Fri, 18 Oct 2002, Petri Helenius wrote:

> >
> > just reading the source code, yes, it appears that the card has
> > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > and usage in sys/dev/em/* . I don't know in what units are the values
> > (28 and 128, respectively), but it does appear that tx interrupts are
> > delayed a bit more than rx interrupts.
> >
> The thing what is looking suspect is also the "small packet interrupt" feature
> which does not seem to get modified in the em driver but is on the defines.
>
> If that would be on by default, we'd probably see interrupts "too often"
> because it tries to optimize interrupts for good throughput on small number
> of TCP streams.
>

  Hmm.  Might that explain the abysmal performance of the em driver with
packets smaller than 333 bytes?

  Kelly

--
Kelly Yancey -- kbyanc@{posi.net,FreeBSD.org}
FreeBSD, The Power To Serve: http://www.freebsd.org/





Re: ENOBUFS

2002-10-18 Thread Prafulla Deuskar
Transmit/Receive Interrupt Delay values are in units of 1.024 microseconds.
The em driver currently uses these to enable interrupt coalescing on the cards.
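
Assuming the RIDV/TIDV values of 28 and 128 quoted earlier in the thread are
expressed in these units, a quick conversion sketch:

#include <stdio.h>

int
main(void)
{
        double unit = 1.024;            /* microseconds per count */
        int vals[] = { 28, 128 };       /* RIDV, TIDV from earlier in the thread */
        int i;

        /* The inverse is a rough ceiling on interrupt rate under sustained load. */
        for (i = 0; i < 2; i++)
                printf("%3d counts = %.1f us (~%.0f irq/s max)\n",
                    vals[i], vals[i] * unit, 1e6 / (vals[i] * unit));
        return (0);
}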

Thanks,
Prafulla




Re: ENOBUFS

2002-10-18 Thread Eli Dart

In reply to "Jim McGrath" <[EMAIL PROTECTED]> :

> 
> > Where could I get the errata sheet?
> 
> Product Specification Updates i.e. errata, and the Product Specification
> itself are available from Intel under a Non Disclosure Agreement.  Unless
> you work for a company that is doing business with Intel, they are probably
> not obtainable.
> >
> > Could the numbers be packet thresholds? 28 and 128 packets respectively?
> >
> I can't answer that directly because of NDA.  Let us apply some logic here.
> If they were packet counts, under very low load conditions e.g. a single
> telnet session, the telnet link would be unusable.  This leads us to the
> conclusion that they must be time values.

Based on the source code for the sk driver (look for "interrupt 
moderation" in if_sk.c) I would suspect that those values represent 
time in microseconds.  My guess (based on no privileged information 
whatsoever) is that if we've not interrupted in  
microseconds and we have something to send (or we've received 
something), go ahead and raise an interrupt.

Just a guess.  I'm perfectly willing to be wrong about this.

--eli


> 
> Jim
> > Anything else that can be done? Does PCI width/speed affect the amount of
> > time spent in the kernel interrupt or are the PCI transfers asynchronous?
> >
> > Pete
> >
> > - Original Message -
> > From: "Jim McGrath" <[EMAIL PROTECTED]>
> > To: "Luigi Rizzo" <[EMAIL PROTECTED]>; "Petri Helenius" <[EMAIL PROTECTED]>
> > Cc: "Lars Eggert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > Sent: Friday, October 18, 2002 7:49 AM
> > Subject: RE: ENOBUFS
> >
> >
> > > Careful here.  Read the errata sheet!!  I do not believe the em
> > driver uses
> > > these parameters, and possibly for a good reason.
> > >
> > > Jim
> > >
> > > > -Original Message-
> > > > From: [EMAIL PROTECTED]
> > > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > > > Sent: Thursday, October 17, 2002 11:12 PM
> > > > To: Petri Helenius
> > > > Cc: Lars Eggert; [EMAIL PROTECTED]
> > > > Subject: Re: ENOBUFS
> > > >
> > > >
> > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> > > > ...
> > > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
> > > >
> > > > just reading the source code, yes, it appears that the card has
> > > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > > > and usage in sys/dev/em/* . I don't know in what units are the values
> > > > (28 and 128, respectively), but it does appear that tx interrupts are
> > > > delayed a bit more than rx interrupts.
> > > >
> > > > They are not user-configurable at the moment though, you need
> > to rebuild
> > > > the kernel.
> > > >
> > > > cheers
> > > > luigi
> > > >
> > > > > 50kpps the card generates 10k interrupts a second. Sending generates
> > > > > way less. This is about 300Mbps so with the average packet size of
> > > > > 750 there should be room for more packets on the interface queue
> > > > > before needing to service an interrupt?
> > > > >
> > > > > What´s the way to access kernel adapter-structure? Is there
> > an utility
> > > > > that can view the values there?
> > > > > >
> > > > > Pete
> > > > >
> > > > >
> > > >
> > > > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > > > with "unsubscribe freebsd-net" in the body of the message
> > > >
> > >
> > >
> >
> >
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message







RE: ENOBUFS

2002-10-18 Thread Jim McGrath
> The chips I have are 82546. Is your recommendation to steer away
> from Intel
> Gigabit Ethernet chips? What would be more optimal alternative?
>
The 82543/82544 chips worked well in vanilla configurations.  I never played
with an 82546.  The em driver is supported by Intel, so any chip features it
uses should be safe.  When testing with a SmartBits, 64 byte packets and
high line utilization, I ran into problems when RIDV was enabled.  This may
be fixed with the 82546, but I have no way of verifying this.

> Maybe I´ll play with the value and see what happens. Any comments on
> the question how to access the adapter structure from userland?
>
We added sysctls to the wx driver to allow tuning/testing of various
parameters.  The same could be done to the em driver.  You will likely need
to do more than just modify fields in the adapter structure.  The control
register you are targeting will need to be rewritten by the sysctl.

Jim
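A rough sketch of what such a sysctl handler might look like in if_em.c
(the adapter field name and the register-write macro below are assumptions
for illustration, not the driver's actual code); the important part is the
explicit register rewrite after the soft state is updated:

#include <sys/param.h>
#include <sys/sysctl.h>

static int
em_sysctl_tx_int_delay(SYSCTL_HANDLER_ARGS)
{
        struct adapter *adapter = (struct adapter *)arg1;
        int error, val;

        val = adapter->tx_int_delay;            /* assumed field name */
        error = sysctl_handle_int(oidp, &val, 0, req);
        if (error || req->newptr == NULL)
                return (error);
        adapter->tx_int_delay = val;
        /*
         * Updating the soft state alone changes nothing; the delay
         * register has to be rewritten as well (macro and register
         * names are placeholders).
         */
        E1000_WRITE_REG(&adapter->hw, TIDV, adapter->tx_int_delay);
        return (0);
}

The handler would then be registered at attach time so the value shows up
as a tunable under a hw.em-style sysctl node.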





Re: ENOBUFS

2002-10-18 Thread Petri Helenius
(I'll throw in the address found in the README of the driver; maybe somebody
 there has access to appropriate documentation / is willing to work on
 documenting tunables and optimizing the performance.)

> I have to tread carefully here because I was under NDA at my previous
> company.  My work was with the wx driver, but hardware problems are hardware
> problems.  There are a lot of performance enhancing features in the 82544.

The chips I have are 82546. Is your recommendation to steer away from Intel
Gigabit Ethernet chips? What would be a more optimal alternative?

> You will notice that the em driver does not use them.  This may be for a
> reason :-(  Our implementation ran with transmit interrupts disabled, so I
> can't comment on TIDV and am not allowed to comment on RIDV.
>
Maybe I'll play with the value and see what happens. Any comments on
the question of how to access the adapter structure from userland?

> The Receive Descriptor Threshold interrupt showed promise under high load
> (Rx interrupts disabled) but you would need to add a timeout function, 1
> msec. or faster, to process receive descriptors under low load.
>
Luigi's polling work would be useful here. That would lead to incorrect
timestamps on the packets, though?

Pete

>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > Sent: Friday, October 18, 2002 12:56 AM
> > To: Jim McGrath
> > Cc: Petri Helenius; Lars Eggert; [EMAIL PROTECTED]
> > Subject: Re: ENOBUFS
> >
> >
> > On Fri, Oct 18, 2002 at 12:49:04AM -0400, Jim McGrath wrote:
> > > Careful here.  Read the errata sheet!!  I do not believe the em
> > driver uses
> > > these parameters, and possibly for a good reason.
> >
> > as if i had access to the data sheets :)
> >
> > cheers
> > luigi
> > > Jim
> > >
> > > > -Original Message-
> > > > From: [EMAIL PROTECTED]
> > > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > > > Sent: Thursday, October 17, 2002 11:12 PM
> > > > To: Petri Helenius
> > > > Cc: Lars Eggert; [EMAIL PROTECTED]
> > > > Subject: Re: ENOBUFS
> > > >
> > > >
> > > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> > > > ...
> > > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
> > > >
> > > > just reading the source code, yes, it appears that the card has
> > > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > > > and usage in sys/dev/em/* . I don't know in what units are the values
> > > > (28 and 128, respectively), but it does appear that tx interrupts are
> > > > delayed a bit more than rx interrupts.
> > > >
> > > > They are not user-configurable at the moment though, you need
> > to rebuild
> > > > the kernel.
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
> >
>
>






RE: ENOBUFS

2002-10-18 Thread Jim McGrath

> Where could I get the errata sheet?

Product Specification Updates i.e. errata, and the Product Specification
itself are available from Intel under a Non Disclosure Agreement.  Unless
you work for a company that is doing business with Intel, they are probably
not obtainable.
>
> Could the numbers be packet thresholds? 28 and 128 packets respectively?
>
I can't answer that directly because of NDA.  Let us apply some logic here.
If they were packet counts, under very low load conditions e.g. a single
telnet session, the telnet link would be unusable.  This leads us to the
conclusion that they must be time values.

Jim
> Anything else that can be done? Does PCI width/speed affect the amount of
> time spent in the kernel interrupt or are the PCI transfers asynchronous?
>
> Pete
>
> - Original Message -
> From: "Jim McGrath" <[EMAIL PROTECTED]>
> To: "Luigi Rizzo" <[EMAIL PROTECTED]>; "Petri Helenius" <[EMAIL PROTECTED]>
> Cc: "Lars Eggert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Friday, October 18, 2002 7:49 AM
> Subject: RE: ENOBUFS
>
>
> > Careful here.  Read the errata sheet!!  I do not believe the em
> driver uses
> > these parameters, and possibly for a good reason.
> >
> > Jim
> >
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > > Sent: Thursday, October 17, 2002 11:12 PM
> > > To: Petri Helenius
> > > Cc: Lars Eggert; [EMAIL PROTECTED]
> > > Subject: Re: ENOBUFS
> > >
> > >
> > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> > > ...
> > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
> > >
> > > just reading the source code, yes, it appears that the card has
> > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > > and usage in sys/dev/em/* . I don't know in what units are the values
> > > (28 and 128, respectively), but it does appear that tx interrupts are
> > > delayed a bit more than rx interrupts.
> > >
> > > They are not user-configurable at the moment though, you need
> to rebuild
> > > the kernel.
> > >
> > > cheers
> > > luigi
> > >
> > > > 50kpps the card generates 10k interrupts a second. Sending generates
> > > > way less. This is about 300Mbps so with the average packet size of
> > > > 750 there should be room for more packets on the interface queue
> > > > before needing to service an interrupt?
> > > >
> > > > What´s the way to access kernel adapter-structure? Is there
> an utility
> > > > that can view the values there?
> > > > >
> > > > Pete
> > > >
> > > >
> > >
> > > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > > with "unsubscribe freebsd-net" in the body of the message
> > >
> >
> >
>
>





RE: ENOBUFS

2002-10-18 Thread Jim McGrath
I have to tread carefully here because I was under NDA at my previous
company.  My work was with the wx driver, but hardware problems are hardware
problems.  There are a lot of performance enhancing features in the 82544.
You will notice that the em driver does not use them.  This may be for a
reason :-(  Our implementation ran with transmit interrupts disabled, so I
can't comment on TIDV and am not allowed to comment on RIDV.

The Receive Descriptor Threshold interrupt showed promise under high load
(Rx interrupts disabled) but you would need to add a timeout function, 1
msec. or faster, to process receive descriptors under low load.

Jim

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> Sent: Friday, October 18, 2002 12:56 AM
> To: Jim McGrath
> Cc: Petri Helenius; Lars Eggert; [EMAIL PROTECTED]
> Subject: Re: ENOBUFS
>
>
> On Fri, Oct 18, 2002 at 12:49:04AM -0400, Jim McGrath wrote:
> > Careful here.  Read the errata sheet!!  I do not believe the em
> driver uses
> > these parameters, and possibly for a good reason.
>
> as if i had access to the data sheets :)
>
>   cheers
>   luigi
> > Jim
> >
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > > Sent: Thursday, October 17, 2002 11:12 PM
> > > To: Petri Helenius
> > > Cc: Lars Eggert; [EMAIL PROTECTED]
> > > Subject: Re: ENOBUFS
> > >
> > >
> > > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> > > ...
> > > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
> > >
> > > just reading the source code, yes, it appears that the card has
> > > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > > and usage in sys/dev/em/* . I don't know in what units are the values
> > > (28 and 128, respectively), but it does appear that tx interrupts are
> > > delayed a bit more than rx interrupts.
> > >
> > > They are not user-configurable at the moment though, you need
> to rebuild
> > > the kernel.
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
>





Re: ENOBUFS

2002-10-18 Thread Petri Helenius
>
> just reading the source code, yes, it appears that the card has
> support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> and usage in sys/dev/em/* . I don't know in what units are the values
> (28 and 128, respectively), but it does appear that tx interrupts are
> delayed a bit more than rx interrupts.
>
Another thing that looks suspect is the "small packet interrupt" feature,
which does not seem to get modified by the em driver even though it is in the defines.

If that were on by default, we'd probably see interrupts "too often",
because it tries to optimize interrupts for good throughput on a small number
of TCP streams.

Should these questions be posted to the authors of the driver?

Pete






Re: ENOBUFS

2002-10-18 Thread Petri Helenius

Where could I get the errata sheet?

Could the numbers be packet thresholds? 28 and 128 packets respectively?

Anything else that can be done? Does PCI width/speed affect the amount of
time spent in the kernel interrupt or are the PCI transfers asynchronous?

Pete

- Original Message -
From: "Jim McGrath" <[EMAIL PROTECTED]>
To: "Luigi Rizzo" <[EMAIL PROTECTED]>; "Petri Helenius" <[EMAIL PROTECTED]>
Cc: "Lars Eggert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, October 18, 2002 7:49 AM
Subject: RE: ENOBUFS


> Careful here.  Read the errata sheet!!  I do not believe the em driver uses
> these parameters, and possibly for a good reason.
>
> Jim
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > Sent: Thursday, October 17, 2002 11:12 PM
> > To: Petri Helenius
> > Cc: Lars Eggert; [EMAIL PROTECTED]
> > Subject: Re: ENOBUFS
> >
> >
> > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> > ...
> > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
> >
> > just reading the source code, yes, it appears that the card has
> > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > and usage in sys/dev/em/* . I don't know in what units are the values
> > (28 and 128, respectively), but it does appear that tx interrupts are
> > delayed a bit more than rx interrupts.
> >
> > They are not user-configurable at the moment though, you need to rebuild
> > the kernel.
> >
> > cheers
> > luigi
> >
> > > 50kpps the card generates 10k interrupts a second. Sending generates
> > > way less. This is about 300Mbps so with the average packet size of
> > > 750 there should be room for more packets on the interface queue
> > > before needing to service an interrupt?
> > >
> > > What´s the way to access kernel adapter-structure? Is there an utility
> > > that can view the values there?
> > > >
> > > Pete
> > >
> > >
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
> >
>
>






Re: ENOBUFS

2002-10-17 Thread Luigi Rizzo
On Fri, Oct 18, 2002 at 12:49:04AM -0400, Jim McGrath wrote:
> Careful here.  Read the errata sheet!!  I do not believe the em driver uses
> these parameters, and possibly for a good reason.

as if i had access to the data sheets :)

cheers
luigi
> Jim
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> > Sent: Thursday, October 17, 2002 11:12 PM
> > To: Petri Helenius
> > Cc: Lars Eggert; [EMAIL PROTECTED]
> > Subject: Re: ENOBUFS
> >
> >
> > On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> > ...
> > > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
> >
> > just reading the source code, yes, it appears that the card has
> > support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> > and usage in sys/dev/em/* . I don't know in what units are the values
> > (28 and 128, respectively), but it does appear that tx interrupts are
> > delayed a bit more than rx interrupts.
> >
> > They are not user-configurable at the moment though, you need to rebuild
> > the kernel.




RE: ENOBUFS

2002-10-17 Thread Jim McGrath
Careful here.  Read the errata sheet!!  I do not believe the em driver uses
these parameters, and possibly for a good reason.

Jim

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:owner-freebsd-net@;FreeBSD.ORG]On Behalf Of Luigi Rizzo
> Sent: Thursday, October 17, 2002 11:12 PM
> To: Petri Helenius
> Cc: Lars Eggert; [EMAIL PROTECTED]
> Subject: Re: ENOBUFS
>
>
> On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
> ...
> > I seem to get about 5-6 packets on an interrupt. Is this tunable? At
>
> just reading the source code, yes, it appears that the card has
> support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
> and usage in sys/dev/em/* . I don't know in what units are the values
> (28 and 128, respectively), but it does appear that tx interrupts are
> delayed a bit more than rx interrupts.
>
> They are not user-configurable at the moment though, you need to rebuild
> the kernel.
>
>   cheers
>   luigi
>
> > 50kpps the card generates 10k interrupts a second. Sending generates
> > way less. This is about 300Mbps so with the average packet size of
> > 750 there should be room for more packets on the interface queue
> > before needing to service an interrupt?
> >
> > What´s the way to access kernel adapter-structure? Is there an utility
> > that can view the values there?
> > >
> > Pete
> >
> >
>
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
>





Re: ENOBUFS

2002-10-17 Thread Luigi Rizzo
On Thu, Oct 17, 2002 at 11:55:24PM +0300, Petri Helenius wrote:
...
> I seem to get about 5-6 packets on an interrupt. Is this tunable? At

just reading the source code, yes, it appears that the card has
support for delayed rx/tx interrupts -- see RIDV and TIDV definitions
and usage in sys/dev/em/* . I don't know in what units are the values
(28 and 128, respectively), but it does appear that tx interrupts are
delayed a bit more than rx interrupts.

They are not user-configurable at the moment though, you need to rebuild
the kernel.

cheers
luigi

> 50kpps the card generates 10k interrupts a second. Sending generates
> way less. This is about 300Mbps so with the average packet size of
> 750 there should be room for more packets on the interface queue
> before needing to service an interrupt?
> 
> What´s the way to access kernel adapter-structure? Is there an utility
> that can view the values there?
> >
> Pete
> 
> 




Re: ENOBUFS

2002-10-17 Thread Petri Helenius
>
> Less :-) Let me tell you tomorrow, don't have the numbers here right now.

I seem to get about 5-6 packets per interrupt. Is this tunable? At
50kpps the card generates 10k interrupts a second. Sending generates
far fewer. This is about 300Mbps, so with an average packet size of
750 bytes there should be room for more packets on the interface queue
before needing to service an interrupt?

What's the way to access the kernel adapter structure? Is there a utility
that can view the values there?
>
Pete






Re: ENOBUFS

2002-10-16 Thread Sam Leffler

> Sam Leffler wrote:
> > Try my port of the netbsd kttcp kernel module.  You can find it at
> >
> > http://www.freebsd.org/~sam
>
> this seems to use some things from netbsd like
> so_rcv.sb_lastrecord and SBLASTRECORDCHK/SBLASTMBUFCHK.
> Is there something else I need to apply to build it on
> freebsd -STABLE?
>

Sorry, I ported Jason's tail pointer stuff to -stable before kttcp so it
assumes that's installed.  If you don't want to redo kttcp you might try
applying thorpe-stable.patch from the same directory.  FWIW I've been
running with that patch in my production systems for many months w/o
incident.  I never committed it because I didn't see noticeable performance
improvements.

Sam





RE: ENOBUFS

2002-10-16 Thread Don Bowman

Sam Leffler wrote:
> Try my port of the netbsd kttcp kernel module.  You can find it at
> 
> http://www.freebsd.org/~sam

this seems to use some things from netbsd like
so_rcv.sb_lastrecord and SBLASTRECORDCHK/SBLASTMBUFCHK.
Is there something else I need to apply to build it on
freebsd -STABLE?

--don ([EMAIL PROTECTED] www.sandvine.com)




Re: ENOBUFS

2002-10-16 Thread Sam Leffler

> > The 900Mbps are similar to what I see here on similar hardware.
>
> What kind of receive performance do you observe? I haven´t got that
> far yet.
> >
> > For your two-interface setup, are the 600Mbps aggregate send rate on
> > both interfaces, or do you see 600Mbps per interface? In the latter
>
> 600Mbps per interface. I´m going to try this out also on -CURRENT
> to see if it changes anything. Interrupts do not seem to pose a big
> problem because I´m seeing only a few thousand em interrupts
> a second but since every packet involves a write call there are >100k
> syscalls a second.
>
> > case, is your CPU maxed out? Only one can be in the kernel under
> > -stable, so the second one won't help much. With small packets like
> > that, you may be interrupt-bound. (Until Luigi releases polling for em
> > interfaces... :-)
> >
> I´ll try changing the packet sizes to figure out optimum.
>

Try my port of the netbsd kttcp kernel module.  You can find it at

http://www.freebsd.org/~sam

It will eliminate the system calls.  Don't recall if you said your system is
a dual-processor; I never tried it on SMP hardware.

Sam





Re: ENOBUFS

2002-10-16 Thread Luigi Rizzo

On Wed, Oct 16, 2002 at 08:57:19AM +0300, Petri Helenius wrote:
> >
> > how large are the packets and how fast is the box ?
> 
> Packets go out at an average size of 1024 bytes. The box is dual
> P4 Xeon 2400/400 so I think it should qualify as "fast" ? I disabled

yes, it qualifies as fast.  With this kind of box, a trivial
program can send short (18-byte payload, 64 bytes total)
UDP frames at 500-600kpps, with quite a bit of time, i suspect,
being spent in the userland-kernel transition (with some tricks
to skip that i went up to ~680kpps).

> The information I´m looking for is how to instrument where the

hard to tell -- see if with short packets you get the same performance
i mention above, then maybe try some tricks such as sending
short bursts (5-10 pkts at a time) on each of the interfaces.

Maybe using a UP kernel as opposed to an SMP one might give you
slightly better performance, i am not sure though.

There might be some minor optimizations here and there which could
possibly help (e.g. make the em driver use m_getcl(), remove IPSEC
from the kernel if you have it) but you are essentially close to
the speed you can get with that box (within a factor of 2, probably).

cheers
luigi

> > on a fast box you should be able to generate packets faster than wire
> > speed for sizes around 500bytes, meaning that you are going to saturate
> > the queue no matter how large it is.
> >
> > cheers
> > luigi
> >
> > > em-interface is running 66/64 and is there a way to see interface queue
> depth?
> > > em0:  port
> 0x3040-0x307f
> > > mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2
> > > em0:  Speed:1000 Mbps  Duplex:Full
> > > pcib2:  at device 29.0 on pci1
> > > IOAPIC #2 intpin 0 -> irq 16
> > > IOAPIC #2 intpin 6 -> irq 17
> > > IOAPIC #2 intpin 7 -> irq 18
> > > pci2:  on pcib2
> > >
> > > The OS is 4.7-RELEASE.
> > >
> > > Pete
> > >
> > >
> > >
> > > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > > with "unsubscribe freebsd-net" in the body of the message
> >
> 
> 




Re: ENOBUFS

2002-10-15 Thread Lars Eggert

Petri Helenius wrote:
>>The 900Mbps are similar to what I see here on similar hardware.
> 
> What kind of receive performance do you observe? I haven´t got that
> far yet.

Less :-) Let me tell you tomorrow, don't have the numbers here right now.

> 600Mbps per interface. I´m going to try this out also on -CURRENT
> to see if it changes anything. Interrupts do not seem to pose a big
> problem because I´m seeing only a few thousand em interrupts
> a second but since every packet involves a write call there are >100k
> syscalls a second.

So maybe syscalls/second are the bottleneck. On -current, try enabling
zero-copy sockets; it seems to help somewhat. Other than that, I've not
found -current to be much different in terms of performance.

If you're just interested in maxing throughput, try sending over TCP 
with large write sizes. In that case, syscall overhead is less, since 
you amortize it over multiple packets. (But there are different issues 
that can limit TCP throughput.)
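A rough illustration of that amortization, as a sketch: one large TCP
write covers many wire packets, where a per-datagram UDP test pays one
syscall per packet.

#include <sys/types.h>
#include <unistd.h>

/*
 * Push 'total' bytes down a connected TCP socket in large writes,
 * so the per-syscall cost is spread over many packets.
 */
static ssize_t
blast_tcp(int s, size_t total)
{
        static char buf[256 * 1024];
        size_t left = total;

        while (left > 0) {
                size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
                ssize_t n = write(s, buf, chunk);

                if (n == -1)
                        return (-1);
                left -= (size_t)n;
        }
        return ((ssize_t)total);
}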

> I´ll try changing the packet sizes to figure out optimum.

I think I remember that 4K packets were fastest with the em hardware in 
our case.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: ENOBUFS

2002-10-15 Thread Petri Helenius

> The 900Mbps are similar to what I see here on similar hardware.

What kind of receive performance do you observe? I haven't got that
far yet.
>
> For your two-interface setup, are the 600Mbps aggregate send rate on
> both interfaces, or do you see 600Mbps per interface? In the latter

600Mbps per interface. I'm going to try this out also on -CURRENT
to see if it changes anything. Interrupts do not seem to pose a big
problem, because I'm seeing only a few thousand em interrupts
a second, but since every packet involves a write call there are >100k
syscalls a second.

> case, is your CPU maxed out? Only one can be in the kernel under
> -stable, so the second one won't help much. With small packets like
> that, you may be interrupt-bound. (Until Luigi releases polling for em
> interfaces... :-)
>
I'll try changing the packet sizes to figure out the optimum.

Pete






Re: ENOBUFS

2002-10-15 Thread Lars Eggert

Petri Helenius wrote:
>>how large are the packets and how fast is the box ?
> 
> 
> Packets go out at an average size of 1024 bytes. The box is dual
> P4 Xeon 2400/400 so I think it should qualify as "fast" ? I disabled
> hyperthreading to figure out if it was causing problems. I seem to
> be able to send packets at a rate in the 900Mbps when just sending
> them out with a process. If I do similar sending on two interfaces at
> same time, it tops out at 600Mbps.

The 900Mbps are similar to what I see here on similar hardware.

For your two-interface setup, are the 600Mbps aggregate send rate on 
both interfaces, or do you see 600Mbps per interface? In the latter 
case, is your CPU maxed out? Only one can be in the kernel under 
-stable, so the second one won't help much. With small packets like 
that, you may be interrupt-bound. (Until Luigi releases polling for em 
interfaces... :-)

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: ENOBUFS

2002-10-15 Thread Petri Helenius

>
> how large are the packets and how fast is the box ?

Packets go out at an average size of 1024 bytes. The box is a dual
P4 Xeon 2400/400, so I think it should qualify as "fast"? I disabled
hyperthreading to figure out whether it was causing problems. I seem to
be able to send packets at a rate in the 900Mbps range when just sending
them out with a process. If I do similar sending on two interfaces at the
same time, it tops out at 600Mbps.

The information I'm looking for is how to instrument where the
bottleneck is, to either tune the parameters or report a bug in the PCI or
em code (or just swap the GE hardware for something that
works better).

Pete


> on a fast box you should be able to generate packets faster than wire
> speed for sizes around 500bytes, meaning that you are going to saturate
> the queue no matter how large it is.
>
> cheers
> luigi
>
> > em-interface is running 66/64 and is there a way to see interface queue
depth?
> > em0:  port
0x3040-0x307f
> > mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2
> > em0:  Speed:1000 Mbps  Duplex:Full
> > pcib2:  at device 29.0 on pci1
> > IOAPIC #2 intpin 0 -> irq 16
> > IOAPIC #2 intpin 6 -> irq 17
> > IOAPIC #2 intpin 7 -> irq 18
> > pci2:  on pcib2
> >
> > The OS is 4.7-RELEASE.
> >
> > Pete
> >
> >
> >
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
>






Re: ENOBUFS

2002-10-15 Thread Lars Eggert

Petri Helenius wrote:
>>Probably means that your outgoing interface queue is filling up.
>>ENOBUFS is the only way the kernel has to tell you ``slow down!''.
>>
> 
> How much should I be able to send to two em interfaces on one
> 66/64 PCI ?

I've seen netperf UDP throughputs of ~950Mbps with a fiber em card and
4K datagrams on a 2.4GHz P4.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute





Re: ENOBUFS

2002-10-15 Thread Luigi Rizzo

On Wed, Oct 16, 2002 at 02:04:11AM +0300, Petri Helenius wrote:
> >
> > What rate are you sending these packets at? A standard interface queue
> > length is 50 packets, you get ENOBUFS when it's full.
> >
> This might explain the phenomenan. (packets are going out bursty, with average
> hovering at ~500Mbps:ish) I recomplied kernel with IFQ_MAXLEN of 5000
> but there seems to be no change in the behaviour. How do I make sure that

how large are the packets and how fast is the box ?
on a fast box you should be able to generate packets faster than wire
speed for sizes around 500bytes, meaning that you are going to saturate
the queue no matter how large it is.

cheers
luigi

> em-interface is running 66/64 and is there a way to see interface queue depth?
> em0:  port 0x3040-0x307f
> mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2
> em0:  Speed:1000 Mbps  Duplex:Full
> pcib2:  at device 29.0 on pci1
> IOAPIC #2 intpin 0 -> irq 16
> IOAPIC #2 intpin 6 -> irq 17
> IOAPIC #2 intpin 7 -> irq 18
> pci2:  on pcib2
> 
> The OS is 4.7-RELEASE.
> 
> Pete
> 
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message




Re: ENOBUFS

2002-10-15 Thread Petri Helenius

> 
> Probably means that your outgoing interface queue is filling up.
> ENOBUFS is the only way the kernel has to tell you ``slow down!''.
> 
How much should I be able to send to two em interfaces on one
66/64 PCI ?

Pete






Re: ENOBUFS

2002-10-15 Thread Petri Helenius

>
> What rate are you sending these packets at? A standard interface queue
> length is 50 packets, you get ENOBUFS when it's full.
>
This might explain the phenomenon (packets are going out bursty, with the average
hovering at ~500Mbps-ish). I recompiled the kernel with IFQ_MAXLEN of 5000,
but there seems to be no change in the behaviour. How do I make sure that
the em interface is running 66/64, and is there a way to see the interface queue depth?
em0:  port 0x3040-0x307f
mem 0xfc22-0xfc23 irq 17 at device 3.0 on pci2
em0:  Speed:1000 Mbps  Duplex:Full
pcib2:  at device 29.0 on pci1
IOAPIC #2 intpin 0 -> irq 16
IOAPIC #2 intpin 6 -> irq 17
IOAPIC #2 intpin 7 -> irq 18
pci2:  on pcib2

The OS is 4.7-RELEASE.

Pete






ENOBUFS

2002-10-15 Thread Garrett Wollman

< said:

> My processes writing to SOCK_DGRAM sockets are getting ENOBUFS 

Probably means that your outgoing interface queue is filling up.
ENOBUFS is the only way the kernel has to tell you ``slow down!''.

-GAWollman
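In a userland sender, honouring that "slow down" can be as simple as
backing off briefly and retrying; a minimal sketch (not from this thread):

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/*
 * Send one datagram, sleeping briefly whenever the interface queue is
 * full (ENOBUFS) instead of treating the error as fatal.
 */
static ssize_t
send_with_backoff(int s, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen)
{
        ssize_t n;

        while ((n = sendto(s, buf, len, 0, to, tolen)) == -1 &&
            errno == ENOBUFS)
                usleep(1000);           /* queue usually drains quickly */
        return (n);
}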





Re: ENOBUFS

2002-10-15 Thread Julian Elischer



On Wed, 16 Oct 2002, Petri Helenius wrote:

> 
> My processes writing to SOCK_DGRAM sockets are getting ENOBUFS 
> while netstat -s counter under the heading of "ip" is incrementing:
> 7565828 output packets dropped due to no bufs, etc.
> but netstat -m shows:

my guess is that the interface has no more room in its output queue..
when you get the error, back off a bit..



> > netstat -m
> 579/1440/131072 mbufs in use (current/peak/max):
> 578 mbufs allocated to data
> 1 mbufs allocated to packet headers
> 576/970/32768 mbuf clusters in use (current/peak/max)
> 2300 Kbytes allocated to network (2% of mb_map in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
> 
> Where should I start looking? The interface is em
> 
> Pete
> 
> 
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
> 





Re: ENOBUFS

2002-10-15 Thread Lars Eggert

Petri Helenius wrote:
> My processes writing to SOCK_DGRAM sockets are getting ENOBUFS 
> while netstat -s counter under the heading of "ip" is incrementing:
> 7565828 output packets dropped due to no bufs, etc.

What rate are you sending these packets at? A standard interface queue 
length is 50 packets; you get ENOBUFS when it's full.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   USC Information Sciences Institute
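That limit is enforced when the packet is queued to the interface; the
enqueue logic looks roughly like the classic if_output() tail below (a
paraphrase for illustration, not a verbatim copy of any one driver):

        s = splimp();
        if (IF_QFULL(&ifp->if_snd)) {
                IF_DROP(&ifp->if_snd);  /* counted as an output drop */
                splx(s);
                m_freem(m);
                return (ENOBUFS);       /* what the socket layer hands back */
        }
        IF_ENQUEUE(&ifp->if_snd, m);
        if ((ifp->if_flags & IFF_OACTIVE) == 0)
                (*ifp->if_start)(ifp);
        splx(s);

The length checked by IF_QFULL() is ifp->if_snd.ifq_maxlen, which is the
50-packet default mentioned above unless the driver sets its own value.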





ENOBUFS

2002-10-15 Thread Petri Helenius


My processes writing to SOCK_DGRAM sockets are getting ENOBUFS,
while the netstat -s counter under the "ip" heading is incrementing:
7565828 output packets dropped due to no bufs, etc.
but netstat -m shows:
> netstat -m
579/1440/131072 mbufs in use (current/peak/max):
578 mbufs allocated to data
1 mbufs allocated to packet headers
576/970/32768 mbuf clusters in use (current/peak/max)
2300 Kbytes allocated to network (2% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

Where should I start looking? The interface is em

Pete
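That "dropped due to no bufs" counter can also be watched from a program
rather than via netstat -s; a small sketch, assuming the net.inet.ip.stats
sysctl and its ips_odropped field (the counter netstat reports as output
drops):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <stdio.h>

int
main(void)
{
        struct ipstat ips;
        size_t len = sizeof(ips);

        /* same data netstat -s formats under the "ip" heading */
        if (sysctlbyname("net.inet.ip.stats", &ips, &len, NULL, 0) == -1) {
                perror("sysctlbyname");
                return (1);
        }
        printf("output packets dropped (no bufs): %lu\n",
            (unsigned long)ips.ips_odropped);
        return (0);
}

Sampling that counter while the sender runs makes it easy to see how often
the interface queue overflows.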






Re: ip_output and ENOBUFS

2002-03-27 Thread Andrew Gallatin


Julian Elischer writes:
 > 
 > 
 > On Wed, 27 Mar 2002, Andrew Gallatin wrote:
 > 
 > > 
 > > Archie Cobbs writes:
 > >  > Luigi Rizzo writes:
 > >  > > > Is if_tx_rdy() something that can be used generally or does it only
 > >  > > > work with dummynet ?
 > >  > > 
 > >  > > well, the function is dummynet-specific, but I would certainly like
 > >  > > a generic callback list to be implemented in ifnet which is
 > >  > > invoked on tx_empty events.
 > >  > 
 > >  > Me too :-)
 > >  > 
 > >  > > The problem as usual is that you have to touch every single device
 > >  > > driver... Fortunately we can leave the ifnet structure unmodified
 > >  > > because i just discovered there is an ifindex2ifnet array which is
 > >  > > managed and can be extended to point to additional ifnet state that
 > >  > > does not fit in the immutable one...
 > >  > 
 > >  > Why is it important to avoid changing 'struct ifnet' ?
 > > 
 > > To maintain binary compatability for commercial network drivers.
 > > 
 > > Currently, network driver modules built on 4.1.1 work on all versions
 > > of FreeBSD through 4.5-STABLE.
 > > 
 > Not QUITE true..
 > 
 > they ar ebroken in some cases for 4.4 amd 4.5 due to a renumberring 
 > of SYSINIT orderings, but I fixed that and they should work in 4.6
 > again..  I know we hit it here with some cards we have..
 > I just made a small patch in teh local trees to allow us to use them.
 > 
 > Some cards may not hit this problem.

I've never tried loading our driver at boot (we have customers load it
manually, or via a /usr/local/etc/rc.d script very late in boot).
4.5 works fine for us.

There was a bit of breakage just after 4.5 when for ARP support for
variable length link level addresses was MFCed, but I caught that
early..

Drew




Re: ip_output and ENOBUFS

2002-03-27 Thread Julian Elischer



On Wed, 27 Mar 2002, Andrew Gallatin wrote:

> 
> Archie Cobbs writes:
>  > Luigi Rizzo writes:
>  > > > Is if_tx_rdy() something that can be used generally or does it only
>  > > > work with dummynet ?
>  > > 
>  > > well, the function is dummynet-specific, but I would certainly like
>  > > a generic callback list to be implemented in ifnet which is
>  > > invoked on tx_empty events.
>  > 
>  > Me too :-)
>  > 
>  > > The problem as usual is that you have to touch every single device
>  > > driver... Fortunately we can leave the ifnet structure unmodified
>  > > because i just discovered there is an ifindex2ifnet array which is
>  > > managed and can be extended to point to additional ifnet state that
>  > > does not fit in the immutable one...
>  > 
>  > Why is it important to avoid changing 'struct ifnet' ?
> 
> To maintain binary compatability for commercial network drivers.
> 
> Currently, network driver modules built on 4.1.1 work on all versions
> of FreeBSD through 4.5-STABLE.
> 
Not QUITE true..

they are broken in some cases for 4.4 and 4.5 due to a renumbering
of SYSINIT orderings, but I fixed that and they should work in 4.6
again..  I know we hit it here with some cards we have..
I just made a small patch in the local trees to allow us to use them.

Some cards may not hit this problem.


> 
> Drew
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
> 





Re: ip_output and ENOBUFS

2002-03-27 Thread Julian Elischer



On Wed, 27 Mar 2002, Archie Cobbs wrote:

> Luigi Rizzo writes:
> > > Is if_tx_rdy() something that can be used generally or does it only
> > > work with dummynet ?
> > 
> > well, the function is dummynet-specific, but I would certainly like
> > a generic callback list to be implemented in ifnet which is
> > invoked on tx_empty events.
> 
> Me too :-)
> 
> > The problem as usual is that you have to touch every single device
> > driver... Fortunately we can leave the ifnet structure unmodified
> > because i just discovered there is an ifindex2ifnet array which is
> > managed and can be extended to point to additional ifnet state that
> > does not fit in the immutable one...
> 
> Why is it important to avoid changing 'struct ifnet' ?

You can't touch struct ifnet in a released line of systems

e.g. 4.x must not touch struct ifnet or break binary compatibility
with drivers written for earlier 4.x systems (and not available in
source)..  It turns out that sync interface cards are the single largest
set of binary drivers...

> 
> -Archie
> 
> __
> Archie Cobbs * Packet Design * http://www.packetdesign.com
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
> 





Re: ip_output and ENOBUFS

2002-03-27 Thread Julian Elischer



On Wed, 27 Mar 2002, Luigi Rizzo wrote:

> On Wed, Mar 27, 2002 at 09:53:00AM -0800, Archie Cobbs wrote:
> ...
> > > managed and can be extended to point to additional ifnet state that
> > > does not fit in the immutable one...
> > 
> > Why is it important to avoid changing 'struct ifnet' ?
> 
> backward compatibility with binary-only drivers ...
> Not that i care too much (in the end it is for the benefit of
> a limited set of people which could as well not upgrade, vs.
> preventing useful functionality to be added in a safe way),
> but some people do and i can see their point.
> On the other hand, this also means we can never progress if not
> on major releases or unless changes slip in unnoticed...

It is possible to hang extra info off the ifnet structure if one is
careful, just not IN it..

> 
>   cheers
>   luigi
> 
> > -Archie
> > 
> > __
> > Archie Cobbs * Packet Design * http://www.packetdesign.com
> > 
> > To Unsubscribe: send mail to [EMAIL PROTECTED]
> > with "unsubscribe freebsd-net" in the body of the message
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message
> 





Re: ip_output and ENOBUFS

2002-03-27 Thread Luigi Rizzo

On Wed, Mar 27, 2002 at 09:53:00AM -0800, Archie Cobbs wrote:
...
> > managed and can be extended to point to additional ifnet state that
> > does not fit in the immutable one...
> 
> Why is it important to avoid changing 'struct ifnet' ?

backward compatibility with binary-only drivers ...
Not that i care too much (in the end it is for the benefit of
a limited set of people who could just as well not upgrade, vs.
preventing useful functionality from being added in a safe way),
but some people do and i can see their point.
On the other hand, this also means we can only make progress
on major releases, or when changes slip in unnoticed...

cheers
luigi

> -Archie
> 
> __
> Archie Cobbs * Packet Design * http://www.packetdesign.com
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message




Re: ip_output and ENOBUFS

2002-03-27 Thread Andrew Gallatin


Archie Cobbs writes:
 > Luigi Rizzo writes:
 > > > Is if_tx_rdy() something that can be used generally or does it only
 > > > work with dummynet ?
 > > 
 > > well, the function is dummynet-specific, but I would certainly like
 > > a generic callback list to be implemented in ifnet which is
 > > invoked on tx_empty events.
 > 
 > Me too :-)
 > 
 > > The problem as usual is that you have to touch every single device
 > > driver... Fortunately we can leave the ifnet structure unmodified
 > > because i just discovered there is an ifindex2ifnet array which is
 > > managed and can be extended to point to additional ifnet state that
 > > does not fit in the immutable one...
 > 
 > Why is it important to avoid changing 'struct ifnet' ?

To maintain binary compatibility for commercial network drivers.

Currently, network driver modules built on 4.1.1 work on all versions
of FreeBSD through 4.5-STABLE.


Drew




Re: ip_output and ENOBUFS

2002-03-27 Thread Archie Cobbs

Luigi Rizzo writes:
> > Is if_tx_rdy() something that can be used generally or does it only
> > work with dummynet ?
> 
> well, the function is dummynet-specific, but I would certainly like
> a generic callback list to be implemented in ifnet which is
> invoked on tx_empty events.

Me too :-)

> The problem as usual is that you have to touch every single device
> driver... Fortunately we can leave the ifnet structure unmodified
> because i just discovered there is an ifindex2ifnet array which is
> managed and can be extended to point to additional ifnet state that
> does not fit in the immutable one...

Why is it important to avoid changing 'struct ifnet' ?

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: ip_output and ENOBUFS

2002-03-26 Thread Luigi Rizzo

On Tue, Mar 26, 2002 at 10:48:33PM -0800, Archie Cobbs wrote:
> Luigi Rizzo writes:
> > As a matter of fact, i even implemented a similar thing in dummynet,
> > and if device drivers call if_tx_rdy() when they complete a
> > transmission, then the tx interrupt can be used to clock
> > packets out of the dummynet pipes. A patch for if_tun.c is below,
> 
> So if_tx_rdy() sounds like my if_get_next().. guess you already did that :-)

yes, but it does not solve the problem of the original poster, who
wanted to block/wake up processes getting ENOBUFS.  Signals just
do not propagate beyond the pipe they are sent to.

> Is if_tx_rdy() something that can be used generally or does it only
> work with dummynet ?

well, the function is dummynet-specific, but I would certainly like
a generic callback list to be implemented in ifnet which is
invoked on tx_empty events.
So kernel modules could hook their own callback to the list and
get notified of events when they occur.

The problem as usual is that you have to touch every single device
driver... Fortunately we can leave the ifnet structure unmodified
because i just discovered there is an ifindex2ifnet array which is
managed and can be extended to point to additional ifnet state that
does not fit in the immutable one...

cheers
luigi
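One way to do that without growing struct ifnet is a side table indexed
the same way as ifindex2ifnet; a minimal sketch with made-up names:

/*
 * Auxiliary per-interface state kept outside the immutable struct ifnet,
 * indexed by if_index just like ifindex2ifnet.  Hypothetical names.
 */
struct ifnet_ext {
        void    (*ife_txempty_cb)(struct ifnet *);     /* tx-empty callback */
};

static struct ifnet_ext *ifnet_ext;     /* sized alongside ifindex2ifnet */

static __inline struct ifnet_ext *
ifnet_ext_of(struct ifnet *ifp)
{
        return (&ifnet_ext[ifp->if_index]);
}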




Re: ip_output and ENOBUFS

2002-03-26 Thread Archie Cobbs

Luigi Rizzo writes:
> > Along those lines, this might be a handy thing to add...
> > 
> > int if_get_next(struct ifnet *ifp); /* runs at splimp() */
> > 
> > This function tries to "get" the next packet scheduled to go
> > out interface 'ifp' and, if successful, puts it on &ifp->if_snd
> > (the interface output queue for 'ifp') and returns 1; otherwise,
> > it returns zero.
> 
> how is this different from having a longer device queue ?

The idea is that if_get_next() may in turn call some scheduling
code that intelligently decides what packet gets to go next.
So, when this kind of thing is enabled, the device queue basically
always has either zero or one packets on it.

In effect, this allows you to move the interface output queue
out of the (dumb) device driver upwards in the networking stack,
where e.g. a netgraph node can make the scheduling decision.

The existing fixed length FIFO queues at each device mean you can't
do intelligent scheduling of packets, because you can't manage
that queue, because part of "managing" the queue is knowing when
it goes empty.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: ip_output and ENOBUFS

2002-03-26 Thread Archie Cobbs

Luigi Rizzo writes:
> As a matter of fact, i even implemented a similar thing in dummynet,
> and if device drivers call if_tx_rdy() when they complete a
> transmission, then the tx interrupt can be used to clock
> packets out of the dummynet pipes. A patch for if_tun.c is below,

So if_tx_rdy() sounds like my if_get_next().. guess you already did that :-)

Is if_tx_rdy() something that can be used generally or does it only
work with dummynet ?

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: ip_output and ENOBUFS

2002-03-26 Thread Luigi Rizzo

the ENOBUFS is very typical with UDP applications that try to
send as fast as possible (e.g. the various network test utilities
in ports), and as i said in a previous message, putting up a mechanism to
pass around queue-full/queue-not-full events is expensive because
it might trigger on every single packet, and possibly have to wake up
multiple processes each time (with only one being able to succeed).

The TCP handling of ENOBUFS is much cheaper.
TCP is not woken up by the device, but by ACKs coming from the other
side, or by timeouts.  So there is no per-packet overhead just to
implement this mechanism.

As a matter of fact, i even implemented a similar thing in dummynet,
and if device drivers call if_tx_rdy() when they complete a
transmission, then the tx interrupt can be used to clock
packets out of the dummynet pipes. A patch for if_tun.c is below,
and if_tx_rdy() is in netinet/ip_dummynet.c. You could replace
the call to if_tx_rdy with a wakeup() using some appropriate
argument to wake up threads waiting for devices to become ready.

cheers
luigi
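A sketch of that substitution (the sleep channel and wmesg below are
arbitrary choices for illustration, not existing code):

        /* where the patch below calls if_tx_rdy(), wake any sleepers: */
        if (ifp->if_snd.ifq_len == 0)
                wakeup(&ifp->if_snd);

        /* in the sender, after ip_output() has returned ENOBUFS: */
        error = tsleep(&ifp->if_snd, PSOCK | PCATCH, "ifqfull", hz / 100);

The timeout is there as a safety net in case a wakeup is missed.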

> lcvs diff -u if_tun.c
Index: if_tun.c
===================================================================
RCS file: /home/ncvs/src/sys/net/if_tun.c,v
retrieving revision 1.51.2.2
diff -u -r1.51.2.2 if_tun.c
--- if_tun.c28 Jul 1999 15:08:06 -  1.51.2.2
+++ if_tun.c19 Jun 2000 12:07:17 -
@@ -19,6 +19,7 @@
 
 #include "opt_devfs.h"
 #include "opt_inet.h"
+#include "opt_ipdn.h"
 
 #include 
 #include 
@@ -162,6 +163,10 @@
ifp = &tp->tun_if;
tp->tun_flags |= TUN_OPEN;
TUNDEBUG("%s%d: open\n", ifp->if_name, ifp->if_unit);
+#ifdef DUMMYNET
+   if (ifp->if_snd.ifq_len == 0) /* better be! */
+   if_tx_rdy(ifp);
+#endif
return (0);
 }
 
@@ -487,6 +492,10 @@
}
}
} while (m0 == 0);
+#ifdef DUMMYNET
+   if (ifp->if_snd.ifq_len == 0)
+   if_tx_rdy(ifp);
+#endif
splx(s);
 
while (m0 && uio->uio_resid > 0 && error == 0) {

On Tue, Mar 26, 2002 at 09:09:17AM -0800, Lars Eggert wrote:
> Matthew Luckie wrote:
> > hmm, we looked at how other protocols handled the ENOBUFS case from
> > ip_output.
> >
> > tcp_output calls tcp_quench on this error.
> >
> > while the interface may not be able to send any more packets than it
> > does currently, closing the congestion window back to 1 segment
> > seems a severe way to handle this error, knowing that the network
> > did not drop the packet due to congestion.  Ideally, there might be
> > some form of blocking until such time as a mbuf comes available.
> > This sounds as if it will be much easier come FreeBSD 5.0
> 
> TCP will almost never encouter this scenario, since it's self-clocking.
> The NIC is very rarely the bottleneck resource for a given network
> connection. Have you looked at mean queue lengths for NICs? They are
> typically zero or one. The NIC will only be the bottleneck if you are
> sending at a higher rate than line speed and your burt time is too long
> to be absorbed by the queue.
> 
> > I'm aware that if people are hitting this condition, they need to
> > increase the number of mbufs to get maximum performance.
> 
> No. ENOBUFS in ip_output almost always means that your NIC queue is
> full, which isn't controlled through mbufs. You can make the queue 
> longer, but that won't help if you're sending too fast.
> 
> > This section of code has previously been discussed here:
> > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/fr-
> > eebsd-net/2730.freebsd-net and has been in use for many years (a
> 
> This is a slightly different problem than you describe. What Archie saw
> was an ENOBUFS being handled like a loss inside the network, even though
> the sender has information locally that can allow it to make smarter
> retransmission decisions.
> 
> Lars
> -- 
> Lars Eggert <[EMAIL PROTECTED]>   Information Sciences Institute
> http://www.isi.edu/larse/  University of Southern California






Re: ip_output and ENOBUFS

2002-03-26 Thread Luigi Rizzo

On Tue, Mar 26, 2002 at 10:10:05PM -0800, Archie Cobbs wrote:
> Luigi Rizzo writes:
...
> Along those lines, this might be a handy thing to add...
> 
> int if_get_next(struct ifnet *ifp);   /* runs at splimp() */
> 
> This function tries to "get" the next packet scheduled to go
> out interface 'ifp' and, if successful, puts it on &ifp->if_snd
> (the interface output queue for 'ifp') and returns 1; otherwise,
> it returns zero.

how is this different from having a longer device queue ?

cheers
luigi

> Then, each device driver can be modified (over time) to invoke
> this function when it gets a transmit interrupt and it's output
> queue is empty. If the function returns 1, grab the new packet
> off the queue and schedule it for transmission.
> 
> Once this is done it becomes much easier to hack together ideas
> for queueing and scheduling e.g., a netgraph node that does packet
> scheduling.
> 
> I think ALTQ does something like this. It would be nice if it
> was generic enough that other mechanisms besides ALTQ (like
> netgraph) could also use it. I'm not that familiar with how
> ALTQ is implemented.
> 
> -Archie
> 
> __
> Archie Cobbs * Packet Design * http://www.packetdesign.com
> 
> To Unsubscribe: send mail to [EMAIL PROTECTED]
> with "unsubscribe freebsd-net" in the body of the message




Re: ip_output and ENOBUFS

2002-03-26 Thread Archie Cobbs

Luigi Rizzo writes:
> > >if you could suggest a few modifications that would be required, i'd like
> > >to pursue this further.
> > 
> > Look at tsleep/wakeup on ifnet of if_snd.
> 
> I am under the impression that implementing this mechanism would
> not be so trivial. It is not straightforward to tell the caller
> on which interface ip_output() failed. Nor is there a common place
> that i know of where you can be notified that a packet was successfully
> transmitted -- i suspect you would have to patch all the individual drivers.
> Finally, there is the question of whether you do a wakeup as soon
> as you get a free slot in the queue (in which case you most likely
> end up paying the cost of a tsleep/wakeup pair on each transmission),
> or you add some hysteresis.

Along those lines, this might be a handy thing to add...

int if_get_next(struct ifnet *ifp); /* runs at splimp() */

This function tries to "get" the next packet scheduled to go
out interface 'ifp' and, if successful, puts it on &ifp->if_snd
(the interface output queue for 'ifp') and returns 1; otherwise,
it returns zero.

Then, each device driver can be modified (over time) to invoke
this function when it gets a transmit interrupt and its output
queue is empty. If the function returns 1, grab the new packet
off the queue and schedule it for transmission.
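A sketch of that driver-side change, assuming the if_get_next() interface
proposed above (the xx_ names are placeholders):

static void
xx_txeof(struct xx_softc *sc)
{
        struct ifnet *ifp = &sc->arpcom.ac_if;  /* usual 4.x softc layout */

        /* ... reclaim completed transmit descriptors ... */

        if (ifp->if_snd.ifq_len == 0 && if_get_next(ifp)) {
                /* if_get_next() put one packet on if_snd; push it out. */
                (*ifp->if_start)(ifp);
        }
}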

Once this is done it becomes much easier to hack together ideas
for queueing and scheduling e.g., a netgraph node that does packet
scheduling.

I think ALTQ does something like this. It would be nice if it
was generic enough that other mechanisms besides ALTQ (like
netgraph) could also use it. I'm not that familiar with how
ALTQ is implemented.

-Archie

__
Archie Cobbs * Packet Design * http://www.packetdesign.com




Re: ip_output and ENOBUFS

2002-03-26 Thread Lars Eggert

Matthew Luckie wrote:
 > hmm, we looked at how other protocols handled the ENOBUFS case from
 > ip_output.
 >
 > tcp_output calls tcp_quench on this error.
 >
 > while the interface may not be able to send any more packets than it
 > does currently, closing the congestion window back to 1 segment
 > seems a severe way to handle this error, knowing that the network
 > did not drop the packet due to congestion.  Ideally, there might be
 > some form of blocking until such time as an mbuf becomes available.
 > This sounds as if it will be much easier come FreeBSD 5.0

TCP will almost never encounter this scenario, since it's self-clocking.
The NIC is very rarely the bottleneck resource for a given network
connection. Have you looked at mean queue lengths for NICs? They are
typically zero or one. The NIC will only be the bottleneck if you are
sending at a higher rate than line speed and your burst time is too long
to be absorbed by the queue.

 > I'm aware that if people are hitting this condition, they need to
 > increase the number of mbufs to get maximum performance.

No. ENOBUFS in ip_output almost always means that your NIC queue is
full, which isn't controlled through mbufs. You can make the queue 
longer, but that won't help if you're sending too fast.
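
To make that concrete: the ENOBUFS here typically comes from the
enqueue step just below ip_output().  The classic pattern (roughly the
4.x ether_output() tail, quoted from memory, so treat it as a sketch)
is:

s = splimp();
if (IF_QFULL(&ifp->if_snd)) {       /* interface send queue is full */
    IF_DROP(&ifp->if_snd);
    splx(s);
    return (ENOBUFS);               /* bubbles back up through ip_output() */
}
IF_ENQUEUE(&ifp->if_snd, m);
if ((ifp->if_flags & IFF_OACTIVE) == 0)
    (*ifp->if_start)(ifp);
splx(s);

Making the queue longer means raising ifq_maxlen on the interface; that
only buys a deeper buffer, not more line rate.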

 > This section of code has previously been discussed here:
 > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/freebsd-net/2730.freebsd-net
 > and has been in use for many years (a

This is a slightly different problem than you describe. What Archie saw
was an ENOBUFS being handled like a loss inside the network, even though
the sender has information locally that can allow it to make smarter
retransmission decisions.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   Information Sciences Institute
http://www.isi.edu/larse/  University of Southern California





Re: ip_output and ENOBUFS

2002-03-26 Thread Matthew Luckie

> I am under the impression that implementing this mechanism would
> not be so trivial.

hmm, we looked at how other protocols handled the ENOBUFS case from
ip_output.

tcp_output calls tcp_quench on this error.

while the interface may not be able to send any more packets than it does
currently, closing the congestion window back to 1 segment seems a severe
way to handle this error, knowing that the network did not drop the packet
due to congestion.  Ideally, there might be some form of blocking until
such time as an mbuf becomes available.  This sounds as if it will be much
easier come FreeBSD 5.0
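
For reference, the handling being discussed is roughly the following
(paraphrased from memory of the 4.x sources, so a sketch rather than
the exact code):

/* in tcp_output(), after ip_output() returns */
if (error == ENOBUFS) {
    tcp_quench(tp->t_inpcb, 0);     /* snd_cwnd back to one segment */
    return (0);                     /* not treated as a hard error */
}

/* and tcp_quench() itself is essentially */
void
tcp_quench(struct inpcb *inp, int errno)
{
    struct tcpcb *tp = intotcpcb(inp);

    if (tp != NULL)
        tp->snd_cwnd = tp->t_maxseg;    /* one segment */
}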

I'm aware that if people are hitting this condition, they need to increase
the number of mbufs to get maximum performance.

This section of code has previously been discussed here:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=119188+0+archive/2000/freebsd-net/2730.freebsd-net
and has been in use for many years (a glance at TCP/IP Illustrated Vol 2
shows similar code), so there is probably a good reason that I am not
aware of for this code to be in place.

Comments?





Re: ip_output and ENOBUFS

2002-03-25 Thread Luigi Rizzo

On Mon, Mar 25, 2002 at 02:06:19PM -0800, Lars Eggert wrote:
> Matthew Luckie wrote:
> >>>Is there a mechanism to tell when ip_output should be called again?
...
> >if you could suggest a few modifications that would be required, i'd like
> >to pursue this further.
> 
> Look at tsleep/wakeup on ifnet of if_snd.

I am under the impression that implementing this mechanism would
not be so trivial. It is not straightforward to tell the caller
which interface ip_output() failed on. Nor is there a common place
that I know of where you can be notified that a packet was successfully
transmitted -- I suspect you would have to patch all the individual drivers.
Finally, there is the question of whether you do a wakeup as soon
as you get a free slot in the queue (in which case you most likely
end up paying the cost of a tsleep/wakeup pair on each transmission),
or you add some hysteresis.
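
Purely to illustrate the tradeoff (this is not existing code; the wait
channel, priority and watermark are all made up):

/* sender side, after ip_output() came back with ENOBUFS for ifp */
int s = splimp();
while (IF_QFULL(&ifp->if_snd))
    tsleep(&ifp->if_snd, PSOCK, "ifqful", 0);
splx(s);
/* ... now retry ip_output() ... */

/* driver side, in the transmit-complete path */
if (ifp->if_snd.ifq_len <= ifp->if_snd.ifq_maxlen / 2) /* hysteresis */
    wakeup(&ifp->if_snd);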

cheers
luigi
> Lars
> -- 
> Lars Eggert <[EMAIL PROTECTED]>   Information Sciences Institute
> http://www.isi.edu/larse/  University of Southern California






Re: ip_output and ENOBUFS

2002-03-25 Thread Julian Elischer



On Mon, 25 Mar 2002, Matthew Luckie wrote:

> Hi
> 
> 
> Is there a mechanism to tell when ip_output should be called again?
> Ideally, I would block until such time as i could send it via ip_output


No, there is no such mechanism that I know of.

> 
> (please CC: me on any responses)
> 
> Matthew Luckie
> [EMAIL PROTECTED]
> 





Re: ip_output and ENOBUFS

2002-03-25 Thread Lars Eggert

Lars Eggert wrote:
> Matthew Luckie wrote:
> 
>>>> Is there a mechanism to tell when ip_output should be called again?
>>>> Ideally, I would block until such time as i could send it via ip_output
>>>
>>>
>>> You probably get that because the outbound interface queue gets full, 
>>> so you want to block your caller until space becomes available there. 
>>> There currently is no such mechanism (AFAIK, and talking about 
>>> -STABLE here), but it's not too much work to add.
>>
>>
>> if you could suggest a few modifications that would be required, i'd like
>> to pursue this further.
> 
> 
> Look at tsleep/wakeup on ifnet of if_snd.
  ^^
  or

Sorry, big fingers.
-- 
Lars Eggert <[EMAIL PROTECTED]>   Information Sciences Institute
http://www.isi.edu/larse/  University of Southern California





Re: ip_output and ENOBUFS

2002-03-25 Thread Lars Eggert

Matthew Luckie wrote:
>>>Is there a mechanism to tell when ip_output should be called again?
>>>Ideally, I would block until such time as i could send it via ip_output
>>
>>You probably get that because the outbound interface queue gets full, so 
>>you want to block your caller until space becomes available there. There 
>>currently is no such mechanism (AFAIK, and talking about -STABLE here), 
>>but it's not too much work to add.
> 
> if you could suggest a few modifications that would be required, i'd like
> to pursue this further.

Look at tsleep/wakeup on ifnet of if_snd.

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   Information Sciences Institute
http://www.isi.edu/larse/  University of Southern California





Re: ip_output and ENOBUFS

2002-03-25 Thread Matthew Luckie

> > Is there a mechanism to tell when ip_output should be called again?
> > Ideally, I would block until such time as i could send it via ip_output
> 
> You probably get that because the outbound interface queue gets full, so 
> you want to block your caller until space becomes available there. There 
> currently is no such mechanism (AFAIK, and talking about -STABLE here), 
> but it's not too much work to add.

I've worked at a layer above ip_output, but I haven't looked too deeply at
this issue or the code in the kernel.

if you could suggest a few modifications that would be required, i'd like
to pursue this further.

Thanks for your response.





Re: ip_output and ENOBUFS

2002-03-25 Thread Lars Eggert

Matthew Luckie wrote:
> I have written a syscall that creates a packet in kernel-space,
> timestamps it, and then sends it via ip_output
> 
> If the user-space application uses this system call faster than the
> packets can be sent, ip_output will return ENOBUFS.
> 
> Is there a mechanism to tell when ip_output should be called again?
> Ideally, I would block until such time as i could send it via ip_output

You probably get that because the outbound interface queue gets full, so 
you want to block your caller until space becomes available there. There 
currently is no such mechanism (AFAIK, and talking about -STABLE here), 
but it's not too much work to add.

Not sure if this is really useful though. Usually the NIC doesn't limit 
your transmission speed, it's losses inside the network that do. Also, 
why a new system call? Is it that much more efficient than RawIP?
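
(For comparison, the raw-IP path from user space looks roughly like the
sketch below; it is minimal and hypothetical -- the destination address
and the IP header in pkt[] still have to be filled in, and opening the
socket needs root.)

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    int s, on = 1;
    struct sockaddr_in dst;
    char pkt[128];          /* IP header + payload, built by the caller */

    s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (s < 0) {
        perror("socket");   /* needs root */
        return (1);
    }
    setsockopt(s, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));

    memset(pkt, 0, sizeof(pkt));
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    /* dst.sin_addr and the header in pkt[] must be filled in here */

    if (sendto(s, pkt, sizeof(pkt), 0,
        (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("sendto");   /* ENOBUFS can show up here as well */
    return (0);
}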

Lars
-- 
Lars Eggert <[EMAIL PROTECTED]>   Information Sciences Institute
http://www.isi.edu/larse/  University of Southern California





ip_output and ENOBUFS

2002-03-25 Thread Matthew Luckie

Hi

I have written a syscall that creates a packet in kernel-space,
timestamps it, and then sends it via ip_output

If the user-space application uses this system call faster than the
packets can be sent, ip_output will return ENOBUFS.

Is there a mechanism to tell when ip_output should be called again?
Ideally, I would block until such time as i could send it via ip_output
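
(The crude user-space workaround, pending a real mechanism, would be to
back off and retry on ENOBUFS -- sketched below, with my_timed_send()
standing in for the hypothetical syscall:)

#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/* hypothetical wrapper for the syscall described above */
extern ssize_t my_timed_send(const void *buf, size_t len);

ssize_t
send_with_retry(const void *buf, size_t len)
{
    ssize_t n;

    for (;;) {
        n = my_timed_send(buf, len);
        if (n >= 0 || errno != ENOBUFS)
            return (n);
        usleep(1000);   /* crude: give the interface queue ~1ms to drain */
    }
}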

(please CC: me on any responses)

Matthew Luckie
[EMAIL PROTECTED]





Re: ENOBUFS and network performance tuning

2001-09-25 Thread Mike Silbersack


On Tue, 25 Sep 2001, Jeff Behl wrote:

> Any other guidelines to help tune a FreeBSD box for this sort of use
> would be greatly appreciated.  Currently, the only change we make is
> increasing MAXUSERS to 128, though I'm not sure this is the preferred
> approach.

That's the simplest approach, as it bumps up numerous kernel settings.
With 4.4 you can tune it in loader.conf, so changing the setting isn't a
big deal.

You should probably check how many sockets are sticking around in the
TIME_WAIT state and compare it to kern.ipc.maxsockets - that may be the
limit you're hitting first.
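
(If you'd rather check that limit programmatically than with the sysctl
utility, a minimal sketch using sysctlbyname(3), assuming an int-sized
value:)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
    int maxsockets;
    size_t len = sizeof(maxsockets);

    /* read the kern.ipc.maxsockets value from the running kernel */
    if (sysctlbyname("kern.ipc.maxsockets", &maxsockets, &len, NULL, 0) == -1) {
        perror("sysctlbyname");
        return (1);
    }
    printf("kern.ipc.maxsockets = %d\n", maxsockets);
    return (0);
}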

Mike "Silby" Silbersack





ENOBUFS and network performance tuning

2001-09-25 Thread Jeff Behl

I have 4.3, and soon to be 4.4, boxes dedicated to a single app which 
basically 'bounces' traffic between two incoming TCP connections.  After 
around 240 sessions (each session consisting of two incoming connections 
with traffic being passed between them), I started getting ENOBUFS 
errors.  netstat -m showed mbufs never peaked, so we increased 
kern.ipc.somaxconn from 128 -> 256.  Should this help the problem?

Any other guidelines to help tune a FreeBSD box for this sort of use 
would be greatly appreciated.  Currently, the only change we make is 
increasing MAXUSERS to 128, though I'm not sure this is the preferred 
approach.

Also, is there a definitive guide to what all the kernel variables 
(sysctl -a) are?

thanks
Jeff

