Re: Q: sock output serialization
> "jamal" == jamal <[EMAIL PROTECTED]> writes: jamal> Packets in flight? >> In the extreme case, there could still arrive up to the window >> size frames. jamal> Assuming this depends on path latency and not some bad jamal> programming Yes. Although the latter could also possible. jamal> BTW, earlier i lied: there is a way to tell if your packet jamal> will be dropped which is not very expensive: jamal> if (atomic_read(_dropping) /* packet will be jamal> dropped */ jamal> but even this is 99% accurate in SMP. Well, but better than knowing nothing about congestion state. We could at least document in the x25iface.txt kernel doc that driver authors should check this before acknowledging frames. Henner - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Q: sock output serialization
[EMAIL PROTECTED] said:
> I think its fixable to make it do the RR/RNR after bouncing it up the
> stack.

ARCnet does ACK in hardware. Packets don't hit the wire until the
destination has indicated that it's got a buffer available. You really
want to be able to reserve space on the queue before telling the chip to
accept another incoming packet - not just realise afterwards that you've
screwed up.

Strictly speaking, this fact is irrelevant to the case in question, but if
we're modifying the generic code for LAPB, we might as well think about
other protocols which require similar treatment.

--
dwmw2
Re: Q: sock output serialization
On Sun, 17 Sep 2000, Henner Eisen wrote:

> > "jamal" == jamal <[EMAIL PROTECTED]> writes:
>
> No. Just, if you do not accept a frame, you must not acknowledge it.
> Once it has been acknowledged, you must not discard it.

OK, so no problem then.

> jamal> Can you stop mid-window and claim there is
> jamal> congestion? (maybe time to dust off some books).
>
> Yes.

Again, this makes life simpler. You don't have to accept the whole window.

> Just had a look at the X.25 specs again. As far as LAPB is concerned
> (and that's what we are speaking about), it is like this:
> When your receiver is busy, you tell the other end about this by means
> of a ReceiverNotReady primitive. However, it might take some time until
> the peer receives it and reacts on this.

Packets in flight?

> In the extreme case, there could still arrive up to the window size
> frames.

Assuming this depends on path latency and not some bad programming.

> It seems that the receiver can do whatever it wants to do with frames
> received during the busy condition: either accept the frames (but delay
> acknowledgement until the busy condition is cleared) or just discard
> them. The first one seems to favor performance while the second favors
> simplicity.
>
> I guess in Linux, we should usually choose simplicity. I think even
> with the simplicity variant, we could be able to preserve performance
> if we can flow control the peer earlier. E.g. when the return value of
> your netif_rx indicates 'almost congested, but still able to accept
> frames', we could already set the busy condition but continue to
> deliver the frames arriving during our busy condition. But that's
> performance tuning and can be taken care of later (I'm even not sure
> whether this tuning will pay off).

This is doable: the 'almost congested, but still able to accept frames'
threshold is a tunable parameter via proc. Nobody is stopping you from
maintaining your own little queue in the driver to take the first option.
The complexity is added to your driver as opposed to the general system.

BTW, earlier i lied: there is a way to tell if your packet will be dropped
which is not very expensive:

	if (atomic_read(&netdev_dropping)) /* packet will be dropped */

but even this is only 99% accurate in SMP.

cheers,
jamal
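As a user-space illustration of the two ideas above (the cheap `netdev_dropping` pre-check and the driver-private "little queue"), here is a minimal sketch. Everything except the flag's name is invented for the example; the real 2.4 flag lives in the core networking code, and as jamal notes the check is inherently racy on SMP because another CPU can throttle or unthrottle the backlog right after the read.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the netdev_dropping flag from the post: nonzero while
 * the backlog is throttled and arriving packets are being dropped. */
static atomic_int netdev_dropping;

/* Cheap pre-check a driver could make before acknowledging a frame.
 * Only ~99% reliable on SMP: the flag may flip between this read and
 * the actual enqueue. */
int frame_would_be_dropped(void)
{
    return atomic_load(&netdev_dropping) != 0;
}

/* The driver-side "little queue": hold frames privately, without
 * acknowledging them, while dropping is signalled. */
#define HOLD_MAX 8
static int held_frames[HOLD_MAX];
static int nheld;

int rx_frame(int frame)
{
    if (frame_would_be_dropped()) {
        if (nheld < HOLD_MAX)
            held_frames[nheld++] = frame; /* deliver and ack later */
        return -1;  /* do not acknowledge yet */
    }
    return 0;       /* safe to pass up and acknowledge */
}
```

The point of holding the frame rather than dropping it is exactly the trade-off jamal describes: the complexity lands in the driver instead of the general system.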
Re: Q: sock output serialization
> "jamal" == jamal <[EMAIL PROTECTED]> writes: jamal> Hmm.. More complexity ;-> Does X.25 mandate you accept all jamal> the window? No. Just, if you do not accept a frame, you must not acknowledge it. Once it has been acknowledged, you must not discard it. jamal> Can you stop mid-window and claim there is jamal> congestion? (maybe time to dust off some books). Yes. Just had a look at the X.25 specs again. As far as LAPB is concerned (and that´s what we are speeking about), it is like this: When your receiver is busy, you tell the other end about this by means of a ReceiverNotReady primitive. However, it might take some time until the peer receives it and reacts on this. In the extreme case, there could still arrive up to the window size frames. It seems that the receiver can do whatever it wants to do with frames received during the busy condition: Either accept the frames (but delay acknowledgement until the busy condition is cleared) or just discard them. The first one seems to favor performance while the second favors simplicity. I guess in Linux, we should usually choose simplicity. I think even with the simpicity variant, we could be able to preserve performance if we can flow control the peer earlier. E.g. when the return value of your netif_rx indicates 'almost congested, but still able to accept frames', we could already set the busy condition but continue to deliver the frames arriving during our busy condition. But that´s performance tuning and can be taken care of later (I´m even not sure wheter this tuning will pay off). Henner - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Q: sock output serialization
On Sun, 17 Sep 2000, Henner Eisen wrote:

> Yes, a) that would make life much simpler for driver writers (but more
> difficult for you ;). If it is doable without adding overhead to the
> general path, it would be nice to provide that semantics to
> HW_FLOWCONTROLed devices.

There would be a minute overhead. But i guess let me release the patch
first, then we can continue this part of the conversation.

> However, even with a), after being HW-flow-controlled and setting the
> rx_busy condition, there could still arrive some more packets until the
> send window is full. They either need to be discarded at once or queued
> somewhere else. If we don't want to discard them, you need to accept
> packets up to the window size from a device after it has been HW flow
> controlled.

Hmm.. More complexity ;-> Does X.25 mandate you accept all the window?
Can you stop mid-window and claim there is congestion? (maybe time to
dust off some books).

cheers,
jamal
Re: Q: sock output serialization
Hi,

> "jamal" == jamal <[EMAIL PROTECTED]> writes:

>> With the current scheme, lapb first acknowledges reception of
>> the frame and after that, netif_rx() might still discard it --
>> which is evil.

jamal> This might screw things a bit. Can you defer to say first
jamal> call netif_rx() then acknowledge or is this hard-coded into
jamal> the f/ware?

This depends on the firmware; I don't know. The software lapb module could
be modified to honor a return value appropriately. But software lapb
should be moved above netif for several other reasons anyway (although
even there, honoring a return value for flow control would make sense).
Maybe it is a good idea to make the congestion return values not netif
specific, but part of a generic "return semantics for delivering packets
to upper layers". The driver maintainers will need to investigate this and
take appropriate actions depending on the firmware's capabilities.

My personal use of the X.25 stack was in DTE-DTE mode over isdn, where I
use the isdn driver's internal lapb (x75i) implementation. Unfortunately,
the interface to the isdn lower layers does not allow returning an rx_busy
condition.

>> Provided that netif_would_drop(dev) is reliable (a subsequent

jamal> I think this would make it a little more complex than
jamal> necessary; the queue state might change right after you

Yes, the scenario I had in mind (where it would have been reliable) was a
little short-sighted (see reply to Andi's message).

jamal> If you cant defer the acknowledgement until netif_rx()
jamal> returns then what we could do is instead:

jamal> 1) for devices that are registered with hardware flow
jamal> control ==> you have to register as a
jamal> CONFIG_NET_HW_FLOWCONTROL device.

jamal> a) to let them queue that last packet before they are
jamal> shut-up, the assumption is they respect the protocol and
jamal> will 'back-off' after that.

jamal> b) return BLG_CNG_WOULD_DROP
jamal> instead to the device and give it the responsibility to
jamal> free the skb or store it wherever it wants but not in the
jamal> backlog.

jamal> I personally prefer a). Reason: If we have done all the
jamal> work so far (context switch etc) and we know the device is
jamal> well behaved (meaning it is not going to send another packet
jamal> without being told things are fine) then it is probably
jamal> wiser to just let that packet get on the backlog queue.

Yes, a) would make life much simpler for driver writers (but more
difficult for you ;). If it is doable without adding overhead to the
general path, it would be nice to provide that semantics to
HW_FLOWCONTROLed devices.

However, even with a), after being HW-flow-controlled and setting the
rx_busy condition, there could still arrive some more packets until the
send window is full. They either need to be discarded at once or queued
somewhere else. If we don't want to discard them, you need to accept
packets up to the window size from a device after it has been HW flow
controlled.

Henner
Re: Q: sock output serialization
> "Andi" == Andi Kleen <[EMAIL PROTECTED]> writes: Andi> It would just be racy. You test, get a not drop and then Andi> another different interrupt would deliver another packet Andi> before you can and fill the queue. Jamal's extended Andi> netif_rx probably makes more sense, because it can be Andi> atomic. I thought if it was executed from the same single interrupt handler (and lapb also processed from that same interrupt handler) while local irq are disables, this could not happen. But for smart controllers, this does not hold, they would need to interrupt the cpu first to query the state, and than again before delivering the packet. And for dumb cards, doing the lapb processing inside irq handler is not nice, anyway. Moving lapb processing above netif_rx() would resolve this and all other problems. Henner - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Q: sock output serialization
On Sat, Sep 16, 2000 at 11:39:45PM +0200, Henner Eisen wrote:
> int netif_would_drop(dev)
> {
>	return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>		|| (queue->input_pkt_queue.qlen && queue->throttle);
> }
>
> would fulfil those requirements.

It would just be racy. You test, get a "no drop", and then another,
different interrupt could deliver another packet before you can and fill
the queue. Jamal's extended netif_rx probably makes more sense, because it
can be atomic.

-Andi
Re: Q: sock output serialization
> > With the current scheme, lapb first acknowledges reception of the
> > frame and after that, netif_rx() might still discard it -- which is
> > evil.
>
> This might screw things a bit. Can you defer to say first call
> netif_rx() then acknowledge or is this hard-coded into the f/ware?

I think it's fixable to make it do the RR/RNR after bouncing it up the
stack.
Re: Q: sock output serialization
Seems all the good network stuff gets discussed on l-k instead ;-<
(hint: some people are not subscribed to l-k)

On Sat, 16 Sep 2000, Henner Eisen wrote:

> What about a function to query the state of the backlog queue?
> Something like
>
>	if (netif_would_drop(dev)) {
>		kfree_skb(skb);
>		/* optionally, if supported by lapb implementation: */
>		set_lapb_rx_busy_condition();
>		return;
>	}
>	clear_lapb_rx_busy_condition(); /* if supported */
>	pass_frame_to_lapb(lapb, skb);
>
> The key point is that we need to query the backlog queue and
> discard the skb before lapb can acknowledge it. Simply discarding
> it when the backlog is known to be congested should be sufficient. It
> could however improve performance if lapb did additionally flow control
> the peer.

This should be resolved by a patch i am about to submit based on the OLS
talk. netif_rx() now returns a value which tells you the congestion level
when you give it a packet (a change from void netif_rx()):

---
/* return values:
 * BLG_CNG_NONE (no congestion)
 * BLG_CNG_LOW  (low congestion)
 * BLG_CNG_MOD  (moderate congestion)
 * BLG_CNG_HIGH (high congestion)
 * BLG_CNG_DROP (packet was dropped)
 */
---

> With the current scheme, lapb first acknowledges reception of the frame
> and after that, netif_rx() might still discard it -- which is evil.

This might screw things a bit. Can you defer to say first call netif_rx()
then acknowledge, or is this hard-coded into the f/ware?

> Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx
> will reliably not drop the frame), this should make the netif_rx path
> reliable.
>
> It seems that, on 2.4.0, something like
>
>	int netif_would_drop(dev)
>	{
>		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>			|| (queue->input_pkt_queue.qlen && queue->throttle);
>	}
>
> would fulfil those requirements.
I think this would make it a little more complex than necessary; the queue
state might change right after you return from netif_would_drop() -- maybe
not, i am just hypothesizing. You can still create netif_would_drop() --
it just sounds too expensive to me, since to be really sure no packet of
yours is dropped, you have to make this call for every packet.

If you can't defer the acknowledgement until netif_rx() returns, then what
we could do instead is:

1) for devices that are registered with hardware flow control ==> you have
to register as a CONFIG_NET_HW_FLOWCONTROL device:

a) let them queue that last packet before they are shut up; the assumption
is that they respect the protocol and will 'back off' after that.

b) return BLG_CNG_WOULD_DROP instead to the device and give it the
responsibility to free the skb or store it wherever it wants, but not in
the backlog.

I personally prefer a). Reason: if we have done all the work so far
(context switch etc) and we know the device is well behaved (meaning it is
not going to send another packet without being told things are fine), then
it is probably wiser to just let that packet get on the backlog queue.

cheers,
jamal
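A driver's use of the graded netif_rx() return codes can be sketched in user space. Only the BLG_CNG_* names come from the post; the simulated backlog, the thresholds, and the function names below are all invented for illustration (in the real patch the thresholds would be the proc-tunable parameters jamal mentions).

```c
#include <assert.h>

enum {
    BLG_CNG_NONE,   /* no congestion */
    BLG_CNG_LOW,    /* low congestion */
    BLG_CNG_MOD,    /* moderate congestion */
    BLG_CNG_HIGH,   /* high congestion */
    BLG_CNG_DROP    /* packet was dropped */
};

#define MAX_BACKLOG 300   /* stand-in for netdev_max_backlog */

static int backlog_len;   /* simulated backlog queue length */

/* Simulated netif_rx(): enqueue if there is room and report the
 * congestion level atomically with the enqueue decision. */
static int netif_rx_sim(void)
{
    if (backlog_len >= MAX_BACKLOG)
        return BLG_CNG_DROP;                   /* packet was not queued */
    backlog_len++;
    if (backlog_len > MAX_BACKLOG * 3 / 4) return BLG_CNG_HIGH;
    if (backlog_len > MAX_BACKLOG / 2)     return BLG_CNG_MOD;
    if (backlog_len > MAX_BACKLOG / 4)     return BLG_CNG_LOW;
    return BLG_CNG_NONE;
}

static int lapb_rx_busy;  /* driver's RNR state toward the peer */

/* Driver rx path: only acknowledge the LAPB frame once netif_rx()
 * has actually queued it, and flow-control the peer early when the
 * backlog runs high. */
int deliver_frame(void)
{
    int c = netif_rx_sim();

    if (c == BLG_CNG_DROP)
        return -1;                      /* don't ack; peer retransmits */
    lapb_rx_busy = (c >= BLG_CNG_HIGH); /* set or clear busy condition */
    return 0;                           /* queued: safe to acknowledge */
}
```

Because the congestion level is returned by the same call that enqueues the packet, there is no test-then-deliver window, which is exactly why Andi argues this beats a separate netif_would_drop() query.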
Re: Q: sock output serialization
Hi,

> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

>> However, for drivers which support intelligent controllers
>> (with lapb in firmware) this is not an option and the problem
>> will persist.

Alan> 'Smart hardware is broken' repeat .. ;) - but yes its an
Alan> issue there. These cards could bypass netif_rx and call
Alan> directly to the lapb top end though ?

What about a function to query the state of the backlog queue?
Something like

	if (netif_would_drop(dev)) {
		kfree_skb(skb);
		/* optionally, if supported by lapb implementation: */
		set_lapb_rx_busy_condition();
		return;
	}
	clear_lapb_rx_busy_condition(); /* if supported */
	pass_frame_to_lapb(lapb, skb);

The key point is that we need to query the backlog queue and discard the
skb before lapb can acknowledge it. Simply discarding it when the backlog
is known to be congested should be sufficient. It could however improve
performance if lapb did additionally flow control the peer.

With the current scheme, lapb first acknowledges reception of the frame
and after that, netif_rx() might still discard it -- which is evil.
Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx
will reliably not drop the frame), this should make the netif_rx path
reliable.

It seems that, on 2.4.0, something like

	int netif_would_drop(struct net_device *dev)
	{
		struct softnet_data *queue = &softnet_data[smp_processor_id()];

		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
			|| (queue->input_pkt_queue.qlen && queue->throttle);
	}

would fulfil those requirements.

Henner
Re: Q: sock output serialization
Hi,

> "kuznet" == kuznet <[EMAIL PROTECTED]> writes:

kuznet> Hello!

>> scheduler may re-order frames

kuznet> It cannot, provided sender holds order until
kuznet> dev_queue_xmit().

But if I set different skb->priority? ;) Well, that would be my own fault
then ..

>> or drop them.

kuznet> Yes. And if you share _single_ device both for reliable
kuznet> and unreliable services, you have to make special tricks.

Well, I think this problem will not occur. For shared service, we will use
a datalink protocol running above netif (e.g. mixed X.25 and IP over
ethernet, where X.25 runs on top of 802.2 LLC.2, which will be implemented
above netif). And for smart (firmware lapb) interfaces, which are the real
problem, we won't need to support shared service.

>> be fixed by providing a special LAPB network scheduler which
>> takes care about preserving reliable LAPB semantics.

kuznet> Yes. ATM CLIP already does this, look at atm clip.c and
kuznet> sch_atm.c to get an example.

Yes. But the above seems to be a network scheduler specialized for passing
IP down to an ATM tunnel. What I had in mind would correspond to a special
scheduler for an atm net_device (but ATM does not use the standard linux
net_device).

>> that value before calling netif_rx(). Then upper layers
>> worried about netif_rx() re-ordering can detect this and act
>> appropriately.

kuznet> etc.
kuznet> No!
kuznet> In fact, it is a mathematical fact that as soon as order is
kuznet> broken once, it is _impossible_ to restore it back. No
kuznet> valid actions are invented to do this f.e. for TCP.

Agreed.

kuznet> Though with lapb the situation is different: it cannot
kuznet> lose frames, this changes the situation.

Unfortunately, netif_rx might still lose frames, and it is concurrent
netif_rx() which re-orders the frames. Thus, we cannot take advantage of
reliable LAPB below netif_rx when packet loss and re-ordering occurred
above netif_rx().

kuznet> In any case, order must not be broken, if it is
kuznet> essential. That's answer.

I see. Apparently, IRQ affinity seems the only simple and cheap solution
to the re-ordering problem.

Henner
Re: Q: sock output serialization
Hello!

> scheduler may re-order frames

It cannot, provided the sender holds order until dev_queue_xmit().
Actually, this is true of all the schedulers, except for the cases when
reordering is allowed explicitly with special policing rules.

> or drop them.

Yes. And if you share a _single_ device both for reliable and unreliable
services, you have to make special tricks.

> be fixed by providing a special LAPB network scheduler which takes
> care about preserving reliable LAPB semantics.

Yes. ATM CLIP already does this; look at atm clip.c and sch_atm.c to get
an example.

> Maybe a general solution to the problem would be to provide a special
> skb->rx_seqno field for SMP kernels. Device drivers can maintain an
> rx counter (they usually do so anyway in struct
> net_device_stats.rx_packets) which is incremented whenever a new frame
> is received. The driver then sets skb->rx_seqno to that value before
> calling netif_rx(). Then upper layers worried about netif_rx()
> re-ordering can detect this and act appropriately.

etc.

No! In fact, it is a mathematical fact that as soon as order is broken
once, it is _impossible_ to restore it back. No valid actions are invented
to do this, f.e. for TCP. Though with lapb the situation is different: it
cannot lose frames, and this changes the situation.

In any case, order must not be broken if it is essential. That's the
answer.

Alexey
Re: Q: sock output serialization
Hi,

> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

Alan> LAPB does not expect ever to see re-ordering. Its a point to
Alan> point wire level MAC protocol.

Yes, it was never designed for handling re-ordering, because this cannot
happen on a single wire (and for that reason it is not fair to blame LAPB
for not handling re-ordering efficiently). But it seems that re-ordering
will be handled nevertheless -- by retransmission of the not-in-sequence
frames. At least the X.25 layer 2 spec for LAPB requires:

  "Reception of out-of-sequence I frames
   When the DCE receives a valid I frame whose send state sequence number
   N(S) is incorrect, it will discard the information field of the I frame
   and transmit an REJ frame with the N(R) set to one higher than the N(S)
   of the last correctly received I frame. ...
   ... The DCE will then discard the information field of all I frames
   received until the expected I frame is correctly received"

Not very efficient, but it should work. And as long as re-ordering only
happens occasionally, efficiency should not matter. The fact that LAPB can
actually recover from re-ordering problems really shows that it is a good
design: it can even recover from errors that were considered impossible in
the design environment.

Alan> 'Smart hardware is broken' repeat .. ;) - but yes its an

;)

Alan> issue there. These cards could bypass netif_rx and call
Alan> directly to the lapb top end though ?

Yes, something like this. Maybe what's missing is a standard interface in
the kernel that allows access to a reliable datalink service.

Further, it's not just a netif_rx() issue. dev_queue_xmit() does not
provide reliable datalink semantics, either. The default network scheduler
may re-order frames or drop them. But this could be fixed by providing a
special LAPB network scheduler which takes care of preserving reliable
LAPB semantics.

Maybe a LAPB network scheduler would even be the best place to hook in the
software lapb processing: the standard network scheduler queues could
serve simultaneously as the output queue for lapb. Putting lapb anywhere
else requires a dedicated LAPB output queue.

Further, I'm wondering whether other protocols are also affected by
re-ordering problems. X.25 is not the only protocol which relies on frames
being received in sequence. Frame relay and ATM networks are also required
to deliver frames/cells in sequence, and upper layer protocols might
depend on this.

Maybe a general solution to the problem would be to provide a special
skb->rx_seqno field for SMP kernels. Device drivers can maintain an rx
counter (they usually do so anyway in struct net_device_stats.rx_packets)
which is incremented whenever a new frame is received. The driver then
sets skb->rx_seqno to that value before calling netif_rx(). Then upper
layers worried about netif_rx() re-ordering can detect this and act
appropriately. Only device drivers and protocols affected by re-ordering
would need to do so. Thus, protocols like tcp/ip, which already handle
re-ordering by themselves, won't be affected. They will only suffer from
the increased skb size by sizeof(skb->rx_seqno). Maybe a convention to
hook the rx_seqno into skb->cb could even work around this.

It's not trivial yet when different upper layer protocols receive frames
from the same device: the NET_RX_SOFTIRQ handler would need to map the
device specific sequence numbers to consecutive protocol specific sequence
numbers before calling ptype->func().

Henner
Re: Q: sock output serialization
> With the current scheme, lapb first acknowledges reception of the frame
> and after that, netif_rx() might still discard it -- which is evil.

This might screw things a bit. Can you defer it, to say first call netif_rx() and then acknowledge, or is this hard-coded into the f/ware? I think it's fixable to make it do the RR/RNR after bouncing it up the stack.
Re: Q: sock output serialization
On Sat, Sep 16, 2000 at 11:39:45PM +0200, Henner Eisen wrote:
>	int netif_would_drop(dev)
>	{
>		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>		    || ((queue->input_pkt_queue.qlen) && (queue->throttle));
>	}
>
> would fulfil those requirements.

It would just be racy. You test, get a "would not drop", and then another, different interrupt could deliver another packet before you can, and fill the queue. Jamal's extended netif_rx probably makes more sense, because it can be atomic.

-Andi
Re: Q: sock output serialization
Hi,

> "kuznet" == kuznet <[EMAIL PROTECTED]> writes:

kuznet> Hello!

>> scheduler may re-order frames

kuznet> It cannot, provided sender holds order until
kuznet> dev_queue_xmit().

But what if I set different skb->priority? ;) Well, that would be my fault then ..

>> or drop them.

kuznet> Yes. And if you share _single_ device both for reliable
kuznet> and unreliable services, you have to make special tricks.

Well, I think this problem will not occur. For shared service, we will use a datalink protocol running above netif (e.g. mixed X.25 and IP over ethernet, where X.25 runs on top of 802.2 LLC.2, which will be implemented above netif). And for smart (firmware lapb) interfaces, which are the real problem, we won't need to support shared service.

>> be fixed by providing a special LAPB network scheduler which takes
>> care about preserving reliable LAPB semantics.

kuznet> Yes. ATM CLIP already does this, look at atm/clip.c and
kuznet> sch_atm.c to get an example.

Yes. But the above seems to be a network scheduler specialized for passing IP down to an ATM tunnel. What I had in mind would correspond to a special scheduler for an atm net_device (but ATM does not use the standard linux net_device).

>> that value before calling netif_rx(). Then upper layers worried
>> about netif_rx() re-ordering can detect this and act appropriately.

kuznet> etc.

kuznet> No!

kuznet> In fact, it is mathematical fact, that as soon as order is
kuznet> broken once it is _impossible_ to restore it back. No
kuznet> valid actions are invented to do this f.e. for TCP.

Agreed.

kuznet> Though with lapb the situation is different: it cannot
kuznet> lose frames, this changes the situation.

Unfortunately, netif_rx might still lose frames, and it's concurrent netif_rx() which re-orders the frames. Thus, we cannot take advantage of reliable LAPB below netif_rx() when packet loss and re-ordering occurred above netif_rx().

kuznet> In any case, order must not be broken, if it is
kuznet> essential. That's answer.

I see.

Apparently, IRQ affinity seems to be the only simple and cheap solution to the re-ordering problem.

Henner
Re: Q: sock output serialization
Hi,

> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

>> However, for drivers which support intelligent controllers (with
>> lapb in firmware) this is not an option and the problem will persist.

Alan> 'Smart hardware is broken' repeat .. ;) - but yes its an
Alan> issue there. These cards could bypass netif_rx and call
Alan> directly to the lapb top end though ?

What about a function to query the state of the backlog queue? Something like

	if (netif_would_drop(dev)) {
		kfree_skb(skb);
		/* optionally, if supported by lapb implementation: */
		set_lapb_rx_busy_condition();
		return;
	}
	clear_lapb_rx_busy_condition(); /* if supported */
	pass_frame_to_lapb(lapb, skb);

The key point is that we need to query the backlog queue and discard the skb before lapb can acknowledge it. Simply discarding it when the backlog is known to be congested should be sufficient. It could however improve performance if lapb did additionally flow control the peer. With the current scheme, lapb first acknowledges reception of the frame and after that, netif_rx() might still discard it -- which is evil.

Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx() will reliably not drop the frame), this should make the netif_rx path reliable. It seems that, on 2.4.0, something like

	int netif_would_drop(dev)
	{
		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
		    || ((queue->input_pkt_queue.qlen) && (queue->throttle));
	}

would fulfil those requirements.

Henner
Re: Q: sock output serialization
Seems all the good network stuff gets discussed on l-k instead ;-) (hint: some people are not subscribed to l-k)

On Sat, 16 Sep 2000, Henner Eisen wrote:
> What about a function to query the state of the backlog queue?
> Something like
>
>	if (netif_would_drop(dev)) {
>		kfree_skb(skb);
>		/* optionally, if supported by lapb implementation: */
>		set_lapb_rx_busy_condition();
>		return;
>	}
>	clear_lapb_rx_busy_condition(); /* if supported */
>	pass_frame_to_lapb(lapb, skb);
>
> The key point is that we need to query the backlog queue and discard
> the skb before lapb can acknowledge it. Simply discarding it when the
> backlog is known to be congested should be sufficient. It could however
> improve performance if lapb did additionally flow control the peer.

This should be resolved by a patch i am about to submit based on the OLS talk. netif_rx() now returns a value which tells you the congestion levels when you give it a packet (change from void netif_rx())

---
/* return values:
 * BLG_CNG_NONE (no congestion)
 * BLG_CNG_LOW  (low congestion)
 * BLG_CNG_MOD  (moderate congestion)
 * BLG_CNG_HIGH (high congestion)
 * BLG_CNG_DROP (packet was dropped)
 */
---

> With the current scheme, lapb first acknowledges reception of the frame
> and after that, netif_rx() might still discard it -- which is evil.

This might screw things a bit. Can you defer to say first call netif_rx() then acknowledge, or is this hard-coded into the f/ware?

> Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx()
> will reliably not drop the frame), this should make the netif_rx path
> reliable. It seems that, on 2.4.0, something like
>
>	int netif_would_drop(dev)
>	{
>		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>		    || ((queue->input_pkt_queue.qlen) && (queue->throttle));
>	}
>
> would fulfil those requirements.

I think this would make it a little more complex than necessary; the queue state might change right after you return from netif_would_drop() -- maybe not, i am just hypothesizing.

** You can still create the netif_would_drop() -- it just sounds too expensive to me, since to be really sure no packet of yours is dropped, you have to make this call for every packet.

If you can't defer the acknowledgement until netif_rx() returns, then what we could do instead is:

1) for devices that are registered with hardware flow control == you have to register as a CONFIG_NET_HW_FLOWCONTROL device:

 a) let them queue that last packet before they are shut up; the assumption is they respect the protocol and will 'back off' after that.

 b) return BLG_CNG_WOULD_DROP instead to the device and give it the responsibility to free the skb or store it wherever it wants, but not in the backlog.

I personally prefer a). Reason: if we have done all the work so far (context switch etc.) and we know the device is well behaved (meaning it is not going to send another packet without being told things are fine), then it is probably wiser to just let that packet get on the backlog queue.

cheers,
jamal
Re: Q: sock output serialization
> LAPB itself should be able to recover from reordering, although it is
> not optimized for this. It will just discard any received out-of-sequence
> frame. The discarded frames will be retransmitted later (exactly like
> frames which had been discarded due to CRC errors).

LAPB does not expect ever to see re-ordering. It's a point to point wire level MAC protocol.

> For drivers using the software lapb module implementation, the right fix
> would obviously be to move the lapb processing above the network interface.

Agreed

> However, for drivers which support intelligent controllers (with lapb
> in firmware) this is not an option and the problem will persist.

'Smart hardware is broken' repeat .. ;) - but yes its an issue there. These cards could bypass netif_rx and call directly to the lapb top end though?
Re: Q: sock output serialization
Hi,

> "David" == David S Miller <[EMAIL PROTECTED]> writes:

David> It smells rotten to the core, can someone tell me
David> exactly why reordering is strictly disallowed? I do not
David> even know how other OSes can handle this properly since
David> most, if not all, use the IRQ dynamic cpu targeting
David> facilities of various machines so LAPB is by definition
David> broken there too.

LAPB itself should be able to recover from reordering, although it is not optimized for this. It will just discard any received out-of-sequence frame. The discarded frames will be retransmitted later (exactly like frames which had been discarded due to CRC errors).

The problem is the X.25 packet layer (layer 3). It assumes that the LAPB layer has already fixed any lost-frame and out-of-sequence problems and therefore does not provide an error recovery mechanism of its own. It will detect when frames are missing or out of sequence. But as it cannot recover from such errors, it will just initiate a reset procedure (discarding all currently queued frames, setting the state machine to a known state, and telling the network and the peer to also do so, before data transmission resumes. The upper layer is notified about the reset event; the task of recovering from the packet loss is left to the upper layer.)

David> I sense that usually, LAPB handles this issue at a
David> different level, in the hardware? Does LAPB specify how to
David> maintain reliable delivery and could we hook into this
David> "how" when we need to drop LAPB frames? Perhaps it is too
David> late by the time netif_rx is dealing with it.

The lapb protocol allows flow controlling the peer. So, if it were known in advance that netif_rx() would discard the frame, it could set its rx_busy condition. (The linux software lapb module however does not support this, but that problem is yet a different matter.) From looking at the netif_rx() source, it seems that CONFIG_NET_HW_FLOWCONTROL almost could provide the necessary state information for flow controlling the peer.

David> LAPB sounds like quite a broken protocol at the moment...
David> But I'm sure there are details which will emerge and clear
David> this all up.

Well, not just at the moment, it has always been like this. Thus, as we did not panic before, there is no reason to panic now. Actually, it's not the LAPB protocol itself that is broken, but the way of accessing it from the X.25 packet layer (a reliable datalink service is accessed via the unreliable dev_queue_xmit()/netif_rx() interface). I always wondered why it was done like this. Probably the possible problems were not realized during the early design stage and did not show up when testing. (The problems might be unlikely to occur in real-world scenarios. As real-world X.25 connections usually use only slow links (a few kByte/sec), it is very unlikely that the X.25 connection itself caused the NET_RX queue to overrun. It might only be triggered when the host is simultaneously flooded with other traffic from a local high speed lan network interface. Triggering SMP packet reordering problems with a slow X.25 link is probably even more unlikely.)

For drivers using the software lapb module implementation, the right fix would obviously be to move the lapb processing above the network interface. (We will need to provide a function call interface between the X.25 packet layer and the datalink layer anyway, once LLC.2 from the Linux-SNA project is merged and should be supported by X.25 as well.) However, for drivers which support intelligent controllers (with lapb in firmware) this is not an option and the problem will persist.

Henner
Re: Q: sock output serialization
> I sense that usually, LAPB handles this issue at a different
> level, in the hardware? Does LAPB specify how to maintain
> reliable delivery and could we hook into this "how" when we
> need to drop LAPB frames? Perhaps it is too late by the time
> netif_rx is dealing with it.

LAPB maintains a window of (normally 8) frames. When a frame is accepted it is acked (RR), or if there is no room it is rejected (RNR). Once it has been accepted with an RR, it can from that point onwards not get lost.

> LAPB sounds like quite a broken protocol at the moment... But I'm
> sure there are details which will emerge and clear this all up.

LAPB isn't broken, it's actually rather clever and ideal for tiny low power devices
Re: Q: sock output serialization
From: [EMAIL PROTECTED]
Date: Fri, 15 Sep 2000 21:07:38 +0400 (MSK DST)

   [ Dave, all this sounds bad. ]

Well, there are two things:

1) If exact sequencing is so important, then we can make a special netif_rx tasklet for these guys which serializes around a spinlock. Actually, even with this, how could we guarantee this still? Yes, IRQ affinity would need to force only a single CPU to receive interrupts from this LAPB device.

It smells rotten to the core; can someone tell me exactly why reordering is strictly disallowed? I do not even know how other OSes can handle this properly, since most, if not all, use the IRQ dynamic cpu targeting facilities of various machines, so LAPB is by definition broken there too.

2) Someone please show Alexey and myself how to process an input packet when out of memory and not drop any packets ;-)

I sense that usually, LAPB handles this issue at a different level, in the hardware? Does LAPB specify how to maintain reliable delivery and could we hook into this "how" when we need to drop LAPB frames? Perhaps it is too late by the time netif_rx is dealing with it.

LAPB sounds like quite a broken protocol at the moment... But I'm sure there are details which will emerge and clear this all up.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: Q: sock output serialization
Hello!

> But I realized another X.25 related SMP problem -- this time
> related to input. The protocol design assumes that the transmission
> path preserves the packet ordering. It seems that with 2.4.0 SMP, the
> ordering of the packets when received from the wire is not necessarily
> the same as when delivered to the protocol's receive method. Is this true?

This is true.

> recover from such errors transparently. Unfortunately, the current design
> assumes that the LAPB layer is performed below the network interface.

I.e. on hard irq? It was not a good idea.

> Although this allows to support controllers which implement LAPB in firmware,

This is a really difficult case.

> this seems to break the assumptions made by upper layers. The upper layer
> assumes that LAPB devices provide a reliable datalink service. But the Linux
> network interfaces do not preserve such reliable semantics. (Network
> interfaces may drop frames, e.g. when the NET_RX input queue overruns,

No way to fix. I do not know how to make this.

> and on SMP packet sequencing might change),

Though this one can be fixed by restricting affinity, all this smells like you cannot use netif_rx() for such devices. netif_rx() is used for normal "unreliable" devices; it loses any sense as soon as we require some reliability...

   [ Dave, all this sounds bad. ]

Alexey
Re: Q: sock output serialization
Hi, > "kuznet" == kuznet <[EMAIL PROTECTED]> writes: >> when sk->lock.users!=0. Is there a particular reason why such >> task queue does not exist? kuznet> Because it appeared to be useless overhead. I also kuznet> believed that it will be required in tcp, but one day I kuznet> understood that all the problems of these kind kuznet> dissolved. 8) Yes, probably, this should also hold for other protocols. I need to study the protocol specs for further details. But I realized another X.25-related SMP problem -- this time related to input. The protocol design assumes that the transmission path preserves the packet ordering. It seems that with 2.4.0 SMP, the ordering of the packets when received from the wire is not necessarily the same as when delivered to the protocol's receive method. Is this true? LAPB should be able to recover from such sequence errors. But the X.25 packet layer can only detect such a problem (and reset the connection). It cannot recover from such errors transparently. Unfortunately, the current design assumes that the LAPB layer is performed below the network interface. Although this allows supporting controllers which implement LAPB in firmware, it seems to break the assumptions made by upper layers. The upper layer assumes that LAPB devices provide a reliable datalink service. But the Linux network interfaces do not preserve such reliable semantics. (Network interfaces may drop frames, e.g. when the NET_RX input queue overruns, and on SMP the packet sequencing might change.) Henner
Re: Q: sock output serialization
Hello! > timer events where the protocol specs require immediate reaction and > which need to change socket state. For such events, it might not > be obvious how to defer them when sk->lock.users != 0. After some thinking, you will understand that "timer" and "immediate" are incompatible. TCP just defers such events, look into tcp_timer.c. Yes, you are right: the problem exists and you can try to solve this e.g. by queueing special control events to the backlog. > when sk->lock.users!=0. Is there a particular reason why such task queue > does not exist? Because it appeared to be useless overhead. I also believed that it would be required in tcp, but one day I understood that all the problems of this kind dissolved. 8) Alexey
Re: Q: sock output serialization
Hi, > "kuznet" == kuznet <[EMAIL PROTECTED]> writes: >> Anyway, it seems that I can already make use the lock_sock() >> infrastructure for fixing the output serialization, even >> without making the whole protocol stack SMP-aware at once. kuznet> Actually, the last task is not a rocket science as well. Yes. It seems the most critical part is changing timer events to honor sk->lock.users and doing sock_hold/put(). There might be timer events where the protocol specs require immediate reaction and which need to change socket state. For such events, it might not be obvious how to defer them when sk->lock.users != 0. While deferring socket input is explicitly supported (by processing the sk->backlog queue in release_sock()), there is no special support for deferring non-input events. Maybe, in addition to processing the sk->backlog queue, release_sock() could also run a backlog task_queue? Such a task_queue could be used by other events to defer actions when sk->lock.users!=0. Is there a particular reason why such a task queue does not exist? Henner
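The "backlog task queue" idea proposed above can be sketched in userspace. Nothing here is a real 2.4 API -- all names are illustrative -- but the shape mirrors how release_sock() drains sk->backlog for input: events arriving while the socket is owned (lock.users != 0) are queued, then replayed when the last owner releases the lock:

```c
/* Userspace sketch of deferring non-input events until the socket
 * lock is released.  All identifiers are illustrative assumptions,
 * not kernel symbols. */

#define MAX_DEFERRED 16

struct mini_sock {
    int lock_users;                               /* models sk->lock.users */
    void (*deferred[MAX_DEFERRED])(struct mini_sock *);
    int ndeferred;
    int state;                                    /* some protocol state */
};

/* Run the event now if the socket is unowned, else defer it. */
void sock_event(struct mini_sock *sk, void (*ev)(struct mini_sock *))
{
    if (sk->lock_users == 0)
        ev(sk);
    else if (sk->ndeferred < MAX_DEFERRED)
        sk->deferred[sk->ndeferred++] = ev;
}

void mini_lock(struct mini_sock *sk) { sk->lock_users++; }

/* Models release_sock(): replay everything queued while locked. */
void mini_release(struct mini_sock *sk)
{
    sk->lock_users--;
    if (sk->lock_users == 0) {
        int i;
        for (i = 0; i < sk->ndeferred; i++)
            sk->deferred[i](sk);
        sk->ndeferred = 0;
    }
}

/* Example event: a timer expiry that changes socket state. */
void timer_expired(struct mini_sock *sk) { sk->state++; }
```

The real kernel would additionally need the queue manipulation itself to be protected by bh_lock_sock(), since the event may fire from softirq context.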
Re: Q: sock output serialization
Hello! > Yes, I see. I did not realize before that the lock_sock and the > sk->backlog framework are not two independent things. They really > seem to be designed for team work only. Did I get this right? Yes. Actually, in 2.4 lock_sock() is also a semaphore and in some cases (e.g. for stateless datagram sockets) it is used as a pure semaphore. > applied seems to make socket programming as easy as in the old cli()/sti() > days again. What??? 8)8) No, it makes it much easier. 8) By the way: 1. lock_sock() is not much younger than cli()/sti(). 2. cli()/sti() has not been used by attended parts of networking for a looong time; it was not deprecated just yesterday either. > tcp also seems to use some additional protocol-global spinlocks Of course. > (like tcp_portalloc_lock). And this is redundant, to be honest. 8) > the spinlock. In that case, being preemptable would make a very essential > difference. Yes, of course. > Anyway, it seems that I can already make use the lock_sock() infrastructure > for fixing the output serialization, even without making the whole > protocol stack SMP-aware at once. Actually, the last task is not a rocket science as well. Alexey
Re: Q: sock output serialization
Hi, > "kuznet" == kuznet <[EMAIL PROTECTED]> writes: kuznet> Hello! kuznet> In input path you have a packet. Add it to backlog and kuznet> processing will be resumed after lock is released. Compare kuznet> with tcp. >> serializing the kick. Well, maybe my solution could still be >> simplified (maybe some test_and_set/clear_bit() magic could >> achieve the same). kuznet> Being legal in principle, using non-standard serialization kuznet> primitives is seriously deprecated. It is impossible to kuznet> maintain. In your case, it is even not evident that it does kuznet> not lose events with smp. Yes, I see. I did not realize before that the lock_sock and the sk->backlog framework are not two independent things. They really seem to be designed for team work only. Did I get this right? And I realize that the lock_sock framework is superior to my approach. It does not only serialize output, it also serializes output against input, such that other problems are solved as well (the current code, even after serializing output, could still suffer from atomicity problems when the input path interrupts the output path and modifies protocol control block variables in a non-atomic manner). The lock_sock framework, properly applied, seems to make socket programming as easy as in the old cli()/sti() days again. Basically, it seems to be a 'disable interrupts for this socket' mechanism. >> - introduce a protocol-global spinlock and protect >> protocol-global critical section by spin_lock_bh() instead of >> cli() kuznet> Why? It is not required. There are no reasons to protect kuznet> protocol as whole, if sockets are protected. Well, the term 'protocol global' was misleading. I should have said 'global to the protocol family'. E.g. there are currently some cli()/sti() pairs to protect socket list and routing table manipulations. tcp also seems to use some additional protocol-global spinlocks (like tcp_portalloc_lock). >> Can NET_TX_SOFTIRQ be preempted by NET_RX_SOFTIRQ or timer? kuznet> It cannot be preempted, but it is not very essential, kuznet> because all they can run in parallel on different cpus. The reason why I was asking is that I recently got IP and PPP tunneling over X.25 working (in-kernel). In that case, protocol output processing would be done from NET_TX_SOFTIRQ context, which is only allowed to bh_lock_sock(), but not lock_sock(). As bh_lock_sock() is just spin_lock() -- and not spin_lock_bh() -- this could stall the CPU if NET_TX_SOFTIRQ were preempted by a timer or NET_RX_SOFTIRQ while holding the spinlock. In that case, being preemptable would make a very essential difference. kuznet> Alexey Thanks for the insight. I hope this is sufficient to migrate the code. Not before 2.4.0-final, however :-). Anyway, it seems that I can already make use of the lock_sock() infrastructure for fixing the output serialization, even without making the whole protocol stack SMP-aware at once. Henner
Re: Q: sock output serialization
Hello! > I guess I'd also need to call lock_sock() from sendmsg(). And before > calling x25_kick from socket input path, I'd need to verify that > sk->lock.users is zero. If sk->lock.users was !=0, I'd need some atomic > variable anyway in order to defer the kick. In input path you have a packet. Add it to backlog and processing will be resumed after lock is released. Compare with tcp. > serializing the kick. Well, maybe my solution could still be simplified > (maybe some test_and_set/clear_bit() magic could achieve the same). Being legal in principle, using non-standard serialization primitives is seriously deprecated. It is impossible to maintain. In your case, it is even not evident that it does not lose events with smp. > - introduce a protocol-global spinlock and protect protocol-global > critical section by spin_lock_bh() instead of cli() Why? It is not required. There are no reasons to protect protocol as whole, if sockets are protected. > - protect all sock proto_ops methods by lock_sock() Yes. > - when bh functions need to be protected from sk state change, they > need to acquire bh_lock_sock() And check for sk->lock.users. If it is not zero, the operation is deferred. > - before bh (timer) functions change sk state, they need to acquire > bh_lock_sock and verify that sk->lock.users!=0 Yes. > - remove the SOCKOPS_WRAPPED() macro from the proto_ops Yes. And finally announce the protocol to be SMP-aware by setting the data field of packet_type to 1. > Can NET_TX_SOFTIRQ be preempted by NET_RX_SOFTIRQ or timer? It cannot be preempted, but it is not very essential, because all they can run in parallel on different cpus. Alexey
Q: sock output serialization
Hi, Is the following fix clean or are there better solutions? There is a race condition in the Linux X.25 protocol stack. The stack has an x25_kick() function which dequeues as many skbs from sk->write_queue as the send window allows and sends them downwards. This kick function is called from send_msg() as well as (when an acknowledge arrives) from the input path of the socket code. The latter is usually called from NET_RX_SOFTIRQ and might therefore interrupt an x25_kick() executed on behalf of send_msg(). (This is a problem because it could mess up packet order, which needs to be preserved with X.25.) The fix I came up with consists of replacing the current x25_kick() by an inlined __x25_kick() and defining a new x25_kick() which wraps the old function as follows:

	atomic_inc(&sk->protinfo.x25->kick_it);
	if (atomic_read(&sk->protinfo.x25->kick_it) != 1)
		return;
	do {
		__x25_kick(sk);
	} while (!atomic_dec_and_test(&sk->protinfo.x25->kick_it));

This makes __x25_kick() single-threaded per socket; the first thread in __x25_kick() will also perform the work for possible other threads which have tried to interrupt the first thread. Is this a proper approach or are there better solutions (e.g. more SMP friendly, less overhead on certain hardware archs)? What about 2.2.x? This should also work for 2.2.x, but for 2.2.x I could also wrap __x25_kick() inside {start,stop}_bh_atomic(), I guess. Henner
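The wrapper above can be modelled in userspace with C11 atomics to show why it is single-threaded per socket: only the caller that raises the counter from zero runs __x25_kick(), and it keeps looping to cover callers that arrived in the meantime. This is an illustrative single-process model (the pending_softirq flag fakes an interrupting input path), not the actual kernel patch:

```c
/* Userspace model of the proposed x25_kick() wrapper using C11 atomics
 * in place of the kernel's atomic_t.  The counter guarantees that only
 * the first caller performs the work, and that it repeats the work once
 * for the callers it absorbed. */
#include <stdatomic.h>

static atomic_int kick_it;
static int kick_runs;        /* counts __x25_kick() invocations */
static int pending_softirq;  /* models a softirq arriving mid-kick */

static void x25_kick(void);  /* forward decl for the reentrant call */

static void __x25_kick(void)
{
    kick_runs++;
    if (pending_softirq) {   /* simulate the input path interrupting us */
        pending_softirq = 0;
        x25_kick();          /* nested call only bumps the counter */
    }
}

static void x25_kick(void)
{
    /* atomic_inc + read != 1 in the original: the previous value tells
     * us whether another thread is already inside the kick loop. */
    if (atomic_fetch_add(&kick_it, 1) != 0)
        return;              /* someone else will do the work for us */
    do {
        __x25_kick();
        /* atomic_dec_and_test in the original: stop once we have
         * covered every caller that incremented the counter. */
    } while (atomic_fetch_sub(&kick_it, 1) != 1);
}
```

In the simulated run, the kick that arrives while the first one is in progress causes exactly one extra pass through __x25_kick(), and the counter returns to zero afterwards.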