Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-21 Thread Pavel Machek
Hi!

  If it is only one place, why not pre-allocate one "I'm sick now"
  skb and hold onto it. Any bigger solution seems to snowball into
  a huge mess.
 
 But the problem is even sending/receiving a single packet can cause 
 multiple dynamic allocations in the networking path all the way from
 the sockets layer-transport-ip-driver.
 To successfully send a packet, we may have to do arp, send acks and 
 create cached routes etc. So my patch tried to identify the allocations
 that are needed to successfully send/receive packets over a pre-established
 socket and adds a new flag GFP_CRITICAL to those calls.
 This doesn't make any difference when we are not in emergency. But when
 we go into emergency, VM will try to satisfy these allocations from a
 critical pool if the normal path leads to failure.
 
 We go into emergency when some management app detects that a swap device
 is about to fail (we are not yet in OOM, but will enter OOM soon). In order
 to avoid entering OOM, we need to send a message over a critical socket to
 a remote server that can initiate failover and switch to a different swap
 device. The switchover will happen within 2 minutes after it is initiated.
 In a cluster environment, the remote server also sends a message to other
 nodes which are also running the management app so that they also enter
 emergency. Once we successfully switch to a different swap device, the remote
 server sends a message to all the nodes and they come out of emergency.
 
 During the period of emergency, all other communications can block. But
 guaranteeing the successful delivery of the critical messages will help
 in making sure that we do not enter OOM situation.

Why not do it the other way? If you don't hear from me for 2 minutes,
do a switchover. Then all you have to do is _not_ to send a packet --
easier to do.
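
A minimal sketch of this "silence means switch over" idea, written as a small userspace C model of the remote server's bookkeeping. The struct, the function names, and the fixed 120-second constant are hypothetical and only illustrate the timeout logic; nothing here comes from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Assumed timeout: the "2 minutes" of silence mentioned above. */
#define HEARTBEAT_TIMEOUT_SEC 120

/* Per-node state kept by the remote server (illustrative layout). */
struct node_watch {
    time_t last_heard;  /* when the last heartbeat arrived */
    bool   switched;    /* a switchover has already been initiated */
};

/* Record a heartbeat received from the node. */
void node_heartbeat(struct node_watch *n, time_t now)
{
    assert(n != NULL);
    n->last_heard = now;
}

/*
 * Called periodically by the server. Returns true exactly once: when the
 * node has been silent for longer than the timeout and no switchover has
 * been started for it yet.
 */
bool node_needs_switchover(struct node_watch *n, time_t now)
{
    assert(n != NULL);
    if (n->switched)
        return false;
    if (difftime(now, n->last_heard) > HEARTBEAT_TIMEOUT_SEC) {
        n->switched = true;
        return true;
    }
    return false;
}
```

With this shape the failing node never has to allocate anything to ask for help; the cost is that the server must already know what to switch to.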

Anything else seems overkill.
Pavel
-- 
Thanks, Sharp!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-21 Thread David Stevens
 Why not do it the other way? If you don't hear from me for 2 minutes,
 do a switchover. Then all you have to do is _not_ to send a packet --
 easier to do.
 
 Anything else seems overkill.
 Pavel

Because in some of the scenarios, including ours, it isn't a
simple failover to a known alternate device or configuration --
it is reconfiguring dynamically with information received on a
socket from a remote machine (while the swap device is unavailable).
Limited socket communication without allocating new memory
that may not be available is the problem definition. Avoiding the
problem in the first place (your solution) is effective if you
can do it, of course. The trick is to solve the problem when you
can't avoid it. :-)

+-DLS



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-16 Thread Bodo Eggert
David S. Miller [EMAIL PROTECTED] wrote:

 The idea to mark, for example, IPSEC key management daemon's sockets
 as critical is flawed, because the key management daemon could hit a
 swap page over the iSCSI device.  Don't even start with the idea to
 lock the IPSEC key management daemon into ram with mlock().

How are you going to swap in the key manager if you need the key manager
for doing this?


However, I'd prefer a system where you can't dirty more than (e.g.) 80% of
RAM unless you need this to maintain vital system activity, and not more
than 95% unless it will help to get more clean RAM (like the priority
inheritance suggestion from this thread). I'd expect this to at least
significantly reduce thrashing and give a very good chance of recovering
from memory pressure. Of course the implementation won't be easy,
especially if userspace applications need to inherit priority from
different code paths, but in theory, it can be done.
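
The two-tier limit can be sketched as a small C predicate. This is only a model of the policy as stated, using the example figures of 80% and 95%; the enum, the function name, and the reading that reclaim-related dirtying is also allowed between the two limits are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Why a caller wants to dirty a page (hypothetical classification). */
enum dirty_purpose {
    DIRTY_ORDINARY,  /* normal application writes */
    DIRTY_VITAL,     /* needed to maintain vital system activity */
    DIRTY_RECLAIM    /* will help to get more clean RAM */
};

#define DIRTY_LIMIT_ORDINARY_PCT 80  /* example figure from the text */
#define DIRTY_LIMIT_VITAL_PCT    95  /* example figure from the text */

/* dirty_pct is the percentage of RAM that is currently dirty (0..100). */
bool may_dirty_page(unsigned dirty_pct, enum dirty_purpose why)
{
    assert(dirty_pct <= 100);
    if (dirty_pct < DIRTY_LIMIT_ORDINARY_PCT)
        return true;                    /* below 80%: anyone may dirty */
    if (dirty_pct < DIRTY_LIMIT_VITAL_PCT)
        return why != DIRTY_ORDINARY;   /* 80..95%: vital or reclaim only */
    return why == DIRTY_RECLAIM;        /* 95% and up: reclaim only */
}
```

The hard part noted above, propagating the purpose through different code paths, is exactly what this sketch leaves out: it assumes the caller already knows its purpose.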

-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread David S. Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Wed, 14 Dec 2005 23:37:37 -0800 (PST)

 Instead, you seem to be suggesting in_emergency to be set dynamically
 when we are about to run out of ATOMIC memory. Is this right?

Not when we run out, but rather when we reach some low water mark, the
critical sockets would still use GFP_ATOMIC memory but only
critical sockets would be allowed to do so.
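
As a rough illustration of this low-water-mark gating, here is a userspace C model. The struct and function names are hypothetical, and the real allocator has no such single check; the point is only that critical and ordinary sockets draw from the same atomic memory and differ in who may still allocate below the mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the memory available to GFP_ATOMIC allocations. */
struct atomic_reserve {
    unsigned long free_pages;     /* pages currently free */
    unsigned long low_watermark;  /* at or below this, only critical sockets allocate */
};

/* Decide whether an atomic allocation for this socket may proceed. */
bool atomic_alloc_allowed(const struct atomic_reserve *r, bool sk_critical)
{
    assert(r != NULL);
    if (r->free_pages == 0)
        return false;               /* nothing left for anyone */
    if (r->free_pages > r->low_watermark)
        return true;                /* above the mark: no restriction */
    return sk_critical;             /* at or below the mark: critical only */
}
```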

But even this has faults, consider the IPSEC scenario I mentioned, and
this applies to any kind of encapsulation actually, even simple
tunneling examples can be concocted which make the critical socket
idea fail.

The knee jerk reaction is mark IPSEC's sockets critical, and mark the
tunneling allocations critical, and... and...  well you have
GFP_ATOMIC then my friend.

In short, these separate page pool and critical socket ideas do
not work and we need a different solution, I'm sorry folks spent so
much time on them, but they are heavily flawed.


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread Arjan van de Ven
On Thu, 2005-12-15 at 00:21 -0800, David S. Miller wrote:
 From: Sridhar Samudrala [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 23:37:37 -0800 (PST)
 
  Instead, you seem to be suggesting in_emergency to be set dynamically
  when we are about to run out of ATOMIC memory. Is this right?
 
 Not when we run out, but rather when we reach some low water mark, the
 critical sockets would still use GFP_ATOMIC memory but only
 critical sockets would be allowed to do so.
 
 But even this has faults, consider the IPSEC scenario I mentioned, and
 this applies to any kind of encapsulation actually, even simple
 tunneling examples can be concocted which make the critical socket
 idea fail.
 
 The knee jerk reaction is mark IPSEC's sockets critical, and mark the
 tunneling allocations critical, and... and...  well you have
 GFP_ATOMIC then my friend.
 
 In short, these separate page pool and critical socket ideas do
 not work and we need a different solution, I'm sorry folks spent so
 much time on them, but they are heavily flawed.

maybe it should be approached from the other side; having a way to mark
connections as low priority (say incoming http connections to your
webserver) or as non-critical/expendable would give the normal
GFP_ATOMIC ones a better chance in case of overload/DDOS etc. It's not
going to solve the VM deadlock issue wrt iscsi/nfs; however it might be
useful in the survive slashdot sense...



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread David Stevens
 Also, all this stuff is just a band aid because linux OOM behavior is so
 fucked up.

In our internal discussions, characterizing this as OOM came
up a lot, and I don't think of it as that at all. OOM is exactly what the
scheme is trying to avoid!

The actual situation we have in mind is a swap device management system
in a cluster where a remote system tells you (via socket communication to
a user-land management app) that a swap device is going to fail over and
it'd be a good idea not to do anything that requires paging out or
swapping for a short period of time. The socket communication must work,
but the system is not at all out of memory, and the important point is
that it never will be if you limit allocations to those things that are
required for the critical socket to work (and nothing/little else).
Receiver side allocations are unavoidable, because you don't know
if you can drop the packet or not until you look at it. Some
infrastructure must work. But everything else can fail or succeed based
on ordinary churn in ordinary memory pools, until the in_emergency
condition has passed.
The critical socket(s) simply have to be out of the zero-sum game
for the rest of the allocations, because those are the (only) path to
getting a working swap device again.

If you're out of memory without a network mechanism to get you more,
this doesn't do anything for you (and it isn't intended to). And if you
mark any socket that isn't going to get you failed over or otherwise
get you more swap, it isn't going to help you, either. It isn't a priority
scheme for low-memory, it's a failover mechanism that relies on
networking. There are exactly 2 priorities: critical (as in you might as
well crash if these aren't satisfied) and everything else.

Doing other, more general things that handle low memory, or OOM, or
identified priorities is great, but the problem we're interested in
solving here is really just about making socket communication work when
the alternative is a completely dead system. I think these patches do
that in a reasonable way. A better solution would be great, too, if
there is one. :-)

+-DLS



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread David Stevens
David S. Miller [EMAIL PROTECTED] wrote on 12/15/2005 12:58:05 AM:

 From: David Stevens [EMAIL PROTECTED]
 Date: Thu, 15 Dec 2005 00:44:52 -0800
 
  In our internal discussions
 
 I really wish this hadn't been discussed internally before being
 implemented.  Any such internal discussions are lost completely upon
 the community that ends up reviewing such a core and invasive patch
 such as this one.

I think those were more informal and less extensive than the
impression I gave you. I mean simply bouncing around incomplete
ideas and discussing some of the potential issues before coming
up with a prototype solution, which is intended to be the starting
point for community discussions (and the KS discussions, too). OOM
came up immediately (even when naming the problem), and it isn't how
I ever saw it.

The patches, of course, are intended to NOT be invasive, or any
more than they need to be, and they are not the solution, but
a solution. A completely different one that solves the problem
is just as good to me.

+-DLS



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread James Courtier-Dutton

Mitchell Blank Jr wrote:

James Courtier-Dutton wrote:

When I had the conversation with Matt at KS, the problem we were trying 
to solve was Memory pressure with network attached swap space.



s/swap space/writable filesystems/

You can hit these problems even if you have no swap.  Too much of the
memory becomes filled with dirty pages needing writeback -- then you lose
your NFS server's ARP entry at the wrong moment.  If you have a local disk
to swap to the machine will recover after a little bit of grinding, otherwise
it's all pretty much over.

The big problem is that as long as there's network I/O coming in it's
likely that pages you free (as the VM gets more and more desperate about
dropping the few remaining non-dirty pages) will get used for sockets
that AREN'T helping you recover RAM.  You really need to be able to tell
the whole network stack "we're in really rough shape here; ignore all RX
work unless it's going to help me get write ACKs back from my {NFS,iSCSI}
server".  My understanding is that is what this patchset is trying to
accomplish.

-Mitch




You are using the wrong hammer to crack your nut.
You should instead approach your problem of why the ARP entry gets lost.
For example, you could give critical priority to your TCP session,
but that still won't cure your ARP problem.
I would suggest that the best way to cure your ARP problem is to
increase the time between ARP cache refreshes.


James



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread Arjan van de Ven

 
 You are using the wrong hammer to crack your nut.
 You should instead approach your problem of why the ARP entry gets lost.
 For example, you could give critical priority to your TCP session,
 but that still won't cure your ARP problem.
 I would suggest that the best way to cure your arp problem, is to 
 increase the time between arp cache refreshes.

or turn it around entirely: all traffic is considered important
unless... and have a bunch of non-critical sockets (like http requests)
be marked non-critical.




Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread jamal
On Thu, 2005-15-12 at 12:47 +0100, Arjan van de Ven wrote:
  
  You are using the wrong hammer to crack your nut.
  You should instead approach your problem of why the ARP entry gets lost.
  For example, you could give critical priority to your TCP session,
  but that still won't cure your ARP problem.
  I would suggest that the best way to cure your arp problem, is to 
  increase the time between arp cache refreshes.
 
 or turn it around entirely: all traffic is considered important
 unless... and have a bunch of non-critical sockets (like http requests)
 be marked non-critical.

The big hole punched by DaveM is that of dependencies: a http tcp
connection is tied to ICMP or the IPSEC example given; so you need a lot
more intelligence than just what your app is knowledgeable about at its
level. 
You can't really do this shit at the socket level. You need to do it much
earlier.
At runtime, when lower memory thresholds get crossed, you kick off
classification of which packets need to be dropped, using something along
the lines of stateful/connection tracking. When things get better, you
undo it.
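
A minimal userspace C sketch of the "turn classification on below a threshold, undo it when things get better" behaviour. Using two separate thresholds (hysteresis) is an assumption added here to avoid flapping, and the boolean flow_is_needed stands in for whatever a connection-tracking lookup would report; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory-pressure state for the early classification stage. */
struct pressure_state {
    unsigned long enter_below;  /* start classifying when free pages drop below this */
    unsigned long leave_above;  /* stop once free pages rise above this */
    bool classifying;           /* true while drop classification is active */
};

/* Re-evaluate the state whenever the free page count changes. */
void pressure_update(struct pressure_state *s, unsigned long free_pages)
{
    assert(s != NULL && s->enter_below <= s->leave_above);
    if (!s->classifying && free_pages < s->enter_below)
        s->classifying = true;
    else if (s->classifying && free_pages > s->leave_above)
        s->classifying = false;
}

/*
 * Early receive-path decision. flow_is_needed is the result of a
 * (not modelled) stateful lookup: does this flow help us recover?
 */
bool should_drop(const struct pressure_state *s, bool flow_is_needed)
{
    assert(s != NULL);
    return s->classifying && !flow_is_needed;
}
```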

cheers,
jamal



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread Arjan van de Ven
On Thu, 2005-12-15 at 08:00 -0500, jamal wrote:
 On Thu, 2005-15-12 at 12:47 +0100, Arjan van de Ven wrote:
   
   You are using the wrong hammer to crack your nut.
   You should instead approach your problem of why the ARP entry gets lost.
   For example, you could give critical priority to your TCP session,
   but that still won't cure your ARP problem.
   I would suggest that the best way to cure your arp problem, is to 
   increase the time between arp cache refreshes.
  
  or turn it around entirely: all traffic is considered important
  unless... and have a bunch of non-critical sockets (like http requests)
  be marked non-critical.
 
 The big hole punched by DaveM is that of dependencies: a http tcp
 connection is tied to ICMP or the IPSEC example given; so you need a lot
 more intelligence than just what your app is knowledgeable about at its
 level. 

yeah well sort of. You're right of course, but that also doesn't mean
you can't give hints from the other side. Like "data for this socket is
NOT critically important". It gets tricky if you only do it for OOM stuff;
because then that one ACK packet could cause a LOT of memory to be
freed, and as such can be important for the system even if the socket
isn't.




Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-15 Thread Sridhar Samudrala
On Thu, 2005-12-15 at 00:21 -0800, David S. Miller wrote:
 From: Sridhar Samudrala [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 23:37:37 -0800 (PST)
 
  Instead, you seem to be suggesting in_emergency to be set dynamically
  when we are about to run out of ATOMIC memory. Is this right?
 
 Not when we run out, but rather when we reach some low water mark, the
 critical sockets would still use GFP_ATOMIC memory but only
 critical sockets would be allowed to do so.
 
 But even this has faults, consider the IPSEC scenario I mentioned, and
 this applies to any kind of encapsulation actually, even simple
 tunneling examples can be concocted which make the critical socket
 idea fail.
 
 The knee jerk reaction is mark IPSEC's sockets critical, and mark the
 tunneling allocations critical, and... and...  well you have
 GFP_ATOMIC then my friend.

I would like to mention another reason why we need to have a new 
GFP_CRITICAL flag for an allocation request. When we are in emergency,
even the GFP_KERNEL allocations for a critical socket should not 
sleep. This is because the swap device may have failed and we would
like to communicate this event to a management server over the 
critical socket so that it can initiate the failover.

We are not trying to solve the swapping-over-network problem. It is much
simpler. The critical sockets are to be used only to send/receive
a few critical messages reliably during a short period of emergency.

Thanks
Sridhar




[RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Sridhar Samudrala

This set of patches provides a TCP/IP emergency communication mechanism that
can be used to guarantee that high-priority communications over a critical
socket succeed even under very low memory conditions that last for a couple
of minutes. It uses the critical page pool facility provided by Matt's
patches that he posted recently on lkml.
http://lkml.org/lkml/2005/12/14/34/index.html

This mechanism provides a new socket option SO_CRITICAL that can be used to
mark a socket as critical. A critical connection used for emergency
communications has to be established and marked as critical before we enter
the emergency condition.
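
From userspace, marking a socket might look roughly like the sketch below. SO_CRITICAL is the option proposed by these patches and is not defined by mainline headers, so the numeric value here is a placeholder assumption, as is the choice of an int on/off argument at SOL_SOCKET level. The helper name is hypothetical.

```c
#include <sys/socket.h>

/*
 * SO_CRITICAL comes from the proposed patch set; mainline headers do not
 * define it. The value below is a placeholder so that the sketch compiles.
 */
#ifndef SO_CRITICAL
#define SO_CRITICAL 0x4352
#endif

/*
 * Mark an already-connected socket as critical. Must be called before the
 * emergency begins. Returns 0 on success, or -1 with errno set (for
 * example ENOPROTOOPT on a kernel without the patches).
 */
int mark_socket_critical(int fd)
{
    int on = 1;  /* assumed on/off semantics */
    return setsockopt(fd, SOL_SOCKET, SO_CRITICAL, &on, sizeof(on));
}
```

A management app would call this once, right after connect() or accept(), and treat a failure as "no emergency channel available".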

It uses the __GFP_CRITICAL flag introduced in the critical page pool patches
to mark an allocation request as critical, meaning that it should be satisfied
from the critical page pool if required. In the send path, this flag is passed
with all allocation requests that are made for a critical socket. But in the
receive path we do not know if a packet is critical or not until we receive it
and find the socket that it is destined to. So we treat all the allocation
requests in the receive path as critical.
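
The fallback behaviour can be modelled in a few lines of userspace C. This is not the code from the critical page pool patches: the struct, the flag name GFP_CRITICAL_MODEL, and the page counters are hypothetical, and gating the fallback on the emergency state is an assumption about how the pool is meant to be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GFP_CRITICAL_MODEL 0x1u  /* stand-in for __GFP_CRITICAL */

/* Hypothetical model of the two sources of pages. */
struct page_pools {
    size_t normal_free;    /* pages left in the ordinary allocator */
    size_t critical_free;  /* pages left in the pre-allocated critical pool */
    bool   in_emergency;   /* models the emergency state */
};

/* Try to take one page. Returns true if the allocation succeeds. */
bool alloc_page_model(struct page_pools *p, unsigned flags)
{
    assert(p != NULL);
    if (p->normal_free > 0) {           /* normal path first, always */
        p->normal_free--;
        return true;
    }
    /* Normal path failed: only critical requests may use the reserve. */
    if (p->in_emergency && (flags & GFP_CRITICAL_MODEL) &&
        p->critical_free > 0) {
        p->critical_free--;
        return true;
    }
    return false;
}
```

Outside an emergency the flag changes nothing in this model, which matches the intent that critical allocations behave like ordinary ones until the normal path starts failing.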

The critical page pool patches also introduce a global flag
'system_in_emergency' that is used to indicate an emergency situation (could be
a low memory condition). When this flag is set, any incoming packets that belong
to non-critical sockets are dropped as soon as possible in the receive path.
This is necessary to prevent incoming non-critical packets from consuming memory
from the critical page pool.
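
The receive-side rule reduces to a single check made once the destination socket is known. The following userspace C model shows the decision only; the types, the setter, and the flag variable are hypothetical stand-ins, not the kernel structures touched by the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with the critical mark. */
struct sock_model {
    bool critical;
};

enum rx_verdict { RX_DELIVER, RX_DROP };

/* Models the global system_in_emergency flag. */
static bool system_in_emergency_model;

void set_emergency(bool on)
{
    system_in_emergency_model = on;
}

/*
 * Called after socket lookup. During an emergency, packets for
 * non-critical sockets are dropped so that their memory is freed
 * immediately instead of sitting in a receive queue.
 */
enum rx_verdict rx_classify(const struct sock_model *sk)
{
    assert(sk != NULL);
    if (system_in_emergency_model && !sk->critical)
        return RX_DROP;
    return RX_DELIVER;
}
```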

I would appreciate any feedback or comments on this approach.

Thanks
Sridhar


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Andi Kleen
 I would appreciate any feedback or comments on this approach.

Maybe I'm missing something but wouldn't you need a separate critical
pool (or at least a reservation) for each socket to be safe against deadlocks?

Otherwise if a critical socket needs e.g. 2 pages to finish something
and 2 critical sockets are active they can each steal the last pages
from each other and deadlock.
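
The scenario can be reproduced with a toy C simulation: two sockets that each need 2 pages, taking one page at a time in turn. With a shared pool of 2 pages both stall holding one page each; with a 2-page reservation per socket both finish. Everything here (struct, names, round-robin order, returning finished pages to the shared pool) is a simplifying assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSOCK 2

struct sim {
    unsigned shared_free;      /* pages in the shared critical pool */
    unsigned reserved[NSOCK];  /* per-socket reservations (0 if unused) */
    unsigned held[NSOCK];      /* pages each socket currently holds */
    unsigned need;             /* pages a socket needs to finish */
};

/* Socket i takes one page: its own reservation first, then the shared pool. */
static bool take_page(struct sim *s, int i)
{
    if (s->reserved[i] > 0) { s->reserved[i]--; s->held[i]++; return true; }
    if (s->shared_free > 0) { s->shared_free--; s->held[i]++; return true; }
    return false;
}

/*
 * Sockets take turns, one page per turn. A socket that reaches `need`
 * finishes and releases its pages to the shared pool. Returns how many
 * sockets finished; fewer than NSOCK means the rest are stuck.
 */
unsigned run_interleaved(struct sim *s)
{
    bool done[NSOCK] = { false };
    unsigned finished = 0;
    bool progress = true;

    assert(s != NULL);
    while (progress && finished < NSOCK) {
        progress = false;
        for (int i = 0; i < NSOCK; i++) {
            if (done[i])
                continue;
            if (take_page(s, i))
                progress = true;
            if (s->held[i] == s->need) {
                s->shared_free += s->held[i];
                s->held[i] = 0;
                done[i] = true;
                finished++;
                progress = true;
            }
        }
    }
    return finished;
}
```

Starting from `{ .shared_free = 2, .need = 2 }` no socket finishes, while `{ .reserved = {2, 2}, .need = 2 }` lets both complete.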

-Andi


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Sridhar Samudrala
On Wed, 2005-12-14 at 10:22 +0100, Andi Kleen wrote:
  I would appreciate any feedback or comments on this approach.
 
 Maybe I'm missing something but wouldn't you need an own critical
 pool (or at least reservation) for each socket to be safe against deadlocks?
 
 Otherwise if a critical socket needs e.g. 2 pages to finish something
 and 2 critical sockets are active they can each steal the last pages
 from each other and deadlock.

Here we are assuming that the pre-allocated critical page pool is big enough
to satisfy the requirements of all the critical sockets.

In the current critical page pool implementation, there is also a limitation
that only order-0 allocations (single page) are supported. I think in the
networking send/receive path, the only place where multi-page allocs are
requested is in the drivers if the MTU > PAGESIZE. But I guess the drivers
are getting updated to avoid > order-0 allocations.

Also during the emergency, we free the memory allocated for non-critical 
packets as quickly as possible so that it can be re-used for critical
allocations.

Thanks
Sridhar



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread David Stevens
 It has a lot
 more users that compete true, but likely the set of GFP_CRITICAL users
 would grow over time too and it would develop the same problem.

No, because the critical set is determined by the user (by setting
the socket flag).
The receive side has some things marked as critical until we
have processed enough to check the socket flag, but then they should
be released. Those short-lived allocations and frees are more or less
0 net towards the pool.
Certainly, it wouldn't work very well if every socket is
marked as critical, but with an adequate pool for the workload, I
expect it'll work as advertised (esp. since it'll usually be only one
socket associated with swap management that'll be critical).

+-DLS



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Jesper Juhl
On 12/14/05, Sridhar Samudrala [EMAIL PROTECTED] wrote:

 These set of patches provide a TCP/IP emergency communication mechanism that
 could be used to guarantee high priority communications over a critical socket
 to succeed even under very low memory conditions that last for a couple of
 minutes. It uses the critical page pool facility provided by Matt's patches
 that he posted recently on lkml.
 http://lkml.org/lkml/2005/12/14/34/index.html

 This mechanism provides a new socket option SO_CRITICAL that can be used to
 mark a socket as critical. A critical connection used for emergency

So now everyone writing commercial apps for Linux is going to set
SO_CRITICAL on sockets in their apps so their apps can survive better
under pressure than the competitors' apps, and clueless programmers all
over are going to think "cool, with this I can make my app more
important than everyone else's, I'm going to use this".  When everyone
and his dog starts to set this, what's the point?


 communications has to be established and marked as critical before we enter
 the emergency condition.

 It uses the __GFP_CRITICAL flag introduced in the critical page pool patches
 to indicate an allocation request as critical and should be satisfied from the
 critical page pool if required. In the send path, this flag is passed with all
 allocation requests that are made for a critical socket. But in the receive
 path we do not know if a packet is critical or not until we receive it and
 find the socket that it is destined to. So we treat all the allocation
 requests in the receive path as critical.

 The critical page pool patches also introduces a global flag
 'system_in_emergency' that is used to indicate an emergency situation(could be
 a low memory condition). When this flag is set any incoming packets that belong
 to non-critical sockets are dropped as soon as possible in the receive path.

Hmm, so if I fire up an app that has SO_CRITICAL set on a socket and
can then somehow put a lot of memory pressure on the machine I can
cause traffic on other sockets to be dropped.. hmmm.. sounds like
something to play with to create new and interesting DoS attacks...


 This is necessary to prevent incoming non-critical packets to consume memory
 from critical page pool.

 I would appreciate any feedback or comments on this approach.


To be a little serious, it sounds like something that could be used to
cause trouble and something that will lose its usefulness once enough
people start using it (for valid or invalid reasons), so what's the
point...


--
Jesper Juhl [EMAIL PROTECTED]
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Ben Greear

Jesper Juhl wrote:


To be a little serious, it sounds like something that could be used to
cause trouble and something that will lose its usefulness once enough
people start using it (for valid or invalid reasons), so what's the
point...


It could easily be a user-configurable option in an application.  If
DOS is a real concern, only let this work for root users...

Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread James Courtier-Dutton

Jesper Juhl wrote:

On 12/14/05, Sridhar Samudrala [EMAIL PROTECTED] wrote:


These set of patches provide a TCP/IP emergency communication mechanism that
could be used to guarantee high priority communications over a critical socket
to succeed even under very low memory conditions that last for a couple of
minutes. It uses the critical page pool facility provided by Matt's patches
that he posted recently on lkml.
   http://lkml.org/lkml/2005/12/14/34/index.html

This mechanism provides a new socket option SO_CRITICAL that can be used to
mark a socket as critical. A critical connection used for emergency



So now everyone writing commercial apps for Linux are going to set
SO_CRITICAL on sockets in their apps so their apps can survive better
under pressure than the competitors' apps and clueless programmers all
over are going to think cool, with this I can make my app more
important than everyone elses, I'm going to use this.  When everyone
and his dog starts to set this, what's the point?




I don't think the initial patches that Matt did were intended for what 
you are describing.
When I had the conversation with Matt at KS, the problem we were trying 
to solve was Memory pressure with network attached swap space.

I came up with the idea that I think Matt has implemented.
Letting the OS choose which are critical TCP/IP sessions is fine. But 
letting an application choose is a recipe for disaster.


James


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Sridhar Samudrala
On Wed, 2005-12-14 at 20:49 +, James Courtier-Dutton wrote:
 Jesper Juhl wrote:
  On 12/14/05, Sridhar Samudrala [EMAIL PROTECTED] wrote:
  
 These set of patches provide a TCP/IP emergency communication mechanism that
 could be used to guarantee high priority communications over a critical socket
 to succeed even under very low memory conditions that last for a couple of
 minutes. It uses the critical page pool facility provided by Matt's patches
 that he posted recently on lkml.
 http://lkml.org/lkml/2005/12/14/34/index.html
 
 This mechanism provides a new socket option SO_CRITICAL that can be used to
 mark a socket as critical. A critical connection used for emergency
  
  
  So now everyone writing commercial apps for Linux are going to set
  SO_CRITICAL on sockets in their apps so their apps can survive better
  under pressure than the competitors' apps and clueless programmers all
  over are going to think cool, with this I can make my app more
  important than everyone elses, I'm going to use this.  When everyone
  and his dog starts to set this, what's the point?
  
  
 
 I don't think the initial patches that Matt did were intended for what 
 you are describing.
 When I had the conversation with Matt at KS, the problem we were trying 
 to solve was Memory pressure with network attached swap space.
 I came up with the idea that I think Matt has implemented.
 Letting the OS choose which are critical TCP/IP sessions is fine. But 
 letting an application choose is a recipe for disaster.

We could easily add a capable(CAP_NET_ADMIN) check so that this option can
be set only by privileged users.

Thanks
Sridhar



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread James Courtier-Dutton

Sridhar Samudrala wrote:

On Wed, 2005-12-14 at 20:49 +, James Courtier-Dutton wrote:


Jesper Juhl wrote:


On 12/14/05, Sridhar Samudrala [EMAIL PROTECTED] wrote:



These set of patches provide a TCP/IP emergency communication mechanism that
could be used to guarantee high priority communications over a critical socket
to succeed even under very low memory conditions that last for a couple of
minutes. It uses the critical page pool facility provided by Matt's patches
that he posted recently on lkml.
  http://lkml.org/lkml/2005/12/14/34/index.html

This mechanism provides a new socket option SO_CRITICAL that can be used to
mark a socket as critical. A critical connection used for emergency



So now everyone writing commercial apps for Linux are going to set
SO_CRITICAL on sockets in their apps so their apps can survive better
under pressure than the competitors' apps and clueless programmers all
over are going to think cool, with this I can make my app more
important than everyone elses, I'm going to use this.  When everyone
and his dog starts to set this, what's the point?




I don't think the initial patches that Matt did were intended for what 
you are describing.
When I had the conversation with Matt at KS, the problem we were trying 
to solve was Memory pressure with network attached swap space.

I came up with the idea that I think Matt has implemented.
Letting the OS choose which are critical TCP/IP sessions is fine. But 
letting an application choose is a recipe for disaster.



We could easily add a capable(CAP_NET_ADMIN) check to allow this option to
be set only by privileged users.

Thanks
Sridhar



Sridhar,

Have you actually thought about what would happen in a real world scenario?
There is no real world requirement for this sort of user land feature.
In memory pressure mode, you don't care about user applications. In 
fact, under memory pressure no user applications are getting scheduled.
All you care about is swapping out memory to achieve a net gain in free 
memory, so that the applications can then run ok again.


James


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Ben Greear

James Courtier-Dutton wrote:


Have you actually thought about what would happen in a real world scenario?
There is no real world requirement for this sort of user land feature.
In memory pressure mode, you don't care about user applications. In 
fact, under memory pressure no user applications are getting scheduled.
All you care about is swapping out memory to achieve a net gain in free 
memory, so that the applications can then run ok again.


Low 'ATOMIC' memory is different from the memory that user space typically
uses, so just because you can't allocate an SKB does not mean you are swapping
out user-space apps.

I have an app that can have 2000+ sockets open.  I would definitely like to
make the management and other important sockets have priority over others
in my app...

Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Sridhar Samudrala
On Wed, 2005-12-14 at 14:39 -0800, Ben Greear wrote:
 James Courtier-Dutton wrote:
 
  Have you actually thought about what would happen in a real world scenario?
  There is no real world requirement for this sort of user land feature.
  In memory pressure mode, you don't care about user applications. In 
  fact, under memory pressure no user applications are getting scheduled.
  All you care about is swapping out memory to achieve a net gain in free 
  memory, so that the applications can then run ok again.
 
 Low 'ATOMIC' memory is different from the memory that user space typically
 uses, so just because you can't allocate an SKB does not mean you are swapping
 out user-space apps.
 
 I have an app that can have 2000+ sockets open.  I would definitely like to
 make the management and other important sockets have priority over others
 in my app...

The scenario we are trying to address is also a management connection between
the nodes of a cluster and a server that manages the swap devices accessible by
all the nodes of the cluster. The critical connection is supposed to be used to
exchange status notifications of the swap devices so that failover can happen
and be propagated to all the nodes as quickly as possible. The management apps
will be pinned into memory so that they are not swapped out.

As such the traffic that flows over the critical sockets is not high but should
not stall even if we run into a memory constrained situation. That is the reason
why we would like to have a pre-allocated critical page pool which could be used
when we run out of ATOMIC memory.

Thanks
Sridhar




Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Matt Mackall
On Wed, Dec 14, 2005 at 09:55:45AM -0800, Sridhar Samudrala wrote:
 On Wed, 2005-12-14 at 10:22 +0100, Andi Kleen wrote:
   I would appreciate any feedback or comments on this approach.
  
  Maybe I'm missing something but wouldn't you need an own critical
  pool (or at least reservation) for each socket to be safe against deadlocks?
  
  Otherwise if a critical socket needs e.g. 2 pages to finish something
  and 2 critical sockets are active they can each steal the last pages
  from each other and deadlock.
 
 Here we are assuming that the pre-allocated critical page pool is big enough
 to satisfy the requirements of all the critical sockets.

Not a good assumption. A system can have between 1 and 1000 iSCSI
connections open and we certainly don't want to preallocate enough
room for 1000 connections to make progress when we might only have one
in use.
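
A minimal sketch of the deadlock Andi describes, written as a user-space toy
model rather than kernel code. The names crit_pool, crit_sock and run_shared
are hypothetical and do not come from the patches; the round-robin page
hand-out is an assumption made only to keep the example deterministic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one shared critical page pool (not the kernel's pool). */
struct crit_pool { int free_pages; };

/* A critical socket holds pages until it has all it needs (needed >= 1). */
struct crit_sock { int held; int needed; };

/* Take a single page from the shared pool, if any is left. */
static bool pool_take(struct crit_pool *p, struct crit_sock *s)
{
    if (p->free_pages == 0)
        return false;
    p->free_pages--;
    s->held++;
    return true;
}

/*
 * Hand out pages round-robin. A socket that reaches its requirement
 * finishes and returns its pages. Returns true if every socket finished,
 * false if a whole round passed without progress (deadlock).
 */
static bool run_shared(int pool_pages, struct crit_sock *socks, size_t n)
{
    struct crit_pool pool = { .free_pages = pool_pages };
    size_t done = 0;
    bool progress = true;

    while (done < n && progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            struct crit_sock *s = &socks[i];

            if (s->needed == 0)
                continue;               /* already finished */
            if (pool_take(&pool, s))
                progress = true;
            if (s->held >= s->needed) {
                pool.free_pages += s->held;   /* release everything */
                s->held = 0;
                s->needed = 0;
                done++;
                progress = true;
            }
        }
    }
    return done == n;
}
```

With a 2-page pool and two sockets that each need 2 pages, each socket ends up
holding one page and neither can finish. Reserving the full requirement per
socket avoids this, at the preallocation cost objected to above.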

I think we need a global receive pool and per-socket send pools.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Matt Mackall
On Wed, Dec 14, 2005 at 08:30:23PM -0800, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 19:39:37 -0800
 
  I think we need a global receive pool and per-socket send pools.
 
 Mind telling everyone how you plan to make use of the global receive
 pool when the allocation happens in the device driver and we have no
 idea which socket the packet is destined for?  What should be done for
 non-local packets being routed?  The device drivers allocate packets
 for the entire system, long before we know who the eventually received
 packets are for.  It is fully anonymous memory, and it's easy to
 design cases where the whole pool can be eaten up by non-local
 forwarded packets.

There needs to be two rules:

iff global memory critical flag is set
- allocate from the global critical receive pool on receive
- return packet to global pool if not destined for a socket with an
  attached send mempool

I think this will provide the desired behavior, though only
probabilistically. That is, we can fill the global receive pool with
uninteresting packets such that we're forced to drop critical ACKs,
but the boring packets will eventually be discarded as we walk up the
stack and we'll eventually have room to receive retried ACKs.
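
The two rules can be sketched roughly as follows. This is a user-space toy
model under stated assumptions, not code from any patch: rx_pool, toy_sock,
rx_alloc and rx_deliver are hypothetical names, and a NULL destination stands
in for a forwarded or otherwise uninteresting packet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy global critical receive pool, counted in buffers. */
struct rx_pool { int free_bufs; };

/* A socket is "critical" here if it has an attached send mempool. */
struct toy_sock { bool has_send_mempool; };

/*
 * Rule 1: while the global memory-critical flag is set, receive buffers
 * come from the global critical pool. Returns false if the packet has to
 * be dropped because the pool is empty.
 */
static bool rx_alloc(struct rx_pool *p, bool memory_critical)
{
    if (!memory_critical)
        return true;        /* normal allocator path, not modelled */
    if (p->free_bufs == 0)
        return false;
    p->free_bufs--;
    return true;
}

/*
 * Rule 2: once the stack knows the destination, a buffer that is not for
 * a socket with an attached send mempool goes straight back to the pool.
 * Returns true if the packet is kept for delivery.
 */
static bool rx_deliver(struct rx_pool *p, bool memory_critical,
                       const struct toy_sock *dst)
{
    if (!memory_critical)
        return dst != NULL;
    if (dst != NULL && dst->has_send_mempool)
        return true;        /* buffer stays out until the socket consumes it */
    p->free_bufs++;         /* boring packet: discard and refill the pool */
    return false;
}
```

In this model a pool filled with boring packets does drop the next critical
ACK, but each boring packet hands its buffer back at rx_deliver(), so a
retransmitted ACK can be received later.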

 I truly dislike these patches being discussed because they are a
 complete hack, and admittedly don't even solve the problem fully.  I
 don't have any concrete better ideas but that doesn't mean this stuff
 should go into the tree.

Agreed. I'm fairly convinced a full fix is doable, if you make a
couple assumptions (limited fragmentation), but will unavoidably be
less than pretty as it needs to cross some layers.

 I think GFP_ATOMIC memory pools are more powerful than they are given
 credit for.  There is nothing preventing the implementation of dynamic
 GFP_ATOMIC watermarks, and having critical socket behavior kick in
 in response to hitting those water marks.

There are two problems with GFP_ATOMIC. The first is that its users
don't pre-state their worst-case usage, which means sizing the pool to
reliably avoid deadlocks is impossible. The second is that there
aren't any guarantees that GFP_ATOMIC allocations are actually
critical in the needed-to-make-forward-VM-progress sense or will be
returned to the pool in a timely fashion.

So I do think we need a distinct pool if we want to tackle this
problem. Though it's probably worth mentioning that Linus was rather
adamantly against even trying at KS.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread David S. Miller
From: Matt Mackall [EMAIL PROTECTED]
Date: Wed, 14 Dec 2005 21:02:50 -0800

 There needs to be two rules:
 
 iff global memory critical flag is set
 - allocate from the global critical receive pool on receive
 - return packet to global pool if not destined for a socket with an
   attached send mempool

This shuts off a router and/or firewall just because iSCSI or NFS peed
in its pants.  Not really acceptable.

 I think this will provide the desired behavior

It's not desirable.

What if iSCSI is protected by IPSEC, and the key management daemon has
to process a security association expiration and negotiate a new one
in order for iSCSI to further communicate with its peer when this
memory shortage occurs?  It needs to send packets back and forth with
the remote key management daemon in order to do this, but since you
cut it off with this critical receive pool, the negotiation will never
succeed.

This stuff won't work.  It's not a generic solution and that's
why it has more holes than swiss cheese. :-)


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Andi Kleen
On Wed, Dec 14, 2005 at 08:30:23PM -0800, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 19:39:37 -0800
 
  I think we need a global receive pool and per-socket send pools.
 
 Mind telling everyone how you plan to make use of the global receive
 pool when the allocation happens in the device driver and we have no
 idea which socket the packet is destined for?  What should be done for

In theory one could use multiple receive queues on an intelligent enough
NIC, with the NIC distinguishing the sockets.

But that would still be a nasty "you need advanced hardware FOO to avoid
subtle problem Y" case. Also it would require lots of driver hacking.

And most NICs seem to have limits on the size of the socket tables for this,
which means you would end up in an "only N sockets supported safely"
situation, with N likely being quite small on common hardware.

I think the idea of the original poster was that just freeing non-critical
packets again after a short time would be good enough, but I'm a bit
sceptical about that.

 I truly dislike these patches being discussed because they are a
 complete hack, and admittedly don't even solve the problem fully.  I

I agree. 

 I think GFP_ATOMIC memory pools are more powerful than they are given
 credit for.  There is nothing preventing the implementation of dynamic

Their main problem is that they are used too widely and in a lot
of situations that aren't really critical.

-Andi



Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Nick Piggin

David S. Miller wrote:

From: Matt Mackall [EMAIL PROTECTED]
Date: Wed, 14 Dec 2005 21:02:50 -0800



There needs to be two rules:

iff global memory critical flag is set
- allocate from the global critical receive pool on receive
- return packet to global pool if not destined for a socket with an
 attached send mempool



This shuts off a router and/or firewall just because iSCSI or NFS peed
in its pants.  Not really acceptable.



But that should only happen (shut off a router and/or firewall) in cases
where we now completely deadlock and never recover, including shutting off
the router and firewall, because they don't have enough memory to recv
packets either.




I think this will provide the desired behavior



It's not desirable.

What if iSCSI is protected by IPSEC, and the key management daemon has
to process a security association expiration and negotiate a new one
in order for iSCSI to further communicate with its peer when this
memory shortage occurs?  It needs to send packets back and forth with
the remote key management daemon in order to do this, but since you
cut it off with this critical receive pool, the negotiation will never
succeed.



I guess IPSEC would be a critical socket too, in that case. Sure
there is nothing we can do if the daemon insists on allocating lots
of memory...


This stuff won't work.  It's not a generic solution and that's
why it has more holes than swiss cheese. :-)


True, it will have holes. I think something that is complementary and
would be desirable is to simply limit the amount of in-flight writeout
that things like NFS allow (or used to allow, haven't checked for a
while and there were noises about it getting better).

--
SUSE Labs, Novell Inc.





Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Stephen Hemminger
On Wed, 14 Dec 2005 21:23:09 -0800 (PST)
David S. Miller [EMAIL PROTECTED] wrote:

 From: Matt Mackall [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 21:02:50 -0800
 
  There needs to be two rules:
  
  iff global memory critical flag is set
  - allocate from the global critical receive pool on receive
  - return packet to global pool if not destined for a socket with an
    attached send mempool
 
 This shuts off a router and/or firewall just because iSCSI or NFS peed
 in its pants.  Not really acceptable.
 
  I think this will provide the desired behavior
 
 It's not desirable.
 
 What if iSCSI is protected by IPSEC, and the key management daemon has
 to process a security association expiration and negotiate a new one
 in order for iSCSI to further communicate with its peer when this
 memory shortage occurs?  It needs to send packets back and forth with
 the remote key management daemon in order to do this, but since you
 cut it off with this critical receive pool, the negotiation will never
 succeed.
 
 This stuff won't work.  It's not a generic solution and that's
 why it has more holes than swiss cheese. :-)

Also, all this stuff is just a band aid because linux OOM behavior is so
fucked up. The VM system just lets the user dig themselves into a huge
over commit, then we get into trying to change every other system to
compensate.  How about cutting things off earlier, and not falling
off the cliff? How about pushing out pages to swap earlier, when memory
pressure starts to get noticed? Then you can free those non-dirty pages
to make progress. Too many of the VM decisions seem to be made in favor
of keep-it-in-memory benchmark situations.


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Stephen Hemminger
On Thu, 15 Dec 2005 06:42:45 +0100
Andi Kleen [EMAIL PROTECTED] wrote:

 On Wed, Dec 14, 2005 at 08:30:23PM -0800, David S. Miller wrote:
  From: Matt Mackall [EMAIL PROTECTED]
  Date: Wed, 14 Dec 2005 19:39:37 -0800
  
   I think we need a global receive pool and per-socket send pools.
  
  Mind telling everyone how you plan to make use of the global receive
  pool when the allocation happens in the device driver and we have no
  idea which socket the packet is destined for?  What should be done for
 
 In theory one could use multiple receive queues on an intelligent enough
 NIC, with the NIC distinguishing the sockets.
 
 But that would still be a nasty "you need advanced hardware FOO to avoid
 subtle problem Y" case. Also it would require lots of driver hacking.
 
 And most NICs seem to have limits on the size of the socket tables for this,
 which means you would end up in an "only N sockets supported safely"
 situation, with N likely being quite small on common hardware.
 
 I think the idea of the original poster was that just freeing non-critical
 packets again after a short time would be good enough, but I'm a bit
 sceptical about that.
 
  I truly dislike these patches being discussed because they are a
  complete hack, and admittedly don't even solve the problem fully.  I
 
 I agree. 
 
  I think GFP_ATOMIC memory pools are more powerful than they are given
  credit for.  There is nothing preventing the implementation of dynamic
 
 Their main problem is that they are used too widely and in a lot
 of situations that aren't really critical.

Most of the use of GFP_ATOMIC is by stuff that could fail but can't
sleep waiting for memory. How about adding a GFP_NORMAL for allocations
while holding a lock?

#define GFP_NORMAL (__GFP_NOMEMALLOC)

Then get people to change the unneeded GFP_ATOMIC's to GFP_NORMAL in
places where the error paths are reasonable.
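
A rough illustration of the distinction being proposed, as a user-space toy
model. It assumes, for the sake of the sketch, that the only difference
between the two classes is whether a non-sleeping allocation may dip into a
reserve. The TOY_* flags and toy_zone are hypothetical and are not the
kernel's definitions.

```c
#include <stdbool.h>

/* Toy flags; names echo the discussion but are NOT the kernel's values. */
#define TOY_GFP_HIGH    0x1u            /* may dip into the reserve */
#define TOY_GFP_ATOMIC  (TOY_GFP_HIGH)  /* cannot sleep, may use the reserve */
#define TOY_GFP_NORMAL  0x0u            /* cannot sleep, may NOT use the reserve */

struct toy_zone {
    int free_pages;     /* pages currently free */
    int reserve_pages;  /* pages held back for allocations that truly need them */
};

/* Non-sleeping allocation of one page: fail immediately rather than wait. */
static bool toy_alloc_page(struct toy_zone *z, unsigned int flags)
{
    /* Callers without TOY_GFP_HIGH must leave the reserve untouched. */
    int floor = (flags & TOY_GFP_HIGH) ? 0 : z->reserve_pages;

    if (z->free_pages <= floor)
        return false;   /* caller takes its error path */
    z->free_pages--;
    return true;
}
```

Callers with a reasonable error path would pass TOY_GFP_NORMAL and simply see
failures earlier, which leaves the reserve for the allocations that cannot
afford to fail.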


Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Sridhar Samudrala
On Wed, 14 Dec 2005, David S. Miller wrote:

 From: Matt Mackall [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 19:39:37 -0800

  I think we need a global receive pool and per-socket send pools.

 Mind telling everyone how you plan to make use of the global receive
 pool when the allocation happens in the device driver and we have no
 idea which socket the packet is destined for?  What should be done for
 non-local packets being routed?  The device drivers allocate packets
 for the entire system, long before we know who the eventually received
 packets are for.  It is fully anonymous memory, and it's easy to
 design cases where the whole pool can be eaten up by non-local
 forwarded packets.

 I truly dislike these patches being discussed because they are a
 complete hack, and admittedly don't even solve the problem fully.  I
 don't have any concrete better ideas but that doesn't mean this stuff
 should go into the tree.

 I think GFP_ATOMIC memory pools are more powerful than they are given
 credit for.  There is nothing preventing the implementation of dynamic
 GFP_ATOMIC watermarks, and having critical socket behavior kick in
 in response to hitting those water marks.

Does this mean that you are OK with having a mechanism to mark the
sockets as critical and dropping the non-critical packets under
emergency, but you do not like having a separate critical page pool?

Instead, you seem to be suggesting in_emergency to be set dynamically
when we are about to run out of ATOMIC memory. Is this right?
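
To make the question concrete, here is a minimal sketch of what setting
in_emergency from watermarks might look like. It is a user-space toy model;
toy_atomic_pool, the two watermark fields and the hysteresis between them are
assumptions for illustration, not something taken from the patches or from
the suggestion above.

```c
#include <stdbool.h>

/* Toy view of the atomic pool with two hypothetical watermarks. */
struct toy_atomic_pool {
    int  free_pages;
    int  low_wmark;     /* enter emergency below this */
    int  high_wmark;    /* leave emergency at or above this */
    bool in_emergency;
};

/* Re-evaluate the emergency state after free_pages has changed. */
static void toy_update_emergency(struct toy_atomic_pool *p)
{
    if (p->free_pages < p->low_wmark)
        p->in_emergency = true;
    else if (p->free_pages >= p->high_wmark)
        p->in_emergency = false;
    /* Between the two marks the previous state is kept (hysteresis). */
}

/* Once the destination socket is known: keep or drop the packet. */
static bool toy_accept_packet(const struct toy_atomic_pool *p,
                              bool sock_is_critical)
{
    return !p->in_emergency || sock_is_critical;
}
```

Outside emergency every packet is accepted; in emergency only packets for
critical sockets are kept and the rest are dropped.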

Thanks
Sridhar