Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Mon, 11 Nov 2002 21:38:13 -0600, Linas Vepstas wrote: On Tue, Nov 12, 2002 at 02:00:14AM +0100, Ulrich Weigand was heard to remark: Linas Vepstas wrote: Ugh. Well, that could stop the show. But since the instruction causing this would be a problem state instruction, maybe there are fewer wacky situations to deal with. Clearly, a store can be re-executed without harm, even if it's been partly executed before. The problem would be with any instructions that had side effects, that would not do the same thing the second time around as they did the first time. I don't know what these would be. Actually on 390, a storage instruction MAY not necessarily be able to be re-executed without problem. You need to specify what store you are talking about. A register-to-storage or storage-to-register instruction CAN be re-executed without problem. However, 390 has storage-to-storage instructions, and those instructions can overlap and DO overlap. And those MAY have problems if interrupted in the middle. In fact, if you look in the Principles of Operation book for MVCL, you will find an elaborate explanation of the oops that you can get if you are overlapping an MVCL and the instruction is interrupted (which can happen easily with an MVCL on a multi-processing system). So imagine what could happen with such an instruction when you went over a page boundary, the new page had different protection, and you tried to restart the instruction. Lloyd
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Monday 11 November 2002 05:34 pm, Gregg C Levine wrote: However, I did look at the product, for Windows. And I wasn't thrilled by it. iRMX for Windows? UGH!!! My mind quails at the mere thought. (For those unfamiliar, iRMX was/is a realtime operating system created by Intel and primarily used in hard- and soft-realtime embedded applications.) -- - Scott D. Courtney, Senior Engineer Sine Nomine Associates [EMAIL PROTECTED] http://www.sinenomine.net/
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Hello from Gregg C Levine Actually that was my reaction. The demonstration packet that they sent me, was the thing that did it. The terms, kludge, and clumsy, and a few others that were not polite, crossed my mined, at the time. And you are right about what Intel thought it was. They made up a chip for the 8086 hardware grouping that included a firmware kernel of the product. Funny, they also did the same for a different OS for the 8086. Both part numbers are long since retired, or even discontinued. --- Gregg C Levine [EMAIL PROTECTED] The Force will be with you...Always. Obi-Wan Kenobi Use the Force, Luke. Obi-Wan Kenobi (This company dedicates this E-Mail to General Obi-Wan Kenobi ) (This company dedicates this E-Mail to Master Yoda )
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Hello from Gregg C Levine Sorry, typo in my comment there. The word should be mind. Not the one I chose instead. My mind is locked on a currently running process. A slowly running one, in fact. --- Gregg C Levine [EMAIL PROTECTED] The Force will be with you...Always. Obi-Wan Kenobi Use the Force, Luke. Obi-Wan Kenobi (This company dedicates this E-Mail to General Obi-Wan Kenobi ) (This company dedicates this E-Mail to Master Yoda )
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tuesday 05 November 2002 11:13 pm, David Boyes wrote: b) Same scenario as above, but word-substitute apache->kernel and mod_trojan->device driver. [...] And we reinvent the Multics ring structure one more time. dockmaster.af.mil, wherefore art thou? I'll be lynched for saying this, but Intel's iRMX operating system used to have something like this, too. -- - Scott D. Courtney, Senior Engineer Sine Nomine Associates [EMAIL PROTECTED] http://www.sinenomine.net/
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Hello from Gregg C Levine No you won't. Laughed at, yes; lynched, no. However, I did look at the product, for Windows. And I wasn't thrilled by it. But what did happen to Multics? And when will it be released to the public? However, Scott, if you want to have something happen to you, I know a nice place on Kessel. --- Gregg C Levine [EMAIL PROTECTED] The Force will be with you...Always. Obi-Wan Kenobi Use the Force, Luke. Obi-Wan Kenobi (This company dedicates this E-Mail to General Obi-Wan Kenobi ) (This company dedicates this E-Mail to Master Yoda )
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Mon, Nov 11, 2002 at 08:40:45PM +0100, Ulrich Weigand was heard to remark: Linas Vepstas wrote: Every page of memory has a storage key, which holds a key and a fetch-protection bit. If the fetch-protection bit is cleared, then anyone can read the page; if the fetch-protection bit is set, only code running with a PSW key equal to the page key can read the page. So I guess you could make pages read-only to everyone. Still, the question is whether read-only access is enough; a typical library function interface is: here's a buffer, please write some data into it. -- if 'exception 04' can be caught and passed back up to the library, then potentially the library can decide what sort of access should have been allowed: none, read-only (change fetch protection bit), and read-write (change key). -- it's important to distinguish what the 390 can do today from what could be done if *it* (whatever *it* is) was done right. I am guilty of mixing up these two. OK, I presented a specific example, from the land of graphics hardware, from the mid 1990's, where these traditional unix solutions completely failed. [...] the same: this process. Aside: In some variants, the access was per-thread. Therefore it easily fits into the whole Unix access rights concept. Hmm. To implement this stuff, we needed to design some fairly sophisticated kernel modules, and make some changes to page and process tables. The unix guys screamed bloody murder at the time. Last I heard, when a similar proposal was made for adding this to the Linux kernel, Torvalds called it a stupid idea (he didn't think per-thread storage should be done that way.) So even that did not fit well with 'traditional unix concepts'. What you propose is a finer-grained X (not 'this process' but 'this process while it is executing code mapped from this library'). This is what IMO is the core problem of this idea.
To stay with your example: why didn't you change the system so that a process was allowed to directly access the (whole) video memory, but only while it was executing code from a special, trusted video library that ensured it didn't do anything it shouldn't? That would have been comparable ... Because we didn't trust the process. Because there is no way (in today's unix, in (most of) today's cpus) to trust a library. We might have argued for fundamental changes in the CPU design, if we'd figured this out earlier, but it was too late for that, and besides, as this conversation attests, new ideas are not easily accepted when it comes to the architected interface of a CPU. The app did have to use a special graphics library which made some special kernel calls to set things up. Once set up, we used a page-fault-like mechanism to serialize access to the graphics hardware. i.e. only n apps could have direct access; the (n+1)th would page fault. We'd use time-slices to make sure all n+1 had a shot at running. Yes, but what about the difficult questions? ;-) Dunno. Wish I had enough time to think about this properly, try some prototypes, etc. As stated before, I'm not convinced that the 390 in its current state can really support the idea. It seems to have at least some of the pieces. Where are the arguments passed and returned? Where does the stack reside, and is it switched at function call or not? Dunno. If yes, the standard C calling convention doesn't work any more -- what to use instead? ? Huh? A unix 'system call' is still expressed as a standard C function call, although the 'linkage' is certainly unusual in that it involves an SVC, and argument passing happens quite differently than between 'ordinary' C calls. If done right, with all the bells and whistles, one would need to have a way of informing the linker that this is a 'special' call, and to insert some different subroutine glue code.
How to avoid tricking the xlib into doing bad things by passing incorrect arguments to it? Well, certainly, one can check argument validity manually. The x server does this by examining each packet it receives, and doing bounds checks, etc. There is a fair amount of programmer-hour overhead to do this. If you mean, how can we automatically check that the arguments to a subroutine call have valid values, I don't know of any way of doing this easily at run-time, and there are only limited tools at compile-time. But this is a generic problem, and not limited to this discussion. e.g. in gnome (gtk, and in glib2 'gobjects'), there are C macros that you are supposed to use that perform run-time type checking and type-casting. There's also crap in there for marshalling and closures and etc. that are handy for ensuring program correctness. But I view this as a language design issue, since the closure thing is more important to java and scheme than it is to C. I think in this particular example, those questions might be solvable, because the current design already has a
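The glib-style run-time checks mentioned above boil down to a guard macro at the top of each library entry point. A minimal sketch of the idea in plain C (the macro and function names here are made up for illustration, not glib's actual API):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical guard macro in the spirit of glib's g_return_val_if_fail:
 * validate an argument at run time and bail out of the call with a
 * defined error value instead of corrupting library state. */
#define RETURN_VAL_IF_FAIL(expr, val)                                  \
    do {                                                               \
        if (!(expr)) {                                                 \
            fprintf(stderr, "%s: assertion '%s' failed\n",             \
                    __func__, #expr);                                  \
            return (val);                                              \
        }                                                              \
    } while (0)

/* A library entry point that refuses obviously bad arguments. */
size_t lib_copy(char *dst, const char *src, size_t n)
{
    RETURN_VAL_IF_FAIL(dst != NULL, 0);
    RETURN_VAL_IF_FAIL(src != NULL, 0);
    size_t i;
    for (i = 0; i + 1 < n && src[i] != '\0'; i++)
        dst[i] = src[i];
    if (n > 0)
        dst[i] = '\0';
    return i;   /* bytes copied, excluding the terminator */
}
```

The cost is exactly the programmer-hour overhead described: every public entry point has to state its preconditions explicitly.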
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Linas Vepstas wrote: -- if 'exception 04' can be caught and passed back up to the library, Unfortunately it can't, as key-protection violation is a 'terminating' exception condition, which means the CPU state at the time the interruption is delivered is undefined. This means that the instruction that caused the exception might have already been *partially* executed, but there's no way to find out whether and to what extent that happened. The only thing you can reliably do in that situation is to terminate the process. This is as opposed to a regular protection violation, which is a 'suppressing' exception condition, i.e. the interruption is delivered with a CPU state corresponding to the state before the start of the instruction causing the exception. This allows the kernel to fix up whatever caused the fault and restart the instruction. Aside: In some variants, the access was per-thread. Now this is distinctly weird ;-) I would imagine this made the implementation much more difficult (aren't threads supposed to share the same address space? ;-/), and I can't quite see the security benefits as any non-privileged thread could 'take over' the privileged thread by overwriting the code that this thread was currently executing ... Last I heard, when a similar proposal was made for adding this to the Linux kernel, Torvalds called it a stupid idea (he didn't think per-thread storage should be done that way.) So even that did not fit well with 'traditional unix concepts'. Well, I certainly agree with Linus as to the per-thread thing here ;-) However, if we are simply talking per-process, I think something like this is already now in the Linux kernel: the direct-rendering module (DRM) extension uses a special X server module in conjunction with kernel support to provide a similar facility to user-space processes AFAIK (on cards that have the required hardware support, of course).
If you mean, how can we automatically check that the arguments to a subroutine call have valid values, I don't know of any way of doing this easily at run-time, and there are only limited tools at compile-time. But this is a generic problem, and not limited to this discussion. e.g. in gnome (gtk, and in glib2 'gobjects'), there are C macros that you are supposed to use that perform run-time type checking and type-casting. Well, I was not so much thinking about programming errors, but more about deliberate attempts to trick the trusted part into violating security. For example, on Intel the kernel address space is actually part of the process' 4 GB address space, it just isn't accessible. However, imagine a malicious user space application performs a write system call and passes a buffer address that points to some *kernel* address (i.e. above 3 GB). If the kernel would now naively fulfil the write request, it would access the data to write (which is now, in kernel context, perfectly readable) and put it into the file. User space could then read the file back, and -voila- it has accessed memory it should not be allowed to access ;-) While this problem is well-known to kernel programmers, even now there are sometimes new security exposures found due to this type of failure to validate syscall arguments ... ? Huh? A unix 'system call' is still expressed as a standard C function call, although the 'linkage' is certainly unusual in that it involves an SVC, and argument passing happens quite differently than between 'ordinary' C calls. Well, what is expressed as a standard C function call *is* in fact a standard C function call, namely to a function implemented in libc (which actually performs the real SVC system call).
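The kernel-pointer trick Ulrich describes is what the access_ok()-style range check in the Linux kernel exists to stop. A simplified, user-space model of that check (the 3 GB TASK_SIZE split is the classic i386 layout; the function name here is hypothetical, not the kernel's actual code):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the check Linux performs on i386 syscall
 * arguments: a buffer pointer reaching into the kernel's part of the
 * address space (above TASK_SIZE, 3 GB in the classic 3/1 split) must
 * be rejected before the kernel dereferences it on the caller's
 * behalf. */
#define TASK_SIZE 0xC0000000UL        /* user space is 0 .. 3 GB - 1 */

int range_ok(uintptr_t addr, size_t len)
{
    if (len > TASK_SIZE)
        return 0;                 /* longer than all of user space */
    if (addr > TASK_SIZE - len)
        return 0;                 /* range would reach into kernel space */
    return 1;
}
```

Note the second comparison is written to avoid integer overflow: checking `addr + len > TASK_SIZE` directly could wrap around and pass.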
As to the calling convention of the SVC itself, we cheated by disallowing more than 5 arguments, so that all arguments can be passed in registers (if there are more than 5 arguments, the libc stub packs them into a struct and passes a pointer to it to the SVC). The 'performance' argument is difficult to win, or lose, since OS weaknesses and strengths fundamentally alter how one goes about designing applications and libraries. Indeed. One should never make performance arguments without having actual measurements as support; I've been bitten by 'I think this should be faster' arguments before ;-) The 'features' argument is quite different. Clearly, the creators of apache accept and trust all apache modules. Would they still do so if it was 'easy' not to trust them? I suspect not. On the other hand, if it was important to them to establish a trust boundary, they could easily do so by running modules as separate processes; AFAIK the main task of a module is to create a stream of bytes to be returned to the remote user, so the interface would naturally fit a pipe/socket-based inter-process communication mechanism ... But I guess if you can't trust the module, you can't really trust the whole server, as the module is the instance that generates the data the user actually sees. Can one create a
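The separate-process module idea sketched here is straightforward to wire up with fork() and a pipe: the module can then only influence the server through the byte stream it emits, never by scribbling on the server's memory. A minimal sketch (the 'module' body and the function name are invented for illustration):

```c
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* Run an untrusted "module" in its own process. Its only channel back
 * to the server is the pipe, which is exactly the trust boundary the
 * discussion asks for. */
ssize_t run_module(char *out, size_t outlen)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                       /* child: the module */
        close(fd[0]);
        const char *body = "<html>hello</html>";
        write(fd[1], body, strlen(body));
        close(fd[1]);
        _exit(0);
    }
    close(fd[1]);                         /* parent: the server */
    ssize_t n = read(fd[0], out, outlen - 1);
    if (n >= 0)
        out[n] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return n;                             /* bytes received from module */
}
```

A crash or wild store in the child corrupts only the child; the server sees, at worst, a short or empty byte stream.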
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, 12 Nov 2002, Ulrich Weigand wrote: Linas Vepstas wrote: -- if 'exception 04' can be caught and passed back up to the library, Unfortunately it can't, as key-protection violation is a 'terminating' exception condition, which means the CPU state at the time the interruption is delivered is undefined. This means that the instruction that caused the exception might have already been *partially* executed, but there's no way to find out whether and to what extent that happened. The only thing you can reliably do in that situation is to terminate the process. I don't know where to look to verify that, but I did discover at http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR006/6.1.4.2?SHELF=DT=19990630131355 that the ILC can have random values at some times so I guess you're right. So, can PER be used? -- Cheers John. Please, no off-list mail. You will fall foul of my spam treatment. Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, Nov 12, 2002 at 02:00:14AM +0100, Ulrich Weigand was heard to remark: Linas Vepstas wrote: -- if 'exception 04' can be caught and passed back up to the library, Unfortunately it can't, as key-protection violation is a 'terminating' exception condition, which means the CPU state at the time the interruption is delivered is undefined. This means that the instruction that caused the exception might have already been *partially* executed, but there's no way to find out whether and to what extent that happened. Ugh. Well, that could stop the show. But since the instruction causing this would be a problem state instruction, maybe there are fewer wacky situations to deal with. Clearly, a store can be re-executed without harm, even if it's been partly executed before. The problem would be with any instructions that had side effects, that would not do the same thing the second time around as they did the first time. I don't know what these would be. Aside: In some variants, the access was per-thread. Now this is distinctly weird ;-) I would imagine this made the Well, the graphics commands were not inherently atomic; they have to be serialized. To avoid race conditions, either you allowed only one thread to have direct access :-( or you forced the use of a lock, which was out of the question because of the performance hit, or you gave each thread its own graphics context. Technically, it's not hard to implement this, but it drove traditional unix guys (and torvalds and alan cox) nuts. (Since we used page tables to enforce access, each thread would have slightly different page tables: one reason they hated it.) I've never heard of direct-hardware access to an ethernet card, but in principle, one could do it: then one could avoid things like the zero-copy technology much ballyhooed in the Linux kernel, and could avoid having to put portions of a web server in the kernel (khttpd), since presumably, the application could now just blast bytes directly into the ethernet card.
But if one did this, imagine the bad stuff that would happen if the app was allowed to have two threads to blast bytes into the card, and you didn't do something to control that situation. You could argue for cooperative multi-threading but that sure has the flavour of bad old msdos/mac/win. Damned if you do, damned if you don't. anyway, to summarize: I still think that providing some mechanism to build a 'trust barrier' between app and library would be a very good tool to add to the programmer's toolkit, even if/when there are other ways to achieve the same effect. (By the same token, threads are just processes with shared memory, and so therefore traditional unix should not support threads? But we do, and there are certain problem domains where they are just the right thing. Threads are not easy to use, and require some sophistication, but their utility is undoubted these days.) --linas -- pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!) [EMAIL PROTECTED] PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, 12 Nov 2002 11:38, you wrote: On Tue, Nov 12, 2002 at 02:00:14AM +0100, Ulrich Weigand was heard to remark: Linas Vepstas wrote: -- if 'exception 04' can be caught and passed back up to the library, Unfortunately it can't, as key-protection violation is a 'terminating' exception condition, which means the CPU state at the time the interruption is delivered is undefined. This means that the instruction that caused the exception might have already been *partially* executed, but there's no way to find out whether and to what extent that happened. Ugh. Well, that could stop the show. But since the instruction causing this would be a problem state instruction, maybe there are fewer wacky situations to deal with. Clearly, a store can be re-executed without harm, even if it's been partly executed before. The problem would be with any instructions that had side effects, that would not do the same thing the second time around as they did the first time. I don't know what these would be. decimal instructions - ap, mp and such. xc relatives Of more concern is the info I turned up: the ILC can be unpredictable. Without that, you can't determine what the instruction is. -- Cheers John Summerfield Microsoft's most solid OS: http://www.geocities.com/rcwoolley/ Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, 12 Nov 2002 13:30, you wrote: xc relatives On further reflection, those fail before they start. However, if they're subject to an execute instruction I think you have difficulties. Mind you, if you're contemplating a new CPU design, you can also consider a microcode update to an existing one. Years ago, Software AG had a Database Engine, AKA ESP, which was (I think) a Magnasson S/370 clone with extra jellyware, OS/VS1 (I think) and Adabas. It connected via CTC. Neale, do you know more about this? -- Cheers John Summerfield Microsoft's most solid OS: http://www.geocities.com/rcwoolley/ Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Sun, 2002-11-10 at 01:55, John Summerfield wrote: Is this a reason to not close down those avenues that are easy? Seems to me that if you fix some, you have fewer left to fix. As the philosopher said, a journey of a thousand leagues starts with a single step. From a security viewpoint that's like arguing that it's worth closing the windows even though you don't actually have a door on the main entrance to the house. The step you have to make IMHO is from stopping to what happens when
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Linas Vepstas wrote: But in my storage-key world, I can imagine separating the structure from the data, and putting the structure in read-only memory, where the app can see it but not corrupt it, and putting the data in a read-write key, where the app can do whatever it wants. And *once you have cleanly separated the two* (which is the difficult part), why can't you just put them into two separate files, one of which is mmapped read-only? In this kind of app, if you had to hide the data behind a socket, I claim you would get performance roughly comparable to ordinary file-io, and nowhere near the speed of memory-mapped io. In Linux, memory-mapped io is usually somewhat slower than file-io. (Both access the same page cache, but mmap in addition has to maintain all those page tables ...) There are of course some scenarios where mmap is faster, but mostly it is just more convenient. In any case, why would you use a socket when mmap is faster? Just use mmap, and use file access rights to protect the data. Don't try to misunderstand me on purpose. Programmer A might be a very sophisticated developer of libraries, and able to go to some lengths to do a good job. Programmer F might be trying to create an app using A's library. But F is a ninny, and A would like to protect against F's pratfalls, and maintain a semblance of data integrity in the face of F's bad programming style. Giving A a mechanism like storage keys gives A a chance to do well. But only if A manages to cleanly separate its data into different protection domains. Once this is done, there are any number of possibilities to enforce those, e.g. mmapping files with different access rights. One main point why just function calls are so much easier to use than IPC is that you can pass pointers to complex data structures back and forth. If you really implement some sort of strict memory protection, that won't work any more. Huh? why not?
Because those pointers will point to memory the callee is not allowed to access? Huh? why not? Because the library call has switched the PSW key, that was your whole point, wasn't it? So if the caller was able to access that memory, the callee (running with a different PSW key) won't be. What we did instead was to build some very special graphics hardware that had the equivalent of storage keys in the hardware. To get high performance, we doled out a key to a particular rectangular area of the screen, and a client could doodle directly into the graphics adapter memory all it wanted, at a very high performance. But if it tried to write outside of the window boundary, well, too bad, those pixels were masked off, and no damage was done. Sure, sometimes there are scenarios where specific hardware support helps you to implement a particular solution in an efficient and secure manner. (Usually, the problem was well-known before and hardware support designed specifically to solve it.) I do not doubt that you might find some problems where the S/390 storage keys could help. But I'm not convinced that any such solution will be significantly easier to implement and/or show significantly higher performance than solutions that could be implemented using the generic Unix protection mechanisms (separate processes, mmap, file access rights). In any case, I have still not quite understood just how the solution you are proposing is supposed to work. Could you give an example of a problem you want to solve, and how you would go about it? (I.e. which memory areas get what storage key, what's in them, what pieces of code run under which PSW key and/or mask, and at what places the keys are switched?) Bye, Ulrich -- Dr. Ulrich Weigand [EMAIL PROTECTED]
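Ulrich's mmap-with-file-access-rights alternative can be shown in a few lines: map the data file PROT_READ into the untrusted process, and a stray store faults (SIGSEGV) instead of corrupting the structure. A minimal sketch (the file path, record contents, and function name are illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a data file read-only into the "ninny" process: it can walk the
 * structure at memory speed, but the mapping, not programmer
 * discipline, enforces the protection domain. */
int map_readonly_check(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    const char rec[] = "record-0001";
    if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
        return -1;
    /* PROT_READ only: any store through p would deliver SIGSEGV */
    char *p = mmap(NULL, sizeof rec, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    int ok = (strcmp(p, "record-0001") == 0);
    /* p[0] = 'X';  <- this line, if compiled in, would fault here */
    munmap(p, (size_t)sizeof rec);
    close(fd);
    unlink(path);
    return ok;
}
```

This gives roughly the structure/data split Linas wants for the dbase case, with no new hardware: structure file mapped read-only, data file mapped read-write.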
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Sat, 9 Nov 2002 01:09, you wrote: On Thu, 2002-11-07 at 19:11, John Summerfield wrote: On IA32, if it's not in the code segment, you can't execute it. The code segment _can_ be ro, so presumably a return to arbitrary code can be prevented. I don't need to modify any of the code segment to exploit your machine. In fact several exploits work on the basis they overrun a stack section with a complete return sequence including variables to cause an execlp("/bin/sh", ...) to occur. No code changes needed Is this a reason to not close down those avenues that are easy? Seems to me that if you fix some, you have fewer left to fix. As the philosopher said, a journey of a thousand leagues starts with a single step. -- Cheers John Summerfield Microsoft's most solid OS: http://www.geocities.com/rcwoolley/ Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Thu, 2002-11-07 at 19:11, John Summerfield wrote: On IA32, if it's not in the code segment, you can't execute it. The code segment _can_ be ro, so presumably a return to arbitrary code can be prevented. I don't need to modify any of the code segment to exploit your machine. In fact several exploits work on the basis they overrun a stack section with a complete return sequence including variables to cause an execlp("/bin/sh", ...) to occur. No code changes needed
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Linas Vepstas wrote: Sorry I used the word semaphore. Using pipes & shmem is hard. Well, using them is easy; using them and creating something that's extensible, maintainable, lacks race conditions and other bugs ... that's a lot harder. If it's so easy, why didn't ssh do it years ago? The hard stuff is to design your application so that it *can* be separated into multiple protection domains. The actual *implementation* of that separation via multiple processes is easy (trivial in comparison). Security is hard, and trying to implement some sort of framework that will allow even inexperienced programmers to magically only produce secure code strikes me as somewhat naive ;-) One main point why just function calls are so much easier to use than IPC is that you can pass pointers to complex data structures back and forth. If you really implement some sort of strict memory protection, that won't work any more. Huh? why not? Because those pointers will point to memory the callee is not allowed to access? To define a library interface, traditionally, one defines some function prototypes, and then writes the code to make them do something. To pass data on a socket, you need to define the structure of the data passing through the socket. And you need to define at least two subroutines, one to pack the data into the pipe, one to pull it out. Hm? read() and write() should do just fine ... OK, so there are tools that can automate a lot of this, such as RPCs and stub generators and IDLs, or you can code in Java or scheme or C# or some language that provides native support for this (introspection, or ability to marshall/unmarshall itself). Or use .net so that you can be a stupid programmer creating incredibly complex systems. But these are all very complex mechanisms if your goal is merely to be a lowly library trying to prevent someone from meddling with your innards. You need all this only if the data you send over the pipe is supposed to be platform-independent.
If you only communicate locally between two processes guaranteed to run on the same machine, just send the raw binary image of your data structures over the pipe and you're done. Bye, Ulrich -- Dr. Ulrich Weigand [EMAIL PROTECTED]
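Ulrich's point — that two local processes sharing one machine can skip portable serialization and ship the raw binary image of a structure over a pipe — can be sketched roughly like this (my illustration, not from the thread; the record layout is hypothetical):

```python
import os
import struct

# A hypothetical fixed-layout record: (id: uint32, balance: double).
# "=" means native byte order -- fine, since both ends run on the same machine.
RECORD = struct.Struct("=Id")

r, w = os.pipe()
pid = os.fork()
if pid == 0:                        # child: the "callee" end of the pipe
    os.close(w)
    raw = os.read(r, RECORD.size)   # plain read(), as Ulrich says
    rec_id, balance = RECORD.unpack(raw)
    os._exit(0 if (rec_id, balance) == (42, 1.5) else 1)
else:                               # parent: the "caller" end
    os.close(r)
    os.write(w, RECORD.pack(42, 1.5))  # write() the raw binary image
    os.close(w)
    _, status = os.waitpid(pid, 0)
    child_ok = os.WEXITSTATUS(status) == 0
    print(child_ok)
```

No marshalling layer, no IDL: the structure definition itself is the wire format, which is exactly the shortcut that breaks the moment the two ends stop sharing an architecture.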
Re: CPU Arch Security [was: Re: Probably the first published shell code]
At 17:09 11/08/2002 +, Alan Cox wrote: In fact several exploits work on the basis that they overrun a stack section with a complete return sequence, including variables, to cause an execlp(/bin/sh, ...) to occur. Yup, that was exactly the case in the Phrack article that started this whole topic. It's actually the most common of the buffer overrun exploitation techniques. Ross strcpy() considered harmful Patterson
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Fri, Nov 08, 2002 at 05:50:56PM +0100, Ulrich Weigand was heard to remark: Linas Vepstas wrote: Sorry I used the word semaphore. Using pipes & shmem is hard. Well, using them is easy; using them and creating something that's extensible, maintainable, lacks race conditions and other bugs ... that's a lot harder. If it's so easy, why didn't ssh do it years ago? The hard stuff is to design your application so that it *can* be separated into multiple protection domains. The actual *implementation* of that separation via multiple processes is easy (trivial in comparison). Sorta. It depends. One should also distinguish between app developers and library developers. For example, and maybe this is a bad example, but we had a library developer who wanted to create the ultimate dBaseIII/FoxBase-compatible database library. The traditional problem with dBase is that if the app goes crazy, it corrupts not only the data, but the structure itself, making it hard/impossible to recover. But in my storage-key world, I can imagine separating the structure from the data, putting the structure in read-only memory, where the app can see it but not corrupt it, and putting the data in a read-write key, where the app can do whatever it wants. Part of the power of a dBase-like thing is that one memory-mapped the db file, and thus you could just pick through it. It was memory-mapped because this access was a *lot* faster (orders of magnitude, even on Linux) than using ordinary file-I/O routines. In this kind of app, if you had to hide the data behind a socket, I claim you would get performance roughly comparable to ordinary file I/O, and nowhere near the speed of memory-mapped I/O. To get decent performance out of a database that's hidden behind a socket, you have to invent something like SQL, and that gets hard. Even a super-miniature, homebrew query object is still harder than just memory-mapping some data structure.
There's an analogous situation with languages that support persistent objects (e.g. Python, or maybe some form of Java). The problem becomes one of how can Python (the language itself) save the object to a file, and restore it later, without risking corruption from the app itself? This is solved in part by e.g. having a Java VM which can't be corrupted (in theory) by plain-old Java code. But back in the early days of Java, there were all sorts of applets that could break out of the JVM. (Since Java today is used mostly for servlets, and since we implicitly trust servlets, this is less of a problem.) But if you were a JVM, and you might be buggy, and you wanted to run an untrusted Java applet, it sure would be nice to put the applet's memory pool in a different storage key than the JVM, so that if it did try to break you, it couldn't write to your storage. I don't see how to turn this problem into an I'll just use sockets to protect me problem. Security is hard, and trying to implement some sort of framework that will allow even inexperienced programmers to magically only produce secure code strikes me as somewhat naive ;-) Don't try to misunderstand me on purpose. Programmer A might be a very sophisticated developer of libraries, able to go to some lengths to do a good job. Programmer F might be trying to create an app using A's library. But F is a ninny, and A would like to protect against F's pratfalls, and maintain a semblance of data integrity in the face of F's bad programming style. Giving A a mechanism like storage keys gives A a chance to do well. One main point why plain function calls are so much easier to use than IPC is that you can pass pointers to complex data structures back and forth. If you really implement some sort of strict memory protection, that won't work any more. Huh? why not? Because those pointers will point to memory the callee is not allowed to access? Sockets also have some severe performance problems in certain domains.
For example, the XShm extension to X11 uses shared memory to communicate stuff (pixmaps) with the X server, in order to avoid a rather severe penalty for cramming it through a socket. By contrast, X11 itself was designed as a client-server system precisely in order to prevent one crazy window client from corrupting the entire display and all other windows. Imagine how much easier it would have been (could be?) to implement a windowing system as a library, not as a client-server with the associated protocol, encode/decode layers, yadda yadda, if one could only guarantee that one crazy app wouldn't spoil the entire display. Sadly, this is almost a lament (I worked on X11 for years): storage keys, where wert thou when you were needed? What we did instead was to build some very special graphics hardware that had the equivalent of storage keys in the hardware. To get high performance, we doled out a key to a particular rectangular area of the screen, and a client could doodle directly
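The memory-mapped database access Linas describes — map the file once, then just pick through it with loads and stores instead of per-field read()/write() syscalls — can be sketched like this (my illustration; the "header" layout is hypothetical):

```python
import mmap
import os
import tempfile

# Build a throwaway one-page "database" file: a 4-byte structure header
# followed by data space.
fd, path = tempfile.mkstemp()
os.write(fd, b"HDR!" + bytes(4092))
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as db:   # map the whole file
        header = db[0:4]                   # just pick through the mapped image
        db[4:8] = b"DATA"                  # in-place update, no write() per field
        assert db[4:8] == b"DATA"

print(header)  # prints b'HDR!'
os.unlink(path)
```

Once the page is mapped and faulted in, each access is an ordinary memory reference — which is why this beats a socket round-trip per query, and why losing it is the performance price of hiding the data behind IPC.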
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Wed, Nov 06, 2002 at 10:36:40AM +0800, John Summerfield was heard to remark: On Wed, 6 Nov 2002 05:45, you wrote: The core idea is actually so simple, it's painful. Today, most CPUs define two memory spaces: the one that the kernel lives in, and the one that user space lives in. When properly designed, there is nothing a user-space program can do to corrupt kernel memory. One 'switches' between these memory spaces by making system calls, i.e. by the SVC instruction. The 390 arch has not two, but 16 memory spaces (a 4-bit key) with this type of protection. (When I did the i370 port, I put the kernel in space 0 or 1 or something like that, and user-space programs ran in one of the others.) The partitioning between them is absolute, and is just like the kernel-space/user-space division in other archs. The mechanism is independent of/orthogonal to the VM/TLB subsystem (you can have/use virtual memory in any of the spaces.) This then points to a simple, basic idea: suppose one could make a call to a plain-old shared library, but have that library be protected behind the same kind of syscall protections that the kernel has. Then there would be nothing that the caller could do to corrupt the memory used by the shared library, any more than a (user-space) caller can corrupt kernel memory. Am I naive? How does a caller corrupt a shared library on Linux on IA32? I don't mean 'corrupt /usr/lib/libwhatever.so'. That is still hard to do. What I mean is that if an app is linked to a library, and that library opens a file in read-write mode, then the app also has full read-write to that file. There is nothing the library can do to keep the app at bay. In general, the app can corrupt the RAM that the library is using, including the library's stack. This started out as a discussion about stack smashing, after all. BTW, IA32 has four protection levels enforced in hardware. I believe the problem is that Linux doesn't use them all. I don't know the Intel arch at all.
I know powerpc, mostly. true blue ... Any apache modules Again, I think you misunderstood. What I meant to say is that a misbehaving Apache module can write any bytes whatsoever it wants to the socket that is connected to the surfer's web browser. Among other things. The structure I describe could be used (I believe) to prevent this kind of unauthorized access. Today, you cannot make a distinction between trusting Apache itself and trusting any Apache module, since they both run in the same address space, and therefore have full read and write access to that address space. --linas -- pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!) [EMAIL PROTECTED] PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Wed, Nov 06, 2002 at 10:14:34AM +0800, John Summerfield was heard to remark: On Wed, 6 Nov 2002 04:39, you wrote: x86 alas doesn't support page-level no-execute. Other platforms do and can run with nonexec stacks. People still exploit them. The libraries are mostly mapped read-only on Linux; people don't need to modify them. You put arguments on the stack, and corrupt the return code to call the right C library function. In IA32, you cannot execute stack-segment code. Because of the way Linux (and other OSes) are designed, with a single address space per process, the stack segment and code segment are the same storage, and that's how you get to put executable code on the stack and have it execute. I don't know ia32. You don't need to put code into the stack. You only need to modify the return address, and have a subroutine return to a different location. To modify the return address, you only need write access to the stack; you don't need execute permissions. --linas -- pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!) [EMAIL PROTECTED] PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
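This is not real exploitation, just a toy model (mine, not from the thread) of the point being made: a buffer sitting below a saved return address can rewrite that address using nothing but plain stores — no execute permission on the stack is ever needed. The frame layout and addresses here are invented for illustration:

```python
import struct

# A pretend stack frame: an 8-byte local buffer followed by a 4-byte
# saved return address (little-endian, as on IA32).
FRAME = bytearray(12)
FRAME[8:12] = struct.pack("<I", 0x08048000)   # pretend saved return address

def careless_copy(frame, data):
    """A strcpy()-like copy with no bounds check on the 8-byte buffer."""
    frame[0:len(data)] = data

# Twelve bytes into an eight-byte buffer: the overflow spills into the
# slot holding the return address.
careless_copy(FRAME, b"A" * 8 + struct.pack("<I", 0xDEADBEEF))
hijacked = struct.unpack("<I", FRAME[8:12])[0]
print(hex(hijacked))   # the "return address" now points wherever we chose
```

If the chosen value is the address of an existing C library function (return-into-libc), the attack works even with a fully non-executable stack — which is exactly the rebuttal John makes above.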
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Today, you cannot make a distinction between trusting apache itself, and trusting any apache module, since they both run in the same address space, and therefore have full read and write access to that address space. Which, in the S/390 CICS world, is handled by the domain concept; CICS systems modules run in one domain and can interface with the OS in ways that the CICS applications cannot, because of the protection keys that the S/390 hardware supports. Garry E. Ward Senior Software Specialist Maritz Research Automotive Research Group 419-725-4123 -Original Message- From: Linas Vepstas [mailto:linas;linas.org] Sent: Thursday, November 07, 2002 11:47 AM To: [EMAIL PROTECTED] Subject: Re: CPU Arch Security [was: Re: Probably the first published shell code] snip Confidentiality Warning: This e-mail contains information intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, any dissemination, publication or copying of this e-mail is strictly prohibited. The sender does not accept any responsibility for any loss, disruption or damage to your data or computer system that may occur while using data contained in, or transmitted with, this e-mail. If you have received this e-mail in error, please immediately notify us by return e-mail. Thank you.
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, Nov 05, 2002 at 10:16:28PM +0100, Ulrich Weigand was heard to remark: Adam Thornton wrote: (However, changing the Linux tool chain and basically *all* applications from a flat address space I don't see that you need to change *all* apps. This would be only for apps that really care. to multiple address spaces is an *enormous* task; and I'm not I didn't say it wasn't enormous. It's not tiny, but I'm not sure it's that big either. Depends on how easy you want to make it for the app developer. Certainly a prototype would not do this to everything, not by default. If you were to default it in the wrong way, it would be enormous, and it would break many (most?) apps. convinced this buys you anything w.r.t. security that can't be achieved much more easily, e.g. by StackGuard-type compilers. What I had in mind was the following: preventing an app from getting write access to a file that the shared library opened. Or preventing the app from getting write access to a socket or other IPC that the shared lib created/opened/is using. Or, for example, the following database stunt: having a database shared lib memory-map a database file, and then giving the app read-write access to one page of that memory map, but not all of them. In some ways, I suppose this is possible today, but it's hard; it makes you jump through hoops. Client-server hoops, in particular. Complex IPC setups. Haven't you ever noticed how few apps actually use traditional IPC (semaphores, shmem, etc.)? That's because it's hard, it's complex, it's crap that the app developer has to design, and it takes a lot of effort. Today, the only kind of address-space security that Unix has is that one process cannot corrupt the address space of another process. Thus, if you want to have address-space security, you *must* write multiple-process apps, which means you *must* use IPC to coordinate the processes. Ugh. *That is what I'm talking about.* --linas -- pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!)
[EMAIL PROTECTED] PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
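The closest stock-Linux analogue to Linas's "database stunt" is mapping the structure read-only, so that a stray store from the app raises an error instead of silently corrupting it. A minimal sketch (mine; real storage keys would be finer-grained, and could differ per page within one mapping):

```python
import mmap
import os
import tempfile

# A throwaway one-page "database" file whose first bytes stand in for
# the structure/schema the library wants to protect.
fd, path = tempfile.mkstemp()
os.write(fd, b"schema-page" + bytes(4085))
os.close(fd)

with open(path, "rb") as f:
    # prot= is Unix-only; this maps the page without write permission.
    ro = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    can_read = ro[0:6] == b"schema"    # the app may inspect the structure...
    try:
        ro[0:1] = b"X"                 # ...but a stomp attempt fails
        protected = False
    except TypeError:                  # CPython refuses writes to a RO map
        protected = True
    ro.close()
os.unlink(path)
print(can_read, protected)
```

The gap versus storage keys is that the whole mapping shares one protection setting per process, and the app itself could mprotect() it writable again — which is why the thread keeps circling back to hardware keys or a kernel-enforced boundary.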
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, Nov 05, 2002 at 04:01:57PM -0500, Adam Thornton was heard to remark: Good lord, I can't believe that I'm arguing for a segmented architecture. After they beat me down, my plan is to later claim that I was only playing devil's advocate, and I wasn't actually stupid enough to believe in it. :-) --linas -- pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!) [EMAIL PROTECTED] PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Thu, 2002-11-07 at 12:02, Ward, Garry wrote: Which, in the S/390 CICS world, is handled by the domain concept; CICS systems modules run in one domain and can interface with the OS in ways that the CICS applications cannot, because of the protection keys that the S/390 hardware supports. Disclaimer: I can just barely *spell* CICS. I thought that CICS was supporting subspaces these days (and in fact, that subspaces were invented for it). Is CICS using protect keys? -- David Andrews A. Duda and Sons, Inc. [EMAIL PROTECTED]
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Thu, 7 Nov 2002 10:46:30 -0600, Linas Vepstas [EMAIL PROTECTED] wrote: snip The 390 arch has not two, but 16 memory spaces (a 4-bit key) with this type of protection. (When I did the i370 port, I put the kernel in space 0 or 1 or something like that, and user-space programs ran in one of the others.) The partitioning between them is absolute, and is just like the kernel-space/user-space division in other archs. The mechanism is independent of/orthogonal to the VM/TLB subsystem (you can have/use virtual memory in any of the spaces.) I'm coming late to the party, but this information should be corrected. The 4-bit key is not a memory space but an attribute of a block of memory (4K or 2K) within a continuous address space. The PSW also contains a protect key, and the CAW (Channel Address Word) does too. Key zero typically allows access to all memory. When a PSW is set to a non-zero key, it can typically read any storage, but it can only write to its own key. [The storage key byte has other items, like a read-only bit.] Thus the storage key allows for memory isolation between programs running in the same address space, with key zero being used for the supervisor. That was MVT/MFT days... Nowadays 2K storage keys have vanished. In MVS, program isolation is by address space. Most of kernel memory is shared with the application program space. Key 8 is for programs and key zero is for the kernel. Subsystems like VTAM can take over a storage key.
And of course there are all sorts of multi-address space stuff now - access registers, primary/secondary address space instructions, etc etc. But the storage key was originally used for program isolation. It was a pretty nice advance for its time. john alvord
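A toy model (my own, heavily simplified from john alvord's description) of the store-side key check: key zero may store anywhere, and otherwise the PSW key must match the 4-bit key of the target storage block. Fetch protection and the reference/change bits are deliberately left out:

```python
def store_allowed(psw_key: int, block_key: int) -> bool:
    """Simplified key-controlled protection check for a store access.

    psw_key   -- the 4-bit access key in the current PSW
    block_key -- the 4-bit storage key of the target block
    """
    assert 0 <= psw_key <= 15 and 0 <= block_key <= 15
    return psw_key == 0 or psw_key == block_key

print(store_allowed(0, 8))   # key 0 (supervisor) may store into key-8 storage
print(store_allowed(8, 8))   # a key-8 program may store into its own pages
print(store_allowed(8, 0))   # ...but not into key-0 (supervisor) storage
```

The isolation is per block within one address space, not per address space — which is exactly the correction being made to the "16 memory spaces" description above.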
Re: CPU Arch Security [was: Re: Probably the first published shell code]
According to the CICS Resource Definition guide: EXECKEY(USER|CICS) In a CICS region with STORAGE PROTECTION active, a user-key program has read and write access to USER-key storage, but read-only access to CICS-key storage. Storage protection is the 4-bit flag that Linas referred to at the beginning of this thread; if the storage key where the instruction is executed from doesn't match the storage key of the target of the instruction, you get a protection exception raised and, generally, your program terminated. Garry E. Ward Senior Software Specialist Maritz Research Automotive Research Group 419-725-4123 -Original Message- From: David Andrews [mailto:dba;duda.com] Sent: Thursday, November 07, 2002 12:20 PM To: [EMAIL PROTECTED] Subject: Re: CPU Arch Security [was: Re: Probably the first published shell code] On Thu, 2002-11-07 at 12:02, Ward, Garry wrote: Which, in the S/390 CICS world, is handled by the domain concept; CICS systems modules run in one domain and can interface with the OS in ways that the CICS applications cannot, because of the protection keys that the S/390 hardware supports. Disclaimer: I can just barely *spell* CICS. I thought that CICS was supporting subspaces these days (and in fact, that subspaces were invented for it). Is CICS using protect keys? -- David Andrews A. Duda and Sons, Inc. [EMAIL PROTECTED]
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Thu, 7 Nov 2002, Linas Vepstas wrote: snip You don't need to put code into the stack. You only need to modify the return address, and have a subroutine return to a different location. To modify the return address, you only need write access to the stack; you don't need execute permissions. On IA32, if it's not in the code segment, you can't execute it. The code segment _can_ be read-only, so presumably a return to arbitrary code can be prevented. -- Cheers John. Please, no off-list mail. You will fall foul of my spam treatment. Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Linas, Do I understand you correctly, in that you propose a multi-layered system integrity design, whereby shared libs, for example, have a different authorisation from normal apps (almost like a multi-ring structure)? One of the issues I can see with such an implementation in Linux is that the solutions to achieve something like this are going to be very hw-platform dependent. S/390 offers a wealth of features to implement this efficiently, whereas other hw platforms which are more RISC-based will need to do a lot of tricks. In order to keep Linux Linux, one could think of some kind of micro-kernel, which is then hw dependent and includes all the hw-dependent services (including the above), under which one would run Linux. I think a hw-dependent micro-kernel/hypervisor would make the above issues easier to solve. Such a model is by no means new; AIX V2 (RT) ran under a virtual resource manager, and AIX/370 only ran under VM; both of these AIXes used the hypervisor to take care of the hw specifics in one way or another. Jan Jaeger From: Linas Vepstas [EMAIL PROTECTED] Reply-To: Linux on 390 Port [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: CPU Arch Security [was: Re: Probably the first published shell code] Date: Thu, 7 Nov 2002 11:09:04 -0600 snip
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Thu, 7 Nov 2002, Ward, Garry wrote: Today, you cannot make a distinction between trusting apache itself, and trusting any apache module, since they both run in the same address space, and therefore have full read and write access to that address space. Which, in the S/390 CICS world, is handled by the domain concept; CICS systems modules run in one domain and can interface with the OS in ways that the CICS applications cannot, because of the protection keys that the S/390 hardware supports. I know nothing of the domain concept. Apache may be constrained by the facilities provided by its software environment (gcc and the Linux kernel), but the 80386 hardware implements four levels of protection which could be used to provide this kind of protection. However, use of this hardware facility is incompatible with the flat memory model. In the 80386 these levels are hierarchic, whereas on S/370 the privilege association with protect keys 1-15 is done in software. -- Cheers John. Please, no off-list mail. You will fall foul of my spam treatment. Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
It doesn't matter where the instruction resides - the key in the current PSW determines stomping rights. -jcf - Original Message - From: Ward, Garry [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, November 07, 2002 11:37 AM Subject: Re: CPU Arch Security [was: Re: Probably the first published shell code] snip if the storage key where the insturction is executed from doesn't match the storage key of the target of the instruction, you get a protection exception raised and, generally, your program terminated.
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Linas Vepstas wrote: I didn't say it wasn't enormous. It's not tiny, but I'm not sure it's that big either. Well, for a start, you can't really do program calls in home-space mode (which is where Linux user mode runs), so you'd need to fundamentally redesign the whole kernel/user-space model (and there's not much room for changes here unless you want to go back to a model where kernel pages are mapped into the user address space) ... Depends on how easy you want to make it for the app developer. Certainly a prototype would not do this to everything, not by default. If you were to default it in the wrong way, it would be enormous, and it would break many (most?) apps. Indeed. However, if you need to significantly change your apps to make use of these hypothetical security features, then you could just as well change the apps to use existing mechanisms, like just using multiple processes ... In some ways, I suppose this is possible today, but it's hard; it makes you jump through hoops. Client-server hoops, in particular. Complex IPC setups. Haven't you ever noticed how few apps actually use traditional IPC (semaphores, shmem, etc.)? That's because it's hard, it's complex, it's crap that the app developer has to design, and it takes a lot of effort. So System V IPC is crap, that's no big news ;-) Use pipes and/or shared mmap instead, and you'll see lots of applications that do that. (See e.g. the recently introduced 'privilege separation' feature of OpenSSH.) Today, the only kind of address-space security that Unix has is that one process cannot corrupt the address space of another process. Thus, if you want to have address-space security, you *must* write multiple-process apps, which means you *must* use IPC to coordinate the processes. Ugh.
So what *is* that mechanism that is: - significantly easier to use than multiple processes / IPC - worthwhile for application programmers to implement even though they restrict themselves to a single platform - actually implementable in the Linux framework? I still don't quite see how this is supposed to work. One main point why plain function calls are so much easier to use than IPC is that you can pass pointers to complex data structures back and forth. If you really implement some sort of strict memory protection, that won't work any more. So how do inter-protection-region calls pass parameters, and why is this then still significantly easier than IPC? Furthermore, if the protection is supposed to be enforced against adversary efforts (and not just some feature to catch bugs, like e.g. mprotect), then there needs to be a protection barrier set up by some agent outside of the process itself. E.g. you don't just switch the PSW key (the callee could just switch it back); you need to actually change the PSW key *mask*, i.e. perform a program call or the like. This needs kernel help to set up program call numbers, entry tables etc. How is that kernel help to be triggered? What authentication mechanism protects access to these features? I don't think the S/390 hardware features are a 'magic bullet' that solves all these security problems. The real problem is to come up with an interface that actually provides these features in a sane way; the hardware might then be used to implement them in a more efficient way, maybe. (Other platforms provide similar features, b.t.w. E.g. one could imagine a facility that uses call gates or task gates to implement inter-library calls. This would involve about the same complexity and need the same sort of kernel help as using program calls in Linux for S/390 user space ...) Bye, Ulrich -- Dr. Ulrich Weigand [EMAIL PROTECTED]
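The 'privilege separation' pattern Ulrich cites from OpenSSH can be sketched in a few lines (my bare-bones illustration; the secret and the request protocol are invented): the sensitive state lives in one process, and the untrusted side can only make narrow requests over a pipe — the kernel's process boundary, not storage keys, enforces the separation.

```python
import os

SECRET = b"host-key-material"   # hypothetical privileged state

req_r, req_w = os.pipe()        # worker -> monitor requests
ans_r, ans_w = os.pipe()        # monitor -> worker answers

pid = os.fork()
if pid == 0:                    # the "privileged monitor" process
    os.close(req_w); os.close(ans_r)
    request = os.read(req_r, 64)
    # Narrow interface: answer length queries, never export the secret.
    reply = str(len(SECRET)).encode() if request == b"LEN?" else b"ERR"
    os.write(ans_w, reply)
    os._exit(0)
else:                           # the unprivileged worker
    os.close(req_r); os.close(ans_w)
    os.write(req_w, b"LEN?")
    answer = os.read(ans_r, 64)
    os.waitpid(pid, 0)
    print(answer)
```

Even if the worker is fully compromised, the attacker only gets whatever the monitor's tiny request interface is willing to say — but note how much ceremony (two pipes, a fork, a request protocol) replaced what would have been one function call, which is precisely Linas's complaint.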
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Thu, Nov 07, 2002 at 07:22:09PM +, Jan Jaeger was heard to remark: Linas, Do I understand you correctly, in that you propose a multi-layered system integrity design, whereby shared libs for example have a different authorisation from normal apps (almost like a multi-ring structure)?

Yes, I believe something like that may be possible. Till someone actually tries to do it, and learns some lessons, I don't know. I'd love to try, but I'd need a new employer :-)

One of the issues I can see with such an implementation in linux is that the solutions to achieve something like this are going to be very hw-platform dependent. S/390 offers a wealth of features to implement this efficiently, whereas other hw platforms which are more RISC-based will need to do a lot of tricks.

Yes. I'm not convinced that s/390 even has exactly the best set of features, but clearly (I'm thinking storage keys) it comes close. Other CPUs are SOL, although I thought maybe some of the RISC arches are moving in a similar direction. Lord knows what intel is planning.

In order to keep linux linux, one could think of some kind of micro kernel

No. This is at best an experiment, whose success/failure suggests new features for future CPUs. Maybe if it was a wild success, one might try to have some backwards-compat mode ... but I can't see that, not now.

Such a model is by no means new, AIX V2 (RT) ran under a virtual resource

I have an RT in storage somewhere. It even booted last time I turned it on. -- pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!) [EMAIL PROTECTED] PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Fri, Nov 08, 2002 at 12:55:31AM +0100, Ulrich Weigand was heard to remark: Linas Vepstas wrote: I didn't say it wasn't enormous. It's not tiny, but I'm not sure it's that big either.

Well, for a start, you can't really do program calls in home space mode (which is where Linux user mode runs), so you'd need to

I want to set storage keys differently on different chunks of memory. And then I want to be able to change storage keys at some subroutine call boundaries, but not others. If a program call is not the ticket, then, hmm, maybe some other kernel-assisted stunt. Last time I looked at this, a few years ago, it seemed possible, but I'm pretty new to the 390 arch so I dunno.

Indeed. However, if you need to significantly change your apps to make use of these hypothetical security features, then you could just as well change the apps to use existing mechanisms, like just using multiple processes ...

Yes, except as stated, I believe that creating apps that use multiple processes is hard. (Off-topic: I know a guy who teaches C++, and he insists that 25% of his students don't know how to handle C pointers correctly, even if they've been coding in C for a decade. Do you think this 25% would be capable of writing a multiple-process app? My extrapolation from this datapoint is grim.)

So System V IPC is crap, that's no big news ;-) Use pipes and/or shared mmap instead, and you'll see lots of applications that do that. (See e.g. the recently introduced 'privilege separation' feature of OpenSSH.)

Sorry I used the word semaphore. Using pipes/shmem is hard. Well, using them is easy; using them and creating something that's extensible, maintainable, lacks race conditions and other bugs ... that's a lot harder. If it's so easy, why didn't ssh do it years ago?

- significantly easier to use than multiple processes / IPC

Some storage-key based mechanism.
- worthwhile for application programmers to implement even though they restrict themselves to a single platform

I was proposing experimentation. I'm thinking that, after some experiments, one discovers either that it's a good idea, at which point other future CPU archs should also get such mechanisms, say 2 or 5 or 8 years out. Or one finds that a slightly different implementation would be better. Or maybe the idea of putting a fine-grained control over read/write/execute permissions on different portions of a flat memory space is a bad idea. Right now, it seems like a neat idea, but without significant work, I won't know. Maybe it can't be done easily/cleanly/well on the s390. I'm not averse to going into a lab and working with someone to create a new cpu which would be a testbed for such a feature. Maybe alter some secret 390 insn microcode, you know.

One main point why just function calls are so much easier to use than IPC is that you can pass pointers to complex data structures back and forth. If you really implement some sort of strict memory protection, that won't work any more.

Huh? why not?

and why is this then still significantly easier than IPC?

To define a library interface, traditionally, one defines some function prototypes, and then writes the code to make them do something. To pass data on a socket, you need to define the structure of the data passing through the socket. And you need to define at least two subroutines, one to pack the data into the pipe, one to pull it out. And a third routine to do the actual work that your app is supposed to do. OK, so there are tools that can automate a lot of this, such as RPCs and stub generators and IDLs, or you can code in Java or scheme or C# or some language that provides native support for this (introspection, or the ability to marshall/unmarshall itself). Or use .net so that you can be a stupid programmer creating incredibly complex systems.
But these are all very complex mechanisms if your goal is merely to be a lowly library trying to prevent someone from meddling with your innards.

Furthermore, if the protection is supposed to be enforced against adversary efforts (and not just some feature to catch bugs, like e.g. mprotect), then there needs to be a protection barrier set up by some agent outside of the process itself. E.g. you don't just switch the PSW key (the callee could just switch it back), but you need to actually change the PSW key *mask*, i.e. perform a program call or the like. This needs kernel help to set up program call numbers, entry tables etc.

yes.

How is that kernel help to be triggered?

Dunno

What authentication mechanism protects access to these features?

Dunno.

I don't think the S/390 hardware features are a 'magic bullet' that solves all these security problems. The

Maybe not :-(

real problem is to come up with an interface that actually provides these features in a sane way; the

Yes.

hardware might then be used to implement them in a more efficient way, maybe.

If the feature is sane/usable, then
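The pack/unpack pair Linas describes can be made concrete. Here the "real work" is one trivial function; wrapping it for a byte stream requires a wire format plus marshalling routines on each side. A sketch in Python, with `struct` standing in for hand-rolled C packing; all names are illustrative:

```python
import struct

# The 'real work': as a plain library call, this is the whole interface.
def area(width: int, height: int) -> int:
    return width * height

# The same thing behind a byte stream needs a wire format plus a
# pack/unpack pair on each side.
REQUEST = struct.Struct("!II")   # two unsigned 32-bit ints, network order
REPLY = struct.Struct("!Q")      # one unsigned 64-bit int

def pack_request(width, height):
    return REQUEST.pack(width, height)

def handle_request(payload):
    # server side: unpack, do the work, pack the answer
    width, height = REQUEST.unpack(payload)
    return REPLY.pack(area(width, height))

def unpack_reply(payload):
    (result,) = REPLY.unpack(payload)
    return result

# Direct call vs. the marshalled round trip: same answer, but the
# socket version needed a protocol definition and three extra routines.
assert area(6, 7) == unpack_reply(handle_request(pack_request(6, 7))) == 42
```

This is the overhead that RPC stub generators and IDLs automate, and that a protected in-address-space call would avoid entirely.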
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Wed, 2002-11-06 at 02:36, John Summerfield wrote: BTW IA32 has four protection levels enforced in hardware. I believe the problem is that Linux doesn't use them all.

x86 has 2 or 4 depending on where you look. Some of the levels really exist as back-compatibility segmentation stuff only. Most other Linux platforms have only supervisor/user.

You _can_ secure your webserver now to ensure Apache components can't corrupt static data such as executable scripts, html etc.

You only make it harder. I merely start to do evil stuff like call with the direction flag backwards, or passing bogus parameters. If you then also validate the parameters, it's probably as cheap to have two processes.

b) Same scenario as above, but word-substitute apache -> kernel and mod_trojan -> device driver. If the linux kernel ran in 'space 2', but device drivers ran in 'space 3', then nasties can't hurt the kernel, while still enjoying read-write access to the bus and other hardware that a legit device driver needs access to.

Could be done now in IA32. As I recall, OS/2 does just that.

Value zero. I ask the hardware to overwrite the kernel. Plus it costs me a -lot- of performance.
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Linas Vepstas wrote: I've always been curious. Why is a top-down stack used anyway?

If the heap grows up, and the stack grows down, then one can have, in theory, arbitrarily large stacks. Handy for CPUs that have a single flat memory space that is not very big.

Ahhh! Now if someone had explained that the stack was like LSQA and the heap was like user private, then I'd have gotten it right off the bat ;-) Thanks, Greg
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, 2002-11-05 at 21:01, Adam Thornton wrote: other data spaces, and I don't think you can execute code from data spaces, but you see where this is going), so you could share your shared

x86 alas doesn't support page-level no-execute. Other platforms do and can run with non-exec stacks. People still exploit them. The libraries are mostly mapped read-only on Linux; people don't need to modify them. You put arguments on the stack, and corrupt the return address to call the right C library function.
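Alan's point, that a non-executable stack doesn't stop an attacker who simply redirects the saved return address into existing library code, can be modelled without any real memory corruption. The toy below represents a stack frame as a Python list: a fixed-size buffer followed by a saved "return address" slot and an argument slot. Nothing executes *from* the stack; the overflow only changes where the function "returns" to. All names (`libc_system`, `vulnerable`) are illustrative stand-ins:

```python
# Toy model of a return-into-libc overflow. No injected code runs from
# the 'stack'; the attack reuses an existing library routine.

def trusted_exit():
    return "normal return"

def libc_system(arg):          # stands in for system(3) in the C library
    return f"system({arg!r})"

BUFFER_SIZE = 8

def vulnerable(user_input):
    # frame layout: [buffer (8 cells)] [saved return slot] [stack argument]
    frame = [0] * BUFFER_SIZE + [trusted_exit, None]
    # the bug: copies without checking user_input against BUFFER_SIZE
    for i, cell in enumerate(user_input):
        frame[i] = cell
    ret, arg = frame[BUFFER_SIZE], frame[BUFFER_SIZE + 1]
    return ret(arg) if arg is not None else ret()

# Benign input stays inside the buffer:
assert vulnerable([1, 2, 3]) == "normal return"

# Overlong input overwrites the saved return slot and the argument
# beyond it -- exactly 'arguments on the stack' plus a redirected return:
payload = [0] * BUFFER_SIZE + [libc_system, "/bin/sh"]
assert vulnerable(payload) == "system('/bin/sh')"
```

The non-exec-stack defence never triggers in this scenario, because every instruction executed lives in legitimately mapped, executable library code.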
Re: CPU Arch Security [was: Re: Probably the first published shell code]
I am not sure that you would need DCSSes to protect one from arbitrarily jumping into shared libraries (as may be used by exploits). If one were to design shared libraries such that each shared library has its own address space, then one could use cross memory to execute from that address space. One could have a PC call for each shared library function, and as such normal users would never be able to get to that code, other than by means of the PC call, which executes a predetermined function. This, together with a non-executable stack, will make things harder for any viruses. I think that hardware functionality, with OS support, is the best answer to viruses, although it will probably never be 100% failsafe. Jan Jaeger

From: Adam Thornton [EMAIL PROTECTED] Reply-To: Linux on 390 Port [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: CPU Arch Security [was: Re: Probably the first published shell code] Date: Tue, 5 Nov 2002 16:01:57 -0500

On Tue, Nov 05, 2002 at 08:03:35PM +, Alan Cox wrote: Flavour of the year appears to be maths sign/overflow mishandling. Buffer overflows are no longer a growth area as programmers learn that one.

Gee, only took 'em, what, 40 years?

For this to catch on in the mainstream, other CPU architectures would need to add similar features as well. But given the recent burbling from microsoft and intel about palladium and how cpu arch changes can enhance security (which intel seems to be actually working on), I do not think that it is too wild, too early or too impractical to engage in this task.

I don't really see how fiddling with libraries helps you, but enlighten me

Well, one thing I can see exploiting under VM would be an aggressive use of DCSSes (or something like them--I don't know if you can put DCSSes in other data spaces, and I don't think you can execute code from data spaces, but you see where this is going), so you could share your shared libraries between Linux images.
If each one were in its own read-only address space, you'd get a vast reduction in overall memory footprint, plus code couldn't exploit bugs in the standard libraries--even if you have a buffer overflow (or whatever) vulnerability, a) the code is off in its own private address space, so you can't go trash anything else, and b) your virtual machine has that segment marked read-only anyway. Good lord, I can't believe that I'm arguing for a segmented architecture. Adam
Re: CPU Arch Security [was: Re: Probably the first published shell code]
Adam Thornton wrote: Well, one thing I can see exploiting under VM would be an aggressive use of DCSSes (or something like them--I don't know if you can put DCSSes in other data spaces, and I don't think you can execute code from data spaces, but you see where this is going), so you could share your shared libraries between Linux images. If each one were in its own read-only address space, you'd get a vast reduction in overall memory footprint, plus code couldn't exploit bugs in the standard libraries--even if you have a buffer overflow (or whatever) vulnerability, a) the code is off in its own private address space, so you can't go trash anything else, and b) your virtual machine has that segment marked read-only anyway.

Using DCSSes to reduce overall memory footprint is certainly a useful goal, which we are actually working on right now. As to security implications, however, DCSSes contribute exactly nothing IMO. Whether you use separate address spaces or not has nothing whatsoever to do with whether you put into those spaces, once you've got them, a mapping of a DCSS or just a mapping of regular memory. (However, changing the Linux tool chain and basically *all* applications from a flat address space to multiple address spaces is an *enormous* task; and I'm not convinced this buys you anything w.r.t. security that can't be achieved much more easily, e.g. by StackGuard-type compilers. Certainly nobody has even attempted to do this w.r.t. segments on Intel, for example -- at least as far as I know.)

Also, shared libraries always have a read-only part (code and read-only data) and a read-write part (variables); the read-only part is mapped read-only by default, without any DCSSes in sight. (Sure, you *can* change that using mprotect or so, but once the exploit code has gained enough control to issue system calls, everything that is to be lost is already lost ...)
Likewise, the read-write part would need to be mapped from regular memory even when using DCSSes. In general, I can only re-iterate my belief that attempting to guarantee security *even in the presence of bugs* is ultimately futile. Bye, Ulrich -- Dr. Ulrich Weigand [EMAIL PROTECTED]
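Ulrich's parenthetical about mprotect can be demonstrated directly: protection flags set from inside a process can be undone from inside that same process, because exploit code can issue the same system calls the program can. A Linux-only sketch, calling mprotect(2) through ctypes (Python used for brevity; a real exploit would of course do this from native code):

```python
import ctypes
import mmap

# Load the C library and declare mprotect(void *addr, size_t len, int prot).
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int

PAGE = mmap.PAGESIZE
buf = mmap.mmap(-1, PAGE)            # anonymous, page-aligned, read/write
view = ctypes.c_char.from_buffer(buf)
addr = ctypes.addressof(view)

# 'Protect' the page read-only ...
assert libc.mprotect(addr, PAGE, mmap.PROT_READ) == 0
# ... and any code running inside the process can simply flip it back:
assert libc.mprotect(addr, PAGE, mmap.PROT_READ | mmap.PROT_WRITE) == 0
buf[:2] = b"ok"                      # writable again
assert buf[:2] == b"ok"
```

This is why in-process protection only catches bugs: an adversary who already controls the instruction stream controls the protection flags too, which is the motivation for a barrier enforced from *outside* the process (a program call, a kernel, or another process).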
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, 2002-11-05 at 21:16, Ulrich Weigand wrote: convinced this buys you anything w.r.t. security that can't be achieved much more easily, e.g. by StackGuard-type compilers. Certainly nobody has even attempted to do this w.r.t. segments on Intel for example -- at least as far as I know.)

There is Solar Designer's non-exec stack stuff, which uses a segment trick to fake non-exec pages, and also some experimental bits (ab)using segments for fast Linux-on-Linux virtualisation. On modern x86, segment limits are really expensive though - 1 or more clocks per access.

In general, I can only re-iterate my belief that attempting to guarantee security *even in the presence of bugs* is ultimately futile.

Definitely. Security policy should start with 'when xyz breaks in' ...
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Tue, Nov 05, 2002 at 08:03:35PM +, Alan Cox was heard to remark: On Tue, 2002-11-05 at 19:04, Linas Vepstas wrote: For this to catch on in the mainstream, other CPU architectures would need to add similar features as well. But given the recent burbling from microsoft and intel about palladium and how cpu arch changes can enhance security (which intel seems to be actually working on), I do not think that it is too wild, too early or too impractical to engage in this task.

I don't really see how fiddling with libraries helps you, but enlighten me

The core idea is actually so simple, it's painful. Today, most CPUs define two memory spaces: the one that the kernel lives in, and the one that user-space lives in. When properly designed, there is nothing a user-space program can do to corrupt kernel memory. One 'switches' between these memory spaces by making system calls, i.e. by the SVC instruction.

The 390 arch has not two, but 16 memory spaces (a 4-bit key) with this type of protection. (When I did the i370 port, I put the kernel in space 0 or 1 or something like that, and user-space programs ran in one of the others.) The partitioning between them is absolute, and is just like the kernel-space/user-space division in other archs. The mechanism is independent/orthogonal to the VM/TLB subsystem (you can have/use virtual memory in any of the spaces.)

This then points to a simple, basic idea: suppose one could make a call to a plain-old shared library, but have that library be protected behind the same kind of syscall protections that the kernel has. Then there would be nothing that the caller could do to corrupt the memory used by the shared library, no more than (a user-space) caller can corrupt kernel memory.

Now, for the core claims: 1) There are some really neat things that can be done with this design (more below). 2) I believe that the memory-key system in the 390 architecture is just right to be able to do this, and that it's (relatively?)
straightforward to implement. I have *not* created a working prototype, or dug into it for a few years now, but it seems doable. This feature has been and is used by other applications on the 390, on other operating systems. 3) One cannot really emulate this kind of memory protection, at least not without a big performance hit, which negates the whole point of having it in the first place. However, the changes needed to a CPU arch to support such a thing are simple, minor and doable. Probably in the form of a few extra bits in page tables/TLBs and the needed logic to allow/deny read/write access to a page, based on which 'space' the current thread is executing in.

OK, so what neat things can one do with this?

a) Imagine running Apache, and loading some untrusted apache module which in fact is a trojan horse. Oops. Today, there is nothing that apache can do to prevent mod_trojan_horse from corrupting the innards of apache (and by extension, other things, like web pages). But imagine if apache could run behind a syscall-like barrier. Then damage done by mod_trojan becomes a lot more limited, and maybe even completely containable.

b) Same scenario as above, but word-substitute apache -> kernel and mod_trojan -> device driver. If the linux kernel ran in 'space 2', but device drivers ran in 'space 3', then nasties can't hurt the kernel, while still enjoying read-write access to the bus and other hardware that a legit device driver needs access to.

c) Today, virtually all client-server IPC is through sockets. There is a lot of CPU overhead to set up a socket, jam data into it, process-switch, pull data out of it, decode it, and get on with business. Never mind the design overhead of having to invent a protocol. If the client and server are on the same machine, then this overhead is 'wasted'. There are cases where this overhead represents 99% of all CPU cycles to accomplish some function (certain DB queries, certain x-window drawing operations etc.).
Imagine replacing this IPC with a simple call to a shared library. You're in, you're out, you're done. But today, you can't, because any data that the shared library touches cannot be protected from the client, and so one traditionally needs to have a server to keep nasty clients at bay. (Imagine the converse: imagine that the only way a device driver could talk to the linux kernel was through a socket. Imagine how horrible that world would be. Well, that horrible world is called 'client-server computing' when one must live in user-space.)

d) Apply one's imagination ... I'm not sure, but I wonder if it might help simplify the design/implementation of 'mandatory access controls' (e.g. lomac, the NSA stuff, etc.) used for high-security.

--linas
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Wed, 6 Nov 2002 04:39, you wrote: x86 alas doesn't support page-level no-execute. Other platforms do and can run with non-exec stacks. People still exploit them. The libraries are mostly mapped read-only on Linux; people don't need to modify them. You put arguments on the stack, and corrupt the return address to call the right C library function.

In IA32, you cannot execute stack-segment code. Because of the way Linux (and other OSes) are designed, with a single address space per process, the stack segment and code segment are the same storage, and that's how you get to put executable code on the stack and have it execute. -- Cheers John Summerfield Microsoft's most solid OS: http://www.geocities.com/rcwoolley/ Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Re: CPU Arch Security [was: Re: Probably the first published shell code]
On Wed, 6 Nov 2002 05:45, you wrote: The core idea is actually so simple, it's painful. Today, most CPUs define two memory spaces: the one that the kernel lives in, and the one that user-space lives in. When properly designed, there is nothing a user-space program can do to corrupt kernel memory. One 'switches' between these memory spaces by making system calls, i.e. by the SVC instruction. The 390 arch has not two, but 16 memory spaces (a 4-bit key) with this type of protection. (When I did the i370 port, I put the kernel in space 0 or 1 or something like that, and user-space programs ran in one of the others.) The partitioning between them is absolute, and is just like the kernel-space/user-space division in other archs. The mechanism is independent/orthogonal to the VM/TLB subsystem (you can have/use virtual memory in any of the spaces.) This then points to a simple, basic idea: suppose one could make a call to a plain-old shared library, but have that library be protected behind the same kind of syscall protections that the kernel has. Then there would be nothing that the caller could do to corrupt the memory used by the shared library, no more than (a user-space) caller can corrupt kernel memory.

Am I naive? How does a caller corrupt a shared library on Linux on IA32? Okay, you can do it as root. If you're root you have full privilege _by definition_. So, I imagine you could corrupt Linux/390 in the same way, even with your protection model. BTW IA32 has four protection levels enforced in hardware. I believe the problem is that Linux doesn't use them all.

OK, so what neat things can one do with this? a) Imagine running Apache, and loading some untrusted apache module which in fact is a trojan horse. Oops. Today, there is nothing that apache can do to prevent mod_trojan_horse from corrupting the innards of apache (and by extension, other things, like web pages). But imagine if apache could run behind a syscall-like barrier.
Then damage done by mod_trojan becomes a lot more limited, and maybe even completely containable.

Any apache module has to have access to users' data. It doesn't necessarily have to be writeable, though, and doesn't have to be owned by the account used to access it. You _can_ secure your webserver now to ensure Apache components can't corrupt static data such as executable scripts, html etc. If your application has to write data, you can still secure it to a large extent by your choices in how to store data (maybe through some DBMS, or by running your particular application as a different user (user/group specs in VirtualHost)). Note that the Apache 1.3 implementation is a little odd, and it's broken in 2.0, but the design is there.

b) Same scenario as above, but word-substitute apache -> kernel and mod_trojan -> device driver. If the linux kernel ran in 'space 2', but device drivers ran in 'space 3', then nasties can't hurt the kernel, while still enjoying read-write access to the bus and other hardware that a legit device driver needs access to.

Could be done now in IA32. As I recall, OS/2 does just that.

-- Cheers John Summerfield
Re: CPU Arch Security [was: Re: Probably the first published shell code]
b) Same scenario as above, but word-substitute apache -> kernel and mod_trojan -> device driver. If the linux kernel ran in 'space 2', but device drivers ran in 'space 3', then nasties can't hurt the kernel, while still enjoying read-write access to the bus and other hardware that a legit device driver needs access to.

And we reinvent the Multics ring structure one more time ... dockmaster.af.mil, wherefore art thou? -- db
Re: CPU Arch Security [was: Re: Probably the first published shell code]
At 00:39 11/06/2002 -0500, you wrote: On Tue, Nov 05, 2002 at 11:13:59PM -0500, David Boyes wrote: And we reinvent the Multics ring structure one more time dockmaster.af.mil, wherefore art thou? More to the point, now that there are no longer any Multics systems in production... When does it get released to the world? Hopefully it'll come with the PL/I compiler, so we can port it to Linux :-) Ross Patterson