First, I don't speak for Sun Microsystems. Yada yada yada. Now that that is out of the way ...

Executive summary
-----------------

I have not seen any signs of instability in the kernel for quite some time. All of our problems, lately, have been in userland: the run-time loader and libc. I am still stuck on problems with the run-time loader. I have some strategies I could pursue, but time has run out, so it is time to fall back to Plan B.

Gory details
------------

The current sticking point in the "main sequence" of bringup is the userland run-time loader, /lib/ld.so.1. I get as far as _ld_libc(). That is pretty far along. If it gets that far, then it means that many shared objects are already mmapped, and several sections other than .text and .data are also initialized. Also, it has already been demonstrated that there have been several successful function calls and returns back and forth between the run-time loader and libc.

When _ld_libc() is called, the program counter goes off into the corn field. That happens so early that I cannot even insert a trace message at the start of _ld_libc(). I can send a message just before calling it, and I can place a call to trace the entry of _ld_libc(). The trace before the call happens, but the trace at function entry never happens. So, it is time to play with the branch history again.

One suspect would be the very fact that rtld and libc are position independent code (PIC), which would be vulnerable if the PLT or GOT get stomped on. With non-PIC code, all sorts of things can go wrong, but at least there is an entire class of problems that just don't happen.

To illustrate how PIC code can be more vulnerable to problems involving very unpredictable flow of control, let's just take the case of a reference to global data. With PIC code, if a function contains any references to data that is not local to that function, then it is dereferenced through the Global Offset Table (GOT).
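
To make the indirection concrete, here is a toy model in portable C. It is a sketch, not what the compiler or rtld actually emits: toy_got, counter, decoy, read_counter() and stomp_slot() are all made-up names for illustration. The point is only that the function never names the global directly. It loads an address from a table slot and dereferences that, so whatever ends up in the slot decides what the function touches.

```c
#include <assert.h>
#include <stddef.h>

static int counter = 42;  /* the global that the function means to read */
static int decoy = -1;    /* unrelated data that a stray pointer might name */

/* Hypothetical stand-in for the GOT: one slot per global, holding its address. */
enum { SLOT_COUNTER, NSLOTS };
static int *toy_got[NSLOTS] = { &counter };

/* Fetch the address stored in a slot, as PIC code does before every access. */
static int *toy_got_load(size_t slot)
{
    assert(slot < NSLOTS);
    return toy_got[slot];
}

/* A "PIC-style" reader: the address comes from the table, not from the code. */
int read_counter(void)
{
    return *toy_got_load(SLOT_COUNTER);
}

/* Simulate a stray write that walks on the table slot. */
void stomp_slot(void)
{
    toy_got[SLOT_COUNTER] = &decoy;
}
```

Before stomp_slot() is called, read_counter() returns 42. Afterwards it returns -1, even though neither read_counter() nor counter itself was modified. With a function pointer in the slot instead of a data pointer, the same stray write would redirect flow of control.
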
The GNU C compiler will generate a branch and link to GOT[-1], which has been arranged to contain a 'blrl' (branch to link register and link) instruction. This is a cheap way to generate the value of &GOT[0]. This is not some GNU-specific trick; it is not a Solaris trick; it is part of the ABI (Application Binary Interface). The vulnerability is that if GOT[-1] gets walked on, you don't know where you are going next.

Can't see
---------

Some of you may be wondering, "What is so hard about getting to single user prompt on Solaris?" After all, we got to "Hello, world" a long time ago. And now, we can't even manage "Hello, world" using the modern Solaris standard methods, let alone /sbin/init, and then on to a legitimate single user prompt.

There is more than one answer to that question.

First, it is hard to see. Sure, it is hard to see in the kernel, as well. We do not have a real debugger. But, I mean, it is even worse, once you get to userland. Things are sort of OK, if you run a statically linked binary. But, on modern Solaris, all programs, even /sbin/init, are dynamically linked and depend on the run-time loader, and libc.

But, when you fire up the run-time loader, you can't get any messages out. None. There is no access to a system call to issue error messages. That would require access to the write system call, which is provided by libc, which is not yet initialized. OK, we could solve the problem by having the run-time loader call its own private copy of _write, etc. But, where are you going to write to? There are no open file descriptors. Oh yeah, we need a device tree.

Things are a bit better now, because I made up a new system call, just for Polaris, and just for the time being, which gives a userland program direct access to the kernel function, prom_printf(). I called it kprintf(). I have been given to understand that Linux has a function called kprintf(). Sorry.
I liked the name because to me the leading 'k' meant that it was doing something that involved direct access to the kernel; something special; something not ordinarily permitted in userland. More about kprintf(), later.

By the way, the Polaris version of source code is behind. I may be talking about something that has not shown up there, yet. I will have to have a word with Tom Riddle about that.

OK, I think I have covered enough about the problem of observability. There's more.

Can't reduce
------------

Couldn't we reduce the complexity of rtld + libc? If we need only a small subset, in order to run "Hello world", then we don't need the full libc. How about a libc-lite?

Actually, that IS a good idea. But it is tricky. It is not a matter of just extracting a subset of the functions in libc. It is doable, but not that easy. I believe it should be much easier than it is. I am a big fan of the idea that subsets should work. To me, it even qualifies as a guiding principle. But, that is not the way things are.

For example, "Hello world" is a tiny program. It does not use any sockets, any asynchronous I/O, any polling or events, none of that. So far, so good, maybe. It does not use multiple threads. We don't need any thread library stuff, right? Well, maybe not, but the basic thread model under Solaris means that the thread is THE fundamental unit of program execution, right from the beginning -- it is not an add-on. rtld+libc and the kernel are just fundamentally based on threads. rtld has special hooks into libc, and libc had better know how to initialize threads, in order to do anything. It is not like all libraries are created equal, but some are more equal than others. No, we don't even pretend. Libc is just plain special.

So, what I am saying is that, the way things are, there is a significant amount of irreducible complexity -- more than I would like to see. That is, from my perspective as one of the poor slobs who came along to do another port of Solaris.

Don't get me wrong. There are some cool things about Solaris threads, and the whole idea of threads, instead of processes, as the fundamental building block. It is just that it raises the cost of entry for you and me.

Raised cost of entry
--------------------

On SPARC and x86, once having progressed in stages to the model in use now, Sun engineers and Sun management would never look back. I do believe, however, that there are ways to make some subsets work. But, that is a significant project, in itself. It is a worthy project, but one that I cannot undertake, now. Sorry.

There are other areas where Solaris on the currently supported platforms has moved on in a way that means there is no looking back. For example, Solaris is wired for kmdb as the debugger. Someone trying to do a port to a new processor cannot easily bypass implementing mdb/kmdb and fall back to the old kadb. I am sure it can be done, but all _products_ have kmdb, so it's pretty much "Pull the ladder up, Jack. I'm alright." Things are geared toward kmdb or nothing.

Some of you may have seen the "Poor Man's DeBugger", which is just some functions inside the kernel. Pretty lame, huh? All I have to say is that it is better than nothing.

Design for port
---------------

PMDB, as poor as it is, is part of what I call the "design for port" initiative. That includes a whole lot of little details that are meant to make it easier for the next poor slob who gets the idea that he wants to port Solaris. With my luck, that would probably be me.

There is nothing sexy about design for port. There is no one body of code that you can point to and say "that is the design-for-port code." It is just hundreds of little things, like modifying the way some component is coded, with an eye toward relying less on the full system having already been developed. It is things like not throwing away scaffolding, such as unit test code. Rather, cleaning it up a bit, and finding a place for it in the source tree.
It is trying to avoid, where possible, the progression toward irreducible complexity.

This is turning into a rant. Enough of that for now.

Oh, just one more thing. I said that kprintf() was just for the time being. I lied. Design-for-port means that it is a permanent part of Polaris. It will simply never make it into Sun product. But, this is not Solaris product, and that has advantages.

Just statically link
--------------------

After reading me whinge about the difficulties of getting rtld+libc up and running, you might be wondering why not just get things up and running with everything statically linked?

If you are looking for someone to blame, you need look no further. I was at least partly responsible for that decision. I made the case for going for it, even though I knew it carried added risk.

There is no question that we would have just done statically linked everything. Not just userland programs -- everything, including the kernel. We had very few people and none were linker wonks. But then, Brian Horn became available. That changed everything. With him participating, we would have a chance to do "real" Solaris. Not only that, but he was a member of the team that did the Solaris/PPC 2.6 port. That worked out well. Right away, Brian got VOF up and running and got the kernel run-time loader working.

I knew that there was a port of Solaris to the ARM processor. I knew that they did the "sane" thing, and just statically linked the kernel and user applications. Nobody who knows how difficult it is to port Solaris would fault them for that decision. It is the reasonable thing to do. But, if an opportunity like that comes along, then grab it. I do not regret the decision. Brian's Kung Fu is the best.

Aside from just grabbing when an opportunity arises, another reason I pushed for going with the full rtld+libc treatment is that I believe it is important, in the bigger picture, to be capable of doing both.
Solaris product has long since moved away from statically linked anything. But, no Solaris products are for embedded. For many embedded systems, building things statically is the appropriate thing to do. But, if we produce a statically linked kernel or applications or both, it should be because we have shown that we do Solaris, right, and we can do it either way, and we choose to do a particular flavor using static linking. But, it would not do for us to produce only statically linked, because rtld+libc is too hard.

Brian is no longer employed by Sun. I have been working through the rtld+libc problems, on my own. Yeah, I am making progress, and I have some ideas how to proceed. But, I will be here for only a short while longer, and so I cannot guarantee that rtld will be ready for prime time, before my time is up. Also, I have made promises to document things to be handed off -- to whomever.

So, I have to budget the rest of my time for two things:

  1) cleanup, documentation, and hand-off
  2) Plan B: single user prompt with statically linked userland apps.

More about plan B, later.

-- Guy Shaw