First, I don't speak for Sun Microsystems.  Yada yada yada.

Now that that is out of the way ...


Executive summary
-----------------
I have not seen any signs of instability in the kernel for quite
some time.  All of our problems, lately, have been in userland:
the run-time loader and libc.

I am still stuck on problems with the run-time loader.  I have
some strategies I could pursue, but I am out of time, so I am
falling back to Plan B.

Gory details
------------
The current sticking point in the "main sequence" of bringup
is the userland run-time loader, /lib/ld.so.1.

I get as far as _ld_libc().  That is pretty far along.  If it
gets that far, then it means that many shared objects are
already mmapped, and several sections other than .text and .data
are also initialized.  It also means that there have already
been several successful function calls and returns back and
forth between the run-time loader and libc.

When _ld_libc() is called, the program counter goes off into
the corn field.  That happens so early that even a trace message
at the very start of _ld_libc() never appears.  I can send a
message just before calling it, and I can place a trace call at
the entry of _ld_libc().  The trace before the call happens,
but the trace at function entry never happens.  So, it is time
to play with the branch history again.

One suspect is the very fact that rtld and libc are
position-independent code (PIC), which is vulnerable if
the PLT or GOT gets stomped on.  With non-PIC code, all sorts
of things can go wrong, but at least there is an entire class
of problems that just don't happen.

To illustrate how PIC code can be more vulnerable to problems
involving very unpredictable flow of control, let's just take
the case of a reference to global data.  With PIC code, if a
function refers to any data that is not local to that function,
the address of that data is loaded from the Global Offset Table
(GOT).  To find the GOT in the first place, the GNU C compiler
generates a branch and link to GOT[-1], which has been arranged
to contain a 'blrl' (branch and link to link register)
instruction.  The blrl returns right away, leaving the address
of GOT[0] in the link register.  This is a cheap way to generate
the value of &GOT[0].  This is not some
GNU-specific trick; it is not a Solaris trick; it is part of
the ABI (Application Binary Interface).

The vulnerability is that if GOT[0] gets walked on, you don't
know where you are going next.

Can't see
---------
Some of you may be wondering, "What is so hard about getting to
a single-user prompt on Solaris?"  After all, we got to "Hello,
world" a long time ago.  And now, we can't even manage "Hello,
world" using the modern Solaris standard methods, let alone
/sbin/init, and then on to a legitimate single-user prompt.

There is more than one answer to that question.

First, it is hard to see.  Sure, it is hard to see in the
kernel, as well.  We do not have a real debugger.  But, I mean,
it is even worse, once you get to userland.  Things are sort
of OK if you run a statically linked binary.  But, on modern
Solaris, all programs, even /sbin/init, are dynamically linked
and depend on the run-time loader and libc.  And when you fire
up the run-time loader, you can't get any messages out.  None.
There is no system call available for issuing error messages.
That would require the write system call, and the stub for
calling it lives in libc, which is not yet initialized.

OK, we could solve the problem by having the run-time loader
call its own private copy of _write, etc.  But, where are
you going to write to?  There are no open file descriptors.
Oh yeah, we need a device tree.

Things are a bit better now, because I made up a new system
call, just for Polaris, and just for the time being, which
gives a userland program direct access to the kernel function,
prom_printf().  I called it kprintf().  I have been given to
understand that Linux has a function called kprintf().  Sorry.
I liked the name because to me the leading 'k' meant that
it was doing something that involved direct access to the
kernel; something special; something not ordinarily permitted
in userland.

More about kprintf(), later.

By the way, the Polaris version of the source code is behind.
I may be talking about something that has not shown up there,
yet.  I will have to have a word with Tom Riddle about that.

OK, I think I have covered enough about the problem of
observability.  But there is more to it than that.

Can't reduce
------------
Couldn't we reduce the complexity of rtld + libc?  If we need
only a small subset, in order to run "Hello world", then we
don't need the full libc.  How about a libc-lite?

Actually, that IS a good idea.  But it is tricky.

It is not a matter of just extracting a subset of the functions
in libc.  It is doable, but not that easy.  I believe it
should be much easier than it is.  I am a big fan of the
idea that subsets should work.  To me, it even qualifies as a
guiding principle.  But, that is not the way things are.

For example, "Hello world" is a tiny program.  It does not
use any sockets, any asynchronous I/O, any polling or events,
none of that.  So far, so good, maybe.  It does not use multiple
threads.  We don't need any thread library stuff, right?  Well,
maybe not, but the basic thread model under Solaris means that
the thread is THE fundamental unit of program execution, right
from the beginning -- it is not an add-on.  rtld+libc and the
kernel are just fundamentally based on threads.  rtld has
special hooks into libc, and libc had better know how to
initialize threads, in order to do anything.  It is not a case
of all libraries being created equal, with some more equal than
others.  No, we don't even pretend.  Libc is just plain special.

So, what I am saying is that, the way things are, there is
a significant amount of irreducible complexity -- more than
I would like to see, at least from my perspective as one
of the poor slobs who came along to do another port of Solaris.

Don't get me wrong.  There are some cool things about Solaris
threads, and the whole idea of threads, instead of processes,
as the fundamental building block.  It is just that it raises
the cost of entry for you and me.

Raised cost of entry
--------------------
On SPARC and x86, once Sun had progressed in stages to the
model in use now, Sun engineers and Sun management never
looked back.

I do believe, however, that there are ways to make some subsets
work.  But, that is a significant project, in itself.  It is
a worthy project, but one that I cannot undertake, now.  Sorry.

There are other areas where Solaris on the currently supported
platforms has moved on in a way that means there is no looking
back.  For example, Solaris is wired for kmdb as the debugger.
Someone trying to do a port to a new processor cannot easily
skip implementing mdb/kmdb and fall back to the old kadb.
I am sure it can be done, but all _products_ have kmdb, so
it's pretty much "Pull the ladder up, Jack.  I'm alright."
Things are geared toward kmdb or nothing.

Some of you may have seen the "Poor Man's DeBugger" (PMDB),
which is just a collection of functions inside the kernel.
Pretty lame, huh?  All I have to say is that it is better
than nothing.

Design for port
---------------
PMDB, as poor as it is, is part of what I call the "design
for port" initiative.  That includes a whole lot of little
details that are meant to make it easier for the next
poor slob who gets the idea that he wants to port Solaris.
With my luck, that would probably be me.  There is nothing
sexy about design for port.  There is no one body of code that
you can point to and say "that is the design-for-port code."
It is just hundreds of little things, like modifying the way
some component is coded, with an eye toward relying less on
the full system having already been developed.  It is things
like not throwing away scaffolding, such as unit test code,
but instead cleaning it up a bit and finding a place for it
in the source tree.  It is trying to avoid, where possible,
the progression toward irreducible complexity.

This is turning into a rant.  Enough of that for now.

Oh, just one more thing.  I said that kprintf() was just for
the time being.  I lied.  Design-for-port means that it is
a permanent part of Polaris.  It will simply never make it
into Sun product.  But, this is not Solaris product, and that
has advantages.


Just statically link
--------------------
After reading me whinge about the difficulties of getting
rtld+libc up and running, you might be wondering why we did not
just get things running with everything statically linked.

If you are looking for someone to blame, you need look no
further.  I was at least partly responsible for that decision.
I made the case for going for the full rtld+libc treatment,
even though I knew it carried added risk.

There is no question that we would have just statically
linked everything.  Not just userland programs -- everything,
including the kernel.  We had very few people and none
were linker wonks.  But then, Brian Horn became available.
That changed everything.  With him participating, we would
have a chance to do "real" Solaris.  Not only that, but he
was a member of the team that did the Solaris/PPC 2.6 port.

That worked out well.  Right away, Brian got VOF up and running
and got the kernel run-time loader working.

I knew that there was a port of Solaris to the ARM processor.
I knew that they did the "sane" thing, and just statically
linked the kernel and user applications.  Nobody who knows
how difficult it is to port Solaris would fault them for
that decision.  It is the reasonable thing to do.  But, if
an opportunity like that comes along, then grab it.  I do not
regret the decision.  Brian's Kung Fu is the best.

Aside from grabbing an opportunity when it arises, another
reason I pushed for going with the full rtld+libc treatment
is that I believe it is important, in the bigger picture,
to be capable of doing both.  Solaris product has long since
moved away from statically linked anything.  But, no Solaris
products are for embedded.  For many embedded systems, building
things statically is the appropriate thing to do.  But, if we
produce a statically linked kernel or applications or both,
it should be because we have shown that we can do Solaris right,
that we can do it either way, and that we choose to do a particular
flavor using static linking.  But it would not do for us to produce
only statically linked builds just because rtld+libc is too hard.

Brian is no longer employed by Sun.  I have been working through
the rtld+libc problems, on my own.  Yeah, I am making progress,
and I have some ideas how to proceed.  But, I will be here for
only a short while longer, and so I cannot guarantee that rtld
will be ready for prime time, before my time is up.  Also,
I have made promises to document things to be handed off --
to whomever.  So, I have to budget the rest of my time for
two things:

  1) cleanup, documentation, and hand-off
  2) Plan B: single user prompt with statically linked
     userland apps.

More about Plan B, later.

-- Guy Shaw

