Hi, here is a little problem for everybody to think about. It is by no means new, but it nicely illustrates the difficulty of sharing in the light of resource accounting and self-management of resources. Maybe it is not so much a technical problem as a policy problem.
The fundamental tasks of an operating system are to provide security by isolation, and to maximize resource utilization by sharing (without violating the isolation properties). We all know how hard this is: usually, at least some covert channels remain if isolation is incomplete in the presence of resource sharing.

The specific issue I want to draw attention to is shared libraries. Shared libraries are ubiquitous these days. Because they have big text and read-only data sections, they can save a lot of memory by avoiding redundant physical allocation of identical data; this maximizes resource utilization. Further utilization is gained by demand paging, which keeps the unused parts of shared libraries on disk. All of this raises consistency concerns if the underlying files are mutated, for example when a shared library is updated for a bug fix.

In GNU/Linux, a shared library file that is in use cannot be modified in place: any attempt fails with an ETXTBSY ("Text file busy") error. Library updates, by policy, are done by creating a new file and then linking it into the file system under the name of the old one. The old file may be unlinked; thanks to the kernel's reference counting of files, its disk blocks are released once no process uses the library anymore, at the very latest at the next reboot. Because system updates happen infrequently and reboots happen frequently (in fact, a system update may involve a reboot), this is an acceptable solution.

In the GNU/Hurd on Mach, the MAP_COPY flag is implemented and used for library images. This means that the kernel creates a logical copy of the whole library file at mmap() time (the actual copy may be delayed). If the underlying file is modified, the copy is realized to fulfill the consistency requirement for the mapped library image. As there is no resource accounting, this does not cause any concerns.

Now, let's move on to a system with proper resource management. There, the only way to ensure technically (without any additional policy) that memory you use stays around is to pay for it by providing the necessary storage. This may mean actual physical RAM in a self-paging system, or disk space in a persistent system. That is, each process would have to create its own copy of the library file before using it. Implementing delayed copies a la MAP_COPY may be possible, but the memory must be available at the time the data needs to be copied, so it must at least be reserved, paid for, and accounted for. With some additional policy, you would only need one copy per user, session, or isolated subsystem. Still, given the number of shared libraries in a system, this results in heavy data duplication or unused reserved frames, and thus heavy underutilization. Clearly undesirable.

Thus, the proposal is that frequently used shared libraries be provided as a system service, paid for only once by the system, and then free to be used in a read-only mode by any process in the system. This, however, crosses security boundaries, so there must be a policy contract to which each party adheres. What can this contract look like? Let's say the system provides a name space of files that is immutable for normal users, in other words: a system-wide file system with the libraries and applications that are expected to be used frequently. Each process could then make read-only mappings from these files, and use them without paying for the resource. This works fine, but what happens at update?
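Before turning to that question, here are two of the mechanisms above in code form. This is a minimal sketch assuming a POSIX-like C environment; the paths are invented, MAP_COPY is a Mach/Hurd extension that plain POSIX systems do not have, and the fallback branch is just the ordinary read-only shared mapping a process would make from the proposed system-wide file system.

    /* Minimal sketch only: the paths are illustrative, and MAP_COPY
       is a Mach/Hurd extension, not part of POSIX.  */
    #include <fcntl.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a library image read-only.  With MAP_COPY (Mach/Hurd), the
       image is a logical snapshot of the whole file at mmap() time,
       so later updates to PATH cannot corrupt it.  Without it, we
       fall back to an ordinary shared read-only mapping, and
       consistency must come from the update policy below instead.  */
    void *
    map_library (const char *path, size_t *lenp)
    {
      struct stat st;
      void *image;
      int fd = open (path, O_RDONLY);

      if (fd < 0)
        return NULL;
      if (fstat (fd, &st) < 0)
        {
          close (fd);
          return NULL;
        }
      *lenp = st.st_size;

    #ifdef MAP_COPY
      image = mmap (NULL, *lenp, PROT_READ | PROT_EXEC, MAP_COPY, fd, 0);
    #else
      image = mmap (NULL, *lenp, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    #endif

      close (fd);           /* The mapping keeps the file referenced.  */
      return image == MAP_FAILED ? NULL : image;
    }

    /* The Unix-style update: the new image is written under a
       temporary name, then atomically linked under the old name.
       Processes that already mapped the old file keep its inode
       alive; the kernel frees its blocks when the last reference
       goes away.  */
    int
    update_library (const char *new_path, const char *lib_path)
    {
      return rename (new_path, lib_path);
    }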
If running applications should not be broken, the old files must be kept around even after an upgrade, until the last user goes away. This suggests a need for reference counting, or maybe at least voluntary registration.

At first sight, this problem seems to be worsened by persistence, because not even a reboot can make existing users go away. However, there is a way to make existing users go away, and that is by revoking the resources of the old files, something the administrator could do. In this case, the memory would be released, and applications using it would presumably crash rather soon. Is this better or worse than a reboot?

In Unix, per policy, the system tries to enter a good state at reboot. In a persistent system, we would have to put a new policy in place, for example a watchdog daemon that monitors a user's applications and restarts them, automatically or manually, if they crash due to revoked old library files (a minimal sketch of such a watchdog is appended at the end of this mail). This works fine, except for long-running tasks that cannot easily be restarted. Such tasks would have to be replaced by a new task using the new library, in the same way the task itself would be updated to a new version, i.e. by following a state migration protocol.

So, the questions here are: What are our options for implementing shared libraries in the context of the system designs we are looking at? Is the above proposal, which relies mostly on policy decisions and high-level features like watchdog daemons, adequate? How does it compare to Unix in terms of implementation complexity, user friendliness, and automation?

This issue was raised between Neal and me on a train trip from Amsterdam, so much of this discussion is indebted to Neal. I first heard the idea of the system explicitly granting processes the resources for system libraries from Jonathan Shapiro.
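Appendix: the watchdog mentioned above, as a strawman. This is a minimal sketch assuming POSIX process primitives; the task path is invented, and a real daemon would want rate limiting and a way to tell a crash caused by revoked library frames apart from any other failure.

    /* Strawman watchdog: restart a task whenever it dies abnormally,
       e.g. because the system revoked the old library image it was
       still mapping.  The task path is made up for illustration.  */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main (void)
    {
      for (;;)
        {
          int status;
          pid_t pid = fork ();

          if (pid < 0)
            return 1;               /* fork failed; give up */
          if (pid == 0)
            {
              execl ("/bin/long-running-task", "long-running-task",
                     (char *) NULL);
              _exit (127);          /* exec failed */
            }
          if (waitpid (pid, &status, 0) < 0)
            return 1;
          if (WIFEXITED (status) && WEXITSTATUS (status) == 0)
            break;                  /* clean exit: do not restart */
          if (WIFSIGNALED (status))
            fprintf (stderr, "task died with signal %d, restarting\n",
                     WTERMSIG (status));
        }
      return 0;
    }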
Thanks,
Marcus