Rationale and more on the subject than you ever wanted to know :)

Maj Justin Seiferth Thu, 21 Aug 1997 07:21:02 -0700 (PDT)

First of all (and I apologize for this use of bandwidth) I've attached a
couple of papers I wrote on both the server and data structure design I
had come up with a few months ago.  The Server_Docs.html especially
contains some background on why I thought (and still think) the
configuration (as opposed to status) system should be controlled by a
daemon separate from the web server.  Please note I think this
discussion concerns only the setup of the server and is independent of
the protocol used to configure the web server.


The primary reasons (and please comment on my logic here) are:
    -- keep bloat out of apache
    -- minimize load on the system
    -- some marginal increase in security (web server doesn't need write
       permission to content or conf files)
    -- better scalability to multiple machines and services
    -- easier to drop into an existing installation
    -- less volatility with Apache version changes
Anyway, please read the papers and let me know what you think.

Justin

Server Data Structures

Introduction
While we're planning just for a roll-out of an Apache configuration server, we want to make sure the implementation will expand and adapt to the configuration of other services as well. I've thought a bit about how this might be done and constructed what I hope can serve as a good framework for storing data internal to the server itself. This document covers the main internal data structure designed to store and manipulate the server configuration information. It talks about configuration in general and then moves on to Apache specifics.

Why Bother?
Most of the problems I've had programming stem from not understanding precisely what I was trying to accomplish and failing to recognize the subtleties of the approach I chose to implement. The best tool for avoiding these pitfalls, and the reason I rarely find myself firing up a debugger nowadays, is that I take the time to put together these documents before I start coding. I also keep them up to date as the implementation progresses. So if you understand C better than English, blow this off.

In General
I spent a lot of time thinking about how to handle all the complicated instances of configuration directives and how to capture those instances in a general purpose data structure that would be useful to many internet based applications. This is my attempt at a design. The primary abstraction is the set of views the data structure allows. When we want to retrieve or store information, it's important that we be able to quickly identify the files which have to be updated and the order in which the new information should be stored within each file. We don't want to blow away existing configuration directives and we want to make sure the new or changed information doesn't break anything. We'd also like the system to be able to handle multiple users configuring multiple servers. Not all that has to be built into the server at first of course, but we shouldn't forget how things may change (check out the date this was written).

Handling the Structure
Since I've assumed an OO design, let's look at a few criteria that I think are necessary. First, there must be a single data structure; that is, a particular piece of configuration information may be stored in only one place. This makes bookkeeping a lot easier. Second, nobody should be allowed to directly manipulate the data. Just a good design principle. Third, each bit of data should have a lock, that is, a way to say, "keep your distance bud, I'll let you know when you can touch me". We don't want the wishes of multiple users colliding with one another. Finally, we're going to have a lot of views of the data tailored to the convenience of particular activities.
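
A minimal sketch of how these four criteria might fit together, assuming Python purely for illustration. The names `ConfigStore` and `LockedError`, and the view names used below, are hypothetical and not part of the design itself. Each value lives in one place, is reached only through methods, carries a per-entity lock, and can appear in several views that hold only ids.

```python
import itertools


class LockedError(Exception):
    """Raised when an entity is locked by a different user."""


class ConfigStore:
    """Single home for every piece of configuration data (hypothetical)."""

    def __init__(self):
        self._values = {}   # entity id -> value; the only copy of the data
        self._locks = {}    # entity id -> user currently holding the lock
        self._views = {}    # view name -> ordered list of entity ids
        self._ids = itertools.count(1)

    def add(self, value, views=()):
        """Store a value once and register its id in the named views."""
        entity_id = next(self._ids)
        self._values[entity_id] = value
        for name in views:
            self._views.setdefault(name, []).append(entity_id)
        return entity_id

    def lock(self, entity_id, user):
        """Take the lock, or fail if somebody else already holds it."""
        holder = self._locks.setdefault(entity_id, user)
        if holder != user:
            raise LockedError(f"entity {entity_id} is locked by {holder}")

    def unlock(self, entity_id, user):
        if self._locks.get(entity_id) == user:
            del self._locks[entity_id]

    def get(self, entity_id):
        return self._values[entity_id]

    def set(self, entity_id, value, user):
        """Change a value; the caller must hold the lock."""
        if self._locks.get(entity_id) != user:
            raise LockedError(f"{user} does not hold the lock on {entity_id}")
        self._values[entity_id] = value

    def view(self, name):
        """Return the values a view refers to, in view order."""
        return [self._values[i] for i in self._views.get(name, [])]


store = ConfigStore()
port = store.add("Port 80", views=("global", "file:httpd.conf"))
store.lock(port, "alice")
store.set(port, "Port 8080", "alice")
store.unlock(port, "alice")
print(store.view("global"))            # ['Port 8080']
print(store.view("file:httpd.conf"))   # ['Port 8080']
```

Because the views hold ids rather than copies, one update shows up in every view without any extra bookkeeping.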

The General Object Hierarchy
I've talked some in other places about the general design philosophy of the configuration server; this document covers only the main internal data structure of the server. Retrieval, Storage, Security and Communications are the subject of other documents. While it doesn't matter what language or style most of the server is written in, I think the storage, retrieval and manipulation of the configuration data lends itself very well to an OO language. We want the code of course to be easy to follow and maintain. It would also be good if it were easy to swap out the retrieval and storage modules so that changes to the Apache configuration file formats are easy to accommodate. So, I've laid out the framework in OO terms.

Configuration Module
    Origin
    Managed Entity
        MetaInfo
    GlobalView
    GlobalComments

Apache Module
    File
    Directory
    Server
    vServer
    Limits
    ApacheManagedEntity
        ContainmentDirective
        Directive
            Comment
            List
            Switch
            FileBased

Configuration Module
This is the level at which a new service may be added to the configuration system. I assume each service or server will have a few common qualities:

Origin
Managed Entity

MetaInfo

Global
Comments

Each of these is in a "has-a" relationship with the configuration module; that is, each configuration module object will have to contain one instance of each of these objects. If we extended the server to accommodate management of the FTP server as well as Apache web server configuration we'd still have an Origin, the Managed Entity would of course be the FTP server configuration and startup files, and there would be some global variables like the root of the FTP server directory for anonymous logins.

An Explanation of the Objects
Origin
This object should contain all the information relating to the "source" of the service, that is the file system objects such as executables, base directory, configuration files and such like associated with the service.

Managed Entity
Each of the objects within the origin should have a corresponding managed entity. A managed entity is the atomic object which will be manipulated. It may of course correspond to a configuration directive, a file or what have you. By default, it contains nothing except a bit field for locking and a unique identification. You'll have to extend the class to make it useful.

Meta Info

One basic extension of a Managed Entity is meta information about that entity. I made this a basic feature because I wanted the capability to store information about an entity external to its actual storage to be a widely recognized and used feature. For instance, we have several hundred entries in the fields which restrict access to the Apache web server used by our intranet. It's been a major pain in the neck keeping track of what is associated with whom in this configuration entry. The use of meta info allows an external database or flat file to be used to store this information. By accessing the Meta Info field of the allow directive we can find out which organization is associated with each entry in the long list of IP addresses and domain names. The most common use of Meta Info is to add comments to a directive; ordinarily, as the Managed Entity is written to its owning file, a comment relating to the Meta Info will be written just above the directive. Very handy!
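
To make the allow-directive case concrete, here is a small sketch in Python. The `META` table stands in for the external database or flat file, and `render_allow`, the addresses and the organization names are all made up for illustration. It shows the meta info being written as comments just above the directive.

```python
# Hypothetical external store; in the design this could be a database
# or a flat file kept outside the configuration file itself.
META = {
    "192.0.2": "Finance department",
    "example.org": "Partner organization",
}


def render_allow(entries, meta):
    """Return config lines for an allow directive, with any meta info
    for its entries written as comments just above it."""
    lines = []
    for entry in entries:
        note = meta.get(entry)
        if note:
            lines.append(f"# {entry}: {note}")
    lines.append("allow from " + " ".join(entries))
    return lines


for line in render_allow(["192.0.2", "example.org", "198.51.100.7"], META):
    print(line)
# Prints:
# # 192.0.2: Finance department
# # example.org: Partner organization
# allow from 192.0.2 example.org 198.51.100.7
```

Entries without meta info, like the last address above, simply get no comment.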

GlobalView
I've assumed there is going to be a set of configuration information which applies to all aspects of a service. For some services, that's all there is; for instance, there are no "subentities" to the configuration of the service. In most cases, however, you'll want additional custom views.

Global Comments
This is a short blurb on the service or for the storage of any other information you might like to associate with the service.

Details of the Apache Configuration Modules
So here's what we've all been waiting for, specifics on how the data structure suited to the Apache Configuration Module might be set up. Because the Apache web server is a sophisticated piece of software, there are a number of views which must be maintained; as I said before, each of the views just gives a different perspective on the same data. This means we can update the data once and then just take care that all the views are updated correctly. First of all it uses a couple of general purpose views which are not part of the basic Origin object. Here are the views I'd propose to manage the Apache web server:

FileView: The most obvious view will allow the easy recreation of the ordered contents of a configuration file. There should be a read/write instance of httpd.conf, srm.conf and possibly access.conf. There may also be read/write instances of files relating to user and group entries of the security system. Immutable instances may also be used to view the contents of log files. A managed entity will only live in one of these file views.

DirectoryView: Provides a tree structure mirroring the directories associated with or managed by the web server. These include the document, configuration, base and log file areas. DirectoryView instances may also be associated with specific directories specified by a <Directory> tag or within .htaccess files inside the directory. All directives will be transparently combined within this view.

FileView
The Apache server uses a couple of files to determine its behaviour. While many people combine these into one httpd.conf file, lots of older sites have httpd.conf to store server information, srm.conf to store MIME and helper application data and access.conf to customize who can get at what on the server. The Apache Configuration Module accommodates the use of any combination of these three files. In addition, it allows the dynamic creation and use of per-directory configuration files commonly named .htaccess. A FileView, like any other view, is primarily a doubly linked list structure allowing the easy in-line-order reconstruction of the file contents. This view is easy to construct when the file is read in, but maintaining a reasonable order during the editing process is more difficult; here's one way of handling it.

There are a couple of cases we must deal with. The first and easiest is a simple, top-level line addition or modification, in other words those normally associated with the GlobalView and not within any other directive. These are easy, as we add or modify the directive within the GlobalView and then either leave the FileView pointer as is (modification) or insert a new link in the view at the end of the existing view (meaning the directive will be written to the end of the appropriate file). Of course when we are adding a new directive to the server's configuration and the administrator has opted to use multiple configuration files, it's not always clear in which file it should go. In these cases we consult a preferences method within the default ApacheEntity configuration and store the directive according to the default preference. By default, everything will be written to the httpd.conf file.

The second case we must deal with concerns the update of the FileView when the added or modified directive is inside another directive. Some directives in Apache are hierarchical, that is, the directive serves as a container for other directives; I'll call these Containment Directives. Examples are <VirtualHost>, <Directory> and <Limit>. This is a little more difficult; instead of just appending the directive to the end of the existing file, we must place it within the limits of the containing directive. For instance, if we are adding some new access limitations to the configuration of a virtual server then those limitations must go within the <VirtualHost> </VirtualHost> limits. This situation just requires a little more logic: we add the directive to the end of the directives within the limits.

The last and most difficult case addresses the removal of a directive from the FileView. A top level directive poses no special problems; we just remove the node in the data structure and the corresponding FileView entry. However, if we remove a Containment Directive such as <VirtualHost>, then we have to make sure we remove all the directives within the limits of the container directive. Another case involves the <Directory> directive which may be contained in a .htaccess file; in this case we will just remove the .htaccess file altogether.
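
The three cases above can be sketched roughly as follows. This is an illustration only: it swaps the doubly linked list for plain Python lists to keep the example short, and the `Node` and `FileView` classes, their methods, the address and the host name are all hypothetical.

```python
class Node:
    """One entry in a file view: a plain directive or a container."""

    def __init__(self, text, container=False):
        self.text = text  # e.g. "Port 80" or "VirtualHost 192.0.2.10"
        self.children = [] if container else None

    def lines(self, indent=0):
        """Return this node (and anything it contains) as file lines."""
        pad = "  " * indent
        if self.children is None:
            return [pad + self.text]
        name = self.text.split()[0]
        out = [f"{pad}<{self.text}>"]
        for child in self.children:
            out.extend(child.lines(indent + 1))
        out.append(f"{pad}</{name}>")
        return out


class FileView:
    def __init__(self, name):
        self.name = name
        self.nodes = []

    def append(self, node, inside=None):
        """Case 1: add at the end of the file.
        Case 2: add at the end of the directives inside a container."""
        target = self.nodes if inside is None else inside.children
        target.append(node)
        return node

    def remove(self, node, inside=None):
        """Case 3: removing a container also drops everything it holds."""
        target = self.nodes if inside is None else inside.children
        target.remove(node)

    def render(self):
        out = []
        for node in self.nodes:
            out.extend(node.lines())
        return "\n".join(out)


view = FileView("httpd.conf")
view.append(Node("Port 80"))
vhost = view.append(Node("VirtualHost 192.0.2.10", container=True))
view.append(Node("ServerName www.example.org"), inside=vhost)
print(view.render())
view.remove(vhost)      # takes ServerName with it
print(view.render())    # only "Port 80" is left
```

Keeping contained directives as children of their container means removal needs no search for the closing mark; the whole subtree goes at once.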

DirectoryView
The server should transparently combine the directives from <Directory> sections and .htaccess files within the DirectoryView and leave the physical location of each directive to the FileView. By default, DirectoryView information will be placed in a .htaccess file within the directory. This is done to avoid configuration errors which might occur when a directory tree is physically moved or deleted. Since items within the DirectoryView are containment directives, they are handled as discussed in the details of FileView.

Apache Specific Objects
Every server, Apache included, has a number of implementation specific directives. Within the Apache Configuration Module these are of the type ApacheManagedEntity. There are two main types of ApacheManagedEntity objects:

Containment Directives: These directives either establish something new, like a virtual server, or manage some existing object like a directory. They all have the unique property that they consist of two parts, an opening and a closing mark, and they are designed to hold directives within the limits of those two marks.

Directives: Normal everyday configuration lines which aren't allowed to contain other directives. These can be configuration commands to the Apache Server or just comment lines.

First, we'll look at ApacheManagedEntities in general and then we'll examine the ins and outs of each one of them. WARNING: Entering the Tedious Zone.

Apache Object Hierarchy
I won't detail the specific members of each directive set, since that's version specific; however, I will go over the major bits of information we need to maintain about the directives and some details on how they are handled.

Containment Directive
Containment directives are the tags used to contain or group other directives. They can be recognized in the configuration files because they occur in pairs and are surrounded by angle brackets. Containment directives have an associated access control list because we may want to set up the server so that a particular administrator can only manipulate the values of certain virtual servers. The access control list allows us to limit who can work on the items within the containment directive.
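
As a rough illustration of the access control list idea, here is a sketch in Python. The class layout, the `allowed_admins` field and the administrator names are assumptions made for the example, not part of the design.

```python
class ContainmentDirective:
    """A container such as a virtual host, with a per-container access
    control list (hypothetical layout)."""

    def __init__(self, name, allowed_admins):
        self.name = name
        self.allowed_admins = set(allowed_admins)
        self.directives = []

    def add(self, admin, directive):
        """Append a directive if the administrator is on the list."""
        if admin not in self.allowed_admins:
            raise PermissionError(f"{admin} may not edit {self.name}")
        self.directives.append(directive)


vhost = ContainmentDirective("VirtualHost 192.0.2.10", allowed_admins={"alice"})
vhost.add("alice", "ServerName www.example.org")
try:
    vhost.add("bob", "ServerName other.example.org")
except PermissionError as err:
    print(err)  # bob may not edit VirtualHost 192.0.2.10
```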

Directive
Directives correspond to a line within an Apache configuration file. For expediency I include in this group not only the official directives or Apache commands but also all the blank lines and comments. Within a directive instance, the directive and its value are stored. The value of some directives is a list of items, and we'll need to work with each of the values in this list separately, including separately assigning any meta info associated with a value.
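
A minimal sketch of such a directive object, in Python. The `Directive` class, its `kind` labels and the `annotate` method are hypothetical, and splitting on whitespace is a simplification that ignores quoted values.

```python
class Directive:
    """One configuration line: a command, a comment, or a blank line."""

    def __init__(self, line):
        self.raw = line.rstrip("\n")
        stripped = self.raw.strip()
        if not stripped:
            self.kind, self.name, self.values = "blank", None, []
        elif stripped.startswith("#"):
            self.kind, self.name, self.values = "comment", None, []
        else:
            parts = stripped.split()  # simplification: no quote handling
            self.kind, self.name, self.values = "directive", parts[0], parts[1:]
        # Meta info keyed by position in the value list, so each value
        # of a list-valued directive can carry its own note.
        self.value_meta = {}

    def annotate(self, index, note):
        """Attach meta info to one value in the list."""
        if not 0 <= index < len(self.values):
            raise IndexError(f"no value at position {index}")
        self.value_meta[index] = note


d = Directive("AddType text/html .html .htm")
d.annotate(2, "short extension kept for older content")
print(d.name, d.values)   # AddType ['text/html', '.html', '.htm']
print(d.value_meta)       # {2: 'short extension kept for older content'}
```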

Configuration Server Description

This document contains a general description of the functions and design of the generalized configuration server. It's designed to be used by someone enhancing the functionality of the server or trying to fix a problem with it.

General Description

The server is designed to allow the remote configuration of internet services; specifically, it is able to read, parse and rewrite configuration files in response to socket based queries. At the moment, the server is confined to the configuration of the Apache web server and is configured by a client using an IFC-based Java applet/application.

The server is a threaded daemon capable of reading, parsing and rewriting configuration files. Commands are issued to the server by passing formatted messages to it via a socket dedicated to the client. Locking and authentication mechanisms are used to prevent unauthorized or conflicting configuration of services. In general the server is able to utilize existing configuration files or at least drop into an operational web site without modifying your setup.

Three modes, personal, site and enterprise, are envisioned. At the moment only personal and site modes are implemented. The personal mode allows the configuration of a "real" server on a single machine; virtual servers and multiple machine configuration are not enabled. The site mode allows the configuration of virtual servers on a single machine. The enterprise mode will integrate these capabilities across two or more machines, allowing such capabilities as central configuration of access permissions.

While most of the commands and functions performed by the server are "standard" and involve the manipulation of service server (esp. Apache web) configuration files, the ability to "stream" statistical and loading information as well as content manipulation are envisioned. Content manipulation may include the ability to create and manage ".htaccess" files, manage content permissions in response to server configuration changes, examine and format log information and eventually even examine the contents of documents served by the service. Above all, the server is designed to give a user interface enough information in an easily managed form so that dorky, geeky messages and meaningless statistics can be avoided and the servers can be managed without endlessly flipping through a myriad of disorganized, ugly configuration pages.

Design

The server uses a threaded, event-driven, message-passing, hype-laden, buzzword-activated, laser-guided design. That means each person connecting to configure a server gets their own "thread" and from that thread additional server and client threads or processes are created to keep response times quick; speed is what it's all about. The server is event-driven to allow you to configure the server in any order and manner you desire; you don't have to do things in any set order. A message passing model is used because it's simple to implement and allows convenient testing and debugging using telnet. Message passing means that the server will respond to a pre-defined set of commands. This type of design is pretty standard fare for internet services and similar to NNTP, SMTP and other such protocols.

Operation

The server starts up when a client attempts to connect to the designated "admin" port; you'll choose the admin port when you install the software. Once the client has connected to the admin port and the person has decided which service they want to configure, the server will read in the configuration files of that service, place them in internal data structures, change/add/delete configuration items based on the person's instructions, save the files back to the file system when the user's done and then restart the service to make the new configuration available.

Detailed Design

We'll follow the operation of the server (referred to as the admin server) through a typical configuration session looking at how all the components operate along the way. First of course, is the server startup in response to a connection request from a client.

In the beginning

When a request comes into the well-known but configurable admin port, the machine's inetd daemon is responsible for starting up the admin server; that's pretty standard and is done so our configuration server isn't sucking up memory and other resources when it's not being used. Configuration is a pretty low duty-cycle activity so this sort of scheme makes sense. When it's first started up, the admin server will read its own configuration files; these are pretty basic. They specify:

who's allowed (clients) to connect
what port it should be listening to (admin port)
what service configuration modules should be loaded (meaning what services can be configured)

There may be other parameters that pop up but these are the basics. Like other aspects of this application, these parameters may also be configured by the person using the client (aka administrator). The admin server should place a lock on the appropriate configuration files on systems with such a facility; this will prevent the inadvertent update of the files by other means. Once started, the admin server then returns to the client the primary socket for further communications.

In the middle

We are now at the point where the admin server has started and configured itself and must now go about the business of configuring the web or other server. This happens in three steps:

- load and process the web server's configuration files into its internal memory image and data structures for processing

- serve, process and manipulate its internal image of the configuration files

- rewrite the configuration file image back into something the web server can understand

Loading Config Files
This is the first step where the server is under direct control of the client operated by the administrator. The protocol for this communication is message based and discussed in detail later. The configuration files may be in a number of formats, however the default configuration which must be understood by the admin server is the httpd.conf, srm.conf and access.conf distribution along with the directives that are allowed within each of those files. O'Reilly's Apache book and the Apache web site are the best references for info on the directives for that server. The data structures and details of the internal representation of this information will be left to the implementation and not be apparent to clients, the services being managed or the configuration files themselves.

Serving/Processing/Manipulation
The server must respond to continued commands from the administrator in a timely manner and operate primarily in a passive mode, that is, responding to commands. However, it should have the capability to push commands to the client to handle situations such as a conflicting configuration command (i.e. pop up an asynchronous warning) or to update a time sensitive display requested by the client. The server should also contain provisions to maintain meta-data such as system loading, action logging and documentation supporting the configuration. A good example of where this is needed is documenting who (organizationally) is associated with all those access.conf IP address entries. While the interface should constrain the administrator from making invalid configuration requests, the server should perform internal checks to enforce a valid configuration as best it can. The server should also be able to maintain context, that is, allowing the administrator to modify more than one server simultaneously or propagate changes across several servers without restarting. In addition, the administrator may wish to store a server-side set of defaults which may be applied upon the creation of a new "real" or "virtual" server.

Rewriting Config Files
The server will maintain its internal image of the configuration while using CVS or a similar utility to back up existing configuration files before rewriting the new configuration files as well as any supporting documentation and log files.
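
A minimal sketch of the back-up-then-rewrite step, in Python. Note that it swaps CVS for a plain numbered file copy to keep the example self-contained; the function name `rewrite_with_backup` and the backup directory layout are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def rewrite_with_backup(path, new_text, backup_dir):
    """Copy the current file into backup_dir, then write the new contents."""
    path, backup_dir = Path(path), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Number the backups so earlier ones are never overwritten.
    count = len(list(backup_dir.glob(path.name + ".*")))
    backup = backup_dir / f"{path.name}.{count + 1}"
    shutil.copy2(path, backup)
    path.write_text(new_text)
    return backup


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "httpd.conf"
    conf.write_text("Port 80\n")
    saved = rewrite_with_backup(conf, "Port 8080\n", Path(tmp) / "backups")
    print(saved.name, "->", saved.read_text().strip())   # httpd.conf.1 -> Port 80
    print(conf.name, "->", conf.read_text().strip())     # httpd.conf -> Port 8080
```

A real revision control tool would also record who made the change and why, which a bare copy does not.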

At the End
Finally, the admin server must gracefully restart the web server or otherwise force the managed server to utilize the new configuration files. If there are any issues with the new configuration, then the admin server must report those back to the administrator, offer the option of reloading the previous configuration and send the errant (newly configured but flawed in some way) configuration back to the administrator along with some indication of what might have caused the error.
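
One way this restart-and-report step might look, sketched in Python. Everything here is a stand-in: `apply_config`, `rollback`, the report fields, and the `write` and `restart` callables (which the caller would supply) are hypothetical, and the toy `restart` below only imitates a managed server rejecting a bad file.

```python
def apply_config(new_text, old_text, write, restart):
    """Write the new configuration and restart the managed server.

    `write` stores configuration text; `restart` returns None on success
    or an error string. On failure the report carries the reason, the
    rejected configuration and the previous one, so the administrator
    can choose to roll back."""
    write(new_text)
    error = restart()
    if error is None:
        return {"status": "OK"}
    return {"status": "ERROR", "reason": error,
            "rejected": new_text, "previous": old_text}


def rollback(report, write, restart):
    """Reload the previous configuration named in an error report."""
    write(report["previous"])
    return restart()


# Toy stand-ins for the real file and process handling.
current = {"text": "Port 80\n"}


def write(text):
    current["text"] = text


def restart():
    ok = current["text"].startswith("Port ")
    return None if ok else "unknown directive on line 1"


report = apply_config("Prot 8080\n", "Port 80\n", write, restart)
print(report["status"], "-", report["reason"])  # ERROR - unknown directive on line 1
if report["status"] == "ERROR":
    rollback(report, write, restart)
print(current["text"].strip())                  # Port 80
```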

Communication / Message Protocol
The protocol envisioned is a variant of the standard text-based, messages-sent-to-a-port type used by many internet services. For the most part, the protocol consists of simple stateless query-response pairs; however, modal messages are used for the maintenance of status information and to stream multi-line responses. Here is an initial description of the available messages.

Message	Response Type	Notes
`HELP [command]`	Text List	Returns space separated list of currently available commands or details on the use of the particular command
`OK`	Status	Returned by server if there are no pending errors or warnings
`ERROR`	Status	Returned by server to indicate an error or warning was caused by last command
`QUIT`	Status	Instructs the server to quit, disposing of all internal information
`RESTART [service]`	Status	Restart the server being configured using current config files
`UPDATE [service]`	Status	Save internal configuration to service's configuration files
`INIT [service]`	Status	Load config files and place service in maintenance mode
`GET [service context parameter]`	Text List/Status	Returns the associated values or status if not available
`SET [service context parameter]`	Status	Returns status after application of value set
`APPEND [service context parameter]`	Status	Returns status after appending value to parameter's value
`AUTHENTICATE [service user]`	Binary Key	Basis for a service by service security mechanism
`STREAM [parameter frequency]`	Text	Provides periodic updates to the port (loading, etc.)
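
To give a feel for how a server might handle these messages, here is a small sketch in Python covering `HELP`, `GET`, `SET` and `APPEND` only. The table does not say where the value goes in a `SET` or `APPEND` message, so this sketch assumes it follows the parameter name; the `Session` class and the service and context names used below are likewise hypothetical.

```python
COMMANDS = {"HELP", "QUIT", "RESTART", "UPDATE", "INIT",
            "GET", "SET", "APPEND", "AUTHENTICATE", "STREAM"}


class Session:
    """Per-client state: (service, context, parameter) -> list of values."""

    def __init__(self):
        self.config = {}

    def handle(self, line):
        """Process one request line and return the response text."""
        parts = line.split()
        if not parts or parts[0].upper() not in COMMANDS:
            return "ERROR"
        verb, args = parts[0].upper(), parts[1:]
        if verb == "HELP":
            return " ".join(sorted(COMMANDS))
        if verb in ("GET", "SET", "APPEND"):
            if len(args) < 3:
                return "ERROR"
            key, values = tuple(args[:3]), args[3:]
            if verb == "GET":
                found = self.config.get(key)
                return " ".join(found) if found else "ERROR"
            if not values:
                return "ERROR"  # assumed: SET and APPEND need a value
            if verb == "SET":
                self.config[key] = values
            else:
                self.config.setdefault(key, []).extend(values)
            return "OK"
        # The remaining commands are accepted but not implemented here.
        return "OK"


session = Session()
for request in ["SET apache global DirectoryIndex index.html",
                "APPEND apache global DirectoryIndex index.htm",
                "GET apache global DirectoryIndex",
                "FROB"]:
    print(request, "=>", session.handle(request))
# SET ... => OK
# APPEND ... => OK
# GET ... => index.html index.htm
# FROB => ERROR
```

Since requests and responses are plain text lines, a handler like this can be exercised by hand over telnet once it is wired to a socket.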