First of all (and I apologize for this use of bandwidth) I've attached a couple of papers I wrote on both the server and data structure design I had come up with a few months ago. The Server_Docs.html especially contains some background on why I thought (and still think) the configuration (as opposed to status) system should be controlled by a daemon separate from the web server. Please note I think this discussion concerns only the setup of the server and is independent of the protocol used to configure the web server.
The primary reasons (and please comment on my logic here) are:
-- keep bloat out of apache
-- minimize load on the system
-- some marginal increase in security (web server doesn't need write permission to content or conf files)
-- better scalability to multiple machines and services
-- easier to drop into an existing installation
-- less volatility with Apache version changes
Anyway, please read the papers and let me know what you think.
Justin
Title: Server Data Structures
Introduction
Why Bother?
In General
Handling the Structure
The General Object Hierarchy
This is the level at which a new service may be added to the configuration system. I assume each service or server will have a few common qualities:
An Explanation of the Objects
Managed Entity
One basic extension of a Managed Entity is meta information about that entity. I made this a basic feature because I wanted the ability to store information about an entity outside its actual storage to be widely recognized and used. For instance, we have a long list of entries in the fields which restrict access to the Apache web server used by our intranet. It's been a major pain in the neck keeping track of what is associated with whom in this configuration entry. The use of meta info allows an external database or flat file to be used to store this information. By accessing the Meta Info field of the allow directive we can find out which organization is associated with each entry in the long list of IP addresses and domain names. The most common use of Meta Info is to add comments to a directive; ordinarily, as the Managed Entity is written to its owning file, a comment relating to the Meta Info will be written just above the directive. Very handy!

Global
I've assumed there is going to be a set of configuration information which applies to all aspects of a service. For some services that's all there is; for instance, there are no "subentities" to the configuration of the service. In most cases, however, you'll want additional custom views.
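As a minimal sketch of the Meta Info behaviour described for a Managed Entity above, the following shows a directive that carries its own notes and writes them as comments just above itself. The class name, field names and the sample allow entry are assumptions made for illustration; they are not taken from the papers.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedEntity:
    """One configuration directive plus optional meta information.

    Hypothetical names: the papers do not define this class or its fields.
    """
    name: str
    value: str
    meta_info: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the text written to the owning file.

        Each piece of meta info becomes a comment line placed just
        above the directive itself.
        """
        lines = [f"# {note}" for note in self.meta_info]
        lines.append(f"{self.name} {self.value}")
        return "\n".join(lines)


# An access entry annotated with the organization it belongs to.
entry = ManagedEntity("allow", "from 192.0.2.0/24", ["Accounting department"])
rendered = entry.render()
print(rendered)
```

In a fuller version the notes could just as well be looked up in an external database or flat file at write time rather than held on the object.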
Comments
Details of the Apache Configuration Modules
DirectoryView: Provides a tree structure mirroring the directories associated with or managed by the web server. These include the document, configuration, base and log file areas. DirectoryView instances may also be associated with specific directories specified by a Directory tag or by .htaccess files inside the directory. All directives will be transparently combined within this view.

The Apache server uses a couple of files to determine its behaviour. While many people combine these into one httpd.conf file, lots of older sites use httpd.conf to store server information, srm.conf to store MIME and helper application data and access.conf to customize who can get at what on the server. The Apache Configuration Module accommodates the use of any combination of these three files. In addition, it allows the dynamic creation and use of per-directory configuration files, commonly named .htaccess.

A FileView, like any other view, is primarily a doubly linked list structure allowing the easy in-line-order reconstruction of the file contents. This view is easy to construct when the file is read in, but maintaining a reasonable order during the editing process is more difficult; here's one way of handling it. There are a couple of cases we must deal with.

The first and easiest is a simple, top-level line addition or modification; in other words, those normally associated with the GlobalView and not within any other directive. These are easy: we add or modify the directive within the GlobalView and then either leave the FileView pointer as is (modification) or insert a new link at the end of the existing view (addition), meaning the directive will be written to the end of the appropriate file. Of course, when we are adding a new directive to the server's configuration and the administrator has opted to use multiple configuration files, it's not always clear in which file it should go. In these cases we consult a preferences method within the default ApacheEntity configuration and store the directive according to the default preference. By default, everything will be written to the httpd.conf file.

The second case we must deal with concerns the update of the FileView when the added or modified directive is inside another directive. Some directives in Apache are hierarchical, that is, the directive serves as a container for other directives; I'll call these Containment Directives. Examples are <VirtualHost>, <Directory> and <Limit>. This is a little more difficult; instead of just appending the directive to the end of the existing file, we must place it within the limits of the containing directive. For instance, if we are adding some new access limitations to the configuration of a virtual server, then those limitations must go within the <VirtualHost> </VirtualHost> limits. This situation just requires a little more logic: we add the directive to the end of the directives within the limits.

The last and most difficult case addresses the removal of a directive from the FileView. A top-level directive poses no special problems; we just remove the node in the data structure and the corresponding FileView entry. However, if we remove a Containment Directive such as <VirtualHost>, then we have to make sure we remove all the directives within the limits of the container directive. Another case involves the <Directory> directive which may be contained in an .htaccess file; in this case we will just remove the .htaccess file altogether.
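The three FileView cases above can be sketched roughly as follows. This is one possible reading of the description, not the implementation from the papers: the class and method names are assumptions, sentinel nodes are an added convenience, and the example uses Apache's <VirtualHost> container with made-up values.

```python
class Node:
    """One line of a configuration file in the doubly linked FileView."""

    def __init__(self, text, close=None):
        self.text = text
        self.prev = None
        self.next = None
        # For a container's opening line: the node holding its closing line.
        self.close = close


class FileView:
    def __init__(self):
        # Sentinel nodes avoid special cases at either end of the list.
        self.head = Node("")
        self.tail = Node("")
        self.head.next = self.tail
        self.tail.prev = self.head

    def _insert_before(self, anchor, node):
        node.prev, node.next = anchor.prev, anchor
        anchor.prev.next = node
        anchor.prev = node
        return node

    def append(self, text):
        """Case 1: a new top-level directive goes at the end of the file."""
        return self._insert_before(self.tail, Node(text))

    def append_container(self, open_text, close_text):
        """Add an empty containment directive; return its opening node."""
        close = Node(close_text)
        opener = self._insert_before(self.tail, Node(open_text, close=close))
        self._insert_before(self.tail, close)
        return opener

    def append_inside(self, container, text):
        """Case 2: a contained directive goes just before the closing line."""
        return self._insert_before(container.close, Node(text))

    def remove(self, node):
        """Case 3: unlink a node; a container takes its contents with it."""
        last = node.close if node.close is not None else node
        node.prev.next = last.next
        last.next.prev = node.prev

    def lines(self):
        """Rebuild the file contents in line order."""
        out, cur = [], self.head.next
        while cur is not self.tail:
            out.append(cur.text)
            cur = cur.next
        return out


view = FileView()
view.append("ServerName www.example.com")
vhost = view.append_container("<VirtualHost 192.0.2.1>", "</VirtualHost>")
view.append("Timeout 300")
view.append_inside(vhost, "DocumentRoot /www/docs")
before = view.lines()
view.remove(vhost)
after = view.lines()
```

Keeping a pointer from the opening line to the closing line means both the "add inside the limits" and the "remove everything within the limits" cases reduce to a couple of pointer updates.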
Apache Specific Objects
Directives: Normal everyday configuration lines which aren't allowed to contain other directives. These can be configuration commands to the Apache Server or just comment lines.

First, we'll look at ApacheManagedEntities in general and then we'll examine the ins and outs of each one of them. WARNING: Entering the Tedious Zone.

Apache Object Hierarchy
Containment Directive
Directive
Configuration Server Description

General Description
The server is designed to allow the remote configuration of internet services; specifically, being able to read, parse and rewrite configuration files in response to socket-based queries. At the moment, the server is confined to the configuration of the Apache web server and is configured by a client using an IFC-based Java applet/application.

The server is a threaded daemon capable of reading, parsing and rewriting configuration files. Commands are issued to the server by passing formatted messages to it via a socket dedicated to the client. Locking and authentication mechanisms are used to prevent unauthorized or conflicting configuration of services. In general the server is able to utilize existing configuration files, or at least drop into an operational web site without modifying your setup.

Three modes, personal, site and enterprise, are envisioned. At the moment only personal and site modes are implemented. The personal mode allows the configuration of a "real" server on a single machine; virtual servers and multiple machine configuration are not enabled. The site mode allows the configuration of virtual servers on a single machine. The enterprise mode will integrate these capabilities across two or more machines, allowing such capabilities as central configuration of access permissions.

While most of the commands and functions performed by the server are "standard" and involve the manipulation of service (esp. Apache web server) configuration files, the ability to "stream" statistical and loading information as well as content manipulation are envisioned. Content manipulation may include the ability to create and manage ".htaccess" files, manage content permissions in response to server configuration changes, examine and format log information and eventually even examine the contents of documents served by the service.
Above all, the server is designed to give a user interface enough information in an easily managed form so that dorky, geeky messages and meaningless statistics can be avoided and the servers can be managed without endlessly flipping through a myriad of disorganized, ugly configuration pages.

Design
The server uses a threaded, event-driven, message-passing, hype-laden, buzzword-activated, laser-guided design. That means each person connecting to configure a server gets their own "thread", and from that thread additional server and client threads or processes are created to keep response times quick; speed is what it's all about. The server is event-driven to allow you to configure the server in any order and manner you desire; you don't have to do things in any set order. A message passing model is used because it's simple to implement and allows convenient testing and debugging using telnet. Message passing means that the server will respond to a pre-defined set of commands. This type of design is pretty standard fare for internet services and similar to the NNTP, SMTP, etc. protocols.

Operation
The server starts up when a client attempts to connect to the designated "admin" port; you'll choose the admin port when you install the software. Once the client has connected to the admin port and the person has decided which service they want to configure, the server will read in the configuration files of that service, place them in internal data structures, change/add/delete configuration items based on the person's instructions, save the files back to the file system when the user is done and then restart the service to make the new configuration available.

Detailed Design
We'll follow the operation of the server (referred to as the admin server) through a typical configuration session, looking at how all the components operate along the way.
First, of course, is the server startup in response to a connection request from a client.

In the beginning
When a request comes into the well-known but configurable admin port, the machine's inetd daemon is responsible for starting up the admin server; that's pretty standard and is done so our configuration server isn't sucking up memory and other resources when it's not being used. Configuration is a pretty low duty-cycle activity so this sort of scheme makes sense. When it's first started up, the admin server will read its own configuration files; these are pretty basic. They specify:
In the middle
We are now at the point where the admin server has started and configured itself and must now go about the business of configuring the web or other server. This happens in three steps:
- read in the configuration files of the service
- serve, process and manipulate its internal image of the configuration files
- rewrite the configuration file image back into something the web server can understand

This is the first step where the server is under direct control of the client operated by the administrator. The protocol for this communication is message based and discussed in detail later. The configuration files may be in a number of formats; however, the default configuration which must be understood by the admin server is the httpd.conf, srm.conf and access.conf distribution along with the directives that are allowed within each of those files. O'Reilly's Apache book and the Apache web site are the best references for info on the directives for that server. The data structures and details of the internal representation of this information will be left to the implementation and not be apparent to clients, the services being managed or the configuration files themselves.

Serving/Processing/Manipulation
The server must respond to continued commands from the administrator in a timely manner and operate primarily in a passive mode, that is, responding to commands. However, it should have the capability to push commands to the client to handle situations such as a conflicting configuration command (i.e. pop up an asynchronous warning) or to update a time-sensitive display requested by the client. The server should also contain provisions to maintain meta-data such as system loading, action logging and documentation supporting the configuration. A good example of where this is needed is documenting who (organizationally) is associated with all those access.conf IP address entries. While the interface should constrain the administrator from making invalid configuration requests, the server should perform internal checks to enforce a valid configuration as best it can.
The server should also be able to maintain context, that is, allowing the administrator to modify more than one server simultaneously or propagate a change across several servers without restarting. In addition, the administrator may wish to store a server-side set of defaults which may be applied upon the creation of a new "real" or "virtual" server.

Rewriting Config Files
The server will maintain its internal image of the configuration while using cvs or a similar utility to back up existing configuration files before rewriting the new configuration files as well as any supporting documentation and log files.

At the End
Finally, the admin server must gracefully restart the web server or otherwise force the managed server to utilize the new configuration files. If there are any issues with the new configuration, then the admin server must report those back to the administrator, offer the option of reloading the previous configuration and send the errant (newly configured but flawed in some way) configuration back to the administrator along with some indication of what might have caused the error.

Communication / Message Protocol
The protocol envisioned is a variant of the standard text-based, messages-sent-to-a-port type used by many internet services. For the most part, the protocol consists of simple stateless query-response pairs; however, modal messages are used for the maintenance of status information and to stream multi-line responses. Here is an initial description of the available messages.
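As a rough sketch of the query-response style described above, and not the actual message set, the handler below answers one text command per line and streams one multi-line response. The command names GET, SET, LIST and QUIT, the OK/ERR replies and the lone "." terminator (borrowed from NNTP's multi-line responses) are all assumptions made for illustration.

```python
import io


def serve(reader, writer, config):
    """Answer one command per input line until QUIT or end of input.

    GET, SET, LIST and QUIT are hypothetical commands; `config` is a plain
    dict standing in for the server's internal configuration image.
    """
    for raw in reader:
        parts = raw.strip().split(None, 2)
        if not parts:
            continue  # ignore blank lines
        command, args = parts[0].upper(), parts[1:]
        if command == "GET" and len(args) == 1:
            if args[0] in config:
                writer.write(f"OK {config[args[0]]}\n")
            else:
                writer.write("ERR unknown directive\n")
        elif command == "SET" and len(args) == 2:
            config[args[0]] = args[1]
            writer.write("OK\n")
        elif command == "LIST" and not args:
            # Multi-line response, ended by a line holding a single ".".
            writer.write("OK listing follows\n")
            for name in sorted(config):
                writer.write(f"{name} {config[name]}\n")
            writer.write(".\n")
        elif command == "QUIT":
            writer.write("OK bye\n")
            break
        else:
            writer.write("ERR bad command\n")


# Drive the handler with in-memory streams instead of a socket.
config = {"ServerName": "www.example.com"}
session = io.StringIO("GET ServerName\nSET Timeout 300\nLIST\nQUIT\n")
reply = io.StringIO()
serve(session, reply, config)
transcript = reply.getvalue()
print(transcript, end="")
```

Because the handler only needs line-oriented text streams, the same function could be pointed at a socket's file objects, which is what makes this style easy to poke at with telnet.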
