Logger For D Design Doc

sybrandy Fri, 03 Sep 2010 18:10:31 -0700

Hello,

As promised, I took some time to create a small document (just a textfile) to describe a potential logger for D. Glog was a major influenceon how the design turned out, though there are some differences.

Please note it's not a 100% solution yet. Hopefully this is closer towhat Andrei is looking for. All I ask is that those that are interestedin this please take your time when looking this over before respondingand please do ask questions, comments, etc.


Thanks and enjoy!

Casey

Logger for D Design

ARCHITECTURE
------------

The logger will consist of three distinct components:

1) A public interface to allow a user to write messages to a log.
2) A background thread to periodically poll the configuration file for
changes.
3) A writer thread that handles the actual writing of log messages to some
location.

The writer thread is the most interesting as it will support being able to use
a user-defined log writer method that conforms to a specific interface.  At
least one log writer will be provided by default, therefore this is not
needed.

The reason for this capability is not only to make it easier to add different
writers in the future, but to remain flexible.  Individuals may require a
custom solution, such as logging to a database or a custom file format.

The current plan is to store function pointers in an associative array.  A
mixin will be provided that accepts an ID/name for the function and a pointer
to the function.  The function can then be specified in the configuration
file.  Also, it is expected that any writer can be configured using the same
configuration file.

INITIALIZATION
--------------

Some basic initialization needs to occur for the logger to work.  This is
because two threads are spawned in the background and I also want to capture
the location where the logging is occurring.  There are two possible
techniques for each.

Thread Initialization
---------------------

The first method, and probably the most desirable, is to have thread
initialization occur using a static constructor.  This way the user does not
have to ensure that it was done before using the logger.  This should be very
feasible as the constructor would be part of the class that does the log
writing, which would be used by the methods described below under LOGGING.
E.g. a simplified version of the log function could look like this:

void log(enum level, string msg)
{
    logWriter.writeToLog(level, msg);
}

The second method is the one currently used, which is to have the user call a
constructor and the constructor starts the threads.

Logger Initialization
---------------------

If a constructor is explicitly called, then the location where the logging is
occurring can be passed in as a parameter.  However, it appears that the more
desirable way to do this is using a mixin.  This way, we can use __FILE__ as
the default.  Unfortunately, it means that something similar to this needs to
appear in the code: mixin(initLogger!())

The location is important as it lets the user know where a log message
originated from.  However, it is a pain to have to include it in every log
message, therefore being able to have it set once in a module/class ensures we
get this information without being a chore.

Note: If mixins could be called like normal functions, this could be even
cleaner as the "log" statements could possibly be mixins.  However, having a
person do this every time seems very ugly: mixin(log!(INFO, "..."));  The big
question here is whether a mixin, or mixin template, can be used as a Variadic
Template.

LOGGING
-------

The current plan is to have all logging statements look similar to this:

log(INFO, "some message", 91, " foo");
logf(DEBUG, "%s - %d", "Current value", 57);
logc((a > b), TRACE, "'a' should not be greater than 'b'");
logcf((a % 10 == 0), INFO, "%s - %d", "current count", count);

However, if we go with a class-based approach, then they would look more like
this:

logger.log(INFO, "some message", 91, " foo");

Aliases can then be used to make them look like this:

logger.INFO("some message", 91, " foo");

Writing to the log
------------------

Each of these methods are simply wrappers around the log writer function.
These wrappers are responsible for checking the current log level and any
user-provided boolean conditions to determine if logging should occur.  If
logging is to occur, then the message is passed to the log writer.

Writing to the log will be done by sending messages to a writer thread.  This
was done primarily to support applications that use multiple threads, however
it also helps ensure that logging has a minimal impact on the normal operation
of a program.

NOTE: Need to determine if send/receive support variadic parameters.  If not,
all formatting will need to be done before passing the message to the log
writer.  If it is possible, formatting can be performed in the log writer and
help keep the logging overhead from impacting program execution.

Buffered vs. Unbuffered Writes
------------------------------

When writing out the log entries, one may be tempted to try to improve
performance by buffering the entries and only writing them out periodically.
After examining some scenarios, this appears to be the incorrect solution.
Whether one only cares about errors or a developer is trying to debug an
application, ensuring the capturing the log entries is very important.  No
entries should be lost, therefore unbuffered writes are more desirable.

Writing to stderr
-----------------

Log entries can be written to stderr, however they will also be copied to a
log file.  The reason for this is to ensure that the output can be reviewed as
needed without needing to execute the program again.

NOTE: Will this work if the writer is in a separate thread?  If not, may need
to drop this feature.

Timestamps
----------

There are two possible times to record the timestamp: when the log statement
is called and when the message is written to the log.  The advantage of the
first is that the timestamps are guaranteed to be in the order that the
messages were written, however when they are written the order may not be
preserved with respect to messages from other threads.  The latter will
generate the timestamps before writing, thus the ordering may not be quite
right with respect to the messages, but correct in respect to when they are
written.  Also, the creation and formatting of the timestamp is handled in the
writer thread, thus less overhead with respect to the rest of the program.

The current plan is to create the timestamp upon writing the message.  This
ensures that program execution is not impacted and to ensure that there is no
confusion due to timestamps potentially being out of order.

There currently is a configuration option that is required due to std.date
reporting UTC time and not local time.  Once this is resolved, it can be
removed.  Until then, to get the correct timestamp the timezone needs to be
adjusted.

COMPILATION
-----------

A compile-time option will be provided to turn on/off logging as desired.
The question is which is the preferred solution:

1) Provide a version identifier to turn logging on.
2) Provide a version identifier to turn logging off.

Option 2 may be preferable because if one has taken the time to add logging,
chances are it is a key part of the system.  Also, it is hoped that if someone
decides to turn off logging, they do so because they are very confident that
it is not necessary.  E.g. the developer is very confident that the software
works properly.

CONFIGURATION FILE
------------------

To keep things simple, a configuration file will be used to store various
settings.  This not only keeps the code free of configuration information,
options can be changed during runtime and a background thread will reread the
configuration file periodically.  This was done in anticipation of
long-running applications, specifically servers, where stopping and starting
the application is not desirable.

The exact options are currently not know, however the following are
anticipated:

minlevel - The minimum level to log.
tzoffset - The timezone offset.  This is used to ensure the logger writes the
local time to the log and not UTC time.
filename - The name and location of the logfile.
writer - The name of the log writer to user.

NOTES
-----

- Most initialization will occur in a static constructor.
    - Background threads
    - Read initial values from the configuration file.
- Individual files/modules can have their own log levels.
    - Each file/module has a unique ID.
    - IDs are keys to a has table (AA) containing log level values.
    - Synchronization may cause performance issues.
    - Need to figure out.
- Set up the logger to know the current file that it is in.
    - Mixin?  E.g. every file starts with something like:
      mixin(initLogger!());
    - "location" defaults to __FILE__, but user can override.
        - If embedding classes, this may be very useful.
- Setting are contained within a config file.
    - Keeps code cleaner.
    - Settings can be changed without stopping the program.
- Version option available to remove all logging during compilation.
- Logging to stderr is also logged to a log file.
    - If for some reason the amount of data is large, the log file is also
      available to peruse.
- Logger can accept formatted or unformatted input.
- Logger decide to log or not log via user-supplied boolean statements.
- Timestamp
    - When written or when the application calls a log function?
        - If not when written, timestamps may appear out of order. (Not
          confirmed)
    - Still have the timezone offset issue...

LEVELS
------

fatal - Used for when an error has occured where the only solution is to
terminate the application.

error - Used for when an error occured, but it is recoverable.

warning - Used for when something has occured during program execution that
still results in a valid execution of the program, but may have unexpected
side effects or can indicate that a problem is going to occur.  E.g. compiler
warnings or noting that the number of allowed connections is close to the
limit.

info - Used for recording key aspects of the normal execution of the program.
This can be useful for servers to know what aspects of the server is being
used the most.

debug - Used for capturing the minimal amount of information for debugging a
program.  Some examples are entering and leaving methods, capturing the
original and transformed copies of data, and recording when key steps in an
algorithm are performed.

trace - Used for when normal debug level logging is not enough.  This should
be used for very fine grain logging, such as recording the state of every
variable or each step of an algorithm.

Per "Module" Levels
-------------------

Glog has a feature where specific modules can have different log levels
associated with them.  For example, the default log level may be set to
"warning", but a module responsible for parsing a complex file may be set to
"debug" as there may still be some issues that occur during testing or
production use.

Currently, the only way to properly implement this is to have only the log
writer thread determine if the message is to be written to the log.  This
means that every log message, regardless of level, is passed to the log writer
thread.  The concern here is that if there is a significant amount of logging
in the system, the threads mailbox can be filled very quickly.

If a logger is instantiated in a new location at runtime, the log writer code
would have to handle this accordingly.  If the location is currently not
known, it's level would be set to the minimum level.  Otherwise, it's level
would be set to whatever is specified for that location in the config file.

Logger For D Design Doc

Reply via email to