I just checked in a lot of new/changed code under modules/DateTime-TimeZone. This code actually seems to work, and provides access to the complete Olson historical data, plus ongoing time zone changes in the future.
I'd realy appreciate some code review on this, even though it's still a bit messy. Here's how it works: DateTime::TimeZone::OlsonDB contains code to parse the Olson TZ DB files and represent them internally as objects. For those who aren't familiar with these files, here's some background. These files define several types of data. The first type is a zone definition. This defines the name for a zone and a set of observances for that zone. A given zone has 1+ observances, each of which is a time _span_ during which the time zone has a base GMT offset, a "format" (which is really a name like EST, sort of), and a start and end time. However, every (I think) zone has at least one observance that starts at the beginning of time, and one that ends at the end of time (-inf & +inf). This _may_ be the same observance. Observances may have one rule. A rule is a set of date times _or_ time spans, during which an additional offset may be applied (like during DST) to the base GMT offset. Rules also affect a zone's short name. For example, the most recent observance for the "America/Chicago" zone defines its "format" as "C%sT", which as you can see is intended to be fed into sprintf. Rules may have an optional "letter" assigned to them. So for example, the rule that defines DST for the US (part of the US set of rules) has the letter "D", which when combined with the "C%sT" format produces "CDT". However, some rules may have no letter, and some observances may have no rules. And some rules may only happen once (like the peace time and war time rules for the US). These various types of data are represented via the following classes, DateTime::TimeZone::OlsonDB::Zone DateTime::TimeZone::OlsonDB::Observance DateTime::TimeZone::OlsonDB::Rule The whole lot is a DateTime::TimeZone::OlsonDB object. So an OlsonDB object contains many Zone and Rule objects. Zone objects in turn contain 1+ Observance objects. But none of these things are particular useful, since they're just raw data, and the rules and observances interact in complicated ways. So what you really want to know is "when do changes happen in a zone". Changes may be a change from one rule to another inside a single observance, or a change from one observance to another, which may or may not change rules at the same time! Because both observances and rules may be spans, what we really want to produce out of all of this is a set of time spans that combine _both_ observances and rules. So there is a DateTime::TimeZone::OlsonDB::Change object. This can only be created after an entire Olson DB file has been read, since rules and observances can be defined in any order. A change is basically the datetime of the change (UTC), a resolved short name (the observance format + optional rule's letter(s)), plus an observance and optional rule. So here's a somewhat typical usage (for something that has only been used by me ;): my $db = DateTime::TimeZone::OlsonDB->new; $db->parse_file( ... ); foreach my $name ( $db->zone_names ) { my $zone = $db->expanded_zone($name); foreach my $change ( $zone->sorted_changes ) { # do something interesting with this } } Now, for the purposes of calculating a given datetime's offset, we don't want to just know when changes happen, we really need to know the time span for a given set of data (GMT offset and the sort name). So the script in tools/parse_olson takes pairs of changes and generates data about time spans. This script is used to create the module files for all the time zones. The module files are pre-populated with a set of known spans. If the last span does not end with infinity (and most don't), then there is also a subroutine that generates more spans based on the "infinite" rules (rules with no specified ending date). A span looks like this (this one is for Dar es Salaam): { 'short_name' => 'BEAUT', 'end_date' => '19601201024445', 'start_date' => '19471201030000', 'offset' => 9885 }, The start and end dates have been converted to numbers in the hopes that speeds up comparisons. Some start or end dates may be negative or positive infinity. The end time for a given span is always the start time time for the next span. The data is all stored in a red-black tree (basically a binary tree with built-in balancing). The keys for the nodes are spans [ start, end ] and the values are the whole span hash reference (seen above). The red-black tree has a custom comparison function which returns a match if the given datetime is >= start and < end (note that it is inclusive on the left side _only_). If the datetime is < start then it's considered less than the node. If the datetime is >= end, it is greater than. So if we search the tree and find a span for a given datetime, we have our offset and short name. If we don't find one, we call the generator method to create more spans until there is a match. Those new spans are added to the tree to cache them. We could also generate extra spans after the match, which might be a smart optimization. Anyway, my limited testing shows that this all works, which amazed me. I'd welcome code review, suggestions, optimization ideas, bug reports, etc. There's also the sticky wicket that this code requires DateTime.pm to work, while DateTime.pm will require DateTime::TimeZone. This will make the first release tricky. -dave /*======================= House Absolute Consulting www.houseabsolute.com =======================*/