Thanks for all the feedback.
On Wed, Jan 09, 2008 at 09:03:02PM +0100, Derick Rethans wrote:
> On Wed, 9 Jan 2008, Joe Orton wrote:
>
> > It's a bit of a maintenance headache for distributions when
> > packages include their own copy of the timezone database, since this
> > needs to be updated frequently.
>
> There is a PECL extension to provide those updates:
> http://pecl.php.net/package/timezonedb
> It is a drop-in replacement to update the rules.
My aim is to eliminate the additional copies of the timezone database,
rather than ship and maintain yet more. Just as I don't want to
statically link all system libraries into PHP, nor have to build special
copies of any of them for PHP.
I'm not sure I find the logic of the "but the system-provided data will
become stale" arguments convincing; systems which are left unmaintained
by the administrators will have old versions of software on; that's a
given. I can't see why adding *more* packages (PECL timezonedb) which
those admins have to keep up-to-date will make any difference to that.
The point of the patch is to make it *easier* for admins and packagers
to keep systems up to date; fewer package updates to download, test,
deploy, etc. If every package on your system followed the same
philosophy of duplicating the timezone database, the system would be
unmaintainable.
> > 1) I've not implemented timelib_timezone_builtin_identifiers_list() here
> > since it doesn't seem to be used, is it there for third-party
> > extensions? It could be implemented by iterating through the directory.
>
> It's not used (anymore). However, there is a PHP function
> timezone_identifiers_list that uses the index information to enumerate
> all entries. Your patch prevents this function from working as both
> struct elements index_size and index are set to 0/null.
OK, fixed in attached patch.
> > 2) there's no general way that I can find to obtain the database
> > version, so I just invented a string here.
>
> Not having the version creates another problem, as the PECL timezonedb
> extension will then *always* override the builtin database. The
> extension uses the version number to see if the extension provides a
> later rules set. With your patch, this extra check will not work
> correctly anymore if PHP's built-in version is newer (which would happen
> if PHP got upgraded).
This seems fine, even desirable. Anybody who configures PHP
--with-system-tzdata *and* installs the PECL timezonedb always gets the
data provided by the timezonedb; the risk of stale timezone data is
little different to the current risk.
> - This patchs allows accessing any file on the filesystem (because the
> path is not sanitized).
Fixed in attached patch.
> - Opening a file is less quick than having the data in memory.
Sure, this is a trade-off; it's configurable exactly so that users get
to make the choice.
> - The system's format of tzfiles does not necessarily have to be the
> same as the one that PHP reads. On my (Debian) system the tz provided
> data is already 64bit read. Once I update the extension to use this
> extra data, reading the system tz files won't work anymore.
I'm not sure exactly what the concern is here; the 64-bit TZif2 format
is backwards-compatible with the TZif format and the existing PHP code
reads it without problems. (I've been testing against it)
> - It would be possible to use a "wrong set" of tzdata, resulting in bugs
> like http://bugs.php.net/bug.php?id=35958
It's always going to be possible for people to shoot themselves in the
feet. Someone trying to "fix" the PHP-supplied database could similarly
screw it up. Again, eliminating extra copies of the timezone database
is intended to make it easier for people to keep up-to-date.
Regards,
joe
Index: ext/date/lib/parse_tz.c
===================================================================
RCS file: /repository/php-src/ext/date/lib/parse_tz.c,v
retrieving revision 1.35
diff -u -r1.35 parse_tz.c
--- ext/date/lib/parse_tz.c 31 Dec 2007 07:12:08 -0000 1.35
+++ ext/date/lib/parse_tz.c 10 Jan 2008 11:42:15 -0000
@@ -20,6 +20,16 @@
#include "timelib.h"
+#ifdef HAVE_SYSTEM_TZDATA
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "php_scandir.h"
+#endif
+
#include <stdio.h>
#ifdef HAVE_STRING_H
@@ -27,7 +37,10 @@
#else
#include <strings.h>
#endif
+
+#ifndef HAVE_SYSTEM_TZDATA
#include "timezonedb.h"
+#endif
#if (defined(__APPLE__) || defined(__APPLE_CC__)) && (defined(__BIG_ENDIAN__)
|| defined(__LITTLE_ENDIAN__))
# if defined(__LITTLE_ENDIAN__)
@@ -202,6 +215,195 @@
}
}
+#ifdef HAVE_SYSTEM_TZDATA
+
+#ifdef HAVE_SYSTEM_TZDATA_PREFIX
+#define ZONEINFO_PREFIX HAVE_SYSTEM_TZDATA_PREFIX
+#else
+#define ZONEINFO_PREFIX "/usr/share/zoneinfo"
+#endif
+
+static const timelib_tzdb *timezonedb_system = NULL;
+
+/* Filter out some non-tzdata files and the posix/right databases, if
+ * present. */
+static int index_filter(const struct dirent *ent)
+{
+ return strcmp(ent->d_name, ".") != 0
+ && strcmp(ent->d_name, "..") != 0
+ && strcmp(ent->d_name, "posix") != 0
+ && strcmp(ent->d_name, "posixrules") != 0
+ && strcmp(ent->d_name, "right") != 0
+ && strstr(ent->d_name, ".tab") == NULL;
+}
+
+/* Create the zone identifier index by trawling the filesystem. */
+static void create_zone_index(timelib_tzdb *db)
+{
+ size_t dirstack_size, dirstack_top;
+ size_t index_size, index_next;
+ timelib_tzdb_index_entry *db_index;
+ char **dirstack;
+
+ /* LIFO stack to hold directory entres to scan; each slot is a
+ * directory name relative to the zoneinfo prefix. */
+ dirstack_size = 32;
+ dirstack = malloc(dirstack_size * sizeof *dirstack);
+ dirstack_top = 1;
+ dirstack[0] = strdup("");
+
+ /* Index array. */
+ index_size = 64;
+ db_index = malloc(index_size * sizeof *db_index);
+ index_next = 0;
+
+ do {
+ struct dirent **ents;
+ char name[PATH_MAX], *top;
+ int count;
+
+ /* Pop the top stack entry, and iterate through its contents. */
+ top = dirstack[--dirstack_top];
+ snprintf(name, sizeof name, ZONEINFO_PREFIX "/%s", top);
+
+ count = php_scandir(name, &ents, index_filter, php_alphasort);
+
+ while (count > 0) {
+ struct stat st;
+ const char *leaf = ents[count - 1]->d_name;
+
+ snprintf(name, sizeof name, ZONEINFO_PREFIX "/%s/%s",
+ top, leaf);
+
+ if (strlen(name) && stat(name, &st) == 0) {
+ /* Name, relative to the zoneinfo prefix. */
+ const char *root = top;
+
+ if (root[0] == '/') root++;
+
+ snprintf(name, sizeof name, "%s%s%s", root,
+ *root ? "/": "", leaf);
+
+ if (S_ISDIR(st.st_mode)) {
+ if (dirstack_top == dirstack_size) {
+ dirstack_size *= 2;
+ dirstack = realloc(dirstack,
+
dirstack_size * sizeof *dirstack);
+ }
+ dirstack[dirstack_top++] = strdup(name);
+ }
+ else {
+ if (index_next == index_size) {
+ index_size *= 2;
+ db_index = realloc(db_index,
+ index_size *
sizeof *db_index);
+ }
+
+ db_index[index_next].id = strdup(name);
+ db_index[index_next++].pos = 0;
+ }
+ }
+
+ free(ents[--count]);
+ }
+
+ free(ents);
+ free(top);
+ } while (dirstack_top);
+
+ db->index = db_index;
+ db->index_size = index_next;
+
+ free(dirstack);
+}
+
+/* Return the mmap()ed tzfile if found, else NULL. On success, the
+ * length of the mapped data is placed in *length. */
+static char *map_tzfile(const char *timezone, size_t *length)
+{
+ char fname[PATH_MAX];
+ struct stat st;
+ char *p;
+ int fd;
+
+ if (strstr(timezone, "..") != NULL) {
+ return NULL;
+ }
+
+ snprintf(fname, sizeof fname, ZONEINFO_PREFIX "/%s", timezone);
+
+ fd = open(fname, O_RDONLY);
+ if (fd == -1) {
+ return NULL;
+ } else if (fstat(fd, &st) != 0 || st.st_size < 21) {
+ close(fd);
+ return NULL;
+ }
+
+ *length = st.st_size;
+ p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
+ close(fd);
+
+ return p != MAP_FAILED ? p : NULL;
+}
+
+const timelib_tzdb *timelib_builtin_db(void)
+{
+ if (timezonedb_system == NULL) {
+ timelib_tzdb *tmp = malloc(sizeof *tmp);
+
+ tmp->version = "0.system";
+ tmp->data = NULL;
+ create_zone_index(tmp);
+ timezonedb_system = tmp;
+ }
+
+ return timezonedb_system;
+}
+
+const timelib_tzdb_index_entry *timelib_timezone_builtin_identifiers_list(int
*count)
+{
+ *count = timezonedb_system->index_size;
+ return timezonedb_system->index;
+}
+
+int timelib_timezone_id_is_valid(char *timezone, const timelib_tzdb *tzdb)
+{
+ char fname[PATH_MAX];
+
+ if (strstr(timezone, "..") != NULL) {
+ return 0;
+ }
+
+ snprintf(fname, sizeof fname, ZONEINFO_PREFIX "/%s", timezone);
+
+ return access(fname, R_OK) == 0 ? 1 : 0;
+}
+
+timelib_tzinfo *timelib_parse_tzfile(char *timezone, const timelib_tzdb *tzdb)
+{
+ char *tzf, *orig;
+ timelib_tzinfo *tmp;
+ size_t len;
+
+ orig = map_tzfile(timezone, &len);
+ if (orig == NULL) {
+ return NULL;
+ }
+
+ tmp = timelib_tzinfo_ctor(timezone);
+
+ tzf = orig + 20;
+ read_header(&tzf, tmp);
+ read_transistions(&tzf, tmp);
+ read_types(&tzf, tmp);
+
+ munmap(orig, len);
+
+ return tmp;
+}
+#else /* !HAVE_SYSTEM_TZDATA */
+
static int seek_to_tz_position(const unsigned char **tzf, char *timezone,
const timelib_tzdb *tzdb)
{
int left = 0, right = tzdb->index_size - 1;
@@ -258,6 +460,7 @@
return tmp;
}
+#endif
static ttinfo* fetch_timezone_offset(timelib_tzinfo *tz, timelib_sll ts,
timelib_sll *transition_time)
{
Index: ext/date/lib/timelib.m4
===================================================================
RCS file: /repository/php-src/ext/date/lib/timelib.m4,v
retrieving revision 1.4
diff -u -r1.4 timelib.m4
--- ext/date/lib/timelib.m4 3 Jul 2005 23:30:52 -0000 1.4
+++ ext/date/lib/timelib.m4 10 Jan 2008 11:42:16 -0000
@@ -78,3 +78,17 @@
dnl Check for strtoll, atoll
AC_CHECK_FUNCS(strtoll atoll strftime)
+
+PHP_ARG_WITH(system-tzdata, for use of system timezone data,
+[ --with-system-tzdata[=DIR] to specify use of system timezone data],
+no, no)
+
+if test "$PHP_SYSTEM_TZDATA" != "no"; then
+ AC_DEFINE(HAVE_SYSTEM_TZDATA, 1, [Define if system timezone data is used])
+
+ if test "$PHP_SYSTEM_TZDATA" != "yes"; then
+ AC_DEFINE_UNQUOTED(HAVE_SYSTEM_TZDATA_PREFIX, "$PHP_SYSTEM_TZDATA",
+ [Define for location of system timezone data])
+ fi
+fi
+
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php