Thanks for all the feedback.

On Wed, Jan 09, 2008 at 09:03:02PM +0100, Derick Rethans wrote:
> On Wed, 9 Jan 2008, Joe Orton wrote:
> 
> > It's a bit of a maintenance headache for distributions when 
> > packages include their own copy of the timezone database, since this 
> > needs to be updated frequently.
> 
> There is a PECL extension to provide those updates:
> http://pecl.php.net/package/timezonedb
> It is a drop-in replacement to update the rules.

My aim is to eliminate the additional copies of the timezone database, 
rather than ship and maintain yet more.  Just as I don't want to 
statically link all system libraries into PHP, nor have to build special 
copies of any of them for PHP.

I'm not sure I find the logic of the "but the system-provided data will 
become stale" arguments convincing; systems which are left unmaintained 
by the administrators will have old versions of software on; that's a 
given.  I can't see why adding *more* packages (PECL timezonedb) which 
those admins have to keep up-to-date will make any difference to that.

The point of the patch is to make it *easier* for admins and packagers 
to keep systems up to date; fewer package updates to download, test, 
deploy, etc.  If every package on your system followed the same 
philosophy of duplicating the timezone database, the system would be 
unmaintainable.

> > 1) I've not implemented timelib_timezone_builtin_identifiers_list() here 
> > since it doesn't seem to be used, is it there for third-party 
> > extensions?  It could be implemented by iterating through the directory.
> 
> It's not used (anymore). However, there is a PHP function 
> timezone_identifiers_list that uses the index information to enumerate 
> all entries. Your patch prevents this function from working as both 
> struct elements index_size and index are set to 0/null.

OK, fixed in attached patch.

> > 2) there's no general way that I can find to obtain the database 
> > version, so I just invented a string here.
> 
> Not having the version creates another problem, as the PECL timezonedb 
> extension will then *always* override the builtin database. The 
> extension uses the version number to see if the extension provides a 
> later rules set. With your patch, this extra check will not work 
> correctly anymore if PHP's built-in version is newer (which would happen 
> if PHP got upgraded).

This seems fine, even desirable.  Anybody who configures PHP 
--with-system-tzdata *and* installs the PECL timezonedb always gets the 
data provided by the timezonedb; the risk of stale timezone data is 
little different to the current risk.

> - This patchs allows accessing any file on the filesystem (because the 
>   path is not sanitized).

Fixed in attached patch.

> - Opening a file is less quick than having the data in memory.

Sure, this is a trade-off; it's configurable exactly so that users get 
to make the choice.

> - The system's format of tzfiles does not necessarily have to be the 
>   same as the one that PHP reads. On my (Debian) system the tz provided 
>   data is already 64bit read. Once I update the extension to use this 
>   extra data, reading the system tz files won't work anymore.

I'm not sure exactly what the concern is here; the 64-bit TZif2 format 
is backwards-compatible with the TZif format and the existing PHP code 
reads it without problems.  (I've been testing against it)

> - It would be possible to use a "wrong set" of tzdata, resulting in bugs 
>   like http://bugs.php.net/bug.php?id=35958

It's always going to be possible for people to shoot themselves in the 
feet.  Someone trying to "fix" the PHP-supplied database could similarly 
screw it up.  Again, eliminating extra copies of the timezone database 
is intended to make it easier for people to keep up-to-date.

Regards,

joe
Index: ext/date/lib/parse_tz.c
===================================================================
RCS file: /repository/php-src/ext/date/lib/parse_tz.c,v
retrieving revision 1.35
diff -u -r1.35 parse_tz.c
--- ext/date/lib/parse_tz.c     31 Dec 2007 07:12:08 -0000      1.35
+++ ext/date/lib/parse_tz.c     10 Jan 2008 11:42:15 -0000
@@ -20,6 +20,16 @@
 
 #include "timelib.h"
 
+#ifdef HAVE_SYSTEM_TZDATA
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "php_scandir.h"
+#endif
+
 #include <stdio.h>
 
 #ifdef HAVE_STRING_H
@@ -27,7 +37,10 @@
 #else
 #include <strings.h>
 #endif
+
+#ifndef HAVE_SYSTEM_TZDATA
 #include "timezonedb.h"
+#endif
 
 #if (defined(__APPLE__) || defined(__APPLE_CC__)) && (defined(__BIG_ENDIAN__) 
|| defined(__LITTLE_ENDIAN__))
 # if defined(__LITTLE_ENDIAN__)
@@ -202,6 +215,195 @@
        }
 }
 
+#ifdef HAVE_SYSTEM_TZDATA
+
+#ifdef HAVE_SYSTEM_TZDATA_PREFIX
+#define ZONEINFO_PREFIX HAVE_SYSTEM_TZDATA_PREFIX
+#else
+#define ZONEINFO_PREFIX "/usr/share/zoneinfo"
+#endif
+
+static const timelib_tzdb *timezonedb_system = NULL;
+
+/* Filter out some non-tzdata files and the posix/right databases, if
+ * present. */
+static int index_filter(const struct dirent *ent)
+{
+       return strcmp(ent->d_name, ".") != 0
+               && strcmp(ent->d_name, "..") != 0
+               && strcmp(ent->d_name, "posix") != 0
+               && strcmp(ent->d_name, "posixrules") != 0
+               && strcmp(ent->d_name, "right") != 0
+               && strstr(ent->d_name, ".tab") == NULL;
+}
+
+/* Create the zone identifier index by trawling the filesystem. */
+static void create_zone_index(timelib_tzdb *db)
+{
+       size_t dirstack_size,  dirstack_top;
+       size_t index_size, index_next;
+       timelib_tzdb_index_entry *db_index;
+       char **dirstack;
+
+       /* LIFO stack to hold directory entres to scan; each slot is a
+        * directory name relative to the zoneinfo prefix. */
+       dirstack_size = 32;
+       dirstack = malloc(dirstack_size * sizeof *dirstack);
+       dirstack_top = 1;
+       dirstack[0] = strdup("");
+       
+       /* Index array. */
+       index_size = 64;
+       db_index = malloc(index_size * sizeof *db_index);
+       index_next = 0;
+
+       do {
+               struct dirent **ents;
+               char name[PATH_MAX], *top;
+               int count;
+
+               /* Pop the top stack entry, and iterate through its contents. */
+               top = dirstack[--dirstack_top];
+               snprintf(name, sizeof name, ZONEINFO_PREFIX "/%s", top);
+
+               count = php_scandir(name, &ents, index_filter, php_alphasort);
+
+               while (count > 0) {
+                       struct stat st;
+                       const char *leaf = ents[count - 1]->d_name;
+
+                       snprintf(name, sizeof name, ZONEINFO_PREFIX "/%s/%s", 
+                                top, leaf);
+                       
+                       if (strlen(name) && stat(name, &st) == 0) {
+                               /* Name, relative to the zoneinfo prefix. */
+                               const char *root = top;
+
+                               if (root[0] == '/') root++;
+
+                               snprintf(name, sizeof name, "%s%s%s", root, 
+                                        *root ? "/": "", leaf);
+
+                               if (S_ISDIR(st.st_mode)) {
+                                       if (dirstack_top == dirstack_size) {
+                                               dirstack_size *= 2;
+                                               dirstack = realloc(dirstack, 
+                                                                  
dirstack_size * sizeof *dirstack);
+                                       }
+                                       dirstack[dirstack_top++] = strdup(name);
+                               }
+                               else {
+                                       if (index_next == index_size) {
+                                               index_size *= 2;
+                                               db_index = realloc(db_index,
+                                                                  index_size * 
sizeof *db_index);
+                                       }
+
+                                       db_index[index_next].id = strdup(name);
+                                       db_index[index_next++].pos = 0;
+                               }
+                       }
+
+                       free(ents[--count]);
+               }
+               
+               free(ents);
+               free(top);
+       } while (dirstack_top);
+
+       db->index = db_index;
+       db->index_size = index_next;
+
+       free(dirstack);
+}
+
+/* Return the mmap()ed tzfile if found, else NULL.  On success, the
+ * length of the mapped data is placed in *length. */
+static char *map_tzfile(const char *timezone, size_t *length)
+{
+       char fname[PATH_MAX];
+       struct stat st;
+       char *p;
+       int fd;
+       
+       if (strstr(timezone, "..") != NULL) {
+               return NULL;
+       }
+
+       snprintf(fname, sizeof fname, ZONEINFO_PREFIX "/%s", timezone);
+       
+       fd = open(fname, O_RDONLY);
+       if (fd == -1) {
+               return NULL;
+       } else if (fstat(fd, &st) != 0 || st.st_size < 21) {
+               close(fd);
+               return NULL;
+       }
+
+       *length = st.st_size;
+       p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
+       close(fd);
+       
+       return p != MAP_FAILED ? p : NULL;
+}
+
+const timelib_tzdb *timelib_builtin_db(void)
+{
+       if (timezonedb_system == NULL) {
+               timelib_tzdb *tmp = malloc(sizeof *tmp);
+
+               tmp->version = "0.system";
+               tmp->data = NULL;
+               create_zone_index(tmp);
+               timezonedb_system = tmp;
+       }
+                       
+       return timezonedb_system;
+}
+
+const timelib_tzdb_index_entry *timelib_timezone_builtin_identifiers_list(int 
*count)
+{
+       *count = timezonedb_system->index_size;
+       return timezonedb_system->index;
+}
+
+int timelib_timezone_id_is_valid(char *timezone, const timelib_tzdb *tzdb)
+{
+       char fname[PATH_MAX];
+
+       if (strstr(timezone, "..") != NULL) {
+               return 0;
+       }
+       
+       snprintf(fname, sizeof fname, ZONEINFO_PREFIX "/%s", timezone);
+
+       return access(fname, R_OK) == 0 ? 1 : 0;
+}
+
+timelib_tzinfo *timelib_parse_tzfile(char *timezone, const timelib_tzdb *tzdb)
+{
+       char *tzf, *orig;
+       timelib_tzinfo *tmp;
+       size_t len;
+
+       orig = map_tzfile(timezone, &len);
+       if (orig == NULL) {
+               return NULL;
+       }
+
+       tmp = timelib_tzinfo_ctor(timezone);
+
+       tzf = orig + 20;
+       read_header(&tzf, tmp);
+       read_transistions(&tzf, tmp);
+       read_types(&tzf, tmp);
+
+       munmap(orig, len);
+
+       return tmp;
+}
+#else /* !HAVE_SYSTEM_TZDATA */
+
 static int seek_to_tz_position(const unsigned char **tzf, char *timezone, 
const timelib_tzdb *tzdb)
 {
        int left = 0, right = tzdb->index_size - 1;
@@ -258,6 +460,7 @@
 
        return tmp;
 }
+#endif
 
 static ttinfo* fetch_timezone_offset(timelib_tzinfo *tz, timelib_sll ts, 
timelib_sll *transition_time)
 {
Index: ext/date/lib/timelib.m4
===================================================================
RCS file: /repository/php-src/ext/date/lib/timelib.m4,v
retrieving revision 1.4
diff -u -r1.4 timelib.m4
--- ext/date/lib/timelib.m4     3 Jul 2005 23:30:52 -0000       1.4
+++ ext/date/lib/timelib.m4     10 Jan 2008 11:42:16 -0000
@@ -78,3 +78,17 @@
 
 dnl Check for strtoll, atoll
 AC_CHECK_FUNCS(strtoll atoll strftime)
+
+PHP_ARG_WITH(system-tzdata, for use of system timezone data,
+[  --with-system-tzdata[=DIR]      to specify use of system timezone data],
+no, no)
+
+if test "$PHP_SYSTEM_TZDATA" != "no"; then
+   AC_DEFINE(HAVE_SYSTEM_TZDATA, 1, [Define if system timezone data is used])
+
+   if test "$PHP_SYSTEM_TZDATA" != "yes"; then
+      AC_DEFINE_UNQUOTED(HAVE_SYSTEM_TZDATA_PREFIX, "$PHP_SYSTEM_TZDATA",
+                         [Define for location of system timezone data])
+   fi
+fi
+

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to