Hi Michael,

Generally speaking, you are right: a URI without a scheme is invalid.
But I don't think that is a good enough reason not to represent partial
URIs in an object.

For example, it would let users parse HTML files, which may contain full
or partial URLs, and extract those URLs without first checking whether
they are complete. The partial URI object could then be used to get at
its parts, such as the path, fragment or query string.

A method such as Zend_Uri->isComplete() or Zend_Uri->isValid() could
then be used to check whether the URI is valid.
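
To make the idea concrete, here is a rough sketch, in C to match your
example rather than PHP. It splits a URI reference the way the regular
expression in RFC 3986, Appendix B does, and treats "has a scheme" as the
completeness test. The struct and function names (uri_ref, uri_ref_parse,
uri_ref_is_complete) are hypothetical and are not part of Zend_Uri or any
existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container: a NULL pointer means the component is absent. */
struct uri_ref {
    const char *scheme;    size_t scheme_len;
    const char *authority; size_t authority_len;
    const char *path;      size_t path_len;   /* always set, may be empty */
    const char *query;     size_t query_len;
    const char *fragment;  size_t fragment_len;
};

/* Split s into components; the fields point into s, nothing is copied. */
static void uri_ref_parse(const char *s, struct uri_ref *u)
{
    size_t n;

    assert(s != NULL && u != NULL);
    *u = (struct uri_ref){0};

    /* scheme: one or more characters before a ':' that precedes any of "/?#" */
    n = strcspn(s, ":/?#");
    if (n > 0 && s[n] == ':') {
        u->scheme = s;
        u->scheme_len = n;
        s += n + 1;
    }
    /* authority: introduced by "//", runs to the next '/', '?' or '#' */
    if (s[0] == '/' && s[1] == '/') {
        s += 2;
        n = strcspn(s, "/?#");
        u->authority = s;
        u->authority_len = n;
        s += n;
    }
    /* path: everything up to the query or fragment delimiter */
    n = strcspn(s, "?#");
    u->path = s;
    u->path_len = n;
    s += n;
    if (*s == '?') {
        s++;
        n = strcspn(s, "#");
        u->query = s;
        u->query_len = n;
        s += n;
    }
    if (*s == '#') {
        s++;
        u->fragment = s;
        u->fragment_len = strlen(s);
    }
}

/* Rough analogue of the proposed isComplete(): only checks for a scheme. */
static int uri_ref_is_complete(const struct uri_ref *u)
{
    return u->scheme != NULL;
}
```

For "../foo/bar?x=1#top" this reports no scheme, so the reference is not
complete, but the path, query and fragment are still available.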

As for the implementation concerns: I have not checked this, but I have a
general feeling there are internal functions that will help us merge
paths safely. If those don't work, we can implement it as you proposed.
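
For reference, this is roughly what the merge step looks like if we follow
RFC 3986 (section 5.2.3 for merging, section 5.2.4 for removing dot
segments). It is only a sketch, again in C to match your example: the
names merge_paths and remove_dot_segments are mine, and it assumes the
base path is non-empty, so the RFC's special case for a base with an
authority and an empty path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove "." and ".." segments in place (RFC 3986, section 5.2.4).
 * The write position never passes the read position, so one buffer is enough. */
static void remove_dot_segments(char *path)
{
    const char *in = path;
    char *out = path;

    while (*in) {
        if (strncmp(in, "../", 3) == 0) {
            in += 3;                       /* leading "../" is dropped */
        } else if (strncmp(in, "./", 2) == 0) {
            in += 2;                       /* leading "./" is dropped */
        } else if (strncmp(in, "/./", 3) == 0) {
            in += 2;                       /* "/./" becomes "/" */
        } else if (strcmp(in, "/.") == 0) {
            *out++ = '/';                  /* trailing "/." becomes "/" */
            break;
        } else if (strncmp(in, "/../", 4) == 0 || strcmp(in, "/..") == 0) {
            int last = (in[3] == '\0');
            /* drop the last output segment together with its '/' */
            while (out > path) {
                out--;
                if (*out == '/')
                    break;
            }
            if (last) {
                *out++ = '/';              /* trailing "/.." becomes "/" */
                break;
            }
            in += 3;                       /* "/../" becomes "/" */
        } else if (strcmp(in, ".") == 0 || strcmp(in, "..") == 0) {
            break;                         /* a lone "." or ".." is dropped */
        } else {
            /* copy one segment, including its leading '/' if present */
            do {
                *out++ = *in++;
            } while (*in && *in != '/');
        }
    }
    *out = '\0';
}

/* Merge rel onto base (both paths only) and write the result to dst.
 * Returns 0 on success, -1 if dst is too small. */
static int merge_paths(const char *base, const char *rel,
                       char *dst, size_t dstsz)
{
    size_t dirlen = 0;

    assert(base != NULL && rel != NULL && dst != NULL);
    if (rel[0] != '/') {
        /* keep base up to and including its last '/' */
        const char *slash = strrchr(base, '/');
        dirlen = slash ? (size_t)(slash - base) + 1 : 0;
    }
    if (dirlen + strlen(rel) + 1 > dstsz)
        return -1;
    memcpy(dst, base, dirlen);
    strcpy(dst + dirlen, rel);
    remove_dot_segments(dst);
    return 0;
}
```

Merging "/a/b/page.html" with "../foo/bar" gives "/a/foo/bar", and extra
".." segments cannot climb above the root: "/a" with "../../x" gives "/x".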

Shahar.

On Tue, 2007-10-16 at 14:55 -0400, Michael B Allen wrote:
> On 10/16/07, Shahar Evron <[EMAIL PROTECTED]> wrote:
> > - Represent abstract or incomplete URIs (such as one might find in an
> >   HTML page - <a href="../foo/bar">.
> 
> Hi Shahar,
> 
> I think building URIs from only a relative path is a mistake -
> conceptually and programmatically. A URI is not valid unless it has at
> least a scheme.
> 
> However, it does make sense to allow constructing a URI from a URI
> object and a relative path. Meaning you retrieve a context URI
> representing the HTML page and then construct a new URI object with it
> and the relative path '../foo/bar' [1].
> 
> That's how Java's URL class does it BTW.
> 
> Mike
> 
> [1] Beware that path canonicalization is notoriously tricky and has in
> many instances led to security vulnerabilities in high-profile
> products. For a good path canonicalization routine a state machine
> usually turns out to be most correct (and if it's not it's easy to fix
> without screwing something else up). The following is C but of course
> it's not terribly difficult to translate this to PHP (e.g. *dst++ =
> *src++ becomes dst[di++] = src[si++]).
> 
> #define ST_START     1
> #define ST_SEPARATOR 2
> #define ST_NORMAL    3
> #define ST_DOT1      4
> #define ST_DOT2      5
> 
> int
> path_canon(const str_t *src, const str_t *slim,
>         str_t *dst, str_t *dlim,
>         int srcsep, int dstsep)
> {
>     str_t *start = dst, *prev;
>     int state = ST_START;
> 
>     while (src < slim && dst < dlim) {
>         switch (state) {
>             case ST_START:
>                 state = ST_SEPARATOR;
>                 if (*src == srcsep) {
>                     *dst++ = dstsep; src++;
>                     break;
>                 }
>                 /* fall through: handle this character as ST_SEPARATOR */
>             case ST_SEPARATOR:
>                 if (*src == '\0') {
>                     *dst = '\0';
>                     return dst - start;
>                 } else if (*src == srcsep) {
>                     src++;
>                     break;
>                 } else if (*src == '.') {
>                     state = ST_DOT1;
>                 } else {
>                     state = ST_NORMAL;
>                 }
>                 *dst++ = *src++;
>                 break;
>             case ST_NORMAL:
>                 if (*src == '\0') {
>                     *dst = '\0';
>                     return dst - start;
>                 } else if (*src == srcsep) {
>                     state = ST_SEPARATOR;
>                     *dst++ = dstsep; src++;
>                     break;
>                 }
>                 *dst++ = *src++;
>                 break;
>             case ST_DOT1:
>                 if (*src == '\0') {
>                     dst--;
>                     *dst = '\0';
>                     return dst - start;
>                 } else if (*src == srcsep) {
>                     state = ST_SEPARATOR;
>                     dst--;
>                     break;
>                 } else if (*src == '.') {
>                     state = ST_DOT2;
>                     *dst++ = *src++;
>                     break;
>                 }
>                 state = ST_NORMAL;
>                 *dst++ = *src++;
>                 break;
>             case ST_DOT2:
>                 if (*src == '\0' || *src == srcsep) {
>                     /* note src is not advanced in this case */
>                     state = ST_SEPARATOR;
>                     dst -= 2;
>                     prev = dst - 1;
>                     if (dst == start || prev == start) {
>                         break;
>                     }
>                     do {
>                         dst--;
>                         prev = dst - 1;
>                     } while (dst > start && *prev != dstsep);
>                     break;
>                 }
>                 state = ST_NORMAL;
>                 *dst++ = *src++;
>                 break;
>         }
>     }
> 
>     PMNO(errno = ERANGE);
>     return -1;
> }
> 
-- 
Shahar Evron [EMAIL PROTECTED]
Technical Consultant
Zend Technologies

Mobile: +972 54 30 99 446
Office: +972  3 75 39 500 ext. 9546
