php-i18n Digest 28 Jun 2006 10:40:07 -0000 Issue 332

Topics (messages 1013 through 1022):

Re: [PHP-DEV] RFC: Error handling in HTTP input decoding
        1013 by: Jared Williams
        1014 by: Andrei Zmievski

TextIterator changes
        1015 by: Andrei Zmievski
        1016 by: Michael Wallner

Re: [PHP-DEV] Re: TextIterator changes
        1017 by: Andrei Zmievski
        1018 by: Michael Wallner
        1020 by: Marcus Boerger
        1021 by: Andrei Zmievski

Re: [PHP-DEV] TextIterator changes
        1019 by: Andrei Zmievski

Renaming unicode_semantics
        1022 by: Andrei Zmievski

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
 

> -----Original Message-----
> From: Andrei Zmievski [mailto:[EMAIL PROTECTED] 
> Sent: 22 June 2006 22:46
> To: PHP Internals
> Cc: PHP I18N
> Subject: [PHP-DEV] RFC: Error handling in HTTP input decoding
> 
> I'd like to solicit opinions on how we should treat conversion failures
> during HTTP input decoding. There are two issues at hand: fallback
> mechanism and application-driven decoding in case of failure. Let's
> look at the proposal for the latter one first.
> 
> If the decoding of HTTP input fails (and the failure state would be
> achieved as soon as even one variable fails), PHP should set an error
> flag somewhere that is accessible to the user, via either a global
> variable or a function. It should also keep the original request data
> around (query string, POST body, and cookie data). The application
> should be able to access this data, since the encoding can be passed in
> the query string [1]. The application can then check this error flag
> and then call a function -- request_decode() perhaps -- to ask PHP to
> re-decode the request data based on a specific encoding. For
> example:
> 
>    if (request_decoding_failed()) {
>       request_decode(request_get_raw('ei'));
>    }
> 
> We might be able to tie this in with the input filter, but that means 
> that the input filter will have to be required by PHP. I am open to 
> other suggestions in this area.
> 
> As for the first issue, PHP attempts to decode the input using the
> value of the unicode.output_encoding setting, because that is the most
> logical choice if we assume that the clients send the data back in the
> encoding that the page with the form was in. We could implement a
> fallback mechanism where PHP looks at the Accept-Charset header sent by
> the client [2]. This header is supposed to indicate what character sets

https://bugzilla.mozilla.org/show_bug.cgi?id=18643

Maybe of interest, it's the kludge for determining form charsets, after the
charset in the Content-Type header broke too much.

> are acceptable for the response. While this is not the same as
> specifying the character set of the request, it might be a good enough
> indicator of it. Or we could simply set the error state and let the
> application figure out what charset it wants to use for decoding.
> 
> Thanks for your attention.
> 
> -Andrei
> 
> [1] http://search.yahoo.com/search?ei=UTF-8&p=php
> [2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
> 
> -- 
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
> 
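
The flow proposed in the quoted RFC (notice that decoding failed, then re-decode the raw request using an encoding named in the query string) might look roughly like the userland sketch below. request_decoding_failed(), request_get_raw() and request_decode() are only names floated in the RFC, so the sketch uses hypothetical sketch_* stand-ins instead. It assumes only two encodings, UTF-8 and ISO-8859-1, and looks only at a query string.

```php
<?php
// Hypothetical userland stand-ins for the functions proposed in the RFC.
// None of these names is a real PHP API.

// True if $bytes is well-formed UTF-8 (the /u modifier makes PCRE validate it).
function sketch_is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// ISO-8859-1 to UTF-8: every byte maps to the code point of the same value.
function sketch_latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Decode a raw (still URL-encoded) query string.
// Returns [variables, failed flag]. Assumption: anything other than
// ISO-8859-1 is treated as UTF-8.
function sketch_request_decode(string $rawQuery, string $encoding): array
{
    $vars = [];
    $failed = false;
    foreach (explode('&', $rawQuery) as $pair) {
        if ($pair === '') {
            continue;
        }
        $parts = explode('=', $pair, 2);
        $name = urldecode($parts[0]);
        $value = urldecode($parts[1] ?? '');
        if ($encoding === 'ISO-8859-1') {
            $value = sketch_latin1_to_utf8($value);
        } elseif (!sketch_is_valid_utf8($value)) {
            // As in the RFC: one bad variable puts the request in the failure state.
            $failed = true;
        }
        $vars[$name] = $value;
    }
    return [$vars, $failed];
}

// "café" submitted as ISO-8859-1, with the encoding named in the "ei" variable.
$raw = 'ei=ISO-8859-1&p=caf%E9';
[$vars, $failed] = sketch_request_decode($raw, 'UTF-8');
if ($failed) {
    // Application-driven retry with the encoding taken from the raw request.
    [$vars, $failed] = sketch_request_decode($raw, $vars['ei']);
}
echo $vars['p'], "\n";
```

The retry reads the "ei" value from the first, failed pass, which works here only because encoding names are plain ASCII.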

--- End Message ---
--- Begin Message ---
Thanks for the link!

-Andrei


On Jun 22, 2006, at 5:11 PM, Jared Williams wrote:

> https://bugzilla.mozilla.org/show_bug.cgi?id=18643
>
> Maybe of interest, it's the kludge for determining form charsets, after the
> charset in the Content-Type header broke too much.


--- End Message ---
--- Begin Message ---
I am working on implementing BreakIterator API [1]. I considered two
approaches: making a separate class or merging the API into the
existing TextIterator. Having a separate class would be a bit cleaner,
but I can see people wanting to use it in foreach(), and since
TextIterator already provides a lot of BreakIterator's functionality,
I decided that merging would be the best option. However, there is an
overlap between the BreakIterator API and the current TextIterator
one, so there will have to be some changes.

1. TextIterator::current() signature will change from:

   mixed current()

to:

   mixed current(integer &$offset)

in order to support BreakIterator's functionality of returning offset in current().

2. TextIterator::next() will return the offset of the next boundary instead of returning nothing.

3. TextIterator::rewind() will be renamed to TextIterator::first() to conform to BreakIterator's first()/last() API.

So this is a heads-up. Let me know if you have a problem with this.

-A

[1] http://icu.sourceforge.net/apiref/icu4c/ubrk_8h.html

--- End Message ---
--- Begin Message ---
Andrei Zmievski wrote:
> I am working on implementing BreakIterator API [1]. I considered two
> approaches: making a separate class or merging the API into the
> existing TextIterator. Having a separate class would be a bit cleaner,
> but I can see people wanting to use it in foreach(), and since
> TextIterator already provides a lot of BreakIterator's functionality,
> I decided that merging would be the best option. However, there is an
> overlap between the BreakIterator API and the current TextIterator
> one, so there will have to be some changes.
>
> 1. TextIterator::current() signature will change from:
>
>    mixed current()
>
> to:
>
>    mixed current(integer &$offset)

This will raise the same issue as we have/had with the reflection API:
[EMAIL PROTECTED]:~/build/php-unicode-debug$ cli -r 'interface i{function f();} 
class c implements i{function f($a){}}'

Fatal error: Declaration of c::f() must be compatible with that of i::f() in 
Command line code on line 1


Nah, don't look at me--I don't like that either.


> 3. TextIterator::rewind() will be renamed to TextIterator::first() to
> conform to BreakIterator's first()/last() API.
>
> So this is heads up. Let me know if you have a problem with this.

Huh? Rename or alias? It can't implement Iterator if there's no rewind() method.

Regards,
--
Michael
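
The one-liner earlier in this message fails because the implementing method adds a required parameter. As a hedged illustration, in current PHP releases an extra parameter that is optional is accepted, since every call that is valid against the interface stays valid against the class. The names below are hypothetical, and whether the 2006 development build behaved the same way is an assumption that has not been checked here.

```php
<?php
// Hypothetical names; a stand-in for the interface/class pair in the one-liner.
interface SketchCursor
{
    public function current();
}

// An extra *required* parameter, as in `function f($a){}`, is a fatal error.
// An extra *optional* parameter keeps the declaration compatible.
class SketchOffsetCursor implements SketchCursor
{
    private $items = ['alpha', 'beta', 'gamma'];
    private $pos = 1;

    public function current(&$offset = null)
    {
        $offset = $this->pos;              // report the position by reference
        return $this->items[$this->pos];   // return the element itself
    }
}

$cursor = new SketchOffsetCursor();
$plain = $cursor->current();           // interface-style call, no offset
$value = $cursor->current($offset);    // extended call, $offset is filled in
echo $plain, ' ', $value, ' ', $offset, "\n";
```

This keeps a current()-style method usable both ways, at the cost of an API that differs from the interface it claims to implement.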

--- End Message ---
--- Begin Message ---
TextIterator does not implement the Iterator interface, only
Traversable. It just happens to have functions of the same name as
Iterator.

-Andrei

--- End Message ---
--- Begin Message ---
Andrei Zmievski wrote:
> TextIterator does not implement the Iterator interface, only
> Traversable. It just happens to have functions of the same name as
> Iterator.

Ah, okay.
That leaves the picky OO strictness I personally don't like.
I think we should give PHP some freedom back in this area.


Regards,
--
Michael

--- End Message ---
--- Begin Message ---
Hello Andrei,

  You should reconsider giving it the full Iterator interface. Adding more
functionality on top of that is quite easy and no problem at all. Sharing
an interface also makes the whole thing feel much better: there are no
conflicts and the behavior is consistent. Actually that's what interfaces
are for.

Best regards,
 Marcus

--- End Message ---
--- Begin Message ---
Thanks, I know what they are for. I remember discussing Iterator vs.
Traversable with you on IRC, and for some reason we settled on
Traversable. Anyway, I'm changing it now.

-Andrei


--- End Message ---
--- Begin Message ---
Sean (on IRC) convinced me that something called *Iterator had better
implement the Iterator interface (which TextIterator currently does
not). So changing method signatures is out of the question. To that
end, the current functions will stay as they are, but I'll have to add
current_offset() (for getting the offset of the current element as
opposed to the element itself), and alias rewind() to first().

-Andrei
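
A rough userland sketch of the shape described above: the standard Iterator methods keep their signatures, and current_offset() and first() are added alongside them. SketchTextIterator is a hypothetical class, not the real TextIterator. It assumes UTF-8 input, iterates by code point only, and makes first() return nothing, which the thread does not specify.

```php
<?php
// Hypothetical userland sketch. It only illustrates the method layout
// discussed in the thread, not the behavior of the real TextIterator.
class SketchTextIterator implements Iterator
{
    private $units;
    private $pos = 0;

    public function __construct(string $utf8Text)
    {
        // Split into code points; offsets count code points, not bytes.
        // preg_split() returns false on malformed UTF-8, hence the fallback.
        $this->units = preg_split('//u', $utf8Text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    }

    // Standard Iterator methods, signatures left alone.
    #[\ReturnTypeWillChange]
    public function current() { return $this->units[$this->pos] ?? null; }

    #[\ReturnTypeWillChange]
    public function key() { return $this->pos; }

    public function next(): void { $this->pos++; }
    public function rewind(): void { $this->pos = 0; }
    public function valid(): bool { return $this->pos < count($this->units); }

    // Addition from the thread: offset of the current element.
    public function current_offset(): int { return $this->pos; }

    // Addition from the thread: BreakIterator-style alias of rewind().
    public function first(): void { $this->rewind(); }
}

$it = new SketchTextIterator("h\u{E9}llo");
$it->first();
$it->next();
echo $it->current(), ' at ', $it->current_offset(), "\n";

// Because the class implements Iterator, foreach works unchanged.
$seen = [];
foreach ($it as $offset => $char) {
    $seen[] = "$offset:$char";
}
echo implode(' ', $seen), "\n";
```

Keeping current() free of extra parameters is what lets the class satisfy Iterator; the offset moves to its own method instead.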

--- End Message ---
--- Begin Message ---
I am considering renaming unicode_semantics to unicode.semantics to
conform to the rest of the Unicode-related INI settings. If there are
no major objections, I am going to patch it tomorrow.

-Andrei

--- End Message ---
