The main issue, as I already discussed with Andrei (sorry, our
discussions are stealth since I see him almost every day even though I
try hard to avoid him), is how we handle encoding errors if we JIT at
runtime and process the entire array at that time.  I agree that this is
architecturally the right approach, but if someone injects bogus GET
data, for example, it is going to be decoded as soon as the app accesses
its first GET arg, even if the app never touches the bogus entry itself,
and at that point the bogus data would trigger an error.
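
To make the failure mode concrete, here is a rough Python model. It is not engine code: the class names are hypothetical, the raw GET data is assumed to be held as a dict of byte strings, and UTF-8 stands in for unicode.request_encoding. Decoding the whole array on first access means one bogus entry breaks access to a perfectly valid one:

```python
# Hypothetical model of naive whole-array JIT decoding (not PHP engine code).
class EncodingError(Exception):
    pass


class NaiveLazyGet:
    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # unparsed request data: name -> bytes
        self._encoding = encoding  # assumption: stands in for unicode.request_encoding
        self._decoded = None       # filled in on first access

    def __getitem__(self, key):
        if self._decoded is None:
            # Decode every entry on first access; one bad entry aborts them all.
            try:
                self._decoded = {
                    k: v.decode(self._encoding) for k, v in self._raw.items()
                }
            except UnicodeDecodeError as exc:
                raise EncodingError(str(exc)) from exc
        return self._decoded[key]
```

With raw data {"id": b"42", "junk": b"\xff"}, reading g["id"] raises EncodingError purely because of "junk".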

We obviously don't want anyone to be able to trigger errors arbitrarily
like that, but at the same time it needs to be possible for the
application to discover encoding errors.  So we probably need to make
the error handling pretty smart.  For example, treat an error decoding
the entry the app is actually trying to access as more serious than an
error decoding some other element that just happened to be decoded at
the same time.  Then, if the app later tries to access a previously
decoded element that had an error, throw the more serious error at that
point.  Or something along those lines.
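
One way the tiered handling might look, again as a hypothetical Python model under the same assumptions (a dict of raw byte strings, UTF-8 standing in for unicode.request_encoding). Errors in entries other than the one requested are only recorded as warnings, and the serious error is raised whenever a failed entry is itself accessed, now or later:

```python
# Hypothetical model of tiered error handling for whole-array JIT decoding.
class EncodingError(Exception):
    """Serious error: the requested entry itself failed to decode."""


class LazyInput:
    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # unparsed request data: name -> bytes
        self._encoding = encoding  # assumption: stands in for unicode.request_encoding
        self._decoded = None       # name -> str, filled in on first access
        self._failed = {}          # name -> decode error message
        self.warnings = []         # minor errors: failed entries not requested yet

    def _decode_all(self):
        self._decoded = {}
        for name, value in self._raw.items():
            try:
                self._decoded[name] = value.decode(self._encoding)
            except UnicodeDecodeError as exc:
                self._failed[name] = str(exc)

    def __getitem__(self, name):
        if self._decoded is None:
            self._decode_all()
            # Failures in other entries are only recorded, not raised.
            self.warnings = [n for n in self._failed if n != name]
        if name in self._failed:
            # Accessing a bad entry raises the serious error.
            raise EncodingError(f"{name}: {self._failed[name]}")
        return self._decoded[name]
```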

I suppose we could also JIT right down to the single-element level and
not process the entire array on the first access to that GPC array.
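
A hypothetical per-element variant, under the same assumptions as a Python model (dict of raw byte strings, UTF-8 standing in for unicode.request_encoding), only decodes and caches the entry being requested, so unrelated bogus data is never touched:

```python
# Hypothetical model of per-element JIT decoding (not PHP engine code).
class EncodingError(Exception):
    pass


class PerElementInput:
    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # unparsed request data: name -> bytes
        self._encoding = encoding  # assumption: stands in for unicode.request_encoding
        self._cache = {}           # name -> decoded str, only for entries read so far

    def __getitem__(self, name):
        if name not in self._cache:
            # Decode just this one entry; other entries stay raw.
            try:
                self._cache[name] = self._raw[name].decode(self._encoding)
            except UnicodeDecodeError as exc:
                raise EncodingError(f"{name}: {exc}") from exc
        return self._cache[name]

    def decoded_names(self):
        return sorted(self._cache)
```

The trade-off is that encoding errors in entries the app never reads go unnoticed, which may or may not be what we want.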

-Rasmus

Rui Hirokawa wrote:
> I think #2 is better than #1.
> The current implementation of mbstring is based on a solution similar
> to #1. It is simple and stable, but #2 is more flexible.
> 
> Rui
> 
> On Thu, 14 Dec 2006 21:59:44 +0100
> Pierre <[EMAIL PROTECTED]> wrote:
> 
>> Hello,
>>
>> Yesterday, Ilia, Andrei and I discussed possible solutions for handling
>> input encoding in PHP 6 (Unicode). I will try to describe them here.
>>
>> I won't go too deep into the details; the goal is to choose one
>> solution and then propose a patch for testing. We prefer
>> solution #2.
>>
>> --
>> Solution #1:
>> ------------
>> The idea here is to detect the encoding, then decode and register the
>> variables during request initialization (before the script takes
>> control). Apart from the encoding detection, this is how the current
>> implementation works (in all PHP versions).
>>
>> * Init
>>  - Parse the request into an array.
>>  - Locate _charset_ or use unicode.request_encoding.
>>  - Filter/decode/register the variables as is done now.
>>
>> * Runtime
>> Just like now, the auto_globals (with or without JIT) are declared and
>> ready to be used.
>>
>> This solution has one advantage: it requires only a few changes to
>> the engine. Only the request processing functions need to be changed
>> to detect the encoding.
>>
>> The main disadvantages are:
>> - lack of flexibility: the encoding must be set before the script takes
>>   control, using the vhost config or .htaccess
>> - if the encoding is detected incorrectly, the user will be forced to
>>   parse the raw request manually (when it is available).
>>
>>
>> Solution #2: add (true) JIT support for GET/POST/COOKIE/...
>> ------------
>> Instead of doing all the processing during the init phase, it is done
>> on demand at runtime, when an input variable is requested.
>>
>> * Init
>>  - don't parse the request but simply store it for later processing
>>
>> * Runtime
>>  - When an input variable is fetched:
>>    - the encoding is defined using unicode.request_encoding
>>    - the complete array (post, get, ...) is filtered/decoded/registered
>>
>> The way JIT works has to be changed. It has to process the data
>> at runtime instead of registering it at compile time. This is the only
>> way to be sure that the user has set the input encoding correctly
>> (or has had the opportunity to set it).
>>
>> The main advantage of this solution is the absence of magic for
>> the user. The user can check and/or set the encoding before the input
>> is processed, which makes it safe and flexible.
>>
>> I would also suggest adding a function, filter_input_encoding($type),
>> to define the encoding at runtime instead of using ini_set() (which is
>> often disabled).
>>
>> There are no real technical disadvantages, but it requires more work
>> and more changes to the engine. These changes will also bring some
>> performance improvements (e.g. if (0) $t = $_ENV['foo']; will no
>> longer trigger JIT).
>>
>> --
>>
>> I would like to hear your ideas, opinions and comments, especially
>> about the possible changes to the engine. Feel free to ask for more
>> details if my explanations were unclear :)
>>
>> Regards,
>> --Pierre
> 

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
