Re: [PHP-DEV] is_numeric_string causes function inconsistency

Pierre Fri, 11 Aug 2006 08:10:42 -0700

Hi Matt,

On 8/11/06, Matt W <[EMAIL PROTECTED]> wrote:

Hello Pierre,


Thanks for your reply. :-)

----- Original Message -----
From: "Pierre"
Sent: Friday, August 11, 2006


> Hello,
>
> Note that I also answer your previous mail here :)
>
> On Fri, 11 Aug 2006 06:18:13 -0500
> [EMAIL PROTECTED] ("Matt W") wrote:
>
> > Hello again,
> >
> > I discovered a couple more things is_numeric... is causing problems
> > with (leading whitespace).  I doubt any of the examples I've given
> > make sense to regular users who don't know what's happening behind
> > the scenes.  Add these to the "wrong" list:
> >
> > is_numeric(' .123') // bool(false)
>
> this one should return true.
>
> > ' .123' + 0 // int(0)
>
> ' 0.123' is casted to 0, 0+0. But if the ' .123' is allowed, it should
> then result in 0.123+0, which is the correct behavior.

I may be misunderstanding you, but ' 0.123'+0 results in the correct 0.123.
Just without the leading 0 that it becomes wrong. :-)


I think we should consider '.123' as 0.123. This expression is then correct.

> > One more thing I was curious about as far as keeping things
> > consistent is with is_numeric... (and therefore
> > convert_scalar_to_number()), hex strings are allowed/work, but not
> > with convert_to_[long|double]().
>
> I did not check convert_* while fixing/enhancing filters, but I think
> there is a higher risk of breakages if you change these functions. We
> should first have a clear view of what is used where and how the
> changes affect end user scripts and extensions. It sounds like an
> impossible task (except for 6.0).

I was just wondering if convert_to_[long|double] should check for and allow
hex strings like convert_scalar_to_number does (because it uses
is_numeric...).

> I suggest you to take a look at the ext/filter code and what we accept.
> I spend a far amount of times to ask and listen to users to see what
> they expext. I'm quite happy with the current state and for what I
> hear, the users too.
>
> You can check the FILTER_VALIDATE_* mode, they do the same operations
> that we are discussing here. The sanitize mode only checks for
> unexpected chars.

Have to admit that I'm not really familiar with any of the filter stuff. :-/
I'll keep that in mind though.


Please take a look, it really solves many of these issues.

> > So a few PHP functions properly
> > accept hex strings, but most will convert one to 0. Should anything
> > be done about this difference?  I have an idea about allowing hex
> > strings in to_[long|double] using the new is_numeric... functions I
> > will propose.
>
> > Few things about the current is_numeric... and hex strings, which I
> > think I'll change in my proposal unless I hear opinions otherwise:
> > *) Leading whitespace isn't allowed
>
> They should be allowed (leading/ending).

Not sure if you're talking about currently, or agreeing with me. ;-)  For
these 3 points, I was only referring to hex strings.  Leading space is
allowed with non-hex.


I would like to allow them for all types (float, integer or hex), it
is what I did in ext/filter.

> > *) A sign (±) isn't allowed
>
> It is allowed except for in the hexadecimal notation (see the manual
> page of is_numeric), so if you talk only about is_numeric and the hex
> notation, it is a bug fix.

Again, only referring to hex.  Ah yes, I see the manual page for
is_numeric().  Hmm OK, not sure if you'd want that to be changed then...
The internal function would be fine ($n = -0xABC works in the parser), I
guess, but maybe sign & hex returning true with PHP's is_numeric() is
undesired.


+/- are not allowed for hex. I think we should make the difference
between a string conversion and the parser.

(a) $a = - 0xFF
(b) $a = " - 0xFF; "

(a) is a perfectly valid expression within a _script_ (just like
$a=-2;), however (b) is a string and will require a _cast_ to INT. The
two cases should not be processed the same way.

(a) can be read as $a = -1; $a *= 0xFF;
(b) is only a string assignement, it will be casted to INT when
required and failed (int(0)).

> *) Hex doubles don't work.  I think they should (for *whole* numbers
> > only obviously, no ".").  So '0xFFFFFFFFFF' + 0 for example, works on
> > a 32-bit system.
>
> They should not, an hexadecimal notation represents an integer (long),
> not a double. A double could be the result of a cast when it is out of
> the integer range.

Well, I think of the hex notation as just a whole number (non-floating) of
whatever range/size.  About the cast, yeah, I see that's what is done now in
the parser if the hex number is between LONG_MAX and ULONG_MAX -- results in
a double.  hexdec(), etc. will also return a double if needed.

Right now, since hex doubles don't work, you also have (on 32-bit):


I don't understand what you mean by hex double :) Do you mean that we
should convert out of range HEX to double in any case?

is_numeric('0x7FFFFFFF') // bool(true)
is_numeric('0x80000000') // bool(false)

> > If that last one can be changed, it also should be in the language
> > parser of course (you know, for $n = 0xFFFFFFFFFF;).
>
> It is the endless problem about 32/64bits issues, also I don't think
> you are considering to use double in a for loop? :)

In a for loop of a script?  No :-), but someone may want to specify a number
in hex larger than ULONG_MAX (and it may work if they're 64-bit, then break
on 32-bit).  I've not had a need for it (in parser), but I would like the
larger hex strings to work as I have code like ('0x' . $hexstr) + 0  where
$hexstr comes from a packed/binary number (after bin2hex()) that may be any
size.  I don't want to use hexdec() because it's slower (and this is
speed-critical, in a loop) and usually the value WILL fit in a long.



That's  two different things, parser and cast operations. But I agree,
it is a bit tricky to keep this difference in mind while coding. But
you should do:

$a = 0+ ('0x'.$hexstr);

Some benchmarks (amd64):

$hexstr="FFFFFF";
$iter = 1000000;
$s1 = microtime(true);
for($i=0;$i<$iter;$i++) $a=hexdec("0x".$hexstr);
$s2 = microtime(true);
echo "hexdec: " . ($s2 - $s1) . "\n";

$s1 = microtime(true);
for($i=0;$i<$iter;$i++) $a=0+("0x".$hexstr);
$s2 = microtime(true);
echo "cast: " . ($s2 - $s1) . "\n";

hexdec: 2.6401779651642
cast: 1.4510979652405

That reminds me, a "(number)" typecast would be nice to have. :-)


number is a human thing, I'm not sure it fits our needs :)

> > > You get the idea.  That's because is_numeric_string() *ignores* the
> > > value from zend_strtod() if errno==ERANGE.  I don't think that's
> > > right, and it doesn't happen when convert_to_double() uses
> > > zend_strtod():
>
> I have to check the sources :)

Yes, it ignores the INF/ERANGE, for whatever reason. :-)


In my opinion, these issues can be considered as bugs and should be
fixed easily done without rewriting everything).

I will send a new example function to the list in a few days, after doing
some tests, etc.  Its behavior should be the same, except for these bugs.
And currently, strto[l|d] is sometimes called unnecessarily -- for example,
I think '123  foo' would result in zend_strtod() after strtol() -- pretty
sure that can be avoided.  You'll see soon...


Again, take a look at ext/filter, I rewrite both float and integer
(not sure if I commited "int" yet, I will check later :), without
anything of these functions. However I consider that we should first
determine what to change and where. Like separate the parser from the
cast operations (string_to_*). Also we are missing way too many tests
to valid the changes. But as I said, it is necessary to bring
consistency in this area :)

> > > print_r(array_count_values(array(1, '   1', '  1  ')))
> > > Array
> > > (
> > >     [1] => 2
> > >     [  1  ] => 1
> > > )
>
> This is typically an example of why we cannot not change the behaviors
> in php5, but I definitively like to do it for php 6.x.

I e-mailed Andrei about array_count_values() since I think it's incorrect.
If so, it should be very simple to fix -- just eliminate the use of
is_numeric_string.


The fix is certainly easy (leading spaces management), but it will
break things out there. That's what we have to avoid, imho.

P.S.  I forgot to add before that I noticed a comment Derick added a few
days ago for is_numeric_string -- only in 5.2
(http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?r1=1.94.2.4.2.2&;
r2=1.94.2.4.2.3&view=patch).  It says it returns IS_DOUBLE if the number
didn't fit in the integer range, but that's wrong if it's INF. :-)


INF is per definition not out of range, it is out of everything (-INF too) ;-)

Cheers,
--Pierre

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP-DEV] is_numeric_string causes function inconsistency

Reply via email to