Hello Pierre, Thanks for your reply. :-)
----- Original Message ----- From: "Pierre" Sent: Friday, August 11, 2006 > Hello, > > Note that I also answer your previous mail here :) > > On Fri, 11 Aug 2006 06:18:13 -0500 > [EMAIL PROTECTED] ("Matt W") wrote: > > > Hello again, > > > > I discovered a couple more things is_numeric... is causing problems > > with (leading whitespace). I doubt any of the examples I've given > > make sense to regular users who don't know what's happening behind > > the scenes. Add these to the "wrong" list: > > > > is_numeric(' .123') // bool(false) > > this one should return true. > > > ' .123' + 0 // int(0) > > ' 0.123' is casted to 0, 0+0. But if the ' .123' is allowed, it should > then result in 0.123+0, which is the correct behavior. I may be misunderstanding you, but ' 0.123'+0 results in the correct 0.123. Just without the leading 0 that it becomes wrong. :-) > > One more thing I was curious about as far as keeping things > > consistent is with is_numeric... (and therefore > > convert_scalar_to_number()), hex strings are allowed/work, but not > > with convert_to_[long|double](). > > I did not check convert_* while fixing/enhancing filters, but I think > there is a higher risk of breakages if you change these functions. We > should first have a clear view of what is used where and how the > changes affect end user scripts and extensions. It sounds like an > impossible task (except for 6.0). I was just wondering if convert_to_[long|double] should check for and allow hex strings like convert_scalar_to_number does (because it uses is_numeric...). > I suggest you to take a look at the ext/filter code and what we accept. > I spend a far amount of times to ask and listen to users to see what > they expext. I'm quite happy with the current state and for what I > hear, the users too. > > You can check the FILTER_VALIDATE_* mode, they do the same operations > that we are discussing here. The sanitize mode only checks for > unexpected chars. Have to admit that I'm not really familiar with any of the filter stuff. :-/ I'll keep that in mind though. > > So a few PHP functions properly > > accept hex strings, but most will convert one to 0. Should anything > > be done about this difference? I have an idea about allowing hex > > strings in to_[long|double] using the new is_numeric... functions I > > will propose. > > > Few things about the current is_numeric... and hex strings, which I > > think I'll change in my proposal unless I hear opinions otherwise: > > *) Leading whitespace isn't allowed > > They should be allowed (leading/ending). Not sure if you're talking about currently, or agreeing with me. ;-) For these 3 points, I was only referring to hex strings. Leading space is allowed with non-hex. > > *) A sign (±) isn't allowed > > It is allowed except for in the hexadecimal notation (see the manual > page of is_numeric), so if you talk only about is_numeric and the hex > notation, it is a bug fix. Again, only referring to hex. Ah yes, I see the manual page for is_numeric(). Hmm OK, not sure if you'd want that to be changed then... The internal function would be fine ($n = -0xABC works in the parser), I guess, but maybe sign & hex returning true with PHP's is_numeric() is undesired. > > *) Hex doubles don't work. I think they should (for *whole* numbers > > only obviously, no "."). So '0xFFFFFFFFFF' + 0 for example, works on > > a 32-bit system. > > They should not, an hexadecimal notation represents an integer (long), > not a double. A double could be the result of a cast when it is out of > the integer range. Well, I think of the hex notation as just a whole number (non-floating) of whatever range/size. About the cast, yeah, I see that's what is done now in the parser if the hex number is between LONG_MAX and ULONG_MAX -- results in a double. hexdec(), etc. will also return a double if needed. Right now, since hex doubles don't work, you also have (on 32-bit): is_numeric('0x7FFFFFFF') // bool(true) is_numeric('0x80000000') // bool(false) > > If that last one can be changed, it also should be in the language > > parser of course (you know, for $n = 0xFFFFFFFFFF;). > > It is the endless problem about 32/64bits issues, also I don't think > you are considering to use double in a for loop? :) In a for loop of a script? No :-), but someone may want to specify a number in hex larger than ULONG_MAX (and it may work if they're 64-bit, then break on 32-bit). I've not had a need for it (in parser), but I would like the larger hex strings to work as I have code like ('0x' . $hexstr) + 0 where $hexstr comes from a packed/binary number (after bin2hex()) that may be any size. I don't want to use hexdec() because it's slower (and this is speed-critical, in a loop) and usually the value WILL fit in a long. That reminds me, a "(number)" typecast would be nice to have. :-) > > > Since I've been looking at is_numeric_[string|unicode], I found a > > > weird thing it causes; probably doesn't make sense to users; bug? > > > Look: > > > > > abs(-1e500) // float(INF) > > > abs('-1e500') // int(1) WRONG > > Agreed, it should float(INF) > > > > abs('-1e100') // float(1.0E+100) > > > is_finite(1e500) // bool(false) > > > > is_finite('1e500') // bool(true) WRONG > > Agreed, should float(INF) You mean bool(false) ;-) > > > is_finite('1e100') // bool(true) > > > is_numeric(1e500) // bool(true) > > > is_numeric('1e500') // bool(false) WRONG > > Agreed, float(INF) as before ... and bool(true) > > > is_numeric('1e100') // bool(true) > > > 1e500 + 123 // float(INF) > > > '1e500' + 123 // int(124) WRONG > > I get the feeling that the E notation has one bug, solving it should > most of these issues. ext/filter pass all these tests successfully. But > it had the same problems in its early days. > > > > You get the idea. That's because is_numeric_string() *ignores* the > > > value from zend_strtod() if errno==ERANGE. I don't think that's > > > right, and it doesn't happen when convert_to_double() uses > > > zend_strtod(): > > I have to check the sources :) Yes, it ignores the INF/ERANGE, for whatever reason. :-) > > > Just wondering if others think is_numeric_string() should be > > > changed in that respect? I was going to rewrite the function to > > > improve/optimize it (and submit it of course), so I can easily > > > change its behavior while I'm at it... > > It would be nice to bring consistency between functions. But changing > the behaviors at this level can have a very large impact. It has to be > done really carefully and will _many_ tests. I can help if you like, > both for the tests and the implementation. I will send a new example function to the list in a few days, after doing some tests, etc. Its behavior should be the same, except for these bugs. And currently, strto[l|d] is sometimes called unnecessarily -- for example, I think '123 foo' would result in zend_strtod() after strtol() -- pretty sure that can be avoided. You'll see soon... > > > Also, is this the desired behavior of array_count_values() (manual > > > doesn't say; it also uses is_numeric...)? > > > > > > print_r(array_count_values(array(1, ' 1', ' 1 '))) > > > Array > > > ( > > > [1] => 2 > > > [ 1 ] => 1 > > > ) > > This is typically an example of why we cannot not change the behaviors > in php5, but I definitively like to do it for php 6.x. I e-mailed Andrei about array_count_values() since I think it's incorrect. If so, it should be very simple to fix -- just eliminate the use of is_numeric_string. > Cheers, > -- Pierre Thanks again, Matt P.S. I forgot to add before that I noticed a comment Derick added a few days ago for is_numeric_string -- only in 5.2 (http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?r1=1.94.2.4.2.2& r2=1.94.2.4.2.3&view=patch). It says it returns IS_DOUBLE if the number didn't fit in the integer range, but that's wrong if it's INF. :-) -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php