Philippe Lhoste wrote:

Robert Roessler wrote:

Philippe Lhoste wrote:

While we are on the topic, some time ago I made a request about the PHP lexer which went unnoticed...
Is it possible to support hexadecimal numbers in PHP? In the form 0x1BADBEEF, of course.


How about this, Philippe? :)

// recognize bases 8,10 or 16 integers OR floating-point numbers
if (!IsADigit(ch)
   && strchr(".xXabcdefABCDEF", ch) == NULL
   && ((ch != '-' && ch != '+') || (chPrev != 'e' && chPrev != 'E'))) {


Thank you Robert. I am quite busy these days, and I am rather reluctant to dive into LexHTML...
I can at least apply this patch.
I expect it to become official.

Cool... please make sure that the indents get changed back to Tab chars - I made them spaces just so they would "fit" in the message.


These FOUR lines replace line 1515 in [naturally] LexHTML.cxx. I am not sure who the "owner" of this module is, but this code works well for me. BTW, it also fixes a small bug in the original: POSITIVE exponents were not recognized as part of the float.

N.B. - the 'strchr' also handles the 'e' and 'E' exponent cases.
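The patch above is only a fragment of a condition inside LexHTML.cxx. Below is a minimal standalone sketch of the same rule, restated as "may this character continue the number?" so its behavior can be checked in isolation. `IsADigit` here is a stand-in assumed to be a plain 0-9 test. `ContinuesNumber` and `NaiveNumberLength` are hypothetical names that do not come from Scintilla.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for Scintilla's IsADigit(); assumed here to be a plain 0-9 test.
static bool IsADigit(int ch) {
    return ch >= '0' && ch <= '9';
}

// True when ch may continue a numeric literal, given the character before it.
// This is the logical negation of the patched "end of number" condition,
// plus a '\0' guard added by this sketch (strchr would match the terminator).
static bool ContinuesNumber(int ch, int chPrev) {
    if (IsADigit(ch))
        return true;
    if (ch != '\0' && std::strchr(".xXabcdefABCDEF", ch) != nullptr)
        return true;  // '.', the x/X prefix, hex digits (which include e/E)
    // '+' or '-' only belongs to the number right after an exponent marker.
    return (ch == '+' || ch == '-') && (chPrev == 'e' || chPrev == 'E');
}

// Greedy, character-by-character scan: length of the number at text[start].
// The caller is assumed to have already seen a digit at text[start].
static std::size_t NaiveNumberLength(const std::string& text, std::size_t start) {
    std::size_t i = start + 1;
    while (i < text.size() && ContinuesNumber(text[i], text[i - 1]))
        ++i;
    return i - start;
}
```

With this rule, "1.5e+3" is consumed whole, which is the positive-exponent fix. However, "0xEE+4" and "3.14.159" are also swallowed as single numbers. The predicate only looks one character back and never knows which kind of literal it is inside.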

OTOH, the PHP numeric constant recognition and handling still has a number of "weak" spots because it does not track from the beginning what kind of literal it is: octal and hexadecimal numbers can NOT have decimal points and exponents, but decimal literals CAN "morph" into floating-point ones.
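Tracking the literal's kind from its first characters could look roughly like the following. This is a minimal sketch that assumes a standalone scanner over a `std::string` rather than Scintilla's lexer loop. `ScanNumber`, `NumKind` and `NumToken` are hypothetical names. Once a "0x" prefix is seen, only hex digits are accepted, so neither '.' nor an exponent sign can join the literal. A plain digit run may still "morph" into a float. A leading-zero run is only classified as octal when no '.' or exponent follows (PHP reads 010.5 as the float 10.5). Binary literals, digit separators, and invalid 8/9 digits in octal are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class NumKind { Decimal, Octal, Hex, Float };

struct NumToken {
    std::size_t length;  // characters consumed
    NumKind kind;
};

static bool IsDec(char c) { return c >= '0' && c <= '9'; }

// Scans a literal starting at text[start], which is assumed to be a digit.
static NumToken ScanNumber(const std::string& text, std::size_t start) {
    const std::size_t n = text.size();
    std::size_t i = start;
    // Hex: the prefix fixes the kind up front; anything that is not a hex
    // digit ('.', '+', '-', 'g', ...) ends the literal.
    if (text[i] == '0' && i + 1 < n && (text[i + 1] == 'x' || text[i + 1] == 'X')) {
        i += 2;
        while (i < n && std::isxdigit(static_cast<unsigned char>(text[i])))
            ++i;
        return {i - start, NumKind::Hex};
    }
    while (i < n && IsDec(text[i])) ++i;
    bool isFloat = false;
    // The first '.' turns the literal into a float; a second '.' is not consumed.
    if (i < n && text[i] == '.') {
        isFloat = true;
        ++i;
        while (i < n && IsDec(text[i])) ++i;
    }
    // Exponent: 'e'/'E', an optional sign, then at least one digit.
    if (i < n && (text[i] == 'e' || text[i] == 'E')) {
        std::size_t j = i + 1;
        if (j < n && (text[j] == '+' || text[j] == '-')) ++j;
        if (j < n && IsDec(text[j])) {
            isFloat = true;
            i = j;
            while (i < n && IsDec(text[i])) ++i;
        }
    }
    if (isFloat) return {i - start, NumKind::Float};
    if (text[start] == '0' && i - start > 1) return {i - start, NumKind::Octal};
    return {i - start, NumKind::Decimal};
}
```

Here "0xEE+4" yields a 4-character hex token, leaving "+" and "4" for the operator and decimal styles. "3.14.159" stops after "3.14" at the second decimal point.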


Well, that's the case for most lexers. I improved the CPP lexer (and similar ones among those I maintain/use, like POV, Lua, etc.) with much better number parsing, using an automaton.
For example, it was able to end the number style when it found a second decimal point.


Its weak spot was something like 3.14.159: it sees two consecutive valid decimal numbers and displays them as regular numbers... So I added some rules, like: you can't have two consecutive numbers, you can't have a number immediately after a string, etc.
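The "no two consecutive numbers" rule can be sketched as follows. This minimal example assumes one style per character and a simplified literal (digits with at most one '.'). `Style`, `NumberLength` and `StyleNumbers` are hypothetical names and not Scintilla APIs. It ignores identifiers (so the "1" in "x1" would be styled as a number) and does not implement the "number right after a string" rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Style { Default, Number, Error };

static bool IsDec(char c) { return c >= '0' && c <= '9'; }

// Length of a simple literal at text[i]: digits, then at most one '.' and digits.
static std::size_t NumberLength(const std::string& text, std::size_t i) {
    std::size_t j = i;
    while (j < text.size() && IsDec(text[j])) ++j;
    if (j < text.size() && text[j] == '.') {
        ++j;
        while (j < text.size() && IsDec(text[j])) ++j;
    }
    return j - i;
}

// A number that starts exactly where the previous number ended
// (such as ".159" in "3.14.159") is styled as Error instead of Number.
static std::vector<Style> StyleNumbers(const std::string& text) {
    std::vector<Style> styles(text.size(), Style::Default);
    std::size_t i = 0;
    bool prevWasNumber = false;  // did a number end right before position i?
    while (i < text.size()) {
        const bool startsNumber =
            IsDec(text[i]) ||
            (text[i] == '.' && i + 1 < text.size() && IsDec(text[i + 1]));
        if (startsNumber) {
            const std::size_t len = NumberLength(text, i);
            const Style s = prevWasNumber ? Style::Error : Style::Number;
            for (std::size_t k = i; k < i + len; ++k) styles[k] = s;
            i += len;
            prevWasNumber = true;
        } else {
            ++i;
            prevWasNumber = false;
        }
    }
    return styles;
}
```

For "3.14.159", the first four characters get `Style::Number` and ".159" gets `Style::Error`. In "3.14 + 1", both numbers stay `Style::Number` because a non-number character separates them.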

Yup, Scintilla lexers are not full parsers for the languages they color... OTOH, I am happier when they at least work properly for LEGAL input! :)


Creating this quick patch for you made me see a "funny" case that is not handled correctly: "0xEE+4" *should* be seen as a hex int, a "+" operator, and a decimal int... but it isn't by any lexer that doesn't track exactly what it is tokenizing.

In fact, I examined LexCaml, and realized that even though it *does* know what kind of numeric literal it is seeing, it was NOT disallowing the above case: an exponent on a hex int... it has been fixed. :)

But this IS an improvement, and it will make Philippe happy! :)


Yes, thank you again. :-D

You are welcome.

Robert Roessler
[EMAIL PROTECTED]
http://www.rftp.com
_______________________________________________
Scintilla-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
