Re: Moving string - number conversions to string libs
In message [EMAIL PROTECTED] James Mastros [EMAIL PROTECTED] wrote:

> Right. Unfortunately, after starting on this, I realized that that's the easy part. Unicode has a fairly well-defined way of figuring out if a character is a digit (see if its category is Nd (Number, decimal digit)), and if so what its value is (the value of the decimal property).

Can it also tell you the base used for digit strings in that character set? Actually I don't know if there are any modern writing systems that don't use base ten, but certainly if you were dealing with some ancient scripts that used sexagesimal numbers that might be a problem ;-)

> However, there appears to be no good way of determining if something is a decimal point, a sign indicator, or an E/e (exponent signifier).

I suspected there wouldn't be.

> The attached patch will let the chartype layer decide if a character is a digit, and what its value is.

The patch seems to be missing though...

> Note also that is_digit should now return the value of the digit if it is a digit, or 42 if it isn't. (I had to use something, and ~0 sometimes wanted to be (char)~0, and sometimes (INTVAL)~0, so I decided not to use ~0. 0, of course, can't be used for not-a-digit, since is_digit('0')==0.)

I was assuming there would be a separate digit_value() routine to avoid that problem. Apart from anything else, there will doubtless be many other is_xxx() routines in due course which will be simple boolean tests.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
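A minimal sketch of the split Tom suggests, assuming an ASCII-only character type and using hypothetical names (`ascii_is_digit`, `ascii_digit_value`) rather than anything that exists in Parrot: the predicate only answers yes or no, and the value lookup is only called after the predicate has said yes, so no sentinel such as 42 or ~0 has to be reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t INTVAL;   /* stand-in for Parrot's INTVAL */

/* Boolean test only: is code point c a decimal digit? (ASCII-only sketch) */
static bool ascii_is_digit(INTVAL c) {
    return c >= '0' && c <= '9';
}

/* Digit value; the caller must already know that c is a digit,
 * so there is no "not a digit" return value to reserve. */
static INTVAL ascii_digit_value(INTVAL c) {
    assert(ascii_is_digit(c));
    return c - '0';
}
```

With this split, `ascii_is_digit('0')` is true and `ascii_digit_value('0')` is 0, so the is_digit('0')==0 clash disappears, and the predicate has the same shape as the other boolean is_xxx() routines.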
Re: Moving string - number conversions to string libs
On Thu, 06 Dec 2001 00:16:34 GMT, Tom Hughes wrote:

> So far I have added an is_digit() call to the character type layer to replace the existing isdigit() calls.

There seems to be an overlap with the /\d/ character class in regexes. Can't you use the same test? Can't you use the definition of that character class, whatever form it may be in?

--
	Bart.
Re: Moving string - number conversions to string libs
On Thu, Dec 06, 2001 at 02:17:31AM +, Alex Gough wrote:

> Also, for string -> integer conversion I think we ought to be scanning for a float then turning the result into an integer (as 1234.56e2 is one).

Does scanning for a float include 1234,56e2 or any other locale-specific representation?

-Scott
--
Jonathan Scott Duff
[EMAIL PROTECTED]
Re: Moving string - number conversions to string libs
In message [EMAIL PROTECTED] James Mastros [EMAIL PROTECTED] wrote:

> On Mon, 3 Dec 2001, Tom Hughes wrote:
>
>> It's completely wrong I would have thought - the encoding layer cannot know that a given code point is a digit so it can't possibly do string to number conversion. You need to use the encoding layer to fetch each character and then the character set layer to determine what digit it represents.
>
> Right. And then you need to apply some unified logic to get from this vector of digits (and other such symbols) to a value.

Indeed, and that logic needs to be in the string layer where it can use both the encoding routines and the character type routines. I have just rearranged things to reflect that.

> I'm just having nightmares of subtly different definitions of what a numeric constant looks like depending on the string encoding, because of different bits o' code not being quite in sync. Code duplication bad, code sharing good.

Absolutely. That code is now in one place.

> (The charset layer should still be involved somewhere, because Unicode (for example) has a digit value property. This makes, say, Arabic numerals (which don't look at all like what a normal person calls Arabic numerals, BTW) work properly. (OTOH, it might also do strange things with e.g. Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also 2, etc.)))

So far I have added an is_digit() call to the character type layer to replace the existing isdigit() calls. To do things completely right we need to extend that with calls to get the digit value, check for sign characters, etc., rather than assuming ASCII-ish like it does now.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
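To make the division of labour concrete, here is a hypothetical sketch (not Parrot's actual vtables). The encoding knows only how to turn bytes into code points, the character type knows only which code points are digits, and a single string-layer loop combines the two. The UTF-16LE decoder ignores surrogate pairs and assumes a well-formed buffer to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t INTVAL;   /* stand-in for Parrot's INTVAL */

/* Hypothetical encoding vtable: knows bytes, not digits. */
typedef struct {
    /* Decode one code point at p and store its length in bytes in *len. */
    INTVAL (*decode)(const unsigned char *p, size_t *len);
} encoding_t;

/* Hypothetical chartype vtable: knows digits, not bytes. */
typedef struct {
    bool (*is_digit)(INTVAL c);
    INTVAL (*digit_value)(INTVAL c);
} chartype_t;

/* Single-byte encoding: each byte is one code point. */
static INTVAL singlebyte_decode(const unsigned char *p, size_t *len) {
    *len = 1;
    return p[0];
}

/* UTF-16LE, BMP only (no surrogate handling) for brevity. */
static INTVAL utf16le_decode(const unsigned char *p, size_t *len) {
    *len = 2;
    return (INTVAL)(p[0] | (p[1] << 8));
}

static bool ascii_is_digit(INTVAL c) { return c >= '0' && c <= '9'; }
static INTVAL ascii_digit_value(INTVAL c) { return c - '0'; }

/* String layer: one shared loop, whatever the encoding.
 * Stops at the first non-digit or at the end of the buffer. */
static INTVAL string_to_int_sketch(const unsigned char *buf, size_t nbytes,
                                   const encoding_t *enc, const chartype_t *ct) {
    INTVAL value = 0;
    size_t pos = 0;

    assert(enc != NULL && ct != NULL);
    while (pos < nbytes) {
        size_t len;
        INTVAL c = enc->decode(buf + pos, &len);
        if (!ct->is_digit(c))
            break;
        value = value * 10 + ct->digit_value(c);
        pos += len;
    }
    return value;
}

static const encoding_t singlebyte = { singlebyte_decode };
static const encoding_t utf16le = { utf16le_decode };
static const chartype_t ascii = { ascii_is_digit, ascii_digit_value };
```

The same `string_to_int_sketch` loop parses "123" whether it arrives as three single bytes or as six UTF-16LE bytes. Only the `encoding_t` argument changes, which is the "one place" property Tom describes.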
Re: Moving string - number conversions to string libs
On Thu, 6 Dec 2001, Tom Hughes wrote:

> In message [EMAIL PROTECTED] James Mastros [EMAIL PROTECTED] wrote:
>
>> On Mon, 3 Dec 2001, Tom Hughes wrote:
>>
>>> It's completely wrong I would have thought - the encoding layer cannot know that a given code point is a digit so it can't possibly do string to number conversion. You need to use the encoding layer to fetch each character and then the character set layer to determine what digit it represents.
>>
>> Right. And then you need to apply some unified logic to get from this vector of digits (and other such symbols) to a value.
>
> Indeed, and that logic needs to be in the string layer where it can use both the encoding routines and the character type routines. I have just rearranged things to reflect that.

Yes, that does make more sense. I think we need some string docs.

Also, for string -> integer conversion I think we ought to be scanning for a float then turning the result into an integer (as 1234.56e2 is one). We'll also eventually need string -> BigNum and string -> BigInt conversions, and if things are to upgrade gracefully and silently this might need to happen as the string is scanned.

Alex Gough
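Upgrading "as the string is scanned" largely comes down to noticing overflow before it happens. A minimal sketch, assuming a 64-bit INTVAL and unsigned ASCII digits. The struct and function names are hypothetical, and what to do on overflow (restart with a BigInt, or hand over the partial value and position) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t INTVAL;   /* stand-in for Parrot's INTVAL */

/* Result of scanning: either the value fit in an INTVAL, or the caller
 * should switch to an arbitrary-precision type. */
typedef struct {
    bool fits;        /* false => the next digit would overflow INTVAL */
    INTVAL value;     /* digits folded in so far (complete only if fits) */
    size_t consumed;  /* number of digit characters folded into value */
} int_scan_t;

static int_scan_t scan_unsigned_digits(const char *s) {
    int_scan_t r = { true, 0, 0 };

    assert(s != NULL);
    while (s[r.consumed] >= '0' && s[r.consumed] <= '9') {
        INTVAL d = s[r.consumed] - '0';
        /* value * 10 + d <= INT64_MAX  <=>  value <= (INT64_MAX - d) / 10 */
        if (r.value > (INT64_MAX - d) / 10) {
            r.fits = false;
            return r;
        }
        r.value = r.value * 10 + d;
        r.consumed++;
    }
    return r;
}
```

Because the check happens before the multiplication, the scanner never relies on signed overflow (undefined behaviour in C), and `consumed` tells an upgrading caller where the native scan stopped.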
Moving string - number conversions to string libs
The string to number conversion stuff should really be done by the string encodings... I think this is the right way to get this happening, comments?

Alex Gough

Index: string.c
===================================================================
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.20
diff -u -r1.20 string.c
--- string.c	2001/11/28 15:22:51	1.20
+++ string.c	2001/12/03 17:36:57
@@ -332,6 +332,24 @@
     return cmp;
 }
 
+INTVAL string_to_int (struct Parrot_Interp *interpreter, STRING *s) {
+    if (s == NULL) {
+        return 0;
+    }
+    else {
+        return s->encoding->extract_int(s->bufstart);
+    }
+}
+
+FLOATVAL string_to_num (struct Parrot_Interp *interpreter, STRING *s) {
+    if (s == NULL) {
+        return 0.0;
+    }
+    else {
+        return s->encoding->extract_num(s->bufstart);
+    }
+}
+
 /*
  * Local variables:
  * c-indentation-style: bsd
Index: classes/perlstring.pmc
===================================================================
RCS file: /home/perlcvs/parrot/classes/perlstring.pmc,v
retrieving revision 1.4
diff -u -r1.4 perlstring.pmc
--- classes/perlstring.pmc	2001/11/30 06:20:00	1.4
+++ classes/perlstring.pmc	2001/12/03 17:36:58
@@ -45,12 +45,12 @@
 
     INTVAL get_integer () {
         STRING* s = (STRING*) SELF->cache.struct_val;
-        return strtol(s->bufstart,NULL,10);
+        return string_to_int(interpreter, s);
     }
 
     FLOATVAL get_number () {
         STRING* s = (STRING*) SELF->cache.struct_val;
-        return strtod(s->bufstart,NULL);
+        return string_to_num(interpreter, s);
     }
 
     STRING* get_string () {
Index: encodings/singlebyte.c
===================================================================
RCS file: /home/perlcvs/parrot/encodings/singlebyte.c,v
retrieving revision 1.1
diff -u -r1.1 singlebyte.c
--- encodings/singlebyte.c	2001/10/31 22:51:31	1.1
+++ encodings/singlebyte.c	2001/12/03 17:36:58
@@ -26,6 +26,18 @@
     return *bptr;
 }
 
+static INTVAL
+singlebyte_extract_int (const void *ptr) {
+    char *s = (char*)ptr;
+    return (INTVAL)strtol(s, NULL, 10); /* XXX: Fixme! */
+}
+
+static FLOATVAL
+singlebyte_extract_num (const void*ptr) {
+    char *s = (char*)ptr;
+    return strtod(s, NULL); /* XXX: Fixme! */
+}
+
 static void *
 singlebyte_encode (void *ptr, INTVAL c) {
     byte_t *bptr = ptr;
@@ -59,6 +71,8 @@
     1,
     singlebyte_characters,
     singlebyte_decode,
+    singlebyte_extract_int,
+    singlebyte_extract_num,
     singlebyte_encode,
     singlebyte_skip_forward,
     singlebyte_skip_backward
Index: encodings/utf16.c
===================================================================
RCS file: /home/perlcvs/parrot/encodings/utf16.c,v
retrieving revision 1.1
diff -u -r1.1 utf16.c
--- encodings/utf16.c	2001/10/31 22:51:31	1.1
+++ encodings/utf16.c	2001/12/03 17:36:58
@@ -56,6 +56,16 @@
     return c;
 }
 
+static INTVAL
+utf16_extract_int (const void *ptr) {
+    return 0; /* XXX: Write me! */
+}
+
+static FLOATVAL
+utf16_extract_num (const void *ptr) {
+    return 0.0; /* XXX: Write me! */
+}
+
 static void *
 utf16_encode (void *ptr, INTVAL c) {
     utf16_t *u16ptr = ptr;
@@ -127,6 +137,8 @@
     UTF16_MAXLEN,
     utf16_characters,
     utf16_decode,
+    utf16_extract_int,
+    utf16_extract_num,
     utf16_encode,
     utf16_skip_forward,
     utf16_skip_backward
Index: encodings/utf8.c
===================================================================
RCS file: /home/perlcvs/parrot/encodings/utf8.c,v
retrieving revision 1.1
diff -u -r1.1 utf8.c
--- encodings/utf8.c	2001/10/31 22:51:31	1.1
+++ encodings/utf8.c	2001/12/03 17:36:59
@@ -76,6 +76,16 @@
     return c;
 }
 
+static INTVAL
+utf8_extract_int (const void *ptr) {
+    return 0;    /* XXX: write me! */
+}
+
+static FLOATVAL
+utf8_extract_num (const void *ptr) {
+    return 0.0; /* XXX: write me! */
+}
+
 static void *
 utf8_encode (void *ptr, INTVAL c) {
     utf8_t *u8ptr = ptr;
@@ -124,6 +134,8 @@
     UTF8_MAXLEN,
     utf8_characters,
     utf8_decode,
+    utf8_extract_int,
+    utf8_extract_num,
     utf8_encode,
     utf8_skip_forward,
     utf8_skip_backward
Index: include/parrot/encoding.h
===================================================================
RCS file: /home/perlcvs/parrot/include/parrot/encoding.h,v
retrieving revision 1.1
diff -u -r1.1 encoding.h
--- include/parrot/encoding.h	2001/10/31 22:51:32	1.1
+++ include/parrot/encoding.h	2001/12/03 17:36:59
@@ -18,6 +18,8 @@
     INTVAL max_bytes;
     INTVAL (*characters)(const void *ptr, INTVAL bytes);
     INTVAL (*decode)(const void *ptr);
+    INTVAL (*extract_int)(const void *ptr);
+    FLOATVAL (*extract_num)(const void *ptr);
     void *(*encode)(void *ptr, INTVAL c);
     void *(*skip_forward)(void *ptr, INTVAL n);
     void
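The `XXX: Fixme!` markers are well placed. Handing `bufstart` straight to `strtol()` only works when every character is exactly one byte. A small demonstration, assuming the UTF-16 string is stored as plain bytes in memory: in UTF-16LE, "12" is the bytes 0x31 0x00 0x32 0x00, so `strtol()` stops at the first zero byte and returns 1. In UTF-16BE the buffer starts with a zero byte, so the result is 0.

```c
#include <assert.h>
#include <stdlib.h>

/* "12" encoded as UTF-16, followed by a 16-bit terminator. */
static const char utf16le_12[] = { '1', 0, '2', 0, 0, 0 };  /* little-endian */
static const char utf16be_12[] = { 0, '1', 0, '2', 0, 0 };  /* big-endian */

/* What the singlebyte stub does: treat bufstart as a C string of bytes. */
static long naive_bytewise_parse(const char *bufstart) {
    assert(bufstart != NULL);
    return strtol(bufstart, NULL, 10);
}
```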
Re: Moving string - number conversions to string libs
On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:

> The string to number conversion stuff should really be done by the string encodings... I think this is the right way to get this happening, comments?

Looks like the right way to me. Could you commit it?

I suppose this is the time to declare that 0.0.4 will have all the string encodings implemented to the same degree.

--
Everything that can ever be invented has been invented - Charles H. Duell, Commissioner of U.S. Patents, 1899.
Re: Moving string - number conversions to string libs
On Mon, 3 Dec 2001, Simon Cozens wrote:

> On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
>
>> The string to number conversion stuff should really be done by the string encodings... I think this is the right way to get this happening, comments?
>
> Looks like the right way to me. Could you commit it?

I've just realised that these were slightly wrong (as far as the future goes); I'll fix them up tomorrow.

Alex Gough
Re: Moving string - number conversions to string libs
In message [EMAIL PROTECTED] Simon Cozens [EMAIL PROTECTED] wrote:

> On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
>
>> The string to number conversion stuff should really be done by the string encodings... I think this is the right way to get this happening, comments?
>
> Looks like the right way to me. Could you commit it?

It's completely wrong I would have thought - the encoding layer cannot know that a given code point is a digit so it can't possibly do string to number conversion. You need to use the encoding layer to fetch each character and then the character set layer to determine what digit it represents.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
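Tom's point holds even without multi-byte encodings. In a single-byte encoding, the byte 0xF0 is the digit zero if the character set is EBCDIC, but 'ð' (LATIN SMALL LETTER ETH) if it is ISO-8859-1. Only the character set can say whether the byte is a digit. The two charset functions below are illustrative and hypothetical, not existing Parrot code.

```c
#include <assert.h>
#include <stdbool.h>

/* Two hypothetical charsets over the same single-byte encoding. */
static bool latin1_is_digit(unsigned char b) { return b >= 0x30 && b <= 0x39; }
static bool ebcdic_is_digit(unsigned char b) { return b >= 0xF0 && b <= 0xF9; }

static int latin1_digit_value(unsigned char b) {
    assert(latin1_is_digit(b));
    return b - 0x30;
}

static int ebcdic_digit_value(unsigned char b) {
    assert(ebcdic_is_digit(b));
    return b - 0xF0;
}
```

The singlebyte encoding would return the same code unit, 0xF0, in both cases. Deciding that it means "0" in one charset and a letter in the other is the charset layer's job.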
Re: Moving string - number conversions to string libs
On Mon, 3 Dec 2001, Tom Hughes wrote:

> It's completely wrong I would have thought - the encoding layer cannot know that a given code point is a digit so it can't possibly do string to number conversion. You need to use the encoding layer to fetch each character and then the character set layer to determine what digit it represents.

Right. And then you need to apply some unified logic to get from this vector of digits (and other such symbols) to a value. In other words, I think some sort of library routine is the Right Thing here; it shouldn't pull (directly) off of any vtable.

I'm just having nightmares of subtly different definitions of what a numeric constant looks like depending on the string encoding, because of different bits o' code not being quite in sync. Code duplication bad, code sharing good.

(The charset layer should still be involved somewhere, because Unicode (for example) has a digit value property. This makes, say, Arabic numerals (which don't look at all like what a normal person calls Arabic numerals, BTW) work properly. (OTOH, it might also do strange things with e.g. Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also 2, etc.)))

(Damn... when did I become such a Unicode fanatic? The only language I actually know doesn't even use most of Latin-1.)

 -=- James Mastros
--
In the case of alchemy v chemistry the chemists know whether it will probably go bang before they try it (and the chemical engineers still duck anyway). -=- Alan Cox
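A hedged sketch of such a shared library routine. It operates on already-decoded code points, so it does not care which encoding produced them. The digit table is a small hand-picked excerpt of Unicode's Nd ranges (ASCII, Arabic-Indic U+0660..U+0669, Extended Arabic-Indic U+06F0..U+06F9, Devanagari U+0966..U+096F); a real implementation would consult the full Unicode Character Database. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t INTVAL;   /* stand-in for Parrot's INTVAL */

/* Code point of digit zero for each contiguous Nd block in this excerpt. */
static const INTVAL digit_zeros[] = {
    0x0030, /* DIGIT ZERO (ASCII) */
    0x0660, /* ARABIC-INDIC DIGIT ZERO */
    0x06F0, /* EXTENDED ARABIC-INDIC DIGIT ZERO */
    0x0966, /* DEVANAGARI DIGIT ZERO */
};

/* Returns the decimal value 0-9, or -1 if c is not in the table. */
static int unicode_digit_value(INTVAL c) {
    for (size_t i = 0; i < sizeof digit_zeros / sizeof digit_zeros[0]; i++) {
        if (c >= digit_zeros[i] && c <= digit_zeros[i] + 9)
            return (int)(c - digit_zeros[i]);
    }
    return -1;
}

/* Shared routine: fold decoded code points into a value, independent of
 * the encoding that produced them. Stops at the first non-digit. */
static INTVAL codepoints_to_int(const INTVAL *cps, size_t n) {
    INTVAL value = 0;

    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int d = unicode_digit_value(cps[i]);
        if (d < 0)
            break;
        value = value * 10 + d;
    }
    return value;
}
```

Hebrew letters such as aleph (U+05D0) have general category Lo, not Nd, so a lookup keyed on decimal digits like this one leaves them alone. As written, the sketch would also accept a number that mixes scripts (for example ASCII and Arabic-Indic digits), which a real implementation might prefer to reject.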