Re: Moving string -> number conversions to string libs

2001-12-06 Thread Tom Hughes

In message [EMAIL PROTECTED]
James Mastros [EMAIL PROTECTED] wrote:

 Right.  Unfortunately, after starting on this, I realized that that's the easy
 part.  Unicode has a fairly well-defined way of figuring out if a character
 is a digit (see if its category is Nd (Number/digit)), and if so what its
 value is (the value of the decimal property).

Can it also tell you the base used for digit strings in that 
character set... Actually I don't know if there are any modern
writing systems that don't use base ten but certainly if you
were dealing with some ancient scripts that used sexagesimal
numbers that might be a problem ;-)

 However, there appears to be no good way of determining if something is a
 decimal point, a sign indicator, or an E/e (exponent signifier).

I suspected there wouldn't be.

 The attached patch will let the chartype layer decide if a character is a
 digit, and what its value is.

The patch seems to be missing though...

 Note also that is_digit should now return the value of the digit if it is a
 digit, or 42 if it isn't.  (I had to use something, and ~0 sometimes wanted
 to be (char)~0, and sometimes (INTVAL)~0, so I decided not to use ~0.  0, of
 course, can't be used for not-a-digit, since is_digit('0')==0.)

I was assuming there would be a separate digit_value() routine to avoid
that problem. Apart from anything else there will doubtless be many
other is_xxx() routines in due course which will be simple boolean
tests.
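
As a minimal sketch of the split suggested above, assuming an ASCII-only
character type: one routine is a plain boolean test and a second returns
the value, so no in-band "not a digit" value such as 42 or ~0 is needed.
The names chartype_is_digit and chartype_digit_value are hypothetical and
are not taken from the Parrot source.

```c
#include <stdbool.h>

/* Hypothetical character-type routines (illustrative names, ASCII only). */

/* Pure boolean test, in the style of the other is_xxx() routines. */
static bool
chartype_is_digit(unsigned int c) {
    return c >= '0' && c <= '9';
}

/* Value of a digit. Only meaningful when chartype_is_digit(c) is true,
 * so the return value never has to double as a "not a digit" marker. */
static int
chartype_digit_value(unsigned int c) {
    return (int)(c - '0');
}
```

A caller would test chartype_is_digit(c) first and only then ask for
chartype_digit_value(c), which keeps '0' (value 0) unambiguous.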

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu




Re: Moving string -> number conversions to string libs

2001-12-06 Thread Bart Lateur

On Thu, 06 Dec 2001 00:16:34 GMT, Tom Hughes wrote:

So far I have added an is_digit() call to the character type layer
to replace the existing isdigit() calls.

There seems to be an overlap with the /\d/ character class in regexes.
Can't you use the same test? Can't you use the definition of that
character class, whatever form it may be in?

-- 
Bart.



Re: Moving string -> number conversions to string libs

2001-12-06 Thread Jonathan Scott Duff

On Thu, Dec 06, 2001 at 02:17:31AM +, Alex Gough wrote:
 Also, for string -> integer conversion I think we ought to be scanning
 for a float then turning the result into an integer (as 1234.56e2 is
 one).  

Does scanning for a float include 1234,56e2 or any other locale specific
representation?

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



Re: Moving string -> number conversions to string libs

2001-12-05 Thread Tom Hughes

In message [EMAIL PROTECTED]
  James Mastros [EMAIL PROTECTED] wrote:

 On Mon, 3 Dec 2001, Tom Hughes wrote:
  It's completely wrong I would have thought - the encoding layer
  cannot know that a given code point is a digit so it can't possibly
  do string to number conversion.
 
  You need to use the encoding layer to fetch each character and
  then the character set layer to determine what digit it represents.
 Right.  And then you need to apply some unified logic to get from this
 vector of digits (and other such symbols) to a value.

Indeed, and that logic needs to be in the string layer where it can
use both the encoding routines and the character type routines. I have
just rearranged things to reflect that.

 I'm just having nightmares of subtly different definitions of what a
 numeric constant looks like depending on the string encoding, because of
 different bits o' code not being quite in sync.  Code duplication bad,
 code sharing good.

Absolutely. That code is now in one place.

 (The charset layer should still be involved somewhere, because Unicode
 (for ex) has a digit value property.  This makes, say, Arabic numerals
 (which don't look at all like what a normal person calls Arabic numerals,
 BTW) work properly.  (OTOH, it might also do strange things with ex
 Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also
 2, etc.))

So far I have added an is_digit() call to the character type layer
to replace the existing isdigit() calls. To do things completely right
we need to extend that with calls to get the digit value, check for
sign characters etc, rather than assuming ASCIIish like it does now.
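
A rough sketch of that layering, under stated assumptions: the encoding
layer only fetches and skips characters, the character type layer only
classifies them, and a single loop in the string layer combines the two.
The struct layouts and every name here (encoding_t, chartype_t, sb_decode,
get_digit and so on) are hypothetical simplifications, not the actual
Parrot vtables. The sign check is still ASCII-only and overflow is ignored.

```c
#include <stddef.h>

/* Hypothetical encoding layer: knows bytes, not meanings. */
typedef struct {
    unsigned long (*decode)(const void *ptr);           /* code point at ptr */
    const void *(*skip_forward)(const void *ptr, size_t n);
} encoding_t;

/* Hypothetical character type layer: knows meanings, not bytes. */
typedef struct {
    int (*is_digit)(unsigned long c);
    int (*get_digit)(unsigned long c);
} chartype_t;

static unsigned long
sb_decode(const void *ptr) {
    return *(const unsigned char *)ptr;
}

static const void *
sb_skip_forward(const void *ptr, size_t n) {
    return (const unsigned char *)ptr + n;
}

static int ascii_is_digit(unsigned long c)  { return c >= '0' && c <= '9'; }
static int ascii_get_digit(unsigned long c) { return (int)(c - '0'); }

static const encoding_t singlebyte = { sb_decode, sb_skip_forward };
static const chartype_t usascii    = { ascii_is_digit, ascii_get_digit };

/* String layer: the one shared scanning loop. nchars counts characters. */
static long
string_to_int(const encoding_t *enc, const chartype_t *type,
              const void *buf, size_t nchars) {
    long value = 0;
    int negative = 0;
    size_t i = 0;
    unsigned long c;

    if (buf == NULL || nchars == 0)
        return 0;

    c = enc->decode(buf);
    if (c == '-' || c == '+') {          /* assumption: ASCII sign only */
        negative = (c == '-');
        buf = enc->skip_forward(buf, 1);
        i = 1;
    }
    for (; i < nchars; i++) {
        c = enc->decode(buf);
        if (!type->is_digit(c))
            break;                       /* stop at the first non-digit */
        value = value * 10 + type->get_digit(c);
        buf = enc->skip_forward(buf, 1);
    }
    return negative ? -value : value;
}
```

With this shape, supporting another encoding or character set means
supplying another pair of tables; the scanning loop itself is not
duplicated.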

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




Re: Moving string -> number conversions to string libs

2001-12-05 Thread Alex Gough

On Thu, 6 Dec 2001, Tom Hughes wrote:

 In message [EMAIL PROTECTED]
   James Mastros [EMAIL PROTECTED] wrote:

  On Mon, 3 Dec 2001, Tom Hughes wrote:
   It's completely wrong I would have thought - the encoding layer
   cannot know that a given code point is a digit so it can't possibly
   do string to number conversion.
  
   You need to use the encoding layer to fetch each character and
   then the character set layer to determine what digit it represents.
  Right.  And then you need to apply some unified logic to get from this
  vector of digits (and other such symbols) to a value.

 Indeed, and that logic needs to be in the string layer where it can
 use both the encoding routines and the character type routines. I have
 just rearranged things to reflect that.


Yes, that does make more sense.  I think we need some string docs.

Also, for string -> integer conversion I think we ought to be scanning
for a float then turning the result into an integer (as 1234.56e2 is
one).  We'll also eventually need a string->BigNum and string->BigInt
conversion, and if things are to upgrade gracefully and silently this
might need to happen as the string is scanned.
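
To illustrate the difference, here is a small sketch using only the
standard C library on a plain char string in the default "C" locale. The
helper names int_direct and int_via_float are hypothetical. Scanning
directly for an integer stops at the '.', while scanning for a float first
sees the exponent as well.

```c
#include <stdlib.h>

/* Scan as an integer: parsing stops at the first non-digit ('.'). */
static long
int_direct(const char *s) {
    return strtol(s, NULL, 10);
}

/* Scan as a float first, then truncate the result to an integer. */
static long
int_via_float(const char *s) {
    return (long)strtod(s, NULL);
}
```

For "1234.56e2" the first helper stops after 1234, whereas the second
parses the whole literal before truncating. This does nothing about the
BigNum/BigInt upgrade: a double cannot represent every large integer
exactly, which is one reason the upgrade might have to happen during the
scan.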

Alex Gough




Moving string -> number conversions to string libs

2001-12-03 Thread Alex Gough

The string to number conversion stuff should really be done by the
string encodings... I think this is the right way to get this
happening, comments?

Alex Gough


Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.20
diff -u -r1.20 string.c
--- string.c  2001/11/28 15:22:51 1.20
+++ string.c  2001/12/03 17:36:57
@@ -332,6 +332,24 @@
 return cmp;
 }
 
+INTVAL string_to_int (struct Parrot_Interp *interpreter, STRING *s) {
+if (s == NULL) {
+return 0;
+}
+else {
+return s->encoding->extract_int(s->bufstart);
+}
+}
+
+FLOATVAL string_to_num (struct Parrot_Interp *interpreter, STRING *s) {
+if (s == NULL) {
+return 0.0;
+}
+else {
+return s->encoding->extract_num(s->bufstart);
+}
+}
+
 /*
  * Local variables:
  * c-indentation-style: bsd
Index: classes/perlstring.pmc
===
RCS file: /home/perlcvs/parrot/classes/perlstring.pmc,v
retrieving revision 1.4
diff -u -r1.4 perlstring.pmc
--- classes/perlstring.pmc  2001/11/30 06:20:00 1.4
+++ classes/perlstring.pmc  2001/12/03 17:36:58
@@ -45,12 +45,12 @@
 
 INTVAL get_integer () {
    STRING* s = (STRING*) SELF->cache.struct_val;
-   return strtol(s->bufstart,NULL,10);
+   return string_to_int(interpreter, s);
 }
 
 FLOATVAL get_number () {
    STRING* s = (STRING*) SELF->cache.struct_val;
-   return strtod(s->bufstart,NULL);
+   return string_to_num(interpreter, s);
 }
 
 STRING* get_string () {
Index: encodings/singlebyte.c
===
RCS file: /home/perlcvs/parrot/encodings/singlebyte.c,v
retrieving revision 1.1
diff -u -r1.1 singlebyte.c
--- encodings/singlebyte.c  2001/10/31 22:51:31 1.1
+++ encodings/singlebyte.c  2001/12/03 17:36:58
@@ -26,6 +26,18 @@
 return *bptr;
 }
 
+static INTVAL
+singlebyte_extract_int (const void *ptr) {
+char *s = (char*)ptr;
+return (INTVAL)strtol(s, NULL, 10); /* XXX: Fixme! */
+}
+
+static FLOATVAL
+singlebyte_extract_num (const void*ptr) {
+char *s = (char*)ptr;
+return strtod(s, NULL); /* XXX: Fixme! */
+}
+
 static void *
 singlebyte_encode (void *ptr, INTVAL c) {
 byte_t *bptr = ptr;
@@ -59,6 +71,8 @@
 1,
 singlebyte_characters,
 singlebyte_decode,
+singlebyte_extract_int,
+singlebyte_extract_num,
 singlebyte_encode,
 singlebyte_skip_forward,
 singlebyte_skip_backward
Index: encodings/utf16.c
===
RCS file: /home/perlcvs/parrot/encodings/utf16.c,v
retrieving revision 1.1
diff -u -r1.1 utf16.c
--- encodings/utf16.c   2001/10/31 22:51:31 1.1
+++ encodings/utf16.c   2001/12/03 17:36:58
@@ -56,6 +56,16 @@
 return c;
 }
 
+static INTVAL
+utf16_extract_int (const void *ptr) {
+return 0; /* XXX: Write me! */
+}
+
+static FLOATVAL
+utf16_extract_num (const void *ptr) {
+return 0.0; /* XXX: Write me! */
+}
+
 static void *
 utf16_encode (void *ptr, INTVAL c) {
 utf16_t *u16ptr = ptr;
@@ -127,6 +137,8 @@
 UTF16_MAXLEN,
 utf16_characters,
 utf16_decode,
+utf16_extract_int,
+utf16_extract_num,
 utf16_encode,
 utf16_skip_forward,
 utf16_skip_backward
Index: encodings/utf8.c
===
RCS file: /home/perlcvs/parrot/encodings/utf8.c,v
retrieving revision 1.1
diff -u -r1.1 utf8.c
--- encodings/utf8.c  2001/10/31 22:51:31 1.1
+++ encodings/utf8.c  2001/12/03 17:36:59
@@ -76,6 +76,16 @@
 return c;
 }
 
+static INTVAL
+utf8_extract_int (const void *ptr) {
+return 0;/* XXX: write me! */
+}
+
+static FLOATVAL
+utf8_extract_num (const void *ptr) {
+return 0.0; /* XXX: write me! */
+}
+
 static void *
 utf8_encode (void *ptr, INTVAL c) {
 utf8_t *u8ptr = ptr;
@@ -124,6 +134,8 @@
 UTF8_MAXLEN,
 utf8_characters,
 utf8_decode,
+utf8_extract_int,
+utf8_extract_num,
 utf8_encode,
 utf8_skip_forward,
 utf8_skip_backward
Index: include/parrot/encoding.h
===
RCS file: /home/perlcvs/parrot/include/parrot/encoding.h,v
retrieving revision 1.1
diff -u -r1.1 encoding.h
--- include/parrot/encoding.h   2001/10/31 22:51:32 1.1
+++ include/parrot/encoding.h   2001/12/03 17:36:59
@@ -18,6 +18,8 @@
 INTVAL max_bytes;
 INTVAL (*characters)(const void *ptr, INTVAL bytes);
 INTVAL (*decode)(const void *ptr);
+INTVAL (*extract_int)(const void *ptr);
+FLOATVAL (*extract_num)(const void *ptr);
 void *(*encode)(void *ptr, INTVAL c);
 void *(*skip_forward)(void *ptr, INTVAL n);
 void 

Re: Moving string -> number conversions to string libs

2001-12-03 Thread Simon Cozens

On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
 The string to number conversion stuff should really be done by the
 string encodings... I think this is the right way to get this
 happening, comments?

Looks like the right way to me. Could you commit it?

I suppose this is the time to declare that 0.0.4 will have all
the string encodings implemented to the same degree.

-- 
Everything that can ever be invented has been invented 
- Charles H. Duell, Commissioner of U.S. Patents, 1899.



Re: Moving string -> number conversions to string libs

2001-12-03 Thread Alex Gough

On Mon, 3 Dec 2001, Simon Cozens wrote:

 On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
  The string to number conversion stuff should really be done by the
  string encodings... I think this is the right way to get this
  happening, comments?

 Looks like the right way to me. Could you commit it?


I've just realised that these were slightly wrong (as far as the future
goes), I'll fix them up tomorrow.

Alex Gough




Re: Moving string -> number conversions to string libs

2001-12-03 Thread Tom Hughes

In message [EMAIL PROTECTED]
  Simon Cozens [EMAIL PROTECTED] wrote:

 On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote:
  The string to number conversion stuff should really be done by the
  string encodings... I think this is the right way to get this
  happening, comments?
 
 Looks like the right way to me. Could you commit it?

It's completely wrong I would have thought - the encoding layer
cannot know that a given code point is a digit so it can't possibly
do string to number conversion.

You need to use the encoding layer to fetch each character and
then the character set layer to determine what digit it represents.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




Re: Moving string -> number conversions to string libs

2001-12-03 Thread James Mastros

On Mon, 3 Dec 2001, Tom Hughes wrote:
 It's completely wrong I would have thought - the encoding layer
 cannot know that a given code point is a digit so it can't possibly
 do string to number conversion.

 You need to use the encoding layer to fetch each character and
 then the character set layer to determine what digit it represents.
Right.  And then you need to apply some unified logic to get from this
vector of digits (and other such symbols) to a value.

In other words, I think some sort of library routine is the Right Thing
here, it shouldn't pull (directly) off of any vtable.

I'm just having nightmares of subtly different definitions of what a
numeric constant looks like depending on the string encoding, because of
different bits o' code not being quite in sync.  Code duplication bad,
code sharing good.

(The charset layer should still be involved somewhere, because Unicode
(for ex) has a digit value property.  This makes, say, Arabic numerals
(which don't look at all like what a normal person calls Arabic numerals,
BTW) work properly.  (OTOH, it might also do strange things with ex
Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also
2, etc.))
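
A minimal sketch of what a charset-level digit lookup could look like,
assuming a hand-written table rather than the real Unicode database. It
lists only four of the Nd (decimal digit) runs; a real implementation would
generate the table from the Unicode character data. The names nd_zero and
unicode_digit_value are hypothetical.

```c
#include <stddef.h>

/* First code point of a few Unicode Nd runs. Each run is ten consecutive
 * code points with values 0..9. This list is deliberately incomplete. */
static const unsigned long nd_zero[] = {
    0x0030, /* DIGIT ZERO */
    0x0660, /* ARABIC-INDIC DIGIT ZERO */
    0x06F0, /* EXTENDED ARABIC-INDIC DIGIT ZERO */
    0x0966  /* DEVANAGARI DIGIT ZERO */
};

/* Returns 0..9 for a code point in one of the listed runs, -1 otherwise. */
static int
unicode_digit_value(unsigned long cp) {
    size_t i;
    for (i = 0; i < sizeof nd_zero / sizeof nd_zero[0]; i++) {
        if (cp >= nd_zero[i] && cp <= nd_zero[i] + 9)
            return (int)(cp - nd_zero[i]);
    }
    return -1;
}
```

With this table U+0663 (ARABIC-INDIC DIGIT THREE) maps to 3, while U+05D0
(HEBREW LETTER ALEF) is not in any listed run and is reported as a
non-digit, since Unicode classifies it as a letter rather than a decimal
digit.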

(Damn... when did I become such a Unicode fanatic.  The only language I
actually know doesn't even use most of Latin-1.)

-=- James Mastros
-- 
In the case of alchemy v chemistry the chemists know whether it will
probably go bang before they try it (and the chemical engineers still duck
anyway).   -=- Alan Cox