Fwd: Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-07 Thread Bryan C. Warnock

On Friday 06 July 2001 10:13 am, Dan Sugalski wrote:
> I should point out that the internal representation of large numbers isn't
> going to be huge strings of ASCII characters--we'll probably be an array
> of 15-bit integers. (As Hong pointed out a while ago, doing that makes
> handling multiplication reasonably simple. Might go to arrays of 31-bit
> integers on 64-bit platforms) Though I might be misreading you here. (I
> probably am)

Actually, you *shouldn't* have to point that out.  No, you weren't misreading
me, and Yes, Virginia, I am a fucking idiot.  I can't even think of what I
may have been thinking of.  We have talked about this before.  I write the
damn summaries, for crying out loud.  Arggh!

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-06 Thread David L. Nicol

Dan Sugalski wrote:

> The C structure that represents a bigint is:
> 
>struct bigint {
>  void *buffer;
>  UV length;
>  IV exponent;
>  UV flags;
>}
> 
> =begin question
> 
> Should we scrap the buffer pointer and just tack the buffer on the end
> of the structure? Saves a level of indirection, but means if we need
> to make the buffer bigger we have to adjust anything pointing to it.
> 
> =end question

Absolutely not.  Keep as much static-sized as possible, so you can
trivially recycle it.


Nobody much liked the suggestion of
tracking precision at the lowest levels, but here I am repeating
it anyway.


> Perl has a single internal string form:
 
> =item unused
> 
> Filler. Here to make sure we're both exactly double the size of a
> bigint/bigfloat header and to make sure we don't cross cache lines on
> any modern processor.

Is this explicitly guaranteed to remain unused, so that it may be
safely used for arbitrary user-magic (as long as they don't step on
each other's toes) and semantic analysis flags, and so forth?

Or would that kind of thing be better included into whatever is
containing these guys -- along with reference counts and other
details of additional systems which are not referred to w/in this
document.


 
> =item Class
> 
> Class refers to a higher-level piece of perl data. Each class has its
> own vtable, which is a class' distinguishing mark. Classes live one
> step below the perl source level, and should not be confused with perl
> packages.

Does this imply that perl packages will continue to be called perl
packages, even when they start getting introduced with a "class" keyword?




Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-06 Thread Uri Guttman

> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> We won't be using a char-based string math library--it'll all be
  DS> some internal binary format or other. (I can make a good argument
  DS> for it being done with a base 10 exponent rather than a base 2
  DS> one. I can see doing it all in decimal rather than binary, but I
  DS> can't think of a processor newer than the 6502 that does BCD
  DS> math. (Well, OK, I think the System/3x0 processors do--I suppose
  DS> that counts))

there is a very neat and fast trick, called excess-3, for doing bcd sum
and difference math with a set of binary words. this means doing 8 bcd
digits at a time with common 32 bit words, or 16 digits on alphas/sparcs
and itaniums. keeping the number (at least the mantissa) in bcd would
also mean simpler and faster conversions to strings. in any case, when
the bigint/float design starts up, i will jump in with my excess-3
suggestions and help.
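Uri doesn't spell the trick out, so what follows is only a sketch of one way excess-3 addition can work on packed binary words, under assumptions of our own: eight digits per 32-bit word, each stored in a nibble as digit + 3, and hypothetical helper names (`xs3_pack`, `xs3_unpack`, `xs3_add`). A single binary add handles all eight digits; a per-nibble correction afterwards restores the excess.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: pack 0..99999999 as 8 excess-3 digits, one per nibble. */
static uint32_t xs3_pack(uint32_t n) {
    assert(n <= 99999999u);
    uint32_t w = 0;
    for (int i = 0; i < 8; i++) {
        w |= (n % 10 + 3) << (4 * i);
        n /= 10;
    }
    return w;
}

/* Hypothetical: unpack 8 excess-3 digits back into a binary integer. */
static uint32_t xs3_unpack(uint32_t w) {
    uint32_t n = 0;
    for (int i = 7; i >= 0; i--)
        n = n * 10 + (((w >> (4 * i)) & 0xFu) - 3);
    return n;
}

/* Add two excess-3 words plus a carry-in (0 or 1). *carry_out gets the
   decimal carry out of the top digit, for chaining across words. */
static uint32_t xs3_add(uint32_t x, uint32_t y, unsigned carry_in,
                        unsigned *carry_out) {
    assert(carry_in <= 1);
    /* One ordinary binary add does all 8 digits at once. Each nibble
       sees (a+3)+(b+3)+c = a+b+c+6, so it carries out exactly when the
       decimal sum a+b+c is 10 or more. */
    uint64_t s = (uint64_t)x + y + carry_in;
    /* Bit k of s^x^y is the carry into bit k, so bit 4i+4 says whether
       digit i carried out; shift those bits down to bit 4i. */
    uint32_t carried = (uint32_t)(((s ^ x ^ y) >> 4) & 0x11111111u);
    *carry_out = (unsigned)(s >> 32) & 1u;
    /* Carried digits hold a+b+c-10 and need +3; the others hold
       a+b+c+6 and need -3. Adding 6 to the carried ones and then
       subtracting 3 everywhere does both, with no borrow or carry
       crossing a nibble boundary. */
    return (uint32_t)s + carried * 6u - 0x33333333u;
}
```

Chaining words means feeding the low word's `carry_out` into the next word's `carry_in`. The "difference" half of the trick is not shown here.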

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Learn Advanced Object Oriented Perl from Damian Conway - Boston, July 10-11
Class and Registration info: http://www.sysarch.com/perl/OOP_class.html



Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-06 Thread Dan Sugalski

At 02:49 AM 7/6/2001 -0400, Uri Guttman wrote:
>question:
>
>can you declare at the language level a scalar to be a bigint or bignum?

I think Larry's planning on that, yep. For arrays and hashes at least, I 
expect, and I don't see why not for scalars too.

>that means that native format is never used. the reason might be
>something like a fixed point decimal value for money with 2 decimal
>places. the bigint/float thingies imply decimal math and that also means
>a decimal math library. this came up in #perl in a discussion about
>bcd. i think a true decimal math package for this would be useful and
>faster than a char string based one.

We won't be using a char-based string math library--it'll all be some 
internal binary format or other. (I can make a good argument for it being 
done with a base 10 exponent rather than a base 2 one. I can see doing it 
all in decimal rather than binary, but I can't think of a processor newer 
than the 6502 that does BCD math. (Well, OK, I think the System/3x0 
processors do--I suppose that counts))

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-06 Thread Dan Sugalski

At 07:10 PM 7/5/2001 -0400, Bryan C. Warnock wrote:
>On Thursday 05 July 2001 02:11 pm, Dan Sugalski wrote:
> > =begin question
> >
> > Should we scrap the buffer pointer and just tack the buffer on the end
> > of the structure? Saves a level of indirection, but means if we need
> > to make the buffer bigger we have to adjust anything pointing to it.
> >
> > =end question

D'oh! I thought I'd chopped those question sections out!

>This is probably silly to consider, and it may have been brought up before,
>and, of course, I'm bringing it up *after* you've already closed it out, but
>leaving it as a buffer pointer could make string->number->string conversions
>almost as simple as a pointer copy.

Yep, it could. Not likely, but possible. What is likely is that we'll see a 
bigint->bigfloat conversion that consists of swapping vtable pointers in 
the main PMC and (maybe) setting the flags in the number struct.

>Taken to its extreme, you could run all conversions through bignum (or
>bigfloat), if you decided not to rely on platform support for it.  This
>would give you a single piece of code to handle number detection and
>manipulation, which means potential modularization (perhaps a third-party
>lib), consistent results across platforms, and, well, just one piece of code
>to tweak and maintain.  (I think that that could also include bigocts,
>bighexs, and bigbins, too.)  Taken to its extreme extreme, there wouldn't
>even really need to be a big* type - just big* code, a flag, and builtin
>logic to treat a regular string as a big*.

I should point out that the internal representation of large numbers isn't 
going to be huge strings of ASCII characters--we'll probably be an array of 
15-bit integers. (As Hong pointed out a while ago, doing that makes 
handling multiplication reasonably simple. Might go to arrays of 31-bit 
integers on 64-bit platforms) Though I might be misreading you here. (I 
probably am)
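A rough sketch of why 15-bit pieces keep multiplication simple, assuming little-endian arrays of 15-bit limbs held in `uint16_t` (the layout and the function names are illustrative, not a settled design). The product of two limbs fits in 30 bits, so a plain 32-bit accumulator can absorb the product, the existing partial result, and the carry without overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 15
#define LIMB_MASK 0x7FFFu

/* Schoolbook multiply of little-endian 15-bit limb arrays.
   out must have room for na + nb limbs. */
static void limb_mul(const uint16_t *a, size_t na,
                     const uint16_t *b, size_t nb, uint16_t *out) {
    for (size_t k = 0; k < na + nb; k++)
        out[k] = 0;
    for (size_t i = 0; i < na; i++) {
        uint32_t carry = 0;
        assert(a[i] <= LIMB_MASK);
        for (size_t j = 0; j < nb; j++) {
            assert(b[j] <= LIMB_MASK);
            /* (2^15-1)^2 + 2*(2^15-1) = 2^30 - 1: the product plus the
               existing limb plus the carry always fits in 32 bits, and
               the new carry always stays below 2^15. */
            uint32_t t = (uint32_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint16_t)(t & LIMB_MASK);
            carry = t >> LIMB_BITS;
        }
        out[i + nb] = (uint16_t)carry;
    }
}

/* Helpers for trying it out on values that fit in 64 bits. */
static size_t limbs_from_u64(uint64_t v, uint16_t *out) {
    size_t n = 0;
    do {
        out[n++] = (uint16_t)(v & LIMB_MASK);
        v >>= LIMB_BITS;
    } while (v != 0);
    return n;
}

static uint64_t limbs_to_u64(const uint16_t *l, size_t n) {
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << LIMB_BITS) | l[n];
    return v;
}
```

With 31-bit limbs on 64-bit platforms the same loop would use `uint32_t` limbs and a `uint64_t` accumulator, since (2^31-1)^2 + 2*(2^31-1) is still below 2^64.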

I think making bigint and bigfloat separate things is a reasonable 
performance win, but I might be wrong here.

>Taken to its not-so-extreme-case, if you make an assumption (which may be
>bad Bad BAD!) that most numeric work would have an implicit exponent of
>10^0, then it *is* a simple pointer copy - at least until the big* decides
>to normalize it to something else.  That would cause the big* and the string
>value to get out of sync, but you're going to need to address the conversion
>back anyway.

Assumptions are A Bad Thing. Explicit guarantees, however... If it turns 
out that an implicit exponent of 10^0 for bigints is warranted (and I don't 
see why it'd be a bad thing, though performance would probably be a little 
dodgy doing 10^45 + 10^46) then I'm all for doing that.
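To make the 10^45 + 10^46 case concrete: with an explicit base 10 exponent the operands are 1e45 and 1e46, and adding them only needs the larger-exponent mantissa scaled by 10 once, whereas with an implicit exponent of 10^0 both would be carried around as full 46- and 47-digit numbers. The toy below uses a single `uint64_t` mantissa in place of a digit buffer (`struct dec` and `dec_add` are hypothetical), so it only illustrates the alignment step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy: value = mant * 10^exp, with a 64-bit mantissa
   standing in for a bigint's digit buffer. */
struct dec {
    uint64_t mant;
    int64_t exp;
};

static struct dec dec_add(struct dec a, struct dec b) {
    /* Make a the operand with the smaller exponent. */
    if (a.exp > b.exp) {
        struct dec t = a;
        a = b;
        b = t;
    }
    /* Align exponents by scaling b's mantissa by 10 per step of
       difference. A real bigint would grow its buffer here; the toy
       just asserts that the mantissa does not overflow. */
    for (int64_t d = b.exp - a.exp; d > 0; d--) {
        assert(b.mant <= UINT64_MAX / 10);
        b.mant *= 10;
    }
    assert(a.mant <= UINT64_MAX - b.mant);
    return (struct dec){ a.mant + b.mant, a.exp };
}
```

The cost of alignment grows with the exponent gap: 1e0 + 1e30 already overflows the toy's 64-bit mantissa, which a real bigint would have to handle by growing its digits.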

Dan





Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-05 Thread Uri Guttman

> "DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

  DS> 'Kay, here's the final version of this.
  DS>struct bigint {
  DS>  void *buffer;
  DS>  UV length;
  DS>  IV exponent;
  DS>  UV flags;
  DS>}

  DS> =begin question

  DS> Should we scrap the buffer pointer and just tack the buffer on the end
  DS> of the structure? Saves a level of indirection, but means if we need
  DS> to make the buffer bigger we have to adjust anything pointing to it.

i think the indirection is good for that reason. tracking all refs to
this structure is a lot of work for any resize operations.
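The resize argument can be sketched as follows, assuming pointer-sized `UV`/`IV` and a buffer of `uint16_t` limbs (`bigint_grow` is a hypothetical helper). Growing the digits reallocates only the separate buffer, so the header's address, and any pointer held to it, never changes. With the digits tacked onto the end of the struct instead, the same `realloc` could move the header itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t UV;  /* assumption: pointer-sized */
typedef intptr_t IV;   /* assumption: pointer-sized */

struct bigint {
    void *buffer;
    UV length;
    IV exponent;
    UV flags;
};

/* Hypothetical: grow the digit buffer to new_len limbs (uint16_t each),
   zero-filling the new limbs. Only bi->buffer may move; the header
   stays where it is. Returns 0 on success, -1 if realloc fails. */
static int bigint_grow(struct bigint *bi, UV new_len) {
    assert(new_len >= bi->length);
    uint16_t *p = realloc(bi->buffer, new_len * sizeof *p);
    if (p == NULL)
        return -1;
    memset(p + bi->length, 0, (new_len - bi->length) * sizeof *p);
    bi->buffer = p;
    bi->length = new_len;
    return 0;
}
```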

  DS> =end question

  DS> The C<num_buffer> pointer points to the buffer holding the actual

s/num_//

  DS> and yes, this looks identical to the bigint structure. This isn't
  DS> accidental. Upgrading a bigint to a bignum should be quick.


question:

can you declare at the language level a scalar to be a bigint or bignum?
that means that native format is never used. the reason might be
something like a fixed point decimal value for money with 2 decimal
places. the bigint/float thingies imply decimal math and that also means
a decimal math library. this came up in #perl in a discussion about
bcd. i think a true decimal math package for this would be useful and
faster than a char string based one.

uri




Re: PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-05 Thread Bryan C. Warnock

On Thursday 05 July 2001 02:11 pm, Dan Sugalski wrote:
> =begin question
>
> Should we scrap the buffer pointer and just tack the buffer on the end
> of the structure? Saves a level of indirection, but means if we need
> to make the buffer bigger we have to adjust anything pointing to it.
>
> =end question

This is probably silly to consider, and it may have been brought up before,
and, of course, I'm bringing it up *after* you've already closed it out, but 
leaving it as a buffer pointer could make string->number->string conversions 
almost as simple as a pointer copy.  

Taken to its extreme, you could run all conversions through bignum (or 
bigfloat), if you decided not to rely on platform support for it.  This 
would give you a single piece of code to handle number detection and 
manipulation, which means potential modularization (perhaps a third-party 
lib), consistent results across platforms, and, well, just one piece of code 
to tweak and maintain.  (I think that that could also include bigocts, 
bighexs, and bigbins, too.)  Taken to its extreme extreme, there wouldn't 
even really need to be a big* type - just big* code, a flag, and builtin 
logic to treat a regular string as a big*.

With the buffer length specified, you may not even need to normalize an 
exponent string to a non-exponential string - you could simply set the 
buffer length short.  (Of course, this could cause some resize problems, if 
the bignum tries to grow a string that is really longer than it thinks it 
is, if you follow.)  

Taken to its not-so-extreme-case, if you make an assumption (which may be 
bad Bad BAD!) that most numeric work would have an implicit exponent of 
10^0, then it *is* a simple pointer copy - at least until the big* decides 
to normalize it to something else.  That would cause the big* and the string 
value to get out of sync, but you're going to need to address the conversion 
back anyway.




PDD 4, v1.3 Perl's internal data types (Final version)

2001-07-05 Thread Dan Sugalski

'Kay, here's the final version of this.

Cut here

=head1 TITLE

Perl's internal data types

=head1 VERSION

1.3

=head2 CURRENT

 Maintainer: Dan Sugalski <[EMAIL PROTECTED]>
 Class: Internals
 PDD Number: 4
 Version: 1.3
 Status: Developing
 Last Modified: 02 July 2001
 PDD Format: 1
 Language: English

=head2 HISTORY

=over 4

=item Version 1.3, 2 July 2001

=item Version 1.2, 2 July 2001

=item Version 1.1, 2 March 2001

=item Version 1, 1 March 2001

=back

=head1 CHANGES

=over 4

=item Version 1.3

Fixed some silly typos and dropped phrases.

Took all the underscores out of the field names.

=item Version 1.2

The string header format has changed some to allow for type
tagging. The flags information for strings has changed as well.

=item Version 1.1

INT and NUM are now concepts rather than data structures, as making
them data structures was a Bad Idea.

=item Version 1

None. First version.

=back

=head1 ABSTRACT

This PDD describes perl's known internal data types.

=head1 DESCRIPTION

This PDD details the primitive datatypes that the perl core knows how
to deal with. These types are lower-level than what's presented to the
perl programmer.

=head1 IMPLEMENTATION

=head2 Integer data types

Integer data types are generically referred to as C<INT>s. C<INT>s are
conceptual things, and there is no data structure that corresponds to them.

=over 4

=item Platform-native integer

These are whatever size native integer was chosen at perl
configuration time. The C-level typedefs C<IV> and C<UV> get you a
platform-native signed and unsigned integer, respectively.

=item Arbitrary precision integers

Big integers, or bigints, are arbitrary-length integer numbers. The
only limit to the number of digits in a bigint is the lesser of the
amount of memory available and the maximum value that can be
represented by a C<UV>. This will generally allow at least 4 billion
digits, which ought to be far more than enough for anyone.

The C structure that represents a bigint is:

    struct bigint {
        void *buffer;
        UV length;
        IV exponent;
        UV flags;
    };

=begin question

Should we scrap the buffer pointer and just tack the buffer on the end
of the structure? Saves a level of indirection, but means if we need
to make the buffer bigger we have to adjust anything pointing to it.

=end question

The C<buffer> pointer points to the buffer holding the actual
number, C<length> is the length of the buffer, C<exponent> is the base
10 exponent for the number (so 2e4532 doesn't take up much space), and
C<flags> holds some flags for the bigint.

The flags and exponent fields may be generally unused, but are
included to make the base structure identical in size and field types
to other structures. They may be removed before the first release of
perl 6.

=back
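As a sketch of how the exponent field keeps 2e4532 small, the following assumes `UV` and `IV` are pointer-sized integers, that `buffer` holds little-endian 15-bit limbs, and that `length` counts limbs; this PDD fixes none of those units, and `bigint_init` is a hypothetical helper. Trailing decimal zeros are folded into `exponent`, so 2e4532 needs a single limb.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t UV;  /* assumption: pointer-sized */
typedef intptr_t IV;   /* assumption: pointer-sized */

struct bigint {
    void *buffer;  /* assumed: little-endian 15-bit limbs in uint16_t */
    UV length;     /* assumed: number of limbs in buffer */
    IV exponent;   /* base 10 exponent */
    UV flags;
};

/* Hypothetical: build mant * 10^exp, folding trailing decimal zeros of
   mant into the exponent. Returns 0 on success, -1 if malloc fails. */
static int bigint_init(struct bigint *bi, uint64_t mant, IV exp) {
    uint16_t limbs[5];  /* 5 * 15 = 75 bits, enough for any uint64_t */
    UV n = 0;

    assert(bi != NULL);
    while (mant != 0 && mant % 10 == 0) {
        mant /= 10;
        exp++;
    }
    do {
        limbs[n++] = (uint16_t)(mant & 0x7FFF);
        mant >>= 15;
    } while (mant != 0);

    bi->buffer = malloc(n * sizeof limbs[0]);
    if (bi->buffer == NULL)
        return -1;
    memcpy(bi->buffer, limbs, n * sizeof limbs[0]);
    bi->length = n;
    bi->exponent = exp;
    bi->flags = 0;
    return 0;
}
```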

=head2 Floating point data types

Floating point data types are generically referred to as C<NUM>s. Like
C<INT>s, C<NUM>s are conceptual things, not real data structures.

=over 4

=item Platform native float

These are whatever size float was chosen when perl was configured. The
C-level typedef C<NV> will get you one of these.

=item Arbitrary precision decimal numbers

Arbitrary precision decimal numbers, or bignums, can have any number
of digits before and after the decimal point. They are represented by
the structure:

    struct bignum {
        void *buffer;
        UV length;
        IV exponent;
        UV flags;
    };

and yes, this looks identical to the bigint structure. This isn't
accidental. Upgrading a bigint to a bignum should be quick.

=begin question

Like the bigint structure, should we toss the data pointer and just
tack the data on the end?

=end question

=back
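Because the two headers are laid out identically, an upgrade need not touch the digits at all. The sketch below follows the vtable-swap approach mentioned in Dan's July 6 reply; the PMC layout, the vtable contents, and `upgrade_to_bignum` are all hypothetical, and `UV`/`IV` are assumed pointer-sized.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t UV;  /* assumption: pointer-sized */
typedef intptr_t IV;   /* assumption: pointer-sized */

struct bigint { void *buffer; UV length; IV exponent; UV flags; };
struct bignum { void *buffer; UV length; IV exponent; UV flags; };

_Static_assert(sizeof(struct bigint) == sizeof(struct bignum),
               "bigint and bignum headers must stay the same size");

/* Hypothetical stand-ins for a vtable and a PMC. */
struct vtable { const char *name; };
static const struct vtable bigint_vtable = { "bigint" };
static const struct vtable bignum_vtable = { "bignum" };

struct pmc {
    const struct vtable *vtable;
    union {
        struct bigint bi;
        struct bignum bn;
    } num;
};

/* The upgrade copies no digits: buffer, length, exponent and flags are
   reinterpreted in place and only the vtable pointer changes. The two
   structs share a common initial sequence, so reading num.bn after
   writing num.bi is well-defined. */
static void upgrade_to_bignum(struct pmc *p) {
    assert(p->vtable == &bigint_vtable);
    p->vtable = &bignum_vtable;
}
```

Setting a flag in the number struct, which Dan also mentions as a possibility, is left out here.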

=head2 String data types

Perl has a single internal string form:

    struct perl_string {
        void *buffer;
        UV allocated;
        UV bytes;
        UV flags;
        UV characters;
        UV encoding;
        UV type;
        UV unused;
    };

The fields are:

=over 4

=item buffer

Pointer to the start of the string's data.

=item allocated

How many bytes are allocated in the buffer.

=item bytes

How many bytes are used in the buffer.

=item flags

Flags indicating whatever. Bits 0-15 are reserved for perl, bits 16-23
for the encoding/decoding code, and the rest for the type code.

=item characters

How many characters are in the buffer. An optional cache field.

=item encoding

How the data is encoded, for example fixed 8-bit characters, utf-8, or
utf-32. An index into the encoding/decoding function table. Note that
this specifies encoding only--it's valid to encode EBCDIC characters
with the utf-8 algorithm. Silly, but valid.

=item type

What sort of string data is in the buffer, for example ASCII, EBCDIC,
or Unicode. Used to index into the table of string functions.

=item unused

Filler. It's here to make the string header exactly double the size of
a bigint/bignum header, and to make sure we don't cross cache lines on
any modern processor.

=back

=head1 ATTACHMENTS

None

=head1 REFERENCES

The perl modules Mat