Re: RFC35 (base format for perl variables) - some comments

2000-09-27 Thread Dan Sugalski

At 02:25 PM 9/25/00 +0100, David Mitchell wrote:
> Here are a few comments on RFC35 (base format for perl variables).
>
> [ NB - I've only just joined this list, and although I've rummaged
> through the archives, I may have missed bits which make my comments
> obsolete/absurd etc... :-) ]

Revisiting the past is never a bad idea. (Well, rarely ever, at least... :)

> 1. void *variable_data;
>
> I would suggest having a slightly larger payload area than just
> space for a single pointer. At the least have the ability to store
> a standard number (ie double), or possibly even the xpv_pv/xpv_cur/xpv_len
> of a standard string. This way, most scalars could avoid the necessity
> of a separate alloc and extra level of indirection and cleanup.
> Other variable types whose payload doesn't fit in that area could
> still have a pointer to private data as usual, but could also use the
> remainder of the payload area for private data also (especially for
> some commonly accessed stuff such as length etc).
>
> The penalty for this would be that SVs holding only ints would waste
> a few bytes.

We've talked about having a string pointer, double, and integer all in the 
main SV struct. I think we've pretty much decided that's a good thing for 
speed reasons. Small systems like the palm might not want to do it, instead 
hanging off the data pointer, so making provisions for that wouldn't be a 
bad thing.
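
To make the layout under discussion concrete, here is a minimal C sketch of an SV that carries a string pointer, a double, and an integer inline while keeping the data pointer for types that need more room. Every field and function name except variable_data is hypothetical; neither RFC 35 nor this thread fixes them.

```c
#include <stddef.h>

/* Hypothetical layout: only variable_data is named in RFC 35. */
typedef struct sv SV;

typedef struct sv_vtable {
    long (*get_integer)(SV *sv);
    void (*set_integer)(SV *sv, long value);
} SV_VTABLE;

struct sv {
    const SV_VTABLE *vtable;
    void   *variable_data; /* private data for types that outgrow the inline slots */
    char   *str_ptr;       /* inline string pointer (cf. xpv_pv) */
    size_t  str_cur;       /* bytes in use (cf. xpv_cur) */
    size_t  str_len;       /* bytes allocated (cf. xpv_len) */
    double  num;           /* inline floating-point value */
    long    integer;       /* inline integer value */
};

/* Integer access touches only the inline slot: no second allocation,
 * no extra pointer dereference. */
static long int_get_integer(SV *sv) { return sv->integer; }
static void int_set_integer(SV *sv, long value) { sv->integer = value; }

static const SV_VTABLE int_vtable = { int_get_integer, int_set_integer };

/* Set up an integer-only SV; the string and double slots stay unused,
 * which is the "few wasted bytes" penalty mentioned above. */
static void sv_init_integer(SV *sv, long value) {
    sv->vtable = &int_vtable;
    sv->variable_data = NULL;
    sv->str_ptr = NULL;
    sv->str_cur = 0;
    sv->str_len = 0;
    sv->num = 0.0;
    sv->integer = value;
}
```

A small-memory build could drop the inline slots and hang the same values off variable_data, at the cost of one more allocation and dereference per access.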

> 2. "Op functions have intimate knowledge of the internals and unrestricted
> access"
>
> In this context, is an "op function" a function in the vtable, or
> are we referring to pp_foo functions? If the latter, surely they shouldn't
> have access to SV internals?

pp_foo functions. While they won't have access to SV internals, they will 
know that SVs have vtables, for example. Extensions will treat the SV 
pointer as a magic cookie, and if they want the integer value they'll call 
SvIV() (or whatever) to get it.

There's likely going to be other things that'll be visible to opcode 
functions that extensions won't be looking at, but we've not gotten to that 
point yet.

> 3. The vtable needn't only include function pointers. For example,
> it could include a set of class-wide RO flags as well, which could be
> accessed directly rather than via a function call for efficiency.
> (Can't think of any needed flags off the top of my head, but if say
> *vtable[0] was reserved for this, it might come in handy later.)

If it goes into the vtable, then it's going to be shared across all the 
variables of that particular type. In that case it's probably best to 
either stick 'em in a variable in the package stash somewhere, or they'll 
be constants in which case the type flag (or whatever flag) can just return 
a constant and we don't need a data slot.

> 4. The vtable should have a few spare slots at the end, which external
> implementers of data types are obliged to fill with pointers to noop
> and/or croak functions. Then when a later release of Perl adds new
> functions to the API, these slots can be reassigned, and XSUBS compiled
> under the previous versions will continue to work, or will at least die
> gracefully.

This is an outstanding idea. It's on the big list.

> I presume there will be a set of standard noop/carp functions which
> implementors can stuff their vtables with for the unneeded bits?

Yup, I certainly hope so.
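
As a rough illustration of the spare-slot idea, the C sketch below pads a vtable with stock filler functions. The slot names, the single function signature, and the sv_noop/sv_croak helpers are all assumptions made for the example; nothing here is a settled API.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct sv SV;
typedef long (*sv_func)(SV *sv);

/* Hypothetical slot layout: one real entry plus two reserved slots. */
enum { SLOT_GET_INTEGER, SLOT_SPARE_1, SLOT_SPARE_2, SLOT_COUNT };

struct sv {
    const sv_func *vtable;
    long integer;
};

/* Hypothetical stock fillers that the core would ship. */
static long sv_noop(SV *sv) {
    (void)sv;
    return 0; /* silently do nothing */
}

static long sv_croak(SV *sv) {
    (void)sv;
    fprintf(stderr, "operation not supported by this variable type\n");
    exit(EXIT_FAILURE); /* die gracefully instead of jumping through garbage */
}

/* An external type fills every slot it does not implement. */
static long my_get_integer(SV *sv) { return sv->integer; }

static const sv_func my_vtable[SLOT_COUNT] = {
    my_get_integer, /* SLOT_GET_INTEGER */
    sv_noop,        /* SLOT_SPARE_1: harmless if a later core calls it */
    sv_croak        /* SLOT_SPARE_2: fatal if a later core calls it */
};
```

If a later release gives SLOT_SPARE_2 a meaning, an extension built against this layout still has a valid function pointer there, so the call fails with a message rather than crashing.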

> 5. Is the intention to handle upgrades automatically via vtable calls?
> For example the [gs]et_string method for a numeric-only SV would
> be responsible for upgrading the SV and setting its vtable pointer to
> point to the string-SV vtable, etc.

Yes, though in your example, doing a get_string on an integer SV isn't 
obligated to do a conversion--it could do it on the fly and not cache the 
conversion.
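
Both behaviours can be sketched in a few lines of C. In this hypothetical layout (all names are mine, not from the RFC), get_string on an integer SV converts on the fly and leaves the SV untouched, while set_string upgrades the SV by swapping its vtable pointer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct sv SV;

typedef struct sv_vtable {
    const char *(*get_string)(SV *sv);
    void        (*set_string)(SV *sv, const char *s);
} SV_VTABLE;

struct sv {
    const SV_VTABLE *vtable;
    long  integer;
    char *str;         /* owned copy; NULL while the SV is integer-only */
    char  scratch[32]; /* room for an uncached integer-to-string conversion */
};

/* String SV: plain accessors. */
static const char *str_get_string(SV *sv) { return sv->str; }

static void str_set_string(SV *sv, const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL) abort();
    strcpy(copy, s);
    free(sv->str); /* free(NULL) is a no-op on first use */
    sv->str = copy;
}

static const SV_VTABLE string_vtable = { str_get_string, str_set_string };

/* Integer SV: reading as a string converts on demand and does NOT upgrade. */
static const char *int_get_string(SV *sv) {
    snprintf(sv->scratch, sizeof sv->scratch, "%ld", sv->integer);
    return sv->scratch;
}

/* Integer SV: writing a string upgrades by swapping the vtable pointer,
 * then delegates to the string implementation. */
static void int_set_string(SV *sv, const char *s) {
    sv->vtable = &string_vtable;
    sv->vtable->set_string(sv, s);
}

static const SV_VTABLE int_vtable = { int_get_string, int_set_string };
```

Callers always go through sv->vtable, so they never need to know whether an upgrade happened.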

> 6. Will magicalness now be handled by having a separate vtable for
> each type of magicalness?

Some of it, yes.

> 7. Will there be only one vtable API, or a separate one for scalars, arrays,
> hashes etc?

That's the way I'm leaning, or at least an implementation where most of the 
vtable entries are the same.

> Presumably for standard arrays etc, the scalar functions would do what
> you would expect of an array evaluated in a scalar context, eg
> get_int/float would return the length of the array, set_string would
> croak, etc etc. Or should they all just croak, and there be an
> array-specific method for getting array length?

No, they shouldn't croak.

I'm currently leaning towards a varargs calling sequence--doing this:

   get_integer(hash_pmc);

returns the number of entries in the hash, while this:

   get_integer(hash_pmc, key);

returns the integer value of the hash entry pointed to by key. (What key is 
is up in the air too)
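
One wrinkle worth spelling out: a C varargs function cannot tell how many arguments it was given, so the callee needs a count or a sentinel. The sketch below adds an explicit key count, which is purely my assumption, as are the string keys, the toy HASH_PMC type, and returning 0 for a missing key.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Toy hash: a flat array of string-keyed integers (hypothetical type). */
typedef struct { const char *key; long value; } ENTRY;
typedef struct { const ENTRY *entries; size_t count; } HASH_PMC;

/* nkeys == 0: return the number of entries in the hash.
 * nkeys == 1: one const char *key follows; return that entry's value.
 * The explicit count is an assumption; stdarg offers no way to detect
 * the number of arguments actually passed. */
static long get_integer(const HASH_PMC *h, int nkeys, ...) {
    va_list ap;
    const char *key;
    size_t i;

    if (nkeys == 0)
        return (long)h->count;

    va_start(ap, nkeys);
    key = va_arg(ap, const char *);
    va_end(ap);

    for (i = 0; i < h->count; i++) {
        if (strcmp(h->entries[i].key, key) == 0)
            return h->entries[i].value;
    }
    return 0; /* missing key: treated as 0 in this sketch */
}
```

With this shape, get_integer(&h, 0) yields the entry count and get_integer(&h, 1, "b") yields the value stored under "b".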

> 8. Slowness of function call overhead.
>
> I did a very quick and crude test (650MHz Athlon running Linux, perl 5.005_03)
> and got the following numbers.
>
> 250ns: Perl program: time to do $i++
>  45ns: standalone C program: time to do the pair
>            i = (sv->vtable[OFFSET_GETVAL])(sv);
>            (sv->vtable[OFFSET_SETVAL])(sv,i);
>        where the 2 entries in

RFC35 (base format for perl variables) - some comments

2000-09-25 Thread David Mitchell

Here are a few comments on RFC35 (base format for perl variables).

[ NB - I've only just joined this list, and although I've rummaged
through the archives, I may have missed bits which make my comments
obsolete/absurd etc... :-) ]

1. void *variable_data;

I would suggest having a slightly larger payload area than just
space for a single pointer. At the least have the ability to store
a standard number (ie double), or possibly even the xpv_pv/xpv_cur/xpv_len
of a standard string. This way, most scalars could avoid the necessity
of a separate alloc and extra level of indirection and cleanup.
Other variable types whose payload doesn't fit in that area could
still have a pointer to private data as usual, but could also use the
remainder of the payload area for private data also (especially for
some commonly accessed stuff such as length etc).

The penalty for this would be that SVs holding only ints would waste
a few bytes.

2. "Op functions have intimate knowledge of the internals and unrestricted
access"

In this context, is an "op function" a function in the vtable, or
are we referring to pp_foo functions? If the latter, surely they shouldn't
have access to SV internals?

3. The vtable needn't only include function pointers. For example,
it could include a set of class-wide RO flags as well, which could be
accessed directly rather than via a function call for efficiency.
(Can't think of any needed flags off the top of my head, but if say
*vtable[0] was reserved for this, it might come in handy later.)

4. The vtable should have a few spare slots at the end, which external
implementers of data types are obliged to fill with pointers to noop
and/or croak functions. Then when a later release of Perl adds new
functions to the API, these slots can be reassigned, and XSUBS compiled
under the previous versions will continue to work, or will at least die
gracefully.
I presume there will be a set of standard noop/carp functions which
implementors can stuff their vtables with for the unneeded bits?

5. Is the intention to handle upgrades automatically via vtable calls?
For example the [gs]et_string method for a numeric-only SV would
be responsible for upgrading the SV and setting its vtable pointer to
point to the string-SV vtable, etc.

6. Will magicalness now be handled by having a separate vtable for
each type of magicalness?

7. Will there be only one vtable API, or a separate one for scalars, arrays,
hashes etc?
I can see some advantages of having a single one covering all types.
For example, one possible pseudo-hash implementation would have both
arrayish and hashish methods. Or a HTML parser module might want to
treat a HTML document both as a straight scalar string and a hash of the
parsed DOM for that object (presumably with pointers to the var from both
scalar and hash entries in one or more globs).

Presumably for standard arrays etc, the scalar functions would do what
you would expect of an array evaluated in a scalar context, eg
get_int/float would return the length of the array, set_string would
croak, etc etc. Or should they all just croak, and there be an
array-specific method for getting array length?

8. Slowness of function call overhead.

I did a very quick and crude test (650MHz Athlon running Linux, perl 5.005_03)
and got the following numbers.

250ns: Perl program: time to do $i++
 45ns: standalone C program: time to do the pair
           i = (sv->vtable[OFFSET_GETVAL])(sv);
           (sv->vtable[OFFSET_SETVAL])(sv,i);
       where the 2 entries in vtable point to simple functions that
       get or set an integer in the payload of the sv.

This at least gives a preliminary indication that the overhead of
a couple of method calls isn't huge compared with the overall cost of
performing a simple perl op.
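
For anyone wanting to repeat the measurement, a test along these lines can be sketched in C as follows. This is not the program used for the numbers above: the struct layout, the shared function signature for both slots, and the time_pairs helper are assumptions made for the example.

```c
#include <time.h>

typedef struct sv SV;

/* One signature for both slots keeps the table a plain array
 * (a simplification; the getter ignores its second argument). */
typedef long (*sv_func)(SV *sv, long value);

enum { OFFSET_GETVAL, OFFSET_SETVAL, OFFSET_COUNT };

struct sv {
    const sv_func *vtable;
    long payload; /* integer stored directly in the SV */
};

static long getval(SV *sv, long unused) { (void)unused; return sv->payload; }
static long setval(SV *sv, long value) { sv->payload = value; return value; }

static const sv_func int_vtable[OFFSET_COUNT] = { getval, setval };

/* Run n get/set pairs through the vtable (each pair increments the
 * payload) and return the average CPU time per pair in nanoseconds. */
static double time_pairs(SV *sv, long n) {
    clock_t start, end;
    long k;

    start = clock();
    for (k = 0; k < n; k++) {
        long i = (sv->vtable[OFFSET_GETVAL])(sv, 0);
        (sv->vtable[OFFSET_SETVAL])(sv, i + 1);
    }
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)n;
}
```

An optimizing compiler may inline the indirect calls because the table is a visible constant, so build without optimization (or keep the table in a separate translation unit) to measure real call overhead. The result depends heavily on hardware and compiler.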


* Dave Mitchell, Operations Manager,
* Fretwell-Downing Facilities Ltd, UK.  [EMAIL PROTECTED]
* Tel: +44 114 281 6113.  The usual disclaimers
*
* Standards (n). Battle insignia or tribal totems