Re: String representation

David Mitchell Tue, 19 Dec 2000 09:58:17 -0800
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> David Mitchell <[EMAIL PROTECTED]> writes:
> >Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> >> What are string functions in your view?
> >>   m//
> >>   s///
> >>   join()
> >>   substr
> >>   index
> >>   lc, lcfirst, ...
> >>   & | ~
> >>   ++
> >>   vec  
> >>   '.'
> >>   '.='
> >> 
> >> It rapidly gets out of hand.
> >
> >Perhaps, but consider that somewhere within the perl internals there
> >have to be functions which implement all these ops anyway. If we
> >provide vtable slots for all these functions and just fill most of the
> >slots with pointers to the 'default' Perl implementation, we haven't
> >really lost anything, except possibly a slight delay due to the extra
> >indirection (which may be compensated for elsewhere). On the other
> >hand, we have gained the ability to replace the default implementation
> >with something more efficient where it suits us.
> 
> I have just been through exactly that process with the PerlIO stuff.
> So I hope you will not take offence when I say that your observation above
> is simplistic.

No offence taken - I'm just making all this up as I go along anyway,
and am always happy to have the error of my ways pointed out ;-)

> The problem is "what are the (types of) the arguments passed
> to the functions?" - the existing code will be expecting its args in 
> a particular form. So your wondrous new function must accept exactly 
> those args and types - and convert them as necessary before becoming 
> more efficient. So to get any win the args/types of all the functions 
> has to be designed with pluggable-ness in mind from the outset.
> At best this means taking an indirection hit for all the args as well 
> as the function (this is what PerlIO does - PerlIO is now essentially 
> a FILE ** rather than a FILE *).

I don't really see why the types of args are (in general) a problem.
Consider the "simple" case of string concatenation. We can either
have the functionality directly in pp_concat:

---------------

pp_concat() {
        SV *sv1, *sv2;
        sv1 = POP; sv2 = POP;
        // without reference to the internals of sv1 or sv2, do a
        // concat which may involve lots of messy or slow upgrades and/or
        // type conversions etc
        sv1 = ....;
}

---------------


or we can have


---------------

pp_concat() {
        SV *sv1, *sv2;
        sv1 = POP; sv2 = POP;
        sv1->concat(sv2);
}

the_type_of_sv1_concat(SV *sv1, SV *sv2) {
        if (sv1->vtable == sv2->vtable) {
                // both args are of this type:
                // dive into the internals and do an efficient concat
                sv1 = ....;
        } else {
                generic_concat(sv1,sv2);
        }
}

generic_concat(SV *sv1, SV *sv2) {
        // without reference to the internals of sv1 or sv2, do a
        // concat which may involve lots of messy or slow upgrades and/or
        // type conversions etc
        sv1 = ....;
}

---------------
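To make the second variant a bit more concrete, here is a minimal sketch in
plain C. Everything in it is an assumption made for illustration: the SV
layout, the SVVtable struct, the bytes() accessor, the two vtables and the
call counters are all hypothetical and are not taken from the perl sources.
It only shows the shape of "same type: fast path, otherwise punt to the
generic code".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only; not real perl internals. */
typedef struct SV SV;

typedef struct SVVtable {
    void (*concat)(SV *sv1, const SV *sv2);          /* sv1 .= sv2 */
    const char *(*bytes)(const SV *sv, size_t *len); /* 'standard' view */
} SVVtable;

struct SV {
    const SVVtable *vtable;
    char  *buf;
    size_t len;
};

/* Counters so the example can show which path was taken. */
static int fast_calls = 0, generic_calls = 0;

/* Append n bytes to sv. Assumes p does not point into sv's own buffer. */
static void append_bytes(SV *sv, const char *p, size_t n) {
    char *nb = realloc(sv->buf, sv->len + n + 1);
    if (!nb) abort();
    memcpy(nb + sv->len, p, n);
    sv->len += n;
    nb[sv->len] = '\0';
    sv->buf = nb;
}

static const char *plain_bytes(const SV *sv, size_t *len) {
    *len = sv->len;
    return sv->buf;
}

/* Generic path: only uses sv2's public accessor, never its internals. */
static void generic_concat(SV *sv1, const SV *sv2) {
    size_t n;
    const char *p = sv2->vtable->bytes(sv2, &n);
    generic_calls++;
    append_bytes(sv1, p, n);
}

/* Type-specific path: dives into the internals when both types match. */
static void bytestr_concat(SV *sv1, const SV *sv2) {
    if (sv1->vtable == sv2->vtable) {
        fast_calls++;
        append_bytes(sv1, sv2->buf, sv2->len);
    } else {
        generic_concat(sv1, sv2);
    }
}

static const SVVtable bytestr_vtable = { bytestr_concat, plain_bytes };
static const SVVtable other_vtable   = { generic_concat, plain_bytes };

static SV *sv_new(const SVVtable *vt, const char *s) {
    SV *sv = calloc(1, sizeof *sv);
    if (!sv) abort();
    sv->vtable = vt;
    append_bytes(sv, s, strlen(s));
    return sv;
}

/* The op itself shrinks to a single indirect call. */
static void pp_concat(SV *sv1, SV *sv2) {
    sv1->vtable->concat(sv1, sv2);
}
```

Concatenating two SVs that share bytestr_vtable goes through the fast path;
mixing in an SV with other_vtable falls back to generic_concat. The cost of
the scheme is the one extra pointer dereference in pp_concat.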


> >Take the example of substr() - if this is a standalone function, then
> >it has to work without reference to any of the internals of its args,
> >and thus has to rely on extracting a 'standard' representation of the
> >string value from the SV in order to operate upon it. This then implies
> >messiness of coding and inefficiency, with all the unicode hell that
> >infects perl5 re-appearing.  If substr() were a per-type op, then the
> >messy details of UTF8 would lie almost completely within the internal
> >implementation of that datatype.
> 
> True, but the messy details would now occur multiple times,
> as soon as substr_utf8 exists then _ALL_ the other string ops 
> _must_ be overridden as well because nothing but string_utf8 "class" 
> knows what is going on.

Perhaps I'm being dim, but I don't really follow this. At the minimum,
someone writes a generic substr function that works with any string types.
Perhaps it achieves this by first converting all its args to UNICODE-32.
Not very efficient or desirable, but it gets you there.
Then the implementor of the utf8 code writes a substr_utf8 function that only
knows how to cope if all its args are utf8. If not, it just
hands the call on to the generic sub. Bingo - the utf8 coder only needs
to know about utf8 (plus writing a sub that will return the value as
a 'standard' UNICODE-32), and the rest of the perl developers don't need to
know about UTF-8. (Okay, an over-simplification!).
Since in real life the types of args are often the same, this will usually
be a win.
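
A rough sketch of that division of labour, in C. To keep it short it uses a
two-string equality test as a stand-in for substr, since the pattern is the
same. The Str struct, the StrType enum and all function names are
hypothetical, and the decoder assumes well-formed, shortest-form input with
no validation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string types and layout, for illustration only. */
typedef enum { STR_LATIN1, STR_UTF8 } StrType;

typedef struct {
    StrType type;
    const unsigned char *bytes;
    size_t nbytes;
} Str;

static int generic_calls = 0;  /* shows when the slow path was used */

/* The 'standard' representation: a malloc'd array of 32-bit code points.
 * Returns the number of code points. Assumes well-formed input. */
static size_t to_utf32(const Str *s, uint32_t **out) {
    uint32_t *cp = malloc((s->nbytes + 1) * sizeof *cp);
    size_t n = 0, i = 0;
    if (!cp) abort();
    while (i < s->nbytes) {
        unsigned char b = s->bytes[i++];
        if (s->type == STR_LATIN1 || b < 0x80) {
            cp[n++] = b;                      /* one byte, one code point */
        } else {
            /* Lead byte of a 2-, 3- or 4-byte UTF-8 sequence. */
            int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
            uint32_t c = b & (0x3F >> extra);
            while (extra-- > 0 && i < s->nbytes)
                c = (c << 6) | (s->bytes[i++] & 0x3F);
            cp[n++] = c;
        }
    }
    *out = cp;
    return n;
}

/* Generic op: works for any mix of types by going through UTF-32. */
static int generic_eq(const Str *a, const Str *b) {
    uint32_t *ca, *cb;
    size_t na = to_utf32(a, &ca);
    size_t nb = to_utf32(b, &cb);
    int eq = (na == nb) && memcmp(ca, cb, na * sizeof *ca) == 0;
    generic_calls++;
    free(ca);
    free(cb);
    return eq;
}

/* utf8-specific op: only knows utf8, punts on anything else. */
static int utf8_eq(const Str *a, const Str *b) {
    if (a->type == STR_UTF8 && b->type == STR_UTF8)
        return a->nbytes == b->nbytes &&
               memcmp(a->bytes, b->bytes, a->nbytes) == 0;
    return generic_eq(a, b);
}
```

Here the author of utf8_eq never has to know how Latin-1 (or any other type)
is stored; the only shared knowledge is the to_utf32 conversion.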

> >In fact, I would argue that in general most if not all the operations
> >currently performed by pp_* should have vtable equivalents, both for
> >numeric and string types (including unary ops, mutators, binops etc etc).
> 
> Hmm - that is indeed a logical position.

logical as in "consistent" or logical as in "sensible" ??? :-)

> The snag here is that the volume of code explodes and gets splattered 
> all over the sub-classes. So to fix a bug in the '+' operator (pp_plus)
> one has to go visit lots of places - but, presumably, the bug will 
> only be in one of them.

I was under the impression that it was pretty much agreed for numeric
types that each SV type would have its own set of binary ops (eg add, sub
etc), so I wasn't aware I was proposing anything radical!
I can't see why you get a code explosion. In perl5 you get the explosion -
every part of perl needs to know about every SV type, and introducing a new
type or subtype involves hacking in just about every nook and cranny within
perl.
If there was a bug in the + operator, it would be apparent fairly quickly
where it lies (eg int+int and num+num gives right result,
int+num goes wrong; therefore the Int->add[NUM]() function is suspect.)

> If this is to fly (and I am not saying it cannot), then the 
> "multiple despatch" issue needs to have a clean process so that 
> it is clear what happens if someone writes:
> 
>   my $complex_rational = $urdu_string / sqrt(-$big_integer);

Well, quite a chunk of this is as yet unresolved even on the numeric side
of things - eg I don't think there have been any concrete proposals yet
that can handle sqrt(-N) by automatic promotion.

> In other words - string ops on strings of uniform type, math ops on 
> well understood hierarchies etc. are all easy enough - it is the 
> combinations that get very messy very very quickly. 

I couldn't agree more - however, I think that issue is mostly orthogonal
to whether most pp_ functions should have vtable equivalents. If the
functionality is built directly into pp_XXX, you still have a combinatorial
mess to cope with - hiving off into vtables *may* reduce the mess, or
*might* increase it, depending on how it's done. Certainly in the
simplistic case of "if all my args are of my type deal with it, otherwise
punt off to the generic handler", you get speedups sometimes with little
pain. Or you do something similar to Dan's proposal for numeric types - have
a small fixed number of built-in string types, plus 'other', then
have a small array of functions per op, ie (sv1->concat[typeof(sv2)])(sv2).
Then you end up always calling a function which knows what type of arg
to expect (and which can always punt if it doesn't like one of its args' type).
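
As a sketch of what such a per-op array might look like in C: the type tags,
the Vtable layout and the handler names below are all made up for
illustration, and each handler just returns its own name instead of doing a
real concat, so that the dispatch itself is visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: a small fixed set of built-in string types plus 'other'. */
typedef enum { T_BYTES, T_UTF8, T_OTHER, T_COUNT } SvType;

typedef struct SV SV;
typedef const char *(*ConcatFn)(SV *sv1, const SV *sv2);

/* One slot per possible type of the second argument. */
typedef struct { ConcatFn concat[T_COUNT]; } Vtable;

struct SV {
    SvType type;
    const Vtable *vtable;
};

/* Handlers report which one ran; a real one would do the concat. */
static const char *generic_concat(SV *sv1, const SV *sv2) {
    (void)sv1; (void)sv2;
    return "generic";
}

static const char *bytes_bytes(SV *sv1, const SV *sv2) {
    (void)sv1; (void)sv2;
    return "bytes+bytes";
}

/* Knows exactly what it was handed, but chooses to punt anyway. */
static const char *bytes_utf8(SV *sv1, const SV *sv2) {
    return generic_concat(sv1, sv2);
}

static const char *utf8_utf8(SV *sv1, const SV *sv2) {
    (void)sv1; (void)sv2;
    return "utf8+utf8";
}

/* Slot order follows SvType: T_BYTES, T_UTF8, T_OTHER. */
static const Vtable bytes_vtable = {
    { bytes_bytes, bytes_utf8, generic_concat }
};
static const Vtable utf8_vtable = {
    { generic_concat, utf8_utf8, generic_concat }
};
static const Vtable other_vtable = {
    { generic_concat, generic_concat, generic_concat }
};

/* sv1's vtable picks the row, sv2's type picks the column. */
static const char *pp_concat(SV *sv1, const SV *sv2) {
    return sv1->vtable->concat[sv2->type](sv1, sv2);
}
```

The table size per op is fixed by the number of built-in types, so adding a
new 'other' type means filling one vtable rather than touching every op.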

One final thing - I'm fairly new to this game (I thought the start of Perl6
would be a good time to get involved, without having to understand
the horrors of perl5 internals in depth), which means I run more of a risk
than most of speaking from my derrière. So far I have been reluctant to
put forward any really substantial suggestions as to how to handle
all this stuff, mainly for fear of irritating people who know what
they are talking about, and who have to take time out to explain to me why I'm
wrong! On the other hand, I do seem to have ended up talking a lot about
this subject on perl6-internals!!
So, should I have the courage of my convictions and let rip, or should I
just leave this to wiser people? Answers on a postcard, please....