Re: String representation

David Mitchell Mon, 18 Dec 2000 07:25:53 -0800
Nick Ing-Simmons <[EMAIL PROTECTED]> wrote:
> What are string functions in your view?
>   m//
>   s///
>   join()
>   substr
>   index
>   lc, lcfirst, ...
>   & | ~
>   ++
>   vec  
>   '.'
>   '.='
> 
> It rapidly gets out of hand.

Perhaps, but consider that somewhere within the perl internals there
have to be functions which implement all these ops anyway. If we
provide vtable slots for all these functions and just fill most of the
slots with pointers to the 'default' Perl implementation, we haven't
really lost anything, except possibly a slight delay due to the extra
indirection (which may be compensated for elsewhere). On the other
hand, we have gained the ability to replace the default implementation
with something more efficient where it suits us.
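
To make the idea concrete, here is a rough sketch in C of a string vtable
whose slots mostly point at shared default routines, with one slot
overridden for a type that can do better. Every name in it (str_sv,
str_vtable, default_index, ascii_index and so on) is invented for
illustration; none of this is taken from the perl source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical, not real perl internals. */
typedef struct str_sv str_sv;

typedef struct {
    size_t (*length)(const str_sv *s);        /* length in characters */
    long   (*index)(const str_sv *s, char c); /* first position of c, or -1 */
} str_vtable;

struct str_sv {
    const str_vtable *vt;  /* per-type operations */
    const char *buf;       /* raw bytes */
    size_t len;            /* length in bytes */
};

/* 'Default' implementations: correct for any byte string, not tuned. */
static size_t default_length(const str_sv *s) { return s->len; }

static long default_index(const str_sv *s, char c) {
    for (size_t i = 0; i < s->len; i++)
        if (s->buf[i] == c)
            return (long)i;
    return -1;
}

/* A type that knows its buffer is plain bytes can hand the search
 * to memchr() instead of looping by hand. */
static long ascii_index(const str_sv *s, char c) {
    const char *p = memchr(s->buf, c, s->len);
    return p ? (long)(p - s->buf) : -1;
}

/* Most slots are shared; only the slot worth specialising differs. */
static const str_vtable default_vt = { default_length, default_index };
static const str_vtable ascii_vt   = { default_length, ascii_index };

/* What an op would call: one extra indirection, nothing else. */
static long str_index(const str_sv *s, char c) {
    assert(s != NULL && s->vt != NULL);
    return s->vt->index(s, c);
}
```

The caller of str_index() never learns which implementation ran, which is
the whole point: the cost is one pointer dereference, and the default
routine stays available for any type that has nothing cleverer to offer.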

Take the example of substr() - if this is a standalone function, then
it has to work without reference to any of the internals of its args,
and thus has to rely on extracting a 'standard' representation of the
string value from the SV in order to operate upon it. This then implies
messiness of coding and inefficiency, with all the unicode hell that
infects perl5 re-appearing.  If substr() were a per-type op, then the
messy details of UTF8 would lie almost completely within the internal
implementation of that datatype.
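
As a sketch of what a per-type substr() might look like, the C below gives
two string types their own substr slot: one for single-byte strings, one
for UTF-8. The types and function names (str_val, str_type, bytes_substr,
utf8_substr, str_substr) are made up for this example, and the UTF-8 code
assumes well-formed input to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef struct str_val str_val;

typedef struct {
    /* Copy `count` characters starting at character `start` into `out`;
     * returns the number of bytes written. */
    size_t (*substr)(const str_val *s, size_t start, size_t count,
                     unsigned char *out, size_t outcap);
} str_type;

struct str_val {
    const str_type *type;
    const unsigned char *bytes;
    size_t nbytes;
};

/* Single-byte strings: character offsets are byte offsets. */
static size_t bytes_substr(const str_val *s, size_t start, size_t count,
                           unsigned char *out, size_t outcap) {
    if (start >= s->nbytes) return 0;
    if (count > s->nbytes - start) count = s->nbytes - start;
    if (count > outcap) count = outcap;
    memcpy(out, s->bytes + start, count);
    return count;
}

/* Advance `nchars` characters from byte offset `pos`, stepping over
 * UTF-8 continuation bytes (10xxxxxx). Assumes well-formed UTF-8. */
static size_t utf8_skip(const str_val *s, size_t pos, size_t nchars) {
    while (nchars > 0 && pos < s->nbytes) {
        pos++;
        while (pos < s->nbytes && (s->bytes[pos] & 0xC0) == 0x80)
            pos++;
        nchars--;
    }
    return pos;
}

/* UTF-8 strings: all the variable-width fiddling lives here. */
static size_t utf8_substr(const str_val *s, size_t start, size_t count,
                          unsigned char *out, size_t outcap) {
    size_t from = utf8_skip(s, 0, start);
    size_t to   = utf8_skip(s, from, count);
    size_t n    = to - from;
    if (n > outcap) return 0;  /* refuse rather than split a character */
    memcpy(out, s->bytes + from, n);
    return n;
}

static const str_type bytes_type = { bytes_substr };
static const str_type utf8_type  = { utf8_substr };

/* The generic op knows nothing about encodings; it only dispatches. */
static size_t str_substr(const str_val *s, size_t start, size_t count,
                         unsigned char *out, size_t outcap) {
    assert(s != NULL && s->type != NULL);
    return s->type->substr(s, start, count, out, outcap);
}
```

Given the six bytes of "héllo" in UTF-8, asking for two characters from
position 1 yields three bytes under utf8_type and two bytes under
bytes_type, and str_substr() itself contains no encoding logic at all.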

In fact, I would argue that in general most if not all the operations currently
performed by pp_* should have vtable equivalents, both for numeric and string
types (including unary ops, mutators, binops etc etc).

> Seriously - I think we need to consider the original question
> "What is the representation" based on perl5 hindsight, then think what 
> operations we want to perform on it, then divide those into the ones
> which make sense to be "methods" (vtable entries) of string, 
> those that are part of string API, and those which are just ops messing 
> with strings.

If an "op messing with strings" might be able to do a faster job given
access to the internals of that string type, then I'd argue that that op
should be in the vtable too.

> >That way there can be multiple regex implementations to handle different
> >cases (eg fast one(s) for fixed width ASCII, UTF-32 etc, and a slow horrible one
> >for variable-length UTF-8, etc). Of course perl itself could provide a default
> >regex engine usable by all string types, but implementors would then be free
> >to add variants for custom string types.
> 
> I would argue one does that by making the regex API more modular.

Quite possibly, but once having split it into separate components, I
might then make the case that certain of those components could be
implemented as vtable ops (eg those components that are sensitive to
the string representation).

My dream would be that all knowledge related to utf8 (say) is contained
in a file called sv_utf8.c, and that if someone wanted to try an alternative
implementation, they would just need to hack (or replace) that file, not also
reg*.c, pp*.c, etc etc. But I also dream of World Peace and England winning the
World Cup ... ;-)

Dave.


* Dave Mitchell, Operations Manager,
* Fretwell-Downing Facilities Ltd, UK.  [EMAIL PROTECTED]
* Tel: +44 114 281 6113.                The usual disclaimers....
*
* Standards (n). Battle insignia or tribal totems