On Wed, Nov 22, 2000 at 02:57:09PM +0100, Roland Giersig wrote:
> I posted a RFC for something like that a while ago but got no reaction
> from the crowd.  It is not an internal optimisation like the one 

I don't think that there's a crowd here.

> Take this HTML for example: 
> 
>   <html>Text with a <b>larger <font size=+1>l</font>etter</b> in it. </html>
> 
> and try to find a way to substitute the word `letter' with `word',
> with outside formatting (<b>) preserved.

Do I want that to read

<b>larger <font size=+1>w</font>ord</b>

or 

<b>larger word</b>

?

I think the problem can be visualised as a general purpose extension of
case substitution, i.e. something to achieve "Do what I mean" on

$_ ="there's more than one way to DO it";
s/do/obfuscate/i;

Do I mean

there's more than one way to obfuscate it
there's more than one way to OBfuscate it
there's more than one way to OBFUSCATE it

?
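
To make the three readings concrete, here is a minimal sketch. The helper
names copy_case_prefix and copy_case_whole are made up for illustration;
neither is a builtin, and neither is a proposal for what s///i should do.

```perl
use strict;
use warnings;

# Hypothetical helper: copy case letter by letter from $old onto $new.
# Characters of $new beyond the length of $old are left untouched.
sub copy_case_prefix {
    my ($old, $new) = @_;
    my $n = length($old) < length($new) ? length($old) : length($new);
    for my $i (0 .. $n - 1) {
        my $o = substr($old, $i, 1);
        my $c = substr($new, $i, 1);
        if    ($o =~ /[[:upper:]]/) { substr($new, $i, 1, uc($c)) }
        elsif ($o =~ /[[:lower:]]/) { substr($new, $i, 1, lc($c)) }
    }
    return $new;
}

# Hypothetical helper: if the whole match is upper case, upper-case the
# whole replacement; otherwise leave the replacement alone.
sub copy_case_whole {
    my ($old, $new) = @_;
    return ($old eq uc($old) && $old ne lc($old)) ? uc($new) : $new;
}

my $text = "there's more than one way to DO it";

(my $plain  = $text) =~ s/do/obfuscate/i;
(my $prefix = $text) =~ s/(do)/copy_case_prefix($1, 'obfuscate')/ie;
(my $whole  = $text) =~ s/(do)/copy_case_whole($1, 'obfuscate')/ie;

print "$_\n" for $plain, $prefix, $whole;
```

Each policy is easy to write by hand with /e; the hard part is deciding
which one "Do what I mean" should pick.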

I think that the substitution problem (which Ilya touches on) is possibly
more hairy than the original. And even then I don't know how to implement
the original without making plain text manipulation slow down horribly.
[do I store attributes as

1) A bit mask on each character referencing a table of attributes for
   this string? Hmm. Eats lots of memory, as sizeof (character) goes up,
   but it should be easier to shuffle characters and metadata together.

2) An attached table of attributes and ranges to which they apply?
   Uses less memory for sparse attributes, but means that it's hard work
   every time we have to interrogate or shuffle characters as we need to
   check all the ranges each time to see if the characters we are
   manipulating have metadata.

3) something I've not thought of

?

all feel like they slow down plain text
]
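
A rough sketch of option 2, to show where the cost comes from. The hash
layout (text plus a list of [start, end, attribute] ranges) and the names
insert_text and attrs_at are assumptions for illustration only, not how
any perl stores strings.

```perl
use strict;
use warnings;

# Hypothetical representation: plain text plus a table of
# [start, end, attribute] ranges, with end exclusive.
my %str = (
    text   => 'a larger letter',
    ranges => [ [2, 15, 'bold'], [9, 10, 'size+1'] ],
);

# Insert $new at $pos.  Every range has to be visited, because any of
# them may start or end after the insertion point.
sub insert_text {
    my ($s, $pos, $new) = @_;
    substr($s->{text}, $pos, 0, $new);
    my $len = length $new;
    for my $r (@{ $s->{ranges} }) {
        $r->[0] += $len if $r->[0] >= $pos;
        $r->[1] += $len if $r->[1] >  $pos;
    }
    return $s;
}

# Attributes covering the character at $pos: again a scan of all ranges.
sub attrs_at {
    my ($s, $pos) = @_;
    return map  { $_->[2] }
           grep { $_->[0] <= $pos && $pos < $_->[1] }
           @{ $s->{ranges} };
}

insert_text(\%str, 0, 'such ');
print join(',', attrs_at(\%str, 14)), "\n";
```

A string with no attributes has an empty table, so the loops are cheap,
but the "is there a table?" check still sits on the path of every
operation, which is the slowdown I am worried about.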

> Ugh, you got me there.  I know very little about Perl internals, so I
> can't even pretend something.  Maybe Ilya has already started on a
> prototype? ;-)

I think Ilya will have "left the implementation to the reader" :-)
To which Jarkko will respond about "this wonderful proof to fix all the
bugs in perl, but it is too big to fit in the margin of my monitor"

[problem is that most of us are not as smart as Ilya, and don't have life
expectancy sufficient to solve Ilya's exercises]

> We need a way to specify attributes to chunks of text in a backward
> compatible way.  But how can we specify it in a compact way?  Hmm, as
> variable access by name is deprecated anyhow, we could use ${var} to

Do you mean soft (symbolic) references being "deprecated"? They are
"discouraged", but I'd be very upset if I couldn't say "no strict 'refs';"
when I knowingly need them.

>   $bar = ${"${"L":size=>12}arge":size=>10};

                2 3              4

mmmm. How were you proposing to implement a parser that knows that " number
4 is the end of the string, and not " 2 or 3?
That's not an internals question. I don't know a good answer.
[In 2 senses. Firstly, I'm not certain of the best list to discuss it on.
Secondly, I don't feel that a knowledge of the existing internals
is needed to suggest how to approach parsing a "" string with nested
bare "s]
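
Not claiming it is a good answer, but one conceivable approach is a
scanner that recurses: inside a string, ${ switches to "code" until the
matching }, and inside code a " opens a nested string. The sketch below
is a toy under that assumption. scan_string and scan_block are names I
have made up, it knows nothing about the rest of Perl's syntax, and it is
not a description of the existing tokeniser.

```perl
use strict;
use warnings;

# Both subs take the source text and the offset of an opening delimiter,
# and return the offset of the matching closing delimiter.

sub scan_string {
    my ($src, $i) = @_;                  # $i is at an opening "
    while (++$i < length $src) {
        my $c = substr($src, $i, 1);
        if    ($c eq '\\')                  { $i++ }          # skip escaped char
        elsif ($c eq '"')                   { return $i }     # our closing quote
        elsif (substr($src, $i, 2) eq '${') { $i = scan_block($src, $i + 1) }
    }
    die "unterminated string\n";
}

sub scan_block {
    my ($src, $i) = @_;                  # $i is at an opening {
    while (++$i < length $src) {
        my $c = substr($src, $i, 1);
        if    ($c eq '}') { return $i }
        elsif ($c eq '{') { $i = scan_block($src, $i) }   # nested braces
        elsif ($c eq '"') { $i = scan_string($src, $i) }  # nested string
    }
    die "unterminated block\n";
}

my $src = '${"${"L":size=>12}arge":size=>10}';
my $end = scan_string($src, index($src, '"'));
print "outer string closes at offset $end\n";
```

On the example this stops at the fourth " in the source, i.e. the one
marked 4 above. Whether real code could afford those rules is another
matter.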

However, the syntax of how to specify these things is independent of how
to implement them internally. The internal implementation is the sort of
thing to discuss round here.

> How to loop over all chunks? Hmm, seems like split could handle it OK
> if the regex engine can match chunk borders. Seems like another
> special token is needed.  How about `\C' for chunk?  Or is this
> already taken?

           \C  Match a single C char (octet) even under utf8.

However, with UTF-8 we now have thousands more glyphs to use as
escapes in regexps :-)
[aargh. I'm off topic for this list]

> Hmm, what about string comparisons?  `eq' and friends should simply
> continue to work as usual on the string contents.  Do we need some
> kind of meta-eq to be able to compare the attribs also?

I think that that becomes a method call on one of the scalars.
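
Something like this sketch, perhaps. AttrString and its same_as method
are hypothetical names for illustration (not an existing module); the
overload pragma itself is real. Plain `eq' keeps comparing the text only,
and the attribute-aware comparison is an ordinary method.

```perl
use strict;
use warnings;

package AttrString;    # hypothetical class, not an existing module

# 'eq', 'ne' and stringification look at the text only, so code that
# compares strings keeps working unchanged.
use overload
    '""' => sub { $_[0]{text} },
    'eq' => sub { "$_[0]" eq "$_[1]" },
    'ne' => sub { "$_[0]" ne "$_[1]" };

sub new {
    my ($class, $text, %attrs) = @_;
    return bless { text => $text, attrs => {%attrs} }, $class;
}

# The "meta-eq": text and attributes must both match.
sub same_as {
    my ($self, $other) = @_;
    return 0 unless $self eq $other;
    my ($mine, $theirs) = ($self->{attrs}, $other->{attrs});
    return 0 unless scalar(keys %$mine) == scalar(keys %$theirs);
    for my $k (keys %$mine) {
        return 0 unless exists $theirs->{$k} && $mine->{$k} eq $theirs->{$k};
    }
    return 1;
}

package main;

my $x = AttrString->new('Large', size => 12);
my $y = AttrString->new('Large', size => 10);

print "eq: ",      ($x eq $y        ? "same" : "different"), "\n";
print "same_as: ", ($x->same_as($y) ? "same" : "different"), "\n";
```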

Nicholas Clark
