Re: Meta-design

2000-12-07 Thread Roland Giersig

Dan Sugalski wrote:

> I object to targeting GCC specifically for two reasons,
> though, neither of them VMS related:
> 
> 1) Targeting a single compiler, no matter whose it is, is a bad idea. We're
> writing in a *language*, not for a compiler. Targeting a specific compiler
> restricts us even more than choosing a language.

How about trying standards?  While gcc is a standard in itself, writing
perl6 in ANSI C should provide just the portability we are trying
to achieve.
 
> 2) GCC produces slow code on all platforms where there's an alternative.
> Compaq C beats it on Alphas and VAXen, Sun's compiler beats it on SPARC
> machines, and HP's beats it on PA-RISC machines. Heck, Microsoft's compiler
> beats it on x86 chips. (As does Intel's compiler) We want perl fast, and
> crippling it by requiring a particular compiler's a foolish thing.

How about a two-step requirement?

1) Native compiler must support ANSI-C.

2) If 1) doesn't hold, gcc can be required, which fulfills ANSI-C.

Systems that cannot fulfill 1) or 2) are not worthy to run
perl6 in the first place... ;-)

> There's no reason I can see, outside of ideological ones, to require gcc.

Yeah, we should see it as a fallback solution really, but it
shouldn't be our first choice.

Roland
--
[EMAIL PROTECTED]



Re: SvPV*

2000-11-24 Thread Roland Giersig

David Mitchell wrote:
> 
> Roland Giersig <[EMAIL PROTECTED]> wrote:
> 
> > Maybe:
> >
> > "Perl6 should excel at manipulating *formatted* text."
> 
> Quite possibly, although as a previous poster has pointed out,
> formatted text != XML.

Yes, but both share a common underlying structure: they are both
chunks of text that have (hierarchical) attributes attached, so
a data structure that can handle general XML can also hold e.g.
an RTF document.

> ie in the sense that HTML, RTF, TeX etc, have a natural sense of containing
> a single piece of text with embedded attributes - which could in principle
> be stripped away or ignored. So for example, a regex would operate on the
> whole underlying text. XML on the other hand is far more general. For example
> one particular XML document might hold the names and contact details
> for a thousand individuals. Trying to treat the document as a single
> string and applying a regex to it doesn't have any particularly strong
> semantic doo-dah. In that particular example, it might make sense to
> think of the XML doc as an array of thinggies rather than a single string.

This is caused by the (well, mostly) clear distinction between style
and content in documents: if you take away the style (which carries
rather little information), you still have the full content information.

A database in XML in contrast has its information equally divided
between data attributes and data content, so you cannot simply strip 
the attributes.

This does not mean that a data structure cannot equally hold both.  But
to do something sensible with it, a regex machine must be able to
match on both attributes and textual content.  This is what my proposal
is about.

Hope this clarifies it somewhat.

Hmm, I'm back to

"Perl should weave its magic upon attributed text chunks 
 instead of linear text."

Roland
--
[EMAIL PROTECTED]



Re: SvPV*

2000-11-24 Thread Roland Giersig

Bart Lateur wrote:
> 
> On Fri, 24 Nov 2000 08:54:43 +0100, Roland Giersig wrote:
> 
> >Maybe the title should be:
> >
> >"Perl should use XML as its basic data type instead of linear strings"
> 
> Horrible.
> 
> I kinda liked your original proposal. But you should NOT focus on XML.
> That leaves out too many other possible data sources: RTF, for example,
> or TeX. What is typical, is that it is marked up text, in the form of a
> tree, i.e. properly nested.

Of course a lot of other data formats can be represented like that,
but I thought that `XML' as a catch-all word would help convey
the importance of the proposal.

> The internal structure might as well be easily representable as XML.

Yes, I know that XML is the user-visible representation and that
the data structures are something totally different.  But how 
can you formulate that: 

"Perl should have built-in data structures that can hold XML data"?

Better?  I don't think so.

> I do think that the term "non-linear text" is absolutely unclear.

So do I, but I haven't found a better catchy description.
The text is divided into chunks linked together and each chunk
can have attributes attached.

"Perl should weave its magic upon attributed text chunks 
 instead of linear text."

Maybe:

"Perl6 should excel at manipulating *formatted* text."

Any other suggestions?

Roland
--
[EMAIL PROTECTED]



Re: SvPV*

2000-11-23 Thread Roland Giersig

Nicholas Clark wrote:
> 
> On Wed, Nov 22, 2000 at 01:24:50PM -0500, Chaim Frenkel wrote:
> > I'd offer the possibility that there are two (or perhaps more)
> > different problems here.  One is the current bunch of bytes (string,
> > executable to be twiddled).  Another, which the attributes on strings
> > seem to be, is structured data.
> >
> > Squeezing attributes onto a buffer, seems to be shoehorning a more
> > general problem onto a specific implementation.
> >
> > Getting an efficient representation of a meaningful structure should
> > be done as a new data type.
> >
> > (I'm thinking of representing COBOL records/data, or even XML documents)

XML is what I was thinking of, too, when writing the proposal.
Hmm, I should modify it to use the XML buzzword, this could greatly
enhance its obvious value.  Maybe the title should be:

"Perl should use XML as its basic data type instead of linear strings"

How does that sound?

> Have I misunderstood you if I suggest that "two or more" is actually a
> continuous range of representation from
> 
> 1 (contiguous linear) string data with 0 or more attributes attached to each
>   character where the string's text is the backbone
>   [and the global and local order of the characters in the string is crucial
>   to the value and equality with other variables]
> 
> 2 structured data (eg XML) where the string's text is just part of the data
>   held in the structure, and you could sort the data in different ways
>   without changing its value
> 
> Are those end members in a continuum? or are hybrids of the 2 impossible?
> Am I barking up the wrong tree completely?

I would see (1) as the simplest form of (2), so once handling (2) is
solved, (1) is also handled.  This is from a functional point of view;
performance is another issue.  It could well be that the solution to
(2) needs only minor tweaking to be fast enough for (1) compared to
the current solution.  Or a completely separate implementation may be
warranted.

I'm with Chaim Frenkel, who wrote:
> If for no other reason, there are many ways of having the attributes
> distribute across, deletions, additions, and moves. That is a policy
> decision that should not be done at the perl internal level.

This means, IMHO, that the basic data structure for (1) must be
extensible in such a way that it can be morphed into the one for (2).
But the implementation of the functions that work on (2) is separable
from that of the functions that work on (1).

David Mitchell has a proposal for how this could be done:
> One way round this is to leave the semantics to implementor of the SV type.
> This could be done by having vtable methods for *all* string ops
> known to Perl; in particular m//, s// and tr//.
> 
> The way this could work is for the Perl core to provide a generic regex
> library, which uses only the public interface to SVs to extract
> and manipulate its contents. Standard string SVs would have the relevant
> vtable entries point to these generic regex functions.
> However, if someone wants to implement a HTML SV type say, then
> (if they are keen enough) they can write their own m//, s// methods
> which are efficient (because they can access the internal representation),
> and can have whatever semantics the author wishes.
> 
> However, since the internals of regexes are a dark art to me, I don't know
> whether it is sensible to have a single regex compiler, but multiple
> regex executors (if that's the right terminology).
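
As a rough illustration of the vtable idea quoted above, here is a
minimal sketch in today's Perl 5.  It is not how the Perl core works:
the package names PlainString and HtmlString and the method subst are
made up for this example, and ordinary method dispatch stands in for a
vtable entry.  The toy HTML type also only matches inside a single text
run, so a pattern spanning a tag boundary is not found.

```perl
use strict;
use warnings;

# Generic string type: its "vtable entry" for substitution simply runs
# the ordinary regex engine over the whole text.
package PlainString;
sub new  { my ($class, $text) = @_; return bless { text => $text }, $class }
sub text { $_[0]{text} }
sub subst {
    my ($self, $pattern, $replacement) = @_;
    $self->{text} =~ s/$pattern/$replacement/;
    return $self;
}

# Hypothetical HTML-aware type: overrides subst so that tags are never
# touched, only the text between them.
package HtmlString;
our @ISA = ('PlainString');
sub subst {
    my ($self, $pattern, $replacement) = @_;
    # The capturing group makes split keep the tags as separate pieces.
    my @pieces = split /(<[^>]*>)/, $self->{text};
    for my $piece (@pieces) {
        next if $piece =~ /^</;                       # leave markup alone
        last if $piece =~ s/$pattern/$replacement/;   # first match only
    }
    $self->{text} = join '', @pieces;
    return $self;
}

package main;

for my $sv (PlainString->new('<b>b</b>old'), HtmlString->new('<b>b</b>old')) {
    print ref($sv), ': ', $sv->subst(qr/b/, 'B')->text, "\n";
}
```

The same call gives different results per type: the plain string has
its first `b' replaced inside the tag itself, while the HTML-aware type
changes only the visible text.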

I'm very happy with how this discussion is going.  Are you guys also
feeling that this could be of immense value for a lot of Perl users
out there?

Best regards,

Roland
--
[EMAIL PROTECTED]



Re: SvPV*

2000-11-22 Thread Roland Giersig

Nicholas Clark wrote:

> IIRC Ilya mailed p5p bemoaning the fact that perl's SVs use a contiguous
> buffer. A split-buffer representation (where a hole is allowed in the
> middle of the buffer data) permits much faster replacement type operations,
> as there is less copying, and you can move the hole around to suit your
> needs.

I posted an RFC for something like that a while ago but got no reaction
from the crowd.  It is not an internal optimisation like the one
stated above, but a whole new [no, I won't say paradigm] concept
that could be *the* reason that makes perl6 worthwhile.  I've attached
the RFC again and would hope to at least get some "Nah..." or "Yeah!"
as feedback.

Cheers!

Roland
--
[EMAIL PROTECTED]


RFC: Perl should support non-linear text

2000-11-03 Thread Roland Giersig


Hi folks,

I know, the RFC period is over, but still...
Please, read this through and tell me if it's a good idea or not.
Actually, it's not mine, I just wrote it down.  But see for yourself...

Roland

--snip--

=head1 TITLE

Perl should support non-linear text.

=head1 VERSION

  Maintainer: Roland Giersig <[EMAIL PROTECTED]>
  Date: 19 Oct  2000
  Version: 1
  Mailing List: perl6-internals ?
  Number: ?

=head1 ABSTRACT

Right now, Perl performs its magic only upon linear strings of ASCII
and Unicode text. As Ilya Zakharevich has stated in his recent
interview (http://www.perl.com/pub/2000/09/ilya.html), the new feature
that would help today's Perl programmers most would be for Perl to be
able to perform its mighty string operations on marked-up
(non-linear) text consisting of linear chunks of text strings that
carry different attributes.

This could very well be THE new feature that justifies the complete
Perl6 rewrite!

=head1 DESCRIPTION

When Perl first came into being, the world was full of ASCII text,
so Perl became strong in manipulating ASCII text.  But this has changed.
Nowadays even the simplest documents (e.g. mail messages) tend to
be in some marked-up format or other, and programmers worldwide
are struggling to find a way to manipulate those.

To aid these efforts I therefore propose to enhance the string format
used in Perl: non-linear text, consisting of chunks of linear text
(Unicode, of course) that have attributes attached.

Take this HTML for example: 

  Text with a larger letter in it. 

and try to find a way to substitute the word `letter' with `word',
with the outside formatting preserved.

Next to impossible?  I found no easy (but general) way, not even with
HTML::Parser et al.

If perl could handle non-linear strings, this could be done in a
simple s/letter/word/.  Ain't that time-saving!!  For example

  s/(l)etter/${"w":${1:}}ord/

could do the magic (see below for a syntax proposal).

Or, to make formulas more readable:

  s/\b(\w+)^(\d+)/$1${2:raised=>1}/


=head1 IMPLEMENTATION

Ugh, you got me there.  I know very little about Perl internals, so I
can't even pretend to know how to do it.  Maybe Ilya has already
started on a prototype? ;-)

Anyway, the current document parsers (HTML::Parser et al.) already
build non-linear text data structures.  Basically these structs are
lists of strings interspersed with refs to embedded structs (and
attributes) of the same type.  Whether this structure is flexible
enough for most purposes remains to be discussed.

Attributes could be simply stored as hashes, so the chunks would have
hash refs attached.  This sounds rather easy to accomplish.

So, what today is a string would become an array of strings with
attached hashes internally.  This doesn't sound too strange, but
again, this is for others to decide.
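
To make the idea concrete, here is a minimal sketch of such a structure
in today's Perl 5.  The layout (a hash with `attrs' and `parts') and the
helper names chunk and plain_text are assumptions made for this
illustration only; nothing here is an existing Perl internal.

```perl
use strict;
use warnings;

# Hypothetical representation: a chunk is { attrs => {...}, parts => [...] },
# where each part is either a plain string or a nested chunk.
sub chunk {
    my ($attrs, @parts) = @_;
    return { attrs => $attrs, parts => [@parts] };
}

# Concatenate the text of a chunk, ignoring all attributes.
sub plain_text {
    my ($node) = @_;
    return $node unless ref $node;    # a plain string part
    return join '', map { plain_text($_) } @{ $node->{parts} };
}

# An outer chunk of size 10 containing a nested chunk of size 12.
my $bar = chunk({ size => 10 }, chunk({ size => 12 }, 'L'), 'arge');

print plain_text($bar), "\n";       # the underlying linear text
print $bar->{attrs}{size}, "\n";    # attribute of the outer chunk
```

Existing string operations could keep working on what plain_text
returns, while attribute-aware operations would walk the tree.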

=head1 SYNTAX

We need a way to specify attributes to chunks of text in a backward
compatible way.  But how can we specify it in a compact way?  Hmm, as
variable access by name is deprecated anyhow, we could use ${var} to
mean $var and ${"text"} to mean "text".

Now we can use `:' to separate the varname from the attributes:

  ${foo:size} # accesses attribute `size' in variable `foo'

  # set attribute `size'
  ${foo:size} = $fontsize;

  # copy attribute `a1' of text in var `bar' to attribute `a2' in var `foo'
  ${foo:a2} = ${bar:a1};  

  # copy all attributes, but leave text as-is
  ${foo:} = ${bar:};

Now for literal strings with embedded attributes:

  $foo = "just another string";
  ${foo:size} = 12;

or

  $foo =  ${"just another string":size=>10};

This can nest:

  $bar = ${"${"L":size=>12}arge":size=>10};

  ${bar:size} gives 10

How to loop over all chunks? Hmm, seems like split could handle it OK
if the regex engine can match chunk borders. Seems like another
special token is needed.  How about `\C' for chunk?  Or is this
already taken?

  $astring = ${"${"L":size=>12}arge ${"S":size=>8}mall":size=>10};
  foreach my $chunk (split /\C/, $astring) {
    print "$chunk: ${chunk:size}\n";
  }

would print

  L: 12
  arge: 10
  S: 8
  mall: 10
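
The loop above depends on the proposed syntax.  As a sketch of the same
traversal in today's Perl 5, assuming nested chunks are stored as
hashes with `attrs' and `parts' (a layout invented for this example)
and assuming that inner attributes override the ones inherited from the
enclosing chunk:

```perl
use strict;
use warnings;

# Flatten a nested chunk into a list of [text, \%attrs] leaves.
# Inner attributes override inherited ones (an assumption).
sub flatten {
    my ($node, %inherited) = @_;
    return [ $node, { %inherited } ] unless ref $node;
    my %attrs = (%inherited, %{ $node->{attrs} });
    return map { flatten($_, %attrs) } @{ $node->{parts} };
}

my $astring = {
    attrs => { size => 10 },
    parts => [
        { attrs => { size => 12 }, parts => ['L'] },
        'arge ',
        { attrs => { size => 8 }, parts => ['S'] },
        'mall',
    ],
};

# One line per leaf chunk: its text and its effective size.
for my $leaf (flatten($astring)) {
    my ($text, $attrs) = @$leaf;
    print "$text: $attrs->{size}\n";
}
```

The leaves come out in document order with sizes 12, 10, 8 and 10.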


What if an attributed string is split in half?  Well, in that case,
the attributes must be duplicated.

  $foo = ${"no attrib here ${"ATTRIBUTES":size=>12} nothing here":size=>8};
  $firsthalf = substr($foo, 0, length($foo)/2);

should set $firsthalf to

  ${"no attrib here ${"ATTR":size=>12}":size=>8}

and

  substr($foo, length($foo)/2, 14, "really  ${"nothing":attrib=>1}");

should set $foo to

  ${"no attrib here ${"ATTR":size=>12}${"really  ${"nothing":attrib=>1}":} here":size=>8}
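
The attribute duplication for the first-half case can be sketched in
today's Perl 5 as follows.  The flat list of [text, \%attrs] leaves and
the function name chunk_substr are assumptions for this illustration;
only extraction is shown, not the four-argument replacement form.

```perl
use strict;
use warnings;

# Return the leaves covering [$offset, $offset + $length) of the text.
# A leaf cut by the boundary is split and its attributes are copied.
sub chunk_substr {
    my ($leaves, $offset, $length) = @_;
    my ($pos, @out) = (0);
    my $end = $offset + $length;
    for my $leaf (@$leaves) {
        my ($text, $attrs) = @$leaf;
        # Clip the requested range to this leaf's own coordinates.
        my $from = $offset > $pos ? $offset - $pos : 0;
        my $to   = $end < $pos + length($text) ? $end - $pos : length($text);
        push @out, [ substr($text, $from, $to - $from), { %$attrs } ]
            if $to > $from;
        $pos += length $text;
    }
    return @out;
}

my @foo = (
    [ 'no attrib here ', { size => 8 } ],
    [ 'ATTRIBUTES',      { size => 12 } ],
    [ ' nothing here',   { size => 8 } ],
);
my $total = 0;
$total += length $_->[0] for @foo;

my @firsthalf = chunk_substr(\@foo, 0, int($total / 2));
print "[$_->[0]|$_->[1]{size}]" for @firsthalf;
print "\n";
```

The second leaf is cut after `ATTR' and keeps its own copy of the
size attribute, independent of the original.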


Hmm, what about string comparisons?  `eq' and friends should simply
continue to work as usual on the string contents.  Do we need some
kind of meta-eq to be able to compare the attribs also?
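
One possible reading of the two comparisons, sketched in today's Perl 5.
The names text_eq, meta_eq and signature are invented here, and the
choice that meta-eq compares attributes per character, ignoring where
the chunk borders fall, is just one assumption among several possible
ones.

```perl
use strict;
use warnings;

# Leaves are [text, \%attrs] pairs standing in for attributed chunks.
sub text_of { join '', map { $_->[0] } @{ $_[0] } }

# Plain `eq'-style comparison: only the text content matters.
sub text_eq { text_of($_[0]) eq text_of($_[1]) }

# One signature cell per character: the character plus its attributes.
sub signature {
    my ($leaves) = @_;
    my @cells;
    for my $leaf (@$leaves) {
        my ($text, $attrs) = @$leaf;
        my $key = join ',', map { $_ . '=' . $attrs->{$_} } sort keys %$attrs;
        push @cells, map { $_ . '{' . $key . '}' } split //, $text;
    }
    return join "\0", @cells;
}

# Hypothetical meta-eq: text and per-character attributes must match.
sub meta_eq { signature($_[0]) eq signature($_[1]) }

my @a = ( [ 'Large', { size => 10 } ] );
my @b = ( [ 'L', { size => 12 } ], [ 'arge', { size => 10 } ] );
my @c = ( [ 'La', { size => 10 } ], [ 'rge', { size => 10 } ] );

my $same_text  = text_eq(\@a, \@b) ? 'yes' : 'no';
my $same_attrs = meta_eq(\@a, \@b) ? 'yes' : 'no';
my $same_split = meta_eq(\@a, \@c) ? 'yes' : 'no';
print "a eq b: $same_text\n";
print "a meta-eq b: $same_attrs\n";
print "a meta-eq c: $same_split\n";
```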

There are a lot of other issues to work out, but I'd like to first get
some approval from the gurus, so I'll stop here.


=head1 REFERENCES

  http://www.perl.com/pub/2000/09/ilya.html

--snip--