Re: To get things started...

2000-11-24 Thread Nicholas Clark

On Fri, Nov 24, 2000 at 01:01:29AM -0500, Sam Tregar wrote:
 On Wed, 22 Nov 2000, Dan Sugalski wrote:
 
  Probably the easiest thing is to implement some sort of file-tied scalar or
  something that can provide bytes to the regex engine until it stops asking
  for them. Some magic or other, though, will get us what we need.
 
 That might be the easiest thing for us - as internals programmers - but
 does it answer the general need?  Everyone writing regex-based parsers
 faces this problem.  Maybe this is something to toss to perl6-language and
 get some RFC'd Larry-fried syntax?

I think Dan was suggesting that the (user-side) regex doesn't change at all,
so there's no new syntax there. It's just that perl's innards gain a tied
scalar that doesn't actually read in and buffer the file immediately, but
defers that for as long as it can get away with, and that the regex engine
knows about these lazy scalars and provokes a read-more when needed.

But maybe being able to write, explicitly,

$file_handle =~ /(ba*)/;

and have it DWIM could be somewhat useful
(except that if the match fails, the data it read isn't buffered
anywhere obvious, unless there's collusion between PerlIO and the regexp
engine).
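Perl 5's regex engine can't pull more input on its own, so the nearest thing today is to do the refilling by hand. A minimal sketch, using a hypothetical helper `lazy_match`: it keeps every byte read in a single buffer (so a failed match still leaves the data somewhere obvious) and reads another chunk only while the answer could still change, i.e. there is no match yet, or the match runs right up to the end of what has been buffered. That stopping rule is an assumption that holds for simple patterns like `/(ba*)/`; with lookahead or some alternations, a match ending short of the buffer's end can still depend on input not yet read.

```perl
use strict;
use warnings;

# Hypothetical helper, not a real perl API: match $re against $fh,
# reading more input only while the result could still change.
# Assumes a simple pattern such as qr/(ba*)/ (see caveat above).
sub lazy_match {
    my ($fh, $re, $chunk) = @_;
    $chunk ||= 4;                 # tiny default so refills actually happen
    my $buf = '';                 # everything read so far is kept here
    my $eof = 0;
    while (1) {
        if ($buf =~ $re) {
            my ($end, $cap) = ($+[0], $1);
            # A match touching the end of the buffer might grow with more input
            return ($cap, \$buf) if $eof || $end < length $buf;
        }
        elsif ($eof) {
            return (undef, \$buf);    # failed, but the data is still buffered
        }
        # Append the next chunk at the end of the buffer (4-arg read)
        my $got = read($fh, $buf, $chunk, length $buf);
        die "read failed: $!" unless defined $got;
        $eof = 1 if $got == 0;
    }
}
```

Called as `my ($cap, $buf) = lazy_match($fh, qr/(ba*)/);`, `$cap` holds the capture (or `undef` on failure) and `$$buf` holds everything read so far, so a caller can carry on parsing from the buffer rather than from the handle.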

 Also, a nagging question - how does a regex-based parser work without
 ending up reading the entire file into memory most of the time?  Even with
 an intelligent tied-scalar reading bytes there are going to be failing cases
 where the regex has to walk to the end of the "string" to find out it
 failed.  Presumably it would also need to seek back to the start which
 means we'd have to buffer as we go.

I don't think that this differs from the current parser. If it encounters
an opening " but never a closing ", it will read and buffer to the end of
the file before realising that there's a problem (because, strictly, there
isn't a problem until EOF is encountered before the closing ").

I'm not certain there's anything that can actually be done to avert the need
to buffer a lot of script in these situations. You mustn't attempt to seek
on the script filehandle, as it might be something unseekable such as a
pipe (or a socket: BEGIN {socket STDIN...}).
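To make the unterminated-string case concrete, here is a sketch with a hypothetical helper `scan_string_literal`. It assumes the handle is positioned at an opening `"` and, for simplicity, ignores backslash escapes. Because the handle may be a pipe or socket, it never seeks and has to keep every chunk it reads; when the closing quote never arrives it only gives up at EOF, by which point the whole remainder of the input is sitting in the buffer.

```perl
use strict;
use warnings;

# Hypothetical sketch: find the end of a "..." literal on a stream that
# cannot be rewound, so every byte read must be retained in $buf.
# Assumes the stream starts at the opening quote; escapes are ignored.
sub scan_string_literal {
    my ($fh, $chunk) = @_;
    $chunk ||= 4;
    my $buf = '';
    while (1) {
        # Opening quote, any non-quote characters, then the closing quote.
        # On success $buf may also hold bytes past the closing quote.
        return (1, $buf) if $buf =~ /^"[^"]*"/;
        my $got = read($fh, $buf, $chunk, length $buf);
        die "read failed: $!" unless defined $got;
        # EOF before the closing quote: only now is it known to be an
        # error, and everything up to EOF has already been buffered
        return (0, $buf) if $got == 0;
    }
}
```

With a closing quote present it stops as soon as the literal is complete; without one, the returned buffer is the entire rest of the input, which is exactly the cost described above.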

Nicholas Clark



Re: To get things started...

2000-11-24 Thread Sam Tregar

On Fri, 24 Nov 2000, Nicholas Clark wrote:

 I think Dan was suggesting that the (user-side) regex doesn't change at all,
 so there's no new syntax there. It's just that perl's innards gain a tied
 scalar that doesn't actually read in and buffer the file immediately, but
 defers that for as long as it can get away with, and that the regex engine
 knows about these lazy scalars and provokes a read-more when needed.

Right.  And I was suggesting that while this might solve our problem it
wouldn't do much for all the other people that have to solve the same
problem.  I'd like to see a general solution accessible from Perl.  If
that solution is some tied-scalar magic, fine.  If it's more involved than
that (and I think it will be), then we'll need to think about the syntax a
bit.

 I don't think that this differs from the current parser. If it encounters
 an opening " but never a closing ", it will read and buffer to the end of
 the file before realising that there's a problem (because, strictly, there
 isn't a problem until EOF is encountered before the closing ").

 I'm not certain there's anything that can actually be done to avert the need
 to buffer a lot of script in these situations. You mustn't attempt to seek
 on the script filehandle, as it might be something unseekable such as a
 pipe (or a socket: BEGIN {socket STDIN...}).

I suppose that's true.  I was imagining something less extreme than the
absolute failure case of a missing closing ".  I mean a failure that
is recoverable but still requires running the regex to the end of the
"string" to find that out.  Are there any like this?  Perhaps not.

Perhaps this just isn't a reasonable criticism of regex parsers, since
normal parsers do it all the time anyway!

-sam





Re: SvPV*

2000-11-24 Thread Bart Lateur

On Fri, 24 Nov 2000 08:54:43 +0100, Roland Giersig wrote:

 Maybe the title should be:

 "Perl should use XML as its basic data type instead of linear strings"

Horrible.

I kinda liked your original proposal. But you should NOT focus on XML.
That leaves out too many other possible data sources: RTF, for example,
or TeX. What they have in common is that they are marked-up text in the
form of a tree, i.e. properly nested.

The internal structure could well be one that is easily representable as XML.

I do think that the term "non-linear text" is absolutely unclear.

-- 
Bart.