It's worth remembering that the main gain from lex/yacc originally had more to do with making the generated programs fit into the 64K address space of a PDP-11 than with any direct performance efficiency.
-- 
brandon s allbery kf8nh

On Thursday, February 14, 2013 at 6:27 PM, David Thomas wrote:

> (I'll be brief because my head is hurting, but please don't interpret
> that as an intent to offend.)
>
> A few points:
>
> 1) Capture groups are all you need to do some meaningful interpretation
> of data; these were around long before Perl.
>
> 2) Yacc is typically used in conjunction with lex, partly for (a)
> efficiency and partly for (b) ease of use (compared to writing out
> [a-z] as production rules).
>
> 3) I've actually used lex without yacc (well, flex without bison) when
> faced with a language that's regular (and easy enough to express that
> way - cf. an enormous finite subset of a context-free language).
>
> 2b is mostly irrelevant in Haskell, as Parsec already provides
> functions that can easily match the same things a regexp would.
>
> 2a, if it stands up to testing, is the best argument for ripping things
> apart in Haskell using a DFA. Parsec and cousins are efficient, but
> it's hard to beat a single table lookup per character. The questions
> are 1) is the difference enough to matter in many cases, and 2) is
> there a way to get this out of Parsec without touching regexps? (It's
> not impossible that Parsec already recognizes when a language is
> regular, although I'd be weakly surprised.)
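To make point 2a concrete, the matching David describes really is one array
lookup per input character. Below is a minimal, hypothetical sketch of a
table-driven DFA in Haskell that recognizes [a-z]+; the state numbering, the
256-entry byte alphabet, and the names (table, matches) are all assumptions
made for illustration, not anything lex or Parsec actually generates.

    import Data.Array (Array, listArray, (!))
    import Data.Char (ord)

    -- States: 0 = start, 1 = accepting (one or more lowercase
    -- letters seen), 2 = dead.
    type State = Int

    -- Dense transition table indexed by (state, byte).
    table :: Array (State, Int) State
    table = listArray ((0, 0), (2, 255))
        [ next s c | s <- [0 .. 2], c <- [0 .. 255] ]
      where
        next s c
          | s /= 2 && c >= ord 'a' && c <= ord 'z' = 1
          | otherwise                              = 2

    -- One table lookup per character: no backtracking, no closures.
    matches :: String -> Bool
    matches = (== 1) . foldl step 0
      where
        step s c
          | i <= 255  = table ! (s, i)
          | otherwise = 2   -- non-ASCII input falls to the dead state
          where i = ord c

    main :: IO ()
    main = mapM_ (print . matches) ["hello", "Hello", "abc123", ""]

Whether this actually wins over Parsec in practice is exactly the empirical
question David raises.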
> On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton <w...@freegeek.org> wrote:
>
> > On 2/13/13 11:18 PM, wren ng thornton wrote:
> > > On 2/13/13 11:32 AM, Nicolas Bock wrote:
> > > > Since I have very little experience with Haskell and am not used
> > > > to Haskell-think yet, I don't quite understand your statement that
> > > > regexes are seen as foreign to Haskell-think. Could you elaborate?
> > > > What would a more "native" solution look like? From what I have
> > > > learned so far, it seems to me that Haskell is a lot about clear,
> > > > concise, and well-structured code. I find regexes extremely
> > > > compact and powerful, allowing for very concise code, which should
> > > > fit the bill perfectly, or shouldn't it?
> > >
> > > Regexes are powerful and concise for recognizing regular languages.
> > > They are not, however, very good for *parsing* regular languages;
> > > nor can they handle non-regular languages (unless you're relying on
> > > the badness of pcre). In other languages people press regexes into
> > > service for parsing because the alternative is using an external
> > > DSL like lex/yacc, JavaCC, etc. Whereas, in Haskell, we have
> > > powerful and concise tools for parsing context-free languages and
> > > beyond (e.g., parsec, attoparsec).
> >
> > Just to be clear, the problem isn't that proper regexes are only good
> > for regular languages (many files have regular syntax after all). The
> > problem is that regexes are only good for recognition. They're an
> > excellent tool for deciding whether a given string is "good" or
> > "bad"; but they're completely unsuitable for the task of
> > parsing/interpreting a string into some structure or semantic
> > response. If you've ever used tools like yacc or JavaCC, one of the
> > crucial things they offer is the ability to add these semantic
> > responses. Parser combinator libraries in Haskell are similar, since
> > the string processing is integrated into a programming language, so
> > we can say things like:
> >
> >     myParser = do
> >       x <- blah
> >       guard (p x)
> >       y <- blargh
> >       return (f x y)
> >
> > where p and f can be arbitrary Haskell functions. Perl extends
> > regular expressions to try to do things like this, but it's extremely
> > baroque, hard to get right, and impossible to maintain. (N.B., I was
> > raised on Perl and still love it.) And at some point we have to call
> > into question the idea of regexes as an embedded DSL when we then
> > turn around and try to have Perl be a DSL embedded into the regex
> > language.
> >
> > One of the big things that makes regexes so nice is that they
> > identify crucial combinators like choice and repetition. However,
> > once those combinators have been identified, we can just offer them
> > directly as functions in the host language. No need for a special DSL
> > or special syntax. The big trick is doing this efficiently. Parser
> > combinators were an academic curiosity for a long time until Parsec
> > came around and made them efficient. And we've come a long way since
> > then: with things like attoparsec, PEG parsing, and non-monadic
> > applicative parsers (which can perform more optimizations because
> > they can identify the structure of the grammar).
> >
> > The theory of regular expressions is indeed beautiful and elegant.
> > However, it's a theory of recognition, not a theory of parsing; and
> > that's a crucial distinction. Haskell is about clear, concise, and
> > well-structured code; but to be clear, concise, and well-structured
> > we have to choose the right tool for the job.
> >
> > -- 
> > Live well,
> > ~wren
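A small, self-contained sketch of wren's recognition-versus-parsing point,
using Parsec: where a regex like ^[a-z]+=[0-9]+$ can only accept or reject a
line, the parser below hands back a structured value, and choice and
repetition show up as ordinary functions (<|>, many1). The key=value format
and the names Item, KV, and Flag are invented for the example.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A regex can tell you that "answer=42" is well-formed; a parser
    -- can also return the pieces, with semantic actions attached.
    data Item = KV String Int | Flag String
      deriving Show

    item :: Parser Item
    item = try kv <|> flag            -- choice as a plain function
      where
        kv = do
          key <- many1 lower          -- repetition as a plain function
          _   <- char '='
          num <- many1 digit
          return (KV key (read num))  -- the "semantic response"
        flag = Flag <$> many1 lower

    main :: IO ()
    main = do
        print (parse (item <* eof) "" "answer=42")  -- Right (KV "answer" 42)
        print (parse (item <* eof) "" "verbose")    -- Right (Flag "verbose")

And because Parser is a monad, guard from Control.Monad works here exactly as
in wren's fragment above.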