It's worth remembering that the main gain from lex/yacc originally had more to do with making the generated programs fit into the 64K address space of a PDP-11 than with any direct performance efficiency.
-- 
brandon s allbery kf8nh

On Thursday, February 14, 2013 at 6:27 PM, David Thomas wrote:

> (I'll be brief because my head is hurting, but please don't interpret
> that as an intent to offend.)
>
> A few points:
>
> 1) Capture groups are all you need to do some meaningful interpretation
> of data; these were around long before Perl.
>
> 2) Yacc is typically used in conjunction with lex, partly for (a)
> efficiency and partly for (b) ease of use (compared to writing out
> [a-z] as production rules).
>
> 3) I've actually used lex without yacc (well, flex without bison) when
> faced with a language that's regular (and easy enough to express that
> way - cf. an enormous finite subset of a context-free language).
>
> 2b is mostly irrelevant in Haskell, as Parsec already provides
> functions that can easily match the same things a regexp would.
>
> 2a, if it stands up to testing, is the best argument for ripping things
> apart in Haskell using a DFA. Parsec and cousins are efficient, but
> it's hard to beat a single table lookup per character. The questions
> are 1) is the difference enough to matter in many cases, and 2) is
> there a way to get this out of Parsec without touching regexps? (It's
> not impossible that Parsec already recognizes when a language is
> regular, although I'd be weakly surprised.)
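To make point 2a concrete, the matching David describes really is one array
lookup per input character. Below is a minimal, hypothetical sketch of a
table-driven DFA in Haskell that recognizes [a-z]+; the state numbering, the
256-entry byte alphabet, and the names (table, matches) are all assumptions
made for illustration, not anything lex or Parsec actually generates.

    import Data.Array (Array, listArray, (!))
    import Data.Char (ord)

    -- States: 0 = start, 1 = accepting (one or more lowercase
    -- letters seen), 2 = dead.
    type State = Int

    -- Dense transition table indexed by (state, byte).
    table :: Array (State, Int) State
    table = listArray ((0, 0), (2, 255))
        [ next s c | s <- [0 .. 2], c <- [0 .. 255] ]
      where
        next s c
          | s /= 2 && c >= ord 'a' && c <= ord 'z' = 1
          | otherwise                              = 2

    -- One table lookup per character: no backtracking, no closures.
    matches :: String -> Bool
    matches = (== 1) . foldl step 0
      where
        step s c
          | i <= 255  = table ! (s, i)
          | otherwise = 2   -- non-ASCII input falls to the dead state
          where i = ord c

    main :: IO ()
    main = mapM_ (print . matches) ["hello", "Hello", "abc123", ""]

Whether this actually wins over Parsec in practice is exactly the empirical
question David raises.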
> On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton <w...@freegeek.org> wrote:
>
> > On 2/13/13 11:18 PM, wren ng thornton wrote:
> > > On 2/13/13 11:32 AM, Nicolas Bock wrote:
> > > > Since I have very little experience with Haskell and am not used
> > > > to Haskell-think yet, I don't quite understand your statement that
> > > > regexes are seen as foreign to Haskell-think. Could you elaborate?
> > > > What would a more "native" solution look like? From what I have
> > > > learned so far, it seems to me that Haskell is a lot about clear,
> > > > concise, and well-structured code. I find regexes extremely
> > > > compact and powerful, allowing for very concise code, which should
> > > > fit the bill perfectly, or shouldn't it?
> > >
> > > Regexes are powerful and concise for recognizing regular languages.
> > > They are not, however, very good for *parsing* regular languages;
> > > nor can they handle non-regular languages (unless you're relying on
> > > the badness of pcre). In other languages people press regexes into
> > > service for parsing because the alternative is using an external
> > > DSL like lex/yacc, JavaCC, etc. Whereas, in Haskell, we have
> > > powerful and concise tools for parsing context-free languages and
> > > beyond (e.g., parsec, attoparsec).
> >
> > Just to be clear, the problem isn't that proper regexes are only good
> > for regular languages (many files have regular syntax after all). The
> > problem is that regexes are only good for recognition. They're an
> > excellent tool for deciding whether a given string is "good" or
> > "bad"; but they're completely unsuitable for the task of
> > parsing/interpreting a string into some structure or semantic
> > response. If you've ever used tools like yacc or JavaCC, one of the
> > crucial things they offer is the ability to add these semantic
> > responses. Parser combinator libraries in Haskell are similar, since
> > the string processing is integrated into a programming language, so
> > we can say things like:
> >
> >     myParser = do
> >       x <- blah
> >       guard (p x)
> >       y <- blargh
> >       return (f x y)
> >
> > where p and f can be arbitrary Haskell functions. Perl extends
> > regular expressions to try to do things like this, but it's extremely
> > baroque, hard to get right, and impossible to maintain. (N.B., I was
> > raised on Perl and still love it.) And at some point we have to call
> > into question the idea of regexes as an embedded DSL when we then
> > turn around and try to have Perl be a DSL embedded into the regex
> > language.
> >
> > One of the big things that makes regexes so nice is that they
> > identify crucial combinators like choice and repetition. However,
> > once those combinators have been identified, we can just offer them
> > directly as functions in the host language. No need for a special DSL
> > or special syntax. The big trick is doing this efficiently. Parser
> > combinators were an academic curiosity for a long time until Parsec
> > came around and made them efficient. And we've come a long way since
> > then: with things like attoparsec, PEG parsing, and non-monadic
> > applicative parsers (which can perform more optimizations because
> > they can identify the structure of the grammar).
> >
> > The theory of regular expressions is indeed beautiful and elegant.
> > However, it's a theory of recognition, not a theory of parsing; and
> > that's a crucial distinction. Haskell is about clear, concise, and
> > well-structured code; but to be clear, concise, and well-structured
> > we have to choose the right tool for the job.
> >
> > -- 
> > Live well,
> > ~wren
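A small, self-contained sketch of wren's recognition-versus-parsing point,
using Parsec: where a regex like ^[a-z]+=[0-9]+$ can only accept or reject a
line, the parser below hands back a structured value, and choice and
repetition show up as ordinary functions (<|>, many1). The key=value format
and the names Item, KV, and Flag are invented for the example.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A regex can tell you that "answer=42" is well-formed; a parser
    -- can also return the pieces, with semantic actions attached.
    data Item = KV String Int | Flag String
      deriving Show

    item :: Parser Item
    item = try kv <|> flag            -- choice as a plain function
      where
        kv = do
          key <- many1 lower          -- repetition as a plain function
          _   <- char '='
          num <- many1 digit
          return (KV key (read num))  -- the "semantic response"
        flag = Flag <$> many1 lower

    main :: IO ()
    main = do
        print (parse (item <* eof) "" "answer=42")  -- Right (KV "answer" 42)
        print (parse (item <* eof) "" "verbose")    -- Right (Flag "verbose")

And because Parser is a monad, guard from Control.Monad works here exactly as
in wren's fragment above.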