> I've been working on a project that requires me to do screen scraping.

If you are screen scraping HTML I think tagsoup is a very good choice.
Using tagsoup means that you have a real HTML 5 compliant parser
underneath, and then you can use whatever technique you wish to split
up the page text - regular expressions or parsec might be a reasonable
choice there. I've written lots of screen scraping code with tagsoup,
and it's usually very easy - the manual even walks you through a couple
of examples:

http://community.haskell.org/~ndm/darcs/tagsoup/tagsoup.htm

> He's very experienced, and comes from
> a Perl perspective. I let him into what I was doing, and he opined I
> should be using pcre.

When all you have is a hammer, everything looks like a thumb.
Structured manipulation of algebraic data types is trivial in Haskell,
and much less natural in Perl, so the two communities use different
techniques in different places.

> So now I'm second guessing my choices. Why do
> people choose not to use regex for uri parsing?

If you mean HTML parsing, then it's because it's a nightmare to get
right, and people on the web do all kinds of crazy stuff. A correct
regular expression to match an HTML tag is a lot of work. Given that
it's a solved problem, why go to all that effort? It is possible to do
with regular expressions, but not pleasant.

Thanks,

Neil
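
To make the tagsoup suggestion above concrete, here is a minimal sketch rather than anything taken from the manual. It assumes the tagsoup package is installed; the name linkTargets and the sample markup are made up for illustration. It parses a deliberately sloppy fragment into a flat list of tags and pulls out the href of each anchor, with no regular expression involved.

```haskell
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- Hypothetical helper (not part of tagsoup): collect the href of every
-- opening <a> tag in a page, skipping anchors that carry no href.
linkTargets :: String -> [String]
linkTargets html =
  [ href
  | tag <- parseTags html          -- tokenise the page into a [Tag String]
  , isTagOpenName "a" tag          -- keep only opening <a ...> tags
  , let href = fromAttrib "href" tag
  , not (null href)                -- fromAttrib yields "" when the attribute is absent
  ]

main :: IO ()
main = mapM_ putStrLn (linkTargets sample)
  where
    -- Made-up input: mixed quoting styles, an anchor without href,
    -- and a paragraph that is never closed.
    sample =
      "<p>See <a href='http://example.com/a'>one</a>, "
        ++ "<a name=top>nothing</a> and "
        ++ "<a href=\"http://example.com/b\">two</a><p>unclosed"
```

Once the page is a list of tags, anything left inside an individual text node can still be picked apart with a regular expression or parsec if that is more convenient.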