Re: 6.6 plans and status

Chris Kuklewicz Tue, 08 Aug 2006 05:16:36 -0700

Simon Marlow wrote:

Chris Kuklewicz wrote:
That could work well.  It would not involve too much pulling apart.
One small quirk is that there is the old Text.Regex API and a new JRegex-style API.
Is it possible to provide both?  Perhaps deprecating the current API?

It is possible to provide the old and new. The old was only defined for the String type and this probably will not be changed (at least at first).

A "default" backend has to be dependably present. That means either keeping the current Posix backend, adding a dependency on PCRE, or using the Haskell/Parsec backend.
I'm not keen on adding a PCRE dependency. We already include an implementation of POSIX regexes in GHC itself (libraries/base/cbits/regex) which tends to get used on Windows where there isn't an implementation of POSIX regexes.


Ah.  That is how you are doing it.

The problem is that String is very inefficient with Posix or PCRE and ByteString is slightly inefficient with Haskell/Parsec.
Do you have any measurements (rough measurements would be fine)? When you say "very inefficient", by what factor is the Parsec implementation faster than using the Posix one for Strings?

This whole Text.Regex.Lazy project was born from the computer language shootout, see http://haskell.org/hawiki/RegexDna . The Text.Regex(.Posix) that came with GHC timed out (hours!). The pure Haskell/Parsec version took about 2 minutes. That is the meaning of "very inefficient" for repeated use of Text.Regex(.Posix) on String: more than two orders of magnitude, since it is not caching the CString that it marshals.
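
To make the marshalling cost concrete, here is a rough sketch. It is not the actual Text.Regex.Posix code: scanBuffer is a made-up stand-in for a C matcher, and scanEachTime / scanMarshalOnce are names invented for this illustration. The only point is that converting the String to a CString on every call repeats work that could be done once.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.Marshal.Array (lengthArray0)

-- Hypothetical stand-in for a C routine such as a regex matcher:
-- it only walks the NUL-terminated buffer and reports its length.
scanBuffer :: CString -> IO Int
scanBuffer = lengthArray0 0

-- Marshals the whole String into a fresh C buffer on every call,
-- so n scans cost n conversions of the String.
scanEachTime :: Int -> String -> IO Int
scanEachTime n s = sum <$> mapM (\_ -> withCString s scanBuffer) [1 .. n]

-- Marshals once and reuses the same buffer for all n scans.
scanMarshalOnce :: Int -> String -> IO Int
scanMarshalOnce n s =
  withCString s $ \buf -> sum <$> mapM (\_ -> scanBuffer buf) [1 .. n]
```

Both functions compute the same total; the difference is only in how often the String is copied into C memory, which is what dominates when the input is large and searched many times.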

If we were to use the Parsec implementation, that pulls in another dependency. Not out of the question, but to be avoided if possible.

The only nonparsec/nonlibrary version is a simple DFA which is too simple for many uses. To get what people expect from regular expressions you need the posix library, the pcre library, my parsec parser, or someone else's regex implementation in Haskell. Or the parsec version could eventually be rewritten to not depend on parsec by implementing its own parser monad.

To keep a Posix default backend the libraries/base/cbits/regex may need to become part of regex-posix. That would be a learning curve for me as I have no ghc on windows experience, though I have a computer for it next to me. So I might need help later for that.

So we could either:


  - work on regex-base/regex-posix for inclusion in GHC, or


I could prepare this for you.


Great, thanks!


The re-organization is in progress (hooray for "darcs mv").
After re-organization will come the doc/Haddock clean up to match.
After that comes the unit testing clean up (I have some HUnit and QuickCheck 
now).
Then, time permitting, benchmarks.

I'll assemble a version organized like that this week. Important question: Should I be planning to install alongside the current Text.Regex(.Posix) or planning on replacing them? (With an identical API)?
We want to replace Text.Regex. So ideally you want to do this in a GHC tree, so you can remove the old Text.Regex and replace with yours. If this is too difficult, then you could develop it separately (as Text.Regex.New, or something), and I'll make the relevant changes when I import it.

I will make such a Text.Regex.New that fakes the old API. I'll make it use the posix backend, but that can be changed via an import statement.
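
Roughly, the shape of such a compatibility layer could look like the sketch below. It is only an illustration under assumptions: the backend here is a toy that matches a literal substring (backendCompile and backendMatch are invented names, not part of regex-posix), and everything sits in one file, whereas in a real layout the backend would be its own module selected by a single import line. The three facade functions keep the type signatures of the old Text.Regex API.

```haskell
import Data.List (isPrefixOf)

-- Toy backend (hypothetical): a "regex" is just a literal substring.
-- A real backend module would export the same two names.
newtype Regex = Regex String

backendCompile :: String -> Regex
backendCompile = Regex

-- First occurrence of the literal, split as (before, matched, after).
backendMatch :: Regex -> String -> Maybe (String, String, String)
backendMatch (Regex pat) = go []
  where
    go acc rest
      | pat `isPrefixOf` rest = Just (reverse acc, pat, drop (length pat) rest)
      | otherwise = case rest of
          []       -> Nothing
          (c : cs) -> go (c : acc) cs

-- Facade with the old Text.Regex signatures, written only in terms
-- of the backend functions above.
mkRegex :: String -> Regex
mkRegex = backendCompile

matchRegexAll :: Regex -> String -> Maybe (String, String, String, [String])
matchRegexAll r s = do
  (before, matched, after) <- backendMatch r s
  -- The toy backend has no subexpressions, so the capture list is empty.
  Just (before, matched, after, [])

matchRegex :: Regex -> String -> Maybe [String]
matchRegex r s = fmap (\(_, _, _, subs) -> subs) (matchRegexAll r s)
```

Because the facade never looks inside Regex, swapping the backend would not change any code that uses mkRegex, matchRegex or matchRegexAll.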

I suggest removing the old Text.Regex.Posix module. People will be able to make better use of the new API for doing this.


--
Chris

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
