Simon Marlow wrote:
Chris Kuklewicz wrote:

That could work well.  It would not involve too much pulling apart.

One small quirk is that there is the old Text.Regex API and a new JRegex-style API.

Is it possible to provide both?  Perhaps deprecating the current API?

It is possible to provide the old and new. The old was only defined for the String type and this probably will not be changed (at least at first).

A "default" backend has to be dependably present. That means either keeping the current Posix backend, adding a dependency on PCRE, or using the Haskell/Parsec backend.

I'm not keen on adding a PCRE dependency. We already include an implementation of POSIX regexes in GHC itself (libraries/base/cbits/regex), which tends to get used on Windows, where there isn't a system implementation of POSIX regexes.

Ah.  That is how you are doing it.

The problem is that String is very inefficient with Posix or PCRE and ByteString is slightly inefficient with Haskell/Parsec.

Do you have any measurements (rough measurements would be fine)? When you say "very inefficient", by what factor is the Parsec implementation faster than using the Posix one for Strings?

This whole Text.Regex.Lazy project was born from the computer language shootout: http://haskell.org/hawiki/RegexDna . The Text.Regex(.Posix) that came with GHC timed out (hours!). The pure haskell/parsec version took about 2 minutes. That is the meaning of "very inefficient" for repeated use of Text.Regex(.Posix) on String: more than two orders of magnitude, since it is not caching the CString that it marshals.
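
To make the marshalling point concrete, here is a minimal sketch of the difference between re-marshalling a String on every call and marshalling it once. It is not the Text.Regex(.Posix) code: strlen stands in for the C regex call purely as an assumption for illustration, and queryRemarshal and queryCached are hypothetical names.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Stand-in for a C regex call. strlen is used only because it is always
-- available and takes a CString; it is NOT what the regex binding calls.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Copies the whole String into a fresh C buffer for each of the n calls,
-- so every query pays O(length s) for marshalling.
queryRemarshal :: String -> Int -> IO [Int]
queryRemarshal s n =
  mapM (\_ -> withCString s (fmap fromIntegral . c_strlen)) [1 .. n]

-- Marshals once and reuses the same buffer for all n calls.
queryCached :: String -> Int -> IO [Int]
queryCached s n =
  withCString s $ \cs -> mapM (\_ -> fromIntegral <$> c_strlen cs) [1 .. n]
```

Both functions return the same results; only the number of String-to-CString copies differs, which is where repeated matching against a large input gets expensive.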


If we were to use the Parsec implementation, that pulls in another dependency. Not out of the question, but to be avoided if possible.

The only non-parsec, non-library version is a simple DFA, which is too simple for many uses. To get what people expect from regular expressions you need the posix library, the pcre library, my parsec parser, or someone else's regex implementation in haskell. Or the parsec version could eventually be rewritten to not depend on parsec by implementing its own parser monad.
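
As a rough idea of how small "its own parser monad" could be, here is a sketch that depends only on base. It is an assumption about the shape such a replacement might take, not code from Text.Regex.Lazy; Parser, satisfy, char and abcd are hypothetical names. It is deterministic (Maybe-based, no backtracking into a finished "many"), so a real regex backend would need more than this.

```haskell
import Control.Applicative (Alternative (..))

-- A parser consumes a prefix of the input and returns a result plus the rest.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (f a) rest

-- Left-biased choice: try the second parser only if the first fails.
instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    r       -> r

-- Accept one character that passes the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _               -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- Hand-written matcher corresponding to the regex  a(b|c)*d
abcd :: Parser String
abcd = do
  a   <- char 'a'
  mid <- many (char 'b' <|> char 'c')
  d   <- char 'd'
  pure (a : mid ++ [d])
```

For example, runParser abcd "abcbd!" yields Just ("abcbd", "!"), while runParser abcd "abx" yields Nothing.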

To keep a Posix default backend, libraries/base/cbits/regex may need to become part of regex-posix. That would be a learning curve for me, as I have no experience with ghc on windows, though I have a computer for it next to me. So I might need help with that later.

So we could either:

  - work on regex-base/regex-posix for inclusion in GHC, or

I could prepare this for you.

Great, thanks!

The re-organization is in progress (hooray for "darcs mv").
After re-organization will come the doc/Haddock clean up to match.
After that comes the unit testing clean up (I have some HUnit and QuickCheck now).
Then, time permitting, benchmarks.

I'll assemble a version organized like that this week. Important question: should I be planning to install alongside the current Text.Regex(.Posix), or planning on replacing them (with an identical API)?

We want to replace Text.Regex. So ideally you want to do this in a GHC tree, so you can remove the old Text.Regex and replace with yours. If this is too difficult, then you could develop it separately (as Text.Regex.New, or something), and I'll make the relevant changes when I import it.

I will make such a Text.Regex.New that fakes the old API. I'll make it use the posix backend, but that can be changed via an import statement.

I suggest removing the old Text.Regex.Posix module. People will be able to make better use of the new API for doing this.

--
Chris

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users