Simon Marlow wrote:
Chris Kuklewicz wrote:
That could work well. It would not involve too much pulling apart.
One small quirk is that there is the old Text.Regex API and a new
JRegex-style API.
Is it possible to provide both? Perhaps deprecating the current API?
It is possible to provide the old and new. The old was only defined for the
String type and this probably will not be changed (at least at first).
A "default" backend has to be dependably present. That means either
keeping the current Posix backend, adding a dependency on PCRE, or
using the Haskell/Parsec backend.
I'm not keen on adding a PCRE dependency. We already include an
implementation of POSIX regexes in GHC itself
(libraries/base/cbits/regex) which tends to get used on Windows where
there isn't an implementation of POSIX regexes.
Ah. That is how you are doing it.
The problem is that String is very inefficient with Posix or PCRE and
ByteString is slightly inefficient with Haskell/Parsec.
Do you have any measurements (rough measurements would be fine)? When
you say "very inefficient", by what factor is the Parsec implementation
faster than using the Posix one for Strings?
This whole Text.Regex.Lazy project was born from the computer language shootout,
http://haskell.org/hawiki/RegexDna . The Text.Regex(.Posix) that came with
GHC timed out (hours!). The pure haskell/parsec version took about 2 minutes.
That is the meaning of "very inefficient" for repeated use of Text.Regex(.Posix) on
String: more than two orders of magnitude, since it is not caching the CString
that it marshals.
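
A rough sketch of the marshalling cost described above, not the actual
Text.Regex(.Posix) code. It assumes the C library's strlen as a stand-in for
the regex call, and the names searchRemarshal and searchCached are made up
for illustration. The first version copies the String into a fresh CString
on every call; the second marshals once and reuses the buffer.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Stand-in for a C regex routine (assumption): strlen only walks the buffer.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Marshals the String on every call, so n searches cost n copies.
searchRemarshal :: String -> Int -> IO Int
searchRemarshal input n =
  sum <$> mapM (\_ -> withCString input (fmap fromIntegral . c_strlen)) [1 .. n]

-- Marshals once and reuses the same buffer for all n calls.
searchCached :: String -> Int -> IO Int
searchCached input n =
  withCString input $ \buf ->
    sum <$> mapM (\_ -> fromIntegral <$> c_strlen buf) [1 .. n]
```

Both functions give the same answer; only the number of String-to-CString
copies differs, which is the cost that dominates on long inputs.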
If we were to use the Parsec implementation, that pulls in another
dependency. Not out of the question, but to be avoided if possible.
The only nonparsec/nonlibrary version is a simple DFA which is too simple for
many uses. To get what people expect from regular expressions you need the posix
library, the pcre library, my parsec parser, or someone else's regex
implementation in haskell. Or the parsec version could eventually be rewritten
to not depend on parsec by implementing its own parser monad.
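
For a sense of scale, a self-contained parser monad can be quite small. This
is only a sketch using base, not the code in Text.Regex.Lazy; Parser, satisfy,
char and bounds are illustrative names, and the example parses just a
repetition bound such as "{2,5}".

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

-- A parser consumes a prefix of the input or fails with Nothing.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (f a) rest

-- Left-biased choice: try the second parser only if the first fails.
instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    r       -> r

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _               -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- Example: the bounds of a repetition, e.g. "{2,5}".
bounds :: Parser (Int, Int)
bounds = do
  _  <- char '{'
  lo <- some (satisfy isDigit)
  _  <- char ','
  hi <- some (satisfy isDigit)
  _  <- char '}'
  pure (read lo, read hi)
```

This has no error messages and no backtracking control, so it is less than
what parsec provides, but it removes the dependency.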
To keep a Posix default backend the libraries/base/cbits/regex may need to
become part of regex-posix. That would be a learning curve for me as I have no
ghc on windows experience, though I have a computer for it next to me. So I
might need help later for that.
So we could either:
- work on regex-base/regex-posix for inclusion in GHC, or
I could prepare this for you.
Great, thanks!
The re-organization is in progress (hooray for "darcs mv").
After re-organization will come the doc/Haddock clean up to match.
After that comes the unit testing clean up (I have some HUnit and QuickCheck
now).
Then, time permitting, benchmarks.
I'll assemble a version organized like that this week. Important
question:
Should I be planning to install alongside the current
Text.Regex(.Posix) or planning on replacing them (with an identical
API)?
We want to replace Text.Regex. So ideally you want to do this in a GHC
tree, so you can remove the old Text.Regex and replace with yours. If
this is too difficult, then you could develop it separately (as
Text.Regex.New, or something), and I'll make the relevant changes when I
import it.
I will make such a Text.Regex.New that fakes the old API. I'll make it use the
posix backend, but that can be changed via an import statement.
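
Roughly what I have in mind, as a sketch only. mkRegex and matchRegex mirror
the old Text.Regex names; Backend, literalBackend and defaultBackend are
hypothetical, and the toy backend matches literal substrings rather than real
posix patterns. In the real layout the backend would be picked by an import;
in this single file the defaultBackend binding plays that role.

```haskell
import Data.List (isInfixOf)

-- Hypothetical backend: compiling a pattern yields a matcher that returns
-- the captured subexpressions on success.
newtype Backend = Backend { compileWith :: String -> (String -> Maybe [String]) }

-- Toy stand-in for the posix backend: literal substring search, no captures.
literalBackend :: Backend
literalBackend = Backend $ \pat s ->
  if pat `isInfixOf` s then Just [] else Nothing

-- The one place to change when switching backends.
defaultBackend :: Backend
defaultBackend = literalBackend

-- Old-style API, defined only on String.
newtype Regex = Regex (String -> Maybe [String])

mkRegex :: String -> Regex
mkRegex = Regex . compileWith defaultBackend

matchRegex :: Regex -> String -> Maybe [String]
matchRegex (Regex m) = m
```

Code written against mkRegex and matchRegex would not need to change when the
backend behind them does.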
I suggest removing the old Text.Regex.Posix module. People will be able to make
better use of the new API for doing this.
--
Chris