[Haskell] major speed improvement: regex-tdfa reaches 1.0.0

2009-03-01 Thread ChrisK

Announcing the version 1.0.0 release of regex-tdfa.

I am proud of this release.
This is not just a bug fix release.
It is a serious improvement in the asymptotic running time.

The previous versions allowed bad combinations of pattern and searched
text length to scale badly in the length of the text.  Previously the
worst case for text of length N was O(N^3).

The new worst case asymptotic runtime scaled as O(N).
There is never any backtracking.
And the worst case storage space is independent of N.

The package is on hackage at
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-tdfa
The darcs repository is at
http://darcs.haskell.org/packages/regex-unstable/regex-tdfa/

All non-overlapping matches are found and returned, along with all
captured (parenthesized) subexpressions.  The result is precisely what
Posix extended regular expressions are supposed to return.

To be concrete, consider example text with length of N of (2^n)+2:

> longInput = replicate (2^n) 'a' ++ "bc"

And define 4 worst-case-scenario patterns.  I wrote the code and I
know how to kill it:

> wcs0 = "a*b"
> wcs1 = "a*c"
> wcs2 = "(a|aa)*b"
> wcs3 = "(a|aa)*c"

wcs0 is easy.
wcs1 causes the old code to backtrack.
wcs2 causes the old code's storage to scale as O(N).
wcs3 causes both backtracking and O(N) storage with the old code.

The old code's time scales as O(N) for wcs0, O(N^2) for wcs1 and wcs2,
and O(N^3) for wcs3.  The new code is always O(N).  The actual timings
for the old code on my G4 laptop for wcs on 2^8 and 2^9 and 2^10 are:

Reason:compare-tdfa chrisk$ time ./Test-TDFA-np wcs3 8 +RTS -sstderr 2>&1  | 
head -4
./Test-TDFA-np wcs3 8 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(257,1)),(1,(-1,0))]]
 263,776,756 bytes allocated in the heap
user0m1.017s
sys 0m0.058s

Reason:compare-tdfa chrisk$ time ./Test-TDFA-np wcs3 9 +RTS -sstderr 2>&1  | 
head -4
./Test-TDFA-np wcs3 9 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(513,1)),(1,(-1,0))]]
   1,998,647,452 bytes allocated in the heap
user0m7.083s
sys 0m0.289s

Reason:compare-tdfa chrisk$ time ./Test-TDFA-np wcs3 10 +RTS -sstderr 2>&1  | 
head -4

./Test-TDFA-np wcs3 10 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(1025,1)),(1,(-1,0))]]
  15,653,277,200 bytes allocated in the heap
user0m53.350s
sys 0m2.056s

The doubling of length is causing a nearly eight-fold time increase.
The heap allocation is also increasing nearly eight-fold.

The new code with the same input pattern and texts gives:

Reason:compare-tdfa chrisk$ time ./Test-TDFA2-0.99.19-np2 wcs3 8 +RTS -sstderr 
2>&1  | head -4

./Test-TDFA2-0.99.19-np2 wcs3 8 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(257,1)),(1,(-1,0))]]
   2,135,324 bytes allocated in the heap
user0m0.017s
sys 0m0.017s

Reason:compare-tdfa chrisk$ time ./Test-TDFA2-0.99.19-np2 wcs3 9 +RTS -sstderr 
2>&1  | head -4

./Test-TDFA2-0.99.19-np2 wcs3 9 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(513,1)),(1,(-1,0))]]
   3,588,656 bytes allocated in the heap
user0m0.024s
sys 0m0.017s

Reason:compare-tdfa chrisk$ time ./Test-TDFA2-0.99.19-np2 wcs3 10 +RTS -sstderr 
2>&1  | head -4

./Test-TDFA2-0.99.19-np2 wcs3 10 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(1025,1)),(1,(-1,0))]]
   6,345,436 bytes allocated in the heap
user0m0.038s
sys 0m0.018s

Note that the heap allocation for the 1026 character example above is
2466 times less than the old code.

That was too fast to prove the scaling, so take more input:

Reason:compare-tdfa chrisk$ time ./Test-TDFA2-0.99.19-np2 wcs3 20 +RTS -sstderr 
2>&1  | head -4

./Test-TDFA2-0.99.19-np2 wcs3 20 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(1048577,1)),(1,(-1,0))]]
   5,708,574,848 bytes allocated in the heap
user0m26.023s
sys 0m0.985s

Reason:compare-tdfa chrisk$ time ./Test-TDFA2-0.99.19-np2 wcs3 21 +RTS -sstderr 
2>&1  | head -4

./Test-TDFA2-0.99.19-np2 wcs3 21 +RTS -sstderr
Test for [Char]
Right [array (0,1) [(0,(2097153,1)),(1,(-1,0))]]
  11,416,354,056 bytes allocated in the heap
user0m52.656s
sys 0m1.985s

The length and time both doubled, as did the heap allocation.  And the
new code has searched two million characters in the time the old code
searched one thousand.

How about away from the worst case scenario?  On the testing suite the
new code is running slightly slower:

Reason:compare-tdfa chrisk$ time ./Test-TDFA-np -r 1 100 > /dev/null
user0m4.841s
sys 0m3.019s

Reason:compare-tdfa chrisk$ time ./Test-TDFA2-0.99.19-np2 -r 1 100 > /dev/null
user0m5.970s
sys 0m3.012s

So that is an increase of execution time of 14%.  This small dip in
performance might be reclaimable with more optimization.  I think the
gain in worst case performance already offsets the slight cost.

The code for St

[Haskell] ANN: bug fix for regex-tdfa, version 0.97.4 (and "regex-ast")

2009-02-24 Thread ChrisK

Hello,

  The regex-tdfa package has had a series of bug fix releases (0.97.1 and 2 and 
3 and now 4).  This 0.97.4 releases finishes fixing the bug that was only mostly 
fixed in the 0.97.1 release.


  An example of the fixed bug: Apply the regex pattern (BB(B?))+(B?) to the 
text .  The "BB" in the pattern should be used twice and both "B?" should 
match nothing.  My code grouped the "+" wrong and matched the "BB" once and then 
both the "B?" matched a "B".


  The case fixed here was not initially caught because of how I search for 
unknown bugs.  I use "Arbitrary" from QuickCheck to generate random patterns and 
strings to search, and compare regex-tdfa to another POSIX engine.


  Because I am on OS X, I am limited by the the native POSIX libraries bugs: 
this bug in regex-tdfa was triggered only when the native POSIX was also buggy.


  But the source of most of my unit tests is AT&T research [1], and they have a 
"libast" with a POSIX implementation.  I have adapted my regex-* wrapper 
packages to make a "regex-ast" Haskell interface, but the difficulties with the 
AT&T headers prevent me from releasing this on hackage.  This "regex-ast" has 
given me access to a less buggy POSIX back-end, and randomized testing has led 
to catching the bug fixed here (as well as a few bug reports back to AT&T).


  So while regex-tdfa will not win many speed contests, it is the only POSIX 
regular expression library I have running that passes all the unit tests.


[1] http://www.research.att.com/sw/download/
http://www.research.att.com/~gsf/testregex/
http://www.research.att.com/~gsf/testregex/re-interpretation.html
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] ANN: Bug fix to regex-tdfa, new version 0.97.3

2009-02-10 Thread ChrisK

To Haskell and Libraries and Haskell-Cafe,

Whilst improving regex-tdfa I have run across new bugs.  Some patterns were 
getting compiled wrong and others were affected by an execution bug.


As this package has actual users, I wanted to make sure they get these fixes 
immediately.


Three Cheers For QuickCheck!

The new version is 0.97.3 at
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-tdfa

And this version passes all the unit tests I have, including coverage for the 
new bugs.  This is no warranty that regex-tdfa is bug free, since I made that 
same claim last release.  For instance: I suspect 0.97.3 may be buggy if used in 
the optional left-associative mode.


The new improved version of regex-tdfa is still a long way off.

Cheers,
 Chris Kuklewicz
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] ANN: regex-posix-unittest-1.0 AND regex-posix-0.94.1 AND regex-tdfa-0.97.1

2009-02-02 Thread ChrisK

I have three announcements to make about regex-* related packages.

The regex-posix-0.94.1 package update provides better semantics for multiple 
matches.  Below version 0.94, if any match was empty the matching would stop. 
Now the empty match is returned and the position is incremented and the 
searching continues.


The regex-tdfa-0.71.1 package update provides the same new multiple match 
semantics.  It also fixes a bug I found.  I know of no outstanding bugs in 
regex-tdfa, and version 0.71.1 now passes all the tests used in 
regex-posix-unittest-1.0 announced below.


We should care about the correctness of our operating system libraries.
To help with this, I have a NEW package to announce: regex-posix-unittest-1.0

The accompanying wiki page is http://www.haskell.org/haskellwiki/Regex_Posix

This new package provides an executable called regex-posix-unittest which you 
can install as --user or --global.


The regex-posix-unittest executable with no arguments runs a suite of unit 
tests, all of which are described by text files in the package, the format is 
documented in the wiki page.  By editing the text files in the package you can 
add to or delete from the unit tests being run.


With two arguments the program expects the text first and the pattern second and 
will run just that match and print all the results.


How does regex-posix-unittest help us care about the OS libraries?

The regex-posix distributed in the GHC bundle uses the OS C library's "regex.h" 
API.  The regex-posix-unittest package will quite likely show you that your OS C 
library "regex.h" API is full of bugs.


If you are on Linux, it will show you a plethora of GLIBC bugs in Posix 
conformance.

If you are on OS X, FreeBSD, or NetBSD, it will show you many bugs including a 
critical bug where it fail to find a match where one actually exists.


These bugs in the OS library are inherited by your "sed" program as well as 
regex-posix and Haskell.


If you are on Windows, or OpenBSD, or Solaris, or anything else, then please 
update the wiki page at http://www.haskell.org/haskellwiki/Regex_Posix or email 
me with your results so I can update the wiki.


You may have evil and ingenious tests of Posix extended regular expressions to 
add to the test suite.  Adding them is easy and if you send them to me I will 
put them in an updated version of regex-posix-unittest.


Cheers,
  Chris
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] type inference & instance extensions

2009-01-27 Thread ChrisK

In ghc 6.10.1 the ~ constraint is working:


{-# LANGUAGE TypeFamilies #-}

{-# OPTIONS -fglasgow-exts #-}
{-# OPTIONS -fallow-undecidable-instances #-}

module D where

instance (Num a, Num b, a ~ b) => Num (a,b) where
(x,y) * (u,v) = (x*u-y*v, x*v+y*u)

test1 = (1,1) * (2,2)

test2 = (1,1.0)*(2,2)


With ghci:

*D> test1
test1
(0,4)
*D> test2
test2
(0.0,4.0)
*D> :t test1
:t test1
test1 :: (Integer, Integer)
*D> :t test2
:t test2
test2 :: (Double, Double)

--
Chris

___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] Variable arity function (VarArg)

2005-04-20 Thread ChrisK
And while I'm posting to the list, I'll send something I wish I had 
found earlier.

I had wanted to write show several things, and writing show 10 times 
was not clever.

And so I initially created an infix operator to put between everything 
to do the showing, which was not much better.

But this http://okmij.org/ftp/Haskell/types.html#polyvar-fn was the 
clever technique.

{-# OPTIONS -fglasgow-exts #-}
-- Techinque from http://okmij.org/ftp/Haskell/types.html#polyvar-fn
-- canShowList and ssum will take a variable number of arguments
data CanShow where { CanShow :: Show a => a -> CanShow
   ; CSLit::String->CanShow}
instance Show CanShow where
show (CanShow a) = show a
show (CSLit s) = s
-- The initial type is accumulator, here a simple list
class (Show a) => ShowList a r where
canShowList :: [CanShow] -> a -> r
-- After accumulating last argument, you can apply a function, e.g. 
reverse
instance (Show b) => ShowList b [CanShow] where
canShowList l x = reverse $ (CanShow x):l

-- Get next argument 
instance (Show a,ShowList b r) => ShowList a (b->r) where
canShowList l x = canShowList ((CanShow x):l)

-- Could eat initial fully typed arguments and make tuple with []
sL :: (ShowList a r) => a -> r
sL = canShowList []
pio :: [CanShow]->IO()
pio = putStr . unlines . (map show)
eatFirstClass :: (forall a r.(ShowList a r)=>(a->r))->[CanShow]
eatFirstClass s = s "and more"
-- Using the same technique to do more work
-- This example needs the "r->a" FunDep to work:
class (Num a)=>ScaledSum a r | r->a where
ssum' :: (a,a) -> a -> r
instance ScaledSum Double Double where
ssum' (s,t) x =  (s*(t+x))
-- As an added bonus we get "context sensativity"
instance ScaledSum Double Int where
ssum' (s,t) x = floor (s*(t+x))
instance (ScaledSum a p) => ScaledSum a (a->p) where
ssum' (s,t) x = ssum' (s,t+x)
ssum a x = ssum' (a,0) x
-- This fails since ::[Int] applies to (print $ build 1 2 3)
--main = print $ build 1 2 3 ::[Int]
-- Use parenthesis to control ::[Int] syntax
main = do
  let empty :: [String]
  empty = []
  full = [1,2,3]
  efc = eatFirstClass (sL empty full)
  eol = CSLit "\n"
  ssI = (ssum 12) 1 (1/2) (-3.5) :: Int
  ssD = (ssum 12) 1 (1/2) (-3.5) :: Double
  pio (sL "Hi!" (17,'a') eol (1,CSLit ['a']) empty (CSLit 
"ping\n") full efc ("ssum",ssI,ssD) "Bye!")
  let
  other = sL "Terminate with ::[CanShow]" full 1 eol 2 eol 
3 ::[CanShow]
  putStrLn $ show other

=== GHCI ===
*Main> main
"Hi!"
(17,'a')
(1,a)
[]
ping
[1,2,3]
[[],[1,2,3],"and more"]
("ssum",-24,-24.0)
"Bye!"
["Terminate with ::[CanShow]",[1,2,3],1,
,2,
,3]
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Going nuts

2005-04-20 Thread ChrisK
On Apr 20, 2005, at 10:04 PM, Alexandre Weffort Thenorio wrote:
As usual a beginner in Haskell. Trying to write a simple program in 
haskel
shown below


outputLine keyno key orgFile = do
part1 <- getLeft keyno orgFile
part2 <- getRight keyno orgFile
total <- part1 ++ (strUpper key) ++ part2 ++ "\n"
This is just creating a new expression bound to total, not an action.
Instead of <- use let =
 let total = part1 ++ (strUpper key) ++ part2 ++ "\n"
Something about Lists [] also being a Monad gave a confusing error 
message to you

newHexFile <- openFileEx "newfile" (BinaryMode WriteMode)
 hPutStrLn newHexFile (orgFile!!0 ++ "\n" ++ total ++ unlines 
(drop 2
orgFile))

strUpper :: String -> String
strUpper [] = ""
--strUpper x:xs = (toUpper x):(strUpper xs)
And I keep getting the error
changecode.hs:42:
Couldn't match `[a]' against `Char'
Expected type: [a]
Inferred type: Char
In the first argument of `(++)', namely `part1'
In a 'do' expression:
total <- part1 ++ ((strUpper key) ++ (part2 ++ "\n"))
I have tried thousands and thousand of modifications (Originally 
getLeft and
getRight didn't exist, the code was part of outputLine) but it keeps
thinking that orgFile is a String when clearlly it is a [String]. Also 
I can
get my function strUpper (Which simply puts strings to Upper case) to 
work.

Can anybody see what is wrong here?
Best Regards
Alex
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Control.Monad.Writer as Python generator

2005-04-15 Thread ChrisK
On Apr 15, 2005, at 7:03 PM, Jan-Willem Maessen wrote:
On Apr 15, 2005, at 6:07 PM, ChrisK wrote:
You are correct.  Moand.Cont yield even runs without -O optimizing,
just slower
...
Anyone have an idea why ghci can't garbage collect it?
Is this an actual bug or an innate quirk of the REPL ?
GHCi does not "compile" with optimizations, without -O the 
strictness analyzer
isn't run.
The optimizer is irrelevant to whether it runs in constant space, as 
ghc without '-O' runs it just fine.  The optimizer is only useful for 
speed, which is not the issue.
Not true!  The optimizer can change the asymptotic space consumption 
of your program.  The example of sum is particularly germaine.  If we 
write:

x = foldl (+) 0 [1..n]
this will, as it is evaluated, generate the suspended computation
(((0 + 1) + 2) + 3) + ...) + n
This require O(n) space.
I was writing in the context of the yield with Monad.Cont example.  The 
'it' in 'whether it run in constant space' is the yield example of 
"length $ take (10^7) zerosInf".  Nothing more general.

So I agree with your statement, that asymptotic space consumption can 
change with optimization.  The odd thing is that ghc (without -O) 
compiles and run the example in question, but ghci will fail.  So I was 
wondering if there was an implementation of yield using continuations 
that would make yield space efficient in ghci.   Perhaps I should also 
check hugs, but I have not used hugs yet.

--
Chris
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Control.Monad.Writer as Python generator

2005-04-15 Thread ChrisK
You are correct.  Moand.Cont yield even runs without -O optimizing,
just slower
...
Anyone have an idea why ghci can't garbage collect it?
Is this an actual bug or an innate quirk of the REPL ?
GHCi does not "compile" with optimizations, without -O the strictness 
analyzer
isn't run.
The optimizer is irrelevant to whether it runs in constant space, as 
ghc without '-O' runs it just fine.  The optimizer is only useful for 
speed, which is not the issue.

 The difference is most likely due to strictness analysis.  A well
placed strictness annotation or two should be able to make it work in 
GHCi as
well.
In this code, adding a strictness $! did not work.
A similar situation occurs with sum: in GHCi for large inputs it
overflows the stack, but when compiled with -O it works correctly, 
this is
because sum is defined with foldl and not foldl' in GHC.
As for adding one strictness annotation, the brute for approach to 
adding '$!' did not work:

yield :: a -> Cont [a] ()
-- original
--yield x = Cont (\c -> x : c () )
-- original in prefix form
--yield x = Cont (\c -> (((:) x) (c (-- memory exhaustion
-- non-trivial
--yield x = Cont (\c -> (((:) x) $! (c ( -- stack overflow
-- silly
--yield x = Cont (\c -> (((:) $! x) (c ( -- memory exhaustion
-- definitely silly
--yield x = Cont (\c -> (((:) x) (c $! ( -- memory exhaustion
So adding two '$!' to the above looks like a non-starter.  The 
asGenerator definition is

asGenerator :: Cont [a] v -> [a]
asGenerator (Cont f) = f (const [])
which has no useful place to insert a '$!'.  So to use continuations in 
GHCI, it may be necessary to build a new version of Cont and its 
internals, or maybe use the callCC interface?  And I had not luck 
adding '$!' to callCC/Cont or callCC/mapCC versions:

-- yield using callCC
yieldCC x = callCC genContCCArg
where
genContCCArg = (\oldGenContFunc ->
let
  newCont  = Cont { runCont = newRunCont }
  newRunCont =  (\contFunc -> (x:(oldRunCont 
contFunc)))
  oldRunCont = runCont oldCont
  oldCont = oldGenContFunc ()
in newCont
   )

-- Use the mysterious mapCont function
yieldM x = callCC genContCCArg
where
genContCCArg = (\genContFunc -> mapCont (\xs -> x:xs) (genContFunc 
()))

I found no useful explanation of mapCont via Google.  It was another 
case of deriving the function's action from the type.

Since this is now a "gchi problem", should this be taken to the ghc 
mailing list as well? instead?

--
Chris
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Control.Monad.Writer as Python generator

2005-04-15 Thread ChrisK
You are correct.  Moand.Cont yield even runs without -O optimizing, 
just slower:

Monad.Writer counts 10^9 zeros in 99 seconds (user time)
Monad.Cont counts 10^8 zero in 35 seconds user time.
So the writer is 3.5 times faster without '-O'
With -O
Monad.Writer counts 10^9 zeros in 105 seconds
Monad.Cont counts 10^8 zeros in 11 seconds, 10^9 zeros in 110 seconds.
So with '-O' they are the same speed.  Nice.
Anyone have an idea why ghci can't garbage collect it?
Is this an actual bug or an innate quirk of the REPL ?
--
Chris
On Apr 15, 2005, at 12:43 AM, Cale Gibbard wrote:
However, after compiling with optimisations turned on, there is no
such problem with the continuation-based version, memory usage appears
constant.
 - Cale
On 4/14/05, ChrisK <[EMAIL PROTECTED]> wrote:
Thanks for the Cont example, David.  But...
The MonadCont is clever and it works ... but then fails -- ghci does
not garbage collect and it blows up.
With the MonadCont version I can count up to 10^7 zeros:
*Main> length $ take (10^7) zerosInf
1000
(26.20 secs, 0 bytes)
But this increases RSIZE of ghc-6.4 to 165MB.  The 10^8 version goes 
to
swap space and I had to kill it.  My original MonadWriter version does
not increase RSIZE when run (constant space), so the garbage 
collection
must be working, and it is O(N) in the # of zeros counted:

*Main> length $ take (10^7) zerosInf
1000
(1.22 secs, 0 bytes)
*Main> length $ take (10^8) zerosInf
1
(10.05 secs, 0 bytes)
*Main> length $ take (10^9) zerosInf
10
(109.83 secs, 6 bytes)
--
Chris
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Control.Monad.Writer as Python generator

2005-04-14 Thread ChrisK
Thanks for the Cont example, David.  But...
The MonadCont is clever and it works ... but then fails -- ghci does 
not garbage collect and it blows up.
With the MonadCont version I can count up to 10^7 zeros:

*Main> length $ take (10^7) zerosInf
1000
(26.20 secs, 0 bytes)
But this increases RSIZE of ghc-6.4 to 165MB.  The 10^8 version goes to 
swap space and I had to kill it.  My original MonadWriter version does 
not increase RSIZE when run (constant space), so the garbage collection 
must be working, and it is O(N) in the # of zeros counted:

*Main> length $ take (10^7) zerosInf
1000
(1.22 secs, 0 bytes)
*Main> length $ take (10^8) zerosInf
1
(10.05 secs, 0 bytes)
*Main> length $ take (10^9) zerosInf
10
(109.83 secs, 6 bytes)
--
Chris
On Apr 14, 2005, at 1:05 AM, David Menendez wrote:
ChrisK writes:
I was thinking to myself:
What in Haskell would give me a "yield" command like a Python
generator?
And the answer was "tell" in Control.Monad.Writer -- and I wrote some
simple examples (see below).
Another possibility would be a continuation monad.
import Control.Monad.Cont
yield :: a -> Cont [a] ()
yield x = Cont (\c -> x : c ())
asGenerator :: Cont [a] v -> [a]
asGenerator (Cont f) = f (const [])
--
David Menendez <[EMAIL PROTECTED]> | "In this house, we obey the 
laws
<http://www.eyrie.org/~zednenem>  |of thermodynamics!"
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] Control.Monad.Writer as Python generator

2005-04-12 Thread ChrisK
Hi,
I was thinking to myself:
What in Haskell would give me a "yield" command like a Python generator?
And the answer was "tell" in Control.Monad.Writer -- and I wrote some  
simple examples (see below).

Most Python code using yield would be translated to something much more  
idiomatic in Haskell than using Writer, or to something more  
complicated if it needed IO.

I thought this interesting enough to put on the haskell mailing list  
and wiki since it seemed to be in neither place (I searched, but your  
searching may be better than mine).

If there are no objections then I'll put this example on the wiki; any  
suggestions where on wiki to place it (e.g. MonadWriter)?

=== CUT HERE ===
import Control.Monad.Writer
-- Some type signatures would need -fglasgow-exts to compile
-- We only care about the Writer output, not the function return value
asGenerator :: Writer [a] v -> [a]
asGenerator writer = values where (_,values) = runWriter writer
--yield :: (MonadWriter [a] m) => a -> m ()
yield x = tell [x]
-- This allows several items to be yielded with one command
--yieldMany :: (MonadWriter [a] m) => [a] -> m ()
yieldMany = tell
zeros :: [Integer]
zeros = asGenerator (do yield 0
yield 0
yield 0)
zerosInf :: [Integer]
zerosInf = asGenerator zeros'
where zeros' = (yield 0 >>zeros')
-- The Collatz sequence function
foo :: (Integral a) => a -> a
foo x = case (x `mod` 2) of
 0 -> x `div` 2
 1 -> (3*x+1)
-- Uses "return ()" to end the list when 1 is reached
--collatzW :: (MonadWriter [a] m, Integral a) => a -> m ()
collatzW x = do
   yield x
   case x of
 1 -> return ()
 _ -> collatzW (foo x)
-- Keeps going, will repeat "1,4,2,1,.." if 1 is reached
--collatzInfW :: (MonadWriter [a] m, Integral a) => a -> m t
collatzInfW x = do
  yield x
  collatzInfW (foo x)
--collatzGen :: (MonadWriter [a] (Writer [a]), Integral a) => a -> [a]
collatzGen x = asGenerator (collatzW x)
--collatzInfGen :: (MonadWriter [a] (Writer [a]), Integral a) => a ->  
[a]
collatzInfGen x = asGenerator (collatzInfW x)

-- And these can be combined
collatz1 x = asGenerator (collatzW x >> yield 0 >> collatzW (x+1))
=== CUT HERE ===
*Main> zeros
[0,0,0]
*Main> take 10 zerosInf
[0,0,0,0,0,0,0,0,0,0]
*Main> collatzGen 13
[13,40,20,10,5,16,8,4,2,1]
*Main> take 100 $ collatzInfGen 13
[13,40,20,10,5,16,8,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2, 
1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2, 
1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1,4,2,1]

*Main> collatz1 12
[12,6,3,10,5,16,8,4,2,1,0,13,40,20,10,5,16,8,4,2,1]
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell