Re: [Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Albert Y. C. Lai
I try using WordSet = [String] (plus corresponding change in code) and 
get great speedup, actually way more than 3x. There was also a memory 
growth phenomenon using Set String, and replacement by [String] stops 
that too, now it's constant space (constant = 20M). It is possible to 
attribute part of the speedup to excellent rewrite rules in GHC 
regarding lists; however, I cannot explain the memory growth when using Set.


Regarding the local WordFreq map under "train", I am shocked that ghc -O 
is smart enough to notice it and perform proper sharing, and only one 
copy is ever created. Nonetheless, I still decide to factor "train" into 
two, one builds the WordFreq and the other queries it, which eases blame 
analysis when necessary.


On the interact line, I use "tokens" to break up the input, since it's 
already written (for the trainer), may as well reuse it.


When reading holmes.txt, be aware that it is in UTF-8, while GHC still 
assumes ISO-8859-1. This will affect results.


I have not checked the correctness of edits1.

I am monochrom.

My modification is attached.

module Main where

import Control.Arrow
import Data.Char (toLower, isPunctuation)
import Data.List (maximumBy)
import qualified Data.Map as M

type WordSet= [String]
type WordFreq   = M.Map String Int

main :: IO ()
main = do
-- 'holmes.txt' can be found here:
-- http://norvig.com/holmes.txt
-- We should train it with a larger corpus, but my program is
-- is already slow enough.
c <- readFile "holmes.txt"
let corrector = correct (train (tokens c))
interact $ unlines . map (show . (id &&& corrector)) . tokens

-- Returns a list of words lowercased and stripped of punctuation
-- at the end of the word.
tokens :: String -> [String]
tokens = map (lower . nopunc) . words
where
  lower   = map toLower
  nopunc  = strip isPunctuation
  strip p = reverse . dropWhile p . reverse

train :: [String] -> WordFreq
train words = frequencyMap
where
  frequencyMap = foldr incWordCount M.empty words
  incWordCount w m = M.insertWith (+) w 1 m

correct :: WordFreq -> String -> String
correct fm word = maximumBy freq . head $ filter (not . null)
[ known [word],
  known (edits1 word),
  known (edits2 word),
  [word] ]
where
  freq c c'= compare (findfreq c) (findfreq c')
  findfreq c   = M.findWithDefault 1 c fm
  known= filter (`M.member` fm) 

edits1 :: String -> WordSet
edits1 word = concat
   [[ t i ++ d (i+1)| i <- range n], -- deletion
[ t i ++ [word!!(i+1)] ++ [word!!i] ++ d (i+2) | i <- range (n-1) ], -- transposition
[ t i ++ [c] ++ d (i+1) | i <- range n, c <- alphabet ], -- alteration
[ t i ++ [c] ++ d i | i <- range (n+1), c <- alphabet ]] -- insertion
where
  n= length word
  t i  = take i word
  d i  = drop i word
  range x  = [ 0..(x-1) ]
  alphabet = ['a'..'z']

edits2 :: String -> WordSet
edits2 = concat . map edits1 . edits1
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier
Pete Kazmier <[EMAIL PROTECTED]> writes:

> I'd love to see other Haskell implementations as well if anyone has a
> few moments to spare.  Admittedly, it took me several hours to get my
> version working, but I'm a Haskell newbie.  Unfortunately, I think it
> runs as slow as it took me to write it!  There is definitely something
> wrong with it, a memory leak, because I can't correct more than a few
> words without a great deal of memory being consumed.

As monochrom pointed out on #haskell, I am using 'interact'
incorrectly.  For some reason I thought 'interact' applied its
argument to each line of the input.  I've replaced it as follows:

  interact $ unlines . map (show . (id &&& correct)) . lines

The program is still terribly slow due to my use of lists.  Is there a
better way to write 'edits1'?

Thanks,
Pete

 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Arie Peterson
Hi Pete,


> Recently I read an interesting article by Peter Norvig[1] on how to
> write a spelling corrector in 21-lines of Python.  I wanted to try and
> implement it in Haskell.  My implementation is terribly slow and I was
> hoping for some tips on how to improve it and make it idiomatic.

I had a quick look at this. One way to speed up your program is by
replacing the sets of altered words by lists. Filtering out doubles is a
noble cause, but in this case it takes more time than keeping them around
and doing some extra lookups in your frequency map. (I tried this, it gave
a speedup factor of ~3.)

> I'd love to see other Haskell implementations as well if anyone has a
> few moments to spare.  Admittedly, it took me several hours to get my
> version working, but I'm a Haskell newbie.  Unfortunately, I think it
> runs as slow as it took me to write it!  There is definitely something
> wrong with it, a memory leak, because I can't correct more than a few
> words without a great deal of memory being consumed.

Be careful when you apply the |train| function to more than one word; in
this form it may compute the frequency map from start for each invocation.
(It is better to let |train| take a frequency map, instead of a list of
words.)

Also be sure to compile your program with full optimisation ('-O2' for ghc).

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Lessons from memory leak in streaming parser

2007-04-21 Thread Oren Ben-Kiki

First, let me thank all the people who responded to my issue, it was
very helpful. I would have responded earlier but I was on a business
trip and out of contact for the last week.

On the bright side I used the time to re-work my implementation.
Instead of relying on Haskell's lazy evaluation, elegant as it may be,
I changed it to use a continuation passing style. Every invocation of
the parser returns a few tokens and a parser for the rest of the
input.

Interestingly, this approach maps well to non-lazy languages -
basically anything that supports closures - unlike the original method
that relied on lazy evaluation. So in principle I can now convert the
Haskell code to Scheme, Ruby, or even JavaScript.

At any rate, once I have done that, things started to fall into place.
I still had minor leaks, but the results of profiling were actually
proving useful for a change. I needed to turn off lazy evaluation in
several key places (almost all record fields are now strict, and a
'seq' was needed in one crucial location).

The result is a parser that can handle anything you throw at it with
constant (reasonably low) memory usage. It is dog-slow (about 8k/sec,
or twice that if compiling with -O2) but I was expecting that.

So, lesson learned - lazy evaluation in Haskell is a very nice
feature, but extremely difficult to debug, profile and control. Using
continuations may be less elegant at places, but is much more
practical.

You can see the final result in version 0.3 of the YamlReference
package I just uploaded to the Cabal database. This is intended to be
an "executable version" of the YAML specification (the BNF file, which
is actually Haskell code, is almost exactly the list of spec BNF
productions).

Thanks again for all the help,

   Oren Ben-Kiki
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier
Recently I read an interesting article by Peter Norvig[1] on how to
write a spelling corrector in 21-lines of Python.  I wanted to try and
implement it in Haskell.  My implementation is terribly slow and I was
hoping for some tips on how to improve it and make it idiomatic.

I'd love to see other Haskell implementations as well if anyone has a
few moments to spare.  Admittedly, it took me several hours to get my
version working, but I'm a Haskell newbie.  Unfortunately, I think it
runs as slow as it took me to write it!  There is definitely something
wrong with it, a memory leak, because I can't correct more than a few
words without a great deal of memory being consumed.

Thanks,
Pete

[1] http://norvig.com/spell-correct.html

module Main where

import Control.Arrow
import Data.Char (toLower, isPunctuation)
import Data.List (maximumBy)
import qualified Data.Set as S
import qualified Data.Map as M

type WordSet= S.Set String
type WordFreq   = M.Map String Int

main :: IO ()
main = do
-- 'holmes.txt' can be found here:
-- http://norvig.com/holmes.txt
-- We should train it with a larger corpus, but my program is
-- is already slow enough.
c <- readFile "holmes.txt"
let correct = train . tokens $ c
interact $ (++"\n") . show . (id &&& correct) . init

-- Returns a list of words lowercased and stripped of punctuation
-- at the end of the word.
tokens :: String -> [String]
tokens = map (lower . nopunc) . words
where
  lower   = map toLower
  nopunc  = strip isPunctuation
  strip p = reverse . dropWhile p . reverse

-- Returns a closure (equivalent to the 'correct' function in the
-- Python implementation).
train:: [String] -> (String -> String)
train words word = maximumBy freq . S.toList . head $ filter (not . S.null)
[ known (S.singleton word),
  known (edits1 word),
  known (edits2 word),
  S.singleton word ]
where
  freq c c'= compare (findfreq c) (findfreq c')
  findfreq c   = M.findWithDefault 1 c frequencyMap
  known= S.filter (`M.member` frequencyMap) 
  frequencyMap = foldr incWordCount M.empty words
  incWordCount w m = M.insertWith (+) w 1 m

edits1 :: String -> WordSet
edits1 word = S.fromList $ concat
   [[ t i ++ d (i+1)| i <- range n], -- deletion
[ t i ++ [word!!(i+1)] ++ [word!!i] ++ d (i+2) | i <- range (n-1) ], -- transposition
[ t i ++ [c] ++ d (i+1) | i <- range n, c <- alphabet ], -- alteration
[ t i ++ [c] ++ d i | i <- range (n+1), c <- alphabet ]] -- insertion
where
  n= length word
  t i  = take i word
  d i  = drop i word
  range x  = [ 0..(x-1) ]
  alphabet = ['a'..'z']

edits2 :: String -> WordSet
edits2 = S.unions . S.toList . S.map edits1 . edits1
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How Albus Dumbledore would sell Haskell

2007-04-21 Thread Claus Reinke
seems Simon has got himself a tricky problem. i was about to hit reply to his 
first call, but then i browsed through the oscon site, and thought that perhaps

my background isn't close enough to the intended audience to make useful
suggestions, not to mention the concrete examples asked for. but while there 
have been good suggestions, i still feel there are some general trends/ideas 
missing:


- oscon seems to be a huge event. on the negative side, that means that if the 
   pre-talk advertising isn't good enough (mainly abstract and general buzz, 
   i guess), there won't be much of an audience, as there are too many other

   things going on. on the positive side, that means that almost no advertising
   should be needed in the talk itself - if people aren't interested, they won't
   be there. that kind of breaks the usual strategy for talking to 
non-haskellers.

- it will still be necessary to convince the attendees, not that haskell is worth 
   looking into, but that their first 3 hours of doing so has been time well spent,

   and a quick rehash of all the other wonderful things haskellers have been up
   to might be a small part of that, but it won't be sufficient.

- 3 hours is long for a talk, but distressing for a tutorial: too short to 
really get
   the functional style of programming across to imperative/oo programmers,
   yet too long to stay within small and safe examples (thereby avoiding
   explanations of haskells more complicated sides).

- i would expect the audience to be a similarly mixed blessing: imperative
   mindsets liable to stumble over even simple functional programming 
   patterns, but at the same time too experienced to be impressible with

   toy examples.
   
in light of all this, Simon's approach seems promising:



 concrete examples (pref running code) of programs that are
  * small
  * useful
  * demonstrate Haskell's power
  * preferably something that might be a bit
  tricky in another language


working code for one or two compelling examples, engagingly presented,
should have a chance of giving that audience some value for money. and
perhaps it is best not to dwell on how different functional programming is -
given that there isn't all that much time to get acquainted, it might be best
to jump right in with "here's what you can do, and this is how you do it".
people who pick up weird things like ajax, .. must be used to getting into
strange lands, as long as they have good guidance, and useful examples.
don't get bogged down in language features (there's reference manuals 
and tutorials for that), but focus on techniques and on achieving things.


still, the central issue remains: how to present them with something that 
their current background tells them would be interesting, useful, and 
understandable, while at the same time presenting something that 
demonstrates haskell's advantages and culture.


as i said, i can't offer concrete examples for that kind of audience, but
let me report one experience that might be helpful: how i got into perl;-)

at the time, i was doing lots of shell-scripting, and i thought that awk was
about the most wonderful unix tool ever. what i was actually using awk
for were two features: sets of rules triggered by pattern matching, and
associative arrays for storing away the information extracted using those
patterns. if you could express your problem in terms of these two, it
would just disappear into thin air (clever storage tricks obviated any
need for further processing). but if you couldn't, things would get
awkward: processing data within awk's programming language wasn't
fun, processing data within some shell wasn't much better (and at the
time there were just too many shells, all with their own community,
advantages and shortcomings). but the worst thing of all was passing
data between scripts written in different tools (shell, awk, sed, make, ..),
integrating the whole mess, maintaining and extending it.

then someone suggested perl4, and i had to figure out whether i
wanted to know more about it (new syntax, new concepts, and all 
that..). fortunately, there was a little tool for converting awk scripts

to perl, allowing me to compare my old scripts with the new syntax.
that immediately got me into a little of perl's control structures, and
it was clear that perl's syntax for pattern processing was inspired by
sed, just as its associative arrays were inspired by awk. so i could
dump my awk, my sed, and my shells, and do it all in perl - i was
convinced. i still use shells and sed for small things, and make for
what it is good at, but i stopped using my favourite awk, and when
multiple tasks needed to be integrated, i started to prefer perl.

if i extrapolate from this experience to Simon's current adventure: 
powerful examples showing off perl's/haskell's features would sail 
right by me ("nice, but what does that have to do with me?" at best,
"aargh, that looks very complicated; perhaps anot

Re: [Haskell-cafe] Installation of hs-plugins

2007-04-21 Thread Philipp Volgger
I now used GHC 6.4 and mingw ( MSYS-1.0.11 ). Now it is possible to 
configure, build and install it. But on running the test ( out of a 
email from the list, source code see below) it crashes again without any 
information.

I compiled the Test1.hs with ghc -c Test1.hs.


Bayley, Alistair wrote:
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Philipp Volgger


could somebody please tell me how hs-plugins has to be installed. I 
tried it with hs-plguin 1.0rc1 and I was unable to build it. I did

runhaskell Setup.lhs configure
runhaskell Setup.lhs build   (Crash without any information)
I tried it wiht GHC 6.4, 6.4.1 and 6.6.

I am using Windows XP with SP2.



I'm pretty sure you need to install under some kind of bash shell. I
used mingw on WinXP, but I think it can be done with cygwin. I'd
recommend mingw if you don't have either installed.

Note that hs-plugins doesn't work with ghc-6.6 yet on Windows, so stick
with 6.4.1 if you can.

Alistair
*
Confidentiality Note: The information contained in this message,
and any attachments, may contain confidential and/or privileged
material. It is intended solely for the person(s) or entity to
which it is addressed. Any review, retransmission, dissemination,
or taking of any action in reliance upon this information by
persons or entities other than the intended recipient(s) is
prohibited. If you received this in error, please contact the
sender and delete the material from any computer.
*

  


Test:

module Test1 where
test1 = putStrLn "test1"


module Main where
import Prelude hiding (catch)
import Control.Exception
import Data.List
import System.Environment
import System.Plugins

instance Show (LoadStatus a) where
 show (LoadFailure errors) = "LoadFailure - " ++ (concat (intersperse
"\n" errors))
 show (LoadSuccess m p) = "LoadSuccess"

main = do
 a <- getArgs
 let
   modName = case a of
 (n:_) -> n
 _ -> "Test1"
 let modPath = "./" ++ modName ++ ".o"
 let method = "test1"
 fc <- catch (load modPath [""] [] method)
   (\e -> return (LoadFailure
 ["Dynamic loader returned: " ++ show e]))
 case fc of
   LoadFailure errors -> do
 fail (concat (intersperse "\n" errors))
   LoadSuccess modul proc -> do
 let p :: IO (); p = proc
 proc


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: How Albus Dumbledore would sell Haskell

2007-04-21 Thread Steve Downey

I think you are right. If you used something like a theorem prover as
an example, you accidentally send the messsage that Haskell is very
useful for esoteric stuff that only academics are interested in.

Now, that doesn't mean that the example has to solve a real problem,
but it does need to be something that the audience can relate to their
own problems.

There's already a lot of general buzz about functional techniques.
Closures and lambda expressions are being, or have been, added to
several imperative languages. This naturally leads to interest in
higher order functions. The concurrency revolution is driving interest
in immutable values and lockless algorithms. For people who do
transactional work, having launchTheMissiles in the middle of a
transaction be caught by the type system is incredible. At least some
of the interest in dynamic types is driven by frustration with dealing
with type annotations.

So really, the example code just has to solve a problem they
recognize. Even the sudoko solvers would be good, trite as they are.
Or, some of the unix tools in Haskell.

$0.02

-SMD

On 4/19/07, Lennart Augustsson <[EMAIL PROTECTED]> wrote:

A theorem prover might be a really cool example, but if there's one
person in the audience that cares then Simon is lucky. :)  You need
to have examples that people can recognize and see the utility of.

-- Lennart

On Apr 19, 2007, at 20:48 , DavidA wrote:

> Simon Peyton-Jones  microsoft.com> writes:
>
>> But, just to remind you all: I'm particularly interested in
>>
>>   concrete examples (pref running code) of programs that are
>>* small
>>* useful
>>* demonstrate Haskell's power
>>* preferably something that might be a bit
>>tricky in another language
>
> I have something that I think nearly fits the bill. Unfortunately,
> I don't
> think it quite works because it's a bit specialised. However, I
> think it
> suggests a possible area to look, which I'll mention at the end.
>
> It's a theorem prover for intuitionistic propositional logic:
> http://www.polyomino.f2s.com/david/haskell/gentzen.html
>
> It's much shorter in Haskell than it would be in other languages.
> (It's even
> shorter than the ML that I based it on, because of some shortcuts I
> can take
> using lazy evaluation.)
>
> Strengths of Haskell that it demonstrates are:
> * How easy it is to define datatypes (eg trees), and manipulate
> them using
> pattern matching, with constructors, Eq, Show coming for free.
> * How lazy evaluation reduces code length by letting you write code
> that looks
> like it would do too much, and then lazy evaluate it (in the
> "proof" function)
> * The ability to extend the syntax with new symbolic operators
> * Use of higher order functions to simplify code (the (+++) operator)
>
> The problem is that I think Gentzen systems are a bit obscure. But
> I think you
> could probably show most of the same strengths of Haskell in something
> similar: game search, eg alpha-beta algorithm. Another advantage of
> doing game
> search would be that you'd get to show off persistent data
> structures (so that
> when you make a move in lookahead, you don't need to make a copy of
> the game
> state, because when you update the game state the old state still
> persists).
>
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: How Albus Dumbledore would sell Haskell

2007-04-21 Thread apfelmus
Dan Weston wrote:
> -- Why is this not in Prelude?
> dup x = (x,x)

It is (almost). It's called

  join (,)

Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: [C2hs] anyone interested in developing a Language.C library?

2007-04-21 Thread Duncan Coutts
On Sat, 2007-04-21 at 12:04 +0200, Josef Svenningsson wrote:

> Unfortunately the niche is not empty. There is an ocaml library called
> cil which is supposed to be pretty sweet for manipulating C code. But
> I still think a Haskell library would be a very good idea, and perhaps
> one can look at cil for inspiration.
> 
> cil can be found here:
> http://hal.cs.berkeley.edu/cil/

Yeah, I came across this recently. It's pretty decent looking. I briefly
looked at their C parser (also implemented as a lex/yacc style lexer &
parser). Theirs also covers Sun and MS C language extensions, that is
Sun CC's pragmas and MS's numerous extensions.

Sadly this didn't popup when I was googling for yacc style LALR(1) C
grammars or I might have saved myself some time by porting their grammar
to alex+happy.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: [C2hs] anyone interested in developing a Language.C library?

2007-04-21 Thread Josef Svenningsson

On 4/21/07, Donald Bruce Stewart <[EMAIL PROTECTED]> wrote:

chak:
> Duncan Coutts wrote:
> >If anyone is interested in developing a Language.C library, I've just
> >completed a full C parser which we're using in c2hs.
> >
> >It covers all of C99 and all of the GNU C extensions that I have found
> >used in practise, including the __attribute__ annotations. It can
> >successfully parse the whole Linux kernel and all of the C files in all
> >the system packages on my Gentoo installation.
>
> Great work!
>
> Using this as a basis for a Language.C would be a really worthwile project.
>

I think people should be very interested in this.

The ability to easily manipulate and generate C would quickly insert
Haskell into another useful niche. There must *surely* be real money
in writing nice Haskell programs that optimise/analyse/refactor/generate
C code...


Unfortunately the niche is not empty. There is an ocaml library called
cil which is supposed to be pretty sweet for manipulating C code. But
I still think a Haskell library would be a very good idea, and perhaps
one can look at cil for inspiration.

cil can be found here:
http://hal.cs.berkeley.edu/cil/

Cheers,

Josef
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: QuickCheck subsumes unit testing

2007-04-21 Thread Joel Reymont


On Apr 21, 2007, at 2:54 AM, Donald Bruce Stewart wrote:


Just to walk the walk, and not just talk the talk, here's a quick unit
testing 'diff' driver I hacked up for QuickCheck.


Yay! I'll be the first to switch over!


Note that we actually probably want to use SmallCheck here,


I don't have an idea of when to prefer SmallCheck over QuickCheck. An  
explanation and, possibly, a tutorial would be very welcome! Anyone?


Note also, the driver would need further extending, since we've  
changed

the structure of the Testable values.


Can you elaborate on "further extending"? What direction should I  
extend it in?


Thanks, Joel

--
http://wagerlabs.com/





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Program optimisation

2007-04-21 Thread Adrian Hey

Donald Bruce Stewart wrote:

ahey:

Actually it isn't I'm afraid. That module has had a complete re-write
since the package was last cabalised. Anybody who's interested should
darcs get it from
 http://darcs.haskell.org/packages/collections-ghc6.6/


Oh, what's remaining before the next release?


Hmm, nothing I guess. But I'm still doing a lot of work on it and
Jean-Philippe will probably be too. There's still quite a lot
on going on this. (I doubt it will ever be "finished" :-)

Regards
--
Adrian Hey


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe