from:"Pete Kazmier"

[Haskell-cafe] Re: Lock-Free Data Structures using STMs in Haskell

2008-04-09 Thread Pete Kazmier

Bryan O'Sullivan <[EMAIL PROTECTED]> writes:

> Pete Kazmier wrote:
>
>>   data ArrayBlockingQueueSTM e = ArrayBlockingQueueSTM {
>> [...]
>> sa :: Array Int (TVar e)
>>   }
>> 
>> It's unclear to me why the Array's elements must be wrapped in TVars.
>
> To allow them to be modified.  You can't otherwise modify the elements
> of an array without going into the ST monad.

Thanks!  I forgot about the whole immutable thing :-)  Haven't used
arrays yet while learning Haskell!

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Lock-Free Data Structures using STMs in Haskell

2008-04-09 Thread Pete Kazmier

I recently read the STM paper on lock-free data structures [1] which I
found very informative in my quest to learn how to use STM.  However,
there are a few things I do not fully understand and was hoping
someone might be able to explain further.

In the STM version of the ArrayBlockingQueue, the following type is
defined:

  data ArrayBlockingQueueSTM e = ArrayBlockingQueueSTM {
shead :: TVar Int,
stail :: TVar Int,
sused :: TVar Int,
slen :: Int,
sa :: Array Int (TVar e)
  }

It's unclear to me why the Array's elements must be wrapped in TVars.
Why aren't the TVars on shead, stail, and sused sufficient?  Here is
the only function that reads from the queue:

  readHeadElementSTM :: ArrayBlockingQueueSTM e
-> Bool -> Bool -> STM (Maybe e)
  readHeadElementSTM abq remove block
= do u <- readTVar (sused abq)
 if u == 0
then if block
then retry
else return Nothing
else do h <- readTVar (ihead abq)
 let tv = sa abq ! h
 -- Why are the array elements wrapped in TVars?
 e <- readTVar tv
 if remove
then do
  let len = slen abq
  let newh = h `mod` len
  writeTVar (shead abq) $! newh
  writeTVar (sused abq) $! (u-1)
else return ()
 return (Just e)

It is not immediately obvious to me why the elements need to be
wrapped in TVars.  Could someone help elaborate?

The other question is in regards to section 2 where STM is
introduced.  The authors define the following:

  decT :: TVar Int -> IO ()
  decT v = atomically (do x <- readTVar v
  if x == 0
 then retry
 else return ()
  writeTVar v (x-1))

And then go on to show how easy it is to compose STM types with this
function:

  decPair v1 v1 :: TVar Int -> TVar Int -> IO ()
  decPair v1 v2 = atomically (decT v1 `orElse` decT v2)

Will this actually compile?  I was under the impression that 'orElse'
could only combine STM types, not IO () types.  

Thank you,
Pete

[1] Anthony Discolo, Tim Harris, Simon Marlow, Simon Peyton Jones, and
Satnam Singh. Lock-free data structures using STMs in Haskell. In
Eighth International Symposium on Functional and Logic Programming
(FLOPS.06), April 2006.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Sneaking haskell in the workplace -- cleaning csv files

2007-06-17 Thread Pete Kazmier

"Brandon S. Allbery KF8NH" <[EMAIL PROTECTED]> writes:

> On Jun 15, 2007, at 18:37 , Jason Dagit wrote:
>
>> I love to see people using Haskell, especially professionally, but I
>> have to wonder if the real tool for this job is sed? :-)
>
> Actually, while sed could do that, it'd be a nightmare.  You really
> want a parser to deal with general CSV like this, and while you can
> write parsers in sed, you *really* don't want to.  :)

sed ':a /,$/!{N;s/\n//;ba}' somefile.csv

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Yaham - Yet Another HAskell Mode for GNU Emacs

2007-06-17 Thread Pete Kazmier

Ian Zimmerman <[EMAIL PROTECTED]> writes:

> It has a different focus than the Moss haskell-mode that's normally
> used nowadays.  Yaham strives, first and foremost, to integrate well
> with the rest of Emacs and respect the Emacs ecosystem.

Could you elaborate on this?  Perhaps an itemized list of things that
are different?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Network.HTTP+ByteStrings Interface--Or: How to shepherd handles and go with the flow at the same time?

2007-05-25 Thread Pete Kazmier

As a newbie to Haskell, I found your thorough analysis very
interesting.  Thanks for the great read!  I have a few questions
regarding some of your comments, see below:

Jules Bean <[EMAIL PROTECTED]> writes:
> E,F. Progressive GET
> pSynGET :: URL -> ((Bool,ByteString) -> IO ()) -> IO ()
> pAsynGET :: URL -> ((Bool,ByteString) -> IO ()) -> IO (MVar ())
> 
> (This is a particular simple case of Oleg's iteratees, I
> think) Download the data at whatever speed is convenient. As data
> arrives, feed it to the 'callback' provided. The ByteString is the
> new chunk of data, the 'Bool' is just supposed to indicate whether
> or not this is the final chunk.

> Incidentally there are more complex options than (Bool,Bytestring)
> -> IO ().  A simple and obvious change is to add a return
> value. Another is a 'state monad by hand', as in (Bool,Bytestring)
> -> s -> s, and change the final return value of the type to IO s,
> which allows the callback to accumulate summary information and
> still be written as pure code. 

I want to be sure that I understand the implications of the callback
function returning an IO action as originally proposed versus it being
a pure function.  It would seem to me that if it were a pure callback
the usefulness would be limited as I would not be able to take the
data read from the network and immediately write it out to a file.  Is
this correct?

And if the above is correct, is there a way to define the callback
such that one does not have to hardcode the IO monad in the return
type so you can have the best of both worlds?

> Other options allow the 'callback' to request early termination,
> by layering in an 'Either' type in there. 

I believe the ability to request early termination is important, and
was one of the nice features of Oleg's left-fold enumerators.  It
would be a shame if the API did not offer this capability.

> Another more sophisticated option, I think, is the higher rank
>
> MonadTrans t => URL ->
>  ((forall m. Monad m) => (Bool,ByteString) -> t m)
>  -> t IO ()
>
> ...which, unless I've made a mistake, allows you to write in 'any
> monad which can be expressed as a transformer', by transforming it
> over IO, but still contains the implicit promise that the
> 'callback' does no IO. For example t = StateT reduces to the
> earlier s -> s example, in effect, with a slightly different data
> layout.

I don't fully understand this, but would this prevent one from calling
IO actions as it was receiving the chunks in the callback (such as
writing it to a file immediately)?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Stack overflow with my Trie implementation

2007-04-26 Thread Pete Kazmier

I've modified my Norvig spelling corrector to use a trie instead of
Data.Map in the hopes of improving performance.  Plus, this is fun and
a great learning exercise for me.  Unfortunately, when I load my trie
with a large amount of data, I get a stack overflow.  It's unclear to
me why this is happening.  I specifically use foldl' to avoid this
situation when building my trie.  Could someone shed some light on the
situation for me?

Here is the code:

> module Main where 
> 
> import Data.List (foldl')
> import Data.Maybe (maybe, fromMaybe)
> import Prelude hiding (lookup)
> import qualified Data.Map as M
> 
> data Trie a = T (Maybe a) (M.Map Char (Trie a)) deriving (Show)
> 
> main = do
> -- big.txt is a large file of words: http://www.norvig.com/big.txt
> c <- readFile "big.txt"
> let freqTrie = foldl' incWordCount empty (words c)
> print $ lookup "evening" freqTrie
> where 
>   incWordCount m w = insertWith (+) w 1 m
> 
> empty :: Trie a
> empty = T Nothing M.empty
> 
> lookup :: String -> Trie a -> Maybe a
> lookup ([])   (T Nothing  m) = Nothing
> lookup ([])   (T (Just v) m) = return v
> lookup (k:ks) (T _m) = case M.lookup k m of
>  Nothing   -> Nothing
>  Just trie -> lookup ks trie
> 
> findWithDefault :: a -> String -> Trie a -> a
> findWithDefault v k t = fromMaybe v (lookup k t)
> 
> member :: String -> Trie a -> Bool
> member k t = maybe False (const True) (lookup k t)
> 
> insertWith :: (a -> a -> a) -> String -> a -> Trie a -> Trie a
> insertWith fn ([])   v (T Nothing   m) = T (Just v) m
> insertWith fn ([])   v (T (Just v') m) = T (Just $ fn v v') m
> insertWith fn (k:ks) v (T mvm) = T mv (M.insertWith const k newtrie m)
> where
>   oldtrie = M.findWithDefault empty k m
>   newtrie = insertWith fn ks v oldtrie
>
> {-- I also tried to use this line instead of the one above to see
> if this had any impact.  Unfortunately, I obtained the same 
> results.
> 
> insertWith fn ([])   v (T (Just v') m) = let x = fn v v' in seq x T (Just x) m
> --}



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier

Derek Elkins <[EMAIL PROTECTED]> writes:

>> After reading Bryan's post, I switched my right fold to a strict left
>> fold and almost tripled my original speed.  Could someone provide some
>> guidelines on when to use each?  I thought foldr should be used when
>> building some large data structure such as the map I build.
>> Bryan, out of curiosity, is a non bytestring version of your code
>> faster?
>
> http://www.haskell.org/hawiki/StackOverflow

I read the article and understand it, but I am having a hard time
applying that specifically to my use of foldr.  Here is how I was
using foldr:

  type WordFreq   = M.Map B.ByteString Int

  train:: [B.ByteString] -> WordFreq
  train words = frequencyMap
  where
frequencyMap = foldr incWordCount M.empty words
incWordCount w m = M.insertWith (+) w 1 m

So is 'incWordCount' strict in its second argument?  I'm still not
sure exactly what that means.  According to the wiki page, if it is
strict in the second argument, I should have used foldl' instead of
foldr.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier

Bryan O'Sullivan <[EMAIL PROTECTED]> writes:

> After this switch, I found that spellchecking one word still took 4x
> as long in Haskell as Norvig's Python program.  Since I was checking
> only one word in each case, essentially all of the runtime was taken
> up by building the word frequency map.
>
> train = foldl' updateMap M.empty . map lower . mkWords
> where updateMap model word = M.insertWith' (+) word 1 model
>   mkWords = filter (not . B.null) . X.splitWith isNotAlpha
>   lower !s = X.map toLower s
>   isNotAlpha !c = c < 0x41 || (c > 0x5a && c < 0x61) || c > 0x7a
>   toLower !c | c >= 0x41 && c <= 0x5a = c + 0x20
>  | otherwise = c

After reading Bryan's post, I switched my right fold to a strict left
fold and almost tripled my original speed.  Could someone provide some
guidelines on when to use each?  I thought foldr should be used when
building some large data structure such as the map I build.

Bryan, out of curiosity, is a non bytestring version of your code
faster?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-22 Thread Pete Kazmier

Ketil Malde <[EMAIL PROTECTED]> writes:

> On Sun, 2007-04-22 at 00:25 -0400, Pete Kazmier wrote:
>> Pete Kazmier <[EMAIL PROTECTED]> writes:
>> 
>> > I'd love to see other Haskell implementations as well if anyone has a
>> > few moments to spare.  Admittedly, it took me several hours to get my
>> > version working, but I'm a Haskell newbie.  Unfortunately, I think it
>> > runs as slow as it took me to write it!
>
> Hm - nobody suggested using ByteStrings yet?  String is notoriously
> wasteful, and replacing it with ByteString normally gives a quite
> significant speedup.

I actually have a ByteString version but it runs much slower.  This
part of the code is where all of the time is spent in the ByteString
version:

  type WordFreq   = M.Map B.ByteString Int
 
  train:: [B.ByteString] -> WordFreq
  train words = frequencyMap
  where
frequencyMap = foldr incWordCount M.empty words
incWordCount w m = M.insertWith (+) w 1 m
  
> Worse - and this is true for ByteStrings, too - String comparisons are
> O(n), which means lookups in Sets and Maps are expensive.  A trie (i.e,
> a search tree where each internal node corresponds to a word prefix, and
> has one branch per letter in the alphabet) will give you lookup that
> scales with word size (and possibly alphabet size).

Right.  My first version was just a direct translation of Norvig's
code with an emphasis on trying to keep the complexity and size of
code to a minimum.

> Instead of generating the (huge) list of misspelled words, you could
> calculate edit distance to each correctly spelled word?  With a bound on
> allowable mistakes, this is a linear time operation using the standard
> dynamic programming algorithm.

Could you provide additional information on this "standard dynamic
programming algorithm?"  I'm not familiar with dynamic programming.

Thanks!
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier

Pete Kazmier <[EMAIL PROTECTED]> writes:

> I'd love to see other Haskell implementations as well if anyone has a
> few moments to spare.  Admittedly, it took me several hours to get my
> version working, but I'm a Haskell newbie.  Unfortunately, I think it
> runs as slow as it took me to write it!  There is definitely something
> wrong with it, a memory leak, because I can't correct more than a few
> words without a great deal of memory being consumed.

As monochrom pointed out on #haskell, I am using 'interact'
incorrectly.  For some reason I thought 'interact' applied its
argument to each line of the input.  I've replaced it as follows:

  interact $ unlines . map (show . (id &&& correct)) . lines

The program is still terribly slow due to my use of lists.  Is there a
better way to write 'edits1'?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Haskell version of Norvig's Python Spelling Corrector

2007-04-21 Thread Pete Kazmier

Recently I read an interesting article by Peter Norvig[1] on how to
write a spelling corrector in 21-lines of Python.  I wanted to try and
implement it in Haskell.  My implementation is terribly slow and I was
hoping for some tips on how to improve it and make it idiomatic.

I'd love to see other Haskell implementations as well if anyone has a
few moments to spare.  Admittedly, it took me several hours to get my
version working, but I'm a Haskell newbie.  Unfortunately, I think it
runs as slow as it took me to write it!  There is definitely something
wrong with it, a memory leak, because I can't correct more than a few
words without a great deal of memory being consumed.

Thanks,
Pete

[1] http://norvig.com/spell-correct.html

module Main where

import Control.Arrow
import Data.Char (toLower, isPunctuation)
import Data.List (maximumBy)
import qualified Data.Set as S
import qualified Data.Map as M

type WordSet= S.Set String
type WordFreq   = M.Map String Int

main :: IO ()
main = do
-- 'holmes.txt' can be found here:
-- http://norvig.com/holmes.txt
-- We should train it with a larger corpus, but my program is
-- is already slow enough.
c <- readFile "holmes.txt"
let correct = train . tokens $ c
interact $ (++"\n") . show . (id &&& correct) . init

-- Returns a list of words lowercased and stripped of punctuation
-- at the end of the word.
tokens :: String -> [String]
tokens = map (lower . nopunc) . words
where
  lower   = map toLower
  nopunc  = strip isPunctuation
  strip p = reverse . dropWhile p . reverse

-- Returns a closure (equivalent to the 'correct' function in the
-- Python implementation).
train:: [String] -> (String -> String)
train words word = maximumBy freq . S.toList . head $ filter (not . S.null)
[ known (S.singleton word),
  known (edits1 word),
  known (edits2 word),
  S.singleton word ]
where
  freq c c'= compare (findfreq c) (findfreq c')
  findfreq c   = M.findWithDefault 1 c frequencyMap
  known= S.filter (`M.member` frequencyMap) 
  frequencyMap = foldr incWordCount M.empty words
  incWordCount w m = M.insertWith (+) w 1 m

edits1 :: String -> WordSet
edits1 word = S.fromList $ concat
   [[ t i ++ d (i+1)| i <- range n], -- deletion
[ t i ++ [word!!(i+1)] ++ [word!!i] ++ d (i+2) | i <- range (n-1) ], -- transposition
[ t i ++ [c] ++ d (i+1) | i <- range n, c <- alphabet ], -- alteration
[ t i ++ [c] ++ d i | i <- range (n+1), c <- alphabet ]] -- insertion
where
  n= length word
  t i  = take i word
  d i  = drop i word
  range x  = [ 0..(x-1) ]
  alphabet = ['a'..'z']

edits2 :: String -> WordSet
edits2 = S.unions . S.toList . S.map edits1 . edits1
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Tutorial on Haskell

2007-04-16 Thread Pete Kazmier

"Evan Laforge" <[EMAIL PROTECTED]> writes:

> It illustrates a few nice things about haskell: laziness for the
> recursive defs and easy backtracking, low syntax overhead and custom
> operators for DSLs, composability, etc.

Although that is true, I somehow feel that showing a perl, ruby, or
python programmer an alternate approach to regexps, a technique firmly
ingrained into the roots of these languages, will not garner much
interest in Haskell.  I know this is the case for me.  

In fact, it was always a large negative for me that Haskell/GHC never
had decent builtin support for regexps until recently (6.6).  From a
practical point of view, the tasks that I do frequently involve the
use of regexps (for better or worse).  Again, I'm not an academic,
just an everyday python programmer trying to assist me in my day job.

Upon thinking about this subject further, I think it would be very
important that Simon somehow incorporates at least one use of the new
regexp library.  The target audience would by more likely to consider
Haskell if it contains they're beloved tool of choice.  Later, they
can discover the elegance of parsec if needed.

-Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Tutorial on Haskell

2007-04-16 Thread Pete Kazmier

Simon Peyton-Jones <[EMAIL PROTECTED]> writes:

> My guess is that they'll be Linux/Perl/Ruby types, and they'll be
> practitioners rather than pointy-headed academics.

> Suggest concrete examples of programs that are
> * small
> * useful
> * demonstrate Haskell's power
> * preferably something that might be a bit
> tricky in another language

> But there must be lots of others.  

As one of those python/ruby types trying to learn Haskell for the past
year, here are my suggestions for small examples:

- Tom Moertel's Haskell Port Scanner
  Why? Demonstrates concurrent haskell in a small amount of lines
  http://blog.moertel.com/articles/2004/03/13/concurrent-port-scanner-in-haskell

- A web-server example of some sort that interfaces with a database
  and uses some interesting HTML combinator library.

- Building a simple unit testing framework is always a good example
  (even though they already exist), and then introducing quickcheck
  perhaps.

- A program to concurrently verify the links on an HTML page
  recursively.  I'm sure there are lots of interesting idioms and
  techniques that could be used while keeping the code small and
  elegant.

- Perhaps a Haskell version of Norvig's 20-line Python Spell Checker:
  Why? Maybe a Haskell version could be shorter and more elegant?
  http://norvig.com/spell-correct.html

- Tom Moertel's Directory Tree Printing in Haskell:
  Why? Demonstrates all sorts of introductory techniques
  
http://blog.moertel.com/articles/2007/03/28/directory-tree-printing-in-haskell-part-three-lazy-i-o


Some thoughts on other topics suggested by others:

- Parsec is not that interesting for those coming from perl, ruby, or
  python as they rely on regular expressions for everything and just
  expect that they are part of the language.  The thought of writing
  one's own parser is not as "cool" as most Haskellers believe it is,
  regardless of how interesting the parsec library is.

- STM may be too complex of a subject for an intro to Haskell
  tutorial.  There are just too many concepts in there that may
  overwhelm some beginners.

- Don's post on shell scripting was very interesting, but I'm still
  having a hard time understanding some parts of it, the error
  handling part, and I've been playing with Haskell on and off for the
  past year (I'm also a slow learner and not an academic).

- Don's post on simple UNIX utilities was also quite nice. I believe
  he also wrote a simple IRC bot example that would prove
  interesting, can't seem to find the link at the moment though.

Just my thoughts as a newbie desiring a book on how to use Haskell in
a practical manner (such as Practical Common Lisp book).

-Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: flip fix and iterate

2007-03-21 Thread Pete Kazmier

"Claus Reinke" <[EMAIL PROTECTED]> writes:

>> I won't try to understand fix just yet, but I'm still confused by
>> the type of fix:
>> fix :: (a -> a) -> a
>> It appears to me that it takes a function as an argument, and that
>> function takes a single argument.  So how are you passing fix an
>> anonymous function taking 2 arguments?  Sorry if I have beaten this
>> horse to death, but my pea-sized brain is working overtime here.
>
> fix takes a function as an argument, and that function takes a
> single argument.  that function also returns something of the same
> type as its single argument.  

[snip]

> and suddenly, fix does have two parameters, which flip can flip!-)
>
> no magic, just technology sufficiently advanced to be
> indistinguishable from it: a function of one parameter, which
> returns a function of one parameter, is a function of more than one
> parameter.
>
> at which point this particular fixed-point combinator puts its
> recursive unfoldings to rest for tonight.

Claus,

Thank you for the detailed explanation.  I think I understand now!  To
be sure, I'll reread your post several times over the next few days.

Thanks again, this was very helpful.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: fix

2007-03-20 Thread Pete Kazmier

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

> As others have pointed out, fix is polymorphic, so "a" can stand for any
> type, including "(b -> c)". Removing redundant parentheses, this means
> fix can directly specialise to:
>
>> fix :: ((b -> c) -> b -> c) -> b -> c

I understand now.  I think part of my problem was that I was trying to
grasp one too many new things all at once.  This makes perfect sense.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: flip fix and iterate

2007-03-20 Thread Pete Kazmier

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

> Pete Kazmier:
>> I understand the intent of this code, but I am having a hard time
>> understanding the implementation, specifically the combination of
>> 'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
>> one can call 'flip' on a function that takes one argument.
>
> I threw that in there because I figured you were up for another
> challenge. :-)

Haskell has a way of making one feel dumb.  This is by far the most
challenging programming language I've ever used.

> It took me ages to get some clue about how to use fix, quite apart from
> combining it with flip. The concept of passing the output of a function
> as one of its parameters ("tying the knot") can be difficult to accept,
> particularly if you haven't studied lambda calculus. 

This is a bit mind boggling.  Do you have any trivial examples that
may help my understanding?

> Note that I could have just written this:
>
>> let iterate a = do
>>   ... iterate a' ...
>> iterate accum

In the meantime, I'm more than happy to claim ignorance and stick with
the above version which is very accessible to us mere mortals.

> So with my use of "flip fix", I'm really just calling fix on the
> anonymous function (\iterate accum -> ...), and then the parameter
> ("accum") is passed to the function returned by fix. So now you just
> need a couple of weeks (or months if you're as slow as me) to
> understand what fix is all about... :-)

I won't try to understand fix just yet, but I'm still confused by the
type of fix:

 fix :: (a -> a) -> a

It appears to me that it takes a function as an argument, and that
function takes a single argument.  So how are you passing fix an
anonymous function taking 2 arguments?  Sorry if I have beaten this
horse to death, but my pea-sized brain is working overtime here.

Thanks for all of the help.

Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier

Bryan O'Sullivan <[EMAIL PROTECTED]> writes:

> Pete Kazmier wrote:
>
>> I understand the intent of this code, but I am having a hard time
>> understanding the implementation, specifically the combination of
>> 'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
>> one can call 'flip' on a function that takes one argument.

> As to why it's okay to call "flip" on "fix" at all, look at the types
> involved.
>
> fix :: (a -> a) -> a
> flip :: (a -> b -> c) -> b -> a -> c
>
> By substitution:
>
> flip fix :: a -> ((a -> b) -> a -> b) -> b

Sadly, I'm still confused.  I understand how 'flip' works in the case
where its argument is a function that takes two arguments.  I've
started to use this in my own code lately.  But my brain refuses to
understand how 'flip' is applied to 'fix', a function that takes one
argument only, which happens to be a function itself.  What is 'flip'
flipping when the function passed to it only takes one argument?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier

Pete Kazmier <[EMAIL PROTECTED]> writes:

> I attempted to read Oleg's fold-stream implementation [1] as this
> sounds quite appealing to me, but I was completely overwhelmed,
> especially with all of the various type signatures used.  It would be
> great if one of the regular Haskell bloggers (Tom Moertel are you
> reading this?) might write a blog entry or two interpreting his
> implementation for those of us starting out in Haskell perhaps by
> starting out with a non-polymorphic version so as to emphasize the
> approach.
>
> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs

In the event any other Haskell newbie comes along someday and is just
as overwhelmed as I was, I've found this post by Oleg to be a much
easier to understand than the above paper because it is not as generic
and thus the type signatures are a bit easier on the eyes:

http://www.haskell.org/pipermail/haskell/2003-September/012741.html

With that said, I have a question regarding Hal's response to the
above email in which he states:

> Just thought I'd mention that this is, in fact, my preferred method of
> iterating over a file.  It alleviates the pain associated with lazy file
> IO, and simultaneously provides a useful abstraction.  I actually have
> 3*2 functions that I use which look like:
> 
> > type Iteratee  iter seed = seed -> iter -> Either seed seed
> > hFoldChars  :: FilePath -> Iteratee  Char seed -> seed -> IO seed
> > hFoldLines  :: FilePath -> Iteratee  String   seed -> seed -> IO seed
> > hFoldWords  :: FilePath -> Iteratee  [String] seed -> seed -> IO seed
> 
> > type IterateeM iter seed = seed -> iter -> IO (Either seed seed)
> > hFoldCharsM :: FilePath -> IterateeM Char seed -> seed -> IO seed
> > hFoldLinesM :: FilePath -> IterateeM String   seed -> seed -> IO seed
> > hFoldWordsM :: FilePath -> IterateeM [String] seed -> seed -> IO seed
> 
> Which perform as expected (hFoldWords(M) can be written in terms of
> hFoldLinesM, but I find I use it sufficiently frequently to warrent
> having it stand out).  Also, of course, the only ones actually
> implemented are the (M) variants; the non-M variants just throw a return
> into the Iteratee.

What does he mean by the very last sentence?  Oleg's version seems
more like the non-M versions.  What is his implication?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-19 Thread Pete Kazmier

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

> Pete Kazmier:
>> I attempted to read Oleg's fold-stream implementation [1] as this
>> sounds quite appealing to me, but I was completely overwhelmed,
>> especially with all of the various type signatures used.  It would be
>> great if one of the regular Haskell bloggers (Tom Moertel are you
>> reading this?) might write a blog entry or two interpreting his
>> implementation for those of us starting out in Haskell perhaps by
>> starting out with a non-polymorphic version so as to emphasize the
>> approach.
>> 
>> [1] http://okmij.org/ftp/Haskell/fold-stream.lhs
>
> The basic idea of the paper is the use of a left-fold operator as the
> primary interface for enumarating collections. The recursive version
> (less general than the non-recursive version) of a left-fold operator
> for enumerating the lines of a text file might look something like this:
>
>> import Control.Monad.Fix
>> import Control.Exception
>> import Data.List
>> import qualified Data.Set as S
>> import qualified Data.Map as M
>> import System.IO
>> 
>> enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a
>> enumLines iter accum filename = do
>>   h <- openFile filename ReadMode
>>   flip fix accum $
>> \iterate accum -> do
>>   try_line <- try (hGetLine h)
>>   case try_line of
>> Left e -> hClose h >> return accum
>> Right line -> do
>>   case iter accum line of
>> Left accum -> hClose h >> return accum
>> Right accum -> iterate accum

I understand the intent of this code, but I am having a hard time
understanding the implementation, specifically the combination of
'fix', 'flip', and 'interate'.  I looked up 'fix' and I'm unsure how
one can call 'flip' on a function that takes one argument.

> To use this, you provide an "iteratee", a function which takes an
> accumulator and a line from the file, and returns a new accumulator
> embedded in an Either. Using the Left branch causes immediate
> termination of the enumeration. For example, to search for the first
> occurrence of each of a set of email headers:
>
>> getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String 
>> String)
>> getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where
>>   findHdrs accum@(wanted,found) line =
>> if null line
>>   then Left accum
>>   else
>> case headerLine line of
>>   Nothing -> Right accum
>>   Just hdr ->
>> case findDelete hdr wanted of
>>   Nothing -> Right accum
>>   Just wanted ->
>> let accum = (wanted, M.insert hdr line found) in
>>   if S.null wanted
>> then Left accum
>> else Right accum
>> 
>> headerLine :: String -> Maybe String
>> headerLine (':':xs) = Just []
>> headerLine (x:xs) = fmap (x:) (headerLine xs)
>> headerLine [] = Nothing
>> 
>> findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a)
>> findDelete e s = if S.member e s
>>   then Just (S.delete e s)
>>   else Nothing
>
> It's a bit of a case-analysis nightmare, but when comparing this to
> previous approaches, note that file traversal and processing are cleanly
> separated, file handle closure is guaranteed to be timely, file
> traversal stops as soon as all the required headers have been found,
> memory usage is minimised.

Very nice.  I like the clean separation, but as you say, its one ugly
bit of code compared to my original code, although much more elegant
no doubt.

> I hope that helps.

Very much so.  Thank you for you help.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-17 Thread Pete Kazmier

"Matthew Brecknell" <[EMAIL PROTECTED]> writes:

> So here's a test. I don't have any big maildirs handy, so this is based
> on the simple exercise of printing the first line of each of a large
> number of files. First, the preamble.
>
>> import Control.Exception (bracket)
>> import System.Environment
>> import System.IO
>
>> main = do
>>   t:n:fs <- getArgs
>>   ([test0,test1,test2,test3] !! read t) (take (read n) $ cycle fs)
> 
> [snip]

Thank you for summarizing the approaches presented by others.  As a
Haskell newbie, there seems to be quite a few esoteric concepts to
conquer.  Your concrete examples were helpful in my understanding of
the ramifications associated with the various approaches.

After reading the various threads you cited, I decided to avoid lazy
IO altogether.  By using 'readFile' without forcing the strict
evaluation of my parser, I inadvertently relinquished control of the
resource management--closing of the file handles was left to the GC.
And although I could have used 'seq' to address the issue, why bother
fixing a problem that could have been avoided altogther by using
strict IO.

With that said, I added the following function to my program and then
replaced the invocation of 'readFile' with it:

  readEmailHeaders :: FilePath -> IO String
  readEmailHeaders file = 
  bracket (openFile file ReadMode) (hClose) (headers [])
  where
headers acc h = do
line <- hGetLine h
case line of
  -- Stop reading file once we hit the empty separator
  -- line, no need to read the rest of the file (body).
  "" -> return . concat . reverse $ acc
  _  -> headers ("\n":line:acc) h

I'm not sure if this is the best implementation, but the speed is
comparable to the lazy IO version without the annoying defect of
running out of file handles.  I also tried an implementation using
'hGetChar' but that was much slower.

I attempted to read Oleg's fold-stream implementation [1] as this
sounds quite appealing to me, but I was completely overwhelmed,
especially with all of the various type signatures used.  It would be
great if one of the regular Haskell bloggers (Tom Moertel are you
reading this?) might write a blog entry or two interpreting his
implementation for those of us starting out in Haskell perhaps by
starting out with a non-polymorphic version so as to emphasize the
approach.

Thanks,
Pete

[1] http://okmij.org/ftp/Haskell/fold-stream.lhs

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lazy IO and closing of file handles

2007-03-14 Thread Pete Kazmier

[EMAIL PROTECTED] (Donald Bruce Stewart) writes:

> pete-expires-20070513:
>> When using readFile to process a large number of files, I am exceeding
>> the resource limits for the maximum number of open file descriptors on
>> my system.  How can I enhance my program to deal with this situation
>> without making significant changes?
>
> Read in data strictly, and there are two obvious ways to do that:
>
> -- Via strings:
>
> readFileStrict f = do
> s <- readFile f
> length s `seq` return s
>
> -- Via ByteStrings
> readFileStrict  = Data.ByteString.readFile
> readFileStrictString  = liftM Data.ByteString.unpack 
> Data.ByteString.readFile
>
> If you're reading more than say, 100k of data, I'd use strict
> ByteStrings without hesitation. More than 10M, and I'd use lazy
> bytestrings.

Correct me if I'm wrong, but isn't this exactly what I wanted to
avoid?  Reading the entire file into memory?  In my previous email, I
was trying to state that I wanted to lazily read the file because some
of the files are quite large and there is no reason to read beyond the
small set of headers.  If I read the entire file into memory, this
design goal is no longer met.

Nevertheless, I was benchmarking with ByteStrings (both lazy and
strict), and in both cases, the ByteString versions of readFile yield
the same error regarding max open files.  Incidentally, the lazy
bytestring version of my program was by far the fastest and used the
least amount of memory, but it still crapped out regarding max open
files. 

So I'm back to square one.  Any other ideas?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Lazy IO and closing of file handles

2007-03-14 Thread Pete Kazmier

When using readFile to process a large number of files, I am exceeding
the resource limits for the maximum number of open file descriptors on
my system.  How can I enhance my program to deal with this situation
without making significant changes?

My program takes one or more directories that contain email messages,
stored one per file, and prints a list of all the email threads.  Here
is a snippet of output:

  New addition to the Kazmier family
Casey Kazmier
  
  Memoization in Erlang?
Thomas Johnsson
Ulf Wiger \(AL/EAB\)

As a newcomer to Haskell, I am intrigued by lazy evaluation and how it
can influence one's designs.  With that said, I wrote the program as a
sequence of list manipulations which seemed quite natural to do in
Haskell starting with reading the contents of the each file.  Here is
the algorithm at the high level:

  1. Read contents of all files returning a list of Strings
  2. Map over the list and parse each String as an Email
  3. Sort the list of Emails
  4. Group the list of Emails by Subject
  5. Map over the grouped list to create a list of Threads
  6. Finally, print the list of Threads

It is my understanding that, as a result of lazy IO, the entire file
does not need to be read into memory because parseEmail only inspects
the topmost portion of the email (its headers), which is a key part of
my design as some of the files can be quite large. Unfortunately, as
soon as I run this program on a directory with more than 1024 files,
GHC craps out on me due to resource limits.  It seems that the handles
opened by readFile remain open.  Would this be common across all
Haskell implementations?

How do I go about fixing this without making a significant number of
changes to my program?  Did I make a mistake in steps 1 and 2 above?
Should I have read and parsed a single file at a time, and then move
on to the next?  

I'd appreciate any other comments on the program as well.  I feel this
is the best example of Haskell code that I have written.  Compared to
the first version of this program I wrote a few months ago, this is a
hundred times better.

Here is the program:

> module Main where
> 
> import Control.Monad (filterM, liftM)
> import Data.List
> import Data.Maybe
> import System.Directory
> import System.Environment
> 
> type From= String
> type Subject = String
> data Email   = Email {from :: From, subject :: Subject} deriving Show
> data Thread  = Thread [Email]
> 
> instance Show Thread where
> show (Thread emails@(e:es)) = title ++ senders
> where
>   title   = newline . bolder . subject $ e
>   sender  = newline . indent . from
>   senders = concatMap sender emails
>   newline = (++ "\n")
>   indent  = ("  " ++)
>   bolder  = ("\27[0;32;40m" ++) . (++ "\27[0m")
> 
> main =
> getArgs  >>=
> mapM fileContentsOfDirectory >>=
> mapM_ print . threadEmails . map parseEmail . concat 
> 
> fileContentsOfDirectory :: FilePath -> IO [String]
> fileContentsOfDirectory dir =
> setCurrentDirectory  dir >>
> getDirectoryContents dir >>=
> filterM doesFileExist>>=  -- ignore directories
> mapM readFile
> 
> parseEmail :: String -> Email
> parseEmail text = 
> Email (getHeader "From") (getHeader "Subject")
> where
>   getHeader = fromMaybe "N/A" . flip lookup headers
>   headers   = concatMap mkassoc . takeWhile (/="") $ lines text
>   mkassoc s = case findIndex (==':') s of
> Just n  -> [(take n s, drop (n+2) s)]
> Nothing -> []
> 
> threadEmails :: [Email] -> [Thread]
> threadEmails =
> map Thread . groupBy (fuzzy (==)) . sortBy (fuzzy compare)
> where
>   fuzzy fn e e' = stripReFwd (subject e) `fn` stripReFwd (subject e')
>   stripReFwd= stripSpaces . reverse . stripToColon . reverse
>   stripSpaces   = dropWhile (==' ')
>   stripToColon  = takeWhile (/=':')

  


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: How would you replace a field in a CSV file?

2006-10-01 Thread Pete Kazmier

[EMAIL PROTECTED] writes:

> For such a small self-contained task, I don't think Haskell
> is any better than Python.

I figured as much, but I thought with the new FPS lazy bytestrings it
might have a chance in terms of raw speed.  On the other side of the
coin, in terms of elegance, I thought I'd ask as haskellers always
amaze me with their one-liners :-)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] How would you replace a field in a CSV file?

2006-10-01 Thread Pete Kazmier

The other day at work an opportunity arose where I was hoping to sneak
some Haskell into the pipeline of tools used to process call detail
records (CDRs).  In the telecommunications industry, CDRs are used for
billing.  Each CDR is a single line record of 30 comma-separated
values.  Each line is approximately 240 characters in length.  The
task at hand is to replace field number 10 if a new value can be found
in a hashmap which is keyed using the contents of the field.

My colleague was going to write a C program (that's all he knows), but
I whipped up a trivial python program instead.  I was curious if a
haskell version could be faster and more elegant , but I have not been
able to beat my python version in either case.  So, I'm curious as to
how you would go about this task in Haskell.  The input files are
generally 300-400MB, and the hashmap will contain perhaps 20-30 items.

For those that know python, here is a very simple implementation that
happens to be very fast compared to my Haskell version and very short:

for line in sys.stdin:
fields = line.split(',')
fields[9] = tgmap.get(fields[9], fields[9])
print ",".join(fields),

For each line in standard input:

  - Splits the string on the comma: "field0,field1,...,field29" => 
["field0", "field1", ..., "field29"] to obtain a list of strings.

  - Gets the value associated with the key of field9 from tgmap, if it
does not exist, it returns a default value which is the original
value.  I.e., if it's not in the map, then don't replace the
field.

  - Joins the list of fields with a comma to yield a string again
which is printed out to standard output.  The join method on the
string is a bit odd: ",".join([1,2,3]) => "1,2,3"

Here is my first Haskell attempt:

import Data.ByteString.Lazy.Char8 as B hiding (map,foldr)
import Data.List (map)
import Data.Map as M hiding (map)

-- This is just a placeholder until I actually populate the map
tgmap = M.singleton (B.pack "Pete") (B.pack "Kazmier")

main = B.interact $ B.unlines . map doline . B.lines
where doline= B.join comma . mapIndex fixup . B.split ','
  fixup i s = if i==9 then M.findWithDefault s s tgmap else s
  comma = B.pack ","

-- f is supplied the index of the current element being processed
mapIndex f xs = m f 0 xs
where m f i [] = []
  m f i (x:xs') = f i x : m f (i+1) xs'

After talking with dons on #haskell, he cleaned my version up and
produced this version which gets rid of 'if' statement and makes
mapIndex stricter:

import Data.ByteString.Lazy.Char8 as B hiding (map,foldr)
import Data.List (map)
import Data.Map as M hiding (map)

-- This will be populated from a file
dict = M.singleton (B.pack "Pete") (B.pack "Kazmier")

main = B.interact $ B.unlines . map doline . B.lines
where doline= B.join comma . mapIndex fixup . B.split ','
  comma = B.singleton ','
  fixup 3 s = M.findWithDefault s s dict
  fixup n s = s

-- f is supplied the index of the current element being processed
mapIndex :: (Int -> ByteString -> ByteString) -> [ByteString] ->
[ByteString]
mapIndex f xs = m xs 0
where m []  _ = []
  m (x:xs') i = f i x : (m xs' $! i+1)

That helped things a bit, but I must confess I don't understand how
the strictness improved things as I had assumed things were going to
be evaluated in a reasonable amount of time due to the printing of
output.  I thought IO was interlaced with the execution and thus I
wasn't going to have to concern myself over laziness.  In addition,
the function is able to generate new elements of the list on demand so
I thought it was a good citizen in the lazy world.  Could anyone help
explain?

And then he came up with another version to avoid the 'unlines', but
that did not that really speed things up significantly.  So, with all
that said, is there a better approach to this problem?  Perhaps a more
elegant Haskell solution?

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Editors for Haskell

2006-06-07 Thread Pete Kazmier

Pete Kazmier <[EMAIL PROTECTED]> writes:

> As part of my learning experience, I think I want to see if I can
> write a haskell pastebin that does proper syntax highlighting.
> Someone in #haskell suggested that I use just a lexer because using a
> parser is overkill.  However, I can't make this assessment until I see
> how to use the parser and the information it can supply.

Thanks for the responses and pointers to the other projects.  I'll
investigate those after the day-job (the one that pays the bills).

As for using the lexer vs the parser, I was hoping to do things such
as folding and/or nifty mouse-overs of logical blocks of code, which
is why I was interested in the parser.  I'm not sure if I could do the
same with only a lexer.

I'm basically just looking for something concrete to tinker with as I
learn Haskell and it seems that Haskell is missing a snazzy pastebin.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Editors for Haskell

2006-06-07 Thread Pete Kazmier

"Simon Peyton-Jones" <[EMAIL PROTECTED]> writes:

> You probably know this, but your kind of application is a big reason
> that we now make GHC available as a library.  (Just say 'import GHC'.)  
>
> You shouldn't need to parse Haskell yourself: just call GHC's parser.
> You get back a syntax tree with very precise location information that
> can guide your editor (e.g. if you want to select a sub-exprssion).
> Similarly, you can call the type checker.

Are there any small examples of using GHC's parser?  I'm a complete
newbie so perhaps I'm not checking all of the relevant locations for
docs, but I can't seem to find this parser that is being referred to.
I checked out the source tree to GHC as well, but I have no idea where
to look in there (not to mention it's a bit intimidating).  Pointers
would be appreciated!

As part of my learning experience, I think I want to see if I can
write a haskell pastebin that does proper syntax highlighting.
Someone in #haskell suggested that I use just a lexer because using a
parser is overkill.  However, I can't make this assessment until I see
how to use the parser and the information it can supply.

Thanks,
Pete

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Lock-Free Data Structures using STMs in Haskell

[Haskell-cafe] Lock-Free Data Structures using STMs in Haskell

[Haskell-cafe] Re: Sneaking haskell in the workplace -- cleaning csv files

[Haskell-cafe] Re: Yaham - Yet Another HAskell Mode for GNU Emacs

[Haskell-cafe] Re: Network.HTTP+ByteStrings Interface--Or: How to shepherd handles and go with the flow at the same time?

[Haskell-cafe] Stack overflow with my Trie implementation

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

[Haskell-cafe] Re: Haskell version of Norvig's Python Spelling Corrector

[Haskell-cafe] Haskell version of Norvig's Python Spelling Corrector

[Haskell-cafe] Re: Tutorial on Haskell

[Haskell-cafe] Re: Tutorial on Haskell

[Haskell-cafe] Re: flip fix and iterate

[Haskell-cafe] Re: fix

[Haskell-cafe] Re: flip fix and iterate

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Re: Lazy IO and closing of file handles

[Haskell-cafe] Lazy IO and closing of file handles

[Haskell-cafe] Re: How would you replace a field in a CSV file?

[Haskell-cafe] How would you replace a field in a CSV file?

[Haskell-cafe] Re: Editors for Haskell

[Haskell-cafe] Re: Editors for Haskell

26 matches

Site Navigation

Mail list logo

Footer information