Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-05 Thread Dmitry Olshansky
My measurements show that
- using strict version of insertWith doesn't improve performance. - in case
of compilation with -O2 flag foldl' also is equal to foldl (-O2 gives approx
2 time impovements).- using RandomGen and State monad to generate a list
gives at least 4 times improvements (on 1 000 000 items).

More complicated improvements (using Array, PRNG and so on) were not tested
by me.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-05 Thread Yitzchak Gale
Dmitry Olshansky wrote:
 My measurements show that...
 (-O2 gives approx 2 time impovements).
 ...using RandomGen and State monad to generate a list gives at least 4 times
 improvements (on 1 000 000 items).

You earlier said:

 this takes over twice as long as a naively implemented
Python program

So now our naive Haskell - ordinary usage of Data.Map
and System.Random, without resorting to things like
unboxed arrays - is beating naive Python handily. Correct?

Or is this with an alternate RNG? Although I think even that
would be fair, since Python uses Mersenne.

Regards,
Yitz
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-05 Thread Paul Moore
2009/8/5 Yitzchak Gale g...@sefer.org:
 Dmitry Olshansky wrote:
 My measurements show that...
 (-O2 gives approx 2 time impovements).
 ...using RandomGen and State monad to generate a list gives at least 4 times
 improvements (on 1 000 000 items).

 You earlier said:

 this takes over twice as long as a naively implemented
 Python program

The latter was me :-)

 So now our naive Haskell - ordinary usage of Data.Map
 and System.Random, without resorting to things like
 unboxed arrays - is beating naive Python handily. Correct?

I haven't checked myself (and won't have time in the next couple of
weeks, as I'm on holiday - but I'll pick this up when I get back).,
but it sounds like it. I'd like to check Dmitry's suggestions, mainly
to see how they fit with my feel for naive (ie, at my beginner
level, do I understand why they are more efficient).

But I'd expect (compiled) Haskell to beat (interpreted) Python. That's
sort of the point, really... The big measures for me are (1) by how
much, and (2) how readable is the code.

 Or is this with an alternate RNG? Although I think even that
 would be fair, since Python uses Mersenne.

I got the impression Dmitry was using Haskell's standard RNG, not
Mersenne Twister. If so, then we'd get further improvements with MT,
but that's still a hit against Haskell, as I'd interpret it as meaning
that Haskell supplies as default a PRNG which costs noticeable
performance in order to provide guarantees that ordinary programs
don't need.

Paul.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-05 Thread Dmitry Olshansky
 I got the impression Dmitry was using Haskell's standard RNG, not
 Mersenne Twister. If so, then we'd get further improvements with MT,
 but that's still a hit against Haskell, as I'd interpret it as meaning
 that Haskell supplies as default a PRNG which costs noticeable
 performance in order to provide guarantees that ordinary programs
 don't need.

 Paul.

Yes, I used standard RNG just made corrections from the second Daniel
Fischer's post. My file is attached. Commented strings are from original
version and next lines are from Daniel. (Note that simulation and
simulate types depend on type of dice).
Just compile with ghc --make -O2 histogram.hs
and run
histogram +RTS -s

As a pre-intermediate ;-) I just check some proposals...


histogram.hs
Description: Binary data
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-05 Thread wren ng thornton

Paul Moore wrote:

2009/8/5 Yitzchak Gale g...@sefer.org:

Or is this with an alternate RNG? Although I think even that
would be fair, since Python uses Mersenne.


I got the impression Dmitry was using Haskell's standard RNG, not
Mersenne Twister. If so, then we'd get further improvements with MT,
but that's still a hit against Haskell, as I'd interpret it as meaning
that Haskell supplies as default a PRNG which costs noticeable
performance in order to provide guarantees that ordinary programs
don't need.


I'm not sure how fair that is. For most ordinary programs I've written 
that use randomness anywhere, they have many different places that want 
randomness. Having a single global seed can suffice for this, but it 
introduces severe constraints that are seldom noticed.


Namely, these are in fact *P*RNGs. If every place needing randomness is 
given its own seed, then it becomes possible to store all those seeds, 
allowing for replay. Replay can be good for capturing the history of a 
particular run of a simulation/game ---which is difficult at best in 
non-Haskell approaches. And more critically the seeds can be stored in a 
core dump allowing replay as a debugging tool, allowing reproduction 
of 'random' bugs which are otherwise quite difficult to track down.


Imperative languages could use the same algorithms as Haskell, but they 
do not; which means these benefits of PRNGs are seldom even thought of. 
By separating concerns, the Haskell approach not only leads to cleaner 
code, but that separation can also be used to add new capabilities. The 
invisible manacles of imperativism should not be looked upon lightly.


--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Paul Moore
2009/7/31 Paul Moore p.f.mo...@gmail.com:
 2009/7/31 Gregory Collins g...@gregorycollins.net:
 Paul Moore p.f.mo...@gmail.com writes:

 How would I efficiently write a function in Haskell to count
 occurrences of unique elements in a (potentially very large) list? For
 example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
 [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
 representation).

    import qualified Data.Map as Map
    import           Data.Map (Map)

    histogram :: Ord a = [a] - [(a,Int)]
    histogram = Map.assocs . foldl f Map.empty
      where
        f m k = Map.insertWith (+) k 1 m

 Right. I see how that works, and can work out how to think about this
 sort of thing from your example. Thanks very much.

 BTW, I did know that Haskell had an efficient map implementation, I
 just wasn't sure how to use it functionally - I probably should have
 searched a bit harder for examples before posting. Thanks for the help
 in any case.

Hmm, I'm obviously still mucking up the performance somehow. My full
program (still a toy, but a step on the way to what I'm aiming at) is
as follows. It's rolling 3 6-sided dice 10 times, and printing a
summary of the results.

import System.Random
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List

dice :: Int - Int - IO Int
dice 0 n = return 0
dice m n = do
  total - dice (m - 1) n
  roll - randomRIO (1, n)
  return (total + roll)

simulate count m n = do
  mapM (dice m) (replicate count n)

histogram :: Ord a = [a] - [(a,Int)]
histogram = Map.assocs . foldl f Map.empty
  where
f m k = Map.insertWith (+) k 1 m

simulation = do
  lst - simulate 10 3 6
  return (histogram lst)

main = do
  s - simulation
  putStrLn (show s)

When compiled, this takes over twice as long as a naively implemented
Python program.

What am I doing wrong here? I'd have expected compiled Haskell to be
faster than interpreted Python, so obviously my approach is wrong. I'm
expecting the answer to be that I've got unnecessary laziness - which
is fine, but ultimately my interest is in ease of expression and
performance combined, so I'm looking for beginner-level improvements
rather than subtle advanced techniques like unboxing.

Thanks,
Paul.

PS I know my code is probably fairly clumsy - I'd appreciate style
suggestions, but my main interest here is whether a beginner, with a
broad programming background, a basic understanding of Haskell, and
access to Google, put together a clear, efficient, program (ie, the
case where my usual scripting language is too slow and I want to knock
something up quickly in a high-level, high-performance language).
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Pekka Karjalainen
On Sat, Aug 1, 2009 at 4:44 PM, Paul Moorep.f.mo...@gmail.com wrote:
 PS I know my code is probably fairly clumsy - I'd appreciate style
 suggestions, but my main interest here is whether a beginner, with a
 broad programming background, a basic understanding of Haskell, and
 access to Google, put together a clear, efficient, program (ie, the
 case where my usual scripting language is too slow and I want to knock
 something up quickly in a high-level, high-performance language).

Here is one way to rewrite your program. It improved the speed
somewhat for me. I timed both programs on my computer. I suppose one
could try using an array for calculating the histogram as well, but I
only tried the simples thing here. I hope someone can weigh in with a
more thorough analysis.

Please note how I've avoided including the IO Monad in some type
signatures by extracting the data from it locally (with -). It is
quite possible to apply the histogram function to the data before
going through the IO Monad as well, but it doesn't appear to change
the execution speed much here.

Caveat: My testing wasn't extensive. I just compiled with -O and timed
the programs a couple of times.

import System.Random
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List

diceRolls :: Int - IO [Int]
diceRolls highVal = do
generator - getStdGen
return (randomRs (1, highVal) generator)

groupDice :: Int - [Int] - [[Int]]
groupDice chunk rolls = map (take chunk) $ iterate (drop chunk) rolls

simulate :: Int - Int - Int - IO [Int]
simulate count m n = do
rolls - diceRolls n
let sums = map sum $ groupDice m rolls
return (take count sums)

histogram :: Ord a = [a] - [(a,Int)]
histogram = Map.assocs . foldl f Map.empty
 where
   f m k = Map.insertWith (+) k 1 m

simulation = do
 lst - simulate 10 3 6
 return (histogram $ lst)

main = do
 s - simulation
 putStrLn (show s)
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Ketil Malde
Paul Moore p.f.mo...@gmail.com writes:

 What am I doing wrong here? I'd have expected compiled Haskell to be
 faster than interpreted Python, so obviously my approach is wrong.

Two things from my experience: Python associative arrays are fast, and
Haskell random numbers are slow.  So by building a map from random
numbers, you are likely doing a fairly pessimal comparison.  Did you
profile to see where the time is spent? 

 I'm expecting the answer to be that I've got unnecessary laziness

IME, laziness usually affects space, but not so much time
performance.  Although 'insertWith (+)' looks like it would build
unevaluated thunks for the sums.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Paul Moore
2009/8/1 Ketil Malde ke...@malde.org:
 Paul Moore p.f.mo...@gmail.com writes:

 What am I doing wrong here? I'd have expected compiled Haskell to be
 faster than interpreted Python, so obviously my approach is wrong.

 Two things from my experience: Python associative arrays are fast, and
 Haskell random numbers are slow.  So by building a map from random
 numbers, you are likely doing a fairly pessimal comparison.

Hmm, I think you're saying that for this problem, Haskell is likely to
be worse than Python. Interesting. I guess I'd assumed that something
CPU-intensive like this would benefit from a compiled language. Just
goes to show that intuition about performance is usually wrong
(something I really should know by now :-))

Is the issue with random numbers just the implementation, or is it
something inherent in the non-pure nature of a random number generator
that makes it hard for Haskell to implement efficiently? If the
latter, that probably makes Haskell a somewhat poor choice for
simulation-type programs.

 Did you profile to see where the time is spent?

Not until you suggested it :-) I was pleasantly surprised by how easy
it was to do. The results bear out what you were saying, the
bottleneck is entirely in the dice function.

 I'm expecting the answer to be that I've got unnecessary laziness

 IME, laziness usually affects space, but not so much time
 performance.  Although 'insertWith (+)' looks like it would build
 unevaluated thunks for the sums.

That'll teach me to bandy about technical terms as if I understand them :-)

Thanks for the comments,
Paul.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Malcolm Wallace
I'm expecting the answer to be that I've got unnecessary laziness -  
which

is fine, but ultimately my interest is in ease of expression and
performance combined, so I'm looking for beginner-level improvements
rather than subtle advanced techniques like unboxing.


You're right, it is too lazy.  Here are a couple of strictifications  
that should help:



histogram :: Ord a = [a] - [(a,Int)]
histogram = Map.assocs . foldl f Map.empty
 where
  f m k = Map.insertWith (+) k 1 m


Turn foldl into foldl' (from Data.List) and Map.insertWith into  
Map.insertWith'.
The strict versions simply force the intermediate structures to be  
evaluated, rather than hanging around as large accumulations of  
closures.


Regards,
Malcolm
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Daniel Fischer
Am Samstag 01 August 2009 15:44:39 schrieb Paul Moore:
 2009/7/31 Paul Moore p.f.mo...@gmail.com:
  2009/7/31 Gregory Collins g...@gregorycollins.net:

 Hmm, I'm obviously still mucking up the performance somehow. My full
 program (still a toy, but a step on the way to what I'm aiming at) is
 as follows. It's rolling 3 6-sided dice 10 times, and printing a
 summary of the results.

 import System.Random
 import qualified Data.Map as Map
 import Data.Map (Map)
 import Data.List

 dice :: Int - Int - IO Int
 dice 0 n = return 0
 dice m n = do
   total - dice (m - 1) n
   roll - randomRIO (1, n)
   return (total + roll)

Don't do too much in IO, it's better to separate the pure parts from the IO.
IMO, this would better have signature

dice :: RandomGen g = Int  - Int - g - (Int,g)
dice 0 _ g = (0,g)
dice m n g = case dice (m-1) n g of
  (total,g1) - case randomR (1,n) g1 of
(roll,g2) - (total+roll,g2)

or, better still be in a State monad or the Random monad ( 
http://hackage.haskell.org/package/MonadRandom )

die :: RandomGen g = Int - State g Int
die n = State $ randomR (1,n)

dice :: RandomGen g = Int - Int - State g Int
dice m n = liftM sum $ replicateM m (die n)


-- the do is superfluous
 simulate count m n = do
   mapM (dice m) (replicate count n)

Ouch, that hurts (not yet so incredibly much for 10 rolls, but if you try 
100, 
it'll really hurt).
Since you're doing it in IO, the whole list must be built before any further 
processing 
can begin, so you're building up a largish list, only to destroy it immediately 
afterwards, much work for the garbage collector. If you can accumulate the 
scores as they 
come, the intermediate list can be fused away and the garbage collector is kept 
idle.

If you absolutely have to do it in IO, use

import System.IO.Unsafe

simulate 0 _ _ = return []
simulate count m n = unsafeInterleaveIO $ do
val - dice m n
rst - simulate (count-1) m n
return (val:rst)

to avoid building a large list. If you use the (lazy) State monad, that's 
automatically 
done :).

simulate count m n = replicateM count (dice m n)  -- now in State

histogram :: Ord a = [a] - [(a,Int)]
histogram = Map.assocs . foldl f Map.empty
  where
    f m k = Map.insertWith (+) k 1 m

-- simulation :: RandomGen g = State g [(Int,Int)]
simulation = do
  lst - simulate 100 3 6
  return (histogram lst)

main = do
sg - getStdGen
print $ evalState simulation sg

much faster, still not very fast, since StdGen isn't a particularly fast PRNG.

Another method is to create an infinite list of random numbers and use it as 
needed:
---
module Main (main) where

import System.Random
import Data.Array.Unboxed
import Data.List
import System.Environment (getArgs)

dice :: RandomGen g = g - Int - [Int]
dice g mx = randomRs (1,mx) g

splits :: Int - [a] - [[a]]
splits l = unfoldr f
  where
f xs = case splitAt l xs of
r@(h,t) | null t- Nothing
| otherwise - Just r

simulation :: RandomGen g = g - Int - Int - Int - UArray Int Int
simulation g rep dn df = accumArray (+) 0 (dn,dn*df) lst
  where
rls = dice g df
scs = splits dn rls
lst = take rep [(sum rll,1) | rll - scs]

main :: IO ()
main = do
(rp:dn:df:_) - getArgs
sg - getStdGen
print $ assocs $ simulation sg (read rp) (read dn) (read df)
-

Using an unboxed array instead of a Map gives a little extra speed, but not 
much.


 histogram :: Ord a = [a] - [(a,Int)]
 histogram = Map.assocs . foldl f Map.empty
   where
 f m k = Map.insertWith (+) k 1 m

For some reason it doesn't make much difference here, but it should be the 
strict 
versions, foldl' and insertWith' in general.


 simulation = do
   lst - simulate 10 3 6
   return (histogram lst)

 main = do
   s - simulation
   putStrLn (show s)

 When compiled, this takes over twice as long as a naively implemented
 Python program.

 What am I doing wrong here? I'd have expected compiled Haskell to be
 faster than interpreted Python, so obviously my approach is wrong. I'm
 expecting the answer to be that I've got unnecessary laziness

Quite on the contrary, it's unnecessary strictness here :D

 - which is fine, but ultimately my interest is in ease of expression and
 performance combined, so I'm looking for beginner-level improvements
 rather than subtle advanced techniques like unboxing.

Nothing advanced with using unboxed arrays.


 Thanks,
 Paul.

 PS I know my code is probably fairly clumsy

Actually, the style is rather good, I think (mine's worse, usually).
You shouldn't use IO so much, though, and your code betrays a certain level of 
unfamiliarity with strictness/performance characteristics of the libraries. But 
that's 
natural.

 - I'd appreciate style
 suggestions, but my main interest here is whether a beginner, with a
 broad 

Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Sterling Clover


On Aug 1, 2009, at 2:06 PM, Paul Moore wrote:


Is the issue with random numbers just the implementation, or is it
something inherent in the non-pure nature of a random number generator
that makes it hard for Haskell to implement efficiently? If the
latter, that probably makes Haskell a somewhat poor choice for
simulation-type programs.



Well, I'm not sure of the details, but in your original  
implementation, you're performing IO to pull the seed out of a ref at  
every iteration. Pekka Karjalainen's doesn't do that, which probably  
helps with the speedup. Along with that, Haskell has a fairly slow  
random implementation. As I recall however, this is partially because  
it hasn't received a great deal of optimization, but mainly because  
the algorithm itself fulfills some rather strong properties -- in  
particular it must be fairly statistically robust, and it must  
provide a split function which produces generators that are  
independently robust [1]. This limits algorithm choice quite a bit.


For other random numbers, with different properties (faster, but with  
tradeoffs in robustness, or ability to split, or both), you can check  
hackage for at least mersenne-random and gsl-random. There may be  
others that I don't recall.


Cheers,
Sterl.

[1] http://www.haskell.org/onlinereport/random.html
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-08-01 Thread Daniel Fischer
Am Samstag 01 August 2009 20:50:10 schrieb Sterling Clover:
 On Aug 1, 2009, at 2:06 PM, Paul Moore wrote:
  Is the issue with random numbers just the implementation, or is it
  something inherent in the non-pure nature of a random number generator
  that makes it hard for Haskell to implement efficiently? If the
  latter, that probably makes Haskell a somewhat poor choice for
  simulation-type programs.

If you view a PRNG as a function from the seed to the sequence of generated 
numbers or as 
a function state - (bitpattern, newstate), PRNGs are pure (at least, I know no 
counterexample), so it's not inherently inefficient in Haskell, though it's 
probably still 
faster in C. 
One thing that makes StdGen slow is splittability, as Sterling points out below.
For a simulation programme where you don't need splittability, choose a 
different PRNG.


 Well, I'm not sure of the details, but in your original
 implementation, you're performing IO to pull the seed out of a ref at
 every iteration. Pekka Karjalainen's doesn't do that, which probably
 helps with the speedup. Along with that, Haskell has a fairly slow
 random implementation. As I recall however, this is partially because
 it hasn't received a great deal of optimization, but mainly because
 the algorithm itself fulfills some rather strong properties -- in
 particular it must be fairly statistically robust, and it must
 provide a split function which produces generators that are
 independently robust [1]. This limits algorithm choice quite a bit.

 For other random numbers, with different properties (faster, but with
 tradeoffs in robustness, or ability to split, or both), you can check
 hackage for at least mersenne-random and gsl-random.

I didn't get much speedup with System.Random.Mersenne.Pure64 (might be because 
I have a 
32-bit system and Word64 goes through foreign calls, if that is still the 
case), but 
GSL.Random.Gen reduced the time by a factor of over 5.
It forces you back into IO and is a little less convenient, but if speed is a 
concern, 
it's a price worth to pay.

---
module Main (main) where

import GSL.Random.Gen
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List
import System.IO.Unsafe
import System.Time
import Data.Word

dice :: RNG - Int - Int - IO Int
dice _ 0 n = return 0
dice rng m n = do
  total - dice rng (m - 1) n
  roll - fmap (+1) $ getUniformInt rng n
  return (total + roll)

simulate _ 0 _ _ = return []
simulate rng count m n = unsafeInterleaveIO $ do
val - dice rng m n
tl - simulate rng (count-1) m n
return (val:tl)

histogram :: Ord a = [a] - [(a,Int)]
histogram = Map.assocs . foldl' f Map.empty
  where
    f m k = Map.insertWith' (+) k 1 m

simulation rng = do
  lst - simulate rng 100 3 6
  return (histogram lst)

main = do
  rng - newRNG mt19937
  sd - getTimeSeed
  setSeed rng sd  -- omit seeding for reproducible results
  s - simulation rng
  putStrLn (show s)

getTimeSeed :: IO Word64
getTimeSeed = do
TOD a b - getClockTime
return . fromInteger $ 10^6*a + b `quot` (10^6)
--

 There may be others that I don't recall.

 Cheers,
 Sterl.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Paul Moore
How would I efficiently write a function in Haskell to count
occurrences of unique elements in a (potentially very large) list? For
example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
[[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
representation).

Clearly, this won't be possible for infinite input, but I would like
it to be as efficient as possible for (lazy) input lists containing
many millions of elements. So an implementation based on group . sort
is not going to work.

In an imperative language like Python, I'd use a dictionary as an
accumulator - something like

for el in input:
accums[i] = accums.get(i, 0) + 1

That's about as efficient as you can hope for (assuming an efficient
dictionary implementation). How would I code something equivalent in
Haskell?

Thanks,
Paul.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Gregory Collins
Paul Moore p.f.mo...@gmail.com writes:

 How would I efficiently write a function in Haskell to count
 occurrences of unique elements in a (potentially very large) list? For
 example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
 [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
 representation).

import qualified Data.Map as Map
import   Data.Map (Map)

histogram :: Ord a = [a] - [(a,Int)]
histogram = Map.assocs . foldl f Map.empty
  where
f m k = Map.insertWith (+) k 1 m

G.
-- 
Gregory Collins g...@gregorycollins.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Gregory Collins
Gregory Collins g...@gregorycollins.net writes:

 Paul Moore p.f.mo...@gmail.com writes:

 How would I efficiently write a function in Haskell to count
 occurrences of unique elements in a (potentially very large) list? For
 example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
 [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
 representation).

 import qualified Data.Map as Map
 import   Data.Map (Map)
 
 histogram :: Ord a = [a] - [(a,Int)]
 histogram = Map.assocs . foldl f Map.empty
   where
 f m k = Map.insertWith (+) k 1 m

Here foldl should be foldl' (d'oh!)

G
-- 
Gregory Collins g...@gregorycollins.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Malcolm Wallace

In an imperative language like Python, I'd use a dictionary as an
accumulator - something like

   for el in input:
   accums[i] = accums.get(i, 0) + 1


Haskell has efficient dictionary structures too, e.g. Data.Map

  List.foldl' (\m x- Map.insertWith' (+) x 1 m) Map.empty

Regards,
Malcolm

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Daniel Fischer
Am Freitag 31 Juli 2009 22:39:53 schrieb Paul Moore:
 How would I efficiently write a function in Haskell to count
 occurrences of unique elements in a (potentially very large) list? For
 example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
 [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
 representation).

 Clearly, this won't be possible for infinite input, but I would like
 it to be as efficient as possible for (lazy) input lists containing
 many millions of elements. So an implementation based on group . sort
 is not going to work.

 In an imperative language like Python, I'd use a dictionary as an
 accumulator - something like

 for el in input:
 accums[i] = accums.get(i, 0) + 1

 That's about as efficient as you can hope for (assuming an efficient
 dictionary implementation). How would I code something equivalent in
 Haskell?

If the elements come from a relatively small range and are suitable for array 
indexing,

import Data.Array.IArray
import Data.Array.Unboxed

accumArray :: (IArray a e, Ix i)
   = (e - e' - e) -- ^ An accumulating function
   - e  -- ^ A default element
   - (i,i)  -- ^ The bounds of the array
   - [(i, e')]  -- ^ List of associations
   - a i e  -- ^ Returns: the array

accumArray (+) 0 (mini,maxi) $ zip list (repeat 1)

is pretty fast (beats the hell out of Data.Map). If your elements can't be 
unboxed, the 
accumArray function from Data.Array does it too, albeit much slower (still 
faster than 
Data.Map, in my experience).


 Thanks,
 Paul.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Paul Moore
2009/7/31 Gregory Collins g...@gregorycollins.net:
 Paul Moore p.f.mo...@gmail.com writes:

 How would I efficiently write a function in Haskell to count
 occurrences of unique elements in a (potentially very large) list? For
 example, given the list [1,2,3,4,5,3,4,2,4] I would like the output
 [[1,1], [2,2], [3,2], [4,3], [5,1]] (or some equivalent
 representation).

    import qualified Data.Map as Map
    import           Data.Map (Map)

    histogram :: Ord a = [a] - [(a,Int)]
    histogram = Map.assocs . foldl f Map.empty
      where
        f m k = Map.insertWith (+) k 1 m

Right. I see how that works, and can work out how to think about this
sort of thing from your example. Thanks very much.

BTW, I did know that Haskell had an efficient map implementation, I
just wasn't sure how to use it functionally - I probably should have
searched a bit harder for examples before posting. Thanks for the help
in any case.

Paul.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Efficient functional idiom for histogram

2009-07-31 Thread Deniz Dogan
2009/8/1 Paul Moore p.f.mo...@gmail.com:
 BTW, I did know that Haskell had an efficient map implementation, I
 just wasn't sure how to use it functionally - I probably should have
 searched a bit harder for examples before posting. Thanks for the help
 in any case.

Know that Data.Map uses size balanced trees and is not e.g. a hash map.

-- 
Deniz Dogan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe