Send Beginners mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."
Today's Topics:
1. Re: I have created an ugly Haskell program.. (Heinrich Apfelmus)
2. Does System.Directory work on Windows XP?
(Patrick Larrivee-Woods)
3. Re: Does System.Directory work on Windows XP? (Jason Dusek)
4. Re: Does System.Directory work on Windows XP?
(Patrick Larrivee-Woods)
5. Re: Does System.Directory work on Windows XP? (Daniel Fischer)
6. Lazy file IO & Space leaks/waste (Aleksandar Dimitrov)
7. if ands (Nathan M. Holden)
8. Re: if ands (Joe Fredette)
9. Re: if ands (Keith Sheppard)
----------------------------------------------------------------------
Message: 1
Date: Wed, 04 Nov 2009 18:21:17 +0100
From: Heinrich Apfelmus <[email protected]>
Subject: [Haskell-beginners] Re: I have created an ugly Haskell
program..
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
Brent Yorgey wrote:
> Ask yourself: What Would Conal Do (WWCD)? Conal Elliott is always
> trying to get people to think about the semantic essence of their
> problems, so let's try it.
>
> What are we REALLY trying to do here? What are those lists of tuples,
> REALLY? Well, it seems to me that the lists of tuples are really just
> representing *functions* on some totally ordered domain.
> [...]
>
> So, let's try converting these lists of pairs to actual functions:
>
>
> asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b)
> asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $
> is
>
> [...]
>
> Now, you might object that this is much more inefficient than the
> other solutions put forth. That is very true. [...]
>
> However, I still find it very helpful to think about the essence
> of the problem like this: elegant yet inefficient code is a much
> better starting place than the other way around! [...]
>
> You can also try to optimize, taking advantage of the fact that we
> always call the functions built by asFunc with a sequence of strictly
> increasing inputs.
I am with Brent and Conal here. Now, to continue, ask yourself: What
Would Conal Do Next (WWCDN)?
What are we really trying to do here? What is this function, really,
considering that we are only evaluating it at a strictly increasing
sequence of inputs? Well, it seems to me that it is some special kind of
function, best captured as an *abstract data type*.
In particular, the function is something which I will call a "time
series". In other words, the input is to be thought of as time.
data Time t = Moment t | Infinity
deriving (Eq,Ord,Show)
The inclusion of infinity will turn out to be very convenient.
Now, the time series is a function that has a value x1 in the distant
past, until a time t1 where it begins to have the value x2 , again
until a time t2 where it switches to x3 and so on, until a value xn
that is kept until infinity. In Haskell, this looks like this
function t
| -Infinity <= t && t < t1 = x1
| t1 <= t && t < t2 = x2
| t2 <= t && t < t3 = x3
| ...
| t1 <= t && t < Infinity = xn
and pictorially, something like this:
____ xn _____
____ x2 ____ |
| |____ x3 ____ ... |
_____ x1 ____|
-Inf t1 t2 ... tn Inf
Of course, we can implement this abstract data type with a list of pairs
(tk,xk)
newtype TimeSeries t a = TS { unTS :: [(a,Time t)] }
deriving (Show)
and our goal is to equip this data type with a few natural operations
that can be used to implement Philip's zip-like function.
The first two operations are
progenitor :: TimeSeries t a -> a
progenitor = fst . head . unTS
which returns the value from the distant past and
beginning :: TimeSeries t a -> Time t
beginning = snd . head . unTS
which returns the first point in time when the function changes its
value. These correspond to the operation head on lists.
The next operation is called `forgetTo` t and will throw away all
values and changes before and including a given time t .
forgetTo :: Ord t => TimeSeries t a -> Time t -> TimeSeries t a
forgetTo (TS xs) Infinity = TS [last xs]
forgetTo (TS xs) t = TS $ dropWhile ((<= t) . snd) xs
This roughly corresponds to tail , but takes advantage of the time
being continuous.
Last but not least, we need a way to create a time series
forever :: a -> TimeSeries t a
forever x = TS [(x,Infinity)]
and we need to add values to a time series, which can be done with an
operation called prepend that adds a new beginning and replaces the
progenitor .
-- We assume that t < beginning xs
prepend :: a -> Time t -> TimeSeries t a -> TimeSeries t a
prepend x Infinity _ = TS [(x,Infinity)]
prepend x t (TS xs) = TS $ (x,t) : xs
These operations correspond to [] and (:) for lists.
The key about these operations is that they have a description /
intuition that is *independent* of the implementation of times series.
At no point do we need to know how exactly TimeSeries is implemented
to understand what these five operations do.
Now, Philip's desired zip-like function is straightforward to implement:
zipSeries :: Ord t => TimeSeries t a -> TimeSeries t b
-> TimeSeries t (a,b)
zipSeries xs ys = prepend (progenitor xs, progenitor ys) t $
zipSeries (xs `forgetTo` t) (ys `forgetTo` t)
where t = min (beginning xs) (beginning ys)
and you may want to convince yourself of its correctness by appealing to
the intuition behind time series.
Regards,
apfelmus
--
http://apfelmus.nfshost.com
------------------------------
Message: 2
Date: Wed, 04 Nov 2009 16:02:47 -0800
From: Patrick Larrivee-Woods <[email protected]>
Subject: [Haskell-beginners] Does System.Directory work on Windows XP?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi,
I"m having trouble getting the System.Directory module in ghci on a
Windows XP machine. Whenever I call a function like getCurrentDirectory
or getDirectoryContents I get no results back. I was wondering if anyone
knows why this is happening, or if someone can point me to a module that
works on Windows.
Here's an example of what I get:
___ ___ _
/ _ \ /\ /\/ __(_)
/ /_\// /_/ / / | | GHC Interactive, version 6.4.2, for Haskell 98.
/ /_\\/ __ / /___| | http://www.haskell.org/ghc/
\____/\/ /_/\____/|_| Type :? for help.
Loading package base-1.0 ... linking ... done.
Prelude> :module System.Directory
Prelude System.Directory> getCurrentDirectory
Prelude System.Directory> getDirectoryContents "."
Prelude System.Directory> getDirectoryContents "c:\\perl"
Prelude System.Directory>
------------------------------
Message: 3
Date: Wed, 4 Nov 2009 16:02:49 -0800
From: Jason Dusek <[email protected]>
Subject: Re: [Haskell-beginners] Does System.Directory work on Windows
XP?
To: Patrick Larrivee-Woods <[email protected]>
Cc: [email protected]
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=UTF-8
Could you try a recent version of GHC and let us know if you
still have trouble?
--
Jason Dusek
------------------------------
Message: 4
Date: Wed, 04 Nov 2009 16:14:59 -0800
From: Patrick Larrivee-Woods <[email protected]>
Subject: Re: [Haskell-beginners] Does System.Directory work on Windows
XP?
To: Jason Dusek <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed
With 6.10.4 it works fine. Thanks for your help, I hadn't even realized
I was using an older version.
Cheers,
Patrick
Jason Dusek wrote:
> Could you try a recent version of GHC and let us know if you
> still have trouble?
>
> --
> Jason Dusek
>
------------------------------
Message: 5
Date: Thu, 5 Nov 2009 01:59:12 +0100
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] Does System.Directory work on Windows
XP?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
Am Donnerstag 05 November 2009 01:14:59 schrieb Patrick Larrivee-Woods:
> With 6.10.4 it works fine. Thanks for your help, I hadn't even realized
> I was using an older version.
>
> Cheers,
> Patrick
Just to explain:
getDirectoryContents "." is an IO action producing a String.
In newer GHC releases (I don't remember whther that change came with 6.6 or
6.8), when you
invoke an IO action from the ghci-prompt, if the value returned from the action
has not
type (), by default it is bound to the variable 'it' and printed out (since
it's not
unconditionally a good thing to do that - consider readFile "HUGEFile.txt" - it
can be
disabled via
ghci> :set -fno-print-bind-result
- re-enable with -fprint-bind-result).
Before, the action was simply run and its result not printed out, to use the
result of an
IO action, you had to use
ghci> res <- getDirectoryContents "."
ghci> print res
or
ghci> getDirectoryContents "." >>= mapM_ putStrLn
or whatever.
If you still have 6.4.2 installed, you can try it out (but do your real work
with the
newer, it produces better code).
>
> Jason Dusek wrote:
> > Could you try a recent version of GHC and let us know if you
> > still have trouble?
> >
> > --
> > Jason Dusek
------------------------------
Message: 6
Date: Thu, 5 Nov 2009 12:01:11 +0100
From: Aleksandar Dimitrov <[email protected]>
Subject: [Haskell-beginners] Lazy file IO & Space leaks/waste
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"
Hello list,
I'm currently writing a small linguistic corpus analyzer. My input file is only
25MB, but profiling shows that the overall amount of allocation over the
program's runtime is several GB. That's a little too much - adding to that is
the fact that the program is abysmally slow, so I'm suspecting a space leak
somewhere.
I'm using the ByteString.Lazy.Char8 class in order to work efficiently with lazy
IO and I must admit that I'm very inexperienced with predicting runtime and
space behaviour of lazy IO :-( It worked well in the past, but I'm stuck now.
The program can be found here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=11863#a11863
The important bits are as follows:
> mf :: [C.ByteString] -> StdWord
> mf [] = Word [] C.empty
> mf s = Word (tail s) (head s)
>
> f' = mf . reverse . C.words
> main :: IO ()
> main = do
> corpus_name <- liftM head getArgs
> corpus <- liftM (Corpus . (map f') . C.lines) $ C.readFile corpus_name
> print $ length (content corpus)
> let interesting = filterForInterestingTags interestingTags corpus
> print $ show (freqMap interesting)
freqMap also uses foldr' to fold the corpus it is given into a Data.Map, but
according to the profiling output, that's not where the bad stuff happens.
Here's the interesting parts of the profiling output:
> analyse +RTS -p -RTS corpus/bnc-samp.tt
>
> total time = 30.46 secs (1523 ticks @ 20 ms)
> total alloc = 14,871,619,904 bytes (excludes profiling overheads)
>
> COST CENTRE MODULE no. entries %time %alloc %time %alloc
>
> MAIN MAIN 1 0 0.0 0.0 100.0 100.0
> main Main 221 0 0.0 0.0 0.0 0.0
> CAF Main 206 14 0.1 0.1 100.0 100.0
> showsPrec_aSF Main 223 1 0.0 0.0 0.0 0.0
> interestingTags Main 218 1 0.0 0.0 0.0 0.0
> f' Main 216 1 0.0 0.0 0.0 0.0
> mf Main 217 1 0.0 0.0 0.0 0.0
> main Main 212 1 2.4 4.4 99.9 99.9
> f' Main 219 0 89.0 91.5 90.2 92.7
> mf Main 220 2427450 1.2 1.2 1.2 1.2
> filterForInterestingTags Main 215 1 5.4 0.1 5.4 0.1
> freqMap Main 214 1 0.6 0.5 2.0 2.7
> compare_aRL Main 222 1141996 1.4 2.2 1.4 2.2
Obviously the main cost centre seems to be f' (which I've factored out in order
to view it separately as a cost centre.) It's not the call to reverse that makes
it slow (removing it doesn't affect the run time.) Interestingly, it prints out
the first statement (length) very quickly, then takes a lot of time. I've
removed the call to freqMap, and it seems that GHC is smart enough to drop the
call to f' completely, because only the length ever gets evaluated. But the
freqMap also needs the processing from f', and that's where the bad stuff starts
happening.
Should I look into DeepSeq? I've tried adding strictness to the functions by
hand, and also to the data structures, but that didn't seem to help so far. So
I'm looking for solutions to make this faster. I'm guessing that a smart `seq`
somewhere might help, but I don't know where :-\
As a side note: how can I find out how the run-time data structures look like in
memory? I.e. I want to know *what* exactly stays around as thunks in memory so
that I can focus better on where to add strictness or redirect the program flow.
Ideally, only a very smart part of the file should ever be in memory, with
processing happening incrementally!
Thanks for any pointers!
Aleks
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
Url :
http://www.haskell.org/pipermail/beginners/attachments/20091105/5917de71/attachment-0001.bin
------------------------------
Message: 7
Date: Thu, 5 Nov 2009 21:05:22 -0500
From: "Nathan M. Holden" <[email protected]>
Subject: [Haskell-beginners] if ands
To: [email protected]
Message-ID: <[email protected]>
Content-Type: Text/Plain; charset="us-ascii"
If you have an if statement like
if (a&&b) then fun else fun'
and a is false, does GHC actually bother to check b?
------------------------------
Message: 8
Date: Thu, 5 Nov 2009 21:09:13 -0500
From: Joe Fredette <[email protected]>
Subject: Re: [Haskell-beginners] if ands
To: Nathan M.Holden <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
No, consider the definition of (&&)
-- I hope this is the def from the prelude. If it's not, then it's
probably isomorphic...
(&&) :: Bool -> Bool -> Bool
True && x = x
False && _ = False
Since (&&) ignores it's second argument if the first is false, then it
will "Short circuit" (like most `&` operators in other languages) due
to lazy evaluation.
/Joe
On Nov 5, 2009, at 9:05 PM, Nathan M. Holden wrote:
> If you have an if statement like
>
> if (a&&b) then fun else fun'
>
> and a is false, does GHC actually bother to check b?
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners
------------------------------
Message: 9
Date: Thu, 5 Nov 2009 21:12:49 -0500
From: Keith Sheppard <[email protected]>
Subject: Re: [Haskell-beginners] if ands
To: "Nathan M. Holden" <[email protected]>,
[email protected]
Message-ID:
<[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
Also, an nice way to check how evaluation works in ghci is to do something like:
> if False && error "error here" then "it's true" else "it's false"
This expression will evaluate as "it's false" without any "error here"
error message appearing
On Thu, Nov 5, 2009 at 9:05 PM, Nathan M. Holden
<[email protected]> wrote:
> If you have an if statement like
>
> if (a&&b) then fun else fun'
>
> and a is false, does GHC actually bother to check b?
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners
>
--
keithsheppard.name
------------------------------
_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners
End of Beginners Digest, Vol 17, Issue 5
****************************************