Beginners Digest, Vol 23, Issue 9

beginners-request Sat, 08 May 2010 08:58:11 -0700

Send Beginners mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        [email protected]


You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  lazy IO in readFile
      (Stephen Blackheath [to Haskell-Beginners])
   2. Re:  lazy IO in readFile (Andrew Sackville-West)
   3. Re:  lazy IO in readFile
      (Stephen Blackheath [to Haskell-Beginners])
   4. Re:  lazy IO in readFile (Daniel Fischer)
   5.  A basic misunderstanding of how to program with IO (Ken Overton)


----------------------------------------------------------------------

Message: 1
Date: Sat, 08 May 2010 15:41:43 +1200
From: "Stephen Blackheath [to Haskell-Beginners]"
        <[email protected]>
Subject: Re: [Haskell-beginners] lazy IO in readFile
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

Andrew,

In Haskell, lazy I/O is a form of cheating, because Haskell functions
are supposed to have no side effects, and lazy I/O is a side effect.  At
first, cheating seems attractive, but it takes a bit of experience to
really understand why cheating really is not a good idea, and that
Haskell is so powerful that it doesn't matter that you shouldn't cheat.
 That has certainly been my experience, and I had to find out the hard
way.  It sounds like you're starting to see some of the problems with
cheating.

Here's someone's philosophizing on the subject:

http://lukepalmer.wordpress.com/2009/06/04/it-is-never-safe-to-cheat/

So the short answer is, no - there is no way to force the file returned
by readFile to close.

I'd recommend using withFile and hGetLine, like this:

withFile "testfile" ReadMode $ \h -> do
    ...
    l <- hGetLine h

If you want more speed, take a look at the stuff in Data.ByteString.  If
you want proper text encoding and speed, take a look at the 'text'
package.
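
The withFile/hGetLine approach recommended above can be sketched more fully as follows. This is only a minimal illustration, not code from the thread: the helper name readLinesStrict is hypothetical, and it assumes the file is small enough that simple (non-tail) recursion is acceptable.

```haskell
import System.IO

-- Hypothetical helper (not from the thread): read every line of a file
-- with hGetLine. withFile closes the handle when the action finishes,
-- so the file is no longer open once the list is returned.
readLinesStrict :: FilePath -> IO [String]
readLinesStrict path = withFile path ReadMode go
  where
    go h = do
      eof <- hIsEOF h
      if eof
        then return []            -- nothing left to read
        else do
          l  <- hGetLine h        -- reads one whole line from the handle
          ls <- go h              -- then the remaining lines
          return (l : ls)
```

Because all reading happens inside the withFile action, a later appendFile on the same path should not find the file still open for reading.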


Steve

On 08/05/10 14:47, Andrew Sackville-West wrote:
> I'm trying to suss out the best (heh, maybe most idiomatic?) way to
> handle laziness in a particular file operation. I'm reading a file
> which contains a list of rss feed items that have been seen
> previously. I use this list to eliminate feed items I've seen before
> and thus filter a list of new items. (it's a program to email me feed
> items from a couple of low frequency feeds).
> 
> So, the way I do this is to open the history file with readFile, read
> it into a list and then use that as a parameter to a filter
> function. Instead of getting confusing, here is some simple code that
> gets at the nut of the problem:
> 
> import Control.Monad
> 
> isNewItem :: [String] -> String -> Bool
> isNewItem [] = \_ -> True
> isNewItem ts = \x -> not (any (== x) ts)
> 
> filterItems :: [String] -> [String] -> [String]
> filterItems old is = filter (isNewItem old) is
> 
> getOldData :: IO [String]
> getOldData = catch (liftM lines $ readFile "testfile") (\_ -> return [])
> 
> main = do
>   let testData = ["a", "b", "c", "d"] :: [String]
>   currItems <- getOldData 
>   let newItems = filterItems currItems $ testData
> 
>   print newItems -- this is important, it mimics another IO action I'm
>                --  doing in the real code...
> 
>   appendFile "testfile" . unlines $ newItems
> 
> 
> 
> Please ignore, for the moment, whatever *other* problems (idiomatic or
> just idiotic) that may exist above and focus on the IO problem. 
> 
> This code works fine *if* the file "testfile" has in it some subset of the
> testData list. If it has the complete set, it fails with a "resource
> busy" exception. 
> 
> Okay, I can more or less understand what's going on here. Each letter
> in the testData list gets compared to the contents of the file, but
> because they are *all* found, the readFile call never has to try and
> fail to read the last line of the file. Thus the file handle is kept
> open lazily waiting around not having reached EOF.  Fair enough. 
> 
> But what is the best solution? One obvious one, and the one I'm using
> now, is to move the appendFile call into a function with guards to
> prevent trying to append an empty list to the end of the file. This
> solves the problem not by forcing the read on EOF, but by not
> bothering to open the file for appending:
> 
> writeHistory [] = return ()
> writeHistory ni = appendFile "testfile" . unlines $ ni
> 
> And this makes some sense. It's silly to try to write nothing to a
> file.
> 
> But it also rubs me the wrong way. It's not solving the problem
> directly -- closing that file handle. So there's my question, how can
> I close that thing? Is there some way to force it? Do I need to rework
> the reading to read one line ahead of whatever I'm testing against
> (thereby forcing the read of EOF and closing the file)? 
> 
> thanks 
> 
> A
> 
> 
> 
> 
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners


------------------------------

Message: 2
Date: Fri, 7 May 2010 20:59:02 -0700
From: Andrew Sackville-West <[email protected]>
Subject: Re: [Haskell-beginners] lazy IO in readFile
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"

On Sat, May 08, 2010 at 03:41:43PM +1200, Stephen Blackheath [to 
Haskell-Beginners] wrote:
> Andrew,
> 
> In Haskell, lazy I/O is a form of cheating, because Haskell functions
> are supposed to have no side effects, and lazy I/O is a side effect.  At
> first, cheating seems attractive, but it takes a bit of experience to
> really understand why cheating really is not a good idea, and that
> Haskell is so powerful that it doesn't matter that you shouldn't
> cheat.

So, are you saying that using something like readFile is cheating? Or
just that lazy IO itself is cheating?

>  That has certainly been my experience, and I had to find out the hard
> way.  It sounds like you're starting to see some of the problems with
> cheating.

Indeed. This whole exercise (of which the below is just a piece) has
been enlightening. I'm reminded of the cat who's stuck in the IO
monad. I've certainly gotten better at moving into and out of IO (or
moving functions around into and out of IO). 

> 
> Here's someone's philosophizing on the subject:
> 
> http://lukepalmer.wordpress.com/2009/06/04/it-is-never-safe-to-cheat/

cool thanks

> 
> So the short answer is, no - there is no way to force the file returned
> by readFile to close.

I figured as much. I'm not completely unhappy with my solution since
it irks me to write out an empty list anyway. And it's really a simple
little project for my personal use...

> 
> I'd recommend using withFile and hGetLine, like this:
> 
> withFile "testfile" ReadMode $ \h -> do
>     ...
>     l <- hGetLine h

and using this to read through the entire file and then closing it?
(Don't answer that, I'll do the reading). hmm... a little thought
suggests that laziness will still get me unless I put some strictness
in somewhere. I'm still left with a case where the history list is
never completely evaluated forcing the reading of EOF. I will apply
some thought to it and see what happens.

thanks

A




------------------------------

Message: 3
Date: Sat, 08 May 2010 23:38:25 +1200
From: "Stephen Blackheath [to Haskell-Beginners]"
        <[email protected]>
Subject: Re: [Haskell-beginners] lazy IO in readFile
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8

Andrew,

On 08/05/10 15:59, Andrew Sackville-West wrote:
> On Sat, May 08, 2010 at 03:41:43PM +1200, Stephen Blackheath [to 
> Haskell-Beginners] wrote:
>> Andrew,
>>
>> In Haskell, lazy I/O is a form of cheating, because Haskell functions
>> are supposed to have no side effects, and lazy I/O is a side effect.  At
>> first, cheating seems attractive, but it takes a bit of experience to
>> really understand why cheating really is not a good idea, and that
>> Haskell is so powerful that it doesn't matter that you shouldn't
>> cheat.
> 
> So, are you saying that using something like readFile is cheating? Or
> just that lazy IO itself is cheating?

I'm saying that 'readFile' is lazy I/O, lazy I/O is cheating, and
cheating is bad.  I want you to understand what I mean by "bad" rather
than taking it on authority.  The thing is, Haskell is all about giving
you safety.  Haskell gives you certain advantages that come only from
going all the way, saying that pure really means pure.  Cheating means
that you can't make this assumption, and the rot can spread more quickly
than you expect.  This is fine for small programs, but can be a road to
ruin in large programs.  So, why cheat when Haskell makes it so easy for
you not to, and rewards you so handsomely for your good behaviour?

You may ask, if that's true, then why is 'readFile' in the standard
libraries?  My own opinion on this subject is that even though the
designers of Haskell had incredible foresight, they didn't have our
luxury of experience programming in the language they were designing.
Lazy I/O is _very_ convenient, after all.

All this is my opinion, but I honestly believe that the great majority
of hardcore Haskellers agree with me that cheating is bad.

>> So the short answer is, no - there is no way to force the file returned
>> by readFile to close.
> 
> I figured as much. I'm not completely unhappy with my solution since
> it irks me to write out an empty list anyway. And it's really a simple
> little project for my personal use...

I will sometimes use lazy I/O if the program is simple enough, but it
seems like you're already getting into a situation where it's causing
trouble.

It's a difficult one, because, like I said, lazy I/O really is useful.
My own approach is this:  I ask myself, "Is it *really* referentially
transparent?"  If you can say "yes" to this question, then it's
logically equivalent to not cheating (but you must still feel guilty).

In the case of 'readFile', this becomes, "Can I assume that for the life
of the program, the file's contents will never change?"  (Haskell abides
in a Zen-like 'eternal now'.)  For a program that reads a config file
once on startup, the answer might be "yes", and cheating may not
introduce any risks.

Perhaps I can give you some more insight by saying this:  Laziness adds
complexity to the reasoning necessary to understand how your program
will execute.  Purity means that this complexity is neutralized and
becomes the compiler's problem, not yours.  You have to remember that
while the tiniest little piece of information calculated from the
contents of your 'readFile' remains unevaluated, the file will not
close.  That's very difficult to reason about.

Another thing to consider is, if the code you're implementing relies on
lazy I/O, might you want to use it in a big program?  If so, surely it
would be better to do it in a more general way to begin with.  One of
the things monads are especially good for is replacing lazy I/O.

You might object, "Lazy I/O is so incredibly brilliant, but you are
telling me I can't use it! That really ruins Haskell for me if one of
its most amazing features is not allowed!"

I know this seems very unfair.  But my reply is, "Purity really is
*that* good. It's even worth giving up lazy I/O for - that's how good it
is."

>> I'd recommend using withFile and hGetLine, like this:
>>
>> withFile "testfile" ReadMode $ \h -> do
>>     ...
>>     l <- hGetLine h
> 
> and using this to read through the entire file and then closing it?
> (Don't answer that, I'll do the reading). hmm... a little thought
> suggests that laziness will still get me unless I put some strictness
> in somewhere. I'm still left with a case where the history list is
> never completely evaluated forcing the reading of EOF. I will apply
> some thought to it and see what happens.

I can't resist a couple of comments:  Note that withFile closes the
handle for you explicitly.  It's completely safe in that respect (unless
you pass 'h' as a return value from withFile - which is obviously a bad
idea - but if you want the type system to make this impossible, this
*can* be achieved!).  You know exactly when it's being closed.  With
withFile/hGetLine, laziness can't get you, except in the usual way (that
is, as it relates to memory and CPU usage).

A parting thought:  One of the great things about Haskell (compared with
imperative programming) is that there are dozens of things you don't
have to reason about any more, so you can concentrate on solving your
problem.  Do you see what I'm saying?  Why even bother reasoning about
whether laziness can get you?  Just make everything pure and you don't
have to. *

(* I don't want to mislead you and make you think Haskell is something
it's not.  Therefore I need to add here that you *do* need to reason
about the *space and CPU usage* of your code in the presence of
laziness.  IMO this is the only serious cost of using Haskell - the rest
is benefit.)


Steve



------------------------------

Message: 4
Date: Sat, 8 May 2010 14:16:27 +0200
From: Daniel Fischer <[email protected]>
Subject: Re: [Haskell-beginners] lazy IO in readFile
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain;  charset="utf-8"

On Saturday 08 May 2010 04:47:14, Andrew Sackville-West wrote:
>
> Please ignore, for the moment, whatever *other* problems (idiomatic or
> just idiotic) that may exist above and focus on the IO problem.
>

Sorry, can't entirely. Unless the number of rss items remains low, don't 
use lists, use a Set.
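
A rough sketch of what the Set suggestion might look like, keeping the filterItems signature from the original post. It assumes the containers package (which ships with GHC) is available; the local name seen is only illustrative.

```haskell
import qualified Data.Set as Set

-- Same signature as filterItems in the original post, but membership is
-- tested against a Set (logarithmic lookup) rather than by scanning a list.
filterItems :: [String] -> [String] -> [String]
filterItems old = filter (`Set.notMember` seen)
  where
    -- Built once and shared by every lookup.
    seen = Set.fromList old
```

For example, filterItems ["a", "b"] ["a", "b", "c", "d"] keeps only "c" and "d". Building the set also traverses the whole history list the first time a lookup is demanded.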

> This code works fine *if* the file "testfile" has in it some subset of
> the testData list. If it has the complete set, it fails with a "resource
> busy" exception.
>
> Okay, I can more or less understand what's going on here. Each letter
> in the testData list gets compared to the contents of the file, but
> because they are *all* found, the readFile call never has to try and
> fail to read the last line of the file. Thus the file handle is kept
> open lazily waiting around not having reached EOF.  Fair enough.
>
> But what is the best solution? One obvious one, and the one I'm using
> now, is to move the appendFile call into a function with guards to
> prevent trying to append an empty list to the end of the file. This
> solves the problem not by forcing the read on EOF, but by not
> bothering to open the file for appending:
>
> writeHistory [] = return ()
> writeHistory ni = appendFile "testfile" . unlines $ ni
>
> And this makes some sense. It's silly to try to write nothing to a
> file.

Yes. In any case,

    unless (null newItems) $ appendFile "testfile" $ unlines newItems

seems cleaner.

>
> But it also rubs me the wrong way. It's not solving the problem
> directly -- closing that file handle. So there's my question, how can
> I close that thing? Is there some way to force it?

For almost all practical purposes, there is (despite the fact that what 
Stephen said is basically right, although a little overstated in my 
opinion).
You have to force the entire file to be read. The standard idiom is

  x `seq` doSomethingElse

where x is a value that requires the entire file to be read; in your case,
x = length currItems is a natural choice.
That way, you have effectively made readFile strict without sacrificing too 
much niceness of the code (withFile and hGetLine are mostly much uglier, 
IMO).

It is not entirely failsafe because the file handle needn't be immediately 
closed upon encountering EOF, it can linger until one of the next GCs, but 
if you do something substantial between reading the file and trying to 
reopen it for appending, it is very unlikely that it's not yet closed (not 
impossible, hence almost all and not all).
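
As a hedged illustration of the seq idiom described above, applied to the history file from the original post: the helper name getOldDataStrict is hypothetical, and the use of try from Control.Exception (in place of the Prelude catch in the original code) is an assumption made so the sketch compiles with current GHC.

```haskell
import Control.Exception (IOException, try)

-- Hypothetical helper (not from the thread): read the history file and
-- force the whole list before returning it, so readFile has reached EOF.
getOldDataStrict :: FilePath -> IO [String]
getOldDataStrict path = do
  r <- try (readFile path) :: IO (Either IOException String)
  case r of
    Left _  -> return []                -- missing or unreadable file: no history
    Right s -> do
      let items = lines s
      -- length needs the whole list, which needs the whole file.
      length items `seq` return items
```

Control.Exception.evaluate (length items) is a commonly used alternative to seq here when the forcing should be tied explicitly to the sequencing of IO actions.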

> Do I need to rework
> the reading to read one line ahead of whatever I'm testing against
> (thereby forcing the read of EOF and closing the file)?
>
> thanks
>
> A



------------------------------

Message: 5
Date: Sat, 8 May 2010 11:55:09 -0400
From: Ken Overton <[email protected]>
Subject: [Haskell-beginners] A basic misunderstanding of how to program with IO
To: "[email protected]" <[email protected]>
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset="us-ascii"

Sorry for such a beginner-y question, but is there a way to make a function 
like:

    interact :: String -> Resp
    interact txt =
                putStrLn txt
                rsp <- getLine
                return parseResp rsp

    parseResp :: String -> Resp

Or is that simply a wrong way of programming in Haskell with IO?

Thanks (and apologies),

-- kov

------------------------------

_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 23, Issue 9
****************************************
