Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org

You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. Re:  too lazy parsing? (Kyle Murphy)
   2. Re:  too lazy parsing? (Patrick Mylund Nielsen)
   3. Re:  too lazy parsing? (Kyle Murphy)


----------------------------------------------------------------------

Message: 1
Date: Mon, 4 Feb 2013 08:00:14 -0500
From: Kyle Murphy <orc...@gmail.com>
Subject: Re: [Haskell-beginners] too lazy parsing?
To: beginners <beginners@haskell.org>
Message-ID:
        <CA+y6JcyoQx=xqxbp6vujkju+1r76wff8w2pg4kr_39drp0w...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I can't say 100% for sure, but I'd guess it's because parsec is pure and
hGetContents reads the file lazily. Since nothing forces cont until after
you close the handle, nothing can be read (by then the handle is closed).
If you want to keep the program structured the same, you can force the
whole string before calling hClose, or read the file strictly, and then
perform the parsing on that. Alternatively you could rewrite things to
close the file handle after you write its contents to the output file.
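
A minimal sketch of the first suggestion, assuming the goal is simply to
have the whole input in memory before the handle is closed. It uses only
base (Control.Exception.evaluate and System.IO); the name readFileForced is
made up for illustration and is not part of the original program.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hClose, hGetContents, openFile)

-- | Read a whole file through a handle and force the lazy String
-- before the handle is closed. (readFileForced is a hypothetical name.)
readFileForced :: FilePath -> IO String
readFileForced path = do
  handle <- openFile path ReadMode
  cont <- hGetContents handle
  -- length has to walk the entire list, so every character is read here,
  -- while the handle is still usable.
  _ <- evaluate (length cont)
  -- Closing is now harmless: the contents are already in memory.
  hClose handle
  return cont
```

With something like this, parseFile could call readFileForced fn and pass
the result to parse, and the relative order of closing the input and
writing the output would no longer matter. The trade-off is that the whole
file is held in memory at once.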

The default file operations in Haskell are known to be a source of
difficulty because of laziness, and there has been some debate as to
whether they're poorly designed. I'd suggest looking into some of the
alternatives, particularly the streaming libraries built on iteratee-style
principles. They kill two birds with one stone: processing input
incrementally forces evaluation, and it also improves memory usage and
makes it harder to trigger space leaks. I don't have the names available at
the moment or I'd provide them, but I'm pretty sure at least one of them is
named something like enumerator, although I believe there's at least one
other that might arguably be considered better.
On Feb 4, 2013 5:51 AM, "Kees Bleijenberg" <k.bleijenb...@lijbrandt.nl>
wrote:

> module Main where
>
> import Text.ParserCombinators.Parsec (many, many1, string, Parser, parse)
> import System.IO (IOMode(..), hClose, openFile, hGetContents, hPutStrLn)
>
> parseFile hOut fn = do
>     handle <- openFile fn ReadMode
>     cont <- hGetContents handle
>     print cont
>     let res = parse (many (string "blah")) "" cont
>     hClose handle
>     case res of
>         (Left err) -> hPutStrLn hOut $ "Error: " ++ (show err)
>         (Right goodRes) -> mapM_ (hPutStrLn hOut) goodRes
>
> main = do
>     hOut <- openFile "outp.txt" WriteMode
>     mapM (parseFile hOut) ["inp.txt"]
>     hClose hOut
>
> I am writing a program that parses a lot of files. Above is the simplest
> program I can think of that demonstrates my problem.
>
> The program above parses inp.txt. Inp.txt has only the word blah in it.
> The output is saved in outp.txt. This file contains the word blah after
> running the program. If I comment out the line "print cont", nothing is
> saved in outp.txt.
>
> If I comment out "print cont" and replace many with many1 in the following
> line, it works again?
>
> Can someone explain to me what is going on?
>
> Kees

------------------------------

Message: 2
Date: Mon, 4 Feb 2013 14:03:48 +0100
From: Patrick Mylund Nielsen <hask...@patrickmylund.com>
Subject: Re: [Haskell-beginners] too lazy parsing?
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <beginners@haskell.org>
Message-ID:
        <CAEw2jfxgy1f7eb4j9p4ZO=i-p09jhgtcxrjzyptjcyj8mhk...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

conduit and pipes are two examples:

http://hackage.haskell.org/package/conduit
http://hackage.haskell.org/package/pipes



------------------------------

Message: 3
Date: Mon, 4 Feb 2013 10:38:02 -0500
From: Kyle Murphy <orc...@gmail.com>
Subject: Re: [Haskell-beginners] too lazy parsing?
To: beginners <beginners@haskell.org>
Message-ID:
        <ca+y6jcwuykjomevlbpxfgwekx1qkqo7p-u4sjn6egbz2_1o...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

So this thread got me thinking about what's wrong with file IO in Haskell.
My conclusion is that file handles are problematic because they're
essentially pointers, and as such are subject to the same sorts of
problems, such as use after free (as in this example), which is further
complicated by the lazy default implementation. The same solutions used for
the Ptr type can be applied to file handles (and in fact have been). For
example, alloca wraps the operations on a pointer so that it's allocated,
passed into a function, and then freed on exit from that function. In the
same way, System.IO provides a withFile function that takes a file name, an
IO mode, and a function to perform some work on the handle, and closes the
handle when that function returns. A closely related pattern is implemented
by the ResourceT library (which in turn started as part of the previously
mentioned conduit package).
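
As a rough illustration of that scoped pattern, here is a sketch using
withFile from System.IO. The helper name countLines is hypothetical. Note
that the result still has to be forced inside the callback, because
hGetContents is lazy and the handle is closed as soon as the callback
returns.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- | Count the lines of a file. (countLines is a hypothetical helper.)
countLines :: FilePath -> IO Int
countLines path =
  withFile path ReadMode $ \handle -> do
    cont <- hGetContents handle
    -- Force the result here, while the handle is still open. Returning
    -- the unevaluated String instead would reintroduce the original bug.
    evaluate (length (lines cont))
```

The handle never escapes the callback, so there is no separate hClose to
put in the wrong place.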

In general I'd recommend avoiding the standard file functions, as they're
very un-Haskellish and inherently unsafe (in the type safety sense, not the
security sense). I personally think those functions should be phased out in
future Haskell releases in favor of better functional abstractions, but my
opinion carries very little weight, since I'm essentially nobody in the
Haskell world, so take it with whatever size grain of salt you feel is
appropriate.

------------------------------

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 56, Issue 7
****************************************
