Re: [Haskell-cafe] Re: Network parsing and parsec
Andrew Pimlott wrote: On Tue, Sep 20, 2005 at 03:01:32PM +0100, Keean Schupke wrote: (see attachment for files) You didn't include all the used libraries (MonadControl, MonadState). Andrew Oops, here they are (it was extracted from a larger project), sorry about that... (Have posted this back to the mailing list incase anyone else is tying to use the libraries I posted) Regards, keean. parser.tgz Description: application/compressed-tar ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Sep 20, 2005, at 6:32 PM, Benjamin Franksen wrote: On Tuesday 20 September 2005 16:50, John Goerzen wrote: On the flip side, Parsec is really nice. I wonder how easy it would be to make it parse [Word8] instead of String? Isn't Parsec parameterized over the token type? Or even a FastPackedString? (And how easy it would be to get that instead of a String from hGetContents)? From the FPS haddock: hGetContents :: Handle - IO FastString Read entire handle contents into a FastString. This may or may not do what you want...it's probably not a lazy read. It seems like there might need to be something like: hGetContentsLazily :: Handle - IO [FastString] which returns file contents in chunks based on our ability to buffer the handle. If we can mmap the handle, we may get a singleton list with a giant FastString; if we are using a Socket or a terminal, each succeeding string might be the next chunk of available data from the handle. I had the impression the internals of getContents from the prelude worked a bit like this (in GHC, anyway). -Jan-Willem Maessen Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Wednesday 21 September 2005 19:36, Jan-Willem Maessen wrote: On Sep 20, 2005, at 6:32 PM, Benjamin Franksen wrote: On Tuesday 20 September 2005 16:50, John Goerzen wrote: On the flip side, Parsec is really nice. I wonder how easy it would be to make it parse [Word8] instead of String? Isn't Parsec parameterized over the token type? Or even a FastPackedString? (And how easy it would be to get that instead of a String from hGetContents)? From the FPS haddock: hGetContents :: Handle - IO FastString Read entire handle contents into a FastString. This may or may not do what you want...it's probably not a lazy read. It seems like there might need to be something like: hGetContentsLazily :: Handle - IO [FastString] which returns file contents in chunks based on our ability to buffer the handle. If we can mmap the handle, we may get a singleton list with a giant FastString; if we are using a Socket or a terminal, each succeeding string might be the next chunk of available data from the handle. From the FPS haddock: data LazyFile Constructors LazyString String MMappedFastString FastString LazyFastStrings [FastString] Instances Eq LazyFile readFileLazily :: FilePath - IO LazyFile That comes pretty near. Unfortunately it works on a file name, not a handle. Thus it cannot be used for a socket or such things. Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Wed, Sep 21, 2005 at 12:32:56AM +0200, Benjamin Franksen wrote: On Tuesday 20 September 2005 16:50, John Goerzen wrote: On the flip side, Parsec is really nice. I wonder how easy it would be to make it parse [Word8] instead of String? Isn't Parsec parameterized over the token type? Or even a FastPackedString? (And how easy it would be to get that instead of a String from hGetContents)? From the FPS haddock: hGetContents :: Handle - IO FastString Read entire handle contents into a FastString. This may or may not do what you want...it's probably not a lazy read. If it can be implemented via 'mmap' then it is effectivly a lazy read. and _very_ efficient to boot. John -- John Meacham - ⑆repetae.net⑆john⑈ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Wednesday 21 September 2005 20:17, John Meacham wrote: On Wed, Sep 21, 2005 at 12:32:56AM +0200, Benjamin Franksen wrote: On Tuesday 20 September 2005 16:50, John Goerzen wrote: On the flip side, Parsec is really nice. I wonder how easy it would be to make it parse [Word8] instead of String? Isn't Parsec parameterized over the token type? Or even a FastPackedString? (And how easy it would be to get that instead of a String from hGetContents)? From the FPS haddock: hGetContents :: Handle - IO FastString Read entire handle contents into a FastString. This may or may not do what you want...it's probably not a lazy read. If it can be implemented via 'mmap' then it is effectivly a lazy read. and _very_ efficient to boot. True, I forgot mmap. Taking a look at the implementation reveals that it indeed uses mmap (if available on the platform). Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Network parsing and parsec
On 2005-09-16, Andrew Pimlott [EMAIL PROTECTED] wrote: On Thu, Sep 15, 2005 at 06:11:58PM -0700, Andrew Pimlott wrote: I don't see why this would be more error-prone than any other approach. Hmm... I take that back. I don't know anything about the IMAP protocol, but after imagining for a few moments what it might be like, I can see how it could be more difficult than my example. The user state of the parser might help you... Hmm, can you elaborate on that? Basically, I *really* want to get away frmo having to use hGetContents. It is just not at all friendly for an interactive netwrk protocol. If I were just streaming a large file from an FTP server, it would be fine, but even using it to begin with involves using Handles in a nonstandard way (since there must be a separate Handle for writing, since hGetContents sents the Handle to be half-closed) that is apparently not well-supported. -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Network parsing and parsec
On 2005-09-15, Adam Turoff [EMAIL PROTECTED] wrote: On 9/15/05, John Goerzen [EMAIL PROTECTED] wrote: So, to make that approach work, I would really need to do a lot of work outside of Parsec -- the stuff that I really want to use Parsec for, I think. Well, you do have a state monad to work with. Why not just stuff the number 305 into your state, keep reading until you've read 305 bytes (decrementing the count as you read), and return the 305-byte string as your result for this parser? When you resume, you should be ready to parse the next very token after the 305-byte string. It's unclear to me exactly how to mix the IO monad with Parsec. It doesn't really seem to be doable. Not to mention that if hGetContents is used, the Handle has to be put into non-buffering mode, which means one syscall per character read. Terribly slow. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
John Goerzen wrote: On 2005-09-15, Adam Turoff [EMAIL PROTECTED] wrote: On 9/15/05, John Goerzen [EMAIL PROTECTED] wrote: So, to make that approach work, I would really need to do a lot of work outside of Parsec -- the stuff that I really want to use Parsec for, I think. Well, you do have a state monad to work with. Why not just stuff the number 305 into your state, keep reading until you've read 305 bytes (decrementing the count as you read), and return the 305-byte string as your result for this parser? When you resume, you should be ready to parse the next very token after the 305-byte string. It's unclear to me exactly how to mix the IO monad with Parsec. It doesn't really seem to be doable. Not to mention that if hGetContents is used, the Handle has to be put into non-buffering mode, which means one syscall per character read. Terribly slow. Does it? I didn't think so ... Keean. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
You may like my parser transformer then (based on the efficent backtracking parser paper, I believe by Ralf Heinze - uses endofunctor and continuation passing - Its a long time since I tested it but I think it holds its own against Parsec, without requiring the extra return types). -- parser.hs: Copyright (C)2001,2002 Keean Schupke. -- -- Polymorphic monadic consumer based parser. module Lib.Monad.ParserT(ParserT(..)) where import Control.Monad hiding (guard) import Control.Monad.Error import Lib.Monad.MonadT import Lib.Monad.MonadState import Lib.Monad.MonadParser import Lib.Monad.MonadControl import Lib.Arrow.Runnable -- -- An continuation passing endomorphic parser type Cps a r = (a - r) - r type Endo r = r - r newtype ParserT r tok m a = PT (Cps a ([tok] - Endo (m r))) instance Monad m = Functor (ParserT r tok m) where fmap g (PT m) = PT $ \k - m (\a s f - k (g a) s f) instance Monad m = Monad (ParserT r tok m) where {-# INLINE return #-} return a = PT $ \k - k a {-# INLINE (=) #-} (PT m) = f = PT $ \k - m (\a - (\(PT x) - x) (f a) k) instance Monad m = MonadPlus (ParserT r tok m) where {-# INLINE mzero #-} mzero = PT $ \_ _ f - f {-# INLINE mplus #-} mplus (PT m) (PT n) = PT $ \k s - m k s . n k s instance MonadPlus m = MonadT (ParserT r tok) m where {-# INLINE up #-} up m = PT $ \k s f - (m = \a - k a s mzero) `mplus` f {-# INLINE down #-} down = undefined instance (MonadPlus m,MonadT (ParserT r tok) m,Runnable ([tok] - m ([tok],r)) ([tok] - n ([tok],r))) = Runnable (ParserT ([tok],r) tok m r) ([tok] - n ([tok],r)) where run = run . (\(PT m) t - m (\a t' f - return (t',a) `mplus` f) t mzero) instance (MonadPlus m,MonadT (ParserT r tok) m) = Runnable (ParserT ([tok],r) tok m r) ([tok] - m ([tok],r)) where run = (\(PT m) t - m (\a t' f - return (t',a) `mplus` f) t mzero) instance Monad m = MonadState [tok] (ParserT r tok m) where {-# INLINE update #-} update st = PT $ \k s - k s ((st s) `asTypeOf` s) setState st = PT $ \k _ - k () st getState = PT $ \k s - k s s instance Monad m = MonadParser tok (ParserT r tok m) where {-# INLINE item #-} item = PT $ \k s - case s of [] - id (a:x) - k a x instance (MonadPlus (t m),MonadParser tok m,MonadT t m) = MonadParser tok (t m) where item = up item instance Monad m = MonadControl (ParserT r tok m) where {-# INLINE once #-} once (PT m) = PT $ \k s f - m (\a s' _ - k a s' f) s f Regards, Keean. John Goerzen wrote: On 2005-09-16, Andrew Pimlott [EMAIL PROTECTED] wrote: On Thu, Sep 15, 2005 at 06:11:58PM -0700, Andrew Pimlott wrote: I don't see why this would be more error-prone than any other approach. Hmm... I take that back. I don't know anything about the IMAP protocol, but after imagining for a few moments what it might be like, I can see how it could be more difficult than my example. The user state of the parser might help you... Hmm, can you elaborate on that? Basically, I *really* want to get away frmo having to use hGetContents. It is just not at all friendly for an interactive netwrk protocol. If I were just streaming a large file from an FTP server, it would be fine, but even using it to begin with involves using Handles in a nonstandard way (since there must be a separate Handle for writing, since hGetContents sents the Handle to be half-closed) that is apparently not well-supported. -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Tue, Sep 20, 2005 at 02:29:12PM +0100, Keean Schupke wrote: It's unclear to me exactly how to mix the IO monad with Parsec. It doesn't really seem to be doable. Not to mention that if hGetContents is used, the Handle has to be put into non-buffering mode, which means one syscall per character read. Terribly slow. Does it? I didn't think so ... strace seems to say yes. If buffering is used, it blocks on attempts to read more than is available. -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
Here's some useful definitions to go with that... module Lib.Parser.Parser(Parser,when,unless,guard,(|),opt,many,many1,sepBy, parse,alpha,digit,lower,upper,other,lexical,satisfy,optional,literal,untilP,untilParser,matchP) where ... (see attachment for files) Regards, Keean. parser.tgz Description: application/compressed-tar ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
John Goerzen wrote: On Tue, Sep 20, 2005 at 02:29:12PM +0100, Keean Schupke wrote: It's unclear to me exactly how to mix the IO monad with Parsec. It doesn't really seem to be doable. Not to mention that if hGetContents is used, the Handle has to be put into non-buffering mode, which means one syscall per character read. Terribly slow. Does it? I didn't think so ... strace seems to say yes. Thats odd, the source code seems to suggest that when you read past the end of the buffer it reads the next entire buffer (it has cases for each possible buffer configuration, line, block and none) - and I can think of no reason _why_ it cannot use buffering... I would think that it's a bug if it is the case. Regards, Keean. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Tue, Sep 20, 2005 at 03:05:25PM +0100, Keean Schupke wrote: strace seems to say yes. Thats odd, the source code seems to suggest that when you read past the end of the buffer it reads the next entire buffer (it has cases for each possible buffer configuration, line, block and none) - and I can think of no reason _why_ it cannot use buffering... I would think that it's a bug if it is the case. Because the next entire buffer might consume more data than the remote has sent. That results in deadlock. -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
Here's the code from hGetContents (base/GHC/IO.lhs): -- we never want to block during the read, so we call fillReadBuffer with -- is_line==True, which tells it to just read what there is. lazyReadBuffered h handle_ fd ref buf = do catch (do buf - fillReadBuffer fd True{-is_line-} (haIsStream handle_) buf lazyReadHaveBuffer h handle_ fd ref buf ) -- all I/O errors are discarded. Additionally, we close the handle. (\e - do handle_ - hClose_help handle_ return (handle_, ) ) So, it reads whatever is available, further description is available from the definition of fillReadBuffered: -- For a line buffer, we just get the first chunk of data to arrive, -- and don't wait for the whole buffer to be full (but we *do* wait -- until some data arrives). This isn't really line buffering, but it -- appears to be what GHC has done for a long time, and I suspect it -- is more useful than line buffering in most cases. So for a disc buffer I would expect 1 complete buffer to be returned most of the time, for a network read, I guess one packet (MTUs) worth should be expected... Regards, Keean. Keean Schupke wrote: John Goerzen wrote: On Tue, Sep 20, 2005 at 02:29:12PM +0100, Keean Schupke wrote: It's unclear to me exactly how to mix the IO monad with Parsec. It doesn't really seem to be doable. Not to mention that if hGetContents is used, the Handle has to be put into non-buffering mode, which means one syscall per character read. Terribly slow. Does it? I didn't think so ... strace seems to say yes. Thats odd, the source code seems to suggest that when you read past the end of the buffer it reads the next entire buffer (it has cases for each possible buffer configuration, line, block and none) - and I can think of no reason _why_ it cannot use buffering... I would think that it's a bug if it is the case. Regards, Keean. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
John Goerzen wrote: On Tue, Sep 20, 2005 at 03:05:25PM +0100, Keean Schupke wrote: strace seems to say yes. Thats odd, the source code seems to suggest that when you read past the end of the buffer it reads the next entire buffer (it has cases for each possible buffer configuration, line, block and none) - and I can think of no reason _why_ it cannot use buffering... I would think that it's a bug if it is the case. Because the next entire buffer might consume more data than the remote has sent. That results in deadlock. Would it not be usual to have a timeout incase of dropped connection? Regards, Keean. (Btw, did you look at the Parser Monad-Transformer?) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Tue, Sep 20, 2005 at 03:20:01PM +0100, Keean Schupke wrote: Because the next entire buffer might consume more data than the remote has sent. That results in deadlock. Would it not be usual to have a timeout incase of dropped connection? Yes, but hardly useful if it happens after issuing every command ;-) (Btw, did you look at the Parser Monad-Transformer?) Not yet, but thanks for sending it along. -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Tue, Sep 20, 2005 at 03:17:11PM +0100, Keean Schupke wrote: -- For a line buffer, we just get the first chunk of data to arrive, -- and don't wait for the whole buffer to be full (but we *do* wait -- until some data arrives). This isn't really line buffering, but it -- appears to be what GHC has done for a long time, and I suspect it -- is more useful than line buffering in most cases. So for a disc buffer I would expect 1 complete buffer to be returned most of the time, for a network read, I guess one packet (MTUs) worth should be expected... Hmm, and checking my code, it appears that I did use line buffering for my FTP client. However, I am *sure* that I did run into some deadlock issues relating to buffering at some point. Sigh. So this is good and should work nicely with protocols such as SMTP and FTP. The other thing is that the Handle is not the most convenient way to work with a socket. As I mentioned, two Handles for a single socket must be opened, which is inconvenient and annoying, not to mention leads to some confusing semantics. You also have to be *very* careful to never consume more than you need. On the flip side, Parsec is really nice. I wonder how easy it would be to make it parse [Word8] instead of String? Or even a FastPackedString? (And how easy it would be to get that instead of a String from hGetContents)? -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On Tuesday 20 September 2005 16:50, John Goerzen wrote: On the flip side, Parsec is really nice. I wonder how easy it would be to make it parse [Word8] instead of String? Isn't Parsec parameterized over the token type? Or even a FastPackedString? (And how easy it would be to get that instead of a String from hGetContents)? From the FPS haddock: hGetContents :: Handle - IO FastString Read entire handle contents into a FastString. This may or may not do what you want...it's probably not a lazy read. Ben ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Network parsing and parsec
John Goerzen writes: With networking, you must be careful not to attempt to read more data than the server hands back, or else you'll block. [...] With a protocol such as IMAP, there is no way to know until a server response is being parsed, how many lines (or bytes) of data to read. The approach I recommend is to run a scanner (tokenizer) before the actual parser. IMAP, like most other RFC protocols, is line-based; so you can use a very simple scanner to read a CRLF-terminated line efficiently (using non-blocking I/O, for example), which you can then feed into the parser just fine because you know that it has to contain a complete request (response) that you can handle. Peter ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Network parsing and parsec
On 2005-09-15, Peter Simons [EMAIL PROTECTED] wrote: The approach I recommend is to run a scanner (tokenizer) before the actual parser. IMAP, like most other RFC protocols, is line-based; so you can use a very simple scanner to read a CRLF-terminated line efficiently (using non-blocking I/O, for example), which you can then feed into the parser just fine because you know that it has to contain a complete request (response) that you can handle. I thought of that, but that isn't really true for IMAP. IMAP responses can span many, many lines (for instance, it can return a list of all matching messages in a folder, or multiple bits of status results). Or they can use only one line. Not only that, but IMAP has a way where you can embed, say {305} instead of a string. That means, after you finish reading this line, read exactly 305 bytes, and consider that to be used here. But if you see {305} (the double quotes indicating a string), this is just a string containing the text {305}. So, to make that approach work, I would really need to do a lot of work outside of Parsec -- the stuff that I really want to use Parsec for, I think. -- John ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Network parsing and parsec
On 9/15/05, John Goerzen [EMAIL PROTECTED] wrote: Not only that, but IMAP has a way where you can embed, say {305} instead of a string. That means, after you finish reading this line, read exactly 305 bytes, and consider that to be used here. But if you see {305} (the double quotes indicating a string), this is just a string containing the text {305}. So, to make that approach work, I would really need to do a lot of work outside of Parsec -- the stuff that I really want to use Parsec for, I think. Well, you do have a state monad to work with. Why not just stuff the number 305 into your state, keep reading until you've read 305 bytes (decrementing the count as you read), and return the 305-byte string as your result for this parser? When you resume, you should be ready to parse the next very token after the 305-byte string. -- Adam ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe