Re: [Haskell-cafe] parsing currency amounts with parsec
Very helpful -- thanks everyone! The handling of currency amounts in hledger is what I was looking for in terms of alternate ways to parse and represent dollar amounts in Haskell. On Wed, May 11, 2011 at 6:05 PM, Simon Michael si...@joyful.com wrote: On 5/10/11 2:52 PM, Roman Cheplyaka wrote: You could read hledger[1] sources for inspiration: it's written in Haskell and contains some (quite generic) currency parsing. Hi Eric.. here's the code in question: http://hackage.haskell.org/packages/archive/hledger-lib/0.14/doc/html/src/Hledger-Read-JournalReader.html#postingamount and some related docs: http://hledger.org/MANUAL.html#amounts It's probably more complicated and less efficient than you need, but a source of ideas. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] parsing currency amounts with parsec
On 5/10/11 2:52 PM, Roman Cheplyaka wrote: You could read hledger[1] sources for inspiration: it's written in Haskell and contains some (quite generic) currency parsing. Hi Eric.. here's the code in question: http://hackage.haskell.org/packages/archive/hledger-lib/0.14/doc/html/src/Hledger-Read-JournalReader.html#postingamount and some related docs: http://hledger.org/MANUAL.html#amounts It's probably more complicated and less efficient than you need, but a source of ideas. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] parsing currency amounts with parsec
* Eric Rasmussen ericrasmus...@gmail.com [2011-05-09 15:07:57-0700] Hi everyone, I am relatively new to Haskell and Parsec, and I couldn't find any articles on parsing numbers in the following format: You could read hledger[1] sources for inspiration: it's written in Haskell and contains some (quite generic) currency parsing. [1]: http://hledger.org/ -- Roman I. Cheplyaka :: http://ro-che.info/ Don't worry what people think, they don't do it very often. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] parsing currency amounts with parsec
On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussen ericrasmus...@gmail.com wrote: Hi everyone, I am relatively new to Haskell and Parsec, and I couldn't find any articles on parsing numbers in the following format: Positive: $115.33 Negative: ($1,323.42) I'm working on the parser for practical purposes (to convert a 3rd-party generated, most unhelpful format into one I can use), and I'd really appreciate any insight into a better way to do this, or if there are any built-in functions/established libraries that would be better suited to the task. My code below works, but doesn't seem terribly efficient. Why do you think it inefficient? Is it slow? I don't have any substantial suggestions, but from a style perspective: * I would question the use of IEEE binary-floating-point number types for currency. Haskell ships with a fixed-point decimal library, but I don't know how fast it is: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed.html * You can re-use the 'positive' parser in the 'negative' one: negAmount = do char '(' a - posAmount char ')' return (negate a) * In the 'currencyAmount' declaration, I'm not sure that you need the 'try' before the 'negAmount', but I don't know what the rest of your grammar looks like. Thanks! Eric - {- parses positive and negative dollar amounts -} integer :: CharParser st Integer integer = PT.integer lexer float :: CharParser st Double float = PT.float lexer currencyAmount = try negAmount | posAmount negAmount = do char '(' char '$' a - currency char ')' return (negate a) posAmount = do char '$' a - currency return a currency = do parts - many floatOrSep let result = combine orderedParts where combine = sumWithFactor 1 orderedParts = reverse parts return result floatOrSep = try float | beforeSep beforeSep = do a - integer char ',' return (fromIntegral a :: Double) sumWithFactor n [] = 0 sumWithFactor n (x:xs) = n * x + next where next = sumWithFactor (n*1000) xs ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] parsing currency amounts with parsec
On 5/9/11 10:04 PM, Antoine Latter wrote: On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussenericrasmus...@gmail.com wrote: Hi everyone, I am relatively new to Haskell and Parsec, and I couldn't find any articles on parsing numbers in the following format: Positive: $115.33 Negative: ($1,323.42) I'm working on the parser for practical purposes (to convert a 3rd-party generated, most unhelpful format into one I can use), and I'd really appreciate any insight into a better way to do this, or if there are any built-in functions/established libraries that would be better suited to the task. My code below works, but doesn't seem terribly efficient. Why do you think it inefficient? Is it slow? I don't have any substantial suggestions, but from a style perspective: * I would question the use of IEEE binary-floating-point number types for currency. Haskell ships with a fixed-point decimal library, but I don't know how fast it is: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed.html There are also a few other options (that I can't seem to find links to at the moment), and some hints on how to do it yourself: http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-3.html I'm not sure if your choice of parsing library is fixed or not, but you could probably speed things up significantly by using Attoparsec. In particular, Attoparsec's combinators for takeWhile, takeWhile1,scan,... return bytestrings, and you can then fold over the bytestring quite nicely. If your source is ASCII or anything ASCII compatible (Latin-1, Latin-9, UTF-8,...) then take a look at Data.Attoparsec.Char8.decimal[1][2]. Unless you need to verify that commas occur exactly every third digit, I'd suggest (a) dropping commas while scanning the string, or (b) implicitly dropping commas while folding over the string. If you do need to verify this, then your best option is (a), since you can maintain a state machine about how many digits seen since the last comma. Once you're using the Attoparsec strategy of folding over the raw byte buffers, then the only room for improvement is going to be making the code as straight-line as possible. Some untested example code: import qualified Data.Attoparsec as A import qualified Data.Attoparsec.Char8 as A8 import qualified Data.ByteString as B rawCurrency :: A.Parser (ByteString,ByteString) rawCurrency = do dollars - A.scan 0 step _ - A.char '.' -- Assuming it's required... cents - A.takeWhile1 isDigit_w8 -- Assuming no commas... return (dollars,cents) where step :: Int - Word8 - Maybe Int step 3 0x2C = Just 0 step s c | isDigit_w8 c = Just $! s+1 step _ _= Nothing -- Note: the order of comparisons is part of why it's fast. -- | A fast digit predicate. isDigit_w8 :: Word8 - Bool isDigit_w8 w = (w = 0x39 w = 0x30) {-# INLINE isDigit_w8 #-} -- With the dots filled in by whatever representation you use. currency :: A.Parser ... currency = do (dollars,cents) - rawCurrency let step a w = a * 10 + fromIntegral (w - 0x30) d = B.foldl' step 0 (B.filter (/= 0x2C) dollars) c = fromIntegral (B.foldl' step 0 cents) / (10 ^ length cents) return (... d ... c ...) amount :: A.Parser ... amount = pos | neg where pos = A8.char '$' * currency neg = do _ - A8.string ($ a - currency _ - A8.char ')' return (negate a) [1] And if you're using Attoparsec itself, you may want to take a look at Data.Attoparsec.Zepto as well. [2] If you're basing code on Attoparsec, you may want to look at some of my pending patches which improve performance on this (already extremely fast) code: https://bitbucket.org/winterkoninkje/attoparsec/changesets -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] parsing currency amounts with parsec
I'll check out Attoparsec, thanks! My first attempt may work for this particular task, but I'm warming up for a more intense parsing project and it sounds like Attoparsec with Bytestrings may work best. Also, just in case anyone reads this thread later and is looking for a quick Parsec solution, I discovered that the code I posted initially was a bit greedy in a bad way if the dollar amount was at the end of the line. I got rid of the original currency, floatOrSep, and beforeSep functions and replaced them with the code below (still verbose, but hopefully a better starting point for now). double = do i - integer return (fromIntegral i :: Double) currency = try float | largeAmount largeAmount = do first - double rest - many afterSep let parts = first : rest let result = combine orderedParts where combine = sumWithFactor 1 orderedParts = reverse parts return result afterSep = do char ',' try float | double On Mon, May 9, 2011 at 8:15 PM, wren ng thornton w...@freegeek.org wrote: On 5/9/11 10:04 PM, Antoine Latter wrote: On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussenericrasmus...@gmail.com wrote: Hi everyone, I am relatively new to Haskell and Parsec, and I couldn't find any articles on parsing numbers in the following format: Positive: $115.33 Negative: ($1,323.42) I'm working on the parser for practical purposes (to convert a 3rd-party generated, most unhelpful format into one I can use), and I'd really appreciate any insight into a better way to do this, or if there are any built-in functions/established libraries that would be better suited to the task. My code below works, but doesn't seem terribly efficient. Why do you think it inefficient? Is it slow? I don't have any substantial suggestions, but from a style perspective: * I would question the use of IEEE binary-floating-point number types for currency. Haskell ships with a fixed-point decimal library, but I don't know how fast it is: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed.html There are also a few other options (that I can't seem to find links to at the moment), and some hints on how to do it yourself: http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-3.html I'm not sure if your choice of parsing library is fixed or not, but you could probably speed things up significantly by using Attoparsec. In particular, Attoparsec's combinators for takeWhile, takeWhile1,scan,... return bytestrings, and you can then fold over the bytestring quite nicely. If your source is ASCII or anything ASCII compatible (Latin-1, Latin-9, UTF-8,...) then take a look at Data.Attoparsec.Char8.decimal[1][2]. Unless you need to verify that commas occur exactly every third digit, I'd suggest (a) dropping commas while scanning the string, or (b) implicitly dropping commas while folding over the string. If you do need to verify this, then your best option is (a), since you can maintain a state machine about how many digits seen since the last comma. Once you're using the Attoparsec strategy of folding over the raw byte buffers, then the only room for improvement is going to be making the code as straight-line as possible. Some untested example code: import qualified Data.Attoparsec as A import qualified Data.Attoparsec.Char8 as A8 import qualified Data.ByteString as B rawCurrency :: A.Parser (ByteString,ByteString) rawCurrency = do dollars - A.scan 0 step _ - A.char '.' -- Assuming it's required... cents - A.takeWhile1 isDigit_w8 -- Assuming no commas... return (dollars,cents) where step :: Int - Word8 - Maybe Int step 3 0x2C = Just 0 step s c | isDigit_w8 c = Just $! s+1 step _ _= Nothing -- Note: the order of comparisons is part of why it's fast. -- | A fast digit predicate. isDigit_w8 :: Word8 - Bool isDigit_w8 w = (w = 0x39 w = 0x30) {-# INLINE isDigit_w8 #-} -- With the dots filled in by whatever representation you use. currency :: A.Parser ... currency = do (dollars,cents) - rawCurrency let step a w = a * 10 + fromIntegral (w - 0x30) d = B.foldl' step 0 (B.filter (/= 0x2C) dollars) c = fromIntegral (B.foldl' step 0 cents) / (10 ^ length cents) return (... d ... c ...) amount :: A.Parser ... amount = pos | neg where pos = A8.char '$' * currency neg = do _ - A8.string ($ a - currency _ - A8.char ')' return (negate a) [1] And if you're using