Re: [Haskell-cafe] parsing currency amounts with parsec

2011-05-12 Thread Eric Rasmussen
Very helpful -- thanks everyone! The handling of currency amounts in hledger
is what I was looking for in terms of alternate ways to parse and represent
dollar amounts in Haskell.

On Wed, May 11, 2011 at 6:05 PM, Simon Michael si...@joyful.com wrote:

 On 5/10/11 2:52 PM, Roman Cheplyaka wrote:

 You could read hledger[1] sources for inspiration: it's written in
 Haskell and contains some (quite generic) currency parsing.


 Hi Eric.. here's the code in question:


 http://hackage.haskell.org/packages/archive/hledger-lib/0.14/doc/html/src/Hledger-Read-JournalReader.html#postingamount

 and some related docs:

 http://hledger.org/MANUAL.html#amounts

 It's probably more complicated and less efficient than you need, but a
 source of ideas.



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsing currency amounts with parsec

2011-05-11 Thread Simon Michael

On 5/10/11 2:52 PM, Roman Cheplyaka wrote:

You could read hledger[1] sources for inspiration: it's written in
Haskell and contains some (quite generic) currency parsing.


Hi Eric.. here's the code in question:

http://hackage.haskell.org/packages/archive/hledger-lib/0.14/doc/html/src/Hledger-Read-JournalReader.html#postingamount

and some related docs:

http://hledger.org/MANUAL.html#amounts

It's probably more complicated and less efficient than you need, but a source 
of ideas.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsing currency amounts with parsec

2011-05-10 Thread Roman Cheplyaka
* Eric Rasmussen ericrasmus...@gmail.com [2011-05-09 15:07:57-0700]
 Hi everyone,
 
 I am relatively new to Haskell and Parsec, and I couldn't find any articles
 on parsing numbers in the following format:

You could read hledger[1] sources for inspiration: it's written in
Haskell and contains some (quite generic) currency parsing.

[1]: http://hledger.org/

-- 
Roman I. Cheplyaka :: http://ro-che.info/
Don't worry what people think, they don't do it very often.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsing currency amounts with parsec

2011-05-09 Thread Antoine Latter
On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussen ericrasmus...@gmail.com wrote:
 Hi everyone,

 I am relatively new to Haskell and Parsec, and I couldn't find any articles
 on parsing numbers in the following format:

 Positive: $115.33
 Negative: ($1,323.42)

 I'm working on the parser for practical purposes (to convert a 3rd-party
 generated, most unhelpful format into one I can use), and I'd really
 appreciate any insight into a better way to do this, or if there are any
 built-in functions/established libraries that would be better suited to the
 task. My code below works, but doesn't seem terribly efficient.


Why do you think it inefficient? Is it slow?

I don't have any substantial suggestions, but from a style perspective:

* I would question the use of IEEE binary-floating-point number types
for currency. Haskell ships with a fixed-point decimal library, but I
don't know how fast it is:
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed.html

* You can re-use the 'positive' parser in the 'negative' one:

 negAmount = do char '('
   a - posAmount
   char ')'
  return (negate a)

* In the 'currencyAmount' declaration, I'm not sure that you need the
'try' before the 'negAmount', but I don't know what the rest of your
grammar looks like.

 Thanks!
 Eric

 -
 {- parses positive and negative dollar amounts -}

 integer :: CharParser st Integer
 integer = PT.integer lexer

 float :: CharParser st Double
 float = PT.float lexer

 currencyAmount = try negAmount | posAmount

 negAmount = do char '('
               char '$'
               a - currency
               char ')'
               return (negate a)

 posAmount = do char '$'
               a - currency
               return a

 currency = do parts - many floatOrSep
              let result = combine orderedParts where
                    combine = sumWithFactor 1
                    orderedParts = reverse parts
              return result

 floatOrSep = try float | beforeSep

 beforeSep = do a - integer
               char ','
               return (fromIntegral a :: Double)

 sumWithFactor n []     = 0
 sumWithFactor n (x:xs) = n * x + next
                         where next = sumWithFactor (n*1000) xs



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsing currency amounts with parsec

2011-05-09 Thread wren ng thornton

On 5/9/11 10:04 PM, Antoine Latter wrote:

On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussenericrasmus...@gmail.com  wrote:

Hi everyone,

I am relatively new to Haskell and Parsec, and I couldn't find any articles
on parsing numbers in the following format:

Positive: $115.33
Negative: ($1,323.42)

I'm working on the parser for practical purposes (to convert a 3rd-party
generated, most unhelpful format into one I can use), and I'd really
appreciate any insight into a better way to do this, or if there are any
built-in functions/established libraries that would be better suited to the
task. My code below works, but doesn't seem terribly efficient.


Why do you think it inefficient? Is it slow?

I don't have any substantial suggestions, but from a style perspective:

* I would question the use of IEEE binary-floating-point number types
for currency. Haskell ships with a fixed-point decimal library, but I
don't know how fast it is:
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed.html


There are also a few other options (that I can't seem to find links to 
at the moment), and some hints on how to do it yourself:



http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-3.html


I'm not sure if your choice of parsing library is fixed or not, but you 
could probably speed things up significantly by using Attoparsec. In 
particular, Attoparsec's combinators for takeWhile, takeWhile1,scan,... 
return bytestrings, and you can then fold over the bytestring quite 
nicely. If your source is ASCII or anything ASCII compatible (Latin-1, 
Latin-9, UTF-8,...) then take a look at Data.Attoparsec.Char8.decimal[1][2].


Unless you need to verify that commas occur exactly every third digit, 
I'd suggest (a) dropping commas while scanning the string, or (b) 
implicitly dropping commas while folding over the string. If you do need 
to verify this, then your best option is (a), since you can maintain a 
state machine about how many digits seen since the last comma. Once 
you're using the Attoparsec strategy of folding over the raw byte 
buffers, then the only room for improvement is going to be making the 
code as straight-line as possible.


Some untested example code:

import qualified Data.Attoparsec   as A
import qualified Data.Attoparsec.Char8 as A8
import qualified Data.ByteString   as B

rawCurrency :: A.Parser (ByteString,ByteString)
rawCurrency = do
dollars - A.scan 0 step
_   - A.char '.'   -- Assuming it's required...
cents   - A.takeWhile1 isDigit_w8  -- Assuming no commas...
return (dollars,cents)
where
step :: Int - Word8 - Maybe Int
step 3 0x2C = Just 0
step s c | isDigit_w8 c = Just $! s+1
step _ _= Nothing

-- Note: the order of comparisons is part of why it's fast.
-- | A fast digit predicate.
isDigit_w8 :: Word8 - Bool
isDigit_w8 w = (w = 0x39  w = 0x30)
{-# INLINE isDigit_w8 #-}

-- With the dots filled in by whatever representation you use.
currency :: A.Parser ...
currency = do
(dollars,cents) - rawCurrency
let step a w = a * 10 + fromIntegral (w - 0x30)
d = B.foldl' step 0 (B.filter (/= 0x2C) dollars)
c = fromIntegral (B.foldl' step 0 cents)
/ (10 ^ length cents)
return (... d ... c ...)

amount :: A.Parser ...
amount = pos | neg
where
pos = A8.char '$' * currency
neg = do
_ - A8.string ($
a - currency
_ - A8.char ')'
return (negate a)




[1] And if you're using Attoparsec itself, you may want to take a look 
at Data.Attoparsec.Zepto as well.


[2] If you're basing code on Attoparsec, you may want to look at some of 
my pending patches which improve performance on this (already extremely 
fast) code:


https://bitbucket.org/winterkoninkje/attoparsec/changesets

--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] parsing currency amounts with parsec

2011-05-09 Thread Eric Rasmussen
I'll check out Attoparsec, thanks! My first attempt may work for this
particular task, but I'm warming up for a more intense parsing project and
it sounds like Attoparsec with Bytestrings may work best.

Also, just in case anyone reads this thread later and is looking for a quick
Parsec solution, I discovered that the code I posted initially was a bit
greedy in a bad way if the dollar amount was at the end of the line. I got
rid of the original currency, floatOrSep, and beforeSep functions and
replaced them with the code below (still verbose, but hopefully a better
starting point for now).



double = do i - integer
return (fromIntegral i :: Double)

currency = try float | largeAmount

largeAmount = do first - double
 rest  - many afterSep
 let parts = first : rest
 let result = combine orderedParts where
 combine = sumWithFactor 1
 orderedParts = reverse parts
 return result

afterSep = do char ','
  try float | double




On Mon, May 9, 2011 at 8:15 PM, wren ng thornton w...@freegeek.org wrote:

 On 5/9/11 10:04 PM, Antoine Latter wrote:

 On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussenericrasmus...@gmail.com
  wrote:

 Hi everyone,

 I am relatively new to Haskell and Parsec, and I couldn't find any
 articles
 on parsing numbers in the following format:

 Positive: $115.33
 Negative: ($1,323.42)

 I'm working on the parser for practical purposes (to convert a 3rd-party
 generated, most unhelpful format into one I can use), and I'd really
 appreciate any insight into a better way to do this, or if there are any
 built-in functions/established libraries that would be better suited to
 the
 task. My code below works, but doesn't seem terribly efficient.


 Why do you think it inefficient? Is it slow?

 I don't have any substantial suggestions, but from a style perspective:

 * I would question the use of IEEE binary-floating-point number types
 for currency. Haskell ships with a fixed-point decimal library, but I
 don't know how fast it is:

 http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed.html


 There are also a few other options (that I can't seem to find links to at
 the moment), and some hints on how to do it yourself:



 http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-3.html


 I'm not sure if your choice of parsing library is fixed or not, but you
 could probably speed things up significantly by using Attoparsec. In
 particular, Attoparsec's combinators for takeWhile, takeWhile1,scan,...
 return bytestrings, and you can then fold over the bytestring quite nicely.
 If your source is ASCII or anything ASCII compatible (Latin-1, Latin-9,
 UTF-8,...) then take a look at Data.Attoparsec.Char8.decimal[1][2].

 Unless you need to verify that commas occur exactly every third digit, I'd
 suggest (a) dropping commas while scanning the string, or (b) implicitly
 dropping commas while folding over the string. If you do need to verify
 this, then your best option is (a), since you can maintain a state machine
 about how many digits seen since the last comma. Once you're using the
 Attoparsec strategy of folding over the raw byte buffers, then the only room
 for improvement is going to be making the code as straight-line as possible.

 Some untested example code:

import qualified Data.Attoparsec   as A
import qualified Data.Attoparsec.Char8 as A8
import qualified Data.ByteString   as B

rawCurrency :: A.Parser (ByteString,ByteString)
rawCurrency = do
dollars - A.scan 0 step
_   - A.char '.'   -- Assuming it's required...
cents   - A.takeWhile1 isDigit_w8  -- Assuming no commas...
return (dollars,cents)
where
step :: Int - Word8 - Maybe Int
step 3 0x2C = Just 0
step s c | isDigit_w8 c = Just $! s+1
step _ _= Nothing

-- Note: the order of comparisons is part of why it's fast.
-- | A fast digit predicate.
isDigit_w8 :: Word8 - Bool
isDigit_w8 w = (w = 0x39  w = 0x30)
{-# INLINE isDigit_w8 #-}

-- With the dots filled in by whatever representation you use.
currency :: A.Parser ...
currency = do
(dollars,cents) - rawCurrency
let step a w = a * 10 + fromIntegral (w - 0x30)
d = B.foldl' step 0 (B.filter (/= 0x2C) dollars)
c = fromIntegral (B.foldl' step 0 cents)
/ (10 ^ length cents)
return (... d ... c ...)

amount :: A.Parser ...
amount = pos | neg
where
pos = A8.char '$' * currency
neg = do
_ - A8.string ($
a - currency
_ - A8.char ')'
return (negate a)




 [1] And if you're using