Re: [Haskell-cafe] types for parsing a tree

2010-09-17 Thread S. Doaitse Swierstra

On 16 sep 2010, at 05:42, Jared Jennings wrote:

> On Fri, Sep 10, 2010 at 2:00 PM, S. Doaitse Swierstra
>  wrote:
>> I show how this can be done using uu-parsinglib. Note that we have sevral 
>> parsers, each having its own type:
> 
> Thanks for such a complete example, Doaitse! Unfortunately I have a
> requirement I didn't disclose: the simple tags like , ,
>  could come in any order; and some are optional. I tried to
> fix that by making every field in my Transaction record a Maybe, and
> keeping a Transaction as state for my parser. But after so many Maybes
> I began to think this was not the right way. And I had to run a parser
> as part of another parser. And after all that, it wouldn't build
> because it was badly typed.

The good news is that the library has combinators for that too ;-} Just change 
a few lines. If they are optional use the pOpt combinator instead of the pOne. 

  Doaitse

module Transactions where
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Examples
import Data.Char

pTagged tag (pAttr, pPayload) =  pToken ("<" ++ tag ++ ">") *> pAttr *> spaces 
*> pPayload <* spaces <*
pToken ("")
pTagtag pPayload  =  pToken ("<" ++ tag ++ ">") *> pPayload

data OFX = OFX Response deriving Show
data Response = Response [Transaction] deriving Show
data Transaction = Transaction String String Amount deriving Show
data Amount  = Amount Int Int deriving Show

pAmount  = "TRNAMT"   `pTag` (Amount <$> pNatural <* pSym '.' <*> 
pNatural)
pTransaction = "STMTTRN"  `pTagged` (pAttr, Transaction `pMerge`( pOne 
("TRNUID" `pTag` pLine)
<||> pOne 
("NAME"   `pTag` pLine)
<||> pOne  
pAmount
   )
   )
pResponse= "STMTRS"   `pTagged` (pAttr, Response <$> pList 
(pTransaction <* spaces))
pOFX = "OFX"  `pTagged` (pAttr, OFX  <$> pResponse )

pAttr :: Parser String
pAttr = pToken "[...]"

spaces = pMunch (`elem` " \n\t")
pDigitAsInt = digit2Int <$> pDigit 
pNatural = foldl (\a b -> a * 10 + b ) 0 <$> pList1 pDigitAsInt
digit2Int a =  ord a - ord '0'
pDigit :: Parser Char
pDigit = pSym ('0', '9')
pLine  = pMunch (/='\n') <* spaces

main = do input <- readFile "TrInput"
 run (pOFX <* spaces) input




> 
> But in any case, thanks for turning me on to
> Text.ParserCombinators.UU; I'd only tried Parsec before.
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] types for parsing a tree

2010-09-15 Thread Malcolm Wallace

  I have a

requirement I didn't disclose: the simple tags like , ,
 could come in any order; and some are optional.


Search for "permutation parsing"; Doaitse has thought of that too!

Regards,
Malcolm
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] types for parsing a tree

2010-09-15 Thread Jared Jennings
On Fri, Sep 10, 2010 at 2:00 PM, S. Doaitse Swierstra
 wrote:
> I show how this can be done using uu-parsinglib. Note that we have sevral 
> parsers, each having its own type:

Thanks for such a complete example, Doaitse! Unfortunately I have a
requirement I didn't disclose: the simple tags like , ,
 could come in any order; and some are optional. I tried to
fix that by making every field in my Transaction record a Maybe, and
keeping a Transaction as state for my parser. But after so many Maybes
I began to think this was not the right way. And I had to run a parser
as part of another parser. And after all that, it wouldn't build
because it was badly typed.

But in any case, thanks for turning me on to
Text.ParserCombinators.UU; I'd only tried Parsec before.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] types for parsing a tree

2010-09-10 Thread S. Doaitse Swierstra
I show how this can be done using uu-parsinglib. Note that we have sevral 
parsers, each having its own type:

module Transactions where
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Examples
import Data.Char

pTagged tag (pAttr, pPayload) =  pToken ("<" ++ tag ++ ">") *> pAttr *> spaces 
*> pPayload <* spaces <*
 pToken ("")
pTagtag pPayload  =  pToken ("<" ++ tag ++ ">") *> pPayload

data OFX = OFX Response deriving Show
data Response= Response [Transaction] deriving Show
data Transaction = Transaction String String Amount deriving Show
data Amount  = Amount Int Int deriving Show

pAmount  = "TRNAMT"   `pTag` (Amount <$> pNatural <* pSym '.' <*> 
pNatural)
pTransaction = "STMTTRN"  `pTagged` (pAttr, Transaction <$>  "TRNUID" 
`pTag` pLine
<*>  "NAME"   
`pTag` pLine
<*> pAmount
)
pResponse= "STMTRS"   `pTagged` (pAttr, Response <$> pList 
(pTransaction <* spaces))
pOFX = "OFX"  `pTagged` (pAttr, OFX  <$> pResponse )

pAttr :: Parser String
pAttr = pToken "[...]"

spaces = pMunch (`elem` " \n\t")
pDigitAsInt = digit2Int <$> pDigit 
pNatural = foldl (\a b -> a * 10 + b ) 0 <$> pList1 pDigitAsInt
digit2Int a =  ord a - ord '0'
pDigit :: Parser Char
pDigit = pSym ('0', '9')
pLine  = pMunch (/='\n') <* spaces

main = do input <- readFile "TrInput"
  run (pOFX <* spaces) input

Running the main function on your code gives:

*Transactions> :r
[1 of 1] Compiling Transactions ( Transactions.hs, interpreted )
Ok, modules loaded: Transactions.
*Transactions> main
--
-- > Result: OFX (Response [Transaction "9223ry29r389" "THE GROCERY STORE 
BLABLABLA" (Amount 234 99),Transaction "1237tg832t" "SOME DUDE ON PAYPAL 
4781487" (Amount 2174 27)])
-- 
*Transactions> 

It is interesting to what happens if your input is incorrect,

 Doaitse





On 10 sep 2010, at 18:53, Jared Jennings wrote:

> [...]
>[...]
>[...]
>9223ry29r389
>THE GROCERY STORE BLABLABLA
>234.99
>
>[...]
>1237tg832t
>SOME DUDE ON PAYPAL 4781487
>2174.27
>
>
>

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] types for parsing a tree

2010-09-10 Thread Stephen Tetley
On 10 September 2010 17:53, Jared Jennings  wrote:

> I've tried
>
>    data OFXThing = OFX { statement :: OFXThing } | StatementResponse
> { ... transactions :: [OFXThing] }
>
> but that would let me make trees of things that make no sense in OFX,
> like a transaction containing a statement.


Using alternative constructors (i.e. sum types) is the right approach
to get different "datatypes" in the tree. However you aren't
respecting the nesting of the tree here - "OFX" is a level higher in
the tree than a statement response and the OFX constructor recurs on
itself which looks suspect.

I suspect OFX is pathologically huge format and isn't a good starting
point for designing syntax trees (the downloadable Spec seemed to be
several megabytes zipped). If the DTD is very large you might want to
use the untyped tree to extract parts of interest and convert after
parsing to a smaller typed tree (with only the syntax you are
interested in).

By the way, HaXML has has a tool called DTD2HS (I think) that will
generate Haskell datatypes from a DTD definition.

Best wishes

Stephen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] types for parsing a tree

2010-09-10 Thread Jared Jennings
Dear haskell-cafe:

I'm trying to parse an Open Financial eXchange (OFX) 1.x file. It
details my bank transactions, like debit card purchases. It's
SGML-based and it goes like:

[...]
[...]
[...]
9223ry29r389
THE GROCERY STORE BLABLABLA
234.99

[...]
1237tg832t
SOME DUDE ON PAYPAL 4781487
2174.27




I've left out a bunch, but as you can see it's tree-shaped, and the
only reason they didn't misuse XML as a data serialization language
instead of SGML was because it wasn't popular yet. (OFX 2.x uses XML
but my bank doesn't use OFX 2.x.)

When I imagine how to put this into a data structure, I think:

-- The '...' below is stuff like the date, info about the bank
data OFX = OFX { statement :: StatementResponse, ... }
-- The '...' below is stuff like the account number
data StatementResponse = StatementResponse { transactions:
[Transaction], ... }
data Transaction = Transaction { id :: String, name :: String,
amount :: Decimal, sic :: Maybe Int, ... }

Then I tried to make a parser to emit those data types and failed. I
come from Python, where there's no problem if a function returns
different types of values depending on its inputs, but that doesn't
fly in Haskell.

I've tried

data OFXThing = OFX { statement :: OFXThing } | StatementResponse
{ ... transactions :: [OFXThing] }

but that would let me make trees of things that make no sense in OFX,
like a transaction containing a statement.

I made a

 data Tree k v = Branch k [Tree k v] | Leaf k v
 type TextTree = Tree String String

and a tagsoup-parsec parser that returns Branches for tags like OFX,
and Leafs for tags like TRNUID. But now I just have a tree of strings.
That holds no useful type information.

I want my types to say that OFXes contain statements and statements
contain transactions - just like the OFX DTD says. How can I construct
the types so that they are tight enough to be meaningful and loose
enough that it's possible to write functions that emit them?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe