Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-18 Thread Stephen Tetley
hi wren

Where I've used it, random access does seem conceptual more
satisfactory than trying to avoid it.

Well designed binary formats are deterministic - so wherever you are
inside the file you should know what you are parsing. One example of
this determinism is that parsing local alternatives are generally
encoded on a single tag, whereas in a parser for a programming
language parsing alternatives might require lookahead and possibly
other disambiguation. Another example is that formats will often have
a index table table at the start of the file giving start position
and length for all the sub tables - this saves you from having to
start each table with a tag and length.

For complex formats e.g. PECOFF or TrueType, you might only want to
parse certain tables [*]. After parsing the index table you could
build a list of parsers to run sequentially on the body of the file
(including parsers that just drop unwanted tables), but this seems too
much work (and too much to go wrong), when I can use a function almost
as simple as parseAt for the tables I'm interested in.

parseAt :: Start - End - Parser a - Parser a

In practice, I made parseAt slightly more complicated so it could
encode where the cursor is moved to at the end of the parse.

Best wishes

Stephen

[*] Certain tables in TrueType / OpenType are propriety - it might be
unwise for an open source parser to even include parsing operations
for those tables.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-18 Thread wren ng thornton

Stephen Tetley wrote:

hi wren

Where I've used it, random access does seem conceptual more
satisfactory than trying to avoid it.


I was thinking more about performance issues (avoiding disk seeks) which 
would also alleviate the problem of needing random access when it's not 
available.




For complex formats e.g. PECOFF or TrueType, you might only want to
parse certain tables [*]. After parsing the index table you could
build a list of parsers to run sequentially on the body of the file
(including parsers that just drop unwanted tables), but this seems too
much work (and too much to go wrong)


Even still, you could linearize the access in order to minimize seeking 
back and forth. The linearization could even be done automatically by 
having the table of what needs backpatching also serve as a work queue 
(when done with a block, seek to the closest next block; when the queue 
is empty, you're done). The backpatching queue also preserves structure 
sharing.


Perhaps I've just been parsing too many multigigabyte files lately... :)

--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-18 Thread Jason Dagit
On Sun, Mar 14, 2010 at 9:03 AM, david fries d...@gmx.ch wrote:

 Hello Café

 Some time ago I wrote a parser for a project of one our customers. The
 format was proprietary and binary. The data was structured as a tree
 with tables pointing to sub tables farther in the file. (Well actually
 there was one or two cases where branches joined together, so I guess it
 was a directed graph.) Also it had no defined endianess, some tables
 were bigendian others were little endian and some were in their very own
 in-house-endianess.
 All in all, the typical binary data structure that has been in use and
 continuously extended for the last 15 years. You know the kind.

 Oddly enough, our customer never bothered to write a parser of their
 own. I wonder why.

 My parser was written in C# and wasn't particularly elegant, but it
 worked reliably. I was wondering how you would parse tree-like
 structures with Parsec (or other functional parsers)? Up to know, all
 examples I've seen were of sequential data. To parse trees, you'd
 essentially need to be able to follow pointers, parse whatever is there
 and then jump back. I guess I'd have to mess around with the internal
 state of the parser to do that, which is something I'd rather avoid.


Would binary or cereal be right for this task?
http://hackage.haskell.org/packages/archive/binary/0.4.1/doc/html/Data-Binary.html
http://hackage.haskell.org/packages/archive/cereal/0.2/doc/html/Data-Serialize.html

My understanding is they provide low-level ways to get at bytes in a chunk
of memory.  Using them, I believe you'd read the data into a bytestring and
then load in the bytes and seek/jump as needed.  I don't know if they
support the backwards jumping though.

And, I think ideally you use it by defining a type class instance for each
object in the binary format and then use the provided type classes to just
get/put your format.

If the representation is very much like a C struct, you might consider
Storable.  Although, I've never heard of anyone using it as a parser.

Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-17 Thread david fries
Thanks for the links!

I didn't realize that Iteratee can also do random access. I looked at
Oleg's slides a while back, but I haven't wrapped my head around it.

On Mon, 2010-03-15 at 08:57 -0500, John Lato wrote:
 Hi Dave,
 
  From: david fries d...@gmx.ch
 
  Hello Café
 
  Some time ago I wrote a parser for a project of one our customers. The
  format was proprietary and binary. The data was structured as a tree
  with tables pointing to sub tables farther in the file. (Well actually
  there was one or two cases where branches joined together, so I guess it
  was a directed graph.) Also it had no defined endianess, some tables
  were bigendian others were little endian and some were in their very own
  in-house-endianess.
  All in all, the typical binary data structure that has been in use and
  continuously extended for the last 15 years. You know the kind.
 
  My parser was written in C# and wasn't particularly elegant, but it
  worked reliably. I was wondering how you would parse tree-like
  structures with Parsec (or other functional parsers)? Up to know, all
  examples I've seen were of sequential data. To parse trees, you'd
  essentially need to be able to follow pointers, parse whatever is there
  and then jump back. I guess I'd have to mess around with the internal
  state of the parser to do that, which is something I'd rather avoid.
 
 Have you seen Edward Kmett's Parsing Trifecta talk/slides, available
 at http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ ?  I
 think a similar approach would allow you to parse this data structure.
  The trick is to create a data source that can read from arbitrary
 locations.
 
 Also see http://okmij.org/ftp/Haskell/Iteratee/Tiff.hs for an
 iteratee-based TIFF reader.  TIFF files have a similar structure, so
 this code could apply pretty directly.  The parsing is quite
 low-level, but you could use iteratee-parsec [1] or
 attoparsec-iteratee [2] to use parser combinators with this technique.
 
 [1] http://hackage.haskell.org/package/iteratee-parsec
 [2] http://hackage.haskell.org/package/attoparsec-iteratee


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-17 Thread david fries
Hi Ryan,

1) Retaining shared references was not important to us at the time. We
were only interested in the correctness of the values that were written.

2) Performance wasn't a big issue either. The parser was part of a
debugging tool we wrote for internal use. 

My my concern was how you would perform random access in a functional
parser. You're points are interesting too. I guess if we really had
wanted to work with parsed objects, retaining the shared references
would have been a must.

On Mon, 2010-03-15 at 14:57 -0700, Ryan Ingram wrote:
 Here are some questions you have not answered that are quite important
 for your design:
 
 1) How important is retaining shared references?  In particular, is
 the structure mutable after being read-in to memory?  If it is, and
 you mutate an object, do you expect other references to that object to
 be mutated as well?
 
 2) If the answer to (1) is not important, then how important is
 parsing performance?  If you parse the same object more than once
 (turning the directed graph into a tree), is that a problem?
 
 This is an interesting problem and of course there are a lot of ways
 to tackle it, but I think you need to elaborate a bit more on what is
 needed.  If you want to maintain the object graph structure, you'll
 treat the problem quite a bit differently than if you are happy to
 convert it to a pure data structure.
 
   -- ryan
 
 On Sun, Mar 14, 2010 at 9:03 AM, david fries d...@gmx.ch wrote:
  Hello Café
 
  Some time ago I wrote a parser for a project of one our customers. The
  format was proprietary and binary. The data was structured as a tree
  with tables pointing to sub tables farther in the file. (Well actually
  there was one or two cases where branches joined together, so I guess it
  was a directed graph.) Also it had no defined endianess, some tables
  were bigendian others were little endian and some were in their very own
  in-house-endianess.
  All in all, the typical binary data structure that has been in use and
  continuously extended for the last 15 years. You know the kind.
 
  Oddly enough, our customer never bothered to write a parser of their
  own. I wonder why.
 
  My parser was written in C# and wasn't particularly elegant, but it
  worked reliably. I was wondering how you would parse tree-like
  structures with Parsec (or other functional parsers)? Up to know, all
  examples I've seen were of sequential data. To parse trees, you'd
  essentially need to be able to follow pointers, parse whatever is there
  and then jump back. I guess I'd have to mess around with the internal
  state of the parser to do that, which is something I'd rather avoid.
 
 
  regards,
  dave
 
  ___
  Haskell-Cafe mailing list
  Haskell-Cafe@haskell.org
  http://www.haskell.org/mailman/listinfo/haskell-cafe
 


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-17 Thread wren ng thornton

david fries wrote:

My my concern was how you would perform random access in a functional
parser. You're points are interesting too. I guess if we really had
wanted to work with parsed objects, retaining the shared references
would have been a must.


Out of curiosity, was there *really* a need for random access? It sounds 
like you could've written things to do it in one forward pass, keeping a 
table on the side of the targets for dangling pointers, and then 
back-patching once you've reached the object being pointed at. You may 
even be able to use laziness to do the back-patching instead of needing 
STRefs. The only problem I could see with that approach is if there was 
funny bit-packing going on so that objects were overlapped in the same 
memory region.


--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-15 Thread John Lato
Hi Dave,

 From: david fries d...@gmx.ch

 Hello Café

 Some time ago I wrote a parser for a project of one our customers. The
 format was proprietary and binary. The data was structured as a tree
 with tables pointing to sub tables farther in the file. (Well actually
 there was one or two cases where branches joined together, so I guess it
 was a directed graph.) Also it had no defined endianess, some tables
 were bigendian others were little endian and some were in their very own
 in-house-endianess.
 All in all, the typical binary data structure that has been in use and
 continuously extended for the last 15 years. You know the kind.

 My parser was written in C# and wasn't particularly elegant, but it
 worked reliably. I was wondering how you would parse tree-like
 structures with Parsec (or other functional parsers)? Up to know, all
 examples I've seen were of sequential data. To parse trees, you'd
 essentially need to be able to follow pointers, parse whatever is there
 and then jump back. I guess I'd have to mess around with the internal
 state of the parser to do that, which is something I'd rather avoid.

Have you seen Edward Kmett's Parsing Trifecta talk/slides, available
at http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ ?  I
think a similar approach would allow you to parse this data structure.
 The trick is to create a data source that can read from arbitrary
locations.

Also see http://okmij.org/ftp/Haskell/Iteratee/Tiff.hs for an
iteratee-based TIFF reader.  TIFF files have a similar structure, so
this code could apply pretty directly.  The parsing is quite
low-level, but you could use iteratee-parsec [1] or
attoparsec-iteratee [2] to use parser combinators with this technique.

[1] http://hackage.haskell.org/package/iteratee-parsec
[2] http://hackage.haskell.org/package/attoparsec-iteratee
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-15 Thread Ryan Ingram
Here are some questions you have not answered that are quite important
for your design:

1) How important is retaining shared references?  In particular, is
the structure mutable after being read-in to memory?  If it is, and
you mutate an object, do you expect other references to that object to
be mutated as well?

2) If the answer to (1) is not important, then how important is
parsing performance?  If you parse the same object more than once
(turning the directed graph into a tree), is that a problem?

This is an interesting problem and of course there are a lot of ways
to tackle it, but I think you need to elaborate a bit more on what is
needed.  If you want to maintain the object graph structure, you'll
treat the problem quite a bit differently than if you are happy to
convert it to a pure data structure.

  -- ryan

On Sun, Mar 14, 2010 at 9:03 AM, david fries d...@gmx.ch wrote:
 Hello Café

 Some time ago I wrote a parser for a project of one our customers. The
 format was proprietary and binary. The data was structured as a tree
 with tables pointing to sub tables farther in the file. (Well actually
 there was one or two cases where branches joined together, so I guess it
 was a directed graph.) Also it had no defined endianess, some tables
 were bigendian others were little endian and some were in their very own
 in-house-endianess.
 All in all, the typical binary data structure that has been in use and
 continuously extended for the last 15 years. You know the kind.

 Oddly enough, our customer never bothered to write a parser of their
 own. I wonder why.

 My parser was written in C# and wasn't particularly elegant, but it
 worked reliably. I was wondering how you would parse tree-like
 structures with Parsec (or other functional parsers)? Up to know, all
 examples I've seen were of sequential data. To parse trees, you'd
 essentially need to be able to follow pointers, parse whatever is there
 and then jump back. I guess I'd have to mess around with the internal
 state of the parser to do that, which is something I'd rather avoid.


 regards,
 dave

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-14 Thread Stephen Tetley
On 14 March 2010 16:03, david fries d...@gmx.ch wrote:
[SNIP]

 Oddly enough, our customer never bothered to write a parser of their
 own. I wonder why.

Hi David

If the binary structure was previously used only with C programs its
quite common just to use casting to unpack the data into a struct -
though your example seems to suggest this wasn't being done as the
format had both big and little endian tables.

In Haskell or other modern functional languages like SML, parse trees
are generally represented as algebraic types - so there are no
pointers. If you're familiar with ANTLR from the OO world, its rather
like working with the tree definition automatically as opposed to
generating classes from the data description language.


Best wishes

Stephen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-14 Thread david fries
Hi Stephen

Perhaps my description of the format was a bit unclear. When I said
pointer I simply meant a number which is the position in the stream. 

Imagine the tables looking something like this:

RootTable
   HeaderMagicNumber (1Byte): 0x50
   VersionNumber (2 Bytes): 1234
   SubTablePointer (4 Bytes): 200 .
...Some more fields of the root table ¦
  ¦
  ¦
SubTable (positioned at byte 200 of the file)'
   SomeFlags (1 Byte): 0x00
   Comment (Variable String): hello, world!\0

AnotherTable
...

Our parser produced object instances representing each table. And I used
normal references between tables instead of pointers. 

You had to start parsing at the beginning of the file (where the root
table is), otherwise you have no clue where in the structure you are.
Because all tables after the root table are dynamically positioned when
the whole thing is serialized. 
So, to parse the root, you read the first byte, check that it's 0x50,
read the next two bytes (which are the version number), then you read
four bytes (pointer to the sub table). The sub table starts at byte 200
of the file. So now I would jump to that position in the file and start
parsing SubTable. After that I'd jump back and parse the remaining
fields of the root table.

On Sun, 2010-03-14 at 16:23 +, Stephen Tetley wrote:
 On 14 March 2010 16:03, david fries d...@gmx.ch wrote:
 [SNIP]
 
  Oddly enough, our customer never bothered to write a parser of their
  own. I wonder why.
 
 Hi David
 
 If the binary structure was previously used only with C programs its
 quite common just to use casting to unpack the data into a struct -
 though your example seems to suggest this wasn't being done as the
 format had both big and little endian tables.
 
 In Haskell or other modern functional languages like SML, parse trees
 are generally represented as algebraic types - so there are no
 pointers. If you're familiar with ANTLR from the OO world, its rather
 like working with the tree definition automatically as opposed to
 generating classes from the data description language.
 
 
 Best wishes
 
 Stephen
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-14 Thread Stephen Tetley
Hi David

Ah ha - this form of binary file layout is quite common (e.g. PECOFF
object files and OpenType / TrueType fonts).

Parsec and other parsing libraries are perhaps not ideal for the task,
as they consume input as they parse. I have my own alternative to
Parsec - Kangaroo [1] - for parsing binary files. It moves a cursor
around inside the file (strictly speaking an array in memory from
reading the file), so you can parse within a sub-region of the file
and jump back out again.

Although its on Hackage, I wouldn't really recommend its use - its now
fairly well documented but the API is not stable and I only work on it
sporadically. Because I didn't want any dependencies, the package is
quite a bit larger than it need be - if someone were interested in
technique they might be better off using it as a start point. The most
important bits are the 'intraparse' function and the monadic machinery
inside the Kangaroo.ParseMonad module.

Even when a binary format has a published standard, unfortunately the
standard might not be detailed enough to actually produce a parser.
This is the case for True Type and PECOFF which I wrote Kangaroo for,
and as I don't have much enthusiasm for deriving a parser from another
open-source implementation, its rather stalling any continued
development of Kangaroo.


[1] http://hackage.haskell.org/package/kangaroo

Best wishes

Stephen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsec to parse tree structures?

2010-03-14 Thread david fries
Yep, That's what I had in mind. Thanks for the link.

On Sun, 2010-03-14 at 19:09 +, Stephen Tetley wrote:
 Hi David
 
 Ah ha - this form of binary file layout is quite common (e.g. PECOFF
 object files and OpenType / TrueType fonts).
 
 Parsec and other parsing libraries are perhaps not ideal for the task,
 as they consume input as they parse. I have my own alternative to
 Parsec - Kangaroo [1] - for parsing binary files. It moves a cursor
 around inside the file (strictly speaking an array in memory from
 reading the file), so you can parse within a sub-region of the file
 and jump back out again.
 
 Although its on Hackage, I wouldn't really recommend its use - its now
 fairly well documented but the API is not stable and I only work on it
 sporadically. Because I didn't want any dependencies, the package is
 quite a bit larger than it need be - if someone were interested in
 technique they might be better off using it as a start point. The most
 important bits are the 'intraparse' function and the monadic machinery
 inside the Kangaroo.ParseMonad module.
 
 Even when a binary format has a published standard, unfortunately the
 standard might not be detailed enough to actually produce a parser.
 This is the case for True Type and PECOFF which I wrote Kangaroo for,
 and as I don't have much enthusiasm for deriving a parser from another
 open-source implementation, its rather stalling any continued
 development of Kangaroo.
 
 
 [1] http://hackage.haskell.org/package/kangaroo
 
 Best wishes
 
 Stephen
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe