This is *way* off topic, but most of you are language weenies, and you're a
conveniently captive audience. :-)

Various projects have had me working on "little languages" lately. For
example, we developed a specialty language for describing registers in
chips and how various functional units are arranged in a System on Chip CPU
for Coyotos, and I'm now adapting it for something else.

At the same time, I've been working on a parser generator. Not that the
world needs *another* parser generator, but I want the BitC standard
library to be written in BitC, and I want it to include a package that
reads BitC input and produces a standardized AST. And of course, if you're
going to produce a parser, you need to parse the specification file...

In the course of this it occurred to me belatedly that there is an
under-reported reason to use XML: at the price of making it much harder to
write the input by hand, and reducing your ability to validate (or in some
cases improving it), you get a nearly free ride on building a parser. At
least where specialized "declaration" languages are concerned, most of the
"little language" tools I can think of really can be done in XML, and an
awful lot of the tools that read them can be implemented as XSLT scripts.

You also get to leverage all of the *other* tools that have been done in
XML, most notably document production tools.

The catch is readability. Speaking for myself, I find:

package IA32;

struct PTE 32 {
  bit   V       0; /* present (a.k.a. valid) */
  bit   W       1; /* writable */
  bit   USER    2; /* user-accessible */
  bit   PWT     3; /* page write through */
  bit   PCD     4; /* page cache disable */
  bit   ACC     5; /* accessed */
  bit   DRTY    6; /* dirty */
  bit   PGSZ    7; /* large page (PDE only, only if CR4.PSE) */
  bit   PAT     7; /* page attribute table */
  bit   GLBL    8; /* global page (PDE, PTE, only if CR4.PGE) */
  field SW      9 11; /* software defined */
  bit   PAT4M   12; /* page attribute table, 4M pages */

  field FRAME   12 31; /* page frame number */
  field FRAME4M 22 31; /* page frame number, 4M page */
}


to be infinitely more readable than something like:

<package name="IA32"/>

<struct name="PTE" width="32">
  <bit name="V" bit="0"/>
  ...

  <field name="SW" from="9" to="11"/>
  ...

</struct>


But this really might be one of those kinds of cases where the benefit of
broad machine readability and interoperability is more important than
developer convenience.
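
To make the "nearly free ride" concrete, here is a minimal sketch in Python
using the standard xml.etree.ElementTree module. The <spec> wrapper element
and the exact attribute names are assumptions made up for illustration, not
a settled schema; the point is only that no hand-written parser is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed variant of the XML sketch above. The <spec>
# wrapper and the attribute names are assumptions for this example.
SPEC = """
<spec>
  <package name="IA32"/>
  <struct name="PTE" width="32">
    <bit name="V" bit="0"/>
    <bit name="W" bit="1"/>
    <field name="SW" from="9" to="11"/>
  </struct>
</spec>
"""

def load_struct(xml_text):
    """Return (struct name, width, {member name: (low bit, high bit)})."""
    root = ET.fromstring(xml_text)
    struct = root.find("struct")
    members = {}
    for child in struct:
        if child.tag == "bit":
            # A single bit is a field whose low and high positions coincide.
            pos = int(child.get("bit"))
            members[child.get("name")] = (pos, pos)
        elif child.tag == "field":
            members[child.get("name")] = (int(child.get("from")),
                                          int(child.get("to")))
    return struct.get("name"), int(struct.get("width")), members

name, width, members = load_struct(SPEC)
```

Everything that would normally be lexer and grammar work is done by the
library call; what remains is walking the tree.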

A relaxed form of JSON (using unquoted identifiers) or YAML would *almost*
work, but the missing piece is that their opening brackets carry no type
label. I really have the sense that labeled braces here are a good thing.
But I also want to be able to embed XML documentation strings, and every
other labeled bracketing system I've seen turns out to be equally
cumbersome.

It really seems like this ought to be a simple problem with a general
solution, but what I *think* is going on here is that I'm exploiting type
inference. For example, I know that a "bit" must have two children: an
identifier providing a name and a nat providing a position.
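
A rough sketch of that idea in Python, assuming a hypothetical schema table
that maps each keyword to the ordered types of its children. The table, the
type names "ident" and "nat", and the check() helper are all invented for
this example; they only illustrate how unlabeled positional children can be
labeled and validated from schema knowledge.

```python
# Hypothetical schema: keyword -> ordered (label, type) pairs for children.
SCHEMA = {
    "struct": [("name", "ident"), ("width", "nat")],
    "bit":    [("name", "ident"), ("pos", "nat")],
    "field":  [("name", "ident"), ("from", "nat"), ("to", "nat")],
}

def check(kind, tokens):
    """Label positional tokens using the schema; raise ValueError on mismatch."""
    if kind not in SCHEMA:
        raise ValueError(f"unknown declaration: {kind}")
    expected = SCHEMA[kind]
    if len(tokens) != len(expected):
        raise ValueError(
            f"{kind} expects {len(expected)} children, got {len(tokens)}")
    result = {}
    for (label, typ), tok in zip(expected, tokens):
        if typ == "nat":
            # A nat is a run of ASCII decimal digits.
            if not (tok.isascii() and tok.isdigit()):
                raise ValueError(f"{kind}.{label}: expected nat, got {tok!r}")
            result[label] = int(tok)
        else:
            # Anything else in this sketch is an identifier.
            if not tok.isidentifier():
                raise ValueError(f"{kind}.{label}: expected ident, got {tok!r}")
            result[label] = tok
    return result

v_bit = check("bit", ["V", "0"])
sw_field = check("field", ["SW", "9", "11"])
```

The labels never appear in the input; they are recovered from position plus
the schema, which is the "inference" being exploited.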


This feels like it ought to have a good general solution, since it's really
just another way of describing hierarchies. Just as an alternative notional
syntax:

PTE : struct 32
    V : bit 0
    W : bit 1


and if you throw in some "smart white space processing" here (in the style
of Haskell), and you have prior schema knowledge about what things come in
what order with what types, you can kind of see that there is a direct
conversion to XML here. Or to JSON, or to any number of other formats.
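
As a sketch of that conversion, here is a small Python function that turns
the indented notation into an XML tree. The ARGS table (which attribute name
each positional argument gets), the <spec> root element, and the rule that
indentation is counted in spaces are all assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: attribute names for the arguments after the keyword.
ARGS = {"struct": ["width"], "bit": ["bit"], "field": ["from", "to"]}

def to_xml(text):
    """Convert 'NAME : kind args...' lines, nested by indentation, to XML."""
    root = ET.Element("spec")
    stack = [(-1, root)]  # (indent level, element); root sits below column 0
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, _, rest = line.partition(":")
        kind, *args = rest.split()
        if kind not in ARGS or len(args) != len(ARGS[kind]):
            raise ValueError(f"bad declaration: {line.strip()!r}")
        # Close every open element that is not a parent of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        attrs = {"name": name.strip(), **dict(zip(ARGS[kind], args))}
        elem = ET.SubElement(stack[-1][1], kind, attrs)
        stack.append((indent, elem))
    return root

SOURCE = """\
PTE : struct 32
    V : bit 0
    W : bit 1
"""

tree = to_xml(SOURCE)
xml_text = ET.tostring(tree, encoding="unicode")
```

Swapping the ElementTree calls for dictionary construction would give the
JSON flavor instead; the schema table is the part doing the real work.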

Anybody know of an existing solution? I think it's the type checking
(a.k.a. validation) that has me playing with this.



Jonathan
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev