I would like to revisit the idea of using J to parse xml.
The xml/sax addon was a nice idea, but not very stable. It represented
xml as a series of events (function calls), and left it up to the user
how they would structure the result. Unfortunately, it also rather
reliably crashes J.
This can be mitigated in various ways. If what you are parsing is
simple enough, and you can live with 32 bit j602, xml/sax can work
great. But those are not always ideal constraints to work with.
But... what's a good data structure in J, to represent xml?
One problem is that xml is something of a living example of "the nice
thing about standards is that there are so many to choose from". The
standards documents describing xml are voluminous, and there are many
alternatives to wade through which are physically different but
logically similar.
Still, at a basic level, xml is essentially a nested sequence of
elements. So one approach might leverage boxed character arrays. This
will not be particularly efficient, but it's a start.
For example, this xml snippet:
<ab cd="ef" gh="ijk">lmnop</ab>
Might be represented in J as:
'ab';<('cd';'ef'),('gh';'ijk'),:'';<<'lmnop'
(The extra boxing on the text is there because, in the general case,
the content might be a sequence of elements.)
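To make the shape of that first layout concrete, here is a small
sketch. The names e1, tbl, and content are made up for illustration,
and the lookup assumes there is exactly one row with an empty name.

```j
NB. A sketch only: e1, tbl, and content are hypothetical names.
e1 =: 'ab';<('cd';'ef'),('gh';'ijk'),:'';<<'lmnop'

# e1                NB. 2 boxes: the element name and the attribute table
tbl =: > 1 { e1     NB. one row per attribute: name in column 0, value in column 1
$ tbl               NB. 3 2 (two real attributes plus the blank-named content row)

NB. The content sits in the row whose name is empty.
content =: > (({."1 tbl) i. <'') { {:"1 tbl
> content           NB. open once more to reach the characters
```

Because the content shares a table with the attributes, finding it is
the same lookup as finding any other attribute.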
Another approach might be:
'ab';(('cd';'ef'),:('gh';'ijk'));<<'lmnop'
Here, the [textual, in this case] content of the element is stored in
a separate box from the attributes, instead of treating it as a
blank-named attribute.
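A rough sketch of how this three-box layout might be taken apart
follows. The names elem, name, attrs, kids, and attr are hypothetical,
and attr assumes the requested attribute is present.

```j
NB. A sketch only: elem, name, attrs, kids, and attr are hypothetical names.
elem  =: 'ab';(('cd';'ef'),:('gh';'ijk'));<<'lmnop'

name  =: > 0 { elem     NB. element name
attrs =: > 1 { elem     NB. 2-column table of attribute names and values
kids  =: , > 2 { elem   NB. content as a list of boxes (text or child elements)

NB. x attr y: value of the attribute named x in table y.
NB. Assumes the name is present; a missing name gives an index error.
attr =: 4 : '> (({."1 y) i. <x) { {:"1 y'

'gh' attr attrs
```

With the content in its own box, the attribute table holds only real
attributes, so no name has to be reserved to mean "content".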
But perhaps there are good non-boxed ways of representing the structure?
Has anyone else been working with xml in J?
Thanks,
--
Raul