This is the first of what should be several posts centered around parsing nif.xml
https://raw.github.com/amorilia/nifxml/master/nif.xml

I have a variety of reasons for choosing this particular xml file, and maybe at some point we can talk about some of them on the chat forum.

But, first, I'd like to walk through the use of J's xml/sax addon package. sax is an xml handling library (http://www.saxproject.org/) which exists outside of J, and knowledge of the api can be used in a variety of languages and contexts. If you want a portable understanding of sax you would probably implement projects with it in several different languages. I'm not going to go that far, here. I just want to provide some basic information about using it in J.

So here's a code sample:

require'xml/sax'

saxclass 'nifxml'

  startDocument=: 3 :0
    Interest=: 0      NB. 1 while inside the element we want
    Result=: ''
  )

    startElement=: 4 :0
      if. 'compound'-:y do.
        if. 'Header'-:x getAttribute 'name' do.
          Interest=: 1
        end.
      end.
      if. Interest do.
        NB. all attributes but the first, each formatted as name="value"
        atrs=. ([,'="',],'"'"_)&.>/"1 }.attributes x
        Result=: Result,'<',(;:inv (<y),atrs),'>'
      end.
    )

      characters=: 3 :0
        if. Interest do.
          Result=: Result,y
        end.
      )

    endElement=: 3 :0
      if. Interest do.
        Result=:Result,'</',y,'>'
        if. 'compound'-:y do.
          Interest=: 0
        end.
      end.
    )

  endDocument=: 3 :0
    Result
  )

  extract=: 3 :0
    process fread y
  )

-----------------------------------------------------------

Notice how I have indented the code.

At the top level is this command:

saxclass 'nifxml'

saxclass erases the definition of nifxml and then begins a new definition for that locale. If you are used to working in object oriented languages, you should think of a named locale as a class (and a numbered locale as an object). Otherwise you should probably think of a locale as being something like a directory - it contains named things (but, unlike directories, cannot contain other locales). The difference between a saxclass and a regular class is that a saxclass automatically uses (or "inherits") commands from the sax xml package.
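
For anyone who has not used locales before, here is a minimal sketch of the class/object idea using a regular class instead of a saxclass. The names greeter and greet are made up for illustration and have nothing to do with nif.xml; coclass, cocurrent, conew and codestroy come from J's standard library.

```j
NB. Hypothetical example: 'greeter' and 'greet' are illustrative names only.
coclass 'greeter'          NB. start defining names in the named locale greeter

greet=: 3 :0
  'hello, ',y
)

cocurrent 'base'           NB. go back to the base locale

greet_greeter_ 'world'     NB. call the verb through the class (named locale)

g=: conew 'greeter'        NB. a numbered locale, i.e. an object of that class
greet__g 'world'           NB. same verb, found through the object's path
codestroy__g ''            NB. erase the object when done
```

The name_locale_ form in the first call is the same form used to call extract_nifxml_ when running the sax code against a file.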
Indented within this class definition are several words: startDocument, endDocument, and extract. extract is just a cover function for the sax routine "process" which processes an xml document. Each time process starts, it will run startDocument once - so that is a good place to put initialization commands. Each time process finishes, it calls endDocument to determine its result.

Note that sax is defined to operate sequentially - it parses the xml file and returns. This is very different from the usual J style of coding, so experienced J programmers might be uncomfortable with it. On the other hand, this is very much like how many other (but not all) programming languages work, so this aspect should be comfortable to people with a background in any of a variety of languages. That said, in many cases the time to parse an xml file with sax is trivial. It's usually better to concern yourself with code simplicity than with time, at least for an initial draft of the code. In typical cases your computer will take more time reading the file than parsing it.

Indented one level deeper in the class definition are two verbs: startElement and endElement. These run once each for each xml element. (startElement runs for the name that follows a < character, and endElement runs for the corresponding name that follows a </ character sequence.) What I have done here is rig up some code to extract everything inside the <compound> element if it has a name="Header" attribute. (I am not going to go into great depth about xml syntax rules - but I wanted to cover some basics for the occasional person who has not worked with them before.)

Finally, indented deepest is a single verb definition: characters. This captures the characters between xml elements.

If I run this code against the nif.xml file, I get this result:

   extract_nifxml_ 'c:\users\rdmiller\desktop\furniture\nif.xml'
<compound>
        The NIF file header.
        <add type="HeaderString">'NetImmerse File Format x.x.x.x' (versions <= 10.0.1.2) or 'Gamebryo File Format x.x.x.x' (versions >= 10.1.0.0), with x.x.x.x the version written out. Ends with a newline character (0x0A).</add>
        <add type="LineString" arr1="3" ver2="3.1"></add>
        <add type="FileVersion" default="0x04000002" ver1="3.3.0.13">The NIF version, in hexadecimal notation: 0x04000002, 0x0401000C, 0x04020002, 0x04020100, 0x04020200, 0x0A000100, 0x0A010000, 0x0A020000, 0x14000004, ...</add>
        <add type="EndianType" default="ENDIAN_LITTLE" ver1="20.0.0.4">Determines the endianness of the data in the file.</add>
        <add type="ulittle32" ver1="10.1.0.0">An extra version number, for companies that decide to modify the file format.</add>
        <add type="ulittle32" ver1="3.3.0.13">Number of file objects.</add>
        <add type="ulittle32" default="0" cond="(User Version >= 10) || ((User Version == 1) && (Version != 10.2.0.0))" ver1="10.1.0.0">This also appears to be the extra user version number and must be set in some circumstances. Probably used by Bethesda to denote the Havok version.</add>
        <add type="uint" default="0" ver1="30.0.0.2">Unknown.
        Possibly User Version 2?</add>
        <add type="ExportInfo" ver1="10.0.1.2" ver2="10.0.1.2"></add>
        <add type="ExportInfo" ver1="10.1.0.0" cond="(User Version >= 10) || ((User Version == 1) && (Version != 10.2.0.0))"></add>
        <add type="ushort" ver1="10.0.1.0">Number of object types in this NIF file.</add>
        <add type="SizedString" arr1="Num Block Types" ver1="10.0.1.0">List of all object types used in this NIF file.</add>
        <add type="BlockTypeIndex" arr1="Num Blocks" ver1="10.0.1.0">Maps file objects on their corresponding type: first file object is of type object_types[object_type_index[0]], the second of object_types[object_type_index[1]], etc.</add>
        <add type="uint" arr1="Num Blocks" ver1="20.2.0.7">Array of block sizes?</add>
        <add type="uint" ver1="20.1.0.3">Number of strings.</add>
        <add type="uint" ver1="20.1.0.3">Maximum string length.</add>
        <add type="SizedString" arr1="Num Strings" ver1="20.1.0.3">Strings.</add>
        <add type="uint" default="0" ver1="10.0.1.0">Unknown.</add>
    </compound>

Notice how the opening element is not indented - this is because I was not capturing characters before the opening element. This should not bother anyone.

For more information on the structure of this particular xml file, see:
http://niftools.sourceforge.net/wiki/Nif_Format/NifTools_XML_Format

But this is enough, for today.

Thanks,

--
Raul
