Here's another question I've encountered while examining how my needs might be dressed in OM symbols. I think I know the answer already, but it's still a good starting point for a discussion:
Why can binders only take one argument (last child of OMBIND, which is specified to have exactly three children), when an application is allowed any number of arguments? An elementary case where one might want to do this is that of the ordinary definite integral \int_a^b f(x) \,dx which one might expect to encode as something like <OMBIND> <OMS name="naive_integral"/> <!-- role: binder --> <OMBVAR> <OMV name="x"/> </OMBVAR> <OMV name="a"/> <OMV name="b"/> <OMA> <OMV name="f"/> <OMV name="x"/> </OMA> </OMBIND> were it not for said fact that a binder can (in OM2) only have one argument, not the three of a, b, and f(x) as needed here. There seems to be no technical reason for this restriction, as neither the XML nor the binary OM encoding need invoke arity to parse a binding; although the list of bound variables may be of any length, these have a separate container element and are thus distinguished from variables occurring as arguments of the binder. However, I suspect that the answer to my question is that the (informal?) *prefix* notation for OM objects does not have such a container, and therefore relies on arity to tell what is what. If memory serves, relaxations of OMBIND is one (the only?) change to OM3 that is considered as needed to bring it in line with Content-MathML3, but I don't know where I might have read that. Perhaps someone else could elaborate on the current status of that issue? Now, since previously my informal examples were misunderstood by some, I suppose I'd better give you my real example this time. My reason to consider defining a binder in the first place is that I want to encode Abstract Index Notation expressions (http://en.wikipedia.org/wiki/Abstract_index_notation), which is an abstraction of the Einstein notation for tensors -- e.g. C^i_k = A^i_j B^j_k is a formula defining the matrix product C=BA. While the general foundation for AIN is not the same as the coordinate view underlying the Einstein notation, they still follow the same syntax, so we may for simplicity think primarily about the latter; I assume it is at least somewhat familiar. First, some symbols would be needed to label a tensor with index variables, but those are ordinary applications: one for attaching lists of sub- and superscripts to the base symbol, and one (which could perhaps be list1#list) for constructing such lists. It feels like I would need no less than three binders, however: 1. The "invisible \sum" symbol that is so characteristic of the Einstein notation; in the formula above it would be applied to the right hand side and bind the variable j. 2. A "\bigwedge" or U+22C0 (n-ary logical and) symbol, that in less formal writing might corresponds to a "for all i,k=1,...,n" clause put after an equation; in the formula above it would thus be applied to the application of equality and bind variables i and k. 3. An unlabeling symbol, that takes a labelled expression and turns it back into an unlabeled tensor, binding the variables that were removed. This is not used in the formula above, but could be used to put it more on the form C = [ A^i_j B^j_k ]_{i,k} where the bracket forms the matrix whose entries are given by the expression inside, incidentally binding i and k. Now, the tricky thing about this third binder is that it needs to encode more than just the body expression (A^i_j B^j_k) and what variables were bound, it must also specify the lists of sub- and superscripts that it covers. (In the case given, it is possible to infer this information from the body, but that is not the case in general. The main catch is that the order of the indices matter, and there is both an order of subscripts and an order of superscripts that need to be encoded.) My feeling is that the most natural way to encode this is by giving these lists as two additional arguments of the binder. But that wouldn't be valid OM (at least not OM2). So, I had a look around to see how this problem had been solved for other symbols, since there surely should be other binders with more than just a body: at least integrals, sums, and products... But as it turned out, there wasn't; binders are extremely rare among the official and contributed CDs on openmath.org! As far as I can tell, there are only: * quant1#forall * quant1#exists * directives1#find (return x such that P(x)) * ecc#SigmaType (I have no idea how this is used) and * fns1#lambda; integrals, sums, products, and "set-of ... such-that ..." all seem to be supposed to be rewritten using fns1#lambda, and thus functionalised. I suppose a lambdification of formulae can be an ideological choice or at least personal preference from core OM developers, but I sort-of get the feeling that it might in part also be a consequence of the above restriction that binders only take one argument. When you're only allowed to have the body as argument, it gets rather difficult to encode \sum_{k=1}^n a_k, \prod_{k=1}^n a_k, \int_a^b f(x) dx, or \{ (x,x^2) | x \in \mathbb{Z}, x \equiv 1 (mod 3) \} with a binder as the main symbol, as you have to stash away the non-body arguments in some nonintuitive place. The rewrite to a functional form probably seems preferable, even if it technically wasn't what one had in mind! Furthermore the functionalised forms can be rather artificial; a definite integral \int_a^b f(x) dx is supposed to be encoded as <OMA> <OMS cd="calculus1" name="defint"/> <OMA> <OMS cd="interval1" name="interval"/> <OMV name="a"/> <OMV name="b"/> </OMA> <OMBIND> <OMS cd="fns1" name="lambda"/> <OMBVAR><OMV name="x"/></OMBVAR> <OMA><OMV name="f"/><OMV name="x"/></OMA> </OMBIND> </OMA> which is rather some sort of \int_{[a,b]} f -- except that interval1#interval doesn't construct a set (which is what one would expect) but rather some sort of "formal interval", as \int{[a,b]} f = -\int_{[b,a]} f is the very first FMP (of calculus1#defint!)... Even if I had shared this preference for functional forms, I couldn't use fns1#lambda here; the use of variables as abstract indices is not a special case of variables in a function definition, even though they sometimes coincide (e.g. for the Einstein notation). Indeed, the third (unlabeling) binder mentioned above is very much a sibling of the ordinary fns1#lambda, but cannot like it rely on the order of the children of <BVAR> to provide all necessary information. (Come to think of it, it's even kind of hackish that fns1#lambda can use <BVAR> order this way, but that's the kind of "hack" needed to bootstrap a formalism.) The solution I think I'll have to use is to split the unlabeling in two: one binder that does the binding, and one application that bundles up to naive arguments into a single body for the binder; it's workable, and much better than the alternative of encoding this information with semantic annotations, but still somewhat unintuitive. I'd really prefer it if the "only one body" restriction on binders could be lifted. Lars Hellström _______________________________________________ Om mailing list [email protected] http://openmath.org/mailman/listinfo/om
