Hi!

This is an email I started writing a couple of months ago, regarding the ignore-spellcheck discussion; but it is even more relevant now with reference to the questions being raised about character styles as insets.

I agree very much with what JMarc has been saying about this issue: although I like very much the idea of character styles / logical markup, I don't think that insets are the right paradigm for implementing this.

I will try to articulate here when, in my opinion, an inset is appropriate and when it is not. I can't provide hard and fast rules, but here are a few questions which could be asked about a given piece of text, and which I think could help us clarify whether or not an inset is appropriate. (A lot of the questions may actually be asking the same thing in different forms; but then that's to be expected, since if I'm correct, then they are all describing "inset-ness" versus "non-inset-ness".) Based on each question, I will try to evaluate the status of various existing insets, as well as that of Character Styles. And I will also try to explain for each question why I think it describes "inset-ness".

1) Would the sentence still make sense if the text in question were replaced with a "black box"? Or put slightly differently: if a reader were to read the sentence, but instead of seeing the text in question, he would only know that "text of type X" belongs here, would the reader still get the gist of the sentence --- perhaps missing some details, but understanding the basic "template" of the sentence? If yes --- the text belongs in an inset. If no --- it does not.

Note that for virtually all the insets which currently exist in LyX, the answer to this question is clearly "yes": almost all of the existing insets (footnote, note / comment, reference, ...) are not a main part of the sentence at all, and the sentence would be perfectly readable without the text in question altogether. The only case which could perhaps be borderline is a mathematical expression; but even in this case, I contend that the omission of the contents of the inset would not change the overall meaning of the sentence. OTOH, in the case of character styles, replacing its contents with just an "emph text here" message would almost certainly leave us with a grammatically incorrect sentence, of which we could get no gist.

For example, from the following sentence I have omitted the contents of the mathematical formulas and the references, leaving only the markers ($ $ and \ref{}):

"An important difference in our case is that there exist measures $ $ for which the set $ $ has \textit{no} largest element (see Proposition~\ref{} in Section~\ref{})."

Clearly, this omission is of a whole different nature than the omission of the \textit{} text would have been! With the current omissions, we're still left with a more or less grammatically correct sentence; omitting the \textit{} text would not have preserved this property! (Not to mention the fact that in this particular case, the entire meaning of the sentence would be reversed by such an omission...)

Why do I think that this question is related to "inset-ness"? Because of the collapsible nature of most insets: collapsing an inset is basically replacing the text in question with a black box of a known type. (Again, math stands out, since it is not collapsible. And we're going to see math standing out a lot. I think math is a special case, where a major reason for having it as an inset is the fact that the input method is so very different from "normal" text.) The fact that we can set a certain type of inset to be non-collapsible is quite beside the point: it's just another indication of the fact that perhaps that type of inset need not be an inset at all...

2) Does the text in question "belong to" the proposed inset / markup? If the attribute which the markup is supposed to endow were to be deleted, should the contents be deleted as well? If the answer is that "the contents belong to the markup, and should be deleted along with it", then this is an inset. If the contents exist independently of the markup, and should remain intact even if the markup is removed, then this is *not* an inset.

In the case of virtually all existing insets, the answer is that the contents belong to the inset: if a footnote is deleted, its text should not remain intact --- this would be disruptive to the main text (which is why it was placed in a footnote in the first place). (Dissolve is a special case, which is extremely useful at times; but it's not the norm of what deleting an inset means.) OTOH, in the case of character styles, the text should never be deleted along with the markup; after all, it's an integral part of the original sentence. So the contents do not belong to the markup, but to the containing sentence.

Why do I think this measures "inset-ness"? Because precisely one of the purposes of an inset is to "encapsulate" its content. The implementation of insets in the buffer reflects this: the inset is represented by a single character, which can be moved around or deleted, taking all of its contents with it. If we don't want that to be the case --- if we're always going to want to dissolve the inset rather than to delete it with its contents --- then why make it an inset in the first place? We should be placing the text directly where it actually belongs in the parent paragraph, and only marking it up to reflect the special attribute which we want to confer upon it.
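To make the distinction in question (2) concrete, here is a minimal C++ sketch. It is only a toy model: Element, Paragraph, fromText, mainText, deleteAt and clearEmph are hypothetical names made up for illustration, and none of this is LyX's actual buffer code. An inset occupies one position and owns its contents, so deleting that position takes the contents with it; removing emph only clears a flag and leaves the sentence untouched.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model for illustration only; not LyX's real data structures.
struct Element {
    char ch;                // the character (ignored when isInset is true)
    bool emph;              // markup applied to this position
    bool isInset;           // true if this single position stands for an inset
    std::string insetText;  // contents owned by the inset, e.g. a footnote
};
using Paragraph = std::vector<Element>;

// Build a paragraph of plain characters with no markup.
Paragraph fromText(const std::string& s) {
    Paragraph p;
    for (char c : s)
        p.push_back(Element{c, false, false, std::string()});
    return p;
}

// The sentence itself: every ordinary character, skipping inset positions.
std::string mainText(const Paragraph& p) {
    std::string out;
    for (const Element& e : p)
        if (!e.isInset)
            out += e.ch;
    return out;
}

// Deleting one position. If it is an inset, its contents go with it.
void deleteAt(Paragraph& p, std::size_t pos) {
    if (pos < p.size())
        p.erase(p.begin() + static_cast<std::ptrdiff_t>(pos));
}

// Removing markup: the text stays where it was, only the attribute changes.
void clearEmph(Paragraph& p) {
    for (Element& e : p)
        e.emph = false;
}
```

In this model, calling clearEmph on a paragraph never changes what mainText returns, whereas an emph inset deleted with deleteAt would take its words out of the sentence.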

3) What "came first": the text, or the attribute being applied to it? If the text came first, this is not an inset.

This is almost exactly the same question as (2), but I feel it's worth presenting it in this formulation as well, since it highlights the fact that for Character Styles, all we're doing is applying an attribute to already existing text (even if we start typing, then turn on \emph and continue typing what we want in \emph, conceptually we are marking off part of a larger sentence and giving it a special attribute). I mean --- the term Logical Markup which is being used for this in the module code says it as clear as day: this is *markup* of existing text! So why are we not representing it that way internally?

4) Is the attribute which the inset/markup is meant to endow necessarily supposed to extend to everything contained within it --- without even knowing what's going to be contained in it? If yes, this should be an inset; otherwise, it should not.

*Everything* inside a comment is expected to be commented out: graphics, footnotes, ERT, ERT inside a caption inside a float inside the comment --- everything. Same goes for a footnote: if I insert a graphic inside a footnote, I expect it to appear in the footnote, not in the main text. OTOH, when I mark off text as \emph, I'm not claiming that I necessarily want the text inside a footnote appearing in the \emph text to itself be \emph (maybe I do and maybe I don't, but I have control over that, and can choose to have it either way). So the \emph-ness is not extending automatically to everything contained "inside" it.

Why do I think this describes "inset-ness"? Because both the GUI and the internal buffer representation of an inset reflect the fact that everything inside it is, well, inside it. If this is not what we mean to represent --- i.e., if there may be text within a region marked off as \emph which should itself not be \emph --- then we should be using an internal representation which allows finer-grained control, such as that provided by font attributes, and we should not be displaying it to the user "inside" the \emph.

-------------------------------------------

Separate support for this position can be found, I think, in the fact --- which we all agree upon --- that we're going to have to make some changes (or some have already been made) to insets, in order for them to be able to provide a good, usable solution for Character Styles: displaying or not displaying a label; 3-box-model; toggling on/off; etc. But if we're going to need to do things which make the insets behave less and less inset-like, doesn't this seem to indicate that perhaps we shouldn't be using an inset for this in the first place?


So, to be a little constructive, what do I think *is* the correct paradigm for Character Styles?

I would like to see some generalization of the concept of per-position attributes, such that it would be possible to define (in code, for starters) a new attribute --- say, "AttributeEmph" --- which could then be set for each and every position in the text.

The interface would be something like this:

GetAttribute([in] pos, [in] attribute_type, [out] attribute_value)
SetAttribute([in] pos, [in] attribute_type, [in] attribute_value)

Where attribute_type is a subclass of some AbstractAttribute, and attribute_value represents the values that the given attribute_type accepts (I guess templates would be helpful for this kind of model).
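As a rough idea of how templates could help here, the following is a minimal C++ sketch under my own assumptions. AttributeLanguage, AttributeLayer, AttributeStore and the value_type convention are hypothetical names invented for this example (only AttributeEmph and AbstractAttribute come from the text above), and the naive per-position map is just a placeholder for whatever storage is eventually chosen. It needs C++14 for the type-based std::get.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

using pos_type = std::ptrdiff_t;

// Base class for all attribute types (hypothetical).
struct AbstractAttribute {};

// Each attribute type declares which values it accepts.
struct AttributeEmph : AbstractAttribute { using value_type = bool; };
struct AttributeLanguage : AbstractAttribute { using value_type = std::string; };

// Storage for one attribute type: position -> value. Unset positions
// report a default-constructed value (false, empty string, ...).
template <class Attr>
class AttributeLayer {
    static_assert(std::is_base_of<AbstractAttribute, Attr>::value,
                  "Attr must derive from AbstractAttribute");
public:
    using value_type = typename Attr::value_type;

    void set(pos_type pos, value_type v) { values_[pos] = std::move(v); }

    value_type get(pos_type pos) const {
        auto it = values_.find(pos);
        return it == values_.end() ? value_type() : it->second;
    }
private:
    std::map<pos_type, value_type> values_;
};

// The typed counterpart of GetAttribute / SetAttribute: the attribute type
// is a template argument, so the value type is checked at compile time.
class AttributeStore {
public:
    template <class Attr>
    void setAttribute(pos_type pos, typename Attr::value_type value) {
        std::get<AttributeLayer<Attr>>(layers_).set(pos, std::move(value));
    }

    template <class Attr>
    typename Attr::value_type getAttribute(pos_type pos) const {
        return std::get<AttributeLayer<Attr>>(layers_).get(pos);
    }
private:
    std::tuple<AttributeLayer<AttributeEmph>,
               AttributeLayer<AttributeLanguage>> layers_;
};
```

Usage would look like store.setAttribute<AttributeEmph>(pos, true) and store.getAttribute<AttributeEmph>(pos). Returning the value instead of using an [out] parameter is just a stylistic choice in this sketch.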

If these attributes are implemented on top of the existing font attributes (as I think the current thinking is, and which I think is correct), then we need not change anything in the LaTeX output methods --- these would continue using the font attributes directly. OTOH, neither the UI nor the .lyx file would access the font attributes directly anymore; rather, they would only access these "higher-level" attributes, and these in turn would set the actual font attributes.

I can think of two possible implementations for storing these attributes in the memory buffer:
1) spans --- which is how font attributes work today;
2) have each position in the text be represented in the buffer not by a char_type, but rather by a struct which would contain, in addition to the char_type, also the attribute information belonging to that position (and maybe also a pointer to the inset, if it's an inset; this would be an extension of what I once suggested in this thread: http://permalink.gmane.org/gmane.editors.lyx.devel/88025; but this is really a separate issue); and perhaps other position-specific information.
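
The two storage options can be sketched side by side in C++. This is a simplified illustration under my own assumptions, not a copy of how LyX stores font spans today: EmphSpan, Cell, emphFromSpans, emphFromCells and cellsFromSpans are hypothetical names, and a single boolean emph attribute stands in for the general attribute information.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using pos_type = std::size_t;
using char_type = char32_t;

// Option 1: spans. The attribute is stored once per run of positions,
// here as a half-open range [begin, end).
struct EmphSpan {
    pos_type begin;
    pos_type end;
    bool emph;
};

bool emphFromSpans(const std::vector<EmphSpan>& spans, pos_type pos) {
    for (const EmphSpan& s : spans)
        if (pos >= s.begin && pos < s.end)
            return s.emph;
    return false;  // positions outside every span carry no emph
}

// Option 2: one struct per position, carrying the character together
// with its attribute information (and possibly a pointer to an inset).
struct Inset;  // opaque in this sketch

struct Cell {
    char_type ch;
    bool emph;
    Inset* inset;  // nullptr for ordinary characters
};

bool emphFromCells(const std::vector<Cell>& cells, pos_type pos) {
    return pos < cells.size() && cells[pos].emph;
}

// Convert the span representation into the per-position one, to show
// that both carry the same information.
std::vector<Cell> cellsFromSpans(const std::u32string& text,
                                 const std::vector<EmphSpan>& spans) {
    std::vector<Cell> cells;
    cells.reserve(text.size());
    for (pos_type i = 0; i < text.size(); ++i)
        cells.push_back(Cell{text[i], emphFromSpans(spans, i), nullptr});
    return cells;
}
```

The trade-off is the usual one: spans are compact but have to be split and merged on every edit, while per-position structs use more memory but move along with the text automatically when characters are inserted or deleted.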

I'm not saying this is easy; I'm sure there are a million little details that I haven't even considered. But (a) I *do* think that it may be easier than some of the things we want to be able to do if we stick with insets (toggling of character styles; 3-box-model); and (b) much more importantly, I just think that the *concept* of inset is wrong here; and using the wrong concept is bound to cost a lot later on, because the better the concepts used for coding match the "real concepts", the easier it will be to handle new, currently unforeseen situations --- just because the code will "behave" more closely to how the "real world" it is trying to represent behaves.

Dov
