not everything is an inset!

Dov Feldstern Tue, 02 Oct 2007 19:00:03 -0700

Hi!

This is an email I started writing a couple of months ago, regarding the ignore-spellcheck discussion; but it is even more relevant now with reference to the questions being raised about character styles as insets.

I agree very much with what JMarc has been saying about this issue: although I like very much the idea of character styles / logical markup, I don't think that insets are the right paradigm for implementing this.

I will try to articulate here when, in my opinion, an inset is appropriate and when it is not. I can't provide hard and fast rules, but here are a few questions which could be asked about a given piece of text, and which I think could help us clarify whether or not an inset is appropriate. (A lot of the questions may actually be asking the same thing in different forms; but then that's to be expected, since if I'm correct, then they are all describing "inset-ness" versus "non-inset-ness".) Based on each question, I will try to evaluate the status of various existing insets, as well as that of Character Styles. And I will also try to explain for each question why I think it describes "inset-ness".

1) Would the sentence still make sense if the text in question were replaced with a "black box"? Or put slightly differently: if a reader were to read the sentence, but instead of seeing the text in question, he would only know that "text of type X" belongs here, would the reader still get the gist of the sentence --- perhaps missing some details, but understanding the basic "template" of the sentence? If yes --- the text belongs in an inset. If no --- it does not.

Note that for virtually all the insets which currently exist in LyX, the answer to this question is clearly "yes": almost all of the existing insets (footnote, note / comment, reference, ...) are not a main part of the sentence at all, and the sentence would be perfectly readable without the text in question altogether. The only case which could perhaps be borderline is a mathematical expression; but even in this case, I contend that the omission of the contents of the inset would not change the overall meaning of the sentence. OTOH, in the case of character styles, replacing its contents with just an "emph text here" message would almost certainly leave us with a grammatically incorrect sentence, of which we could get no gist.

For example, from the following sentence I have omitted the contents of the mathematical formulas and the references, leaving only the markers ($ $ and \ref{}):

"An important difference in our case is that there exist measures $ $ for which the set $ $ has \textit{no} largest element (see Proposition~\ref{} in Section~\ref{})."

Clearly, this omission is of a whole different nature than the omission of the \textit{} text would have been! With the current omissions, we're still left with a more or less grammatically correct sentence; omitting the \textit{} text would not have preserved this property! (Not to mention the fact that in this particular case, the entire meaning of the sentence would be reversed by such an omission...)

Why do I think that this question is related to "inset-ness"? Because of the collapsible nature of most insets: collapsing an inset is basically replacing the text in question with a black box of a known type. (Again, math stands out, since it is not collapsible. And we're going to see math standing out a lot. I think math is a special case, where a major reason for having it as an inset is the fact that the input method is so very different from "normal" text.) The fact that we can set a certain type of inset to be non-collapsible is quite beside the point: it's just another indication of the fact that perhaps that type of inset need not be an inset at all...

2) Does the text in question "belong to" the proposed inset / markup? If the attribute which the markup is supposed to endow were to be deleted, should the contents be deleted as well? If the answer is that "the contents belong to the markup, and should be deleted along with it", then this is an inset. If the contents exist independently of the markup, and should remain intact even if the markup is removed, then this is *not* an inset.

In the case of virtually all existing insets, the answer is that the contents belong to the inset: if a footnote is deleted, its text should not remain intact --- this would be disruptive to the main text (which is why it was placed in a footnote in the first place). (Dissolve is a special case, which is extremely useful at times; but it's not the norm of what deleting an inset means.) OTOH, in the case of character styles, the text should never be deleted along with the markup; after all, it's an integral part of the original sentence. So the contents do not belong to the markup, but to the containing sentence.

Why do I think this measures "inset-ness"? Because precisely one of the purposes of an inset is to "encapsulate" its content. The implementation of insets in the buffer reflects this: the inset is represented by a single character, which can be moved around or deleted, taking all of its contents with it. If we don't want that to be the case --- if we're always going to want to dissolve the inset rather than to delete it with its contents --- then why make it an inset in the first place? We should be placing the text directly where it actually belongs in the parent paragraph, and only marking it up to reflect the special attribute which we want to confer upon it.

3) What "came first": the text, or the attribute being applied to it? If the text came first, this is not an inset.

This is almost exactly the same question as (2), but I feel it's worth presenting it in this formulation as well, since it highlights the fact that for Character Styles, all we're doing is applying an attribute to already existing text (even if we start typing, then turn on \emph and continue typing what we want in \emph, conceptually we are marking off part of a larger sentence and giving it a special attribute). I mean --- the term Logical Markup which is being used for this in the module code says it as clear as day: this is *markup* of existing text! So why are we not representing it that way internally?


4) Is the attribute which the inset/markup is meant to endow necessarily supposed to extend to everything contained within it --- without even knowing what's going to be contained in it? If yes, this should be an inset; otherwise, it should not.

*Everything* inside a comment is expected to be commented out: graphics, footnotes, ERT, ERT inside a caption inside a float inside the comment --- everything. Same goes for a footnote: if I insert a graphic inside a footnote, I expect it to appear in the footnote, not in the main text. OTOH, when I mark off text as \emph, I'm not claiming that I necessarily want the text inside a footnote appearing in the \emph text to itself be \emph (maybe I do and maybe I don't, but I have control over that, and can choose to have it either way). So the \emph-ness is not extending automatically to everything contained "inside" it.

Why do I think this describes "inset-ness"? Because both the GUI and the internal buffer representation of an inset reflect the fact that everything inside it is, well, inside it. If this is not what we mean to represent --- i.e., if there may be text within a region marked off as \emph which should itself not be \emph --- then we should be using an internal representation which allows finer-grained control, such as that provided by font attributes, and we should not be displaying it to the user "inside" the \emph.


-------------------------------------------

Separate support for this position can be found, I think, in the fact --- which we all agree upon --- that we're going to have to make some changes (or some have already been made) to insets, in order for them to be able to provide a good, usable solution for Character Styles: displaying or not displaying a label; 3-box-model; toggling on/off; etc. But if we're going to need to do things which make the insets behave less and less inset-like, doesn't this seem to indicate that perhaps we shouldn't be using an inset for this in the first place?

So, to be a little constructive, what do I think *is* the correct paradigm for Character Styles?

I would like to see some generalization of the concept of per-position attributes, such that it would be possible to define (in code, for starters) a new attribute --- say, "AttributeEmph" --- which could then be set for each and every position in the text.


The interface would be something like this:

GetAttribute([in] pos, [in] attribute_type, [out] attribute_value)
SetAttribute([in] pos, [in] attribute_type, [in] attribute_value)

Where attribute_type is a subclass of some AbstractAttribute, and attribute_value represents the values that the given attribute_type accepts (I guess templates would be helpful for this kind of model).
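To make the shape of that interface a bit more concrete, here is a minimal C++ sketch of how the two calls might look with templates. It is only an illustration under stated assumptions, not existing LyX code: AttributeLanguage, AttributeStore, AttributedText, pos_type as used here, defaultValue() and exampleEmph() are all hypothetical names, and the getter returns the value instead of filling an [out] parameter. Only AttributeEmph and AbstractAttribute come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// All names below are hypothetical; this is not existing LyX code.
using pos_type = std::size_t;

// Common base, as in the proposal; in this sketch it only acts as a tag.
struct AbstractAttribute {};

// Each attribute type names the kind of value it accepts.
struct AttributeEmph : AbstractAttribute {
    using value_type = bool;
    static value_type defaultValue() { return false; }
};

struct AttributeLanguage : AbstractAttribute {
    using value_type = std::string;
    static value_type defaultValue() { return "english"; }
};

// Sparse per-position storage for one attribute type:
// positions that were never set report the default value.
template <typename Attr>
class AttributeStore {
public:
    void set(pos_type pos, typename Attr::value_type const & value) {
        values_[pos] = value;
    }
    typename Attr::value_type get(pos_type pos) const {
        auto it = values_.find(pos);
        if (it == values_.end())
            return Attr::defaultValue();
        return it->second;
    }
private:
    std::map<pos_type, typename Attr::value_type> values_;
};

// A paragraph-like object exposing the two calls from the proposal.
class AttributedText : private AttributeStore<AttributeEmph>,
                       private AttributeStore<AttributeLanguage> {
public:
    template <typename Attr>
    void setAttribute(pos_type pos, typename Attr::value_type const & value) {
        AttributeStore<Attr>::set(pos, value);
    }
    template <typename Attr>
    typename Attr::value_type getAttribute(pos_type pos) const {
        return AttributeStore<Attr>::get(pos);
    }
};

// Example: emphasize positions 3..5 and leave everything else alone.
inline AttributedText exampleEmph() {
    AttributedText text;
    for (pos_type pos = 3; pos <= 5; ++pos)
        text.setAttribute<AttributeEmph>(pos, true);
    return text;
}
```

With this shape, the attribute type picks the value type at compile time, so setAttribute<AttributeEmph>(pos, true) type-checks while passing a string there would not. Whether the real thing should dispatch at compile time like this or through virtual functions on AbstractAttribute is an open design choice.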

If these attributes are implemented on top of the existing font attributes (as I think the current thinking is, and which I think is correct), then we need not change anything in the latex output methods --- these would continue using the font attributes directly. OTOH, both the UI and the .lyx file would not access the font attributes directly anymore; rather, they would only access these "higher-level" attributes, and these in turn would set the actual font attributes.

I can think of two possible implementations for storing these attributes in the memory buffer:

1) spans --- which is how font attributes work today;

2) have each position in the text be represented in the buffer not by a char_type, but rather by a struct which would contain, in addition to the char_type, also the attribute information belonging to that position (and maybe also a pointer to the inset, if it's an inset; this would be an extension of what I once suggested in this thread: http://permalink.gmane.org/gmane.editors.lyx.devel/88025; but this is really a separate issue); and perhaps other position-specific information.
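The two storage options can be put side by side in a small C++ sketch. This is a rough illustration only: EmphSpan, Cell, emphAt, exampleSpans and exampleCells are made-up names, the span layout (each entry records the last position it covers) is an assumption and not a description of the actual font list code, and a single bool stands in for the whole set of attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names; char_type here just stands in for the buffer's
// character type.
using char_type = char32_t;
using pos_type = std::size_t;

// Option 1: spans. Each entry means "the attribute has this value up to
// and including position `last`"; entries are kept sorted by `last`.
struct EmphSpan {
    pos_type last;
    bool emph;
};

inline bool emphAt(std::vector<EmphSpan> const & spans, pos_type pos) {
    for (EmphSpan const & span : spans)
        if (pos <= span.last)
            return span.emph;
    return false; // past the last span: default value
}

// Option 2: one record per position, holding the character together with
// its attributes (an inset pointer could live here as well).
struct Cell {
    char_type ch;
    bool emph;
};

inline bool emphAt(std::vector<Cell> const & cells, pos_type pos) {
    return pos < cells.size() && cells[pos].emph;
}

// The text "has no end" with "no" (positions 4 and 5) emphasized,
// once in each layout.
inline std::vector<EmphSpan> exampleSpans() {
    return { {3, false}, {5, true}, {9, false} };
}

inline std::vector<Cell> exampleCells() {
    std::u32string const text = U"has no end";
    std::vector<Cell> cells;
    for (pos_type i = 0; i < text.size(); ++i)
        cells.push_back({text[i], i == 4 || i == 5});
    return cells;
}
```

Both layouts answer the same per-position question. Spans are compact but need a search on lookup and splitting or merging on edits; per-position records cost more memory but make lookup and insertion trivial.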

I'm not saying this is easy; I'm sure there are a million little details that I haven't even considered. But (a) I *do* think that it may be easier than some of the things we want to be able to do if we stick with insets (toggling of character styles; 3-box-model); and (b) much more importantly, I just think that the *concept* of inset is wrong; and using the wrong concept is bound to cost a lot later on, because the better the concepts used for coding match the "real concepts", the easier it will be to handle new, currently unforeseen situations --- just because the code will "behave" more closely to how the "real world" it is trying to represent behaves.

Dov

*not* everything is an inset!
