Re: [whatwg] Various HTML element feedback
On Wed, 6 Jun 2012, Jukka K. Korpela wrote:
2012-06-06 2:53, Ian Hickson wrote:

I have rather been optimistic about future developments for markup elements that have been defined exactly enough to warrant meaningful semantics-based processing. For example, most of the uses mentioned in current text imply that var element contents should be kept intact in automatic language translation.

That continues to be the case, so I don't know why you conclude that using it is now pointless.

It is worse than pointless, if the definition of var covers a term used as a placeholder in prose. Such expressions should definitely not be kept intact in automatic language translation.

They shouldn't be kept intact, but they still need special semantic processing to not break the page's meaning during translation (e.g. ensuring that the same variable name is always translated the same way).

So why not simply define i recommended and describe var, cite, em, and dfn as deprecated but supported alternatives?

What benefit does empty deprecation have?

Declaring some features as obsolete is effectively deprecation; I just used the term deprecate as per HTML 4.01 because I find it more descriptive. Anyway, defining those elements as deprecated/obsolete would be no less and no more empty than the current statements about obsolete status. Validators/checkers would issue messages (hopefully just warnings) about them, and tutorials would probably describe them as secondary if at all.

I don't see any benefit to obsoleting these elements. They are useful for various purposes. Even the HTML spec uses them (all of the above, in fact) to obtain special behaviour (e.g. the cross-referencing system uses dfn). In general having a variety of elements provides authors with good hooks for styling, too.

Reducing alternatives, from five to one in this case, makes the recommendations simpler and helps authors because they need not spend time in making choices between the elements.
Such choices can be tough, if you try to play by the declared semantics, especially if it is vague (to a normal reader of a spec). My point is: either make elements like var, cite, em, dfn, i defined so that the differences can be utilized in automatic processing, or just bundle them together, to i.

I certainly agree that we shouldn't go to a DocBook level of element variety, but reducing the available elements to a mere handful doesn't make any sense either. We have to strike a balance, taking into account what elements have historically been available (and thus which authors are familiar with), what use cases might argue for new ones, which elements have been most used or not used, etc.

It's not like we can ever remove these elements altogether.

Oh, in 20 or 30 years, I think browsers could support to some of them.

I'm not sure what you meant to write, but I don't see why 1992-2012 would be harder than 2012-2032 in terms of dropping these elements.

What harm do they cause?

Unnecessary complication to the language, artificial semantics that do not actually define meanings, and confusion among those authors who try to take semantics and specifications seriously. Oh, and pointless variation in markup and added complexity of styling.

I disagree that these are really serious problems, or that their magnitude outweighs the benefits here.

If we have to keep them, we are better served by embracing them and giving them renewed purpose and vigour, rather than being ashamed of them.

I think this summarizes well the idea behind some of the most contrived semantic definitions. It was a brave attempt, but it failed. No normal author will ever get your idea of the new meaning for b and i, for example.

I guess we shall see. :-)

And since, for example, the font markup needs to be supported for a long time, how come *it* has not got a new, semantic definition?

I didn't start from b and look for a use case.
People presented use cases, and when looking for a solution, b fit the bill. Same with small, etc. We did at one point have font in the spec, but the use case that supported its inclusion was later solved in a different way (I forget what it was) and we ended up removing it again. If a use case is presented for which font is a good fit, we can use it again.

This would make authoring simpler without any real cost. There’s little reason to tell authors to use “semantic markup” if we don’t think it has real effect on anything.

It does have an effect. It has many effects. It makes maintenance easier, it makes it easier to transition from project to project, it makes it easier to work on other people's markup, it makes it significantly easier to dramatically change a site's appearance, it makes it easier to create and apply custom tools to extract information from the documents, it makes it easier for search
Re: [whatwg] Various HTML element feedback
On Wed, 6 Jun 2012, Henri Sivonen wrote:
On Wed, Jun 6, 2012 at 2:53 AM, Ian Hickson i...@hixie.ch wrote:

That might be realistic, especially as there is no significant semantic clarification in sight in general. This raises the question why we could not just return to the original design with some physical markup like i, b, and u together with span that was added later.

I think you'll find the original design of HTML isn't what you think it is (or at least, it's certainly not as presentational as you imply above), but that's neither here nor there.

Is there a record of design between http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html and http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt ?

There are some in-between steps, e.g.: http://lists.w3.org/Archives/Public/www-talk/1992NovDec/0155.html

So why not simply define i recommended and describe var, cite, em, and dfn as deprecated but supported alternatives?

What benefit does empty deprecation have? It's not like we can ever remove these elements altogether. What harm do they cause?

The harm is the wasted time spent worrying about and debating which semantic alternative for italics to use.

I think this harm is dramatically reduced relative to the HTML4 days by the extensive use of examples and detailed descriptions in the spec now. If people are still having long debates, please don't hesitate to point me to them so I can clarify them in the spec. That's what a living standard is good for, after all.

If we have to keep them, we are better served by embracing them and giving them renewed purpose and vigour, rather than being ashamed of them.

I think we have to keep them, because trying to declare them invalid would cause people to do a lot of pointless work, too, but I think we could still be ashamed of them.

I don't think that's healthy.

Note that as it is specified, div can be used instead of p with basically no loss of semantics.
(This is because the spec defines paragraph in a way that doesn't depend on p.)

Is there any known example of a piece of software that needs to care about the concept of paragraph and uses the rules given in the spec for determining what constitutes paragraphs?

No, this is just to make it clear that you don't need to use p, and to short-circuit arguments about whether a ul is in a paragraph, etc.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
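The remark that div can stand in for p can be illustrated with a rough sketch. The Python below is not the spec's algorithm and, per the answer above, no software is known to implement that algorithm; it is a deliberately simplified approximation that assumes well-formed markup with explicit end tags. The names INLINE, VOID_INLINE, CONTAINERS, ParagraphCounter and count_paragraphs are hypothetical, chosen only for this illustration. It counts a run of text sitting directly in a container the same way it counts the contents of a p.

```python
from html.parser import HTMLParser

# Assumed, incomplete element sets used only for this illustration.
INLINE = {"a", "b", "cite", "code", "dfn", "em", "i", "small",
          "span", "strong", "u", "var"}
VOID_INLINE = {"br", "img", "wbr"}          # no end tag, do not break a run
CONTAINERS = {"article", "body", "div", "section"}


class ParagraphCounter(HTMLParser):
    """Counts p elements with text plus text runs directly inside containers."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open elements
        self.in_run = False    # True while inside an uninterrupted text run
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_INLINE:
            return
        if tag not in INLINE:
            self.in_run = False    # any non-inline element interrupts the run
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_INLINE:
            return
        if tag not in INLINE:
            self.in_run = False
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not data.strip():
            return                 # whitespace alone never starts a paragraph
        # Nearest ancestor that is not an inline element, if any.
        block = next((t for t in reversed(self.stack) if t not in INLINE), None)
        if (block == "p" or block in CONTAINERS) and not self.in_run:
            self.paragraphs += 1
            self.in_run = True


def count_paragraphs(markup):
    parser = ParagraphCounter()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs
```

Under these assumptions, "<div>Hello <i>world</i></div>" and "<p>Hello <i>world</i></p>" both count as one paragraph, while text inside a heading is not counted at all.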
Re: [whatwg] Various HTML element feedback
On Wed, Jun 6, 2012 at 2:53 AM, Ian Hickson i...@hixie.ch wrote:

That might be realistic, especially as there is no significant semantic clarification in sight in general. This raises the question why we could not just return to the original design with some physical markup like i, b, and u together with span that was added later.

I think you'll find the original design of HTML isn't what you think it is (or at least, it's certainly not as presentational as you imply above), but that's neither here nor there.

Is there a record of design between http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html and http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt ?

So why not simply define i recommended and describe var, cite, em, and dfn as deprecated but supported alternatives?

What benefit does empty deprecation have? It's not like we can ever remove these elements altogether. What harm do they cause?

The harm is the wasted time spent worrying about and debating which semantic alternative for italics to use.

If we have to keep them, we are better served by embracing them and giving them renewed purpose and vigour, rather than being ashamed of them.

I think we have to keep them, because trying to declare them invalid would cause people to do a lot of pointless work, too, but I think we could still be ashamed of them.

Note that as it is specified, div can be used instead of p with basically no loss of semantics. (This is because the spec defines paragraph in a way that doesn't depend on p.)

Is there any known example of a piece of software that needs to care about the concept of paragraph and uses the rules given in the spec for determining what constitutes paragraphs?

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
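For readers who want to see what treating var, cite, em, and dfn as plain alternatives to i would mean mechanically, here is a minimal Python sketch. It is an illustration of the proposal quoted above, not something anyone in the thread wrote or endorsed. ITALIC_ALTERNATIVES and fold_to_i are hypothetical names, and the regular expression is a shortcut that assumes simple markup: it does not skip comments or script content, and it would also rewrite custom element names that begin with one of these names followed by a hyphen.

```python
import re

# The four elements the proposal would fold into <i>.
ITALIC_ALTERNATIVES = ("var", "cite", "em", "dfn")

# Matches the "<name" or "</name" part of a start or end tag for those elements.
# The trailing \b keeps longer names such as <embed> from matching "em".
_TAG = re.compile(r"<(/?)(?:%s)\b" % "|".join(ITALIC_ALTERNATIVES),
                  re.IGNORECASE)


def fold_to_i(markup: str) -> str:
    """Rewrite var/cite/em/dfn tags as i tags, leaving attributes untouched."""
    return _TAG.sub(r"<\1i", markup)
```

Applied to "<p><var>E</var> = <em>mc</em></p>", this yields "<p><i>E</i> = <i>mc</i></p>". The rewrite is one-way: once the element names are gone, a tool can no longer tell a variable from emphasis, which is the trade-off the thread is arguing about.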
[whatwg] Various HTML element feedback
On Sat, 21 Jan 2012, Jukka K. Korpela wrote:
2012-01-21 0:30, Ian Hickson wrote:
On Tue, 26 Jul 2011, Jukka K. Korpela wrote:

I don’t think you have clarified whether var is suitable for physical quantities, but I guess you meant to imply it, even though there is not a single example about markup for physical quantities.

Given that the spec contains the exact example you gave (E=mc^2), and given that the definition explicitly includes an identifier representing a constant as one of the uses for the element, I have to disagree with your assessment.

Now that you have added that example, the text implies that var is the suggested markup for symbols of physical quantities. It is still somewhat odd that this is expressed via an example only, and the basic prose says: “The var element represents a variable. This could be an actual variable in a mathematical expression or programming context, an identifier representing a constant, a function parameter, or just be a term used as a placeholder in prose.” None of the examples covers symbols of physical quantities, and yet they are probably more common texts in general (as opposed to mathematics and programming) than the examples given.

I don't really understand why you think the text you quote doesn't cover symbols of physical quantities (also known as variables or constants depending on the specific symbol in question), but in the interests of moving on, I've made the spec redundantly unambiguous on this front by listing a symbol identifying a physical quantity explicitly.

On the other hand, it seems that it doesn’t really matter. The var element has now been defined to have such a wide and vague meaning that it is pointless to use it. There is little reason to expect that any software will ever pay attention to var markup on any semantic basis.

You seem to imply that there was reason to expect so before, which is certainly news to me!
I have rather been optimistic about future developments for markup elements that have been defined exactly enough to warrant meaningful semantics-based processing. For example, most of the uses mentioned in current text imply that var element contents should be kept intact in automatic language translation.

That continues to be the case, so I don't know why you conclude that using it is now pointless.

I would not really expect these elements to be used for anything other than styling hooks.

That might be realistic, especially as there is no significant semantic clarification in sight in general. This raises the question why we could not just return to the original design with some physical markup like i, b, and u together with span that was added later.

I think you'll find the original design of HTML isn't what you think it is (or at least, it's certainly not as presentational as you imply above), but that's neither here nor there. The reasons for eschewing presentational markup in favour of more semantic/structural markup are

What’s the idea of wasting time in wondering which markup to choose, among several vaguely described alternatives, when it all ends up with being comparable to arbitrary author-named styles in word processing?

I would point you to this article: http://www.cs.tut.fi/~jkorpela/webpub.html ...but I think you probably already know of it.

The advantage of using i, b, and u is that they have defined default rendering (even in the absence of CSS) and universal support in browsers.

That is _an_ advantage, yes. Not the only one.

So authors will use i if they think italics is semantically essential, and var won’t be used much.

That seems to be the status quo.

So why not simply define i recommended and describe var, cite, em, and dfn as deprecated but supported alternatives?

What benefit does empty deprecation have? It's not like we can ever remove these elements altogether. What harm do they cause?
If we have to keep them, we are better served by embracing them and giving them renewed purpose and vigour, rather than being ashamed of them.

This would make authoring simpler without any real cost. There’s little reason to tell authors to use “semantic markup” if we don’t think it has real effect on anything.

It does have an effect. It has many effects. It makes maintenance easier, it makes it easier to transition from project to project, it makes it easier to work on other people's markup, it makes it significantly easier to dramatically change a site's appearance, it makes it easier to create and apply custom tools to extract information from the documents, it makes it easier for search engines to guess at author intent, it makes it easier for the documents to be repurposed for other media, it makes it easier for documents to be remixed, it makes it easier for JavaScript libraries to be used and mixed...

However, some authors like the ease of
Re: [whatwg] Various HTML element feedback
2012-06-06 2:53, Ian Hickson wrote:

I have rather been optimistic about future developments for markup elements that have been defined exactly enough to warrant meaningful semantics-based processing. For example, most of the uses mentioned in current text imply that var element contents should be kept intact in automatic language translation.

That continues to be the case, so I don't know why you conclude that using it is now pointless.

It is worse than pointless, if the definition of var covers a term used as a placeholder in prose. Such expressions should definitely not be kept intact in automatic language translation.

The definition of var is so broad that it is questionable whether *anything* useful can be assumed in automated processing. If it were defined more technically, without that placeholder idea, we could fairly certainly say that the content should be treated as a technical notation that should be left untranslated (as such notations are normally international), ignored in spelling checks, treated as equivalent to unknown nouns in syntax analysis of human language text, etc.

So why not simply define i recommended and describe var, cite, em, and dfn as deprecated but supported alternatives?

What benefit does empty deprecation have?

Declaring some features as obsolete is effectively deprecation; I just used the term deprecate as per HTML 4.01 because I find it more descriptive. Anyway, defining those elements as deprecated/obsolete would be no less and no more empty than the current statements about obsolete status. Validators/checkers would issue messages (hopefully just warnings) about them, and tutorials would probably describe them as secondary if at all.

Reducing alternatives, from five to one in this case, makes the recommendations simpler and helps authors because they need not spend time in making choices between the elements.
Such choices can be tough, if you try to play by the declared semantics, especially if it is vague (to a normal reader of a spec). My point is: either make elements like var, cite, em, dfn, i defined so that the differences can be utilized in automatic processing, or just bundle them together, to i.

It's not like we can ever remove these elements altogether.

Oh, in 20 or 30 years, I think browsers could support to some of them.

What harm do they cause?

Unnecessary complication to the language, artificial semantics that do not actually define meanings, and confusion among those authors who try to take semantics and specifications seriously. Oh, and pointless variation in markup and added complexity of styling.

If we have to keep them, we are better served by embracing them and giving them renewed purpose and vigour, rather than being ashamed of them.

I think this summarizes well the idea behind some of the most contrived semantic definitions. It was a brave attempt, but it failed. No normal author will ever get your idea of the new meaning for b and i, for example.

And since, for example, the font markup needs to be supported for a long time, how come *it* has not got a new, semantic definition?

If var, cite, em, dfn would be obsoleted/deprecated in favor of i, they would still need to be defined in the spec, of course. But the definition could simply state that they are outdated elements that should not be used by authors and should be treated by browsers as equivalent to i.

This would make authoring simpler without any real cost. There’s little reason to tell authors to use “semantic markup” if we don’t think it has real effect on anything.

It does have an effect. It has many effects.
It makes maintenance easier, it makes it easier to transition from project to project, it makes it easier to work on other people's markup, it makes it significantly easier to dramatically change a site's appearance, it makes it easier to create and apply custom tools to extract information from the documents, it makes it easier for search engines to guess at author intent, it makes it easier for the documents to be repurposed for other media, it makes it easier for documents to be remixed, it makes it easier for JavaScript libraries to be used and mixed...

I've often seen such arguments, even in situations where it is strikingly obvious that they don't apply. The argumentation sounds like a matter of faith or principle rather than practical considerations.

Many of the arguments relate to authoring style, coding principles, and organization of work, rather than something that belongs to a general specification. For example, the ease of working on other people's markup in a collaborative environment depends on a large number of factors, including the overall structures, appearance of markup (lower vs. upper case, use of quotes, omission of omissible tags, indentations, empty lines), principles of choosing id and class names, use of comments, etc. General specifications cannot and need not handle such