Re: [whatwg] Various HTML element feedback

2012-08-27 Thread Ian Hickson
On Wed, 6 Jun 2012, Jukka K. Korpela wrote:
 2012-06-06 2:53, Ian Hickson wrote:
   
   I have rather been optimistic about future developments for markup 
   elements that have been defined exactly enough to warrant meaningful 
   semantics-based processing. For example, most of the uses mentioned 
   in current text imply that var element contents should be kept 
   intact in automatic language translation.
  
  That continues to be the case, so I don't know why you conclude that 
  using it is now pointless.
 
 It is worse than pointless, if the definition of var covers a term 
 used as a placeholder in prose. Such expressions should definitely not 
 be kept intact in automatic language translation.

They shouldn't be kept intact, but they still need special semantic 
processing to not break the page's meaning during translation (e.g. 
ensuring that the same variable name is always translated the same way).
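
A minimal sketch of the kind of processing meant here, assuming a translation tool that first collects every var term and then translates each distinct term exactly once. The names VarCollector, var_terms and consistent_glossary are hypothetical, and the translate callback stands in for whatever translation engine is used:

```python
from html.parser import HTMLParser


class VarCollector(HTMLParser):
    """Collects the text content of every <var> element, in document order."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside <var>
        self.terms = []  # one string per outermost <var> element

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self.depth += 1
            if self.depth == 1:
                self.terms.append("")

    def handle_endtag(self, tag):
        if tag == "var" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.terms[-1] += data


def var_terms(html):
    """Return the text of each <var> element found in the markup."""
    parser = VarCollector()
    parser.feed(html)
    parser.close()
    return parser.terms


def consistent_glossary(terms, translate):
    """Translate each distinct term once, so repeats always map to the same output."""
    glossary = {}
    for term in terms:
        if term not in glossary:
            glossary[term] = translate(term)
    return glossary
```

A translator would then substitute glossary[term] for every occurrence of the term, rather than translating each occurrence independently.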


   So why not simply define i recommended and describe var, cite, 
   em, and dfn as deprecated but supported alternatives?
  
  What benefit does empty deprecation have?
 
 Declaring some features as obsolete is effectively deprecation; I just 
 used the term deprecate as per HTML 4.01 because I find it more 
 descriptive. Anyway, defining those elements as deprecated/obsolete 
 would be no less and no more empty than the current statements about 
 obsolete status. Validators/checkers would issue messages (hopefully 
 just warnings) about them, and tutorials would probably describe them as 
 secondary if at all.

I don't see any benefit to obsoleting these elements. They are useful for 
various purposes. Even the HTML spec uses them (all of the above, in fact) 
to obtain special behaviour (e.g. the cross-referencing system uses 
dfn). In general having a variety of elements provides authors with good 
hooks for styling, too.
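
A minimal sketch of how a tool could use dfn as a hook for cross-referencing. This is not the HTML spec's actual tooling; the class name DfnIndex and the id values in the usage note are hypothetical:

```python
from html.parser import HTMLParser


class DfnIndex(HTMLParser):
    """Builds a mapping from each defined term to the id of its <dfn> element."""

    def __init__(self):
        super().__init__()
        self.in_dfn = False
        self.index = {}   # normalised term -> id attribute (or None)
        self._text = ""
        self._id = None

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self.in_dfn = True
            self._text = ""
            self._id = dict(attrs).get("id")

    def handle_data(self, data):
        if self.in_dfn:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "dfn" and self.in_dfn:
            self.in_dfn = False
            # Collapse whitespace and lowercase so lookups are forgiving.
            term = " ".join(self._text.split()).lower()
            self.index[term] = self._id


def build_dfn_index(html):
    parser = DfnIndex()
    parser.feed(html)
    parser.close()
    return parser.index
```

Given markup such as `<dfn id="concept-widget">Widget</dfn>`, a later pass could link other mentions of "widget" to `#concept-widget`. Nothing comparable is possible if every italicised run is marked up with the same element.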


 Reducing alternatives, from five to one in this case, makes the 
 recommendations simpler and helps authors because they need not spend 
 time in making choices between the elements. Such choices can be tough, 
 if you try to play by the declared semantics, especially if it is 
 vague (to a normal reader of a spec).
 
 My point is: either make elements like var, cite, em, dfn, i 
 defined so that the differences can be utilized in automatic processing, 
 or just bundle them together, to i.

I certainly agree that we shouldn't go to a DocBook level of element 
variety, but reducing the available elements to a mere handful doesn't make 
any sense either. We have to strike a balance, taking into account what 
elements have historically been available (and thus which authors are 
familiar with), what use cases might argue for new ones, which elements 
have been most used or not used, etc.


  It's not like we can ever remove these elements altogether.
 
 Oh, in 20 or 30 years, I think browsers could support to some of them.

I'm not sure what you meant to write, but I don't see why 1992-2012 would 
be harder than 2012-2032 in terms of dropping these elements.


  What harm do they cause?
 
 Unnecessary complication to the language, artificial semantics that do 
 not actually define meanings, and confusion among those authors who try 
 to take semantics and specifications seriously. Oh, and pointless 
 variation in markup and added complexity of styling.

I disagree that these are really serious problems, or that their magnitude 
outweighs the benefits here.


  If we have to keep them, we are better served by embracing them and 
  giving them renewed purpose and vigour, rather than being ashamed of 
  them.
 
 I think this summarizes well the idea behind some of the most contrived 
 semantic definitions. It was a brave attempt, but it failed. No normal 
 author will ever get your idea of the new meaning for b and i, for 
 example.

I guess we shall see. :-)


 And since, for example, the font markup needs to be supported for a 
 long time, how come *it* has not got a new, semantic definition?

I didn't start from b and look for a use case. People presented use 
cases, and when looking for a solution, b fit the bill. Same with 
small, etc. We did at one point have font in the spec, but the use 
case that supported its inclusion was later solved in a different way (I 
forget what it was) and we ended up removing it again. If a use case is 
presented for which font is a good fit, we can use it again.


   This would make authoring simpler without any real cost. There’s 
   little reason to tell authors to use “semantic markup” if we 
   don’t think it has real effect on anything.
  
  It does have an effect. It has many effects. It makes maintenance 
  easier, it makes it easier to transition from project to project, it 
  makes it easier to work on other people's markup, it makes it 
  significantly easier to dramatically change a site's appearance, it 
  makes it easier to create and apply custom tools to extract information 
  from the documents, it makes it easier for search 

Re: [whatwg] Various HTML element feedback

2012-08-27 Thread Ian Hickson
On Wed, 6 Jun 2012, Henri Sivonen wrote:
 On Wed, Jun 6, 2012 at 2:53 AM, Ian Hickson i...@hixie.ch wrote:
  That might be realistic, especially as there is no significant semantic 
  clarification in sight in general. This raises the question why we 
  could not just return to the original design with some physical 
  markup like i, b, and u together with span that was added 
  later.
 
  I think you'll find the original design of HTML isn't what you think 
  it is (or at least, it's certainly not as presentational as you imply 
  above), but that's neither here nor there.
 
 Is there a record of design between 
 http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html 
 and http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt ?

There's some in-between steps, e.g.:

   http://lists.w3.org/Archives/Public/www-talk/1992NovDec/0155.html


  So why not simply define i recommended and describe var, cite, 
  em, and dfn as deprecated but supported alternatives?
 
  What benefit does empty deprecation have? It's not like we can ever 
  remove these elements altogether. What harm do they cause?
 
 The harm is the wasted time spent worrying about and debating which 
 semantic alternative for italics to use.

I think this harm is dramatically reduced relative to the HTML4 days by 
the extensive use of examples and detailed descriptions in the spec now. 
If people are still having long debates, please don't hesitate to point me 
to them so I can clarify them in the spec. That's what a living standard 
is good for, after all.


  If we have to keep them, we are better served by embracing them and 
  giving them renewed purpose and vigour, rather than being ashamed of 
  them.
 
 I think we have to keep them, because trying to declare them invalid 
 would cause people to do a lot of pointless work, too, but I think we 
 could still be ashamed of them.

I don't think that's healthy.


  Note that as it is specified, div can be used instead of p with 
  basically no loss of semantics. (This is because the spec defines 
  paragraph in a way that doesn't depend on p.)
 
 Is there any known example of a piece of software that needs to care 
 about the concept of paragraph and uses the rules given in the spec 
 for determining what constituted paragraphs?

No, this is just to make it clear that you don't need to use p, and to 
short-circuit arguments about whether a ul is in a paragraph, etc.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Various HTML element feedback

2012-06-06 Thread Henri Sivonen
On Wed, Jun 6, 2012 at 2:53 AM, Ian Hickson i...@hixie.ch wrote:
 That might be realistic, especially as there is no significant semantic
 clarification in sight in general. This raises the question why we could
 not just return to the original design with some physical markup like
 i, b, and u together with span that was added later.

 I think you'll find the original design of HTML isn't what you think it
 is (or at least, it's certainly not as presentational as you imply above),
 but that's neither here nor there.

Is there a record of design between
http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html
and
http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt
?
 So why not simply define i recommended and describe var, cite,
 em, and dfn as deprecated but supported alternatives?

 What benefit does empty deprecation have? It's not like we can ever remove
 these elements altogether. What harm do they cause?

The harm is the wasted time spent worrying about and debating which
semantic alternative for italics to use.

 If we have to keep them, we are better served by embracing them and giving
 them renewed purpose and vigour, rather than being ashamed of them.

I think we have to keep them, because trying to declare them invalid
would cause people to do a lot of pointless work, too, but I think we
could still be ashamed of them.

 Note that as it is specified, div can be used instead of p with
 basically no loss of semantics. (This is because the spec defines
 paragraph in a way that doesn't depend on p.)

Is there any known example of a piece of software that needs to care
about the concept of paragraph and uses the rules given in the spec
for determining what constituted paragraphs?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


[whatwg] Various HTML element feedback

2012-06-05 Thread Ian Hickson
On Sat, 21 Jan 2012, Jukka K. Korpela wrote:
 2012-01-21 0:30, Ian Hickson wrote:
  On Tue, 26 Jul 2011, Jukka K. Korpela wrote:
  
   I don’t think you have clarified whether var is suitable for 
   physical quantities, but I guess you meant to imply it—even though 
   there is not a single example about markup for physical quantities.
  
  Given that the spec contains the exact example you gave (E=mc^2), and 
  given that the definition explicitly includes an identifier 
  representing a constant as one of the uses for the element, I have to 
  disagree with your assessment.
 
 Now that you have added that example, the text implies that var is the 
 suggested markup for symbols of physical quantities. It is still 
 somewhat odd that this is expressed via an example only, and the basic 
 prose says: “The var element represents a variable. This could be an 
 actual variable in a mathematical expression or programming context, an 
 identifier representing a constant, a function parameter, or just be a 
 term used as a placeholder in prose.” None of the examples covers 
 symbols of physical quantities, and yet they are probably more common in 
 texts in general (as opposed to mathematics and programming) than the 
 examples given.

I don't really understand why you think the text you quote doesn't 
cover symbols of physical quantities (also known as variables or 
constants depending on the specific symbol in question), but in the 
interests of moving on, I've made the spec redundantly unambiguous on this 
front by listing a symbol identifying a physical quantity explicitly.


   On the other hand, it seems that it doesn’t really matter. The 
   var element has now been defined to have such a wide and vague 
   meaning that it is pointless to use it. There is little reason to 
   expect that any software will ever pay attention to var markup on 
   any semantic basis.
  
  You seem to imply that there was reason to expect so before, which is 
  certainly news to me!
 
 I have rather been optimistic about future developments for markup 
 elements that have been defined exactly enough to warrant meaningful 
 semantics-based processing. For example, most of the uses mentioned in 
 current text imply that var element contents should be kept intact in 
 automatic language translation.

That continues to be the case, so I don't know why you conclude that using 
it is now pointless.


  I would not really expect these elements to be used for anything other 
  than styling hooks.
 
 That might be realistic, especially as there is no significant semantic 
 clarification in sight in general. This raises the question why we could 
 not just return to the original design with some physical markup like 
 i, b, and u together with span that was added later.

I think you'll find the original design of HTML isn't what you think it 
is (or at least, it's certainly not as presentational as you imply above), 
but that's neither here nor there. The reasons for eschewing 
presentational markup in favour of more semantic/structural markup are 


 What’s the idea of wasting time in wondering which markup to choose, 
 among several vaguely described alternatives, when it all ends up with 
 being comparable to arbitrary author-named styles in word processing?

I would point you to this article:

   http://www.cs.tut.fi/~jkorpela/webpub.html

...but I think you probably already know of it.


 The advantage of using i, b, and u is that they have defined 
 default rendering (even in the absence of CSS) and universal support in 
 browsers.

That is _an_ advantage, yes. Not the only one.


   So authors will use i if they think italics is semantically 
   essential, and var won’t be used much.
  
  That seems to be the status quo.
 
 So why not simply define i recommended and describe var, cite, 
 em, and dfn as deprecated but supported alternatives?

What benefit does empty deprecation have? It's not like we can ever remove 
these elements altogether. What harm do they cause?

If we have to keep them, we are better served by embracing them and giving 
them renewed purpose and vigour, rather than being ashamed of them.


 This would make authoring simpler without any real cost. There’s 
 little reason to tell authors to use “semantic markup” if we don’t 
 think it has real effect on anything.

It does have an effect. It has many effects. It makes maintenance easier, 
it makes it easier to transition from project to project, it makes it 
easier to work on other people's markup, it makes it significantly easier 
to dramatically change a site's appearance, it makes it easier to create 
and apply custom tools to extract information from the documents, it makes it 
easier for search engines to guess at author intent, it makes it easier 
for the documents to be repurposed for other media, it makes it easier for 
documents to be remixed, it makes it easier for JavaScript libraries to 
be used and mixed...


  However, some authors like the ease of 

Re: [whatwg] Various HTML element feedback

2012-06-05 Thread Jukka K. Korpela

2012-06-06 2:53, Ian Hickson wrote:


I have rather been optimistic about future developments for markup
elements that have been defined exactly enough to warrant meaningful
semantics-based processing. For example, most of the uses mentioned in
current text imply that var element contents should be kept intact in
automatic language translation.


That continues to be the case, so I don't know why you conclude that using
it is now pointless.


It is worse than pointless, if the definition of var covers a term 
used as a placeholder in prose. Such expressions should definitely not 
be kept intact in automatic language translation.


The definition of var is so broad that it is questionable whether 
*anything* useful can be assumed in automated processing. If it were 
defined more technically, without that placeholder idea, we could fairly 
certainly say that the content should be treated as a technical notation 
that should be left untranslated (as such notations are normally 
international), ignored in spelling checks, treated as equivalent to 
unknown nouns in syntax analysis of human language text, etc.
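
A minimal sketch of one such use, assuming a spelling checker that skips anything inside var. The names ProseExtractor and words_to_check are hypothetical, and the word pattern is deliberately crude:

```python
import re
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collects text nodes that are not inside a <var> element."""

    def __init__(self):
        super().__init__()
        self.var_depth = 0  # nesting depth inside <var>
        self.chunks = []    # text outside <var>

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self.var_depth += 1

    def handle_endtag(self, tag):
        if tag == "var" and self.var_depth:
            self.var_depth -= 1

    def handle_data(self, data):
        if not self.var_depth:
            self.chunks.append(data)


def words_to_check(html):
    """Return the words a spelling checker should look at, ignoring <var> content."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so that words on either side of a <var> never merge.
    return re.findall(r"[A-Za-z']+", " ".join(parser.chunks))
```

This only works if var is reserved for technical notation. With a placeholder phrase in prose inside var, the same rule would wrongly skip ordinary words.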



So why not simply define i recommended and describe var, cite,
em, and dfn as deprecated but supported alternatives?


What benefit does empty deprecation have?


Declaring some features as obsolete is effectively deprecation; I just 
used the term deprecate as per HTML 4.01 because I find it more 
descriptive. Anyway, defining those elements as deprecated/obsolete 
would be no less and no more empty than the current statements about 
obsolete status. Validators/checkers would issue messages (hopefully 
just warnings) about them, and tutorials would probably describe them as 
secondary if at all.


Reducing alternatives, from five to one in this case, makes the 
recommendations simpler and helps authors because they need not spend 
time in making choices between the elements. Such choices can be tough, 
if you try to play by the declared semantics, especially if it is 
vague (to a normal reader of a spec).


My point is: either make elements like var, cite, em, dfn, i 
defined so that the differences can be utilized in automatic processing, 
or just bundle them together, to i.



It's not like we can ever remove
these elements altogether.


Oh, in 20 or 30 years, I think browsers could support to some of them.


What harm do they cause?


Unnecessary complication to the language, artificial semantics that do 
not actually define meanings, and confusion among those authors who try 
to take semantics and specifications seriously. Oh, and pointless 
variation in markup and added complexity of styling.



If we have to keep them, we are better served by embracing them and giving
them renewed purpose and vigour, rather than being ashamed of them.


I think this summarizes well the idea behind some of the most contrived 
semantic definitions. It was a brave attempt, but it failed. No normal 
author will ever get your idea of the new meaning for b and i, for 
example.


And since, for example, the font markup needs to be supported for a 
long time, how come *it* has not got a new, semantic definition?


If var, cite, em, dfn would be obsoleted/deprecated in favor of 
i, they would still need to be defined in the spec, of course. But the 
definition could simply state that they are outdated elements that 
should not be used by authors and should be treated by browsers as 
equivalent to i.



This would make authoring simpler without any real cost. There’s
little reason to tell authors to use “semantic markup” if we don’t
think it has real effect on anything.


It does have an effect. It has many effects. It makes maintenance easier,
it makes it easier to transition from project to project, it makes it
easier to work on other people's markup, it makes it significantly easier
to dramatically change a site's appearance, it makes it easier to create
and apply custom tools to extract information from the documents, it makes it
easier for search engines to guess at author intent, it makes it easier
for the documents to be repurposed for other media, it makes it easier for
documents to be remixed, it makes it easier for JavaScript libraries to
be used and mixed...


I've often seen such arguments, even in situations where it is 
strikingly obvious that they don't apply. The argumentation sounds like 
a matter of faith or principle rather than practical considerations.


Many of the arguments relate to authoring style, coding principles, and 
organization of work, rather than something that belongs to a general 
specification. For example, the ease of working on other people's markup 
in a collaborative environment depends on a large number of factors, 
including the overall structures, appearance of markup (lower vs. upper 
case, use of quotes, omission of omissible tags, indentations, empty 
lines), principles of choosing id and class names, use of comments, etc. 
General specifications cannot and need not handle such