from:"Jukka K. Korpela"

Re: [whatwg] Obsolete Feature [hgroup]

2015-02-18 Thread Jukka K. Korpela


2015-02-18, 22:55, Ian Hickson wrote:


If you wish to create a valid and semantically correct document according
to the most recently published HTML standard, then using hgroup is fine.


I think you meant to say that hgroup is in the WHATWG document at 
present (and you added that you do not plan to remove it from there).


There is strictly speaking only one HTML standard, namely ISO/IEC 15445. 
Few people know about it, still fewer care about it. (It’s just HTML 
4.01 written in more standardese.)


The W3C calls their recommendations “standards”. (I think this is a 
rather recent change.)


The IETF calls some of their documents “Internet Standards”, which is 
somewhat odd, but irrelevant to HTML, as IETF abandoned work on HTML 
after HTML 2.0.


The WHATWG calls their documents “living standards”, which is really an 
oxymoron.


Anyone could write a specification, with or without his brother, and 
call it a “standard”. Actually, you can never know which one is the most 
recent “standard” in that sense.


So what really matters is that there are different players in the fields 
of defining HTML, and there is no authority to rule them all. Every 
author/developer/designer/whatever needs to decide how to deal with 
this. Asking one of the players about the rules will most probably yield 
just the obvious answer.



I recommend using the validator.nu service rather than the W3C one.
They're basically the same but the validator.nu one is closer to the
WHATWG spec's requirements than the W3C one.


The relationship between the two is somewhat obscure, but it is in that 
direction. What the different validators check against is really their 
authors’ idea of what is correct HTML, so we have yet another 
“standard”, defined very implicitly, and mutable. But it indeed appears 
to be closer to WHATWG HTML than to W3C HTML5.


The bottom line is that validators are just useful tools, at best. All 
HTML5 validators are experimental software that checks against some 
rules that have not been disclosed in detail but are supposed to match 
some idea of “HTML5”.


Yucca

Re: [whatwg] ad

2014-12-05 Thread Jukka K. Korpela


2014-12-05, 11:41, Jens Oliver Meiert wrote:


Has there ever been a discussion about a dedicated element for ads, like <ad>?


Probably. What specific purposes would it serve?

The most obvious way of utilizing such markup, assuming it were commonly 
used, would be to have


ad { display: none }

in a user style sheet. I think this suffices to show why such markup 
would not be used.


Yucca

Re: [whatwg] alternate ids for elements

2014-12-03 Thread Jukka K. Korpela


2014-12-03, 15:49, Julian Reschke wrote:


I have a use case where a certain location in a document can have two
anchors (or even more). For instance, in a spec, the author may have
specified an anchor, but a section-number based anchor is required as well.


Can you elaborate on that? Why cannot you use the same id attribute 
value in all references to an element?



How about a new attribute alt-ids which would take a space-separated
list of additional anchors?


What would be the use of such additional identifiers?

The only thing I can imagine right now is a situation where you have an 
existing id attribute and references to it all around but now need to 
refer from a context that imposes its own restrictions on the syntax. 
Say, you have id=παράδειγμα and you need to refer to the element using 
a URL like http://example.com/foo.html#παράδειγμα but cannot because 
the URL needs to be used in an environment where Greek letters cannot be 
used. But this sounds like a rather rare occasion.
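
In such an environment, one option that needs no new attribute is to percent-encode the fragment so that the URL is plain ASCII. A minimal sketch, assuming the browser percent-decodes the fragment before matching it against id values; the helper name asciiFragmentUrl is made up for illustration:

```javascript
// Hypothetical helper: append an id as a percent-encoded (ASCII-only) fragment.
function asciiFragmentUrl(pageUrl, id) {
  // encodeURIComponent emits the UTF-8 bytes of non-ASCII characters as %XX.
  return pageUrl + '#' + encodeURIComponent(id);
}

const url = asciiFragmentUrl('http://example.com/foo.html', 'παράδειγμα');
console.log(url);                                   // ASCII characters only
console.log(decodeURIComponent(url.split('#')[1])); // the original id string
```

The id attribute itself stays unchanged; only the reference is written differently.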


Yucca

Re: [whatwg] alternate ids for elements

2014-12-03 Thread Jukka K. Korpela


2014-12-03, 19:41, Smylers wrote:


The solution seems simple to me: Do not change the anchor id, ever.


But what if the original ID used had a typo in it?


Id attribute values are strings used for identification. Any “typo” in 
them is just part of the string.



Or a product name has to change for legal reasons?


That’s irrelevant. Id attribute values are not product names, or names 
at all. And I don’t think you are referring to any real case here.



It's entirely reasonable for anchors to be
‘meaningful-to-human’ IDs that are indicative of the section they are
labelling, and for section names to change over time.


Id attribute values are not meant to be understandable to humans. If you 
write them expecting them to be, you need to deal with the consequences. 
It is true that id attribute values may appear as part of URLs, but so 
do e.g. form field names and values, yet we are not worrying about 
setting aliases for them.



For instance, Wikipedia pages have an ID for each section which is based
on the section name. Every time somebody edits a section title, the
anchor changes ... and any external links specifically to that section
break.


If that is so, it is poor design and should be fixed at that level. The 
alternate id proposal would not help at all unless Wikipedia was changed 
to keep old id attribute values as alternate values. If you can persuade 
them to do that, try making them fix the original problem instead. It 
should be much simpler to make an id attribute value, once created, 
permanent than to introduce a new mechanism.



There are far too many broken links on the web of this form, where the
link goes to the correct page but includes a non-existent anchor.


The proposal would not help with that. Those links would remain broken. 
For any new content, which one is easier, to keep id attributes as they 
have once been assigned or to change them but turn old values to 
something to be appended to a list of alternate ids? Besides, all 
existing browsers would completely ignore the alternate id list, so it 
would take several years before they could be relied on.


Yucca

Re: [whatwg] Typed numeric 'input'

2014-08-05 Thread Jukka K. Korpela


2014-08-05 10:14, Christoph Päper wrote:


You do realize that font size control was just an example?


Well, maybe not quite; the original message discussed font size control 
in detail and did not mention other examples.



A combined
widget for number and unit would be useful in many places.


I would expect most applications to decide on a unit or (sub)multiple of 
unit for each quantity. To process the input, the value would normally 
need to be converted to use a specific unit anyway.



Although
most of us use metric units exclusively for almost all applications,
there are still a lot of scenarios where two or more units are
commonly used – even with the SI some may prefer centimetres over
millimetres sometimes (or vice versa).


The SI unit of distance is the meter. The centimeter and the millimeter 
are just submultiples of the meter. But I can see that it might be 
useful in some contexts to let the user decide which of these 
submultiples is used, e.g. when specifying dimensions of household 
equipment. This however sounds rather simple: have just one field for 
the number and a dropdown with “mm” and “cm” as alternatives. Even then, 
fixing the unit might actually be better usability (don’t force the user 
to make decisions if there is a reasonable way to avoid that).
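
The two-field approach with conversion to one fixed unit could look roughly like this. It is a sketch only: the function name toMillimeters and the choice of millimeters as the internal unit are assumptions made for the example.

```javascript
// Millimeters per supported unit; integer factors avoid multipliers like 0.01.
const MM_PER_UNIT = { mm: 1, cm: 10 };

// Hypothetical helper: combine the number field and the unit dropdown
// into a single value in millimeters.
function toMillimeters(numberText, unit) {
  if (!Object.prototype.hasOwnProperty.call(MM_PER_UNIT, unit)) {
    throw new RangeError('Unsupported unit: ' + unit);
  }
  const value = Number(numberText);
  // Number('') is 0, so empty input is rejected explicitly.
  if (numberText.trim() === '' || !Number.isFinite(value)) {
    throw new RangeError('Not a number: ' + numberText);
  }
  return value * MM_PER_UNIT[unit];
}

console.log(toMillimeters('25', 'cm'));  // 250
console.log(toMillimeters('250', 'mm')); // 250
```

Whatever the user picks in the dropdown, the rest of the application sees one unit only.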



In addition to the ways mentioned, the font size control could be
simply two buttons, one for increasing and one for decreasing the
size, (…)


This seems like a special cased ‘numeric’ or ‘range’ widget and is
agnostic of units.


It is, and probably therefore favored by many designers. It’s simple, 
often too simple, but I mentioned it just as a common example.



The designer needs to decide the internal representation of the
font size and to map the alternatives in the UI to that. I don’t
see how additions to HTML would significantly help here, even if
they happened to match the approach that is selected by the
designer.


The point is that some such approaches are possible already, but not
all. The simple possible solutions are rather clumsy and not very
user-friendly.


I don’t see anything clumsy with two fields, one for a number, another 
for a unit. If there is any clumsiness, it’s in the idea of making the 
user select the unit. There can be reasons to do so, of course, but in 
such special cases, the UI and the code implementing it need to be 
tuned according to the special requirements.



Every author could, of course, just parse all free user input from a
‘text’ input server-side, but why shouldn’t browsers sanitize such
input like they do for other form controls?


Because it is up to the designer to decide what the allowed formats are, 
how errors are handled, etc. The format is generally locale-dependent, 
and localization is poorly handled at present in HTML – it has a vague 
idea of using the system locale, the browser locale, the document 
language locale, or something else. This mess should be cleared up 
before new features requiring localizations are added.


To the extent that general code for handling such issues can be written, 
it should be in libraries and frameworks, rather than as constructs that 
browsers are required to implement.


Yucca

Re: [whatwg] Typed numeric 'input'

2014-08-04 Thread Jukka K. Korpela


2014-08-04 20:06, Christoph Päper wrote:


Imagine a text layout GUI made with HTML.

 It would probably feature a font size selection control.

There are different ways to do such a thing:


There are, and they are preferred in different ways by different people, 
as programmers or as end users. This is why any solution, in addition to 
introducing considerable complexity into HTML, would be used for a small 
fraction of potential use cases only.


In addition to the ways mentioned, the font size control could be simply 
two buttons, one for increasing and one for decreasing the size, 
possibly with keyboard shortcuts and possibly a button and/or shortcut 
for resetting the size to the default.


And many others.


As I said, I’m not sure what’s the best solution, but HTML should have a 
typed numeric input element with conversion capabilities.


The designer needs to decide the internal representation of the font 
size and to map the alternatives in the UI to that. I don’t see how 
additions to HTML would significantly help here, even if they happened 
to match the approach that is selected by the designer.


Yucca

Re: [whatwg] Maximum value needed for tabindex

2014-07-24 Thread Jukka K. Korpela


2014-07-24 8:34, Boris Zbarsky wrote:


On 7/24/14, 1:29 AM, Jukka K. Korpela wrote:

However, browsers actually impose an upper limit of 32767



In Chrome and Firefox, values larger than this are interpreted as 0.


In the case of Firefox, this was a bug, that was fixed a few months ago.
  See https://bugzilla.mozilla.org/show_bug.cgi?id=996095


I’m afraid the fix does not work. Testing the jsfiddle code there,
http://jsfiddle.net/tatesn/hVv72/
in the newest Firefox (31.0, on Win 7), the “Click here” link, with 
tabindex=4, and the input element after it, with 
tabindex=27, are not in the tabbing order at all, and the tabIndex 
property value is 32767. This is odd because tabindex=32767 as such 
works OK.


My observation on larger values being taken as 0 was based on my initial 
testing with very large values (outside Int32 range).


In Chrome, the elements are in the tabbing order, but if their tabindex 
attributes are swapped, the order stays the same, i.e. follows the 
textual order. This is natural since tabIndex property value is 32767 
for both.
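
The effect can be modelled in a few lines. This is only a sketch of the observed behavior, not browser code; the clamping function and the sample values 40000 and 50000 are assumptions chosen for illustration.

```javascript
const MAX_TABINDEX = 32767; // upper limit reported in the text

// Model of the clamping described above; an assumption, not browser source code.
function effectiveTabIndex(value) {
  return Math.min(value, MAX_TABINDEX);
}

// elements: array of { name, tabindex } in source order, tabindex >= 0.
function tabbingOrder(elements) {
  const indexed = elements.map((el, source) => ({
    name: el.name,
    source,
    effective: effectiveTabIndex(el.tabindex),
  }));
  // Positive values come first in ascending order; ties fall back to source order.
  const positive = indexed
    .filter((el) => el.effective > 0)
    .sort((a, b) => a.effective - b.effective || a.source - b.source);
  // Elements with tabindex 0 follow, in source order.
  const zero = indexed.filter((el) => el.effective === 0);
  return positive.concat(zero).map((el) => el.name);
}

console.log(tabbingOrder([{ name: 'link', tabindex: 40000 }, { name: 'input', tabindex: 50000 }]));
console.log(tabbingOrder([{ name: 'link', tabindex: 50000 }, { name: 'input', tabindex: 40000 }]));
// Both calls give [ 'link', 'input' ]: swapping the out-of-range values changes nothing.
```

With values inside the range, the same function orders the elements by their tabindex values as expected.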



1) Keep tabindex unlimited and try to make browsers implement this.


This is what we should do, in my totally biased opinion.


Even in the best case, it would take several years before the usage 
share of all current browser versions is small enough.


Are there any use cases for tabindex values greater than 32767? 
Presumably not real use now (since such values do not work), but are 
there reasonably imaginable use cases?


Yucca

Re: [whatwg] Maximum value needed for tabindex

2014-07-24 Thread Jukka K. Korpela


2014-07-24 17:24, Boris Zbarsky wrote:


Are there any use cases for tabindex values greater than 32767?


We've seen use cases for forms with that many form controls (large forms
parts of which get conditionally shown/hidden based on values filled in
for some of the controls), so I would think so, yes.


So how have authors handled the issue in the current situation where 
browsers fail to support tabindex values > 32767 and do that inconsistently?


Having that many controls is rather exceptional, but there's more: 
tabindex attributes are needed only when you need to make the order of 
controls in source different from the tabbing order.


Yucca

[whatwg] Maximum value needed for tabindex

2014-07-23 Thread Jukka K. Korpela

The tabindex attribute is defined so that its value must be a valid 
integer. No other restrictions are currently imposed.


However, browsers actually impose an upper limit of 32767 (which is in 
accordance with HTML 4.01:

http://www.w3.org/TR/html401/interact/forms.html#adef-tabindex ).

In Chrome and Firefox, values larger than this are interpreted as 0. 
This can be seen by setting an attribute with larger value and 
displaying the value of the tabIndex property of the element node.


This means that if you try to use tabindex=5, it seems to work 
(since elements with tabindex=0 are placed after elements with tabindex 
in the accepted range), but if you then add tabindex=4, its position 
relative to the element with tabindex=5 is determined by source code 
order, not by the value you've written.


IE is worse. It maps a tabindex attribute value of 32768 to 0 and larger 
values apparently all to negative values (so that they do not appear in 
tabbing order at all).


There are two ways to deal with this:
1) Keep tabindex unlimited and try to make browsers implement this.
2) Specify an upper limit of 32767.

Option 1 sounds unrealistic, and it would take a long time. Moreover, it 
is difficult to imagine a situation where tabindex values larger than 
32767 would be needed. Authors may be using values like 100, 200 etc. to 
allow insertion of elements with in-between values, i.e. adding elements 
to the tabbing order without changing the numbering. But even then, 
32767 should suffice for all practical needs.


If option 2 is taken, there is the question whether error processing 
rules should be defined, i.e. whether browsers would be required to 
handle values larger than 32767 in a specific way. Perhaps not, because 
that could carry a wrong message; any defined error handling can be 
taken to mean that authors can rely on it.


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] Supporting more address levels in autocomplete

2014-02-24 Thread Jukka K. Korpela


2014-02-22 3:03, Ian Hickson wrote:


(Note that a lot of people in the UK have no idea how to write their
address according to current standards. For example, people often include
the county, give the real town rather than the post town, put things
out of order, indent each line of the address, etc.)


The phenomenon is probably not limited to the UK. Few people even know 
the current standards (national and international).


I think it would be more important to have the option of using fewer 
address levels, rather than more.


Some fine-grained control for naming different components of an address 
is undoubtedly useful at times. It would be even more useful to have a 
common, standard name for just an address. That is, whatever someone 
wants the sender to put in an envelope. Its internal structure does not 
matter, as long as it works, and as usual, it is up to the recipient to 
specify the address in a manner that works.


Forms that require the user to split his address into small pieces may 
have their reasons. But if you just want to have an address to send 
stuff to, then all you need is a working postal address. A textarea 
with, say, name=postal, if used on different pages, would then let the 
user enter his entire address very simply, after just once typing it.


Probably postal should be specified so that it relates to a postal 
address that is complete for delivery except the recipient name. The 
reason is that the name is so often asked separately.


Yucca

Re: [whatwg] input type=number for year input

2014-02-19 Thread Jukka K. Korpela


2014-02-19 11:10, Smylers wrote:


Jukka K. Korpela writes:


The point is that year numbers aren't really numbers in a normal
sense, any more than car plate numbers, credit card numbers, product
numbers, or social security numbers are. Surely they can be regarded
as numbers, but so can car plate numbers and the others.


Except that years do actually form a sequence, and it's possible to
perform maths on them; for instance, subtracting one year from another
yields a duration.


Mathematically, you are right, but input types aren't based on general 
properties of quantities but on practical classification of input data. 
All the examples I gave, including year numbers, are normally input by 
typing the digits - in contrast with, say, using a color picker, a date 
picker, or a slider. And year numbers differ, as mentioned, from normal 
numbers as regards conventional formats (e.g., 2014 vs. 2,014 or 
2'014 or 2 014 or...).


So in the input process, a year number is not treated like a number. It 
typically appears when asking for year of birth or some other event 
(marriage, employment, etc.). The input check is normally against any 
non-digit data, the kind of thing we can do with pattern=...


Logically, one might say that since asking for a year is very often an 
alternative to asking for more specific data such as month or day, it 
should be treated as date and time input rather than text input with 
restrictions. But I don't see how this would be practically relevant. 
What else could input type=year be other than reading some digits? 
There is the possibility of allowing two-digit numbers, with an implied 
century, but if that is desirable, authors can use input type=text 
pattern=\d{4}|\d{2} and deal with the implied century in their own code.
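
A sketch of what "their own code" might be. The pattern is the one given above; the rule for the implied century (a pivot at 50) and the name parseYear are assumptions, and an application would pick its own rule.

```javascript
// The HTML pattern attribute must match the whole value, hence ^(?:...)$ here.
const YEAR_PATTERN = /^(?:\d{4}|\d{2})$/;

// Hypothetical rule: two-digit years below `pivot` are taken to be in the
// 2000s, all other two-digit years in the 1900s.
function parseYear(text, pivot = 50) {
  if (!YEAR_PATTERN.test(text)) return null; // reject anything but 2 or 4 digits
  const n = Number(text);
  if (text.length === 4) return n;
  return n < pivot ? 2000 + n : 1900 + n;
}

console.log(parseYear('2014'));  // 2014
console.log(parseYear('14'));    // 2014
console.log(parseYear('85'));    // 1985
console.log(parseYear('2,014')); // null
```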


Yucca

Re: [whatwg] input type=number for year input

2014-02-18 Thread Jukka K. Korpela


2014-02-19 1:59, Ian Hickson wrote:


I would be interested in hearing more about the locales where not using
separators even for four digits is bad/suboptimal.


It would break a few national standards on number representation.

The point is that year numbers aren't really numbers in a normal 
sense, any more than car plate numbers, credit card numbers, product 
numbers, or social security numbers are. Surely they can be regarded as 
numbers, but so can car plate numbers and the others.


Breaking standards or practices documented at CLDR just because some 
4-digit number might be a year number sounds rather arbitrary.


It would be simpler to just say that input type=number is meant for 
normal numeric input where some locale definitions are supposed to 
apply. Using such an element normally makes sense only when we can 
expect the actual user input to be most often close to the initial value 
provided, so that the up and down functions make sense. For simply 
checking that the input is a digit sequence, input type=text 
pattern= is much more natural.


Yucca

Re: [whatwg] input type=number for year input

2014-02-18 Thread Jukka K. Korpela


2014-02-19 2:30, Michael[tm] Smith wrote:


The following info seems relevant -

   http://www.thepunctuationguide.com/comma.html#numbers
   Most authorities, including The Associated Press Stylebook and The Chicago
   Manual of Style, recommend a comma after the first digit of a four-digit
   number. The exceptions include years, page numbers, and street addresses.


Similar rules apply to other languages as well. Generally, we should 
expect implementations to apply documented locale-specific rules (for 
some locale determined somehow). There are different grouping rules, 
though; not all locales use groups of three digits. Anyway, we should 
expect a 4-digit number to be grouped, with some group separator, rather 
often.



To me that appears to be a strong argument that formatting of years is in
fact clearly an exception, and that's compelling enough to warrant having a
type for them separate from the normal number type (in which four-digit
numbers would instead have a separator, to follow existing longstanding
conventions).


And what about page numbers and street addresses (and other exceptions)? 
If we have input type=year, then it would be rather odd to use it for 
reading a page number.


Most importantly, though, this would introduce yet another value for the 
type attribute for something that can well be handled with existing 
tools: input pattern=\d{4}. It is improbable that any year selection 
widget would be useful. Years are normally best entered by typing them.


On the other hand, as this is about input, not output, a simple 
additional rule (which has other usability benefits, too) would solve 
the issue, too: User agents may allow locale-specific group separators 
in a number (e.g., “1,500” when the locale is English), but they shall 
accept a number without group separators, too (e.g., “1500”, in any locale).
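
As a rough illustration of that rule, the sketch below accepts an integer either with or without group separators. It assumes groups of three digits, which does not hold for all locales, and the function name parseGroupedInteger is made up for the example.

```javascript
// Hypothetical helper: accept "1500" as well as "1,500" (groups of three assumed).
function parseGroupedInteger(text, groupSeparator = ',') {
  // Escape the separator so that characters like "." are taken literally.
  const escaped = groupSeparator.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const grouped = new RegExp(`^\\d{1,3}(?:${escaped}\\d{3})+$`);
  if (/^\d+$/.test(text)) return Number(text); // no separators at all
  if (grouped.test(text)) return Number(text.split(groupSeparator).join(''));
  return null; // anything else is rejected
}

console.log(parseGroupedInteger('1,500'));      // 1500
console.log(parseGroupedInteger('1500'));       // 1500
console.log(parseGroupedInteger('2 014', ' ')); // 2014
console.log(parseGroupedInteger('1,50'));       // null
```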


Yucca

Re: [whatwg] OUTPUT tag: clarify purpose in spec?

2014-01-24 Thread Jukka K. Korpela


2014-01-22 2:28, Ian Hickson wrote:


On Tue, 3 Dec 2013, Jukka K. Korpela wrote:

[...]

Thank you for the clarifications. I may have been stuck to an idea of a
submittable element, possibly adopted from some earlier version or
proposal. I think an explicit short note like "The output element is not
submittable" would be useful.


I am reluctant to add that kind of comment for a couple of reasons. First,
there's the problem of determining when one would add these notes. Should
the spec be explicit about everything it doesn't say?


No, but it should be explicit about things that could easily be 
misunderstood.



Second, it can lead readers to assume that anything that the spec doesn't
explicitly call out as not being true is in fact true.


Readers who wish to think so may think so anyway. I don't see how this 
could be a serious risk.



What I would rather do is clarify whatever led to the confusion in the
first place. Do you have any idea what it is in the output section that
might lead you to think that it would be submittable?


Well, it is under the heading "4.10 Forms". As an element for the result 
of some scripted operation (which output seems to be meant for), 
output need not have anything to do with forms. But when it is under 
"Forms", a natural idea is "oh, this is for some computed value, like a 
total, to be submitted".



(A submittable output element would be a natural thing to have in many
cases, e.g. in showing some calculated total to the user and submitting
it along with form data, for checking purposes.)


Can you elaborate on this use case? I'm not sure how it would work.


When you calculate the total with JavaScript, mainly to be shown to the 
user, you might as well submit it along with the form, as an extra 
check. If it does not match the total calculated in the server, 
something went very wrong. What you do then is a different question, but 
the important thing is that you detect a problem, instead of charging an 
amount that differs from what the user saw.
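
The server-side part of such a check could be sketched as follows. The data shape (prices in cents, an items array) and the function names are assumptions for the example, not something taken from the spec.

```javascript
// Recompute the total on the server from the submitted items (prices in cents).
function computeTotalCents(items) {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

// Hypothetical check: compare the total the user saw with the server's own result.
function checkSubmittedTotal(items, submittedTotalCents) {
  const serverTotalCents = computeTotalCents(items);
  return { ok: serverTotalCents === submittedTotalCents, serverTotalCents };
}

const items = [
  { unitPriceCents: 1250, quantity: 2 },
  { unitPriceCents: 399, quantity: 1 },
];
console.log(checkSubmittedTotal(items, 2899)); // { ok: true, serverTotalCents: 2899 }
console.log(checkSubmittedTotal(items, 2500)); // { ok: false, serverTotalCents: 2899 }
```

The server never trusts the submitted total; it only uses it to detect that something went wrong.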



The main reason for not submitting it so far has been that it would risk
authors relying on the client's computation and thus not doing it on the
server,


Authors often rely too much on checks and computations made client-side 
- including new features like @pattern and @required attributes and new 
values of the @type attribute. They have always been able to do that 
with calculated totals, for example - just using an input element 
(possibly with @readonly).



I think the definition of the @name content attribute needs revision. It
now says: "Name of form control to use for form submission and in the
form.elements API." Apparently, "form submission" should be omitted.


Aah, interesting. Yeah, that's confusing. The attribute is a generic one
used by multiple elements for both those purposes, but in the case of
output and fieldset, it can never be used for form submission, since
those aren't submittable, so it should use a different description.

Fixed.


The single-page version now has "Name of form control to use in the 
form.elements API", but the multi-page version still has the old 
formulation.



Without name=, the main purpose of output -- making it easy to update
non-form-control values in script -- is lost.


The @name attribute in general, except for submittable controls, is 
legacy markup that has caused much confusion. It was introduced long 
ago, before @id was added to HTML, for scripting purposes, on @img and 
@form, as well as on @a for link destinations, but it was unsuitable 
from the beginning. It was not defined to be unique in the document, and 
there have been many attempts to phase out/deprecate/obsolete @name 
(except for submittable fields, where it need not be unique).


So it looks a bit odd to introduce @name for a new element.


Consider what this would look like without the form.elements API:

   <form name=main>
    Result: <output name=result></output>
    <script>
     document.forms.main.elements.result.value = 'Hello World';
    </script>
   </form>


With <output id=result></output>, it would have

document.getElementById('result').value = 'Hello World'

and if jQuery is used (and more than half of the world uses it, or 
something similar), it would have


$('#result').text('Hello World')

I would say that both ways are simpler than the property chain 
document.forms.main.elements.result.value and, moreover, a way that can 
be used to access any element, not just output.



Well, more or less by definition, if output is appropriate for
something, it's more appropriate than span would be, since span is
more generic. span is like the fall back element, it has essentially
no semantics at all.


That's a rather theoretical proposition. You say that output is for a 
result of a calculation or user action and call this semantics. But how 
would that be a tangible benefit?



I think the improvement of o relative to document.getElementById('o')
should be self-evident;


If you intend to use plain o instead of a property

Re: [whatwg] Should ambiguous ampersand be a parse error?

2013-12-10 Thread Jukka K. Korpela


2013-12-10 19:45, Boris Zbarsky wrote:


On 12/10/13 11:11 AM, Peter Cashin wrote:

The HTML5 spec says that an ambiguous ampersand (e.g. &something;
undefined) is not allowed in element content


Right, that's an authoring requirement.


Authoring requirements as such are just policy statements, therefore 
regularly ignored. They are supposed to communicate something, but as 
the late prof. Wiio so wisely stated, communication usually fails, 
except by accident (and he was an optimist).



There is no throwing of parse errors in the HTML spec.


Well, yes, throwing belongs to the DOM and to scripting. The question is 
whether some construct is parsed in a particular way or not.



Is the specification intended to have compliant HTML agents stop
parsing ambiguous ampersands?


Compliant HTML agents are allowed to do so, I guess, per the technical
rules about parse errors, just like for any other parse error.  But I
expect that this is at least partly for conformance classes other than
browsers; all browsers press on through parse errors in HTML.  Maybe
the allowed behavior for parse errors should be made conditional on
conformance class...


Allowing user agents to stop parsing after a parse error (BTW, where 
exactly does the WHATWG HTML Living Standard allow that?) is really just 
avoidance. If browsers actually apply some specific error recovery, 
what’s the excuse for not making that mandatory?


Different user agents can really do very different things. But I don’t 
think it’s a good idea to make that a rule of *parsing HTML*.


Yucca

Re: [whatwg] Should ambiguous ampersand be a parse error?

2013-12-10 Thread Jukka K. Korpela


2013-12-10 22:20, Boris Zbarsky wrote:


In this case, it's an eminently validator-enforceable authoring
requirement.


That’s more or less a wannabe-normative requirement that “validators” 
are supposed to enforce. There is no real HTML5 validator so far (not 
surprising, as there is no HTML5), but the point is that nobody who does 
not use a “validator” will see the requirement as “enforced”.



Allowing user agents to stop parsing after a parse error (BTW, where
exactly does the WHATWG HTML Living Standard allow that?)


Did you try following the links in my mail?  Let me try again, but this
time do actually follow the link:
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parse-error


“This section only applies to user agents, data mining tools, and 
conformance checkers.” So what about conformance of documents?


If browsers are allowed to quit, or to proceed, then this is a very 
theoretical proposition. Technically, it does not define document 
conformance, does it?


Yucca

Re: [whatwg] OUTPUT tag: clarify purpose in spec?

2013-12-03 Thread Jukka K. Korpela

2013-12-03 2:24, Ian Hickson wrote:

On Thu, 26 Sep 2013, Jukka K. Korpela wrote:

2013-09-26 21:41, Ian Hickson wrote:

There's a lot of output examples in the spec; do they help at all?

There are indeed several examples, but they are scattered around; the
section that specifically deals with the output element, 4.10.15, has
only one example.

I've added a second.

I can't find it - I just see the calculator example, at
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-output-element

output elements are never submitted, actually. They're not
submittable.

Thank you for the clarifications. I may have been stuck to an idea of a
submittable element, possibly adopted from some earlier version or
proposal. I think an explicit short note like "The output element is not
submittable" would be useful.

(A submittable output element would be a natural thing to have in many
cases, e.g. in showing some calculated total to the user and submitting
it along with form data, for checking purposes.)

I think the definition of the @name content attribute needs revision. It
now says: "Name of form control to use for form submission and in the
form.elements API." Apparently, "form submission" should be omitted. And I
think it would be better to drop the @name attribute entirely; if a page
uses it in output, it's probably a mistake (the author assumes that
output is submittable).

The question then arises why output is used, instead of just showing
the result in a span or div element as usual.

Indeed. Often the benefit to using a more appropriate element rather than
just using span everywhere is not immediately obvious.

I don't quite see why output would be more appropriate.

In the particular case of the calculator example, the main benefit is that
the snippets of script become much simpler:

oninput="o.value = a.valueAsNumber + b.valueAsNumber"

...rather than:

oninput="document.getElementById('o').textArea = a.valueAsNumber +
b.valueAsNumber"

I suppose you mean .textContent instead of .textArea.

References like document.getElementById('o') or their jQuery counterpart
$('#o') are extremely common, so why bother simplifying things in a very
specific case? And anyone who does not like the length of
document.getElementById() and does not want to load jQuery can write his
own function for the purpose.
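
Such a function is a one-liner. In the sketch below, the name byId is arbitrary, and the optional second parameter is an addition for the example so that the function can also be tried outside a browser.

```javascript
// Hypothetical shorthand for document.getElementById.
// `doc` defaults to the page's document when run in a browser.
function byId(id, doc = globalThis.document) {
  return doc.getElementById(id);
}

// Usage in a page: byId('o').textContent = a.valueAsNumber + b.valueAsNumber;

// Stand-in document for trying the function outside a browser.
const fakeDocument = {
  elements: { o: { textContent: '' } },
  getElementById(id) {
    return this.elements[id] || null;
  },
};
byId('o', fakeDocument).textContent = String(2 + 2);
console.log(fakeDocument.elements.o.textContent); // 4
```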

I think it is unnecessary to have an element for output when this means
that writing to the element is different from normal manipulation of
elements (via document.getElementById() or via
document.getElementsByTagName() or other general methods).

The output element represents the result of a calculation or user action.
That's what the spec says. I'm not sure what more you want it to say.

Well, what it really means. Is <output>4</output> OK just because I got
4 from calculating 2 + 2? You contrasted output with samp, which
clarified this to some extent, but there is no statement like that in
the description. So shouldn't calculation be clarified by saying that
it is a calculation performed on the page, i.e. the result of executing
some client-side script? This would probably cover user action too -
it now looks odd, since the element content is not supposed to change
directly due to user action, the way e.g. input type=text works.

I still don't quite see *why* output has been introduced. I can
understand it as a purely logical creation, but what is the practical
gain expected to be?

The main practical gain is that it makes outputting data from script in a
form easier, since output is a listed form-associated element.

That statement, in some formulation, might be a useful addition to the
description of output.

I think I understand the idea now, but readers of the spec will probably
have a hard time getting it without some clarifications.

I don't find output useful in outputting data from a script, since it
requires a special approach for something that can well be handled using
a general approach, and compactness of code is not that relevant,
especially if it makes the code less readable.

I think the benefits of output do not justify the added complexity it
brings into the language and the time that would be spent by authors,
trying to understand the concept and to decide whether to use output
or span or input or something else for results of computation.

P.S. I haven't seen a description of what the @for attribute of output
might be useful for. Presumably, it is meant to act as a documentation
tool, with some automated checking by validators (they check that the
referenced @id attributes exist in the document). If this is relevant,
the same can be achieved without a dedicated element, e.g. by adding a
general attribute @from that specifies that the content of the element
will (normally) be changed by a script that uses certain other elements
(listed in the attribute value) as data.

Yucca

Re: [whatwg] Proposal: Locale Preferences API

2013-11-27 Thread Jukka K. Korpela


2013-11-28 0:20, Boris Zbarsky wrote:


On 11/27/13 4:28 PM, Jungshik Shin (신정식, 申政湜) wrote:

That is, I suggest that 'navigator.language' always be the UI language
of a
web browser.


That's an unacceptable privacy leak from Mozilla's point of view.  See
https://bugzilla.mozilla.org/show_bug.cgi?id=55366 where we explicitly
switched from that to basing navigator.language on the Accept header.


More importantly, I would say, the browser’s UI language should normally 
be completely irrelevant to page design and implementation.


I might be using an English-language browser because there is no better 
option (localizations are lousy). This does not mean that when viewing a 
page in, say, German, I would want the page to talk to me in English, to 
use English-language month names in date controls and info, etc.
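
As an illustration of the point about month names, the following sketch formats a date according to the language of the page content rather than the UI language of the browser. The function name monthName and the idea of passing the page language in as a plain string are assumptions made for the example; Intl.DateTimeFormat is the standard ECMAScript internationalization API, and the output depends on the locale data shipped with the JavaScript engine.

```javascript
// Format the month name in the language of the page content.
// "pageLang" would typically come from the lang attribute of the document;
// here it is simply a parameter (an assumption for this sketch).
function monthName(date, pageLang) {
  return new Intl.DateTimeFormat(pageLang, {
    month: "long",
    timeZone: "UTC", // fixed time zone so the result does not depend on the host
  }).format(date);
}

const march = new Date(Date.UTC(2013, 2, 15));
const inGerman = monthName(march, "de");  // month name suited to a German page
const inEnglish = monthName(march, "en"); // what a UI-language approach might yield
```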


Yucca

Re: [whatwg] Add input Switch Type

2013-11-19 Thread Jukka K. Korpela


2013-11-19 16:25, Domenic Denicola wrote:




I agree that the look and feel is different from checkbox but all
the differences seem to be purely presentational. If you disagree,
you need to elaborate a bit more.


Interestingly, Microsoft's Windows Store apps guidelines disagree. I
find their reasoning somewhat compelling, although novel:

http://msdn.microsoft.com/en-us/library/windows/apps/hh465475.aspx

Use a toggle switch for binary settings when changes become
effective immediately after the user changes them.

Use a checkbox when the user has to perform extra steps for changes
to be effective.


From the usability and accessibility point of view, this seems to 
address an important issue. Authors sometimes use checkboxes (or radio 
buttons) so that changing their state has an immediate effect, even 
submitting a form. This may violate normal user expectations and can be 
confusing. Normally, we enter some data, using various controls, and 
then click on a button (or do something equivalent) to request an 
action. Checking a checkbox should not be a commitment, any more than 
typing text in a feedback form or selecting an item from a dropdown list 
in an order form should be a commitment.


This means that things that have immediate effect should be buttons, or 
something else recognized as an action-triggering control. So why not use 
a button? Maybe because a button does not normally have a visible state. 
A toggle switch would thus logically be a combination of a checkbox and 
a button: it has a direct effect, like a button, but it remains visible 
(or otherwise perceivable) in an on or off state, like a checkbox. And 
it should probably have a dual ARIA role: role="checkbox button".


But maybe this means looking at things in a too narrow perspective, as 
if controls were only used in forms that submit data to a server. A 
purely application-like page may conceivably have checkboxes and radio 
buttons that have immediate effects (say, so that in an image processing 
application, checking a checkbox immediately turns the image to 
grayscale). Checkboxes probably wouldn’t confuse a user who knows at all 
what he is using. On the other hand, toggles could be used, too. Maybe 
even better than checkboxes.


Yucca

Re: [whatwg] Add input Switch Type

2013-11-19 Thread Jukka K. Korpela


2013-11-19 22:27, Qebui Nehebkau wrote:


A checkbox represents an input with
binary state. As I understand it, whether the input is immediate or
takes effect only on some kind of submission is defined by context -
specifically, whether the checkbox is associated with a form with a
submit button.


This more or less summarizes the alternative look at the issue that I 
mentioned. But I’m still inclined to think that distinguishing 
between checkboxes and switches, or giving authors the possibility of 
making the distinction at the level of control elements, is a useful 
thing to do. It’s not too late to introduce it. Most pages still use 
checkboxes just as selections, selecting options for some action to be 
requested shortly. (The select element may be a lost cause: as a user, 
you can’t know whether a dropdown just sets an option or actually “runs” 
it.)



In contrast, a button represents a single action, atomic from the
user's point of view. Pressing the button again should (it seems to
me) logically perform the same action again;


It would be too restrictive to require that, and in reality, things 
don’t work that way. For example, if the action consists of deleting 
something, you just can’t repeat it next.


Yucca

Re: [whatwg] imgset responsive imgs proposition (Re: The src-N proposal)

2013-11-12 Thread Jukka K. Korpela


2013-11-12 9:58, Adam Barth wrote:


Unfortunately, we can't add new tags to head.  If the parser sees a
tag it doesn't recognize in the head, it creates a fake body tag and
pushes the tag down into the body.


But you could use <style type="text/foobar">...</style>, with a suitable 
value for foobar, like x-imgset. This could even be handled with a 
polyfill in old browsers (JavaScript code that reads such elements and 
interprets their content).
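
The first step of such a polyfill could be sketched as follows. The type value text/x-imgset follows the example above; the function name is hypothetical, and what the collected text would contain is left open, since no syntax has been defined.

```javascript
// Collect the raw text of every <style type="text/x-imgset"> element.
// "doc" is any object offering querySelectorAll(), such as document.
// The function name is made up for this sketch.
function collectImgsetSheets(doc) {
  const nodes = doc.querySelectorAll('style[type="text/x-imgset"]');
  // Browsers do not apply style elements with an unknown type as CSS,
  // but their text content stays available to scripts.
  return Array.from(nodes, (node) => node.textContent);
}
```

In a page this would be called as collectImgsetSheets(document), after which the script would interpret the returned strings.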


Yucca
(who would still prefer text/css)

Re: [whatwg] imgset responsive imgs proposition (Re: The src-N proposal)

2013-11-12 Thread Jukka K. Korpela


2013-11-12 17:52, Tab Atkins Jr. wrote:


No, we can't gate any of the major use-cases behind a time barrier
(waiting for external CSS to come in) like that.


Why does it need to be *external* CSS? Surely external style sheets are 
generally preferred, but if you want inline code, what is the problem 
with a style element? It’s less inline than img tag attributes, and 
it appears in the head, so you can process it even before you start 
parsing img tags.


Yucca

Re: [whatwg] Viewing situations - Re: The src-N proposal

2013-11-10 Thread Jukka K. Korpela


2013-11-10 19:36, Markus Ernst wrote:


Having a look at the proposal, and reading the thread especially with
regard to complexity and verbosity, I got the impression that @src-n
shares a main objection with @srcset and picture, that it mixes up
content and design to some extent.


That would be my main concern, too. But I would rather say that it 
really mixes up content and presentation, moving to HTML something that 
belongs to the scope of styling and can currently be handled in CSS.



Thus I suggest a modified approach which moves the distinction of
viewing situations, or breakpoints, to the head of the document,
creating some variable-like references to be used instead of numbers.
Some kind of src-var instead of src-N. Therefore, a new element for the
head would be necessary; I call it situations, it could also be
breakpoints or whatever is considered more appropriate:


Adding new elements is questionable, especially if they have content 
(which would be rendered as-is by current user agents).



<head>
<situations>
   small: (max-width: 400px);
   small2x: (max-width: 400px) 2x;
   medium: (max-width: 1000px);
   medium2x: (max-width: 1000px) 2x; and
   large2x: (min-width: 1000.01px) 2x;
</situations>
</head>
<body>
   <img src-small="pic-small.jpg"
        src-small2x="pic-medium.jpg"
        src-medium="pic-medium.jpg"
        src-medium2x="pic-large.jpg"
        src-large2x="pic-x-large.jpg"
        src="pic-large.jpg"
        alt="Obama talking to a soldier in hospital scrubs.">


This, too, would mix content and presentation. Admittedly, the line 
between them isn't always crystal clear, even if most of HTML5 pretends 
that it is. But here the approach should let an author specify an img 
element in markup and separately specify, in a style sheet language, 
that in some cases the src attribute value is to be overridden.


If people think that current CSS media queries are inadequate for the 
purpose (and I'm not convinced that they are), then the first question 
should be whether CSS can be suitably enhanced. Failing that, it would 
seem natural to define a new, restricted style language. Something like 
this:


<style type="text/is">
@media(max-width: 400px) { #pic { src: url(pic-small.jpg); } }
...
</style>
...
<img id="pic" src="pic-large.jpg" alt=""
title="Obama talking to a soldier in hospital scrubs.">

If the problem with current CSS approach is that browsers would still 
download the resource pointed to by the src attribute, then the 
processing of IS (image styling) style sheets could be defined so that 
they are evaluated when encountered and they are applied according to 
the browsing situation, so that they will take effect when the img 
elements are processed. This would be no more adhockery than src-* 
attributes would be. Actually, less.
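
How a browser might resolve the image URL under such a scheme can be sketched as below. Everything here is hypothetical: the rule list stands in for an already parsed IS style sheet, the second rule is invented for illustration, and "first matching rule wins" is a simplification rather than a proposal for the actual cascade.

```javascript
// Hypothetical rule list, standing in for a parsed "IS" style sheet such as
//   @media(max-width: 400px) { #pic { src: url(pic-small.jpg); } }
// The second rule is invented for this sketch.
const rules = [
  { maxWidth: 400, selectorId: "pic", src: "pic-small.jpg" },
  { maxWidth: 1000, selectorId: "pic", src: "pic-medium.jpg" },
];

// Return the URL to fetch for an img element, given the viewport width.
// The first matching rule wins; otherwise the src attribute from the markup is used.
function resolveSrc(imgId, markupSrc, viewportWidth, ruleList) {
  for (const rule of ruleList) {
    if (rule.selectorId === imgId && viewportWidth <= rule.maxWidth) {
      return rule.src;
    }
  }
  return markupSrc;
}
```

Since the rules are known before the img element is parsed, the resource named in the src attribute need not be downloaded when a rule overrides it.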


Yucca

Re: [whatwg] OUTPUT tag: clarify purpose in spec?

2013-09-26 Thread Jukka K. Korpela


2013-09-26 21:41, Ian Hickson wrote:


There's a lot of output examples in the spec; do they help at all?


There are indeed several examples, but they are scattered around; the 
section that specifically deals with the output element, 4.10.15, has 
only one example.


It is a simple calculator that shows the calculated result in an 
output element. And it is a form with no action attribute and with 
onsubmit="return false", so it is clearly meant to work in the browser 
only. That is, the value of the output element is not submitted.


The question then arises why output is used, instead of just showing 
the result in a span or div element as usual. In fact, none of the 
examples about output has any apparent association with any submission 
to server-side processing.


Yet, from the properties defined for output, the whole point seems to 
be that the output element has a special purpose: it is a control, 
with a value that may be included in form data upon submission, but its 
value is not meant to be changed by the user directly, only via actions 
that may indirectly modify it. Simultaneously, it is normally visible to 
the user.


As I see it, the difference between output and a readonly input is 
that the latter is not meant to be changed by the user *at all*, whereas 
output is not to be changed *directly*.


If this interpretation is correct, I think some of it should be somehow 
expressed in the spec, and there should be at least one example where 
output is seemingly participating in form data submission.


It's of course too late to change the name output now, but it is 
really misleading, since it suggests that the element is just for output 
(possibly even suggesting that it's really a duplicate of samp!). Yet 
it seems that it is primarily for computed data (in a broad sense of 
“computed”) to be submitted, though it can, like input, be used 
without submission too. So computed or input type=computed might 
have been better. I mention this because this name problem emphasizes 
the need for explaining what the element is really for.


I still don't quite see *why* output has been introduced. I can 
understand it as a purely logical creation, but what is the practical 
gain expected to be?


Yucca

Re: [whatwg] @generator-unable-to-provide-required-alt, figure with figcaption

2013-09-03 Thread Jukka K. Korpela


2013-09-04 0:09, Ian Hickson wrote:


To a user, even “(an image)” is better than lack of alt attribute


I disagree. The lack of an alt attribute can be used by user agents to
substitute the string (an image), in which case it is the same, or it
can be used to do far more, e.g. image recognition, OCR, etc. This isn't
academic, these technologies exist today.


There is nothing that makes that impossible, or more 
difficult, if the element has an alt attribute. If you mean that 
programs would actually do such things if and only if the alt attribute 
is absent, then this is very speculative. Let’s worry about that when 
browsers are actually capable and willing to do such things at all.


There is an essential difference between lack of an alt attribute and a 
more or less generic value used for it, as in alt=(an image) or in 
alt=(image: horse5) (automatically generated e.g. from an image URL 
that ends with horse5.png) or in alt=(photo of Hixie). Lack of the alt 
attribute says absolutely nothing about the image; it might represent a 
word as an image, or be pure decoration, or be so complicated that 
writing a textual alternative would be a major challenge in content 
production.


Someone who hears, say, “image – horse five” at least gets some idea of 
what the image is about, and even “an image” as opposed to whatever a 
speech browser says about <img ... alt=""> is an improvement: the user 
can know that the author tried to find a textual replacement for the 
image but couldn't.



To the non-validator user agent, that attribute means nothing. It's a
non-conforming attribute with no semantics to any software outside content
generators and conformance checkers.


It is presented as a non-conforming attribute that can be used to get a 
clean validation report, i.e. to make a validator report a document as 
valid, as conforming. This is grossly illogical and misleading. Anyone 
who uses a validator has the right to know whether the document is valid 
or not, to the extent that this can be programmatically determined. And 
it is, if the attribute is not valid.


Here's a proposal:

The character U+FFFC OBJECT REPLACEMENT CHARACTER, which is “used as 
placeholder in text for an otherwise unspecified object” [the quote is 
from the code chart entry in the Unicode Standard] be used as the value 
of an alt attribute to indicate that it was not possible to write an 
adequate alternate text for the image. This typically means that the 
image comes from a source external to the system that generates the HTML 
document and the system cannot analyze it or otherwise find a suitable 
text replacement.


You could even add a statement like this:

User agents that present the value of an alt attribute to the user may 
express the value of U+FFFC using a generic expression like “some 
image”. They may also apply technologies that process the image trying 
to recognize its content and then express the result suitably, e.g. “an 
unrecognizable image” or “an image of a horse” or (in the case of having 
recognized that the content is scannable as text) “Hello”.
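
A minimal sketch of how a user agent might treat the three cases (no alt attribute, alt consisting of U+FFFC, ordinary alt text). The function name and the returned wording are assumptions for illustration; image recognition is left out.

```javascript
const OBJECT_REPLACEMENT = "\uFFFC"; // U+FFFC OBJECT REPLACEMENT CHARACTER

// Decide what to present in place of an image.
// altValue is the alt attribute value, or null when the attribute is absent.
// "presentAlt" is a hypothetical name used only in this sketch.
function presentAlt(altValue) {
  if (altValue === null) {
    return null; // no alt attribute: nothing is known about the image
  }
  if (altValue === OBJECT_REPLACEMENT) {
    return "some image"; // the generator could not provide a text alternative
  }
  return altValue; // ordinary textual alternative (possibly empty)
}
```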


– It’s of course possible that people would then use alt="&#xfffc;" to 
silence validators even when they could easily write real text there. 
But they can anyway use alt="" for such purposes if they want to.


Yucca

Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-20 Thread Jukka K. Korpela

2013-08-20 2:40, Ryosuke Niwa wrote:

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length

Why is the maxlength attribute of the input element specified to
restrict the length of the value by the code-unit length?

Apparently because in the DOM, character effectively means code
unit. In particular, the .value.length property gives the length in
code units.

This is counter intuitive for users and authors who typically
intend to restrict the length by the number of composed character
sequences.

That is true. We should not expect end users to know whether a character
they enter occupies one code unit or two, i.e. whether it is a BMP
character or not. Then again, I don't expect most users to enter non-BMP
characters, though this might be changing as e.g. emoticons become more
popular.

In fact, this is the current shipping behavior of
Safari and Chrome.

And IE, but not Firefox. Here's a simple test:

<input maxlength="2" value="&#x10400;">

On Firefox, you cannot add a character to the value, since the length is
already 2. On Chrome and IE, you can add even a second non-BMP
character, even though the length then becomes 4. I don't see this as
particularly logical, though I'm looking at this from the programming point
of view, not end user view.
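
The lengths mentioned above can be checked in plain JavaScript, without a form control. This is only a sketch of the counting involved; the variable names are made up.

```javascript
// U+10400 is a non-BMP character, stored as a surrogate pair in JavaScript strings.
const nonBmp = "\u{10400}";

const codeUnits = nonBmp.length;                   // UTF-16 code units: 2
const codePoints = Array.from(nonBmp).length;      // Unicode code points: 1
const twoCharsInUnits = (nonBmp + nonBmp).length;  // two such characters: 4
```

So a limit of 2 expressed in code units is already reached by a single non-BMP character.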

Can the specification be changed to use the number of composed
character sequences instead of the code-unit length?

In contexts where you want to set maxlength in the first place, your
reasons might well be related to limitations that apply to the code unit
length. It's a different thing if the intent is to limit the amount of
visible characters.

Interestingly, an attempt like
<input pattern=".{0,42}">
to limit the amount of *characters* to at most 42 seems to fail.
(Browsers won't prevent from typing more, but the control starts
matching the :invalid selector if you enter characters that correspond
to more than 42 code units.) The reason is apparently that "." means
"any character" in the sense "any code unit", counting a non-BMP
character as two.
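
The counting can be reproduced with a plain regular expression. In the sketch below, the anchors ^ and $ mimic the whole-value matching of the pattern attribute, and the limit is 1 instead of 42 to keep the test string short. The u flag, which makes "." match whole code points, comes from later editions of ECMAScript.

```javascript
const nonBmp = "\u{10400}"; // one character, two UTF-16 code units

// Without the u flag, "." matches a single code unit,
// so one non-BMP character does not fit in .{0,1}
const withoutUnicodeFlag = /^.{0,1}$/.test(nonBmp); // false

// With the u flag, "." matches a whole code point.
const withUnicodeFlag = /^.{0,1}$/u.test(nonBmp);   // true
```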

Also,
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
says if the input element has a maximum allowed value length, then
the code-unit length of the value of the element's value attribute
must be equal to or less than the element's maximum allowed value
length.

This doesn't seem to match the behaviors of existing Web browsers or
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
unless I'm misreading something. Namely, the value attribute set in
the markup or by script isn't automatically truncated at the
element's maximum allowed value length.

There seems to be a conflict here indeed. It is different from the
character vs. code unit issue, however.

Definitions in 4.10.21.1 clearly imply that the length of the value of a
control may exceed the limit set by maxlength. The Constraints part
deals with the question what happens then (in form submission).

Yucca

Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-20 Thread Jukka K. Korpela


2013-08-20 17:09, Anne van Kesteren wrote:


On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:

Can the specification be changed to use the number of composed character 
sequences instead of the code-unit length?


In a way I guess that's nice, but it also seems confusing that given

data:text/html,<input type=text maxlength=1>

pasting in U+0041 U+030A would give a string that's longer than 1 from
JavaScript's perspective.


Oh, right, this is an issue different from the non-BMP issue I discussed 
in my reply. This is even clearer in my opinion, since U+0041 U+030A is 
clearly two Unicode characters, not one, even though it is expected to 
be rendered as “Å” and even though U+00C5 is canonically equivalent to 
U+0041 U+030A.
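
The difference is easy to see in script. A small sketch, using only the standard String.prototype.normalize method:

```javascript
const decomposed = "A\u030A";  // U+0041 followed by U+030A COMBINING RING ABOVE
const precomposed = "\u00C5";  // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE

const decomposedLength = decomposed.length;       // 2
const precomposedLength = precomposed.length;     // 1
const sameString = decomposed === precomposed;    // false: different code points
// Canonical equivalence shows up only after normalization.
const sameAfterNfc = decomposed.normalize("NFC") === precomposed; // true
```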



I don't think there's any place in the
platform where we measure string length other than by number of code
units at the moment.


Besides, if “character” means something else than Unicode character 
(Unicode code point assigned to a character) or, as a different concept, 
Unicode code unit, then the question would arise what it means. For 
example, would a letter followed by 42 combining marks still be one 
character? (Such monstrosities are actually used, in an attempt to 
create “funny” effects.)
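
One possible answer is to count grapheme clusters as defined by Unicode text segmentation. The sketch below assumes a JavaScript engine that provides Intl.Segmenter (a later addition to the internationalization API); under that definition, a base letter with any number of combining marks counts as one.

```javascript
// A letter followed by 42 combining acute accents (U+0301).
const monstrosity = "a" + "\u0301".repeat(42);

const inCodeUnits = monstrosity.length; // 43 code units (and 43 code points)

// Count user-perceived characters (extended grapheme clusters).
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const inGraphemes = Array.from(segmenter.segment(monstrosity)).length;
```

A maxlength counted this way would accept the whole sequence as a single character, which may or may not be what an author wants.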


Yucca

Re: [whatwg] XML data islands related question

2013-08-08 Thread Jukka K. Korpela


2013-08-08 9:13, Ian Hickson wrote:


XHR uses the same underlying logic as <img src=""> and <script src="">. If
you're able to conjure a file up for <img src=""> or <script src="">, then
I don't see why you wouldn't be able to conjure it up for XHR.


When a local HTML file is opened in a browser and it accesses local 
files, with simple relative URLs like foo.png or bar.js, img 
src= and script src= do not cause HTTP requests of any kind.



 Could you
elaborate on exactly what you mean by truly local HTML5 application?


At the simplest case, it is a set of files (HTML, CSS, JavaScript, image 
files), and launching the application means opening the HTML file in a 
browser, or in a sufficiently browser-like program. Conceptually, this 
would work even if the Internet didn’t exist. In practice, such 
applications are often distributed via web servers, and they may have 
URLs, but they can also be distributed on different media offline.


(The issue is also relevant to applications that are not completely 
local and offline but may use HTTP connections for various purposes. For 
them, the point is that HTTP requests should not be done in vain, e.g. 
for a large static data file.)


So the question is: if I can use images and scripts in separate files in 
that setup, accessed directly as local files by the browser (or alike), 
why can’t I do the same for plain text, CSV, or XML data? If there is a 
security risk, then surely it must be bigger for script that refers to 
a JavaScript file via src=... than for script that refers to a plain 
text file via src=... Yet the latter is disallowed. Whatever the reasons 
might be, I don’t think specifications should declare it as prohibited, 
though they can warn that implementations may pose such restrictions.


Yucca

Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-08-08 Thread Jukka K. Korpela


2013-08-08 2:57, Ryosuke Niwa wrote:


On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela jkorp...@cs.tut.fi
wrote:

[...]

But regarding the effect of language markup on fonts, the effect is
limited to situations where the font is not specified in a style
sheet. This is a rather uncommon scenario these days; authors are
more than eager to set fonts.


Do you have actual statistics to support this point?


No, it’s just an impression from looking at numerous pages and their 
coding as well as views presented in authors’ forums.



As far as I
checked, neither baidu.com nor yahoo.com.tw seems to explicitly
specify a Chinese font.


They both have font-family settings, slightly different, but basically 
the most common (sorry, no statistic on this either) setup that uses 
Arial (possibly with Helvetica as second option, which does not change 
much). So, granted, they don’t specify a Chinese font in the sense of 
including any specific fonts containing CJK characters in the 
font-family list.


Baidu doesn’t set lang either, so they seem to be accepting, for any 
characters not covered by Arial, whatever happens to be in each 
browser’s list of fallback fonts, when no information about content 
language is available. Yahoo.com.tw sets lang=zh-tw, so they do care, 
but only to the extent that the fallback font should be one intended for 
Traditional Chinese.


So the lang markup may affect fonts, but only under some conditions. And 
if you care about fonts, as an author, then an explicit list of font 
alternatives has better chances of creating the desired result.



It is true that they might specify a font list where none of the
fonts supports some characters that might be entered, and then a
fallback font would be used. However, using “annotations”
(presumably, lang attributes, along with extra span elements when
needed) does not sound like a feasible approach to this.


Whether it’s feasible or not, that’s what we have been doing due to
the Han unification.  If we could, we’ll undo the Han unification and
use different glyphs for each character but we can’t do that at this
point in time.


If a page contains texts to be rendered using different forms 
(Traditional Chinese, Simplified Chinese, Japanese, Korean) for Han 
characters, you will need to control the rendering somehow. Using lang 
markup might be theoretically most adequate, but it’s indirect and less 
effective than just setting different fonts (via font-family lists that 
contain reasonably many alternatives).


But even if lang attributes are used, I don’t think the issue has much 
relevance to the original question here. A DOM attribute that returns 
the language of a node would be useful for the purpose only if you 
intend to affect rendering via JavaScript.


Yucca

Re: [whatwg] XML data islands related question

2013-08-07 Thread Jukka K. Korpela


2013-08-08 0:08, Ian Hickson wrote:


On Tue, 6 Aug 2013, Jukka K. Korpela wrote:

2013-08-06 17:45, Ian Hickson wrote:


If such an application needs some bulk of text data, it can be
included e.g. in <script type="text/plain">...</script> but not in a
separate plain text file (included into the application
distribution, along with other files) referred to via <script
src="..."></script>. This is a frustrating restriction and makes it
more difficult to maintain and customize the application. If an external
plain text file could be used, the data content could be separately
managed (requiring knowledge only about the format used).


I'm not sure what you mean by application distribution. Why can't a
text/plain file be included the same way an image/png file is
included?


It can be included as a file, but it cannot be used. I can't read it.
That is the point. I can use an img element referring to an image
file, but I cannot refer to a simple plain text file (or an XML file) in
an HTML document in a manner that lets me process its content in
scripting. I can only include it via iframe or object, but that's
different from accessing its content.


I don't understand why XHR doesn't work for you.


Because there is no server to talk to when you’re a truly local HTML5 
application.


Yucca

Re: [whatwg] XML data islands related question

2013-08-06 Thread Jukka K. Korpela


2013-08-06 2:27, Ian Hickson wrote:


On Thu, 7 Feb 2013, Jukka K. Korpela wrote:

[...]

It's a bit odd that if you wish to set up a standalone application
running in a browser (often called HTML5 application, without implying
any particular version of HTML5), you can include e.g. scripts and
images in separate files but not plain text or XML data


Why can't you put plain text or XML data in other files? So long as
everything is same origin, you can read anything you want via XHR.


A standalone application should be as self-contained as possible, 
without needing HTTP connections or any network connections to access 
its own data. When no connections are needed for other reasons, an HTML5 
application should run in any client capable of just interpreting HTML 
and JavaScript (and, in practice, CSS).


If such an application needs some bulk of text data, it can be included 
e.g. in <script type="text/plain">...</script> but not in a separate plain 
text file (included into the application distribution, along with other 
files) referred to via <script src="..."></script>. This is a frustrating 
restriction and makes it more difficult to maintain and customize 
the application. If an external plain text file could be used, the data 
content could be separately managed (requiring knowledge only about the 
format used).
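
For comparison, this is roughly what the inline workaround looks like in script. The element id "data", the function names, and the comma-separated format are assumptions for the sketch; the splitting ignores quoting, so it is not a full CSV parser.

```javascript
// Read the text of an inline data block such as
//   <script type="text/plain" id="data">...</script>
// "doc" is a document-like object; the id is made up for this sketch.
function readDataBlock(doc, id) {
  const el = doc.getElementById(id);
  return el ? el.textContent : null;
}

// Split simple comma-separated text into rows of trimmed fields.
function parseSimpleCsv(text) {
  return text
    .trim()
    .split(/\r?\n/)
    .map((line) => line.split(",").map((field) => field.trim()));
}
```

In a page, parseSimpleCsv(readDataBlock(document, "data")) would give the rows. The data still has to live inside the HTML file, which is the restriction discussed here.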


Yucca

Re: [whatwg] XML data islands related question

2013-08-06 Thread Jukka K. Korpela


2013-08-06 17:45, Ian Hickson wrote:


If such an application needs some bulk of text data, it can be included
e.g. in <script type="text/plain">...</script> but not in a separate plain
text file (included into the application distribution, along with other
files) referred to via <script src="..."></script>. This is a frustrating
restriction and makes it more difficult to maintain and customize
the application. If an external plain text file could be used, the data
content could be separately managed (requiring knowledge only about the
format used).


I'm not sure what you mean by application distribution. Why can't a
text/plain file be included the same way an image/png file is included?


It can be included as a file, but it cannot be used. I can't read it. 
That is the point. I can use an img element referring to an image 
file, but I cannot refer to a simple plain text file (or an XML file) in 
an HTML document in a manner that lets me process its content in 
scripting. I can only include it via iframe or object, but that's 
different from accessing its content.


Yucca

Re: [whatwg] HTML: A DOM attribute that returns the language of a node

2013-08-02 Thread Jukka K. Korpela


2013-08-02 2:43, Ryosuke Niwa wrote:


Are you saying that for HTML contenteditable-based editors that want to
support drag-and-drop editing, they need to be able to annotate the
outgoing HTML fragment with the effective language so that when it's
embedded somewhere, the right fonts get used?


Yes, but not just for drag and drop.


This would mean that the editor would have to guess the language from 
the text or ask the user to specify it. This is not as unrealistic as it 
may first seem. Microsoft Word does such things, sometimes getting 
things right, often messing things up. It typically detects change of 
language too late, and often infers language from keyboard settings, 
making it rather impossible to use a multilingual keyboard easily.


But regarding the effect of language markup on fonts, the effect is 
limited to situations where the font is not specified in a style sheet. 
This is a rather uncommon scenario these days; authors are more than 
eager to set fonts. It is true that they might specify a font list where 
none of the fonts supports some characters that might be entered, and 
then a fallback font would be used. However, using “annotations” 
(presumably, lang attributes, along with extra span elements when 
needed) does not sound like a feasible approach to this.


But I guess the issue is still adding a DOM property for element nodes, 
specifying the language of the node, to the extent that it can be 
inferred from lang or xml:lang attribute or from HTTP headers (real or 
faked via meta). Although the use cases are somewhat rare and not 
particularly important, the property would be conceptually easy and 
presumably easy to implement in browsers. So it could be added, well, 
just because there is no good reason not to. It may understandably 
irritate authors who need language information to know that the 
browser has it (it needs it to implement :lang() in CSS) but does not 
give authors access to it.
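
Lacking such a property, authors can approximate it in script. The following is a sketch only: the function name is hypothetical, it looks at lang attributes alone (not xml:lang), and the fallback parameter stands in for a language taken from HTTP headers or meta, which a script would have to obtain separately.

```javascript
// Walk up from a node to the nearest ancestor-or-self that has a lang attribute.
// "effectiveLang" is an illustrative name, not an existing DOM API.
function effectiveLang(node, fallback = "") {
  for (let n = node; n; n = n.parentNode) {
    // The document node itself has no getAttribute(), so check first.
    if (typeof n.getAttribute === "function") {
      const lang = n.getAttribute("lang");
      if (lang !== null) return lang;
    }
  }
  return fallback;
}
```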


Yucca

[whatwg] Background of body covering the whole page – is this described somewhere?

2013-07-23 Thread Jukka K. Korpela

Browsers seem to be rather consistent in applying the background 
properties of the body element, if set, to the entire viewport, even if 
e.g. the height of the body is explicitly set to a small value. Example:


<!doctype html>
<title></title>
<style>
body { background: green; height: 2em; }
</style>
Hello world.

The entire viewport has green background. If I add

html { background: white }

to the style sheet, things change radically (to match expectations).

This also affects background images.

This sounds odd, since the body element is a child of html, so we should 
expect the html element background to shine through if the body element 
has a transparent background, not vice versa.


Is this described somewhere in HTML or CSS specifications or drafts? I 
think it should be, since it is what browsers do (tested on Firefox, 
Chrome, IE), and even though it sounds absurd, I’m afraid pages may rely 
on it. A natural place to look at is

http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#the-page
but I can’t find any statement about the body element extending to cover 
the viewport as far as backgrounds are considered.


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] Background of body covering the whole page – is this described somewhere?

2013-07-23 Thread Jukka K. Korpela


2013-07-23 20:44, Anne van Kesteren wrote:


On Tue, Jul 23, 2013 at 10:37 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:

Is this described somewhere in HTML or CSS specifications or drafts? I think
it should be, since it is what browsers do (tested on Firefox, Chrome, IE),
and even though it sounds absurd, I’m afraid pages may rely on it. A natural
place to look at is
http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#the-page
but I can’t find any statement about the body element extending to cover the
viewport as far as backgrounds are considered.


http://www.w3.org/TR/CSS2/colors.html#background


Thanks for the prompt reply. Makes things clear. On the practical side, 
however, I wonder whether this part of CSS specifications should be 
referred to in the Rendering section, as it is logically rather unexpected.
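
For reference, the rule in that section of CSS 2.1 can be modelled roughly as follows. This is a sketch in Python under simplifying assumptions: the Background class and the canvas_background function are hypothetical names, and only the color and image properties are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Background:
    """A much simplified set of background properties (illustration only)."""
    color: str = "transparent"
    image: str = "none"

    def is_empty(self):
        return self.color == "transparent" and self.image == "none"


def canvas_background(html_bg, body_bg):
    """Return (background of the canvas, background painted on body's box)."""
    if html_bg.is_empty():
        # The html element has nothing of its own, so the body background is
        # used for the canvas and is not painted again for the body box.
        return body_bg, Background()
    # Otherwise the html background covers the canvas and the body keeps
    # its own background, limited to its own box.
    return html_bg, body_bg
```

With the test page above, canvas_background(Background(), Background(color="green")) gives a green canvas, and adding html { background: white } corresponds to the second branch.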


Yucca

Re: [whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason

2013-07-02 Thread Jukka K. Korpela


2013-07-02 2:16, Ian Hickson wrote:


The reason that ISO-8859-1 is currently non-conforming is that the label
no longer means ISO-8859-1, as defined by the ISO. It actually means
Windows-1252.


Declaring ISO-8859-1 has no problems when the document does not contain 
bytes in the range 0x80...0x9F, as it should not. There is a huge number 
of existing pages to which this applies, and they are valid by HTML 4.01 
(or, as the case may be, XHTML 1.0) rules. Declaring all of them as 
non-conforming and issuing an error message about them does not seem to 
be useful.


You might say that such pages are risky and the risk should be 
announced, because if the page is later changed so that it contains a byte 
in that range, it will not be interpreted by ISO-8859-1 but by 
windows-1252. From the perspective of tradition and practice, this is 
just about error handling. By HTML 4.01, those bytes should be 
interpreted as control characters according to ISO-8859-1, and this 
would make the document invalid, since those control characters are 
disallowed in HTML 4.01. Thus, whatever browsers do with the document 
then is error processing, and nowadays probably all browsers have chosen 
to interpret them by windows-1252.


Admittedly, in XHTML syntax it’s different since those control 
characters are not forbidden but (mostly) “just” discouraged.


I think the simplest approach would be to declare U+0080...U+009F as 
forbidden in both serializations. Then the issue could be defined purely 
in terms of error handling. If you declare ISO-8859-1 and do not have 
bytes 0x80...0x9F, fine. If you do have such a byte, we should still 
treat the encoding declaration as conforming as such, but validators 
should report the characters as errors and browsers should handle this 
error by interpreting the document as if the declared encoding were 
windows-1252.
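
The handling suggested here could be sketched like this in Python. The function names are hypothetical; the codec names iso-8859-1 and cp1252 are the ones Python's standard library uses.

```python
def c1_range_bytes(data):
    """Return (offset, byte value) for every byte in the range 0x80...0x9F."""
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


def decode_declared_latin1(data):
    """Decode bytes labelled ISO-8859-1; return (text, offending bytes)."""
    problems = c1_range_bytes(data)
    if not problems:
        # Nothing in the disputed range: both encodings give the same result.
        return data.decode("iso-8859-1"), problems
    # Error handling: interpret the document as windows-1252 and let the
    # caller (e.g. a validator) report the offending bytes. Python's cp1252
    # codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so
    # errors="replace" turns those into U+FFFD.
    return data.decode("cp1252", errors="replace"), problems
```

A validator could accept the declaration as such and list the returned offsets as data errors, while a browser would simply use the decoded text.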



It seems bad, and maybe rather full of hubris, to make it conforming to
use a label that we know will be interpreted in a manner that is a willful
violation of its spec (that is, the ISO spec).


In most cases, there is no violation of the ISO standard. Or, to put it 
in another way, taking ISO-8859-1 as a synonym for windows-1252 is fully 
compatible with the ISO 8859-1 standard as long as the document does not 
contain data that would be interpreted by ISO 8859-1 as C1 Controls 
(U+0080...U+009F), which it should not contain.



I would rather go back to having the conflicts be caught by validators
than just throw the ISO spec under the bus, but it's really up to you
(Henri, and whoever else is implementing a validator).


Consider a typical case. Joe Q. Author is using ISO-8859-1 as he has 
done for years, and remains happy, until he tries to validate his page 
as HTML5. Is it useful that he gets an error message (and gets 
confused), even though his data is all ISO-8859-1 (without C1 Controls)? 
Suppose then that he accidentally enters, say, the euro sign “€” because 
his text editor or other authoring tool lets him do so – and stores it as 
windows-1252 encoded. Even then, no practical problem arises, due to the 
common error handling behavior, but at this point, it might be useful to 
give some diagnostic if the document is being validated.


I would say that even then a warning about the problem would be 
sufficient, but it could be treated as an error – as a data error, with 
defined error handling. The occurrences of the offending bytes should be 
reported (which is what now happens when validating as HTML 4.01, even 
though the error messages are cryptic, like “non SGML character number 
128”). The author might then decide to declare the encoding as windows-1252.


But even though the most common cause of such a situation is an attempt 
to use (mostly due to ignorance) certain characters without realizing 
that they do not exist in ISO-8859-1, it might be a symptom of some 
different problem, like malformed data unintentionally appearing in a 
document. It is thus useful to draw the author’s attention to specific 
problems, incorrect data where it appears, rather than blindly taking 
ISO-8859-1 as windows-1252.


Yucca

Re: [whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason

2013-07-02 Thread Jukka K. Korpela


2013-07-02 10:15, Anne van Kesteren wrote:


On Tue, Jul 2, 2013 at 8:05 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:

[...]


I think a much more interesting problem is when they update that old
page with an IRI, form, or some XMLHttpRequest, and shit hits the
fan. That's why you want to flag all non-utf-8 usage and just get
people to migrate towards sanity.


Such evangelism is a different issue. If you want to nag “you should use 
UTF-8”, as a warning, each and every time when someone declares any 
other encoding, you will confuse or irritate many people and will reduce 
the popularity of validators. But in any case, it is quite distinct from 
the issue of declaring the iso-8859-1 encoding as an error.


Yucca

Re: [whatwg] Maxlength attribute on input[type=number]

2013-06-27 Thread Jukka K. Korpela


2013-06-28 1:09, Scott González wrote:


Why would you want to set maxlength as opposed to setting max?


People want to do such things to cover old browsers that do not support 
type=number. Such browsers ignore both the type attribute and the max 
attribute, so to impose *some* limits, people would use maxlength.


Yucca

Re: [whatwg] @generator-unable-to-provide-required-alt, figure with figcaption

2013-06-07 Thread Jukka K. Korpela


2013-06-08 0:13, Ian Hickson wrote:


On Sun, 2 Jun 2013, Jukka K. Korpela wrote:


The purpose presented is to prevent markup generators from being
pressured into replacing the error of omitting the alt attribute with
the even more egregious error of providing phony alternative text. This
is rather speculative, and it seems to lead to various attempts that are
more or less self-contradictory.


It's not that speculative, your e-mail is a response to a markup generator
implementor who feels pressured in exactly this way!


And who wrote that generator-unable-to-provide-required-alt is... 
inadequate.



Authors of generators always have the option of generating things like
alt=(an image), which can hardly be worse than lack of alt attribute.


It's worse because it prevents authors from being able to find images that
are lacking good alternative text, and because it makes it less likely
that future user agents will try to automatically figure out what the
alternative text should be (since one is already provided).


To a user, even “(an image)” is better than lack of alt attribute, which 
is what generator-unable-to-provide-required-alt really means. And in 
the case of user-submitted images, “(a user-submitted image)” might be 
even better. Lack of alt can mean just about anything; there are 
millions if not billions of images without alt attribute just because an 
author did not think of the issue. A generic text “(an image)” at least 
suggests that it’s a content image with no obvious alternate text.


To analyze which images lack good alternative texts, you need to look at 
the images in their context. It’s just wrong to assume that they can be 
identified using some simple automated analysis. And future user agents 
won’t try to figure out what the alternative text should be, any more 
than current browsers do such things. It is just wishful thinking to 
expect such processing, and if browsers tried to do such things, they 
would just mess things up.


Yucca

Re: [whatwg] @generator-unable-to-provide-required-alt, figure with figcaption

2013-06-02 Thread Jukka K. Korpela


2013-06-02 20:21, Martin Janecke wrote:


While I can imagine why an accessibility evangelist would want a
conformance-checker-silencer to be as unattractive to use as
possible, that really defeats its purpose, if it also deters code
generator programmers.


The purpose presented is to prevent markup generators from being 
pressured into replacing the error of omitting the alt attribute with 
the even more egregious error of providing phony alternative text. This 
is rather speculative, and it seems to lead to various attempts that are 
more or less self-contradictory.


Lack of alt attribute is not a mortal sin, but neither should it be 
accepted when accompanied with special incantations. It is simply an 
error to be reported. Validators are being improved to let users silence 
errors by type. This should be enough to deal with the issue.


Authors of generators always have the option of generating things like 
alt=(an image), which can hardly be worse than lack of alt attribute.


Yucca

Re: [whatwg] HTML differences from HTML4 document updated

2013-05-06 Thread Jukka K. Korpela


2013-05-06 15:12, Simon Pieters wrote:


I think you should start from making the title sensible. HTML
differences from HTML4 is too esoteric even in this context.


Do you have a suggestion?


I made some suggestions, which you comment later, but I will make 
another one here.



Besides, the spelling is HTML 4. Especially if you think HTML 4 is
ancient history, retain the historical spelling.


I don't think this is of particular importance.


If it isn't, why not use the correct spelling? When referring to 
specifications, it is usually a good idea to use their own spelling, 
even when it is odd and confusing.



HTML 4.01 is intended. The differences between revisions of HTML4 is out
of scope.


Then the heading should say HTML 4.01.


HTML has been used through the ages to denote a markup language (and
associated definitions) in a broad sense, as opposed to specific
versions. This is still the everyday meaning. And a title of a work
should be understandable without reading some explanation inside it,
saying that some common term has an uncommon meaning.

If you can't agree on a proper name, at least call it something like
modern HTML. Or, perhaps more realistically, near-future HTML.


Modern HTML differences from HTML4? I'm not convinced that's a win.
Near-future seems wrong since it's more like current.


The difficulty here directly reflects the vague nature of HTML5: it 
partly tries to describe HTML as actually implemented and partly 
specifies features that should (or shall) be implemented. Hence it is 
both modern and (intended to be) near-future.


But the fundamental difficulty is that you are trying to describe a 
specific version, or set of versions, of HTML without giving it a proper 
name or version number.


Since WHATWG does not use a proper name for its version (the title is 
just HTML), I think the only way to refer to it properly is to prefix 
it with WHATWG. This would lead to the title


Differences of HTML5 and WHATWG HTML from HTML 4.01


It's not clear to me why the document is needed in the first place. It
would seem to be much more relevant to document in detail the
differences between HTML 5, HTML 5.1, and WHATWG Living HTML than to
write a rather general document about the differences between them (as
if they were a single and stable specification) and HTML 4.


Such a document would be useful, but it's not this document. The primary
focus for this document is what is different from HTML4.


But why? What is the purpose of this document? This is relevant to 
naming it, and to the content too, of course. Now it is neither a 
reliable comparison with links to the relevant clauses nor an overview - it 
has too many details, to begin with. Is this for authors who consider 
moving from HTML 4.01 to HTML 5? Then I think it should primarily 
specify what HTML 4.01 features are forbidden in HTML 5, then the 
extensions.


Yucca

Re: [whatwg] HTML differences from HTML4 document updated

2013-05-03 Thread Jukka K. Korpela


2013-05-03 18:37, Simon Pieters wrote:


The past few days I've been working on updating the HTML differences
from HTML4 document, which is a deliverable of the W3C HTML WG but is
now also available as a version with the WHATWG style sheet:

http://html-differences.whatwg.org/


I think you should start from making the title sensible. HTML 
differences from HTML4 is too esoteric even in this context.


Think about a heading FOO differences from FOO9. Wouldn't you say that 
some FOOist is writing very obscurely?


Besides, the spelling is HTML 4. Especially if you think HTML 4 is 
ancient history, retain the historical spelling.


Yucca

Re: [whatwg] HTML differences from HTML4 document updated

2013-05-03 Thread Jukka K. Korpela


2013-05-03 21:19, Xaxio Brandish wrote:


Ah.  The document scope [1] explains why it uses HTML in the title as
opposed to HTML5 or HTML(5).


No, it only says *that* it uses HTML to refer to the W3C HTML5 
specification, W3C HTML5.1 specification, and the WHATWG HTML standard. 
*Why* it does so is not addressed at all, though the reader might infer 
that people just couldn't agree on a name, after WHATWG decided to 
abandon the name HTML5.


HTML has been used through the ages to denote a markup language (and 
associated definitions) in a broad sense, as opposed to specific 
versions. This is still the everyday meaning. And a title of a work 
should be understandable without reading some explanation inside it, 
saying that some common term has an uncommon meaning.


If you can't agree on a proper name, at least call it something like 
modern HTML. Or, perhaps more realistically, near-future HTML.


It's not clear to me why the document is needed in the first place. It 
would seem to be much more relevant to document in detail the 
differences between HTML 5, HTML 5.1, and WHATWG Living HTML than to 
write a rather general document about the differences between them (as 
if they were a single and stable specification) and HTML 4.


Yucca

Re: [whatwg] Why do we have input type='month' and input type='week'?

2013-02-12 Thread Jukka K. Korpela


2013-02-12 19:26, Tab Atkins Jr. wrote:


The fact that authors today have a random assortment of displays for
the exact same feature (credit card expirys) is something that would
be great to fix, not bemoan as a loss to the world. ^_^


Well, maybe, from some point of view, but is there really something to 
be fixed, and is it probable that input type=month would fix it?


I have seen many input widgets for such data and used them a lot in test 
purchases. In general, the more advanced they try to be, the more 
annoying they get. I can type 03/15, or whatever it reads on the card, 
rather fast. But if I have to pick things up from dropdowns or click on 
something in a calendar picker, I surely hope I won't need to do this a 
dozen more times.


What are the odds that browser vendors will implement input type=month 
in a simple manner that allows fast typing as one input method? Rather 
small I think.


This would make the most obvious, and perhaps the most common, use for 
input type=month a case *against* it.  Credit card expiry month is 
best handled with a text input field, with suitable checks on the input 
string. There may be *other* cases where graphic widgets are good when 
selecting a month, but authors can use libraries for such purpose, and I 
don't see any particular reason why this should be standardized across 
pages but not across browsers.
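
As an illustration of “suitable checks on the input string”, here is a minimal sketch in Python. The function name parse_expiry is hypothetical, and it assumes that two-digit years belong to the 2000s, which a real form would have to decide for itself.

```python
import re

# Month, a slash or hyphen, then a two- or four-digit year; spaces tolerated.
EXPIRY = re.compile(r"\s*([0-9]{1,2})\s*[/-]\s*([0-9]{2}|[0-9]{4})\s*")


def parse_expiry(text):
    """Return (year, month) for input like '03/15', or None if unusable."""
    match = EXPIRY.fullmatch(text)
    if match is None:
        return None
    month, year = int(match.group(1)), int(match.group(2))
    if year < 100:
        year += 2000  # assumption: two-digit years mean 20xx
    if not 1 <= month <= 12:
        return None
    return year, month
```

The user can type the data as it reads on the card, and the check costs a few lines on the page or on the server.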


Even if input type=month became widely supported, many, probably most, 
authors will keep using libraries or their own code, because they get 
consistent look and feel and functionality across browsers. Some authors 
would be misled into using input type=month for any month input 
because that's logical or semantic (as it is in a sense), but this 
will create questionable user experience in many common cases.


Yucca

Re: [whatwg] Why do we have input type='month' and input type='week'?

2013-01-31 Thread Jukka K. Korpela


2013-01-31 14:20, Bruce Lawson wrote:


Others have commented on use-cases for collecting month, eg credit card
expiries.


I have seen forms that prompt for year and month to specify start of 
employment (apparently when the exact date is not interesting) or a 
month to use when searching for cheapest flights to somewhere, 
apparently assuming that the customer is flexible with dates. Or you 
could have a month selection in a calendar application, or budget 
application.


There are several use cases. It might be argued that they are 
considerably less common than selecting a day.


The main problem is different, and shared with other date and time 
fields: do authors really want each visitor to see whatever widget his 
browser is showing? In the ideal world, maybe. There is great potential 
in principle, since the widget could be selected, by the user or someone 
helping him, so that it meets the user’s personal needs and preferences. 
It could also be argued that in the long run, it greatly improves usability 
if different sites and applications use methods based on such widgets, 
so that the user can routinely use them, instead of wondering why this 
widget does not work the way he would expect from past experience with 
similar widgets. But is this going to happen? Why would 
authors/designers/managers favor some “standard widgets”?



The use-case for an input type I imagine is that a browser can have a
select-like UI (Jan, Feb, March, April ...) which, in a French language
browser becomes Janvier, Fevrier, Mars, Avril ..  (or even Vendémiaire
to Fructidor for FRC fans).


Right. And this probably becomes a nuisance if you need to select 
December 1952, because the widgets have typically been designed so that 
you need to click on something to get one year forward or backward. The 
other problem is that in non-supporting browsers, or in browsers that 
implement input type=month in a very simple manner (textbox, user 
input is taken as such, just checked for correctness), the user needs to 
type e.g. 1952-12, which is fast and simple – as soon as you know what 
is expected from you.
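
The check that a simple textbox implementation (or a server-side fallback) would need for such input is small. A sketch in Python, with a hypothetical function name, assuming the format is a year of at least four digits, a hyphen, and a two-digit month:

```python
import re

MONTH = re.compile(r"([0-9]{4,})-([0-9]{2})")


def parse_month(value):
    """Return (year, month) for input like '1952-12', or None if invalid."""
    match = MONTH.fullmatch(value)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    if year < 1 or not 1 <= month <= 12:
        return None
    return year, month
```

The hard part is not the check but telling the user what format is expected.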


Yucca

Re: [whatwg] use cases for untitled article and section elements

2013-01-15 Thread Jukka K. Korpela


2013-01-15 11:57, Steve Faulkner wrote:


Can anyone point me to or provide use cases for untitled article and
section elements?


The example that first comes into my mind is a discussion forum where 
contributions (which would appear to match the article idea) can be 
posted, and are usually posted, without a title of any kind. A 
discussion has a title (subject), but individual contributions are 
basically just text, though in advanced systems they may contain markup.



as in who are the potential consumers of document outlines with untitled
sections?


Oh. That's a different issue. This whole outline thing does not look 
very realistic. I have not seen much practical interest in it; the 
HTML5 Outliner add-on in Firefox is one of the few signs of interest, 
and it's fairly primitive.


Yucca

Re: [whatwg] use cases for untitled article and section elements

2013-01-15 Thread Jukka K. Korpela


2013-01-15 14:15, Ian Yang wrote:

The one came into my mind is blog comments, which are often
coded using untitled articles. But personally I think that is wrong
because every sectioning element should have a heading.


Using headings is generally a very good authoring principle, but there 
are exceptions. Small comments rarely benefit from titles (headings).


A very different example is a novel. A novel is almost always divided 
into sections, and sections may have subsections (visually separated 
e.g. using extra empty space or maybe ***). The sections may or may 
not have titles. Often they have just numbers, presented as titles like 
Chapter 1, so they are more or less pseudo-titles (and could be 
replaced by CSS-generated content). Subsections almost never have headings.


So what a browser could do, with a novel that uses section, is to 
provide an outline of the structure, possibly so that along with 
numbers, there are short excerpts from the start of each section or 
subsection.
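
A rough sketch of such an outline, in Python rather than in browser code, might look like the following. The class and function names are hypothetical, only section elements are considered, and the excerpt is simply the first few words found inside each section.

```python
from html.parser import HTMLParser


class SectionOutliner(HTMLParser):
    """Number nested section elements and keep a short excerpt of each."""

    def __init__(self, excerpt_words=5):
        super().__init__()
        self.excerpt_words = excerpt_words
        self.counters = [0]  # one child counter per open nesting level
        self.open = []       # entries of the sections that are currently open
        self.entries = []    # all entries, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.counters[-1] += 1
            number = ".".join(str(n) for n in self.counters)
            entry = {"number": number, "words": []}
            self.entries.append(entry)
            self.open.append(entry)
            self.counters.append(0)

    def handle_endtag(self, tag):
        if tag == "section" and self.open:
            self.open.pop()
            self.counters.pop()

    def handle_data(self, data):
        words = data.split()
        # Text counts toward every open section that still lacks words.
        for entry in self.open:
            room = self.excerpt_words - len(entry["words"])
            if room > 0:
                entry["words"].extend(words[:room])


def outline(markup, excerpt_words=5):
    parser = SectionOutliner(excerpt_words)
    parser.feed(markup)
    parser.close()
    return [(e["number"], " ".join(e["words"])) for e in parser.entries]
```

No headings are needed for this; the numbers and excerpts are derived from the structure alone.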


Yucca

Re: [whatwg] use cases for untitled article and section elements

2013-01-15 Thread Jukka K. Korpela


2013-01-15 15:44, Steve Faulkner wrote:


  this example: https://dl.dropbox.com/u/377471/article.html

results in this outline:

what is the use of the untitle articles?

 Example of article use from HTML 5.1 spec
 Bacon on a crowbar
 Untitled ARTICLE
 Untitled ARTICLE
 Untitled ARTICLE
 Untitled ARTICLE
 Untitled ARTICLE
 Untitled ARTICLE

what is the use of the untitled articles?


They indicate nesting, nothing more. It seems that the article element 
is being defined to suit the needs of displaying discussion threads, 
even making article elements oddly nested.


When a contribution comments on another contribution, neither is 
logically part of the other. They are related, not nested. Blockquotes, 
on the other hand, may be nested, especially in e-mail messages in a 
particular style (quote the full message being replied to, after your 
own message).


It is difficult to see what the idea of the example is, but it says: 
The article element is used for each post, to mark up the threading. I 
wonder if threads would deserve markup of their own, possibly defined in 
somewhat more abstract terms. But nested lists would be more natural 
(and would create acceptable default rendering even in oldest browsers).



or of the 133 untitled articles on
http://html5doctor.com/designing-a-blog-with-html5/

what is the use case for using article in this case over the use of
other markup such as lists?

what does it provide?


Not much, but there is generally little evidence of actual benefits from 
using article. In principle, though, you might want to use article 
inside a li or td element, for example, to indicate that the content 
is syndicatable.


Regarding the use of heading markup, I don't see why it would be useful 
to turn author names, time stamps, and things like that (which are more 
of metadata than headings for the content) into headings. In an 
application that shows a document outline, you can extract part of the 
start of a section or article or snapshot on some other basis, if 
needed.


Yucca

Re: [whatwg] Forms-related feedback

2013-01-15 Thread Jukka K. Korpela


2013-01-16 4:20, Ian Hickson wrote:


* If the type=datetime UI asks a local datetime, UA needs to convert local
datetime to UTC datetime, of course.
   However, it's very hard to implement.
** The UI needs extra work for edge cases of daylight saving time -
standard time switching.
** A local computer doesn't have complete information of daylight saving
time period of every year.


Yes, it's hard to implement. But someone has to do it. I'd rather it was
you and a handful of other browser vendors than a million Web authors.


I don't think a million authors do such things. Instead, a few people 
may develop libraries for the purpose.



The harder something is to do, the more valuable it is for browser vendors
to be the ones to do it rather than the site authors.


Since the use cases are rare, is it better to force browser vendors to 
develop code to implement it, in their own ways, than to let various 
software developers set up libraries for it? Since the browser 
implementations would, with practical certainty, lack adequate 
localizability (according to page language) and customizability, the 
HTML construct would not be used much.


Authors, or their employers and clients, don't want just a date and 
time picker for example. They want a picker that suits their overall 
design. I don't think this will change anytime soon. Pages now use a 
wide variety of date pickers. While input type=date might be useful 
for testing and quick prototyping, and might be used by 
functionality-oriented authors who don't care much about look and feel, 
input type=datetime would rarely be used even for such purposes, so it 
would be an undue burden on browsers.


Yucca

Re: [whatwg] A plea to Hixie to adopt main

2012-11-07 Thread Jukka K. Korpela


2012-11-07 16:23, Simon Pieters wrote:


Hixie's argument is, I think, that the use case that main is intended
to address is already possible by applying the Scooby-Doo algorithm, as
James put it -- remove all elements that are not main content, header,
aside, etc., and you're left with the main content.


Hixie's idea is sufficient for determining the main content (in some 
sense) on a page that systematically uses the new structuring elements. 
This in turn is sufficient for some styling purposes, but not all purposes.



I think the Scooby-Doo algorithm is a heuristic that is not reliable
enough in practice, since authors are likely to put stuff outside the
main content that do not get filtered out by the algorithm, and vice versa.


That's one point. Another point is that authors don't really need to use 
all the header, nav, etc. elements, and validation does not check 
for this. E.g., if you just want to have some styles that only apply to 
the main content, you might want to use just main and not all the 
other stuff.


But perhaps the strongest argument in favor of main is that the 
Scooby-Doo algorithm may determine the content, but it does not make it 
an element, on any DOM node. And elementhood is essential for many 
styling and scripting purposes.
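
To illustrate, here is a minimal sketch in Python of the removal approach as described above. The names are hypothetical and the set of elements to discard is an assumption; the proposal itself says only “header, aside, etc.”.

```python
from html.parser import HTMLParser

# Assumed list of elements that are "not main content".
NOT_MAIN = {"header", "footer", "nav", "aside"}


class LeftoverText(HTMLParser):
    """Collect the text that is outside all NOT_MAIN elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many NOT_MAIN elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOT_MAIN:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOT_MAIN and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def leftover_text(markup):
    parser = LeftoverText()
    parser.feed(markup)
    parser.close()
    return parser.chunks
```

The result is a list of scattered fragments, not an element: there is nothing that a selector could match or a script could address as one node.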



Implementations that want to support a go to main content or
highlight the main content, like Safari's Reader Mode, or whatever
it's called, need to have various heuristics for detecting the main
content, and is expected to work even for pages that don't use any of
the new elements. However, I think using main as a way to opt out of
the heuristic works better than using aside to opt out of the
heuristic.


Sounds logical. And the Reader Mode functionality that you mention is 
one of the few signs of any meaningful support for the new structuring 
elements. There is much talk about the assumed semantic benefits of 
those elements, much less evidence of real benefits.


I suppose that the heuristics would include recognizing a div element 
to which class main has been assigned. Then one could argue that 
main is not needed, as authors can keep using div class=main, as 
millions of pages use. Then again, a similar argument would apply to 
header and friends.



If there is anyone besides from Hixie who objects to adding main, it
would be useful to hear it.


Well, I haven't seen much point in any of the new structuring elements 
in general, but the browser behavior you write about would make main 
much more relevant than the others.


Yucca

Re: [whatwg] A plea to Hixie to adopt main

2012-11-07 Thread Jukka K. Korpela


2012-11-07 16:53, Steve Faulkner wrote:


ARIA roles are used because the semantics are not fully implemented in
browsers yet.


It's a bit more complicated than that, isn't it? ARIA roles are also, 
and originally, meant for describing the meaning of elements that are 
used in rich Internet applications in a manner that cannot be deduced 
from the HTML markup. For example, if you set up a span element that 
acts as a checkbox, driven by JavaScript and formatted with CSS to look 
like a checkbox, then the ARIA role attribute is needed to inform 
browsers and assistive software about this.


Besides, many distinctions that can be made with ARIA roles cannot be 
described in HTML as currently defined. But that's not a big issue 
really, and it does not mean that corresponding elements should be added 
to HTML. ARIA has its role (no pun intended), and software that can 
currently handle role=foo would really benefit nothing from the 
introduction of a foo element. It would be just some new stuff that 
should be supported in addition to existing support.


So the existence of something as an ARIA role value does not imply that 
a corresponding element should be added to HTML if not already present 
there. But neither does it constitute a counterargument to adding new 
elements.


Yucca

Re: [whatwg] Character-encoding-related threads

2012-10-19 Thread Jukka K. Korpela


2012-10-19 19:33, Ian Hickson wrote:


On Fri, 19 Oct 2012, Jukka K. Korpela wrote:


Are there any situations that this doesn't handle where it would be
legitimate to omit a title element?


Perhaps the simplest case is an HTML document that is only meant to be
displayed inside an inline frame and containing, say, just a numeric
table. It is not meant to be found and indexed by search engines, it is
not supposed to be rendered as a standalone document with a browser top
bar (or equivalent) showing its title, etc.


The initial intent of such a document may be to only display it in a
frame, but since it's independently addressable, nothing stops a search
engine from referencing it, a user from bookmarking it, etc. So I don't
think that's an example of where omitting title is a good idea.


Anyone who bookmarks a document that was not meant to be bookmarked 
should accept the consequences.


But it seems that it is pointless to present any situations where it 
would be legitimate to omit a title element, since you are prepared to 
refute any possible example by presenting how things could be 
different from the scenario given.



The title element represents the document's title or name.


Yet you seem to deny, a priori, the possibility that a document does not 
need a title or a name.


Yucca

Re: [whatwg] Character-encoding-related threads

2012-10-18 Thread Jukka K. Korpela


2012-10-19 2:09, Ian Hickson wrote:

 On Fri, 13 Jul 2012, Jukka K. Korpela wrote:
[...]
 It might be better to declare title optional but strongly recommend
 its use on web or intranet pages (it might be rather irrelevant in other
 uses of HTML).

 That's basically what the spec says -- if there's a higher-level protocol
 that gives a title, then it's not required. It's only required if
 there's no way to get a title.

My point is that the title may be irrelevant, rather than specified 
using a higher-level protocol.


 Are there any situations that this doesn't handle where it would be
 legitimate to omit a title element?

Perhaps the simplest case is an HTML document that is only meant to be 
displayed inside an inline frame and containing, say, just a numeric 
table. It is not meant to be found and indexed by search engines, it is 
not supposed to be rendered as a standalone document with a browser top 
bar (or equivalent) showing its title, etc.


The current wording looks OK to me, and to me, it says that a title 
is not needed when the document is not to be used out of context:


The title element represents the document's title or name. Authors 
should use titles that identify their documents even when they are used 
out of context, for example in a user's history or bookmarks, or in 
search results.

http://www.whatwg.org/specs/web-apps/current-work/#the-title-element

Authors may still wish to use a title element in a document that is to 
be just shown in an inline frame, but it is comment-like then. I don't 
think it's something that should be required (even in a should clause).


Yucca

Re: [whatwg] acronym - Proposal for re-instating

2012-10-16 Thread Jukka K. Korpela


2012-10-16 2:40, Karl Dubost wrote:


Le 15 oct. 2012 à 11:40, Willabee Wombat a écrit :

acronym the word is spoken.
abbr the abbreviation is spelt out, letter by letter.

[…]

  - Screen readers may make use of them.


simple definition.


I don't see the definition as simple; it is short, but not simple. 
Apparently, <acronym> could not mean just "the word is spoken". We are 
not supposed to use it for any word, are we? Instead, the implied idea 
is probably that <acronym> indicates that the word has originally been 
formed as an abbreviation (of initial letters of words). The question 
is: why would it be relevant to indicate such a thing in markup?


In almost all cases, it would be distracting if a screen reader spoke 
the expansion of an acronym. Being an acronym means that the 
expression is now a word.


"Abbreviation" is a broad and vague concept, and an abbreviation may be 
spoken in different ways: letter by letter, or by pronouncing the 
unabbreviated word(s), or as a word (as an acronym). Sometimes even by 
pronouncing something completely different, as in reading "e.g." as "for 
example".



An issue though, (automatic) translation. for example

 <abbr title="United Nations"
   lang="en">UN</abbr>

would have to become in French once translated.

 <acronym title="Organization des Nations Unies"
  lang="fr">ONU</acronym>


And what about e.g. CEN, which might be treated as an acronym, or spoken 
using the names of letters (or, in extravagant situations, using the 
words from which the abbreviation was once formed)?


<acronym> is unnecessary and confusing. Even <abbr> is problematic, 
since it has often been interpreted so that the title="..." attribute 
should be read in its stead - even though the attribute was introduced 
into HTML as an advisory title, not as a pronunciation instruction.


The issue of telling the suggested spoken form of some written text 
should be kept separate from any existing markup features. I know that 
some software reads title=... attributes, but it's normally just an 
option, and it conflicts with other uses of the attribute. Authors may 
wish to use title=... just to show a visible tooltip, and they do that 
a lot.


Yucca

Re: [whatwg] New URL Standard

2012-09-24 Thread Jukka K. Korpela


2012-09-24 12:47, Karl Dubost wrote:


On cite attributes, I'm using urn:isbn:

<blockquote cite="urn:isbn:2-7073-1038-7">
<p>J'aime la liberté. J'aime être responsable
   de mes actes. J'aime comprendre ce que je
   fais… Et, cependant, je donne mon accord
   à ce marché bizarre.</p>
</blockquote>

Which I can use and parse with an extension in Opera [1] which convert it
into a link to the Open Library. In the future I could give accessibilities
to different services, and the user could choose its own reference system.


This is all very cool in its own way, and could be useful when used
with discipline within a discipline. But for a long time, such cool 
ideas will not be supported in most browsing situations. Yet, authors 
who know the cool idea will apply it and will fail to duplicate any 
credits in the normal visible content. This means that to most users, a 
quotation will appear without any credits or source information.


It also means that the only immediately available source information for 
a quotation will be an ISBN in URL format. So, for example, working 
offline, you won't see even the title and the author. Would the 
quotation even satisfy the legal requirements for quotations?


If the credits are additionally given in visible content, then *there* 
is the place to do cool things with ISBNs. The credits, when they 
include the ISBN in addition to author, title, etc., could have the ISBN 
part turned to an element like <a href="urn:isbn:2-7073-1038-7">ISBN 
2-7073-1038-7</a>. (This would still suffer from lack of compatibility 
with older user agents, creating non-working links on them, so maybe 
some new markup - which would simply be ignored by old user agents - 
would be better.)


The point, however, is that the cite attribute in blockquote is broken 
by design and should not be implemented in any new ways (or old).


Yucca

Re: [whatwg] Problem in the Section 4 Elements of HTML = 4.4 Sections = 4.4.2 The Section element

2012-09-13 Thread Jukka K. Korpela


2012-09-13 20:43, Kevin Deamandel wrote:


I recently started checking the specs and i can't help but notice the
weird formation of the tags in this section
http://www.whatwg.org/specs/web-apps/current-work/#dfnReturnLink-10


When I try to use the one-page version that this URL refers to, it 
freezes or almost freezes my browser on a regular basis. I'm pretty sure 
it's not just me. So I very much prefer referring to the multi-page 
version, e.g.

http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-section-element

And I suppose by weird formation you mean the example that starts with

<!DOCTYPE Html>
<Html
 ><Head
   ><Title
     >Graduation Ceremony Summer 2022</Title
   ></Head
 ><Body
   ><H1


can anybody tell me if this is known/on purpous


My guess is that it's an accidental result of some software used to 
maintain the document. It's not incorrect, just odd, because it deviates 
from the coding style used otherwise, both in the use of spaces between 
tag close (>) and in casing (capitalized tag names).


Yucca

Re: [whatwg] Problem in the Section 4 Elements of HTML = 4.4 Sections = 4.4.2 The Section element

2012-09-13 Thread Jukka K. Korpela


2012-09-13 21:15, Tab Atkins Jr. wrote:


Hixie purposely varied his style across
examples, to show that certain variances in the syntax were allowed
and perfectly fine.


Oh, I see. It's somewhat questionable if you ask me. Varying the syntax 
_within a document_ is something different from the liberty of choosing 
one's style. But I guess the reader is assumed to treat the examples as 
quotations that reflect different styles (and style is consistent within 
each example).


Still, I wouldn't do that. I don't think authors really need to be 
reminded of the possibility of writing <Section> instead of the most 
common way, <section>, and the next common one, <SECTION>. People who 
have some special reason for writing, say,


sEcTIon

claSs

=  foo



should probably check the syntax definition details if in doubt, and 
just go ahead (maybe using a validator) if not.


Yucca

Re: [whatwg] Quirks mode handling of rowspan=0

2012-09-03 Thread Jukka K. Korpela


2012-09-03 3:11, Boris Zbarsky wrote:


Per HTML5 spec, rowspan=0 should span to the end of the table section.


This was the case already in HTML 4.01, though it was implemented slowly 
and was not much known among authors.
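
To make the rule concrete, here is a minimal sketch of how the number of rows covered by a cell could be computed. The function name effectiveRowspan and its parameters are made up for this illustration and are not taken from any specification; the clipping of positive values at the end of the section is my reading of the table model, not a quotation of it.

```javascript
// Hypothetical helper: how many rows a cell covers within its table section.
//   rowspan       - the parsed rowspan attribute value (non-negative integer)
//   rowIndex      - zero-based index of the cell's row within its section
//   rowsInSection - total number of rows in that thead/tbody/tfoot
function effectiveRowspan(rowspan, rowIndex, rowsInSection) {
  const remaining = rowsInSection - rowIndex;
  if (rowspan === 0) {
    // Zero means "span to the end of the table section".
    return remaining;
  }
  // Assumption: a positive value cannot extend past the end of the section.
  return Math.min(rowspan, remaining);
}
```

A browser that does not support rowspan=0 would presumably treat the zero like 1 instead, which is exactly the cross-browser difference being discussed.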



In any case, my suggestion is that in quirks mode, rowspan=0 not be
supported.


Generally, attempts at defining quirks mode would mean making it an 
alternate mode and will not be successful due to the wide variation 
across browsers and versions. It's called quirks for a reason.


Specifically, as some browsers already support rowspan=0 in quirks 
mode, and some don't, you cannot ensure backwards compatibility no 
matter how you define it. For the bulk of legacy pages, it does not 
really matter, as they do not use such attributes. So the question is 
really what happens to newer pages where people may have used them after 
observing that some browsers support them. The current situation is 
inconsistent across browsers, but it does not help to change it; it 
could break existing pages written to work on browsers that support 
rowspan=0 in quirks mode.


Similar considerations apply to colspan=0.

Yucca

Re: [whatwg] Conformance checking of missing alternative content for images

2012-08-22 Thread Jukka K. Korpela


2012-08-22 3:43, Ian Hickson wrote:


[...] the argument is that WYSIWYG editor implementors will be pressured into
making their tools output conforming content by people who don't understand the
subtleties of this thread, based purely on validator output.


To what extent do people pressure WYSIWYG editor implementors into 
that, who are these people, and is there evidence of the pressure being 
successful? How often have they made implementors generate alt="" for 
unknown images, instead of something appropriate like alt="(an image)"?



A user converting 100,000 PDFs to HTML isn't going to be entering
alternative texts for each image.


Such bulk conversions can be useful for many purposes, but the results 
are not accessible and do not conform to good HTML authoring rules. 
There is no reason to prevent validators from saying this, in their own 
way.


Take the example of converting one non-HTML document with images to HTML 
format. Should the result of an automatic converter that generates img 
tags without alt attributes be considered as valid as the result of 
human conversion with alt attributes added or semi-automatic conversion 
(where a human is prompted for entering alt texts)?


Yucca

Re: [whatwg] alt= and the meta name=generator exception

2012-08-06 Thread Jukka K. Korpela


On 5.8.2012 15:52, Henri Sivonen wrote:

People who are not the developer of the generator use validators to
assess the quality of the markup generated by the generator.


People can use tools in various ways. We cannot prevent that. But it 
does not need to dictate the design of tools. People can use hammers as 
toothpicks, but hammer manufacturers don't make hammers softer for this 
reason.



Or, alternatively, Alice anticipates Bob's reaction and preemptively
makes her generator output alt= before Bob ever gets to badmouth
about the invalidity of the generator's output.


So? Whose problem is this? Generators have generated nonsensical alt 
attributes for years, e.g. inserting the filename and number of bytes. 
Keeping the attribute required won't make much difference.



Even if we wanted to position validators as tools for the people who
write markup, we can't prevent other people from using validators to
judge markup output by generator written by others.


And it is appropriate to judge that generation of HTML has problems, 
when the markup contains img elements without alt attributes. There is 
no reason why this possibility should be taken away. It is true that 
generator vendors can cheat by emitting alt="". We can't really prevent 
that. You seem to be worried about the possibility that keeping alt 
attribute required somehow pushes or forces vendors into doing such 
things to stay competitive. But this sounds highly speculative.


We know that generators and other software may produce documents without 
a title element or with a dummy or bogus title element like <title>New 
document</title>. And surely there are situations where an automatic 
generator has no way of deciding on an appropriate title element without 
consulting the user. So should there also be an exception allowing the 
omission of the title element, to avoid the assumed reaction by Alice, 
making her generator produce <title></title> or something worse?


Yucca

Re: [whatwg] alt= and the meta name=generator exception

2012-08-01 Thread Jukka K. Korpela


2012-08-01 10:56, Ian Hickson wrote:


Only generators are in a position where they might have to
include images for which they lack the ability to provide alt texts.


A simple counter-example to that: A human employee who has been told to 
add some images to a web page, without having been told why and with no 
instructions on alt texts.



On Wed, 25 Jul 2012, Jukka K. Korpela wrote:


Quite possibly. We cannot prevent people from writing and selling buggy
software. A generator may produce valid code, or invalid code. We should
not change the definition of valid just to match some generator
behavior.


The problem is that some generators -- e.g. software that converts word
processor documents to HTML -- are in a position where they sometimes
cannot possibly comply to the requirement. Image recognition and context
analysis simply isn't good enough yet to handle this case.


So what about the poor human then?

And even when there is an argument of impossibility, in some sense, 
something judged to be an error on good grounds is still an error. A 
markup error is not a mortal sin, and there is no real punishment for it, 
though it may have some negative effects.



It's unfortunate to force such vendors into a position of
having to defend their one validation error when there's nothing they can
do about it,


Silencing the error does not make the markup any better. This is an 
example of the validation as quality assurance fallacy that we should 
fight against, not support. When a document has been converted to HTML 
format without due attention to alt texts for images, it has not been 
converted properly. There is no reason to try to please vendors of 
converters by tweaking the rules and the checkers/validators to accept 
automatic conversion results that just aren't good.


And they *can* do a lot about it. They can initiate a user dialog, 
prompting for a person to provide alt text. Whether it is economically 
feasible is a different issue. If you don't require generators to do 
that, why would you require the poor human employee to write just 
something into the alt attribute? (Making him type nonsense, mostly, of 
course.)



According to normal accessibility principles, a generically informative
alt attribute is better than no alt attribute, which just says here's
an image and we're not telling you anything about it, probably because a
lazy author didn't give the issue any thought.


That's what the absence of an alt= attribute means.


That's the problem indeed. A generically informative alt attribute means 
something else. (What it means depends on its formulation.)



Even alt=unknown image or alt=unknown image named foobar.jpg is
better than lack of alt attribute (or alt=).


On the contrary; alt="unknown image" is equivalent to <span>unknown
image</span>


Or rather just the text unknown image. Whether it makes sense depends 
on the context, as usual with alt attributes. In an image gallery (a 
typical case), it makes perfect sense.



and would be fine alt= text for an image of text that says
unknown image


That's a bit too theoretical, isn't it? On similar grounds, you might 
argue that _any_ alt text is a fine text for an image containing that 
text, and nothing else.


Yucca

Re: [whatwg] alt= and the meta name=generator exception

2012-07-25 Thread Jukka K. Korpela


2012-07-25 15:05, Henri Sivonen wrote:


I think it would be better to keep the alt attribute always required but
recommend that conformance checkers have an option of switching off errors
related to this


The big question is whether that would be enough to solve the problem
of generators generating bogus alts in order to pass validation.


No, and it would not solve most of the other problems in the World Wide 
Web either. But it would solve the problem of confused authors as well 
as the practical problem that authors may wish to check their markup 
even when using software that generates img tags without alt attribute.



I predict generator writers would want the generator output to pass
validation with the default settings


Quite possibly. We cannot prevent people from writing and selling buggy 
software. A generator may produce valid code, or invalid code. We should 
not change the definition of valid just to match some generator 
behavior. After all, what's the point of using validation if you use a 
generator? You would in effect be testing the generator, something that 
its vendor should have done. We should not be concerned about helping 
generator vendors to advertize their products as producing valid code 
(code that passes validation) when they in fact produce code that 
violates established good practices of HTML.


The situation where a generator has to emit img tags without being able 
to insert any meaningful alt attributes is a real one, though rather 
special. According to normal accessibility principles, a generically 
informative alt attribute is better than no alt attribute, which just 
says here's an image and we're not telling you anything about it, 
probably because a lazy author didn't give the issue any thought. Even 
alt=unknown image or alt=unknown image named foobar.jpg is better 
than lack of alt attribute (or alt=).


Yucca

Re: [whatwg] Administrivia: Update on the relationship between the WHATWG HTML living standard and the W3C HTML5 specification

2012-07-25 Thread Jukka K. Korpela


2012-07-25 20:40, Ian Hickson wrote:


On Wed, 25 Jul 2012, Melvin Carvalho wrote:


Just so that it's possible to understand how to name the two new
branches correctly, can you confirm that the W3C branch is now called
HTML5 and the WHATWG branch is named 'HTML Living Standard'.
Is this the long term project name, or just a working title?


The WHATWG spec is just called HTML, Living Standard is what it is. As
we've gone through half a dozen names already for this spec (XForms Basic,
Web Forms 2.0, Web Applications 1.0, HTML 5, HTML5, Web Applications 1.0
again, and now HTML), I don't intend to change it again, but who knows. :-)


The practical problem with names is that we need to communicate what we 
are referring to, in an understandable manner.


"HTML" is far too broad, and "Living Standard" doesn't even say this is 
about HTML. "HTML Living Standard" would do I guess, even though it 
sounds more like a credo than a technical reference. I suppose "WHATWG 
HTML" might be used as a fairly neutral expression.



As of the last time the W3C equivalent spec was updated, it was titled
HTML5, but you'd have to ask the W3C what their plans are.


"HTML5" has become a rather loose expression, so we may need to use a 
phrase like "W3C HTML5", or maybe "W3C HTML5 draft(s)", in contexts 
where the difference between W3C HTML5 and WHATWG HTML might matter.


Yucca

Re: [whatwg] alt= and the meta name=generator exception

2012-07-24 Thread Jukka K. Korpela


2012-07-24 21:58, Ian Hickson wrote:


On Tue, 24 Jul 2012, Edward O'Connor wrote:


The spec currently disallows conformance checkers from reporting img
elements without alt= attributes as an error when meta
name=generator is present[1].


I've adjusted the text to make it clearer that validators can report the
error in this case, just that they are discouraged from doing so.


This is an improvement, but I think Edward O'Connor's points still 
apply. Just saying that the page was generated by a generator has no 
logical relationship to the issue of alt texts.


I think it would be better to keep the alt attribute always required but 
recommend that conformance checkers have an option of switching off 
errors related to this, due to situations where automatically generated 
markup contains a large number of img elements without alt attributes. 
Such situations can be *understandable* for practical reasons, but this 
does not make the markup good and recommendable.


Making alt required is good policy, and making it always required keeps 
things simple and understandable. There are cases where documents 
inevitably deviate from a specification, in a manner that causes some 
trouble without causing serious trouble in *most* browsing situations. 
Therefore conformance checkers should be configurable, in issues like this.
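
As a rough sketch of what "configurable" could mean here, consider a checker where the severity of a missing alt attribute is an option. Everything in it is an assumption made for illustration: the function name checkImgAlt, the option name missingAlt and the message format are invented, and the regular expressions are a deliberate simplification (a real checker would use an HTML parser, and this one can be fooled by attribute values that contain " alt=").

```javascript
// Hypothetical, deliberately simplified checker: reports <img> tags that
// lack an alt attribute. The severity is configurable:
//   "error" (default), "warning", or "off".
function checkImgAlt(html, options = {}) {
  const severity = options.missingAlt || "error";
  const messages = [];
  if (severity === "off") return messages;

  const imgTag = /<img\b[^>]*>/gi;
  let match;
  while ((match = imgTag.exec(html)) !== null) {
    // Look for an attribute named alt, with or without a value.
    if (!/\salt(\s*=|\s|\/|>)/i.test(match[0])) {
      messages.push({
        severity,
        offset: match.index,
        text: "img element without alt attribute",
      });
    }
  }
  return messages;
}
```

The point of the design is that the rule itself stays the same (alt is always required); only the way a particular run of the checker reports violations changes.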


Yucca

Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Jukka K. Korpela


2012-07-15 17:40, Ian Yang wrote:

 Throughout the article, I saw it mentioned bullets and numbers
 frequently. However, that's just browsers' default rendering of ul and
 ol.

It's the only real difference between the two.

 As a coder, personally I don't care how browsers render them by
 default.

You should. Check out the Usual CSS Caveats.

 What I care is the meaning of the code I write. That is, when I
 want an unordered list, I write ul; when I want an ordered list, I write
 ol. ul means unordered list, and ol means ordered list.

And what does that mean? Does it mean that a browser may or will treat 
<ul> as unordered in the sense that it can render the items in any 
order? If not, what *is* the difference? Just some people's *calling* it 
"unordered".


Yucca

Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-15 Thread Jukka K. Korpela


2012-07-16 5:36, Ian Yang wrote:


Imo, ul means the order of the items is unimportant, not browsers can
render the items in any order.


But if the order is unimportant, there still _is_ an order. Being 
unordered would be something else. And what would it matter to indicate 
the order as unimportant if you only do that in markup, without affecting 
rendering, search engines, etc., at all? It's like invisible ink in a 
book. If it is somehow relevant to say that the order is unimportant, 
you have to, well, *say* it (in words).


The only reason for this "unordered list" idea (a list is by definition 
ordered; a set, or a multiset, is not) is the willingness to keep <ul> 
and <ol> in HTML (it would be very impractical to omit one of them) 
without admitting that they were introduced, and are being used, simply 
for bulleted and numbered lists. So this resembles the confusing play 
with words regarding <i> and <b>.


Yucca

Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-14 Thread Jukka K. Korpela


2012-07-14 10:46, Anne van Kesteren wrote:


On Sat, Jul 14, 2012 at 6:22 AM, Ian Yang ian.h...@gmail.com wrote:

By seeing such contents, we usually code it using definition list (dl).
At first, I was thinking the same idea. But then I realized that stages in
a life cycle should be regarded as ordered contents.


I would recommend not over-thinking the matter. Otherwise soon you
will start wrapping your ps in ol/lis too to ensure they stay in
the correct order.


Indeed. The ol element is no more and no less ordered than ul or any 
other element. Many HTML tag names are misleading.



(The specification points this out as well: The order of the list of
groups, and of the names and values within each group, may be
significant.)


That's actually a questionable statement there, since it may make the 
reader ask whether the order of sub-elements is *generally* significant. 
It's as questionable as it would be to write "The order of successive p 
elements may be significant" or "The order of successive section 
elements may be significant".


Yucca

Re: [whatwg] Suggest making dt and dd valid in ol

2012-07-14 Thread Jukka K. Korpela


2012-07-14 18:51, Ian Yang wrote:


If ol is no more and no less ordered than ul,
what's the purpose of its introduction?


The real purposes, in the dawn of HTML, were that ol and ul 
correspond to numbered and bulleted lists, respectively, reflecting two 
very common concepts in word processors. This is how they have been 
used, though some authors have started overusing ul for things like 
lists of links even when they specifically don't want them to appear as 
bulleted. Even W3C specifications, in their markup, switch to ul in 
the midst of hierarchy when they want bullets and not numbers.


HTML5 tries to stick to the theoretical idea of ordered vs. 
unordered list, but it does not really change anything, and it is not 
supposed to change anything - any ul will still be rendered in the 
order written.


More on this:
http://www.cs.tut.fi/~jkorpela/html/ul-ol.html

Yucca

Re: [whatwg] Character-encoding-related threads

2012-07-13 Thread Jukka K. Korpela


2012-06-29 23:42, Ian Hickson wrote:


I consider all boilerplate to be a significant burden. I think there's a
huge win to making it trivial to create a Web page. Anything we require
makes it less trivial.


It's a win, but I'm not sure of the "huge". When learning HTML, it's an 
important aspect, and also when typing HTML by hand, but then it's 
mostly a convenience - and it helps to avoid annoying problems caused 
e.g. by making a single typo in a DOCTYPE declaration. So <!DOCTYPE 
html> is really an improvement.



Currently you need a DOCTYPE, a character encoding declaration, a title,
and some content. I'd love to be in a position where the empty string
would be a valid document, personally.


Is content really necessary? The validator.nu service accepts the following:

<!DOCTYPE html><title></title>

I don't think we can get rid of DOCTYPE anytime soon, as browser vendors 
are stuck with DOCTYPE sniffing.


But the title element isn't really needed, and unless I'm mistaken, 
the current rules allow its omission under some conditions - which 
cannot be tested algorithmically, so conformance checkers should issue a 
warning at most about missing title.


It might be better to declare title optional but strongly recommend 
its use on web or intranet pages (it might be rather irrelevant in other 
uses of HTML).


Yucca

Re: [whatwg] Various HTML element feedback

2012-06-05 Thread Jukka K. Korpela


2012-06-06 2:53, Ian Hickson wrote:


I have rather been optimistic about future developments for markup
elements that have been defined exactly enough to warrant meaningful
semantics-based processing. For example, most of the uses mentioned in
current text imply that var element contents should be kept intact in
automatic language translation.


That continues to be the case, so I don't know why you conclude that using
it is now pointless.


It is worse than pointless, if the definition of var covers a term 
used as a placeholder in prose. Such expressions should definitely not 
be kept intact in automatic language translation.


The definition of var is so broad that it is questionable whether 
*anything* useful can be assumed in automated processing. If it were 
defined more technically, without that placeholder idea, we could fairly 
certainly say that the content should be treated as a technical notation 
that should be left untranslated (as such notations are normally 
international), ignored in spelling checks, treated as equivalent to 
unknown nouns in syntax analysis of human language text, etc.



So why not simply define i recommended and describe var,cite,
em, and dfn as deprecated but supported alternatives?


What benefit does empty deprecation have?


Declaring some features as obsolete is effectively deprecation; I just 
used the term deprecate as per HTML 4.01 because I find it more 
descriptive. Anyway, defining those elements as deprecated/obsolete 
would be no less and no more empty than the current statements about 
obsolete status. Validators/checkers would issue messages (hopefully 
just warnings) about them, and tutorials would probably describe them as 
secondary if at all.


Reducing alternatives, from five to one in this case, makes the 
recommendations simpler and helps authors because they need not spend 
time in making choices between the elements. Such choices can be tough, 
if you try to play by the declared semantics, especially if it is 
vague (to a normal reader of a spec).


My point is: either make elements like var, cite, em, dfn, i 
defined so that the differences can be utilized in automatic processing, 
or just bundle them together, to i.



It's not like we can ever remove
these elements altogether.


Oh, in 20 or 30 years, I think browsers could drop support for some of them.


What harm do they cause?


Unnecessary complication to the language, artificial semantics that do 
not actually define meanings, and confusion among those authors who try 
to take semantics and specifications seriously. Oh, and pointless 
variation in markup and added complexity of styling.



If we have to keep them, we are better served by embracing them and giving
them renewed purpose and vigour, rather than being ashamed of them.


I think this summarizes well the idea behind some of the most contrived 
semantic definitions. It was a brave attempt, but it failed. No normal 
author will ever get your idea of the new meaning for b and i, for 
example.


And since, for example, the font markup needs to be supported for a 
long time, how come *it* has not got a new, semantic definition?


If var, cite, em, dfn would be obsoleted/deprecated in favor of 
i, they would still need to be defined in the spec, of course. But the 
definition could simply state that they are outdated elements that 
should not be used by authors and should be treated by browsers as 
equivalent to i.



This would make authoring simpler without any real cost. There’s
little reason to tell authors to use “semantic markup” if we don’t
think it has real effect on anything.


It does have an effect. It has many effects. It makes maintenance easier,
it makes it easier to transition from project to project, it makes it
easier to work on other people's markup, it makes it significantly easier
to dramatically change a site's appearance, it makes it easier to create
apply custom tools to extract information from the documents, it makes it
easier for search engines to guess at author intent, it makes it easier
for the documents to be repurposed for other media, it makes it easier for
documents to be remixed, it makes it easier for JavaScript libraries to
be used and mixed...


I've often seen such arguments, even in situations where it is 
strikingly obvious that they don't apply. The argumentation sounds like 
a matter of faith or principle rather than practical considerations.


Many of the arguments relate to authoring style, coding principles, and 
organization of work, rather than something that belongs to a general 
specification. For example, the ease of working on other people's markup 
in a collaborative environment depends on a large number of factors, 
including the overall structures, appearance of markup (lower vs. upper 
case, use of quotes, omission of omissible tags, indentations, empty 
lines), principles of choosing id and class names, use of comments, etc. 
General specifications cannot and need not handle such

Re: [whatwg] Client side value for language preference

2012-03-29 Thread Jukka K. Korpela


2012-03-29 22:02, Matthew Nuzum wrote:


Hello, on every HTTP request your browser sends header called
Accept-Language with a value something like this:

 en-gb,en-us;q=0.7,en;q=0.3

– –

Browsers support a value called navigator.language but it does not
convey the same information as the HTTP header.


Browsers have different features in this respect, but indeed they are 
quite distinct from the language preferences.


The language preferences used to have little impact, because few sites 
made use of them. And most users never learned about them, and there was 
little reason to learn… and therefore the settings are often inadequate, 
so it was a vicious circle. Well, still is. But interactive applications 
like many Google services are changing the situation: the language 
preferences sent are used to affect the language in generated texts and 
user interfaces, rather than just selecting between language versions of 
documents.



Some browsers have gotten smarter and now send the first value from
the user's language preference, which is definitely an improvement.


I think browsers have generally sent an Accept-Language header 
constructed from user preferences, for many years. The problem is that 
these settings seldom reflect the user’s real preferences, because the 
defaults generally depend on the browser language, or on the system 
language, not on any action by the user. If you use an English-language 
browser, the preference defaults might contain just English in different 
flavors, even though English might be the user’s fourth-best language.



What would be great is if client side scripts could access the same
information the server side code could access.


Sounds like a useful thing. It would support the idea of enabling 
standalone applications that can run offline, too. Along similar lines, 
it might be useful to give access to other settings that affect HTTP 
headers, but language settings seem to be far more important than other 
settings.


Note, however, that the idea postulates a simple model where the header 
only depends on certain settings and is the same for all resources. But 
this is probably acceptable. It is difficult to imagine a situation 
where a user preferred HTML documents in French and images in Italian, 
for example.



That could be done
simply by creating a new property that contains the same string as is
sent to the server. It's easily parseable. But if we're going to make
a new interface then maybe it would be good to make one that reduces
the amount of work that client side developers would need to do.


I think the simple idea of a string is the best way to go. Anything 
beyond that can be handled by a library that can be written in a 
cross-browser way—not quite trivial, but surely doable. The problem is 
to get some basic data from the browser in a well-defined way. I’d suggest 
a name like


navigator.acceptLanguage


A very naive and probably flawed example could be:

navigator.language.preferences = [{lang: 'en-gb', weight: 1.0},
  {lang: 'en-us', weight: 0.7}, {lang: 'en', weight: 0.3}];

Then JS could:

var n = navigator.language.preferences;
for (var i = 0; i < n.length; i++) {
   // check if n[i].lang is supported by the application,
   // if so do something about it
}

This would give users a list of languages with the first in the list
being the most preferred.


Actually, it’s the q (weight) values that matter, not the order, so a 
routine that selects the most preferred language would need to compare 
them. Moreover, the language negotiation mechanism can be more 
complicated: it need not select the language with highest q value, since 
the resources themselves may have been classified as having different 
qualities. This might not matter in most cases, but applications should 
still have access to the full information, with q values, either in form 
of an Accept-Language header string or as a constructed array or object. I’m just 
spelling this out to emphasize that simple information about the most 
preferred language is not enough, even if the information is taken from 
user preferences.
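
As a rough illustration of the comparison of q values, a library routine could look like the sketch below. It assumes that the header string is available to the script in some way; the function names are hypothetical, only exact tag matches are considered, and wildcards and resource-side quality values are ignored.

```javascript
// Parse an Accept-Language value such as "en-gb,en-us;q=0.7,en;q=0.3"
// into [{lang, q}, ...]. parseAcceptLanguage is a hypothetical helper name.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map(function (item) { return item.trim(); })
    .filter(function (item) { return item !== ""; })
    .map(function (item) {
      var parts = item.split(";");
      var q = 1; // a language range without a q parameter has quality 1
      for (var i = 1; i < parts.length; i++) {
        var m = /^\s*q\s*=\s*([0-9.]+)\s*$/i.exec(parts[i]);
        if (m) q = parseFloat(m[1]);
      }
      return { lang: parts[0].trim().toLowerCase(), q: q };
    });
}

// Pick the supported language with the highest q value; the order in the
// header only breaks ties. Returns null if no supported language is listed.
function pickLanguage(header, supported) {
  var best = null;
  parseAcceptLanguage(header).forEach(function (entry) {
    if (entry.q > 0 && supported.indexOf(entry.lang) !== -1 &&
        (best === null || entry.q > best.q)) {
      best = entry;
    }
  });
  return best ? best.lang : null;
}
```

With the header "fi;q=0.5,en;q=0.9" and an application that supports both languages, this selects en even though fi comes first in the string.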


Yucca

Re: [whatwg] including output in form submissions

2012-02-22 Thread Jukka K. Korpela


2012-02-22 19:30, Cameron Jones wrote:


Updating output as a form submittable element is included in a
proposal to enhance http request processing under a w3c issue


This sounds like a pointless attempt at enhancing a pointless element.

Instead of output, authors can use, and have been able to use since 
rather early days, input if the data is to be submitted as part of 
form data, and any non-form-field element, like div, otherwise. (Well, 
in the very early days, it had to be input anyway, but that was long ago.)


output is just for looking semantic for semantics' sake. There is 
nothing illogical in using input for data that is generated (not 
directly user-supplied) client-side. It's input to form handlers, 
client-side or server-side, anyway.


And there's nothing particularly semantic (i.e., as relating to meaning) 
about saying that some content is the output of some calculation. If a 
value is 42, its being in output does not indicate its meaning in any way.


output has _some_ effects: it confuses authors, if they wish to be 
serious about new specifications.


So please drop output.

Yucca

Re: [whatwg] including output in form submissions

2012-02-22 Thread Jukka K. Korpela


2012-02-22 20:13, Cameron Jones wrote:


It [the output element] does provide a greater degree of integration with
the browser though.

Is this a requirement, or just assumed features of implementation? Which 
of the assumed benefits could not be achieved by adding a new value for 
the type attribute (input type=output), or a new parameter (input 
output), or otherwise retaining the use of input (which would degrade 
well)?



This results in less scripting being required and allows for
inline scripting to be more concise which aids readability and keeps
things together.


This would need to be illustrated by real examples, and you would still 
have the question why this could not be achieved using libraries, 
without making pages break on old browsers.



It's also possible for it to be styled using a
different interface instead of elements targeted at capturing
information. The 'disabled' state doesn't provide this for input


If you wish to show results of calculation visibly _and_ pass them along 
with the form data, you can use _both_ a normal element like div, p, 
or span _and_ an input type=hidden. The resulting duplication is 
irrelevant; you have the result in a variable, or should have, and just 
put it into two places.
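
A minimal sketch of writing the result into two places follows. The function name and the ids in the comments are hypothetical, and very old IE versions would need innerText instead of textContent.

```javascript
// Assumed markup (hypothetical ids):
//   <span id="total-display"></span>
//   <input type="hidden" name="total" id="total-field">
function showAndSubmit(result, displayElement, hiddenInput) {
  var text = String(result);
  displayElement.textContent = text; // visible to the user
  hiddenInput.value = text;          // submitted with the form data
}

// In a page, this could be called as:
//   showAndSubmit(price * quantity,
//     document.getElementById("total-display"),
//     document.getElementById("total-field"));
```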


Yucca

Re: [whatwg] including output in form submissions

2012-02-22 Thread Jukka K. Korpela


2012-02-22 20:38, Cameron Jones wrote:


I'm referring to the for attribute on output, which ties its value
to the elements which went into the calculation. This would otherwise
have to be done using event attributes.


I don't see how that is supposed to simplify things. It's supposed to 
designate dependencies, but you still need to do just the same coding as 
without it, won't you?



Old browsers won't be broken by the
output element; they will function in a degraded state though.


Certain old browsers won't recognize the output element at all.


If you wish to show results of calculation visibly _and_ pass them along
with the form data, you can use _both_ a normal element like div, p, or
span _and_ an input type=hidden. The resulting duplication is
irrelevant; you have the result in a variable, or should have, and just put
it into two places.

Yucca



This is imposing more and more on scripting


No, it's nothing in addition to what is currently done. And if you 
calculate something in scripting, you need to write it somehow into an 
element. Writing it twice, when needed, is very trivial.



and is far more removed
from declarative markup which is easier to understand and less error
prone.


I don't see how it would be less error prone. And I don't see anything 
declarative with output. It's declarative markup to say this is a first 
level heading (which we can say with h1) or this is a person's name 
(which we can't say in HTML), but it's not declarative to say this is 
something written / to be written by a client-side script.



I think the output element is conceptually simple, especially
for authors with little or no programming experience.


So why the discussions about including output in form submission?


This also doesn't address the ability to style these elements in a
separate and distinguishable way from input.


You don't need a new element (unsupported by old browsers) to do 
styling. You can use classes, or other attributes. And you don't need to 
use input. You can use span or whatever you like, and/or input 
type=hidden, which normally causes no rendering issues.


Yucca

Re: [whatwg] The blockquote element spec vs common quoting practices

2012-02-12 Thread Jukka K. Korpela


2012-02-12 8:36, Nils Dagsson Moskopp wrote:


Since in current usage, blockquote means just “indent” more often
than not, browsers and search engines should not and will not imply
any specific semantics for it. Thus it will be pointless to use it.


Riveting tale, chap. Can you provide proof?


Regarding browsers and search engines, what else could constitute a 
proof than the absence of any information about them doing anything with 
blockquote? Apart from the obvious default rendering, with indents, 
that is.


Regarding the usage, I have not collected statistics on this during my 
20 years of browsing web pages, and I don’t think anyone has. So 
objective argumentation is somewhat difficult. In this discussion, I 
mentioned the WIPO document

http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html
issued by a reputable organization, and if you look at the source code, 
you’ll see that the page uses blockquote just for subparagraphs, not 
for quotations. I’m sure that if you start looking around, you’ll see 
blockquote used mostly for mere indentation.


Yucca

Re: [whatwg] The blockquote element spec vs common quoting practices

2012-02-12 Thread Jukka K. Korpela


2012-02-12 19:54, Ian Hickson wrote:


The blockquote has been, and will be, rather pointless without markup
for “credits” (indication of author and source, which are normally
required by law).


What's the use case, other than presentation?


What’s the use case for markup for quotations in general, other than 
presentation? I would say it is just a matter of potential ways in which 
such markup could be used, rather than existing usage—as there can 
hardly be such usage without established and reasonably consistent usage 
of markup for quotations.


At the same level, “credits” can be used in editing and checking tools 
to verify that all quotations have credits (issuing warnings about those 
that don’t); in automatically generating a list of references; in an 
optional browsing mode where credits are hidden, with a button available 
for opening them; in finding out (even web-wide) which documents quote a 
certain document.


If and when suitable microdata markup will be used inside an element 
designated as


If we think that markup for quotations will not have much practical use, 
then it’s better to omit such markup altogether (and tell people to use 
whatever markup they like, maybe even blockquote if they prefer 
indentation). But if we think that quotation markup will become useful, 
then the markup should have an element for “credits” on the same 
optimistic grounds.


The difference between blockquote and (for example) quotation as 
quotation markup is that the latter has no burden of existing use for 
other purposes. Anyone who plans to do some intelligent processing of 
quotations could expect quotation to be quotation markup and nothing 
else, since there is no motivation for using it for other purposes


Yucca

Re: [whatwg] The blockquote element spec vs common quoting practices

2012-02-12 Thread Jukka K. Korpela


2012-02-12 21:43, Nils Dagsson Moskopp wrote:


The difference between blockquote and (for example) quotation as
quotation markup is that the latter has no burden of existing use for
other purposes.


By analogy, a completely new table element would also be necessary.


There has been quite a lot of discussion on distinguishing between 
layout tables and data tables. The heuristics look reasonable to me, so 
I don’t see why a new element or even a new attribute would be needed.


There is existing software that treats table as tabular data, such as 
assistive software that lets the user access a cell by column and row 
information. So the situation is different from blockquote.



Oh,
and what about a way to denote images that is not tarnished by spacer
GIFs and web bugs?


You can use object for that if you like. But img is still an image, 
whether used as a spacer or otherwise.



Anyone who plans to do some intelligent processing of
quotations could expect quotation to be quotation markup and
nothing else, since there is no motivation for using it for other
purposes


Authors lie and we will have to live with it.


It’s not a lie if you use blockquote for indentation because others 
told you that it indents text (it was often described that way in 
tutorials, too). It might be wrong by some definition, but that does not 
make it a lie.



You cannot make content
producers honest by just introducing a new element intended to be used
similar to the old element.


Not by just doing that, but it would be part of the process.


Why do you think that *this* time, everyone
will read the manual before producing markup?


They don’t need to. The important thing is that it would not be used against 
the defined meaning, because there is no reason to do that. People use 
blockquote because they heard or observed that it indents, and it did 
that even when there was no better way to indent.


Yucca

Re: [whatwg] The blockquote element spec vs common quoting practices

2012-02-12 Thread Jukka K. Korpela


2012-02-12 23:25, Ian Hickson wrote:


The use case for most of the semantic markup is just easier authoring
and maintenance, in particular for selectors in CSS.


If that’s the approach, and this reflects a consensus, shouldn’t this be 
explained in the introductory material (which is now rather limited and 
technical and does not really explain the goals and ideas)? This could 
even help to clarify the discussion but especially it would guide authors.


The obvious meaning of “semantic” (and http://www.whatwg.org/specs/ 
speaks about semantic markup without quotation marks) is that it is 
related to meaning, not ease of authoring. If you don’t expect semantic 
differences as such to make any difference, then why introduce semantic 
markup at all? Surely it would be easier to author if you can introduce 
your own tags as you go, designing a tag set that suits a particular 
page, site, or application.


If the idea is that in collaborative environments, it is easier to work 
when everyone uses the same tags, then it’s really about setting design 
and coding practices. It would be something rather orthogonal to all 
other aspects of HTML.



In the case of
blockquote, we inherited it from HTML4 and so use cases weren't really a
driving factor in the contemporary specification's development of the
feature; it was more driven by consistency concerns.


Consistency with existing practice would be achieved by describing the 
default rendering in browsers. There’s little reason to aim at 
describing the semantics if it’s not really expected to be relevant – 
especially since we know that very often, existing pages (and existing 
authoring software) use blockquote just for indentation.


Designers and people who set coding standards can specify how they want 
blockquote to be used



(Same as with, e.g.,
dfn, strong, and other semantic elements.)


They, too, as well as b and u, have been modified quite a lot, with 
elaborated explanations about their meanings, causing both confusion and 
argumentation. It would be so much simpler, and it would give the same 
authoring and maintenance ease, to describe just how browsers render 
them by default.



At the same level, “credits” can be used in editing and checking
tools to verify that all quotations have credits (issuing warnings about
those that don’t); in automatically generating a list of references;
in an optional browsing mode where credits are hidden, with a button
available for opening them; in finding out (even web-wide) which
documents quote a certain document.


Do any tools try to do any of this?


Of course not; there is no markup for credits. (And in reality no markup 
for quotations either – no markup element that programs could reasonably 
expect to mean “quotation”, unless you count q – which is probably not 
used against its basic defined meaning, but it isn’t used much at all.)



If there are concrete use cases with
software that is trying to address the use case and authors who want to
use that software, then we should definitely look at the use cases. But we
need to study the use cases and software, if that's the case.


So in order to have proposals for elements considered, one would need to 
present concrete cases of programs trying to work on the elements that 
do not exist.



The practical use of moderately simplified maintenance is sufficient to
justify keeping elements that are already deployed.


Certainly. And to make this as simply and effectively as possible, the 
specification should just describe the existing markup in terms of 
browser behavior. Surely it does not ease authoring or maintenance to 
invent new meanings and new rules for markup that is currently in use – 
if the semantic definitions are not expected to be noticed and used by 
browsers, search engines, etc.


If the semantics are not supposed to be “real,” it would be much better 
to leave things open. If the specification would not designate any 
element as suitable for quotation, still less the right element for it, 
designers and authors would answer the question themselves, in their own 
coding rules and standards. It would be *easier* without a fairly 
complex part of a specification telling all kinds of things about the 
elements, possibly reflecting the design and coding ideas of some 
working group or editors.


If no software is expected to treat blockquote text so that the 
semantics “quotation from an external source” is relevant, then it is 
completely immaterial whether one puts the references or credits inside 
or outside the element (if one uses blockquote for a quotation). Any 
rule on this would be just a rule for rule’s sake and would make life 
more difficult to people who take the specification seriously and face 
situations where the references would better go inside, or go outside, 
as the case may be.


Yucca

Re: [whatwg] The blockquote element spec vs common quoting practices

2012-02-11 Thread Jukka K. Korpela

2012-02-12 2:13, Ian Hickson wrote:

That's not to say that one day we won't provide an explicit way to mark up
attribution for blockquotes in markup, just that the desired
presentation isn't a relevant concern in doing so

The relationship between a quotation and the indication of source is not
presentational, any more than being a quotation is presentational.
Stylistic variations in displaying a quotation or the relationship are
presentational.

The blockquote has been, and will be, rather pointless without markup
for “credits” (indication of author and source, which are normally
required by law). It has been, and will be, either ignored by authors or
used to mean “indent” in a comfortable way, though accidentally
indentation may be used for quotations.

Even formally, a blockquote element has been, and remains to be, at
most semi-semantic. The definition “block quotation” left it open what
distinguishes it from other quotations, except in rendering. “A section
quoted from another source” surely looks more semantic and
structural, but if taken seriously, it would kill blockquote.

Seldom does an author wish to quote an entire section. It is not even
legal to quote more than is required to fulfill the acceptable purpose
of quoting. I don’t think I have ever quoted anything that could
sensibly be called a section. None of the examples currently presented
at
http://www.whatwg.org/specs/web-apps/current-work/multipage/grouping-content.html#the-blockquote-element
comes even close

Wrapping blockquote inside figure just to be able to present
“credits” as figcaption is highly artificial. It is also clumsy,
especially considering that it would have to be the *normal* way of
presenting a block quotation to satisfy legal requirements.

If we start from the semantic and logical concept of a quotation, then
it should be obvious that the element should have a subelement for
providing source information (“credits”), normally at the end of the
element. The reason why this has not been so from the beginning is that
blockquote was really designed for indentation, though it was _named_
after one use for indentation that the designers had in their mind. And
that’s how it has been used.

Since in current usage, blockquote means just “indent” more often than
not, browsers and search engines should not and will not imply any
specific semantics for it. Thus it will be pointless to use it.

So leave blockquote as legacy markup and recommend it to be used, in
new documents, only for indentation in rare situations where an author
much prefers indentation even in the absence of CSS.

And design markup for quotations so that it suits practical needs and legal
requirements. For this, introduce quotation with src as a subelement

Yucca

Re: [whatwg] The blockquote element spec vs common quoting practices

2012-02-11 Thread Jukka K. Korpela


2012-02-12 8:36, Nils Dagsson Moskopp wrote:


Why do you hate the cite attribute?


I don’t; it’s just useless, and it does not in any way satisfy the 
legal, moral, and scholarly requirements for specifying the source.



Seldom does an author wish to quote an entire section. It is not even
legal to quote more than is required to fulfill the acceptable
purpose of quoting.


Elaborate?


“It shall be permissible to make quotations from a work which has 
already been lawfully made available to the public, provided that their 
making is compatible with fair practice, and their extent does not 
exceed that justified by the purpose.”

http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html#P144_26032


And I don't think I have ever had a need for providing credits that
went beyond having a URI in the cite attribute and a corresponding
hyperlink in the surrounding prose.


By the Berne convention, when a work is quoted, “mention shall be made 
of the source, and of the name of the author if it appears thereon.”


The cite attribute, in addition to being practically unsupported, does 
not mention anything. A reference in the “surrounding prose” is a 
completely unstructured way, regarding HTML markup, and not suitable for 
most quotations. It is not reader-friendly at all to provide a 
bibliographic reference (author and full title at the minimum) inside text.



If we start from the semantic and logical concept of a quotation,
then it should be obvious that the element should have a subelement
for providing source information (“credits”), normally at the end of
the element.


That would needlessly complicate parsing the contents of a blockquote
element quite a bit.


Is the comfortability of parsing crucial here? If you want semantic 
markup, you should be prepared to face any technical difficulties that 
may arise, rather than let the technicalities dictate rules for markup.



why would it not be
“obvious” to have a “for” attribute for the cite element?


The cite element is, in practice, just some authors’ way of writing 
i, assumed to be more semantic, when the text is a book title, a movie 
name, or something similar. It has really nothing to do with quotations. 
The work mentioned might be quoted, too, in the context, but that’s 
coincidental.



Since in current usage, blockquote means just “indent” more often
than not, browsers and search engines should not and will not imply
any specific semantics for it. Thus it will be pointless to use it.


Riveting tale, chap. Can you provide proof?


Actually the burden of proof is on those who think that blockquote has 
some useful support.


Regarding what authors actually use blockquote for, I’ve seen quite 
enough of pages that use nested blockquote elements to achieve 
different amounts of indentation.


Discussion forums sometimes use blockquote for quotations (though 
surely not for quotations of sections), but that’s just because the 
authors of forum software found that markup suitable, as quotations are 
to be indented. Those who use table layout or style sheets often don’t 
bother using blockquote.



So leave blockquote as legacy markup and recommend it to be used,
in new documents, only for indentation in rare situations where an
author much prefers indentation even in the absence of CSS.


How do you propose to treat legacy content?


The common treatment of blockquote has been well documented.


An alternative might lie in using some kind of framework … for
description … of resources! Are you reasonably sure that Dublin Core or
similar vocabularies can not help you with this use case?


No, I am absolutely sure that Dublin Core and friends have nothing to do 
with this. (Besides, DC is an old specification which has been casually 
used on web pages for many years, and turned out to be write-only 
metadata. All the recent efforts on the metadata front have ignored DC.)


Whatever markup might turn out to be useful for metadata that associates 
a quoting document, a quotation, and the quoted source, it first needs 
some elements to relate to. In order to say, in metadata, something 
about the relationship between a quotation and its source, you need to 
mark up the quotation and a reference to the source at the very basic 
level. Preferably, using something that unambiguously means “quotation” 
and “source of quotation” and not “indent” or “figure caption.”


Yucca

Re: [whatwg] Why isn't the pattern attribute applied to input type=number?

2012-02-10 Thread Jukka K. Korpela


2012-02-10 12:39, brenton strine wrote:


Regarding an input with type in the number state, the spec states
that the pattern attribute must not be specified and do[es] not
apply to the element.


That’s because the pattern attribute is for constraining text data using 
a regular expression.



Why is it specifically blocked? Doesn't that encourage the use of a less
semantic text input type for numbers that need to be validated beyond
simple max and min?


A regular expression, which operates on texts, is not a _logical_ way to 
set constraints on _numbers_. A number is a mathematical entity; a 
numeral, such as 2000 or 2.000 or 2,000 or MM, is a textual presentation 
of a number.


At a more concrete level, type=number really means type=spinbox, but 
modern design of markup languages favors names that look more semantic. 
(If type=checkbox were invented today, it would probably be called 
type=boolean.)



What if you want the number to be either 13 or 16 digits long, as with a
credit card

pattern=(\d{5}([\-]\d{4})?)


Then you use type=text. Whether the value is a number or just a 
sequence of digits is debatable. But in any case you don’t want to 
create a spinbox.
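
For the case of 13 or 16 digits in a text field, the check could be sketched as follows. The names are hypothetical. A pattern attribute value is matched against the entire field value, so the script equivalent needs explicit anchors.

```javascript
// Script equivalent of <input type="text" pattern="\d{13}|\d{16}">.
var CARD_DIGITS = /^(?:\d{13}|\d{16})$/;

// Returns true if the text consists of exactly 13 or exactly 16 digits.
function looksLikeCardNumber(text) {
  return CARD_DIGITS.test(text);
}
```

This only constrains the textual form; it says nothing about whether the digits make up a valid card number.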


Yucca

Re: [whatwg] New attributes would degrade better than new elements

2012-01-27 Thread Jukka K. Korpela


2012-01-27 21:33, Ian Hickson wrote:


On Wed, 26 Oct 2011, Jukka K. Korpela wrote:


New elements like nav and footer have the problem that some existing
user agents don't recognize them, even for the purposes of styling.


This is only a transient problem for a few years, and a minor one at that
-- you can always add CSS to make them work in CSS-capable browsers,


No, that won't work on still existing versions of IE.


Old IEs need a special trick.


Indeed. They require JavaScript code.


Therefore, it would be much simpler, for compatibility with existing
user agents, to use just div type=nav and div type=footer.


I think the ugliness of that solution far outweighs any temporary
transition issue.


div type=nav has been used for years, and it did not become any uglier.

Transient problems that will be with us for years, as you admitted, 
far outweigh any subjective esthetics of more compact markup.



Personally, for example, I find the
terseness of different element names to be of much help in writing more
maintainable documents.


Then you could use authoring tools that convert nav, or whatever you 
prefer, to markup that all browsers understand.



But in general, the main purpose is easier authoring.


It is not easier but more complicated, since you need to write CSS code 
_and_ JavaScript code just to make all browsers understand your nav 
the same way they would understand div class=nav.
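
The JavaScript part of the trick for old IE versions is commonly done by calling document.createElement once for each new element name before the elements appear in markup. A sketch, with a hypothetical function name and the document passed in as a parameter:

```javascript
// Also needed, in CSS, since unknown elements are inline by default:
//   nav, footer { display: block; }
function registerNewElements(doc, names) {
  for (var i = 0; i < names.length; i++) {
    doc.createElement(names[i]); // the created element is simply discarded
  }
}

// In a page, before any nav or footer markup is parsed (e.g. in head):
//   registerNewElements(document, ["nav", "footer"]);
```

None of this is needed for div class=nav, which is the point of the comparison.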


Yucca

Re: [whatwg] sic element

2012-01-23 Thread Jukka K. Korpela


2012-01-24 1:18, Ian Hickson wrote:


u, for instance, was only added after rather
compelling use cases were presented.


The only use cases mentioned in the current version of the living 
standard are labeling the text as being a proper name in Chinese text 
(a Chinese proper name mark) and labeling the text as being misspelt. 
These are semantically so remote that using the same element for them is 
artificial, to put it mildly.


What are the actual benefits of using u instead of span? The only 
difference is that with u, the default rendering on common browsers 
will use underlining. This is the true meaning of u, and abstract, 
vague semantics will not help authors but confuse them.


What is _compelling_ about markup for proper names in Chinese? HTML has 
had no markup for proper names in any language. Why introduce markup for 
them in one language, with the assumption that a specific rendering 
convention, now apparently rare, will be used?


What is _compelling_ about markup for misspellings? How many web pages 
use such markup and need it, and why is it compelling that u be 
available to them?


What is so _semantic_ about it if it can mean Chinese proper name _or_ 
misspelled word?



By reusing existing elements, we are able to support them without
having to wait for new elements to be implemented.


Several new elements have been added without such concerns.


Again, you are incorrect. The concerns were very much present.


There was, for example, no support for mark. Maybe there is now, but I 
doubt it. Why wasn't an existing element, like font, used for it? 
Or why don't the use cases for mark fall under those for b, i, or u?


Which support was needed? Right, underlining. So what's so difficult 
in saying that u is just as semantic as span, except that u is by 
default underlined?



With u,
many of the actual uses of the element can be seen as uses of both the old
presentational meaning and the new media-independent meaning without
conflict.


That's because the new media-independent meaning has been formulated 
so vaguely that it can be ignored and the presentational meaning 
understood as the real one. But people who will try to take the text for 
real will get hopelessly confused (until someone comes to the rescue 
saying oh, u _really_ means underline).



I would no more think we need an element for bolder than I would think
we need an element for louder in speech synthesis or an element for
bigger hand gestures in sign-language interpretation (not that I'm aware
of a sign-language HTML UA, but there's no fundamental reason one couldn't
exist in the future). When you start from the fundamental position that
these media are no more important than each other, it is really hard to
see why we would ever introduce phrase-level typographic features.


It's not that hard if you think that HTML is all about markup for 
written languages. What speech synthesizers or Braille renderers do is 
that they convert written text to other forms, often with serious 
problems and limitations, and they have to deal with things like 
bolding, underlining, and italics when they exist in texts being 
processed. It does not help them in the least to say that u in HTML 
represents a span of text with an unarticulated, though explicitly 
rendered, non-textual annotation. They have their own ways of dealing 
with underline, either by ignoring it or by mapping it to something that 
they can do.


Yucca

Re: [whatwg] Physical quantities: var or i?

2012-01-21 Thread Jukka K. Korpela


2012-01-21 0:30, Ian Hickson wrote:


On Tue, 26 Jul 2011, Jukka K. Korpela wrote:

[...]

I don’t think you have clarified whether var is suitable for
physical quantities, but I guess you meant to imply it—even though
there is not a single example about markup for physical quantities.


Given that the spec contains the exact example you gave (E=mc^2), and
given that the definition explicitly includes an identifier representing
a constant as one of the uses for the element, I have to disagree with
your assessment.


Now that you have added that example, the text implies that var is the 
suggested markup for symbols of physical quantities. It is still 
somewhat odd that this is expressed via an example only, and the basic 
prose says: “The var element represents a variable. This could be an 
actual variable in a mathematical expression or programming context, an 
identifier representing a constant, a function parameter, or just be a 
term used as a placeholder in prose.” None of the examples covers 
symbols of physical quantities, and yet they are probably more common 
in texts in general (as opposed to mathematics and programming) than the 
examples given.



On the other hand, it seems that it doesn’t really matter. The var
element has now been defined to have such a wide and vague meaning that
it is pointless to use it. There is little reason to expect that any
software will ever pay attention to var markup on any semantic basis.


You seem to imply that there was reason to expect so before, which is
certainly news to me!


I have rather been optimistic about future developments for markup 
elements that have been defined exactly enough to warrant meaningful 
semantics-based processing. For example, most of the uses mentioned in 
current text imply that var element contents should be kept intact in 
automatic language translation.


I would not really expect these elements to be used
for anything other than styling hooks.


That might be realistic, especially as there is no significant semantic 
clarification in sight in general. This raises the question why we could 
not just return to the original design with some physical markup like 
i, b, and u together with span that was added later. What’s the 
idea of wasting time in wondering which markup to choose, among several 
vaguely described alternatives, when it all ends up with being 
comparable to arbitrary author-named styles in word processing?


The advantage of using i, b, and u is that they have defined 
default rendering (even in the absence of CSS) and universal support in 
browsers. Authors can still use classes if they like, and they can still 
change the rendering via CSS just as they can when some fancy markup 
is used.



So authors will use i  if they think italics is semantically essential,
and var  won’t be used much.


That seems to be the status quo.


So why not simply define i recommended and describe var, cite, 
em, and dfn as deprecated but supported alternatives? This would 
make authoring simpler without any real cost. There’s little reason to 
tell authors to use “semantic markup” if we don’t think it has real 
effect on anything.



However, some authors like the ease of maintenance that comes from using
elements as a general classification mechanism and classes to provide
fine-grained control, and it is mostly for them that HTML provides a
variety of more specific elements like var.


This implies a burden on learning, teaching, and using HTML. Anyone who 
seriously tries to understand HTML will ask, for example, which of 
var, cite, em, dfn, i, span, abbr he should use in 
particular situations.


Authors who wish to use classification may well do that with elements 
like i as well. Then they just need to decide on their own classification.



Too bad there's no example of var used in programming context. The
current wording suggests that it would be normal, when discussing
programming, to write, say, Then we define the variable
<var>myFoo</var> of type <code>fooType</code> with initial value
<code>Foo</code> - -, which really makes no sense, even if we use
both var and code for myFoo.


Why does it make no sense?


Because var does not imply that the contents is computer code. Yet a 
variable name in programming is surely code if a type name or a literal 
is. And using <code><var>myFoo</var></code> is clumsy, and it makes the 
text appear in italics by default, which is probably unsuitable 
(monospace italics doesn’t work well). Why would an author use markup 
that by default causes rendering that he does *not* want, when there’s 
the option of using span?



Because it implies that in default rendering, identifiers of variables
appear in italics whereas identifiers of types or classes do not. Why
would anyone use extra var  markup when it has no other implications
than requiring extra CSS code to remove (when possible) italics?


To enable easier maintenance of the markup and easy self-documenting
styling, same as pretty much all of HTML

Re: [whatwg] Decimal comma in numeric input

2012-01-19 Thread Jukka K. Korpela


2012-01-20 1:19, David Singer wrote:


What the user enters and sees on screen is a presentational/locale issue


Which one? “Presentational” normally refers to things like layout 
design, colors, fonts, and borders. Locales are something different.


The difference between “1.005” meaning one thousand and five vs. one and 
five thousandths is normally regarded as a locale difference, and nobody 
has suggested that it should be handled in CSS when it is about 
document content.
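
A minimal sketch of why this is a locale matter rather than a presentational one: the same character string yields different numbers depending on which separator conventions are assumed. The function name and its parameters are hypothetical, chosen for illustration only.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Interpret `text` under an assumed locale convention.

    `decimal_sep` and `group_sep` are illustrative parameters, not part of
    any HTML or CSS specification.
    """
    # Drop grouping separators first, then normalize the decimal separator.
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


# The same input read under two conventions.
english_style = parse_number("1.005", decimal_sep=".", group_sep=",")
continental_style = parse_number("1.005", decimal_sep=",", group_sep=".")
print(english_style, continental_style)
```

The input never changes; only the assumed convention does, which is why styling alone cannot settle the meaning.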


Why would things suddenly change when it comes to user interface? 
Besides, there is nothing in CSS as currently defined that even tries to 
address such issues.


Yucca

Re: [whatwg] A few questions on HTML5

2012-01-03 Thread Jukka K. Korpela


2012-01-03 12:45, Bronislav Klučka wrote:


On 3.1.2012 10:32, Mani wrote:

[…]

2. Will XHTML5 have a DTD, because XHTML5 must be well-formed?

http://www.w3.org/TR/html5/the-xhtml-syntax.html#writing-xhtml-documents
http://wiki.whatwg.org/wiki/HTML_vs._XHTML


To find an answer to the question that was asked, one needs to read 
quite a lot between the lines in the cited documents. The answer appears 
to be “No, XHTML5 won’t have a DTD, since you’re supposed to use a 
validator specifically written for HTML5.”


Allowing XHTML 1.0 and XHTML 1.1 DOCTYPEs as “obsolete but conforming” 
and not saying a word about any DTD that covers any of the HTML5 
novelties looks like a clear indication of intent.


Well-formedness requirement does not imply the need for a DTD at all. Au 
contraire, “well-formed” is just a confusing term for conformance to 
generic XML rules (“well-formed XML” really means nothing but “XML”), as 
opposed to any rules in a DTD, for example.


This appears to mean that when XHTML5 is used together with other XML 
tag sets, you cannot use a DTD-based validator just by adding the 
declarations for the other tags into an XHTML5 DTD. So the question 
really is: will someone want to validate, say, XHTML5 + MathML documents?


Yucca

Re: [whatwg] HTML5 named entity &Gt; and &Lt;

2011-12-14 Thread Jukka K. Korpela


2011-12-14 19:34, Ilhan Y. wrote:


By the way, can we have Unicode names (HTML names) for Mercury, Sun,
Earth and other planets. They are used by many astronomers on the
internet.


Nice parody! But maybe people won’t take it as parody.

After all, there is no rationale given for the inclusion of new “named 
character references,” so people might see the idea as asking authors to 
submit new proposals for every possible and impossible character.


The whole idea of extending the repertoire is wrong. We have lived with 
a certain set of entity references (now being renamed “named character 
references”), widely supported by browsers, except possibly in XHTML 
mode. Authors who need other characters can enter them as such, using 
UTF-8 (which is being favored, is it not?) or using numeric character 
references.
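
As a rough illustration of the numeric alternative, the sketch below builds hexadecimal numeric character references for the planet symbols mentioned above and checks that they decode back to the same characters. The helper name is hypothetical; `html.unescape` is the Python standard library function.

```python
import html


def numeric_reference(ch: str) -> str:
    """Return a hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


# Astronomical symbols from the Unicode Miscellaneous Symbols block.
symbols = {"Mercury": "\u263F", "Sun": "\u2609", "Earth": "\u2641"}

for name, symbol in symbols.items():
    reference = numeric_reference(symbol)
    # A numeric reference needs no entity table: it decodes by code point.
    assert html.unescape(reference) == symbol
    print(name, reference)
```

No new name has to be agreed on or implemented for any of these characters.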


So nobody really needs any added pseudo-mnemonic “named references,” and 
they just cause incompatibility: pages fail on most browsers, when they 
would work perfectly if other methods of including characters had been used.


Allowing &gt and &GT and &GT; as synonyms for &gt; might be pragmatic, 
if there is sufficient evidence of their use on legacy pages, but code 
checkers should issue a warning (there is nothing to be gained by using 
such deviating forms). And adding things like &Gt;, with a different 
meaning, is just asking for trouble.


Yucca

Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Jukka K. Korpela


2011-12-06 6:54, Leif Halvard Silli wrote:


Yeah, it would be a pity if it had already become an widespread
cargo-cult to - all at once - use HTML5 doctype without using UTF-8
*and* without using some encoding declaration *and* thus effectively
relying on the default locale encoding ... Who does have a data corpus?


I think we would need to ask search engine developers about that, but 
what is this proposed change to defaults supposed to achieve? It would 
break any old page that does not specify the encoding, as soon as 
the doctype is changed to <!doctype html> or this doctype is added to a 
page that lacked a doctype.


Since <!doctype html> is the simplest way to put browsers into standards 
mode, this would punish authors who have realized that their page works 
better in standards mode but are unaware of a completely different and 
fairly complex problem. (Basic character encoding issues are of course 
not that complex to you and me or most people around here; but most 
authors are more or less confused with them, and I don't think we should 
add to the confusion.)


There's little point in changing the specs to say something very 
different from what previous HTML specs have said and from actual 
browser behavior. If the purpose is to make things more exactly defined 
(a fixed encoding vs. implementation-defined), then I think such 
exactness is a luxury we cannot afford. Things would be all different if 
we were designing a document format from scratch, with no existing 
implementations and no existing usage. If the purpose is UTF-8 
evangelism, then it would be just the kind of evangelism that produces 
angry people, not converts.


If there's something that should be added to or modified in the 
algorithm for determining character encoding, then I'd say it's error 
processing. I mean user agent behavior when it detects, after running 
the algorithm and while processing the document data, that the data 
does not match the encoding chosen. That is, that the data contains 
octets or octet sequences that are not allowed in the encoding or that 
denote noncharacters. Such errors are naturally detected when the user 
agent processes the octets; the question is what the browser should do then.


When data that is actually in ISO-8859-1 or some similar encoding has 
been mislabeled as UTF-8 encoded, then, if the data contains octets 
outside the ASCII range, character-level errors are likely to occur. Many 
ISO-8859-1 octets are just not possible in UTF-8 data. The converse 
error may also cause character-level errors. And these are not uncommon 
situations - they seem to occur increasingly often, partly due to cargo 
cult use of UTF-8 (when it means declaring UTF-8 but not actually 
using it, or vice versa), partly due to increased use of UTF-8 combined 
with ISO-8859-1 encoded data creeping in from somewhere into UTF-8 
encoded data.
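
The mismatch described above is easy to reproduce. The following is a small sketch, not browser code: it uses Python's strict UTF-8 decoder to locate the first octet sequence that cannot be UTF-8, and shows that the converse mislabeling in this sample produces wrong characters rather than a decoding error. The function name is hypothetical.

```python
def first_utf8_error(data: bytes):
    """Return the offset of the first octet sequence that is not valid
    UTF-8, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


latin1_data = "café".encode("iso-8859-1")  # the last octet is 0xE9
utf8_data = "café".encode("utf-8")         # the last two octets are 0xC3 0xA9

# ISO-8859-1 data mislabeled as UTF-8: a lone 0xE9 is not a complete sequence.
print(first_utf8_error(latin1_data))

# UTF-8 data mislabeled as ISO-8859-1: every octet maps to some character,
# so the result is wrong text rather than a detectable error.
print(utf8_data.decode("iso-8859-1"))
```

A strict decoder therefore already has the information needed to notice the first kind of mislabeling.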


From the user's point of view, the character-level errors currently 
result in some gibberish (e.g., some odd box appearing instead of a 
character, in one place) or in a total mess (e.g. a large number of 
non-ASCII characters displayed all wrong). In either case, I think an 
error should be signalled to the user, together with
a) automatically trying another encoding, such as the locale default 
encoding instead of UTF-8 or UTF-8 instead of anything else
b) suggesting to the user that he should try to view the page using some 
other encoding, possibly with a menu of encodings offered as part of the 
error explanation

c) a combination of the above.
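
Alternative a) could be sketched roughly as follows. This is only an illustration under assumptions: the function name, the fallback order, and the returned flag are hypothetical, and a real user agent would work incrementally on a stream rather than on a complete byte string.

```python
def decode_with_fallback(data: bytes, declared: str,
                         fallbacks=("utf-8", "windows-1252")):
    """Return (text, encoding_used, error_signalled).

    Try the declared encoding strictly; on a character-level error, try
    the fallback encodings and report that an error should be signalled.
    """
    try:
        return data.decode(declared), declared, False
    except UnicodeDecodeError:
        pass

    for candidate in fallbacks:
        if candidate.lower() == declared.lower():
            continue  # already tried
        try:
            return data.decode(candidate), candidate, True
        except UnicodeDecodeError:
            continue

    # Nothing fits: show replacement characters, still with the error flag,
    # so that the user can be offered a menu of encodings (alternative b).
    return data.decode(declared, errors="replace"), declared, True


text, used, signalled = decode_with_fallback(
    "café".encode("windows-1252"), declared="utf-8")
print(text, used, signalled)
```

The third element of the result is the point: the fallback is applied, but the problem is not hidden from the user.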

Although there are good reasons why browsers usually don't give error 
messages, this would be a special case. It's about the primary 
interpretation of the data in the document and about a situation where 
some data has no interpretation in the assumed encoding - but usually 
has an interpretation in some other encoding.


The current "Character encoding overrides" rules are questionable 
because they often mask out data errors that would have helped to detect 
problems that can be solved constructively. For example, if data labeled 
as ISO-8859-1 contains an octet in the 80...9F range, then it may well 
be the case that the data is actually windows-1252 encoded and the 
override helps everyone. But it may also be the case that the data is 
in a different encoding and that the override therefore results in 
gibberish shown to the user, with no hint of the cause of the problem. 
It would therefore be better to signal a problem to the user, display 
the page using the windows-1252 encoding but with some instruction or 
hint on changing the encoding. And a browser should in this process 
really analyze whether the data can be windows-1252 encoded data that 
contains only characters permitted in HTML.
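
The analysis suggested here might look like the sketch below. It relies on the fact that Python's `cp1252` codec rejects the octets that windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); the function name and the shape of the return value are assumptions made for illustration.

```python
def check_latin1_label(data: bytes):
    """For data labeled ISO-8859-1, return (encoding_to_use, warn_user).

    encoding_to_use is None when the data fits neither ISO-8859-1 without
    C1 controls nor windows-1252, i.e. when it is probably something else.
    """
    has_c1_octets = any(0x80 <= octet <= 0x9F for octet in data)
    if not has_c1_octets:
        return "iso-8859-1", False  # the label is plausible as given

    try:
        data.decode("cp1252")  # strict: undefined octets raise an error
    except UnicodeDecodeError:
        return None, True  # overriding would only yield gibberish

    # The override is plausible, but the user still gets a hint.
    return "windows-1252", True


print(check_latin1_label(b"\x93quoted\x94"))  # curly quotes in windows-1252
print(check_latin1_label(b"\x81\x40"))        # 0x81 is undefined there
```

Checking that the result contains only characters permitted in HTML would be a further step, not shown here.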


Yucca

Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Jukka K. Korpela

2011-12-06 15:59, NARUSE, Yui wrote:

(2011/12/06 17:39), Jukka K. Korpela wrote:

2011-12-06 6:54, Leif Halvard Silli wrote:

Yeah, it would be a pity if it had already become an widespread
cargo-cult to - all at once - use HTML5 doctype without using UTF-8
*and* without using some encoding declaration *and* thus effectively
relying on the default locale encoding ... Who does have a data corpus?

I found it: http://rink77.web.fc2.com/html/metatagu.html

I'm not sure of the intended purpose of that demo page, but it seems to
illustrate my point.

It uses HTML5 doctype and not declare encoding and its encoding is Shift_JIS,
the default encoding of Japanese locale.

My Firefox uses the ISO-8859-1 encoding, my IE the windows-1252
encoding, resulting in a mess of course. But the point is that both
interpretations mean data errors at the character level - even seen as
windows-1252, it contains bytes with no assigned meaning (e.g., 0x81 is
UNDEFINED).

Current implementations replaces such an invalid octet with a replacement
character.

No, it varies by implementation.

When data that is actually in ISO-8859-1 or some similar encoding has been mislabeled as
UTF-8 encoded, then, if the data contains octets outside the ASCII range, character-level
errors are likely to occur. Many ISO-8859-1 octets are just not possible in UTF-8 data.
The converse error may also cause character-level errors. And these are not uncommon
situations - they seem to occur increasingly often, partly due to cargo cult use of
UTF-8 (when it means declaring UTF-8 but not actually using it, or vice versa),
partly due to increased use of UTF-8 combined with ISO-8859-1 encoded data creeping in from
somewhere into UTF-8 encoded data.

In such case, the page should be failed to show on the author's environment.

An authoring tool should surely indicate the problem. But what should
user agents do when they face such documents and need to do something
with them?

From the user's point of view, the character-level errors currently result in
some gibberish (e.g., some odd box appearing instead of a character, in one
place) or in a total mess (e.g. a large number of non-ASCII characters displayed all
wrong). In either case, I think an error should be signalled to the user,
together with
a) automatically trying another encoding, such as the locale default encoding
instead of UTF-8 or UTF-8 instead of anything else
b) suggesting to the user that he should try to view the page using some other
encoding, possibly with a menu of encodings offered as part of the error
explanation
c) a combination of the above.

This premises that a user know the correct encoding.

Alternative b) means that the user can try some encodings. A user agent
could give a reasonable list of options.

Consider the example document mentioned. When viewed in a Western
environment, it probably looks all gibberish. Alternative a) would
probably not help, but alternative b) would have some chances. If the
user has some reason to suspect that the page might be in Japanese, he
would probably try the Japanese encodings in the browser's list of
encodings, and this would make the document readable after a try or two.

I, Japanese, imagine that it is hard that distingusih ISO-8859-1 page and
ISO-8859-2 page.

Yes, but the idea isn't really meant to apply to such cases, as there is
no way _at the character encoding level_ to recognize
ISO-8859-1 mislabeled as ISO-8859-2 or vice versa.

Some browsers alerts scripting issues.
Why they cannot alerts an encoding issue?

Surely they could, though I was not thinking of an alert in a popup sense -
rather, a red error indicator somewhere. There would be many more
reasons to signal encoding issues than to signal scripting issues, as we
know that web pages generally contain loads of client-side scripting
errors that do not actually affect page rendering or functionality.

The current "Character encoding overrides" rules are questionable because they often mask out data
errors that would have helped to detect problems that can be solved constructively. For example, if data
labeled as ISO-8859-1 contains an octet in the 80...9F range, then it may well be the case that the data is
actually windows-1252 encoded and the override helps everyone. But it may also be the case that
the data is in a different encoding and that the override therefore results in gibberish shown to
the user, with no hint of the cause of the problem.

I think such case doesn't exist.
On character encoding overrides a superset overrides a standard set.

Technically, not quite so (e.g., in ISO-8859-1, 0x81 is U+0081, a
control character that is not allowed in HTML - I suppose, though I
cannot really find a statement on this in HTML5 - whereas in
windows-1252, it is undefined).

More importantly my point was about errors in data, resulting e.g. from
a faulty code conversion or some malfunctioning software that has
produced

Re: [whatwg] Default encoding to UTF-8?

2011-12-06 Thread Jukka K. Korpela


2011-12-06 22:58, Leif Halvard Silli wrote:


There is now a bug, and the editor says the outcome depends on a
browser vendor to ship it:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15076

Jukka K. Korpela Tue Dec 6 00:39:45 PST 2011


what is this proposed change to defaults supposed to achieve. […]


I'd say the same as in XML: UTF-8 as a reliable, common default.


The bug was created so that the argument given was:
It would be nice to minimize number of declarations a page needs to 
include.


That is, author convenience - so that authors could work sloppily and 
produce documents that could fail on user agents that haven't 
implemented this change.


This sounds more absurd than I can describe.

XML was created as a new data format; it was an entirely different issue.


If there's something that should be added to or modified in the
algorithm for determining character encoding, then I'd say it's error
processing. I mean user agent behavior when it detects, [...]


There is already an (optional) detection step in the algorithm - but UA
treat that step differently, it seems.


I'm afraid I can't find it - I mean the treatment of a document for 
which some encoding has been deduced (say, directly from HTTP headers) 
and which then turns out to violate the rules of the encoding.


Yucca

Re: [whatwg] Default encoding to UTF-8?

2011-11-30 Thread Jukka K. Korpela


2011-12-01 1:28, Faruk Ates wrote:


My understanding is that all browsers* default to Western Latin (ISO-8859-1)
 encoding by default (for Western-world downloads/OSes) due to legacy 
content on the web.


Browsers default to various encodings, often windows-1252 (rather than 
ISO-8859-1). They may also investigate the actual data and make a guess 
based on it.



I'm wondering if it might not be good to start encouraging defaulting to UTF-8,


It would not. There’s no reason to recommend any particular defaulting, 
especially not something that deviates from past practices.


It might be argued that browsers should do better error detection and 
reporting, so that they inform the user e.g. if the document’s encoding 
has not been declared at all and it cannot be inferred fairly reliably 
(e.g., from BOM). But I’m afraid the general feeling is that browsers 
should avoid warning users, as that tends to contradict authors’ 
purposes – and, in fact, mostly things that are serious problems in 
principle aren’t that serious in practice.



We like to think that “every web developer is surely building things in UTF-8 
nowadays” but this is far from true.

There’s a large number of pages declared as UTF-8 but containing ASCII 
only, as well as pages mislabeled as UTF-8 but containing e.g. ISO-8859-1.



I still frequently break websites and webapps simply by entering my name (Faruk 
Ateş).


That’s because the server-side software (and possibly client-side 
software) cannot handle the letter “ş”. It would not help if the page 
were interpreted as UTF-8. If the author knows that a server-side form


Yucca

Re: [whatwg] New attributes would degrade better than new elements

2011-10-30 Thread Jukka K. Korpela


30.10.2011 1:18, Eric Sh. wrote:


I heard there are plans to create new tags for layouts to replace the
use of tables as layout elements.


Maybe such rumors have been caused by taking some parody for real.


You keep speaking of creating new attributes instead of adding new tags
but then what is the point in adding new attributes instead of simply
using classes which are far more compatible on past browsers?


That would correspond to the microformats approach, which is the 
simplest way of adding low-level metadata. But it seems that the search 
engine consortium decided to favor another approach, microdata. Note 
that it does not use new elements - even though it adds completely new 
semantics - but new attributes.


I think I have mentioned the class attribute in this discussion, as well 
as the point that using class to add semantics could conflict with 
existing usage. When authors have written <div class=nav>, they didn't 
expect browsers or other software to start treating the element in their 
own ways, according to some future specification. They expected the 
class name space to be for them to use freely.


One might ask how often does a class name like nav relate to something 
else than a navigation block, in practice. In theory, it could be just 
anything, of course. And while <div class=nav>...</div> is a common 
paradigm, <div class=article>...</div> is not, and article might well 
have been used as a class name with no intent of declaring the content 
as a syndicatable article or getting some special default article 
styling that browsers might apply.



And WHATWG is working hard to ensure compatibility of new additions with
old browsers (the DOCTYPE for example).


I don't see how the DOCTYPE trickery relates to this. The only things 
that the <!doctype html> construct achieves are putting browsers into 
standards mode (something that can be achieved by the use of any 
private doctype declaration) and informing validators (linters) that 
they should treat the document according to what happens to be the 
Living Standard's content today.



So I am positive issues like this one were already discussed and
dismissed for some reason or another,


I am positive that if there were a solid ground for the introduction of 
new elements like nav, article, etc., it would already have been 
presented in this discussion, if not in the Living Standard itself.


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] New attributes would degrade better than new elements

2011-10-29 Thread Jukka K. Korpela


27.10.2011 3:11, Ashley Sheridan wrote:


Try telling me
Google isn't aware of HTML5 in web pages and I'll laugh.


OK, I'll try: Google does not care about new HTML5 elements. Do you feel 
amused now?


Can you please now do me, and others, a favor and give some evidence of 
actual Google behavior in this respect? If it's something that we need 
to be aware of, it should be observable from outside Google, i.e. when 
using Google, not just in their internal code that is not public. So 
which effects can we observe?


(This would be interesting in its own account, even though it does not 
prove that new _elements_ were needed for that. But it would give some 
perspective regarding the eagerness to add and promote new elements.)



- - you shouldn't use attributes to determine the meaning of the
content.


That sounds like a prejudice based on the introduction of many 
presentational attributes in HTML 3.2 and their preservation in later 
versions. It does not in any way mean that attributes as such are 
presentational and not semantic.


HTML5 tries hard to distinguish between table indicating tabular data 
and table being used merely as layout tool - and the distinction is 
largely based on the use of attributes in the table element and its 
descendants. It is certainly wise to keep table as dual (tabular data 
vs. layout) for compatibility, instead of introducing new elements to 
distinguish them - no matter how logical or semantic such an idea might 
sound. Using attributes in div to indicate navigational areas, 
articles, etc., would similarly be useful for compatibility and would be 
much clearer and more logical, as the meaning would be uniquely defined 
by a single attribute - not by some rather messy rules involving several 
elements and attributes.


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] New attributes would degrade better than new elements

2011-10-27 Thread Jukka K. Korpela


27.10.2011 5:38, Eric Sh. wrote:


And if we stop adding new features old browsers do not support or not use
them because very little browsers are not supporting them then it would
completely stop innovation and the evolution of the web.


How does this relate to the question of adding elements vs. adding 
attributes?



I am supporting what Ashley has said, just think about it if you surround
context with <article> then speech browsers can know from where to start
reading the article instead of a whole web page.


They could do just the same with <div type=article>.


I believe that the decision makers are not stupid,


_I_ am not stupid, and I did not come to think of this earlier.


they are smart enough to
know all these technical issues and conflicts


There are issues and conflicts with adding a new element like nav, as 
compared with adding a new attribute. So the question is: what do you 
gain by adding an element rather than an attribute?


There is no _required_ functionality or default rendering for nav or 
article and no special attributes for them. What you lose by having 
them as elements rather than attributes is that you cannot style them in 
a manner that works on all browsers.



And lastly, I understand and encourage different opinions but I think it is
too late to change anything that has been already implemented by all major
browsers(Including IE 9!)


Do you think that older versions of IE can be ignored? They will be with 
us for years. There are ways to teach new elements to them, but they 
are based on JavaScript (so they do not work universally). Such issues 
would be quite unnecessary if attributes were used and not elements.


There's no implementation worth mentioning so far, or can you mention a 
single browser or search engine that actually does something useful 
with, say, article? Don't you think it would be a triviality to its 
authors to extend the feature to cover <div type=article>?


The only real argument in favor of elements is brevity, and it should 
not weigh much when compared with compatibility issues.


This is also a matter of future additions. When new markup features will 
be added, will they be designed to degrade gracefully and to allow 
smooth transition? If, for example, it will be decided that dedicated 
markup for the main content of a document is needed after all, adding a 
new element like <main>...</main> would have the same problems as nav, 
article, etc. now have. It would be much smoother to introduce 
<div type=main>...</div>, and taking the new feature into use in old 
documents would require just the addition of one attribute - not 
changing the existing markup like <div id=main>...</div> to another 
element (potentially requiring many changes to CSS and scripts) or 
redundantly using both the old markup and the new element around it.


This is different from standardizing class attribute values. Such 
standardization is questionable, since the class attribute is supposed 
to be authors' playground, mainly for styling, and assigning a meaning 
to any reasonable-looking class name would conflict with existing usage. 
So it's better to keep class free for authors' varying purposes and use 
a different attribute, with no prior use, to introduce new semantics.


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] New attributes would degrade better than new elements

2011-10-27 Thread Jukka K. Korpela


27.10.2011 9:55, Ashley Sheridan wrote:


There is no _required_ functionality or default rendering for nav  or
article  and no special attributes for them. What you lose by having
them as elements rather than attributes is that you cannot style them in
a manner that works on all browsers.



nav is a block level element, so behaves as such in conforming browsers.


And if <div type=nav> were used, it would be rendered as a block in 
nonconforming browsers too. (The point about more or less required 
default rendering with display: block is taken as a correction to my 
statement above, but it does not really change my point. Rather, 
strengthens it.)



What about strong and b?


Yes, what about them? They have been in HTML since the very beginning. 
If you were to add _new_ markup for emphasis into HTML, I would suggest 
that you don't add a new element, like <key>, but rather an attribute - 
to an element that comes closest in meaning and default rendering, like 
<strong type=key> or <b type=key>.



Google admits certain aspects of its indexing algorithm, and this is but
a little part of it. They would be certainly missing a trick if they
weren't also indexing based on HTML5 tags as well, adding context to a
page.


There's a lot we can speculate about potential use of the new markup 
elements and remarkably little factual evidence. But surely if Google 
can recognize nav and make some use of it, it could deal with 
<div type=nav> as well.
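
To make the point concrete, here is a small sketch of a consumer that treats both spellings alike, using `html.parser` from the Python standard library. The `type=nav` attribute value is the proposal under discussion in this thread, not an existing HTML feature, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class NavFinder(HTMLParser):
    """Collect navigation blocks marked up either as a nav element or as
    a div carrying the proposed type=nav attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "nav":
            self.found.append("nav element")
        elif tag == "div" and attributes.get("type") == "nav":
            self.found.append("div type=nav")


def find_navigation(markup: str):
    finder = NavFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


sample = "<nav>a</nav><div type=nav>b</div><div class=nav>c</div>"
print(find_navigation(sample))
```

The extra branch for the attribute form is one comparison; a class name such as class=nav is deliberately left alone, since its meaning belongs to the author.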



What is the fear of adding new tags?


Compatibility with older browsers. It should not be broken without due 
cause.



You don't create a new XML document
with every tag as <tag> do you?


HTML isn't XML. Or, to the extent that it is XML (serialized as XML), it 
has a specific HTML vocabulary recognized by browsers and other 
HTML-aware software.



They are backwards compatible in
that browsers that don't understand them can just ignore them.


That's exactly the point that causes the incompatibility: to a browser 
that does not recognize nav at all, your CSS settings for it are 
ignored and it isn't even rendered as a block by default.



You can
use other elements within them in a transitional phase of your
development if you really think you need to.


So in any reasonable use now or some years from now, the new markup that 
was supposed to simplify markup will make markup more lengthy and less 
logical. Instead of

<div class=nav>...</div>
authors would need to use
<nav><div class=nav>...</div></nav>
and they would have to do all the styling and scripting on the div element.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] New attributes would degrade better than new elements

2011-10-27 Thread Jukka K. Korpela


27.10.2011 11:42, Simon Pieters wrote:


It's the difference between working on all browsers and working on some
browsers as well as being tweakable when JavaScript is enabled.


<div type=nav> is not stylable in IE6 because it doesn't support
attribute selectors.


Granted, but
a) IE6 is dying, whereas IE7 and IE8 are and will be with us for a long 
time, and they do support attribute selectors
b) if you regard IE6 as still relevant, you can additionally use a class 
or id attribute (or keep it, if working with an existing document); on 
IE6, they work for div but not for nav as the entire element is unknown.
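
A minimal sketch of point b) follows. The type attribute on div is the hypothetical proposal of this thread, not something any HTML specification defines, and the class name "nav" and the target index.html are likewise only illustrative. The two rules are kept separate in case a browser that cannot parse the attribute selector discards a whole grouped rule.

```html
<!DOCTYPE html>
<html>
<head>
<title>div type=nav with a class fallback (sketch)</title>
<style>
  /* For browsers with attribute selector support (IE7 and later).
     'type' on div is the hypothetical attribute from this thread. */
  div[type="nav"] { background: #eee; padding: 0.5em; }

  /* Same declarations again for IE6, which only matches the class. */
  div.nav { background: #eee; padding: 0.5em; }
</style>
</head>
<body>
<div type="nav" class="nav">
  <a href="index.html">Home</a>
</div>
</body>
</html>
```

Since div is a known element everywhere, it is rendered as a block and picks up at least one of the two rules without any scripting.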


--
Yucca, http://www.cs.tut.fi/~jkorpela/

[whatwg] New attributes would degrade better than new elements

2011-10-26 Thread Jukka K. Korpela

New elements like nav and footer have the problem that some existing 
user agents don't recognize them, even for the purposes of styling. So 
if you want to use nav, then - unless you're using it for purely 
semantic reasons with no idea of styling - you need to use some special 
trick to make old browsers recognize it or assign your styles to some 
logically redundant div markup that you use in addition to nav.
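
One widely used form of the "special trick" mentioned above is sketched here, under the assumption that the old browser in question is IE before version 9 and that JavaScript is enabled. The link target index.html is only a placeholder.

```html
<!DOCTYPE html>
<html>
<head>
<title>nav in old IE (sketch)</title>
<script>
  // IE before version 9 does not style an unknown element unless its
  // name has first been passed to document.createElement, so this
  // must run before the parser reaches the nav markup.
  document.createElement("nav");
</script>
<style>
  /* Browsers that do not know nav render it inline by default,
     so block display has to be requested explicitly. */
  nav { display: block; background: #eee; padding: 0.5em; }
</style>
</head>
<body>
<nav><a href="index.html">Home</a></nav>
</body>
</html>
```

With scripting disabled in such a browser, the style rule for nav has no effect, which is the dependency on JavaScript discussed elsewhere in this thread.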


Therefore, it would be much simpler, for compatibility with existing 
user agents, to use just <div type=nav> and <div type=footer>. Such 
elements can be styled at will, and if any browsers or search engines 
wish to recognize semantic markup, type=nav should not be a bigger 
problem than nav, rather smaller.


I understand that this should have been suggested years ago. But I 
didn't think of the issue, and it seems that neither did anyone else, 
aloud. And it's not too late, is it?


Nobody needs new elements with no required functionality, really. The 
idea of more compact markup is pointless. People don't read or write 
markup that much, and if they do, <div type=nav> is no less semantic 
than <nav>. But the latter has the serious drawback of being ignored by 
many relevant user agents.


It does not need to be the 'type' attribute of course. That attribute 
name is seriously overloaded, so 'kind' might be better. The important 
thing is to introduce an attribute different from 'class', which 
currently lets authors use a free naming space. We don't want to 
interfere with style sheets that might use this or that 'class' 
attribute value; instead, a new attribute name (defined as semantic, not 
presentational, but still useful for styling) is called for - rather 
than new element names, which are born homeless.


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] New attributes would degrade better than new elements

2011-10-26 Thread Jukka K. Korpela


26.10.2011 23:16, Tab Atkins Jr. wrote:


Believe me, these discussions were had in the past.


I do, but did you draw the conclusions?


All major UAs except old IE handle unknown elements in a way that's
acceptable


That means all browsers except the most common one. Is that a 
realistic view?


What do you expect to gain by adding new elements, as opposed to the 
smoother addition of new attributes?



So, it's not a big deal.


It's the difference between working on all browsers and working on some 
browsers as well as being tweakable when JavaScript is enabled.


Under which circumstances would you vote for the latter, and what do you 
expect to win? I love gambling, but what's the potential gain here? 
Pleasing someone's idea of semantic markup, as if attributes could not 
be semantic?


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] New attributes would degrade better than new elements

2011-10-26 Thread Jukka K. Korpela


27.10.2011 0:57, Ashley Sheridan wrote:


If people are using versions of IE that old, then
they deserve to have an older version of the web given to them.


That's rather elitist, given the fact that many people have no way of 
upgrading their IE or switching to your preferred browser, and no need 
to do that apart from some ideas of HTML5.



Why is adding attributes smoother?


Browsers already recognize the elements.


User agents still have to be modified
to 'understand' an attribute to make the same semantic sense as a new
tag


What semantic sense? Exactly what do modern browsers understand about 
nav for example? What are they required to understand? Just that 
there's a styleable element. But with div, that's something we have 
with all browsers.


The difference is between fancy new elements and good old elements with 
new attributes.



If you're using an older version of IE then likely it's because you
don't know any different


That's rather elitist, isn't it? If we could discard all bad 
browsers, the world would be nicer, yes, but then we would not really 
have any browsers, would we?



Attributes can be semantic, but where do you draw the line?


In the definition of the attributes. If you can make up a new element 
like nav, why can't you make up a new attribute like type=nav?



Would you really favour using attributes to determine the meaning of a
tag, or would you rather that HTML just follows its natural course and
attributes be used to supplement a tag from default values?


Neither attributes nor tag names mean anything by themselves. They get 
their meanings from specifications or from browser practices.


The question is whether the new semantic tags have any useful impact 
(what might it be?). Inventing new tags may sound cooler than defining 
meanings for attributes, but it's just an idle game. Is there _any_ 
demonstrable use of, say, the semantics of nav? And what's the reason 
why that could not be achieved in the less disruptive way of assigning a 
standardized meaning to, say, the type attribute of div?


--
Yucca, http://www.cs.tut.fi/~jkorpela/

Re: [whatwg] Question: rel=help

2011-09-30 Thread Jukka K. Korpela


29.9.2011 21:52, Tantek Çelik wrote:


On Thu, Sep 29, 2011 at 11:12, Jukka K. Korpela jkorp...@cs.tut.fi wrote:

29.9.2011 20:50, Tantek Çelik wrote:


Javascript-only help text (tooltip or otherwise) or any other content
intended for human consumption is a really bad idea for all the usual
reasons
(#a11y, mobile, search etc.)


Except in cases where the information is relevant only when JavaScript is
enabled.


That's a reasonable theory. Do you have URLs to any real world examples?


For example, various virtual keyboard pages use such techniques. Say,
http://www.virtualkeyboard.ws has buttons for entering characters, with 
mouseover events that show information about the key.



Question, would an element with rel=help and a title="Help text"
make sense and be valid as a JavaScript hook for tooltips?


Realizing that this example markup was ambiguous - that is:

Does the string "Help text" represent a hypothetical placeholder on a
span or div etc.?

Or is that markup part of a hyperlink that links to a separate help
document? E.g.
<a rel=help title="Help text" href=help.html>(?)</a>


I assumed the latter. At least it makes sense. And it would make sense, 
in my opinion, to have JavaScript code that displays the string in the 
title attribute. The reason is that although graphic browsers generally 
display that value on mouseover by default, the implementations have 
oddities that reduce usability: the tooltip text is in a 
system-dependent font (cannot be styled by the author, cannot be 
modified by the user except via system settings), and it may disappear 
after some seconds.
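
A rough sketch of such JavaScript code is given below. The id "tip", the colors and the file name help.html are assumptions made for illustration. The script copies the title text into an author-styled box for as long as the pointer is over the link, and empties the title attribute meanwhile so that the browser's own tooltip does not appear on top of it.

```html
<!DOCTYPE html>
<html>
<head>
<title>Styled help text taken from title (sketch)</title>
<style>
  /* Author-controlled font and colors, unlike the native tooltip. */
  #tip { display: none; position: absolute; padding: 0.3em;
         font: 100% sans-serif; background: #ffc; border: 1px solid #333; }
</style>
</head>
<body>
<a rel="help" title="Help text" href="help.html">(?)</a>
<div id="tip"></div>
<script>
  var link = document.querySelector('a[rel="help"]');
  var tip = document.getElementById("tip");
  var helpText = link.title;   // remember the text before touching title

  link.onmouseover = function () {
    link.title = "";           // suppress the browser's own tooltip
    tip.textContent = helpText;
    tip.style.display = "block";
  };
  link.onmouseout = function () {
    link.title = helpText;     // restore the attribute
    tip.style.display = "none";
  };
</script>
</body>
</html>
```

When JavaScript is disabled, the title attribute is left untouched and the default tooltip still works, so the script only adds to the default behavior.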



But there are situations where you
expect 80% of people do well without any instructions.


Again, seems like a reasonable theory.

Do you have URLs to real world examples thereof?


Say, http://forums.whatwg.org/bb3/ucp.php?mode=register looks like a 
sure case. Or did you mean a form that additionally may have a problem 
for 20% of people or less? I guess _every_ form has some potential 
problem to _some_ people. The form cited has a "Confirmation of 
registration" question. I would expect well over 80% of users to be able 
to answer it without difficulties, but I'm afraid there's a 
non-ignorable amount of people who might get puzzled by it.



I'm not sure of what
we are expected to do, as authors, in order to give instructions that might
be needed by 20% of users but would mostly be a distraction for the
majority.


Theoretical problems are harder to provide specific answers for, but
this might work:

Try the <details> and <summary> elements.


I don't think the problem is theoretical (although it was formulated in 
general terms - as this seems natural in a discussion like this). The 
answer is theoretical, as there is no support worth mentioning yet.



http://html5doctor.com/the-details-and-summary-elements/


So you have some URLs to real world examples? :-)

--
Yucca, http://www.cs.tut.fi/~jkorpela/
