2013-08-20 2:40, Ryosuke Niwa wrote:
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>> Why is the maxlength attribute of the input element specified to
>> restrict the length of the value by the code-unit length?
Apparently because in the DOM, "character" effectively means "code
unit". In particular, the .value.length property gives the length in
code units.
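To make the distinction concrete, here is a minimal sketch in plain JavaScript. The variable names are mine, chosen for illustration; only String.prototype.length and the string iterator are used.

```javascript
// U+10400 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const deseret = "\u{10400}";

// String.prototype.length counts UTF-16 code units.
const codeUnits = deseret.length;

// The string iterator walks code points, so spreading counts code points.
const codePoints = [...deseret].length;

console.log(codeUnits, codePoints); // 2 1
```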
>> This is counter intuitive for users and authors who typically
>> intend to restrict the length by the number of composed character
>> sequences.
That is true. We should not expect end users to know whether a character
they enter occupies one code unit or two, i.e. whether it is a BMP
character or not. Then again, I don't expect most users to enter non-BMP
characters, though this might be changing as e.g. emoticons become more
popular.
>> In fact, this is the current shipping behavior of
>> Safari and Chrome.
And IE, but not Firefox. Here's a simple test:
<input maxlength=2 value="𐐀">
On Firefox, you cannot add a character to the value, since the length is
already 2. On Chrome and IE, you can add even a second non-BMP
character, even though the length then becomes 4. I don't see this as
particularly logical, though I'm looking at this from the programming
point of view, not the end user's.
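The two observed outcomes can be reproduced with a small model. This is only a hypothetical sketch: allowsInput and the two counting functions are names I made up, and I am not claiming that any browser implements its check this way. It just shows that counting code units blocks the second character while counting code points lets it through.

```javascript
// Hypothetical model of a maxlength check; not any browser's actual code.
function countCodeUnits(s) { return s.length; }
function countCodePoints(s) { return [...s].length; }

// Returns true if typing `addition` into `value` stays within `maxLength`
// under the given counting function.
function allowsInput(value, addition, maxLength, count) {
  return count(value + addition) <= maxLength;
}

const current = "\u{10400}"; // the test value, one non-BMP character
const typed = "\u{10400}";   // a second non-BMP character

console.log(allowsInput(current, typed, 2, countCodeUnits));  // false
console.log(allowsInput(current, typed, 2, countCodePoints)); // true
```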
>> Can the specification be changed to use the number of composed
>> character sequences instead of the code-unit length?
In contexts where you want to set maxlength in the first place, your
reasons might well be related to limitations that apply to the code unit
length. It's a different thing if the intent is to limit the number of
visible characters.
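For comparison, counting what a user would see as one character is a third measure, different from both code units and code points. A minimal sketch, assuming a JavaScript runtime that provides Intl.Segmenter (an API not mentioned in this thread; I use it here only as one way to approximate "composed character sequences"):

```javascript
// "e" followed by U+0301 COMBINING ACUTE ACCENT: two BMP code points
// that render as a single visible character.
const eAcute = "e\u0301";

// Grapheme segmentation; the "en" locale is an arbitrary choice here.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...segmenter.segment(eAcute)].length;

console.log(eAcute.length, [...eAcute].length, graphemes); // 2 2 1
```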
Interestingly, an attempt like
<input pattern=.{0,42}>
to limit the number of *characters* to at most 42 seems to fail.
(Browsers won't prevent the user from typing more, but the control
starts matching the :invalid selector if you enter characters that
correspond to more than 42 code units.) The reason is apparently that
"." means "any character" in the sense "any code unit", counting a
non-BMP character as two.
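The same effect shows up in a plain JavaScript regular expression. In this sketch I write the anchors explicitly, since the pattern attribute is matched against the whole value, and I shrink the limit to 1 to keep the test string short. The u flag is shown only for contrast; I am not suggesting it is what browsers apply to the pattern attribute.

```javascript
const nonBmp = "\u{10400}"; // one non-BMP character, two code units

// Without the u flag, "." consumes a single UTF-16 code unit.
const perCodeUnit = /^.{0,1}$/;

// With the u flag, "." consumes a whole code point.
const perCodePoint = /^.{0,1}$/u;

console.log(perCodeUnit.test(nonBmp));  // false
console.log(perCodePoint.test(nonBmp)); // true
```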
>> Also,
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
>> says "if the input element has a maximum allowed value length, then
>> the code-unit length of the value of the element's value attribute
>> must be equal to or less than the element's maximum allowed value
>> length."
>> This doesn't seem to match the behaviors of existing Web browsers or
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>> unless I'm misreading something. Namely, the value attribute set in
>> the markup or by script isn't automatically truncated at the
>> element's maximum allowed value length.
There seems to be a conflict here indeed. It is different from the
character vs. code unit issue, however.
Definitions in 4.10.21.1 clearly imply that the length of the value of a
control may exceed the limit set by maxlength. The "Constraints" part
deals with the question of what happens then (in form submission).
Yucca