2013-08-20 2:40, Ryosuke Niwa wrote:

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length

>> Why is the maxlength attribute of the input element specified to
>> restrict the length of the value by the code-unit length?

Apparently because in the DOM, "character" effectively means "code unit". In particular, the .value.length property gives the length in code units.
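
To make the distinction concrete, here is a minimal sketch in plain JavaScript (no DOM needed). The variable names are mine, for illustration only; the character is the U+10400 used in the test further down.

```javascript
// U+10400 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const value = "\u{10400}";

// .length counts UTF-16 code units, which is what .value.length reports.
const codeUnits = value.length;

// The string iterator walks code points, so spreading counts code points.
const codePoints = [...value].length;

console.log(codeUnits, codePoints); // 2 1
```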

>> This is counterintuitive for users and authors who typically
>> intend to restrict the length by the number of composed character
>> sequences.

That is true. We should not expect end users to know whether a character they enter occupies one code unit or two, i.e. whether it is a BMP character or not. Then again, I don't expect most users to enter non-BMP characters, though this might be changing as e.g. emoticons become more popular.

>> In fact, this is the current shipping behavior of
>> Safari and Chrome.

And IE, but not Firefox. Here's a simple test:

<input maxlength=2 value="&#x10400;">

On Firefox, you cannot add a character to the value, since the length is already 2. On Chrome and IE, you can add even a second non-BMP character, even though the length then becomes 4. I don't see this as particularly logical, though I'm looking at this from a programming point of view, not an end user's.
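
The difference can be modelled roughly as follows. This is only a sketch of the two counting rules, not what any browser actually runs: canAppend and the two counters are hypothetical names, and counting code points is merely a stand-in for whatever per-character count Chrome and IE apply.

```javascript
// Hypothetical helper: would appending `addition` keep the value within
// `max`, given some way of counting its length?
function canAppend(value, addition, max, count) {
  return count(value + addition) <= max;
}

const byCodeUnits = (s) => s.length;        // same count as .value.length
const byCodePoints = (s) => [...s].length;  // stand-in for a per-character count

const initial = "\u{10400}"; // the value used in the test
const extra = "\u{10401}";   // a second non-BMP character

console.log(canAppend(initial, extra, 2, byCodeUnits));  // false
console.log(canAppend(initial, extra, 2, byCodePoints)); // true
console.log((initial + extra).length);                   // 4
```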

>> Can the specification be changed to use the number of composed
>> character sequences instead of the code-unit length?

In contexts where you want to set maxlength in the first place, your reasons might well be related to limitations that apply to the code unit length. It's a different thing if the intent is to limit the number of visible characters.

Interestingly, an attempt like
<input pattern=.{0,42}>
to limit the number of *characters* to at most 42 seems to fail. (Browsers won't prevent you from typing more, but the control starts matching the :invalid selector if you enter characters that correspond to more than 42 code units.) The reason is apparently that "." means "any character" in the sense "any code unit", counting a non-BMP character as two.
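
The same effect shows up with plain JavaScript regular expressions. In the sketch below, matchesPattern is a hypothetical helper that anchors the expression the way the pattern attribute is defined to, i.e. as ^(?:pattern)$. The "u" flag is shown only for contrast; that browsers compile the pattern without it is my assumption, based on the behavior described above.

```javascript
// Hypothetical helper: test a value against a pattern anchored as ^(?:pattern)$.
function matchesPattern(pattern, value, flags = "") {
  return new RegExp("^(?:" + pattern + ")$", flags).test(value);
}

const single = "\u{10400}"; // one non-BMP character, two code units

// Without the "u" flag, "." consumes one code unit at a time.
console.log(matchesPattern(".{0,1}", single));      // false
console.log(matchesPattern(".{0,2}", single));      // true

// With the "u" flag, "." consumes a whole code point.
console.log(matchesPattern(".{0,1}", single, "u")); // true
```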

>> Also,
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
>> says "if the input element has a maximum allowed value length, then
>> the code-unit length of the value of the element's value attribute
>> must be equal to or less than the element's maximum allowed value
>> length."

>> This doesn't seem to match the behaviors of existing Web browsers or
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>> unless I'm misreading something.  Namely, the value attribute set in
>> the markup or by script isn't automatically truncated at the
>> element's maximum allowed value length.

There seems to be a conflict here indeed. It is different from the character vs. code unit issue, however.

Definitions in 4.10.21.1 clearly imply that the length of the value of a control may exceed the limit set by maxlength. The "Constraints" part deals with the question of what happens then (in form submission).

Yucca
