Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On Aug 22, 2013, at 1:59 AM, Anne van Kesteren ann...@annevk.nl wrote:
> On Wed, Aug 21, 2013 at 8:19 PM, Alexey Proskuryakov a...@webkit.org wrote:
>> FWIW, this is tracked for WebKit as https://bugs.webkit.org/show_bug.cgi?id=120030.
>
> I think Darin's comment about the server component makes sense. My remark was mostly as to what is exposed to JavaScript. I don't think we expose an API to measure the number of grapheme clusters in a given string at the moment and writing such a function might be rather hard. (Although if maxlength was redefined to work this way...)

Yeah, I do see a benefit in matching what JavaScript does. However, that's not the most intuitive behavior for users.

> Considering end users makes sense too, but we should also consider what applications people want to write. From limited testing I believe Twitter currently counts Unicode scalar values. This is somewhat better than code units, but e.g. U+0041 U+030A still subtracts two from your 140 limit. (This also means the example in the specification that makes a jab at Twitter is technically incorrect.) (Not that Twitter's current control could be implemented with a plain input or textarea.)

If measuring the number of code units is what the author wanted, then he/she could manually check inputElement.value.length. As you've just pointed out, different websites use different encoding schemes and have different requirements for the number of bytes or sequence of code units they can store. I don't think we can solve that problem in HTML.

- R. Niwa
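A short Python sketch (assuming Python 3; the helper names are made up for illustration) makes the counting units in this exchange concrete. It measures the same strings as UTF-16 code units, which is what JavaScript's value.length and the spec's code-unit length report; as Unicode scalar values, which is what Twitter reportedly counted at the time; and as UTF-8 bytes, as one example of a server-side storage limit.

```python
def utf16_code_units(s: str) -> int:
    """Length as JavaScript's String.prototype.length would report it."""
    # Each UTF-16 code unit is 2 bytes; 'utf-16-le' emits no BOM.
    return len(s.encode("utf-16-le")) // 2


def scalar_values(s: str) -> int:
    """Number of Unicode scalar values (for well-formed strings)."""
    return len(s)


def utf8_bytes(s: str) -> int:
    """Storage size in bytes if a server stores the text as UTF-8."""
    return len(s.encode("utf-8"))


samples = {
    "A + U+030A": "A\u030A",   # base letter + combining ring, renders like "Å"
    "U+00C5": "\u00C5",        # precomposed "Å"
    "U+10400": "\U00010400",   # a non-BMP character
}

# name, code units, scalar values, UTF-8 bytes
for name, text in samples.items():
    print(name, utf16_code_units(text), scalar_values(text), utf8_bytes(text))
# A + U+030A 2 2 3
# U+00C5 1 1 2
# U+10400 2 1 4
```

None of the three measures counts U+0041 U+030A as a single unit; that would need grapheme-cluster segmentation (Unicode UAX #29), which Python's standard library does not provide.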
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On Wed, Aug 21, 2013 at 8:19 PM, Alexey Proskuryakov a...@webkit.org wrote:
> FWIW, this is tracked for WebKit as https://bugs.webkit.org/show_bug.cgi?id=120030.

I think Darin's comment about the server component makes sense. My remark was mostly as to what is exposed to JavaScript. I don't think we expose an API to measure the number of grapheme clusters in a given string at the moment and writing such a function might be rather hard. (Although if maxlength was redefined to work this way...)

Considering end users makes sense too, but we should also consider what applications people want to write. From limited testing I believe Twitter currently counts Unicode scalar values. This is somewhat better than code units, but e.g. U+0041 U+030A still subtracts two from your 140 limit. (This also means the example in the specification that makes a jab at Twitter is technically incorrect.) (Not that Twitter's current control could be implemented with a plain input or textarea.)

All choices seem to have drawbacks of sorts. I wonder if Norbert or Richard have an informed opinion.

Rest of the thread is archived here: http://lists.w3.org/Archives/Public/public-whatwg-archive/2013Aug/thread.html#msg184

--
http://annevankesteren.nl/
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On Tue, 20 Aug 2013 19:33:12 +0500, Boris Zbarsky bzbar...@mit.edu wrote:
> On 8/19/13 7:40 PM, Ryosuke Niwa wrote:
>> Also, http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute says "if the input element has a maximum allowed value length, then the code-unit length of the value of the element's value attribute must be equal to or less than the element's maximum allowed value length." This doesn't seem to match the behaviors of existing Web browsers
>
> The spec bit you quote above is an _authoring_ conformance requirement. That is, <input maxlength=2 value=abc> is not valid HTML and a validator would flag it as invalid. What UAs do with this markup, on the other hand, is defined by the UA conformance requirements, and what they do is allow a value longer than maxlength if it's specified.
>
>> or http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>
> These are the UA conformance requirements in question.
>
>> The paragraph should be revised to mention and only mention that the maxlength attribute affects the validation and the user agents may prevent the user from typing more characters than the specified value.
>
> The basic question is whether a validator should flag <input maxlength=2 value=abc> as a conformance error or not. It seems to me like it should.

Why? It seems that it generally works in browsers, and has for a long time. On the other hand, the use cases I can think of have mostly been taken over by placeholder, and pattern with good labelling, and so on.

cheers

--
Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
cha...@yandex-team.ru Find more at http://yandex.com
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On 8/22/13 9:01 AM, Charles McCathie Nevile wrote:
>> The basic question is whether a validator should flag <input maxlength=2 value=abc> as a conformance error or not. It seems to me like it should.
>
> Why? It seems that it generally works in browsers, and has for a long time.

Sort of. It gets you in a state where the user can erase the "c" but not retype it (though the erasing edit can be undone via the editor's undo functionality, apparently).

-Boris
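The state Boris describes can be modeled with a toy text control, sketched below in Python. The class and method names are hypothetical. The model assumes, as described above, that an over-long initial value is left alone, that deletions are always allowed, and that an insertion is refused outright if the result would exceed maxlength in UTF-16 code units (real browsers may instead truncate the inserted text). It does not model undo.

```python
class TextControl:
    """Toy model of a text field that enforces maxlength only on user input."""

    def __init__(self, value: str, maxlength: int) -> None:
        # The initial (markup-supplied) value is kept even if it is too long.
        self.value = value
        self.maxlength = maxlength

    @staticmethod
    def _length(s: str) -> int:
        # Length in UTF-16 code units, as JavaScript would report it.
        return len(s.encode("utf-16-le")) // 2

    def delete_last(self) -> None:
        # Deleting is always allowed, even while the value is over the limit.
        self.value = self.value[:-1]

    def type_text(self, text: str) -> bool:
        # Refuse an insertion that would leave the value longer than maxlength.
        if self._length(self.value + text) > self.maxlength:
            return False
        self.value += text
        return True


field = TextControl("abc", maxlength=2)  # models <input maxlength=2 value=abc>
field.delete_last()                       # the user erases "c"; value is "ab"
accepted = field.type_text("c")           # retyping "c" would make 3 > 2
print(field.value, accepted)              # ab False
```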
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
(re-sent from hopefully correct address)

On Aug 20, 2013, at 7:09, Anne van Kesteren ann...@annevk.nl wrote:
> On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:
>> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?
>
> In a way I guess that's nice, but it also seems confusing that given data:text/html,<input type=text maxlength=1> pasting in U+0041 U+030A would give a string that's longer than 1 from JavaScript's perspective.
>
> I don't think there's any place in the platform where we measure string length other than by number of code units at the moment.

FWIW, this is tracked for WebKit as https://bugs.webkit.org/show_bug.cgi?id=120030.

I agree with Darin's comment in that the standard should consider end user concepts more strongly here. WebKit had this more humane behavior for many years, so we know that it's compatible with the Web, and there is no need to chase the lowest common denominator.

Additionally, there are features in the platform that work with Unicode grapheme clusters perfectly, and I think that these are closely connected to maxLength. Namely, editing functionality understands grapheme clusters very well, so you can change selections by moving the caret right or left one character, and so forth. Web sites frequently perform some editing on the text as you type it.

- WBR, Alexey Proskuryakov
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
2013-08-20 2:40, Ryosuke Niwa wrote:
> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
> Why is the maxlength attribute of the input element specified to restrict the length of the value by the code-unit length?

Apparently because in the DOM, "character" effectively means "code unit". In particular, the .value.length property gives the length in code units.

> This is counterintuitive for users and authors who typically intend to restrict the length by the number of composed character sequences.

That is true. We should not expect end users to know whether a character they enter occupies one code unit or two, i.e. whether it is a BMP character or not. Then again, I don't expect most users to enter non-BMP characters, though this might be changing as e.g. emoticons become more popular.

> In fact, this is the current shipping behavior of Safari and Chrome.

And IE, but not Firefox. Here's a simple test:

<input maxlength=2 value=&#x10400;>

On Firefox, you cannot add a character to the value, since the length is already 2. On Chrome and IE, you can add even a second non-BMP character, even though the length then becomes 4. I don't see this as particularly logical, though I'm looking at this from the programming point of view, not the end-user point of view.

> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?

In contexts where you want to set maxlength in the first place, your reasons might well be related to limitations that apply to the code-unit length. It's a different thing if the intent is to limit the number of visible characters.

Interestingly, an attempt like <input pattern=.{0,42}> to limit the number of *characters* to at most 42 seems to fail. (Browsers won't prevent you from typing more, but the control starts matching the :invalid selector if you enter characters that correspond to more than 42 code units.) The reason is apparently that "." means "any character" in the sense "any code unit", counting a non-BMP character as two.

> Also, http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute says "if the input element has a maximum allowed value length, then the code-unit length of the value of the element's value attribute must be equal to or less than the element's maximum allowed value length." This doesn't seem to match the behaviors of existing Web browsers or http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length unless I'm misreading something. Namely, the value attribute set in the markup or by script isn't automatically truncated at the element's maximum allowed value length.

There seems to be a conflict here indeed. It is different from the character vs. code unit issue, however. Definitions in 4.10.21.1 clearly imply that the length of the value of a control may exceed the limit set by maxlength. The Constraints part deals with the question of what happens then (in form submission).

Yucca
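Jukka's pattern observation can be reproduced outside a browser. The Python sketch below assumes, as his message suggests, that the pattern must match the whole value and that "." matches one UTF-16 code unit (as in a JavaScript regular expression without the u flag). To mimic that, the hypothetical helper as_utf16_units rewrites each non-BMP character as its two surrogate code units before matching. It models the 2013 behavior described in the thread, not necessarily what current browsers do.

```python
import re


def as_utf16_units(s: str) -> str:
    """Re-express s with one Python character per UTF-16 code unit.

    Non-BMP characters become a surrogate pair (two characters), which is
    how a JavaScript regular expression without the u flag sees the text.
    """
    data = s.encode("utf-16-be")
    return "".join(chr(int.from_bytes(data[i:i + 2], "big"))
                   for i in range(0, len(data), 2))


# Equivalent of pattern=".{0,42}"; the pattern has to match the whole value.
PATTERN = re.compile(r".{0,42}")


def matches_pattern(value: str) -> bool:
    return PATTERN.fullmatch(as_utf16_units(value)) is not None


bmp = "a" * 42                  # 42 BMP characters = 42 code units
non_bmp = "\U00010400" * 21     # 21 non-BMP characters = 42 code units
too_long = "\U00010400" * 22    # 22 non-BMP characters = 44 code units

print(matches_pattern(bmp), matches_pattern(non_bmp), matches_pattern(too_long))
# True True False
```

So 22 characters already fail a pattern meant to allow 42, because each of them counts as two code units.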
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:
> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?

In a way I guess that's nice, but it also seems confusing that given data:text/html,<input type=text maxlength=1> pasting in U+0041 U+030A would give a string that's longer than 1 from JavaScript's perspective.

I don't think there's any place in the platform where we measure string length other than by number of code units at the moment.

--
http://annevankesteren.nl/
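Anne's concern can be illustrated with two hypothetical ways of truncating pasted text to a maxlength of 1, sketched in Python. approx_clusters is a deliberately crude stand-in for grapheme clusters: it only glues characters with a nonzero canonical combining class onto the preceding character, and it is not the Unicode UAX #29 algorithm (it ignores, for example, Hangul jamo sequences and spacing marks). Neither truncation function is any browser's actual code.

```python
import unicodedata


def utf16_len(s: str) -> int:
    # JavaScript-style length: number of UTF-16 code units.
    return len(s.encode("utf-16-le")) // 2


def approx_clusters(s: str) -> list[str]:
    """Rough approximation of grapheme clusters (NOT full UAX #29)."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach a combining mark to its base
        else:
            clusters.append(ch)
    return clusters


def truncate_by_code_units(s: str, limit: int) -> str:
    # Counts UTF-16 code units; never splits a surrogate pair,
    # but can separate a base letter from its combining mark.
    out, used = "", 0
    for ch in s:
        n = utf16_len(ch)
        if used + n > limit:
            break
        out, used = out + ch, used + n
    return out


def truncate_by_clusters(s: str, limit: int) -> str:
    # Counts (approximate) clusters instead of code units.
    return "".join(approx_clusters(s)[:limit])


pasted = "A\u030A"  # U+0041 U+030A, displayed like "Å"
by_units = truncate_by_code_units(pasted, 1)
by_clusters = truncate_by_clusters(pasted, 1)
print(utf16_len(by_units), utf16_len(by_clusters))  # 1 2
```

Counting code units keeps only "A" and silently drops the ring; counting clusters keeps the whole "Å", but the resulting value then has a JavaScript length of 2, above maxlength=1.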
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
2013-08-20 17:09, Anne van Kesteren wrote:
> On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:
>> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?
>
> In a way I guess that's nice, but it also seems confusing that given data:text/html,<input type=text maxlength=1> pasting in U+0041 U+030A would give a string that's longer than 1 from JavaScript's perspective.

Oh, right, this is an issue different from the non-BMP issue I discussed in my reply. This one is even clearer in my opinion, since U+0041 U+030A is clearly two Unicode characters, not one, even though it is expected to be rendered as “Å” and even though U+00C5 is canonically equivalent to U+0041 U+030A.

> I don't think there's any place in the platform where we measure string length other than by number of code units at the moment.

Besides, if “character” means something other than Unicode character (Unicode code point assigned to a character) or, as a different concept, Unicode code unit, then the question would arise what it means. For example, would a letter followed by 42 combining marks still be one character? (Such monstrosities are actually used, in an attempt to create “funny” effects.)

Yucca
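Jukka's point about canonical equivalence can be checked with Python's standard unicodedata module. The sketch below (the helper name utf16_len is made up for illustration) shows that two canonically equivalent spellings of “Å” have different code-unit lengths, and that NFC normalization does not reduce a letter carrying 42 combining acute accents (U+0301) to a single code point.

```python
import unicodedata


def utf16_len(s: str) -> int:
    # JavaScript-style length: number of UTF-16 code units.
    return len(s.encode("utf-16-le")) // 2


decomposed = "A\u030A"   # U+0041 U+030A
precomposed = "\u00C5"   # U+00C5, canonically equivalent

# Canonically equivalent strings, different code-unit lengths.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(utf16_len(decomposed), utf16_len(precomposed))            # 2 1

# A letter with a stack of combining marks: NFC composes "a" + the first
# U+0301 into U+00E1, but the remaining 41 marks stay separate.
stacked = "a" + "\u0301" * 42
print(utf16_len(stacked), utf16_len(unicodedata.normalize("NFC", stacked)))  # 43 42
```

Normalizing before counting would make U+0041 U+030A count as 1, but it would not settle what a “character” is for stacked marks like these.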
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On 8/19/13 7:40 PM, Ryosuke Niwa wrote:
> Also, http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute says "if the input element has a maximum allowed value length, then the code-unit length of the value of the element's value attribute must be equal to or less than the element's maximum allowed value length." This doesn't seem to match the behaviors of existing Web browsers

The spec bit you quote above is an _authoring_ conformance requirement. That is, <input maxlength=2 value=abc> is not valid HTML and a validator would flag it as invalid. What UAs do with this markup, on the other hand, is defined by the UA conformance requirements, and what they do is allow a value longer than maxlength if it's specified.

> or http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length

These are the UA conformance requirements in question.

> The paragraph should be revised to mention and only mention that the maxlength attribute affects the validation and the user agents may prevent the user from typing more characters than the specified value.

The basic question is whether a validator should flag <input maxlength=2 value=abc> as a conformance error or not. It seems to me like it should.

-Boris
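The authoring requirement Boris refers to is the kind of rule a conformance checker could apply to markup. Below is a minimal Python sketch of such a check built on the standard html.parser module. The MaxlengthChecker class and its message text are hypothetical, and the sketch ignores details a real validator would handle, such as which input types maxlength applies to and the textarea element. It measures the value in UTF-16 code units, as the quoted spec text does.

```python
from html.parser import HTMLParser


class MaxlengthChecker(HTMLParser):
    """Hypothetical validator rule: flag <input> elements whose value
    attribute is longer than maxlength, counted in UTF-16 code units."""

    def __init__(self) -> None:
        super().__init__()
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)  # values arrive with character references decoded
        maxlength = attributes.get("maxlength")
        value = attributes.get("value") or ""
        if maxlength is None or not (maxlength.isascii() and maxlength.isdigit()):
            return  # no usable maxlength: nothing to check in this sketch
        length = len(value.encode("utf-16-le")) // 2
        if length > int(maxlength):
            self.errors.append(
                f"value has {length} code units but maxlength is {maxlength}")


checker = MaxlengthChecker()
checker.feed("<input maxlength=2 value=abc>")
checker.close()
print(checker.errors)  # ['value has 3 code units but maxlength is 2']
```

With value=&#x10400; and maxlength=2 the check passes, since that single non-BMP character is exactly two code units.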
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?
On Aug 19, 2013, at 4:30 PM, Ryosuke Niwa rn...@apple.com wrote:

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length

Why is the maxlength attribute of the input element specified to restrict the length of the value by the code-unit length? This is counterintuitive for users and authors who typically intend to restrict the length by the number of composed character sequences. In fact, this is the current shipping behavior of Safari and Chrome.

Can the specification be changed to use the number of composed character sequences instead of the code-unit length?

Also, http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute says "if the input element has a maximum allowed value length, then the code-unit length of the value of the element's value attribute must be equal to or less than the element's maximum allowed value length." This doesn't seem to match the behaviors of existing Web browsers or http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length unless I'm misreading something. Namely, the value attribute set in the markup or by script isn't automatically truncated at the element's maximum allowed value length.

The paragraph should be revised to mention and only mention that the maxlength attribute affects the validation and the user agents may prevent the user from typing more characters than the specified value.

- R. Niwa