Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-26 Thread Ryosuke Niwa

On Aug 22, 2013, at 1:59 AM, Anne van Kesteren ann...@annevk.nl wrote:

 On Wed, Aug 21, 2013 at 8:19 PM, Alexey Proskuryakov a...@webkit.org wrote:
 FWIW, this is tracked for WebKit as 
 https://bugs.webkit.org/show_bug.cgi?id=120030.
 
 I think Darin's comment about the server component makes sense. My
 remark was mostly as to what is exposed to JavaScript. I don't think
 we expose an API to measure the number of grapheme clusters in a given
 string at the moment and writing such a function might be rather hard.
 (Although if maxlength was redefined to work this way...)

Yeah, I do see a benefit in matching what JavaScript does.  However, that's not 
the most intuitive behavior for users.

 Considering end users makes sense too, but we should also consider
 what applications people want to write. From limited testing I believe
 Twitter currently counts Unicode scalar values. This is somewhat
 better than code units, but e.g. U+0041 U+030A still subtracts two
 from your 140 limit. (This also means the example in the specification
 that makes a jab at Twitter is technically incorrect.) (Not that
 Twitter's current control could be implemented with a plain input or
 textarea.)

If measuring the number of code units is what the author wanted, then he/she 
could manually check inputElement.value.length.

As you've just pointed out, different websites use different encoding schemes 
and have different requirements for the number of bytes or sequence of code 
units they can store.  I don't think we can solve that problem in HTML.
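
To make the candidate units of measurement concrete, here is a minimal
sketch (not part of the original message) that counts the same string
three ways. It assumes a runtime that provides Intl.Segmenter, which was
not available when this thread was written; the helper names are
hypothetical.

```javascript
// Hypothetical helpers; the names are illustrative, not from the thread.

// UTF-16 code units: what inputElement.value.length reports.
function countCodeUnits(s) {
  return s.length;
}

// Code points: string iteration steps by code point, so a surrogate
// pair is counted once.
function countCodePoints(s) {
  return Array.from(s).length;
}

// Extended grapheme clusters, assuming Intl.Segmenter is available.
function countGraphemes(s) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s)).length;
}

const aRing = "A\u030A";      // U+0041 U+030A, rendered as one letter
const deseret = "\u{10400}";  // a single non-BMP character

console.log(countCodeUnits(aRing), countCodePoints(aRing), countGraphemes(aRing));
console.log(countCodeUnits(deseret), countCodePoints(deseret), countGraphemes(deseret));
```

Only the grapheme count gives 1 for both strings; the other two counts
each report 2 for one of them.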

- R. Niwa



Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-22 Thread Anne van Kesteren
On Wed, Aug 21, 2013 at 8:19 PM, Alexey Proskuryakov a...@webkit.org wrote:
 FWIW, this is tracked for WebKit as 
 https://bugs.webkit.org/show_bug.cgi?id=120030.

I think Darin's comment about the server component makes sense. My
remark was mostly as to what is exposed to JavaScript. I don't think
we expose an API to measure the number of grapheme clusters in a given
string at the moment and writing such a function might be rather hard.
(Although if maxlength was redefined to work this way...)

Considering end users makes sense too, but we should also consider
what applications people want to write. From limited testing I believe
Twitter currently counts Unicode scalar values. This is somewhat
better than code units, but e.g. U+0041 U+030A still subtracts two
from your 140 limit. (This also means the example in the specification
that makes a jab at Twitter is technically incorrect.) (Not that
Twitter's current control could be implemented with a plain input or
textarea.)

All choices seem to have drawbacks of sorts. I wonder if Norbert or
Richard have an informed opinion. Rest of the thread is archived here:
http://lists.w3.org/Archives/Public/public-whatwg-archive/2013Aug/thread.html#msg184


-- 
http://annevankesteren.nl/


Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-22 Thread Charles McCathie Nevile

On Tue, 20 Aug 2013 19:33:12 +0500, Boris Zbarsky bzbar...@mit.edu wrote:


On 8/19/13 7:40 PM, Ryosuke Niwa wrote:
Also,  
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute  
says if the input element has a maximum allowed value length, then the  
code-unit length of the value of the element's value attribute must be  
equal to or less than the element's maximum allowed value length.


This doesn't seem to match the behaviors of existing Web browsers


The spec bit you quote above is an _authoring_ conformance requirement.  
  That is, <input maxlength=2 value=abc> is not valid HTML and a  
validator would flag it as invalid.  What UAs do with this markup, on  
the other hand, is defined by the UA conformance requirements, and what  
they do is allow a value longer than maxlength if it's specified.


or  
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length


These are the UA conformance requirements in question.

The paragraph should be revised to mention, and only mention, that the  
maxlength attribute affects validation and that user agents may  
prevent the user from typing more characters than the specified value.


The basic question is whether a validator should flag  
<input maxlength=2 value=abc> as a conformance error or not.  It seems to  
me like it should.


Why? It seems that it generally works in browsers, and has for a long time.

On the other hand, the use cases I can think of have mostly been taken over  
by placeholder, by pattern with good labelling, and so on.


cheers

--
Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
  cha...@yandex-team.ru Find more at http://yandex.com


Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-22 Thread Boris Zbarsky

On 8/22/13 9:01 AM, Charles McCathie Nevile wrote:

The basic question is whether a validator should flag
<input maxlength=2 value=abc> as a conformance error or not.  It seems to
me like it should.


Why? It seems that it generally works in browsers, and has for a long time.


Sort of.  It gets you in a state where the user can erase the "c" but 
not retype it (though the erasing edit can be undone via the editor's 
undo functionality, apparently).


-Boris


Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-21 Thread Alexey Proskuryakov
(re-sent from hopefully correct address)

On Aug 20, 2013, at 7:09, Anne van Kesteren ann...@annevk.nl wrote:

 On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:
 Can the specification be changed to use the number of composed character 
 sequences instead of the code-unit length?
 
 In a way I guess that's nice, but it also seems confusing that given
 
 data:text/html,<input type=text maxlength=1>
 
 pasting in U+0041 U+030A would give a string that's longer than 1 from
 JavaScript's perspective. I don't think there's any place in the
 platform where we measure string length other than by number of code
 units at the moment.

FWIW, this is tracked for WebKit as 
https://bugs.webkit.org/show_bug.cgi?id=120030.

I agree with Darin's comment in that the standard should consider end user 
concepts more strongly here. WebKit had this more humane behavior for many 
years, so we know that it's compatible with the Web, and there is no need to 
chase the lowest common denominator.

Additionally, there are features in the platform that work with Unicode 
grapheme clusters perfectly, and I think that these are closely connected to 
maxLength. Namely, editing functionality understands grapheme clusters very 
well, so you can change selections by moving caret right or left one 
character, and so forth. Web sites frequently perform some editing on the 
text as you type it.

- WBR, Alexey Proskuryakov



Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-20 Thread Jukka K. Korpela

2013-08-20 2:40, Ryosuke Niwa wrote:


http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length


 Why is the maxlength attribute of the input element specified to
 restrict the length of the value by the code-unit length?

Apparently because in the DOM, "character" effectively means "code 
unit". In particular, the .value.length property gives the length in 
code units.



This is counterintuitive for users and authors who typically
intend to restrict the length by the number of composed character
sequences.


That is true. We should not expect end users to know whether a character 
they enter occupies one code unit or two, i.e. whether it is a BMP 
character or not. Then again, I don't expect most users to enter non-BMP 
characters, though this might be changing as e.g. emoticons become more 
popular.



In fact, this is the current shipping behavior of
Safari and Chrome.


And IE, but not Firefox. Here's a simple test:

<input maxlength=2 value=&#x10400;>

On Firefox, you cannot add a character to the value, since the length is 
already 2. On Chrome and IE, you can add even a second non-BMP 
character, even though the length then becomes 4. I don't see this as 
particularly logical, though I'm looking at this from the programming 
point of view, not the end-user view.
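
A small sketch (not part of the original message) of why the length is
already 2 in that test: U+10400 lies outside the BMP, so a JavaScript
string stores it as a surrogate pair. The helper hasRoomForMore is
hypothetical and simply applies the code-unit rule described above.

```javascript
// The value produced by value=&#x10400; in the test above.
const value = "\u{10400}";

console.log(value.length);                       // 2 (code units)
console.log(value.charCodeAt(0).toString(16));   // d801 (high surrogate)
console.log(value.charCodeAt(1).toString(16));   // dc00 (low surrogate)
console.log(value.codePointAt(0).toString(16));  // 10400 (the code point)

// Hypothetical check: may the user add more text under a limit that is
// measured in code units?
function hasRoomForMore(currentValue, maxlength) {
  return currentValue.length < maxlength;
}

console.log(hasRoomForMore(value, 2));           // false
```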



Can the specification be changed to use the number of composed
character sequences instead of the code-unit length?


In contexts where you want to set maxlength in the first place, your 
reasons might well be related to limitations that apply to the code unit 
length. It's a different thing if the intent is to limit the number of 
visible characters.


Interestingly, an attempt like
<input pattern=.{0,42}>
to limit the number of *characters* to at most 42 seems to fail. 
(Browsers won't prevent you from typing more, but the control starts 
matching the :invalid selector if you enter characters that correspond 
to more than 42 code units.) The reason is apparently that "." means 
"any character" in the sense of "any code unit", counting a non-BMP 
character as two.
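
The behaviour described above, where a non-BMP character counts as two,
is what a JavaScript regular expression does when it is compiled without
the u flag. A minimal sketch (not part of the original message); the
helper matchesPattern is hypothetical and only imitates the anchoring of
the pattern attribute, it is not a browser API.

```javascript
// Hypothetical helper: anchor the pattern so that it must match the
// whole value, optionally compiling it with the u (Unicode) flag.
function matchesPattern(pattern, value, unicode) {
  const re = new RegExp("^(?:" + pattern + ")$", unicode ? "u" : "");
  return re.test(value);
}

// 22 non-BMP characters: 22 code points but 44 code units.
const value = "\u{10400}".repeat(22);

console.log(value.length);                            // 44
console.log(matchesPattern(".{0,42}", value, false)); // false: "." is a code unit
console.log(matchesPattern(".{0,42}", value, true));  // true: "." is a code point
```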



Also,
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
says if the input element has a maximum allowed value length, then
the code-unit length of the value of the element's value attribute
must be equal to or less than the element's maximum allowed value
length.

This doesn't seem to match the behaviors of existing Web browsers or
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
unless I'm misreading something.  Namely, the value attribute set in
the markup or by script isn't automatically truncated at the
element's maximum allowed value length.


There seems to be a conflict here indeed. It is different from the 
character vs. code unit issue, however.


Definitions in 4.10.21.1 clearly imply that the length of the value of a 
control may exceed the limit set by maxlength. The Constraints part 
deals with the question what happens then (in form submission).


Yucca


Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-20 Thread Anne van Kesteren
On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:
 Can the specification be changed to use the number of composed character 
 sequences instead of the code-unit length?

In a way I guess that's nice, but it also seems confusing that given

data:text/html,<input type=text maxlength=1>

pasting in U+0041 U+030A would give a string that's longer than 1 from
JavaScript's perspective. I don't think there's any place in the
platform where we measure string length other than by number of code
units at the moment.
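
A minimal sketch (not part of the original message) of the mismatch
described here: if a limit of 1 were enforced per composed character
sequence, the accepted value would still have a length of 2 from
JavaScript's perspective. It assumes a runtime with Intl.Segmenter, which
postdates this thread; truncateToGraphemes is a hypothetical helper.

```javascript
// Hypothetical helper: keep at most `max` grapheme clusters of `s`.
function truncateToGraphemes(s, max) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(s)) {
    if (count === max) break;
    out += segment;
    count += 1;
  }
  return out;
}

// Pasting U+0041 U+030A followed by "bc" into a control limited to 1.
const kept = truncateToGraphemes("A\u030Abc", 1);

console.log(kept === "A\u030A"); // true: one composed character sequence
console.log(kept.length);        // 2: still two code units
```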


-- 
http://annevankesteren.nl/


Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-20 Thread Jukka K. Korpela

2013-08-20 17:09, Anne van Kesteren wrote:


On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa rn...@apple.com wrote:

Can the specification be changed to use the number of composed character 
sequences instead of the code-unit length?


In a way I guess that's nice, but it also seems confusing that given

data:text/html,<input type=text maxlength=1>

pasting in U+0041 U+030A would give a string that's longer than 1 from
JavaScript's perspective.


Oh, right, this is an issue different from the non-BMP issue I discussed 
in my reply. This is even clearer in my opinion, since U+0041 U+030A is 
clearly two Unicode characters, not one, even though it is expected to 
be rendered as “Å” and even though U+00C5 is canonically equivalent to 
U+0041 U+030A.
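
The canonical equivalence mentioned above can be checked with
String.prototype.normalize. A minimal sketch (not part of the original
message); the variable names are illustrative.

```javascript
const decomposed = "A\u030A";  // U+0041 U+030A
const precomposed = "\u00C5";  // U+00C5

// Different code unit sequences, so not equal as JavaScript strings.
console.log(decomposed === precomposed);                  // false
console.log(decomposed.length, precomposed.length);       // 2 1

// Canonically equivalent: each normalizes to the other form.
console.log(decomposed.normalize("NFC") === precomposed); // true
console.log(precomposed.normalize("NFD") === decomposed); // true
```

Normalizing before counting would therefore change the result for this
string, but not every combining sequence has a precomposed form.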



I don't think there's any place in the
platform where we measure string length other than by number of code
units at the moment.


Besides, if “character” means something else than Unicode character 
(Unicode code point assigned to a character) or, as a different concept, 
Unicode code unit, then the question would arise what it means. For 
example, would a letter followed by 42 combining marks still be one 
character? (Such monstrosities are actually used, in an attempt to 
create “funny” effects.)
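
For what it is worth, the default grapheme cluster rules answer the
question above with "yes". A minimal sketch (not part of the original
message), assuming a runtime with Intl.Segmenter, which postdates this
thread; the variable names are illustrative.

```javascript
// A letter followed by 42 combining acute accents (U+0301).
const stacked = "a" + "\u0301".repeat(42);

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const clusters = Array.from(segmenter.segment(stacked)).length;

console.log(stacked.length);             // 43 code units
console.log(Array.from(stacked).length); // 43 code points
console.log(clusters);                   // 1 grapheme cluster
```

A limit counted in grapheme clusters therefore places no bound on the
number of code units that are stored or submitted.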


Yucca




Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-20 Thread Boris Zbarsky

On 8/19/13 7:40 PM, Ryosuke Niwa wrote:

Also, 
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
 says if the input element has a maximum allowed value length, then the code-unit 
length of the value of the element's value attribute must be equal to or less than the 
element's maximum allowed value length.

This doesn't seem to match the behaviors of existing Web browsers


The spec bit you quote above is an _authoring_ conformance requirement. 
 That is, <input maxlength=2 value=abc> is not valid HTML and a 
validator would flag it as invalid.  What UAs do with this markup, on 
the other hand, is defined by the UA conformance requirements, and what 
they do is allow a value longer than maxlength if it's specified.



or 
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length


These are the UA conformance requirements in question.


The paragraph should be revised to mention, and only mention, that the maxlength 
attribute affects validation and that user agents may prevent the user from 
typing more characters than the specified value.


The basic question is whether a validator should flag 
<input maxlength=2 value=abc> as a conformance error or not.  It seems to 
me like it should.
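
A minimal sketch (not part of the original message) of the authoring
check in question, assuming the rule quoted earlier: the code-unit length
of the value attribute must not exceed the maximum allowed value length.
The function violatesMaxlength is hypothetical and is not taken from any
real validator.

```javascript
// Hypothetical authoring check: true if the markup should be flagged.
function violatesMaxlength(maxlength, value) {
  return value.length > maxlength; // length is in UTF-16 code units
}

console.log(violatesMaxlength(2, "abc"));              // true: flagged
console.log(violatesMaxlength(2, "ab"));               // false: conforming
console.log(violatesMaxlength(1, "\u{10400}"));        // true: 2 code units
```

This only models what a validator would report; as noted above, user
agents keep the longer value rather than truncating it.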


-Boris


Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

2013-08-19 Thread Ryosuke Niwa
On Aug 19, 2013, at 4:30 PM, Ryosuke Niwa rn...@apple.com wrote:
 http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
 
 Why is the maxlength attribute of the input element specified to restrict the 
 length of the value by the code-unit length?
 
 This is counterintuitive for users and authors who typically intend to 
 restrict the length by the number of composed character sequences.  In fact, 
 this is the current shipping behavior of Safari and Chrome.
 
 Can the specification be changed to use the number of composed character 
 sequences instead of the code-unit length?

Also, 
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
 says if the input element has a maximum allowed value length, then the 
code-unit length of the value of the element's value attribute must be equal to 
or less than the element's maximum allowed value length.

This doesn't seem to match the behaviors of existing Web browsers or 
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
 unless I'm misreading something.  Namely, the value attribute set in the 
markup or by script isn't automatically truncated at the element's maximum 
allowed value length.

The paragraph should be revised to mention, and only mention, that the maxlength 
attribute affects validation and that user agents may prevent the user from 
typing more characters than the specified value.

- R. Niwa