Re: BOM inside tokens

2008-07-16 Thread Brendan Eich
Latest news in the bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=430740#c42

Igor wrote:

"So MSIE simply treats BOM as a whitespace for the purpose of  
parsing. Which suggests to do this in SM to fix the bug: treat BOM as one
of Unicode whitespace characters in the scanner avoiding any character
skipping or patching."

So no security issues with stripping. Another triumph of de-facto  
standard over de-jure.

Pratap got this into ES3.1 drafts already.
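
For illustration only (not taken from the bug): a minimal sketch of the
whitespace treatment as seen from script, assuming an engine that implements
ES5 or later, where U+FEFF is listed among the WhiteSpace code points. The
variable names are made up for this example.

```javascript
const BOM = '\uFEFF';

// \s matches the WhiteSpace and LineTerminator productions, so in such an
// engine it matches a BOM.
const bomMatchesWhitespaceClass = /\s/.test(BOM);

// With the BOM scanned as whitespace, "-<BOM>-a" is two unary minus tokens.
const negatedTwice = new Function('a', 'return -' + BOM + '-a;')(5);

// Without the BOM the same characters form a single decrement token.
const decremented = new Function('a', 'return --a;')(5);

console.log(bomMatchesWhitespaceClass, negatedTwice, decremented);
// prints: true 5 4
```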

/be
___
Es4-discuss mailing list
Es4-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es4-discuss


Re: BOM inside tokens

2008-07-15 Thread Waldemar Horwat
Igor Bukanov wrote:
> It seems the current IE7/IE8 behavior is to allow Cf only in string
> and regexp literals and to allow BOM only in string/regexps or at the
> beginning of the source,

Precisely what does "in string and regexp literals" mean?  The exact 
interpretation of this phrase is the core source of the aforementioned security 
holes.

Folks have exploited putting special characters right after a backslash to 
break out of whitelisted literals and execute arbitrary code from JSON; a few 
months ago I demonstrated such an attack.  Regular expressions offer even more 
opportunities for this kind of mischief.

Waldemar


Re: BOM inside tokens

2008-07-15 Thread Igor Bukanov
It seems the current IE7/IE8 behavior is to allow Cf only in string
and regexp literals and to allow BOM only in string/regexps or at the
beginning of the source, see
https://bugzilla.mozilla.org/show_bug.cgi?id=430740#c32 . This is very
reasonable.

Igor


Re: BOM inside tokens

2008-07-15 Thread Mark S. Miller
On Tue, Jul 15, 2008 at 11:27 AM, Igor Bukanov <[EMAIL PROTECTED]> wrote:

> 2008/7/15 Mark Miller <[EMAIL PROTECTED]>:
> > As we've found with the ES3-specified stripping of Cf characters, the main
> > effect of such transparent stripping of characters is to help attackers slip
> > XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs
> > should be treated as whitespace rather than stripped.
>

Speaking only for myself, yes, I'd be even happier with the syntax error. I
have proposed such harsh treatment before but various objections were
raised. In any case, again speaking only for myself, I'm happy with any
solution that repairs the security holes created by stripping and avoids
introducing new holes.

-- 
Cheers,
--MarkM


Re: BOM inside tokens

2008-07-15 Thread Igor Bukanov
2008/7/15 Mark Miller <[EMAIL PROTECTED]>:
> As we've found with the ES3-specified stripping of Cf characters, the main
> effect of such transparent stripping of characters is to help attackers slip
> XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs
> should be treated as whitespace rather than stripped.

But this means that it will silently change the semantics of
+<BOM>+ from ++ into + +. From the security point of view it
would be better to treat such cases as syntax errors. A possible rule
could be to allow BOM/Cf only in string/regexp literals or if such a
character follows/precedes a non-zero-width whitespace character.


Re: BOM inside tokens

2008-07-15 Thread Mark Miller
On Tue, Jul 15, 2008 at 11:00 AM, Igor Bukanov <[EMAIL PROTECTED]> wrote:

> 2008/7/15 Ash Berlin <[EMAIL PROTECTED]>:
> >
> > I'd say that a BOM should be treated just like any ordinary whitespace
> > char - namely that it should be invalid in spaces, and beyond that why is
> > any conversion needed, since it's a valid Unicode character...
>
> The problem comes from current ES3 implementations, which strip BOMs
> from the sources, and from web pages that place BOMs at arbitrary
> positions in JS sources. So the question is: should ES4 be at least
> partially compatible with the current code?
>

As we've found with the ES3-specified stripping of Cf characters, the main
effect of such transparent stripping of characters is to help attackers slip
XSS attacks past defensive filters. ES3.1 agrees with ES4 that BOMs and Cfs
should be treated as whitespace rather than stripped.
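
A minimal sketch of the filter problem described above, under stated
assumptions: naiveFilterAllows is a hypothetical defensive filter, and
stripFormatChars only models ES3-style stripping with a regular expression.
Neither is code from a real engine or product.

```javascript
// Hypothetical defensive filter: reject any source text that mentions eval.
function naiveFilterAllows(src) {
  return !src.includes('eval');
}

// Model of ES3-style stripping: every format-control (Cf) character,
// including U+FEFF, is removed before the text is scanned.
function stripFormatChars(src) {
  return src.replace(/\p{Cf}/gu, '');
}

// The attacker hides a BOM inside the identifier.
const payload = 'ev\uFEFFal("alert(1)")';

console.log(naiveFilterAllows(payload)); // true: the filter sees "ev<BOM>al"
console.log(stripFormatChars(payload));  // eval("alert(1)")
```

Under the whitespace rule the same input would scan as the two identifiers
ev and al, which is a syntax error rather than a call to eval.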

-- 
Text by me above is hereby placed in the public domain

   Cheers,
   --MarkM


Re: BOM inside tokens

2008-07-15 Thread Igor Bukanov
2008/7/15 Ash Berlin <[EMAIL PROTECTED]>:
>
> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces, and beyond that why is
> any conversion needed, since it's a valid Unicode character...

The problem comes from current ES3 implementations, which strip BOMs
from the sources, and from web pages that place BOMs at arbitrary
positions in JS sources. So the question is: should ES4 be at least
partially compatible with the current code?

igor


Re: BOM inside tokens

2008-07-15 Thread Ash Berlin

On 15 Jul 2008, at 18:39, Ash Berlin wrote:

>
> On 15 Jul 2008, at 18:22, Igor Bukanov wrote:
>
>> The currently proposed rule for byte-order-mark (BOM) characters in
>> ES4 sources is to replace them by whitespace outside of tokens. But
>> what exactly are the tokens in a case like -<BOM>-?
>>
>> AFAICS it would be treated as - -, turning cases like:
>> -<BOM>-a;
>> into
>> - -a;
>> versus
>> --a;
>> as it would be with current ES3 implementations.
>>
>> Regards, Igor
>
> Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version
> or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK
> SPACE'
>
> • NamesList:
>   = BYTE ORDER MARK (BOM), ZWNBSP
>   • may be used to detect byte order by contrast with the
> noncharacter code point FFFE
>   • use as an indication of non-breaking is deprecated; see 2060
> instead
>   → (zero width space - 200B)
>   → (word joiner - 2060)
>   → (<not a character> - FFFE)
> • Designated in Unicode 1.1
>
> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces, and beyond that why is
> any conversion needed, since it's a valid Unicode character...
>

Invalid in *identifiers*




Re: BOM inside tokens

2008-07-15 Thread Ash Berlin

On 15 Jul 2008, at 18:22, Igor Bukanov wrote:

> The currently proposed rule for byte-order-mark (BOM) characters in
> ES4 sources is to replace them by whitespace outside of tokens. But
> what exactly are the tokens in a case like -<BOM>-?
>
> AFAICS it would be treated as - -, turning cases like:
>  -<BOM>-a;
> into
>  - -a;
> versus
>  --a;
> as it would be with current ES3 implementations.
>
> Regards, Igor

Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version
or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK
SPACE'

• NamesList:
  = BYTE ORDER MARK (BOM), ZWNBSP
  • may be used to detect byte order by contrast with the
    noncharacter code point FFFE
  • use as an indication of non-breaking is deprecated; see 2060
    instead
  → (zero width space - 200B)
  → (word joiner - 2060)
  → (<not a character> - FFFE)
• Designated in Unicode 1.1

I'd say that a BOM should be treated just like any ordinary whitespace
char - namely that it should be invalid in spaces, and beyond that why is
any conversion needed, since it's a valid Unicode character...

-ash


BOM inside tokens

2008-07-15 Thread Igor Bukanov
The currently proposed rule for byte-order-mark (BOM) characters in
ES4 sources is to replace them by whitespace outside of tokens. But
what exactly are the tokens in a case like -<BOM>-?

AFAICS it would be treated as - -, turning cases like:
  -<BOM>-a;
into
  - -a;
versus
  --a;
as it would be with current ES3 implementations.
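
The difference can be sketched with a toy scanner, writing <BOM> for U+FEFF.
This is an illustration under simplifying assumptions, not the SpiderMonkey
scanner: tokenize and its bomPolicy argument are hypothetical, and it only
knows identifiers, "--", "-", ";" and whitespace.

```javascript
const BOM = '\uFEFF';

// bomPolicy 'strip': delete every BOM before scanning (the ES3-era behavior).
// bomPolicy 'whitespace': scan a BOM as a token separator (the proposed rule).
function tokenize(src, bomPolicy) {
  if (bomPolicy === 'strip') {
    src = src.split(BOM).join('');
  }
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === ' ' || ch === '\t' || ch === '\n' || ch === BOM) {
      i += 1; // whitespace separates tokens and is otherwise ignored
    } else if (src.startsWith('--', i)) {
      tokens.push('--'); // longest match: two adjacent minus signs
      i += 2;
    } else if (ch === '-' || ch === ';') {
      tokens.push(ch);
      i += 1;
    } else if (/[A-Za-z_$]/.test(ch)) {
      let j = i + 1;
      while (j < src.length && /[A-Za-z0-9_$]/.test(src[j])) j += 1;
      tokens.push(src.slice(i, j)); // identifier
      i = j;
    } else {
      throw new SyntaxError('unexpected character at offset ' + i);
    }
  }
  return tokens;
}

const source = '-' + BOM + '-a;';
console.log(tokenize(source, 'strip').join(' '));      // -- a ;
console.log(tokenize(source, 'whitespace').join(' ')); // - - a ;
```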

Regards, Igor