Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread nsm . nikhil
On Tuesday, March 4, 2014 1:26:15 AM UTC-8, somb...@gmail.com wrote:
> While we have a defense-in-depth strategy (CSP and iframe sandbox should
> be protecting us from the worst possible scenarios) and we're hopeful
> that Service Workers will eventually let us provide
> nsIContentPolicy-level protection, the quality of the HTML parser is of
> course fairly important[1] to the operation of the HTML sanitizer.

Sorry to go off-topic, but how are ServiceWorkers different from normal Workers 
here? I'm asking without context, so forgive me if I misunderstood.

Thanks,
Nikhil


Re: Consensus sought - when to reset try repository?

2014-03-04 Thread Gregory Szorc

On 2/28/14, 5:24 PM, Hal Wine wrote:

tl;dr: what is the balance point between pushes to try taking too long
and losing repository history of recent try pushes?

Summary:


As most developers have experienced, pushing to try can sometimes take a
long time. Once it takes "too long" (as measured by screams of pain in
#releng), a "try [repository] reset" is scheduled. This hurts productivity
and increases frustration for everyone involved (devs, IT, RelEng). We
don't want to do this anymore.

A reset of the try repository deletes the existing contents and replaces
them with a fresh clone of mozilla-central. While the tbpl information
will remain valid for any completed build, any attempt to view the diffs
for a try build will fail (unless you already had them in your local
repository).

Progress on resolution of the root cause:
------------------------------------------

IT has made tremendous progress in reducing the occurrence of "long push
times", but they still are not predictable. Various attempts at
monitoring[1] and auto correction[2] have not been successful in
improving the situation. Work continues on additional changes that
should improve the situation[3].

The most recent mitigation strategy is to trade the "unknown timing"
disruption of push times growing to the pain threshold for the "known
timing" of resetting the try repository every TCW (tree closing window -
currently every 6 weeks). However, we heard from some folks that this is
too often.

The most recent pain-triggered try reset came after a duration of 6
months[4]. There was at least one report of problems just 3 months after
a reset[5].

So, the question for developers is: what's the balance point between:
  - resetting too often, making collaborating on try pushes hard
  - resetting too infrequently, letting push times creep back up


I wouldn't have such a big issue with Try resets if we didn't lose 
information in the process. I believe every time there's been a Try 
reset, I've lost data from a recent (<1 week) Try push and I needed to 
re-run that job - incurring extra cost to Mozilla and wasting my time. I 
also periodically find myself wanting to answer questions like "what
percentage of tree closures are due to pushes that didn't go to Try
first?" Data loss stinks.


I'd say the goal should be "no data loss." I have an idea that will 
enable us to achieve this.


Let's expose every newly-reset instance of the Try repo as a separate 
URL. We would still push to ssh://hg.mozilla.org/try, but the URLs 
printed and the URLs used by automation would be URLs to repos that 
would never go away. e.g. 
https://hg.mozilla.org/tries/try1/rev/840f122d1286 ("try1" being the 
important bit in there). When we reset Try, you'd hand out URLs to 
"try2." You could reset the writable Try repo as frequently as you 
desired and aside from a slightly different repo URL being given out, 
nobody should notice.
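
To make the idea concrete, here's a rough sketch (in JS for illustration;
the function and variable names are made up) of how automation could
compose the permanent URLs from a generation counter instead of
hard-coding the writable repo:

    // Hypothetical sketch: RelEng bumps tryGeneration on every reset;
    // everything else derives permanent URLs from it.
    var TRY_BASE = "https://hg.mozilla.org/tries";
    var tryGeneration = 1;

    function permanentTryUrl(rev) {
      return TRY_BASE + "/try" + tryGeneration + "/rev/" + rev;
    }

    // permanentTryUrl("840f122d1286")
    // -> "https://hg.mozilla.org/tries/try1/rev/840f122d1286"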


The main drawbacks of this approach that I can think of are all in
automation: parts of automation are very repo/URL centric, and having
effectively dynamic URLs might break assumptions. But making automation
work against arbitrary URLs is a good thing: it makes automation more
flexible and lets people experiment with alternate repo hosting, landing
tools, landing-integrated code review tools, etc. without requiring
special involvement from RelEng. "Everything is a web service and is
self-service," etc.



Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Felipe G
The actual translation needs to happen all at once, but that's OK as long
as I can work on the chunks incrementally and only send everything off to
the translation service once it's ready.  What I need to find, then, is a
good (and fast) partitioning algorithm that will give me a list of several
blocks to translate. A CSS block is a good start, but I need something
more detailed than that, for a few reasons:

- I can't skip invisible or display:none nodes, because websites have
navigation menus etc. that have text on them and need to be translated
(I don't know the exact definition of the "CSS block" you mention, so I'm
not sure whether it covers this or not)
- In direct opposition to the first point, I can't blindly consider all
nodes (including invisible ones) with text content on them, because
websites have 

Re: W3C Proposed Recommendations: WAI-ARIA (accessibility)

2014-03-04 Thread L. David Baron
On Tuesday 2014-02-25 09:43 -0500, david bolter wrote:
> I support this W3C Recommendation.

Yep.

While I wasn't entirely happy with some of the history that led to
the current state, I agree we should support it.

(In particular, in the early days of ARIA I was told, in private
conversations, that it was intended as a temporary measure until
HTML5 was ready and had enough of the semantics needed.  But I never
asked the people telling me that to document that intent publicly, and
as far as I know there's no public record of it, and it probably didn't
represent any consensus at the time.  I'd probably have preferred that
the semantics needed for accessibility were part of the semantics of
the language rather than a separate add-on that can be inconsistent,
but that's also not the way the Web platform works today.  I did learn
something about the value of working in public, though.)

So I'll submit a review in support of advancing to Recommendation
as-is.

-David

> On Tue, Feb 11, 2014 at 2:22 PM, L. David Baron  wrote:
> > W3C recently published the following proposed recommendations (the
> > stage before W3C's final stage, Recommendation):
> >
> >   Accessible Rich Internet Applications (WAI-ARIA) 1.0
> >   http://www.w3.org/TR/2014/PR-wai-aria-20140206/
> >
> >   WAI-ARIA 1.0 User Agent Implementation Guide
> >   http://www.w3.org/TR/2014/PR-wai-aria-implementation-20140206/

-- 
𝄞   L. David Baron http://dbaron.org/   𝄂
𝄢   Mozilla  https://www.mozilla.org/   𝄂
 Before I built a wall I'd ask to know
 What I was walling in or walling out,
 And to whom I was like to give offense.
   - Robert Frost, Mending Wall (1914)




Re: We live in a memory-constrained world

2014-03-04 Thread Nicholas Nethercote
On Mon, Mar 3, 2014 at 11:48 PM, Henri Sivonen  wrote:
>
> Are static atoms and the HTML parser's pre-interned element name and
> attribute name objects that are on the heap shared between processes
> under Nuwa already? I.e. is the heap cloned with copy-on-write
> sharing? On the memory page granularity, right? Do we know if the
> stuff we heap-allocate at startup pack nicely into memory pages so
> that they won't have "free" spots that the allocator would use after
> cloning?

https://bugzilla.mozilla.org/show_bug.cgi?id=948648 is open for
investigating this. Preliminary results seem to indicate that things
work pretty well at the moment -- i.e. not much sharing is lost -- but
I may be misinterpreting.

Nick


Re: Column numbers appended to URLs recently

2014-03-04 Thread Jason Orendorff
On 3/3/14, 12:54 PM, Jan Honza Odvarko wrote:
> URLs in stack traces for exception objects have recently changed. There
> is a column number appended at the end (I am seeing this in Nightly, but
> it could also be in Aurora).

Code that parses error.stack now needs to handle the case where both a
line number and a column number appear after the URL.
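
For example (illustrative only, not Firebug's actual code), a parser that
accepts frames both with and without the trailing column might look like:

    // Hypothetical sketch: handle "func@url:line" and "func@url:line:col".
    function parseStackFrame(frame) {
      var match = /^(.*)@(.*?):(\d+)(?::(\d+))?$/.exec(frame);
      if (!match) {
        return null;
      }
      return {
        functionName: match[1],
        url: match[2],
        line: Number(match[3]),
        column: match[4] !== undefined ? Number(match[4]) : null
      };
    }

    // parseStackFrame("doStuff@https://example.org/app.js:10:5")
    // -> { functionName: "doStuff", url: ".../app.js", line: 10, column: 5 }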

This should be easy to fix. What's affected? Just Firebug?

-j



Re: Ideas for making it easier and less error prone for Firefox OS partners to expose certified only APIs

2014-03-04 Thread Jonas Sicking
On Thu, Jan 23, 2014 at 4:29 PM, Ehsan Akhgari  wrote:
> On 1/22/2014, 3:12 AM, Henri Sivonen wrote:
>>
>> On Tue, Jan 21, 2014 at 11:07 PM, Ehsan Akhgari 
>> wrote:
>>>
>>> On 1/21/2014, 4:27 AM, Henri Sivonen wrote:
>>
>> ..

 In general, though, it seems to me that having partners patch Gecko is
 bad in the sense that it makes it harder to move to a future where the
 device vendor ships Gonk (analogous to Android) and Mozilla ships
 Gecko updates (analogous to Google Play Services). Have we given up on
 getting to a future where Mozilla ships Gecko updates and Firefox OS
 users get an up-to-date Gecko from Mozilla the way desktop and Android
 users do?
>>>
>>>
>>>
>>> I'm unaware of our plans for doing that, but indeed that sounds very hard
>>> if
>>> not impossible as long as Gecko modifications are permitted.  Do you know
>>> who would be the right person to ask about the plans you're referring to?
>>
>> I'm not sure what level of "plan" the assumed aspiration I'm referring
>> to has ever been on. IIRC, back when the Gonk/Gecko/Gaia taxonomy was
>> introduced, the idea was that Gonk was a Gecko porting target that
>> mostly stays put and Gecko would update. It seems pretty clear that in
>> order to get to a point where Firefox OS users have an up-to-date
>> Gecko, it needs to be possible for Mozilla to push out a Gecko build
>> to Gonk like pushing out a Firefox build for Android or Windows
>> without requiring device-specific patches to be rebased and Gecko be
>> rebuilt for each device as part of that process.
>>
>> I don't know who'd be the right person to ask about this.
>
> Jonas, do you happen to know more about this?

I generally agree that it's much better if partners don't modify Gecko at
all, though a certain level of maturity is required for that to be
possible.

Check with Chris Lee about this. I don't know what the latest requirements
we are putting on partners are.

/ Jonas


Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Robert O'Callahan
On Wed, Mar 5, 2014 at 8:47 AM, Felipe G  wrote:

> If I go with the clone route (to work on the snapshotted version of the
> data), how can I later associate the cloned nodes with the original nodes
> from the document?  One way I thought of is to set a userdata on the
> DOM nodes and then use the clone handler callback to associate the cloned
> node with the original one (through weak refs or a WeakMap).  That would
> mean iterating first through all nodes to add the handlers, but that's
> probably fine (I don't need to analyze anything or visit text nodes).
>
> I think serializing and re-parsing everything in the worker is not the
> ideal solution unless we can find a way to also keep accurate associations
> with the original nodes from content. Anything that introduces a possibly
> lossy data aspect will probably hurt translation, which is already an
> inaccurate science.
>

Maybe you can do the translation incrementally, and just annotate the DOM
with custom attributes (or userdata) to record the progress of the
translation? Plus a reference to the last translated node (subtree) to
speed up finding the next subtree to translate. I assume it would be OK to
translate one CSS block at a time.
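
Something along these lines, perhaps (a rough sketch only, using made-up
data-* attribute names, not whatever the real feature would use):

    // Hypothetical sketch: mark subtrees as they get translated and keep a
    // reference to the last one so the next call resumes from there.
    var lastTranslated = null;

    function translateNextBlock(root) {
      var walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, null);
      if (lastTranslated) {
        walker.currentNode = lastTranslated;  // resume where we left off
      }
      var next = walker.nextNode();
      while (next && next.dataset.translated === "true") {
        next = walker.nextNode();
      }
      if (!next) {
        return false;  // nothing left to translate
      }
      // ... hand next's text content to the translation service here ...
      next.dataset.translated = "true";
      lastTranslated = next;
      return true;
    }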

Rob
-- 
Jtehsauts  tshaei dS,o n" Wohfy  Mdaon  yhoaus  eanuttehrotraiitny  eovni
le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o  Whhei csha iids  teoa
stiheer :p atroa lsyazye,d  'mYaonu,r  "sGients  uapr,e  tfaokreg iyvoeunr,
'm aotr  atnod  sgaoy ,h o'mGee.t"  uTph eann dt hwea lmka'n?  gBoutt  uIp
waanndt  wyeonut  thoo mken.o w


Re: We live in a memory-constrained world

2014-03-04 Thread Trevor Saunders
On Tue, Mar 04, 2014 at 09:48:33AM +0200, Henri Sivonen wrote:
> On Fri, Feb 28, 2014 at 8:09 PM, L. David Baron  wrote:
> > In other words, whenever you have a pointer in a static data
> > structure pointing to some other data, that pointer needs to get
> > fixed up when the library loads, which makes the memory that pointer
> > is in less likely to be shared across processes (depending, I guess,
> > on how many processes are able to load the library at its default
> > address, which may in turn depend on security features that try to
> > randomize library base addresses).  This also slows down loading of
> > shared libraries.
> 
> So all things considered, do we want things like static atoms and the
> HTML parser's pre-interned element name and attribute name objects
> (which have pointers to static atoms and a virtual method) to move
> from the heap to POD-like syntax even if it results in relocations or,
> with MSVC, static initializers?

It's generally gcc that does dumb things with static initializers, but
that can generally be fixed with liberal use of constexpr.

Anyway, I suspect the real answer is "it's complicated" ;) but it's
probably a good idea at least for things on the startup path; adding a
relocation and saving a call to malloc before we can parse HTML is
probably a win on its own.

> > Shouldn't be an issue with Nuwa-cloned processes on B2G, though.
> 
> Are static atoms and the HTML parser's pre-interned element name and
> attribute name objects that are on the heap shared between processes
> under Nuwa already? I.e. is the heap cloned with copy-on-write
> sharing? On the memory page granularity, right? Do we know if the

AIUI, yes: the heap is made copy-on-write at the time fork(2) is called
to create the Nuwa process, and presumably we don't have KSM turned on, so
only the initial heap will ever be shared.  I'd guess the HTML parser
stuff you mentioned is created before we fork the Nuwa process and so it's
included.

> stuff we heap-allocate at startup pack nicely into memory pages so
> that they won't have "free" spots that the allocator would use after
> cloning?

Probably not perfectly, but in practice it seems to pack fairly well;
otherwise Nuwa wouldn't help.

Trev

> 
> -- 
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/




Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Felipe G
Thanks for the feedback so far!

If I go with the clone route (to work on the snapshotted version of the
data), how can I later associate the cloned nodes with the original nodes
from the document?  One way I thought of is to set a userdata on the
DOM nodes and then use the clone handler callback to associate the cloned
node with the original one (through weak refs or a WeakMap).  That would
mean iterating first through all nodes to add the handlers, but that's
probably fine (I don't need to analyze anything or visit text nodes).
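
For what it's worth, here is one way the association could be kept without
relying on clone callbacks at all: walk the original and the clone in the
same order and record the pairs in a WeakMap. This is only a sketch under
that assumption, not tested against the real use case:

    // Hypothetical sketch: map every cloned node back to its original.
    function cloneWithMapping(root) {
      var clone = root.cloneNode(true);
      var cloneToOriginal = new WeakMap();
      var origWalker = document.createTreeWalker(root, NodeFilter.SHOW_ALL, null);
      var cloneWalker = document.createTreeWalker(clone, NodeFilter.SHOW_ALL, null);
      cloneToOriginal.set(clone, root);
      var orig = origWalker.nextNode();
      var copy = cloneWalker.nextNode();
      // Both walkers visit their trees in identical document order.
      while (orig && copy) {
        cloneToOriginal.set(copy, orig);
        orig = origWalker.nextNode();
        copy = cloneWalker.nextNode();
      }
      return { clone: clone, map: cloneToOriginal };
    }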

I think serializing and re-parsing everything in the worker is not the
ideal solution unless we can find a way to also keep accurate associations
with the original nodes from content. Anything that introduces a possibly
lossy data aspect will probably hurt translation, which is already an
inaccurate science.


On Tue, Mar 4, 2014 at 6:26 AM, Andrew Sutherland <
asutherl...@asutherland.org> wrote:

> On 03/04/2014 03:13 AM, Henri Sivonen wrote:
>
>> It saddens me that we are using non-compliant ad hoc parsers when we
>> already have two spec-compliant (at least at some point in time) ones.
>>
>
> Interesting!  I assume you are referring to:
> https://github.com/davidflanagan/html5/blob/master/html5parser.js
>
> Which seems to be (explicitly) derived from:
> https://github.com/aredridel/html5
>
> Which in turn seems to actually include a few parser variants.
>
> Per the discussion with you on https://groups.google.com/d/
> msg/mozilla.dev.webapi/wDFM_T9v7Tc/Nr9Df4FUwuwJ for the Gaia e-mail app
> we initially ended up using an in-page data document mechanism for
> sanitization.  We later migrated to using a worker based parser.  There
> were some coordination hiccups with this migration (
> https://bugzil.la/814257) and some B2G time pressure, so a
> comprehensive survey of HTML parsers never really happened.
>
> While we have a defense-in-depth strategy (CSP and iframe sandbox should
> be protecting us from the worst possible scenarios) and we're hopeful that
> Service Workers will eventually let us provide nsIContentPolicy-level
> protection, the quality of the HTML parser is of course fairly important[1]
> to the operation of the HTML sanitizer.  If you'd like to bless a specific
> implementation for workers to perform streaming HTML parsing or some
> other explicit strategy, I'd be happy to file a bug for us to go in that
> direction.  Because we are using a white-list based mechanism and are
> fairly limited and arguably fairly luddite in what we whitelist, it's my
> hope that our errors are on the side of safety (and breaking adventurous
> HTML email :), but that is indeed largely hope.  Your input is definitely
> appreciated, especially as it relates to prioritizing such enhancements and
> potential risk from our current strategy.
>
> Andrew
>
>
> 1: understatement
>


Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Felipe G
Chrome imports a JS 

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Andrew Sutherland

On 03/04/2014 03:13 AM, Henri Sivonen wrote:
> It saddens me that we are using non-compliant ad hoc parsers when we
> already have two spec-compliant (at least at some point in time) ones.


Interesting!  I assume you are referring to:
https://github.com/davidflanagan/html5/blob/master/html5parser.js

Which seems to be (explicitly) derived from:
https://github.com/aredridel/html5

Which in turn seems to actually include a few parser variants.

Per the discussion with you on 
https://groups.google.com/d/msg/mozilla.dev.webapi/wDFM_T9v7Tc/Nr9Df4FUwuwJ 
for the Gaia e-mail app we initially ended up using an in-page data 
document mechanism for sanitization.  We later migrated to using a 
worker-based parser.  There were some coordination hiccups with this
migration (https://bugzil.la/814257) and some B2G time pressure, so a
comprehensive survey of HTML parsers never really happened.


While we have a defense-in-depth strategy (CSP and iframe sandbox should 
be protecting us from the worst possible scenarios) and we're hopeful 
that Service Workers will eventually let us provide 
nsIContentPolicy-level protection, the quality of the HTML parser is of 
course fairly important[1] to the operation of the HTML sanitizer.  If 
you'd like to bless a specific implementation for workers to perform 
streaming HTML parsing or some other explicit strategy, I'd be 
happy to file a bug for us to go in that direction.  Because we are 
using a white-list based mechanism and are fairly limited and arguably 
fairly luddite in what we whitelist, it's my hope that our errors are on 
the side of safety (and breaking adventurous HTML email :), but that is 
indeed largely hope.  Your input is definitely appreciated, especially 
as it relates to prioritizing such enhancements and potential risk from 
our current strategy.
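
(For illustration only: this is not bleach.js itself, and the tag and
attribute lists below are made up, but the whitelist idea boils down to
something like the following, applied to an already-parsed, inert
document.)

    // Hypothetical sketch: drop every element and attribute that is not
    // explicitly whitelisted.
    var ALLOWED_TAGS = { a: true, b: true, i: true, p: true, br: true,
                         ul: true, ol: true, li: true };
    var ALLOWED_ATTRS = { href: true, title: true };

    function sanitize(root) {
      var elements = Array.prototype.slice.call(root.querySelectorAll("*"));
      elements.forEach(function (el) {
        if (!ALLOWED_TAGS[el.localName]) {
          el.parentNode.removeChild(el);  // unknown elements dropped wholesale
          return;
        }
        Array.prototype.slice.call(el.attributes).forEach(function (attr) {
          if (!ALLOWED_ATTRS[attr.name]) {
            el.removeAttribute(attr.name);
          }
        });
      });
    }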


Andrew


1: understatement


Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Henri Sivonen
On Mon, Mar 3, 2014 at 10:19 PM, Boris Zbarsky  wrote:
> How feasible is just doing .innerHTML to do that, then doing some sort of
> async parse (e.g. XHR or DOMParser) to get a DOM snapshot?

Seems more efficient to write the walk in C++, since the innerHTML
getter already includes the walk in C++. How important is it to avoid
C++?
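
(For reference, the kind of snapshot Boris describes would look roughly
like the following sketch in JS; it assumes a synchronous DOMParser parse
is acceptable, and the walk itself is the part I'd rather see in C++.)

    // Hypothetical sketch: re-parse the page's serialized HTML into an
    // inert document and walk the snapshot's text nodes off the live DOM.
    var snapshot = new DOMParser().parseFromString(
        document.documentElement.outerHTML, "text/html");
    var walker = snapshot.createTreeWalker(snapshot.body,
                                           NodeFilter.SHOW_TEXT, null);
    for (var node = walker.nextNode(); node; node = walker.nextNode()) {
      // ... collect node.data for whatever processing comes next ...
    }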

On Mon, Mar 3, 2014 at 10:45 PM, Ehsan Akhgari  wrote:
> There's https://github.com/google/gumbo-parser which can be compiled to js.

The parser we use in Gecko can be compiled to JS using GWT. However,
the current glue code assumes the parser is running in the context of
a browser window object and a browser DOM. Writing the glue code that
assumes something else about the environment should be easy.

Also, David Flanagan has implemented the HTML parsing algorithm
(pre-; not sure if updated since) directly in JS.

On Tue, Mar 4, 2014 at 1:57 AM, Andrew Sutherland
 wrote:
> The Gaia e-mail app has a streaming HTML parser in its worker-friendly
> sanitizer at
> https://github.com/mozilla-b2g/bleach.js/blob/worker-thread-friendly/lib/bleach.js.

On Tue, Mar 4, 2014 at 7:14 AM, Wesley Johnston  wrote:
> Android also ships a parser that we wrote for Reader mode:
>
> http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/JSDOMParser.js

It saddens me that we are using non-compliant ad hoc parsers when we
already have two spec-compliant (at least at some point in time) ones.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/