Re: Capturing text from Firefox

2013-10-18 Thread Neil

Look, Yuriy wrote:


Are there other approaches to capturing text from Ff you can suggest?

I am not a member of the accessibility team (although I have helped them 
on occasion) but I would like to take the opportunity to suggest using 
the accessibility APIs.


--
Warning: May contain traces of nuts.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Capturing text from Firefox

2013-10-18 Thread Benjamin Smedberg

On 10/17/2013 7:30 PM, Look, Yuriy wrote:

I am working on the GUI automation component of a performance monitoring 
product.  One of the common approaches to monitoring an application is to 
periodically capture text from the control where changes are expected (the 
content area of the browser for Web applications).  Text capturing ideally 
captures all text, including non-selectable text and user input.

Have you looked into reusing the existing Selenium browser automation? 
http://docs.seleniumhq.org/


It's not clear exactly what kinds of problems you are trying to solve, 
but the Mozilla content layer already has ways to expose pretty much all 
of the DOM text, including nonselectable text, via various APIs 
including the DOM itself, accessibility API, and some other lower-level 
functions we use for find-in-page.
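
As a rough illustration of the DOM route mentioned above, the sketch below 
walks serialized page markup and collects its text, whether or not the text 
is selectable on screen. It is not Mozilla code: the class `TextCapture` and 
the function `capture_text` are hypothetical names, and the HTML is assumed to 
have been obtained already through some automation tool. Note that the `value` 
attribute of an `<input>` holds only the initial value, so text typed by the 
user would have to be read from the live DOM instead.

```python
from html.parser import HTMLParser


class TextCapture(HTMLParser):
    """Collect text nodes and initial input values from serialized HTML."""

    # Elements whose contents are code or styling, not page text.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "input":
            # Only the initial value is present in the markup (assumption:
            # live user input is fetched separately from the DOM property).
            value = dict(attrs).get("value")
            if value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def capture_text(html: str) -> list[str]:
    """Return the text chunks found in the given HTML, in document order."""
    parser = TextCapture()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Because the text comes from the document structure rather than from drawing 
calls, CSS that makes text non-selectable has no effect on what is captured.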



In the product I work on this is achieved by (1) forcing the application to 
re-draw the text in the window or the part of the window of interest and (2) 
hooking the functions that are responsible for text drawing during the time 
interval in which the capturing is performed.  Hooking is performed by 
modifying the first bytes of the binary code of the hooked functions to jump 
to the hooking functions, which process the same parameters and then jump back 
to allow the hooked functions to perform their job.

This will certainly not work in the future, see below.
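
For readers unfamiliar with the hooking technique quoted above, the sketch 
below computes the bytes that such a patch typically writes over the start of 
a function. It assumes the common x86/x86-64 near relative jump (opcode 0xE9 
followed by a 32-bit displacement); the original message does not say which 
instruction the product uses, and the function name `encode_jmp_rel32` is 
hypothetical. The code only builds the byte sequence and does not modify any 
process memory.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 "JMP rel32" opcode
JMP_REL32_LENGTH = 5     # one opcode byte plus a four-byte displacement


def encode_jmp_rel32(patch_address: int, target_address: int) -> bytes:
    """Build the 5 bytes that jump from patch_address to target_address.

    The displacement is measured from the end of the jump instruction,
    that is, from patch_address + 5.
    """
    displacement = target_address - (patch_address + JMP_REL32_LENGTH)
    if not -2**31 <= displacement < 2**31:
        raise ValueError("target is out of range for a 32-bit relative jump")
    # The displacement is stored as a little-endian signed 32-bit integer.
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<i", displacement)
```

A hook of this kind also has to save the overwritten bytes so that the 
original function can still run, which is the "jump back" step in the 
description.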


Which functions/technologies are drawing the text?
Is drawing performed by normal Windows APIs, like DrawTextEx or 
ExtTextOut, or is this no longer the case?

No. If I'm reading our bugs correctly, we're currently using a 
combination of HarfBuzz (http://freedesktop.org/wiki/Software/HarfBuzz/) 
and Uniscribe/DirectWrite. We use Uniscribe only for Hangul, Mongolian, 
Indic, and Thai text, and intend to eventually use HarfBuzz for all text 
rendering.

Does it delegate drawing to another process?

Not yet, but we're working on having content processes similar to the 
way many other web browsers do.

Does Ff cache drawn text, say, in memory device contexts, so that in case the 
window or a region needs to be repainted, text does not need to be redrawn and 
the window device context is updated through functions like BitBlt?

Yes.

If so, can such caching be disabled programmatically or through 
configuration?

No, I don't think it's possible to disable the layer system any more.

Does Ff patch Windows DLLs?

In a few specific cases, yes, but primarily to enforce a DLL blocklist 
for stability issues and to ensure that our crash reporting system isn't 
tampered with. In plugin processes we also hook a few event-system 
functions. I don't think that any of our hooking should affect the 
graphics/text subsystems.





Are there other approaches to capturing text from Ff you can suggest?

Ultimately, I don't think that trying to capture text by hooking drawing 
functions is going to be successful in Firefox. You probably need to 
look at some combination of accessibility and DOM APIs, depending on 
your actual use case.
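
Whichever API supplies the text, the periodic monitoring described in the 
original question comes down to comparing successive captures. A minimal 
sketch, assuming each capture is already available as a plain string (the 
function name `changed_lines` is hypothetical):

```python
import difflib


def changed_lines(previous: str, current: str) -> list[str]:
    """Return the lines removed ("- ") or added ("+ ") between two captures."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # Keep only real changes; drop unchanged lines and ndiff's "? " hints.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

An empty result means the captured text did not change between the two polls.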


--BDS



RE: Capturing text from Firefox

2013-10-18 Thread Look, Yuriy
Benjamin,

Thank you very much for the answers.  We'll need to weigh our options.

Thank you,

Yuriy


-Original Message-
From: Benjamin Smedberg [mailto:benja...@smedbergs.us] 
Sent: Friday, October 18, 2013 8:55 AM
To: Look, Yuriy; dev-platform@lists.mozilla.org
Subject: Re: Capturing text from Firefox
