Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

Cameron Zwarich Mon, 06 Oct 2014 00:54:07 -0700

On Oct 5, 2014, at 7:49 PM, Boris Zbarsky <bzbar...@mit.edu> wrote:

> On 10/5/14, 7:51 PM, Cameron Zwarich wrote:
>> Are there any plans to eliminate the copies in Gecko?
> 
> No.  Measurement showed that in practice the cost of copying short strings, 
> which most of these are, is very low.  For large strings you do end up having 
> to copy, but keep in mind that Gecko used to avoid the copy only if the 
> callee did not hold on to the string.  So as long as the passed-in thing was 
> only used for comparisons to other strings, you could avoid a copy.
> 
> There are very few real-life cases in which a long string is passed in and 
> then not stored.


I agree that if you can’t eliminate the copy in the case where it is stored 
then it doesn’t seem worth it.
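
To make the distinction concrete, here is a minimal Rust sketch, not Gecko code. The Element type and its methods are hypothetical names invented for illustration. A callee that only compares can borrow the string and never copy; a callee that stores it has to either copy or share a refcounted buffer.

```rust
use std::rc::Rc;

// Hypothetical DOM-like object that may hold on to a string.
struct Element {
    id: Option<Rc<str>>,
}

impl Element {
    // Comparison only borrows the argument, so no copy is needed.
    fn id_matches(&self, candidate: &str) -> bool {
        self.id.as_deref() == Some(candidate)
    }

    // Storing a borrowed string forces a copy into a new allocation.
    fn set_id_copying(&mut self, value: &str) {
        self.id = Some(Rc::from(value));
    }

    // Storing an already refcounted string only bumps the count.
    fn set_id_shared(&mut self, value: &Rc<str>) {
        self.id = Some(Rc::clone(value));
    }
}
```

Only the last method avoids the copy in the stored case, and it requires both sides to agree on the same refcounted representation.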

>> V8-created strings are converted to the external Blink representation on 
>> demand.
> 
> This causes them to be copied, yes?

Yes, although you can get rid of the original JS string representation when you 
do this.

WebKit does something a bit different. It is a slight improvement on the V8 model, 
but it requires more integration between the JS engine and the rest of the 
browser engine. They have three kinds of strings: atomic StringImpl, a rope 
whose fibers are StringImpls, and small strings that are at most a single 
character. The WebCore string types wrap StringImpl so that there is a zero 
cost conversion between the two. There isn’t an optimization for small strings 
any larger than a single character.

Where do most of the small strings larger than a single character that benefit 
from the inline small string optimization originate, the DOM or user JS code?

>> How do you avoid the copy on return values from C++ to JS in Gecko?
> 
> JS_NewExternalString, passing a pointer to something that's refcounted and 
> then having the finalizer deref.
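
The pattern described in the quoted text can be sketched in Rust roughly as follows. This is only an illustration of "hand out a pointer into a refcounted buffer and drop the reference in the finalizer". ExternalString, new_external and finalize are hypothetical names, not the SpiderMonkey API.

```rust
use std::rc::Rc;

// Hypothetical stand-in for an engine-side external string: it points
// into a buffer it does not own and remembers who to release later.
struct ExternalString {
    chars: *const u16,
    len: usize,
    owner: *const Vec<u16>,
}

// Take an extra reference on the buffer and expose its characters
// without copying them.
fn new_external(buf: &Rc<Vec<u16>>) -> ExternalString {
    let owner = Rc::into_raw(Rc::clone(buf));
    ExternalString { chars: buf.as_ptr(), len: buf.len(), owner }
}

// Read the characters back; valid only while the reference is held.
fn as_slice(s: &ExternalString) -> &[u16] {
    unsafe { std::slice::from_raw_parts(s.chars, s.len) }
}

// The finalizer: give back the reference taken in new_external.
fn finalize(s: ExternalString) {
    unsafe { drop(Rc::from_raw(s.owner)) };
}
```

The buffer stays alive for as long as either side holds a reference, so no copy is made on the way out to JS.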

Thanks for the reference. That function takes a char16_t*, but now that 
SpiderMonkey has the Latin1 optimization, it seems like it should be possible 
to also add a Latin1 variant. That would work quite nicely if Servo were using 
the same UTF-16 / Latin1 split.
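
For reference, a UTF-16 / Latin1 split could look something like the following in Rust. This is a sketch under my own assumptions, not how SpiderMonkey or Servo actually lays out strings; JsStr and its methods are hypothetical names.

```rust
// Hypothetical two-representation string: one byte per code unit when
// every unit fits in Latin1, two bytes per unit otherwise.
#[derive(Debug, PartialEq)]
enum JsStr {
    Latin1(Vec<u8>),
    TwoByte(Vec<u16>),
}

impl JsStr {
    // Pick the narrow representation whenever all units are <= 0xFF.
    fn from_code_units(units: &[u16]) -> JsStr {
        if units.iter().all(|&u| u <= 0xFF) {
            JsStr::Latin1(units.iter().map(|&u| u as u8).collect())
        } else {
            JsStr::TwoByte(units.to_vec())
        }
    }

    // Indexing by code unit is constant time in both representations.
    fn code_unit_at(&self, i: usize) -> Option<u16> {
        match self {
            JsStr::Latin1(bytes) => bytes.get(i).map(|&b| u16::from(b)),
            JsStr::TwoByte(units) => units.get(i).copied(),
        }
    }
}
```

If both sides of the boundary used this split, either variant could be handed across without transcoding.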

There is an interesting question of whether it will ever be a good idea for a 
JS engine to make something like WTF-8 its native string type, or if the 
difficulties with random access outweigh the advantages. It seems like a 
daunting task to do that at this point for Gecko / SM, even if it would work in 
theory.
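
The random access problem is easy to see in a small Rust sketch. JS indexes strings by UTF-16 code unit, but in an 8-bit variable-width encoding the position of the i-th code unit is only known after scanning from the start. The helper below is a hypothetical name, and it uses Rust's &str, which is strict UTF-8; real WTF-8 would also have to allow lone surrogates, which this sketch ignores.

```rust
// Return the UTF-16 code unit at `index`, as JS charCodeAt would see it.
// This walks the string from the beginning, so it is O(n) per lookup
// rather than O(1) as with a flat array of 16-bit units.
fn utf16_code_unit_at(s: &str, index: usize) -> Option<u16> {
    s.encode_utf16().nth(index)
}
```

An engine would need some kind of index or cache on top of this to keep patterns like a charCodeAt loop from becoming quadratic.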

Cameron
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
