On Thu, Apr 24, 2014 at 4:20 PM, Benoit Jacob <jacob.benoi...@gmail.com> wrote:
> 2014-04-24 8:31 GMT-04:00 Henri Sivonen <hsivo...@hsivonen.fi>:
>
>> I have prepared a queue of patches that removes Netscape-era (circa
>> 1999) internationalization code that efforts to implement the Encoding
>> Standard have shown to be unnecessary in Firefox. This makes libxul
>> on ARMv7 smaller by 181 KB, so that's a win.
>
> Have we measured the impact of this change on actual memory usage (as
> opposed to virtual address space size) ?

No, we haven't. I don't have a B2G phone, but I could give my whole
patch queue in one diff to someone who wants to try.

> Have we explored how much this problem could be automatically helped by the
> linker being smart about locality?

Not to my knowledge, but I'm very skeptical about getting these
benefits by having the linker be smart enough that the dead code ends
up on memory pages that aren't actually backed by real RAM.

The code that is no longer in use is thoroughly intermingled with
code that's still in use. Useful and useless plain old C data is
included side by side. Useful and useless classes are compiled next to
each other in unified compilation units. Since the classes are
instantiated via XPCOM, a linker that's unaware of XPCOM couldn't tell
via static analysis which classes are in use and which aren't. All
of them would look equally dead, or equally alive, depending on what
view you take of the root of the caller chain being function pointers
in a contract ID table.
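To make the shape of the problem concrete, here is a loose Python analogy
of such a contract ID table (the real one is a C++ table of factory
function pointers; the contract IDs and class names below are made up):

```python
# Python analogy of an XPCOM-style contract ID table. The contract IDs and
# classes here are hypothetical illustrations, not real Gecko entries.

class UsedConverter:
    name = "used"

class UnusedConverter:  # dead in practice, but still referenced by the table
    name = "unused"

# Every constructor is referenced by this table, so to a static analysis
# that treats the table as a root, no entry looks any deader than another.
CONTRACT_TABLE = {
    "@mozilla.org/intl/converter;1?charset=used": UsedConverter,
    "@mozilla.org/intl/converter;1?charset=unused": UnusedConverter,
}

def create_instance(contract_id):
    # Instantiation is driven by a runtime string lookup, which is
    # invisible to the linker.
    return CONTRACT_TABLE[contract_id]()
```

Since the string that selects the entry only exists at runtime, only a
tool that understands the XPCOM registration convention could prove an
entry unreachable.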

Using PGO to determine what's dead code and what's not wouldn't work,
either: if the profiling run was "load mozilla.org", it would exercise
too little code, and if it was "all the unit tests", it would exercise
too much.

On Fri, Apr 25, 2014 at 2:03 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:
>> * Are we building and shipping dead code in ICU on B2G?
>
> No.  That is at least partly covered by bug 864843.

Using system ICU seems wrong in terms of correctness. That's the
reason why we don't use system ICU on Mac and desktop Linux, right?

For a given phone, the Android base system practically never updates,
so for a given Firefox version, the Web-exposed APIs would have as
many behaviors as there are differing ICU snapshots on different
Android versions out there.

As for B2G, considering that Gonk is supposed to update less often
than Gecko, it seems like a bad idea to have ICU be part of Gonk
rather than part of Gecko on B2G.

> In my experience, ICU is unfortunately a hot potato. :(  The real blocker
> there is finding someone who can tell us what bits of ICU _are_ used in the
> JS engine.

Apart from ICU initialization/shutdown, the callers seem to be
http://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.cpp
and http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#852
.

So the JS engine uses:
 * Collation
 * Number formatting
 * Date and time formatting
 * Normalization

It looks like the JS engine has its own copy of the Unicode database
for other purposes. It seems like that should be unified with ICU so
that there'd be only one copy of the Unicode database.

Additionally, we should probably rewrite nsCollation users to use ICU
collation and delete nsCollation.

Therefore, it looks like we should turn off (if we haven't already):
 * The ICU LayoutEngine
 * ustdio
 * ICU encoding converters and their mapping tables
 * ICU break iterators and their data
 * ICU transliterators and their data

http://apps.icu-project.org/datacustom/ gives a good idea of what
there is to turn off.
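For the parts that live inside ICU's common/i18n libraries, uconfig.h
appears to have switches already; the LayoutEngine and ustdio are separate
libraries that we'd simply not build, so they need no switch. A sketch
(macro names taken from ICU's uconfig.h, worth double-checking against
the version we import):

```c
/* Switches covering the list above; set before building ICU. */
#define UCONFIG_NO_LEGACY_CONVERSION 1  /* legacy encoding converters and
                                           their mapping tables */
#define UCONFIG_NO_BREAK_ITERATION 1    /* break iterators and their data */
#define UCONFIG_NO_TRANSLITERATION 1    /* transliterators and their data */
```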

> The parts used in Gecko for <input type=number> are pretty
> small.  And of course someone needs to figure out the black magic of
> conveying the information to the ICU build system.

So it looks like we already build with UCONFIG_NO_LEGACY_CONVERSION:
http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#264

However, that flag is misdesigned in the sense that it considers
US-ASCII, ISO-8859-1, UTF-7, UTF-32, CESU-8, SCSU and BOCU-1 as
non-legacy, even though, frankly, those are legacy, too. (UTF-16 is
legacy also, but it's legacy we need, since both ICU and Gecko are
UTF-16 legacy code bases!)
http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#267

So I guess the situation isn't quite as bad as I thought.

We should probably set UCONFIG_NO_CONVERSION to 1 and
U_CHARSET_IS_UTF8 to 1 per:
http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#248
After all, we should easily be able to make sure that we don't use
non-UTF-8 encodings when passing char* to ICU.
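Concretely, that would mean adding something like the following next to
our existing ICU defines (same uconfig.h mechanism as the
UCONFIG_NO_LEGACY_CONVERSION setting discussed above):

```c
/* Proposed additions, assuming every char* string we pass to ICU is
 * UTF-8. Macro names from ICU's uconfig.h. */
#define UCONFIG_NO_CONVERSION 1  /* no conversion beyond UTF-8/UTF-16 */
#define U_CHARSET_IS_UTF8 1      /* char* APIs interpret bytes as UTF-8 */
```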

Also, if the ICU build system isn't configurable enough, I think we
should consider identifying the parts of ICU we can delete even though
the build system doesn't let us, and then automating the deletion as a
script so that it can be repeated with future imports of ICU.
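Such a script could be as simple as the sketch below (the directory
names are hypothetical placeholders, not a vetted list of deletable
parts):

```python
# Sketch: prune ICU source subtrees that the build system can't disable.
# Rerun after each ICU import. The paths listed are placeholders only.
import os
import shutil

UNNEEDED = [
    "source/layout",  # hypothetical: ICU LayoutEngine
    "source/io",      # hypothetical: ustdio
]

def prune_icu(icu_root):
    """Delete the unneeded subtrees under icu_root; return what was removed."""
    removed = []
    for rel in UNNEEDED:
        path = os.path.join(icu_root, rel)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(rel)
    return removed
```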

>>   * Do we have any mechanisms in place for preventing stuff like the
>> ICU encoding converters becoming part of the build in the future?
>
> No, that is not possible to automate.

I was thinking of policy / review solutions.

>>   * How should we identify code that we build but that isn't used
>> anywhere?
>
> I'm afraid we need humans for that.

Yeah, but how do we get humans to do that?

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
