First, there's no magic: the compiler is pretty memory- and CPU-intensive, but 
by the time permutations are being built, it is single-threaded. Unless 
you've got a ton of cache and bandwidth to ram, "not scaling linearly" 
makes a lot of sense no matter what your application is - you're probably 
able to saturate access to memory before you can keep your CPUs busy. 
Additionally, I'm not sure exactly what "vCPU" means these days, but the 
last time I checked cloud vendors were using hyperthreading/etc to let a 
given "core" work on more than one thread at a time, and appear to be 
multiple cores to software. Ostensibly this lets the CPU handle more than 
one instruction at the same time... but you're still choking on memory 
access, loading more data into cache as needed to chase those pointers. 

So, on a given set of hardware, you're probably able to find a limit where 
you are no longer scaling linearly, and another limit somewhat above that 
where you're no longer building any faster. 

A few questions that may either help the discussion along or help you 
weigh your options:

   - What happens if you profile the compiler like a normal Java 
   application - is the heap too big (30GB is a bit of a magic number to stay 
   below when it comes to compressed references 
   <https://shipilev.net/jvm/anatomy-quarks/23-compressed-references/>) or 
   too small (a bit more headroom might make it possible to get more work 
   done)? 
   - What does your CPU usage look like - is the process actually scaling 
   to use the threads you have?
   - Any other oddities in your profiling report? When we look at 
   long-lived GWT applications, it isn't uncommon for us to find far too many 
   split points, which eat an amazing amount of build time to produce even if 
   they have very little effect. The compiler can be "asked" to guess for you 
   which splitpoints are not worth having, but it is worth auditing this or 
   other parts of the build to see what else could be going on.
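
As a quick first check before reaching for a real profiler, something like the sketch below reports how much heap and how many processors a JVM actually sees. The class and method names are made up for illustration, and the 30GB ceiling is just the rough "magic number" mentioned above, not an exact threshold. Run it with the same JVM flags as your build, since it only describes the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

// Hypothetical helper, not part of GWT: prints what a JVM started with
// the same flags as the build would see.
public class JvmHeadroomCheck {
    // Rough ceiling taken from the "30GB" rule of thumb; an approximation.
    public static final long HEAP_CEILING_BYTES = 30L * 1024 * 1024 * 1024;

    // True if the configured max heap is at or below the rough ceiling.
    public static boolean underCeiling(long maxHeapBytes) {
        return maxHeapBytes <= HEAP_CEILING_BYTES;
    }

    // One-line summary of heap size and visible processors.
    public static String describe(long maxHeapBytes, int processors) {
        return String.format(Locale.ROOT,
            "max heap: %d MiB, processors seen by JVM: %d, under ~30GB: %b",
            maxHeapBytes / (1024 * 1024), processors, underCeiling(maxHeapBytes));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println(describe(rt.maxMemory(), rt.availableProcessors()));
        // The flags this JVM was started with, to confirm -Xmx and friends.
        System.out.println("JVM flags: "
            + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

This only tells you what the JVM was given, not how well the compiler uses it. GC logs and a CPU profile are still the way to answer the questions above.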

Now some GWT specific points, rather than general JVM points: 

   - What do you gain from those permutations? Taking an extreme example, 
   what happens if you collapse the entire application, 48 permutations, into 
   one single super-permutation - how much bigger is the app? How much slower 
   is it? What if you just collapse mobile vs desktop (I'd guess that mobile 
   is smaller than desktop, but smaller enough to matter?), or collapse 
   languages in groups of, say 4-8 - do you add 20% to the total compiled 
   size, or 1%?
   - Do you always need separate permutations? For acceptance testing, you 
   likely want the same build that would go into a production release, and 
   maybe it is okay to add 15 minutes to those build times. But for "does this 
   PR build?" or "post-merge, does main still pass tests?", you might be able 
   to support a subset of values, or just a collapsed set, saving time but 
   producing somewhat larger output.
   - Any other configuration you've experimented with? As you alluded to, 
   you can split the process up when building permutations via the 
   "gwt.jjs.permutationWorkerFactory" system property. In short, this is 
   customizable to not just decide "all work stays in-process" or "fork 
   another JVM per permutation (and tune memory usage carefully)", but also 
   how many workers come from each source. The default (see 
   PermutationWorkerFactory for specifics) is to use both: 
   ThreadedPermutationWorkerFactory for the first permutations, then 
   ExternalPermutationWorkerFactory for the next, etc. The -localWorkers 
   option and "gwt.jjs.maxThreads" system property will further control how 
   work is divided. 
   - Javadoc for ExternalPermutationWorkerFactory indicates that it runs 
   CompilePermsServer instances, but the isLocal method still returns true. A 
   custom worker factory can also be written to not just write work to disk 
   and handle it in a forked JVM, but even copy to another machine and 
   communicate with it remotely.
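
To make the worker settings above concrete, here is a rough sketch of assembling a compiler command line with them. The helper class, the classpath, and the module name com.example.App are placeholders. Setting gwt.jjs.maxThreads to the same value as -localWorkers is my assumption for the example, so check PermutationWorkerFactory and ThreadedPermutationWorkerFactory for how the two actually interact in your GWT version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds a java command line for the GWT compiler.
public class GwtCompileCommand {
    public static List<String> build(String classpath, String module,
                                     int heapGb, int localWorkers,
                                     boolean inProcessOnly) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + heapGb + "g");
        if (inProcessOnly) {
            // Keep all permutation work in threads of this one JVM.
            cmd.add("-Dgwt.jjs.permutationWorkerFactory="
                + "com.google.gwt.dev.ThreadedPermutationWorkerFactory");
        }
        // Assumption: cap in-process threads at the same number as -localWorkers.
        cmd.add("-Dgwt.jjs.maxThreads=" + localWorkers);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("com.google.gwt.dev.Compiler");
        cmd.add("-localWorkers");
        cmd.add(String.valueOf(localWorkers));
        cmd.add(module);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder classpath and module name; substitute your own.
        List<String> cmd = build("gwt-dev.jar:gwt-user.jar:src",
                                 "com.example.App", 16, 8, true);
        System.out.println(String.join(" ", cmd));
    }
}
```

Leaving the factory property unset (inProcessOnly = false) keeps the default mix of in-process and forked workers. In that case remember that each forked JVM needs its own share of memory.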

I _suspect_ you won't get too far into the weeds with this before finding a 
happy medium with small-enough compiled output and fast enough 
development/CI builds, but that at least covers where I would get started 
in considering this. Moving locales out of the compiled JS is definitely 
another option (and not too difficult to achieve, at least as long as you 
are focused on Constants rather than Messages), but it can be a bit harder 
to let the compiler be as aggressive about ensuring you keep unused output 
out of browsers.
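
For the "collapse languages in groups" idea above, a small sketch like this can generate the module file entries and show what a given group size does to the permutation count. The class name and the locale list are placeholders. It assumes your locales are plain values of the "locale" property, and that one collapse-property element per group is what you want in your .gwt.xml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups locales and emits <collapse-property> lines.
public class LocaleCollapsePlan {
    // Split the locale list into consecutive groups of at most groupSize.
    public static List<List<String>> group(List<String> locales, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < locales.size(); i += groupSize) {
            int end = Math.min(i + groupSize, locales.size());
            groups.add(new ArrayList<>(locales.subList(i, end)));
        }
        return groups;
    }

    // One collapse-property element per group; single-locale groups need none.
    public static String toModuleXml(List<List<String>> groups) {
        StringBuilder sb = new StringBuilder();
        for (List<String> g : groups) {
            if (g.size() > 1) {
                sb.append("<collapse-property name=\"locale\" values=\"")
                  .append(String.join(",", g))
                  .append("\" />\n");
            }
        }
        return sb.toString();
    }

    // Permutations = locale groups x combinations of the remaining properties.
    public static int permutationCount(int localeGroups, int otherCombinations) {
        return localeGroups * otherCombinations;
    }

    public static void main(String[] args) {
        // Placeholder locales; use the values from your own module.
        List<String> locales =
            Arrays.asList("en", "fr", "de", "es", "it", "pt", "nl", "sv");
        List<List<String>> groups = group(locales, 4);
        System.out.print(toModuleXml(groups));
        // 2 = mobile vs desktop, as in the original question.
        System.out.println("permutations: " + permutationCount(groups.size(), 2));
    }
}
```

With 24 locales in groups of 6, the 2 x 24 = 48 permutations become 2 x 4 = 8. Whether the larger output per permutation is acceptable is what you'd have to measure.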

On Wednesday, January 3, 2024 at 6:29:08 PM UTC-6 Alexander Bertram wrote:

Hi there,
We have been using GWT to build our product for a very long time. Recently, 
we've faced a new challenge as we've steadily been increasing the number of 
supported translations of the application to support a global audience. 
We're up to 24 languages, and could conceivably hit 40 in the coming year.

With all of these languages, come more permutations! We've stripped away 
browser-specific permutations, but we do have a mobile version of the app, 
which means that we have 2 x 24 permutations = 48.

So far, we've addressed this problem by increasing the size of the VM that 
builds the app, but even with 16 vCPUs it takes 10-12 minutes to build the 
app. I'm experimenting with increasing to 32 vCPUs, but so far I can't get 
the build time to drop linearly.

Anyone else out there using alternate strategies? Is it worth trying to 
create some sort of distributed cache from the intermediate files the 
compiler writes out? Load translations dynamically at runtime instead? Or 
just throw more hardware at it :-)

Just curious to hear what others are doing?

Best,
Alex
