Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Using source code length will have to be a very coarse-grained, "fuzzy" 
heuristic. It's not meant to be accurate. So just pick some number that makes 
sense.
    2048 might be a good enough choice for now.
    
    Tidbit: before this PR, since the `1024` split threshold is hard coded, the 
way I stress test the code splitting path is to hack into 
`CodegenContext.freshName()` and artificially prepend a very long prefix to the 
symbol names.
    e.g. "project_value_1" becomes 
"averylongprefixrepeatedmultipletimes_project_value_1". That inflates the 
source code size a lot which triggers code splitting a lot more frequently 
under the existing heuristics.
    I'm mentioning this here to show that source code length based heuristics 
are inherently imprecise. It'll actually behave slightly differently for 
expressions in different physical operators because of the prefix length is 
different — "serializefromobject_" is much longer than "project_", which is 
longer than "agg_", right?
    
    Side notes for those curious:
    (details are way too complicated to be addressed by a source code length 
based heuristic, but I'm making this note just to let people know what's 
involved)
    
    Performance-wise, it might be very interesting to note that, running on 
HotSpot, sometimes it might be better to split into more smaller methods, and 
sometimes it might be better to split into less but larger methods.
    
    The theory is this: if a method is really hot, we'd like it to be 
eventually inlined by the JIT to reduce method invocation overhead and get 
better optimization. In HotSpot, hot methods will eventually be compiled by C2, 
and it uses multiple bytecode size based threshold to determine whether or not 
a callee is fit for inlining.
    ```
    MaxInlineSize = 35 // in bytes of bytecode. "The maximum bytecode size of a 
method to be inlined"
    FreqInlineSize = 325 // in bytes of bytecode. "The maximum bytecode size of 
a frequent method to be inlined"
    MaxTrivialSize = 6 // in bytes of bytecode. "The maximum bytecode size of a 
trivial method to be inlined"
    InlineFrequencyRatio = 20 // "Ratio of call site execution to caller method 
invocation"
    // there's also a InlineFrequencyCount which we'll omit here for simplicity
    ```
    (values shown above are default values for C2 on x86.)
    c.f. 
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/runtime/globals.hpp
    
    Don't be confused by the name, though.
    `MaxTrivialSize` is meant for regular getters/setters, and they're 
considered to be always inlineable (unless explicitly configured as 
`dontinline` in a compiler directive option).
    `MaxInlineSize` is for regular small methods, with more checks than the 
"trivial" case, without checking the invocation frequency.
    `InlineFrequencyRatio` is the callee-to-caller invocation ratio for a call 
site to be considered as "hot" (or "frequent"). The default is 20:1. This is 
usually for call sites inside of loops, and HotSpot can collect loop counts / 
invocation counts in profiling, to figure out the ratio.
    Similarly, there's a `InlineFrequencyCount` that if a call site (after 
scaling according to inline depth etc) is invoked more than this many times, it 
can also be considered as "hot".
    `FreqInlineSize` is the max size a "hot" call site can be considered for 
inlining. It's 325 bytes of bytecode, obviously larger than `MaxInlineSize` so 
it might sound confusing at first ;-)
    
    So the deal is: on HotSpot, if you want to save bytecode code size by
    - Outlining a utility method that's <= 6 bytes, there's no overhead at all. 
But in practice this is not really feasible, because at the call site of this 
utility method, "invokespecial" itself is 3 bytes, and you'd at least need to 
pass in "this" which is 1 byte, and if you need to pass in arguments (e.g. 
INPUT_ROW) that's at least 1 or 2 more bytes, which already adds up to 4 to 5 
bytes at the call site, whereas the callee's method body can only be <= 6 
bytes, there's not much you can do…
    - Outlining a utility method that's <= 35 bytes. That's the easiest for 
both HotSpot C1 and C2 to inline. You can do somethings but not a lot in this 
method. If you want to shoot for this case, then don't make the split threshold 
too big.
    - Outlining a utility method that's <= 325 bytes. In HotSpot, only C2 gets 
to inline big methods like this, and only "hot call sites" are considered.
    
    So, if a method is compute-intensive and is small enough, it'd better to 
strive to keep the split-off methods within the thresholds mentioned above.
    Otherwise, if it's already bigger than the inline thresholds, then it makes 
more sense to make the split-off method as big as possible (but below 8000 
bytes of bytecode) so that the invocation overhead is amortized.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to