Jochen,

>> "N frames per chain of N method handles" looks reasonable to me, but it
depends on the average number of transformations users apply. If deep
method handle chains are common in practice, we need to optimize for
them as well, and a linear dependency on chain length in stack space may
be too much.

Well, currently I have at least one guard per method call argument and
receiver. If you count dropping arguments, type transformation, and the
guard part itself, you get 3 frames for a single guard alone. Counting
up to 5 arguments + receiver, that is again 17 frames in the naive
approach. And we are talking only about the guards.
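To make the shape of such a guard chain concrete, here is a minimal
sketch with one guard for the receiver and one for a single argument,
built from the standard java.lang.invoke combinators. The class
GuardChainSketch and its methods sameClass, target, fallback and call
are hypothetical names for illustration only, not Groovy's actual
implementation, and the sketch makes no claim about the exact number of
frames the JVM produces for it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration: one guard per receiver/argument, nested.
final class GuardChainSketch {
    // Hypothetical guard: does the value have exactly the expected class?
    static boolean sameClass(Class<?> expected, Object value) {
        return value != null && value.getClass() == expected;
    }

    static String target(Object receiver, Object arg) {
        return "fast:" + receiver + ":" + arg;
    }

    static String fallback(Object receiver, Object arg) {
        return "slow:" + receiver + ":" + arg;
    }

    static final MethodHandle HANDLE = build();

    static MethodHandle build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType callType =
                MethodType.methodType(String.class, Object.class, Object.class);
            MethodHandle target =
                lookup.findStatic(GuardChainSketch.class, "target", callType);
            MethodHandle fallback =
                lookup.findStatic(GuardChainSketch.class, "fallback", callType);
            MethodHandle sameClass = lookup.findStatic(
                GuardChainSketch.class, "sameClass",
                MethodType.methodType(boolean.class, Class.class, Object.class));

            // Argument guard: bind the expected class (insertion), then
            // drop the receiver so the test lines up with the call type.
            MethodHandle argTest = MethodHandles.dropArguments(
                MethodHandles.insertArguments(sameClass, 0, Integer.class),
                0, Object.class);
            MethodHandle guarded =
                MethodHandles.guardWithTest(argTest, target, fallback);

            // Receiver guard: the test may take fewer arguments than the
            // target, so no drop is needed for the leading parameter.
            MethodHandle receiverTest =
                MethodHandles.insertArguments(sameClass, 0, String.class);
            return MethodHandles.guardWithTest(receiverTest, guarded, fallback);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String call(Object receiver, Object arg) {
        try {
            return (String) HANDLE.invokeExact(receiver, arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Each additional argument adds another insert/drop/guard layer around
the previous handle, which is where the per-guard frames come from in
the naive approach.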

I assume the problem would be an order of magnitude smaller if the JVM
could do tail calls. But I wonder whether it would be possible to make
the execution of the forms less recursive and to have some lambda forms
cover more than a single handle.

For example... if you have a series of guards, wouldn't it be possible
to execute them in a manner like this:

def myHandleForm(...) {
   ...
   // execute guards
   while (currentGuardForm!=null) {
     if (executeCurrentGuardFormFail(...)) {
       return executeCurrentGuardFormFalsePath(...)
     }
     currentGuardForm = getNextCurrentGuardForm(...)
   }
   executeNonGuardFormRemainder(....)
}

where a guard form is the result of a merge of type transformation,
argument insertion, drop and the actual handle for the guard method.

I am positive that could be written in a very generic way. In general I
think that a certain series of handles could be merged. But of course I
don't know how much the JIT likes such things.
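A rough Java rendering of the loop idea above, only to show the frame
shape: all guards are evaluated from one method, so each guard costs one
call from the same frame instead of one more level of nesting. The class
FlatGuardSketch and its guard and target methods are hypothetical, the
Object[] calling convention is an assumption made for brevity, and
nothing here says how well the JIT would handle handles loaded from an
array.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration: guards run iteratively instead of nested.
final class FlatGuardSketch {
    static boolean firstIsString(Object[] args) { return args[0] instanceof String; }
    static boolean secondIsInteger(Object[] args) { return args[1] instanceof Integer; }
    static Object fast(Object[] args) { return "fast"; }
    static Object slow(Object[] args) { return "slow"; }

    private final MethodHandle[] guards;   // each of type (Object[])boolean
    private final MethodHandle target;     // (Object[])Object
    private final MethodHandle fallback;   // (Object[])Object

    FlatGuardSketch() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType testType = MethodType.methodType(boolean.class, Object[].class);
            MethodType callType = MethodType.methodType(Object.class, Object[].class);
            guards = new MethodHandle[] {
                lookup.findStatic(FlatGuardSketch.class, "firstIsString", testType),
                lookup.findStatic(FlatGuardSketch.class, "secondIsInteger", testType),
            };
            target = lookup.findStatic(FlatGuardSketch.class, "fast", callType);
            fallback = lookup.findStatic(FlatGuardSketch.class, "slow", callType);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    Object call(Object... args) {
        try {
            // Execute guards one after another from this single frame.
            for (MethodHandle guard : guards) {
                if (!(boolean) guard.invokeExact(args)) {
                    return fallback.invokeExact(args);  // false path
                }
            }
            return target.invokeExact(args);            // non-guard remainder
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With this shape the stack depth while a guard runs stays constant no
matter how many guards the call site has.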

[...]
It's possible to optimize some shapes of method handle chains (like nested GWTs) and tailor a special LambdaForm shape or do some inlining during bytecode translation. Though such specialization contradicts the LF sharing goal, the probable benefits may be worth the effort.

That makes 5 frames in between. 5 is worlds better than 53.
Ok, 5 additional frames for the simple case. Is such overhead tolerable
for you? Or do you need a smaller number of intermediate frames?

ah... you know, when it comes to such things language implementors are
quite greedy ;)
... but I can fulfil only 3 wishes ;-)

What is your estimate for the complex case? What's the worst case in Groovy?

I think the worst cases are not so much to worry about. What would be
good is if the first visit were as small as possible. That is in my
case the generic handle installed by the bootstrap method to do the
runtime type based method selection. That's currently something around
25 frames I think. In a big application you will get a huge number of
callsites that are visited only once. So having only a small overhead
here will save a lot later on.

For a few days I have been wondering about a special kind of logic to
help with memory consumption, and maybe you can tell me if that can work
out. What I am thinking of is using a WeakReference to reference my
actual method execution path, plus a guard that checks if that handle is
still available and, if not, executes a fallback. The idea being that if
memory becomes a concern, all the one-time visited callsites that are
not part of the current trace can be reduced to just doing method
selection again. Could that work out? Will inlining still be possible?

I don't think it will work. If you load a MethodHandle from a WeakReference and then use MH.invoke*, inlining will be broken for sure.
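For reference, the pattern under discussion would look roughly like the
following sketch. WeakPathSketch, select, greet and simulateCollection
are hypothetical names, and the selection step is reduced to a single
findStatic lookup. The point is only the shape: the handle is read from
a WeakReference on every call and then invoked, which is the case the
answer above says will not be inlined.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.ref.WeakReference;

// Hypothetical illustration: weakly held execution path with fallback.
final class WeakPathSketch {
    static String greet(String name) { return "hello " + name; }

    // The selected path is only weakly reachable from the call site.
    private WeakReference<MethodHandle> path = new WeakReference<>(null);
    int selections = 0;  // counts how often method selection ran

    // Stand-in for the (expensive) runtime method selection.
    private MethodHandle select() throws ReflectiveOperationException {
        selections++;
        return MethodHandles.lookup().findStatic(
            WeakPathSketch.class, "greet",
            MethodType.methodType(String.class, String.class));
    }

    String call(String name) {
        try {
            MethodHandle mh = path.get();  // null once the GC cleared it
            if (mh == null) {
                mh = select();             // fallback: select again
                path = new WeakReference<>(mh);
            }
            return (String) mh.invokeExact(name);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Test hook: pretend the collector reclaimed the handle.
    void simulateCollection() { path.clear(); }
}
```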

We discussed an idea to generate custom bytecodes (single method) for
the whole method handle chain (and have only 1 extra stack frame per MH
invocation), but it defeats the memory footprint reduction we are trying
to achieve with LambdaForm sharing.

I wonder if that is the case for Groovy as well. Our old callsite
mechanism has only 1 frame (upon second execution), because by then we
have generated a class for the callsite that does all the argument
transformation, checks and target method execution. So compared to that
I would not expect a memory increase.
We are looking for ways to significantly reduce the memory consumption
of the JSR292 implementation. Inlining of LFs from the call site means 1
anonymous class per indy call site. Compared to fully customized
LambdaForms, it should give noticeable savings due to the smaller number
of anonymous classes being loaded. But it doesn't comply with the
ultimate goal of a fixed set of combinators used to implement all
possible behaviors.

Since in the traditional implementation the callsite is always Object[]
based, we have one such class per executed target method. Of course we
run into profile pollution if we use the same callsite object for
multiple callsites, but it would be the same for the target method, so
in my thinking there is no real problem. Anyway... if there is no need
to create such a class per target of a direct method handle, then I
would expect quite a lot less memory usage from your approach.
That's interesting. I'll try to experiment with that. Thanks for sharing your experience.

Best regards,
Vladimir Ivanov



bye Jochen

_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
