On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> On 12 Mar 2007 19:46:06 +0300, Egor Pasko <[EMAIL PROTECTED]> wrote:
> >
> > On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > > All,
> > > I assigned H3010 to myself. This test definitely demonstrates a
> > > bug that needs fixing. But it's not clear when this bug must be
> > > fixed. This really brings forward a higher-level question: how
> > > should we code this bug right now, and when would this bug be
> > > moved to "blocker" status? I provide some observations to start
> > > the discussion:
> > >
> > > 1)
> > > The bug is that a Stack Overflow Exception happens from inside
> > > fast native helper functions. Fast native helpers do not set up
> > > the M2N stack frame which is required to throw exceptions such as
> > > SOE. Adding M2N setup to fast native helpers will unacceptably
> > > slow down the system.
> >
> > to be honest..
> >
> > SOE can happen from a 'push' onto stack (such pushes are not
> > safepoints in JIT currently). Thus, you cannot unwind properly (no
> > M2N necessary for releasing the lock).
> >
> > Do you think it is a low probability?
>
> Good point. Yes, SOE can happen from jitted code doing stuff like
> "push ebp". And we have to handle this case properly. And it will
> require a design discussion between JIT and VM developers. This is a
> really interesting topic. But the question remains. Do we have to
> solve this issue in Q1? Q4? 2008?? To answer this question, we have
> to ask what workloads we want to run in Q1/Q2/Q3... And then find out
> if the workloads hit the SOE problem we are discussing. My guess is
> that if useful workloads we want to run actually hit SOE, we will be
> able to work around it by simply making the stack a little bigger.
> Also my guess is that Java compatibility tests (TCK?) will
> specifically test this case. In other words, it's probably needed for
> compliance but not really needed for getting important workloads
> running.

That has some relevance to the -Xss option. If we implement it, almost
any "popular workload" would crash in SEGV instead of throwing SOE
properly when run on a small stack size. One might argue that running a
"popular workload" with a small stack size makes the workload "not so
popular". I dunno.

> > > 2)
> > > When running a useful workload, a Stack Overflow that hits
> > > precisely on a fast native has a very low probability. Note the
> > > test in H3010 specifically forces this event to happen with a
> > > very high probability. In other words, while the test is a good
> > > one, it reflects a very rare event in nature.
> > >
> > > Given the above, how about we address fixing the problem in
> > > stages:
> > >
> > > 1)
> > > First stage: add an "assert(zero);" to the exception handler when
> > > it is determined an SOE has happened inside a fast native. This
> > > way, we will find out quickly when an important workload hits
> > > this bug. Once the assert(zero) is added, we code H3010 as
> > > "later".
> > >
> > > 2)
> > > Second stage: When an application we care about hits the
> > > assert(zero), we recode H3010 as "major/blocker".
> > >
> > > 3)
> > > While waiting for #2 above to happen, we discuss on harmony-dev
> > > ways of designing the right fix. For starters, I think we should
> > > investigate a design where the exception handler rewrites the
> > > entire register context so that returning from the exception
> > > handler revectors the instruction pointer to recovery code that
> > > will somehow push the M2N frame on the stack and call the proper
> > > SOE throwing code. I have not looked closely at how to do this. I
> > > am not convinced this approach will work. However, I do think
> > > it's worth a try. Thoughts?
> >
> > --
> > Egor Pasko
>
> --
> Weldon Washburn
> Intel Enterprise Solutions Software Division

--
Egor Pasko
