Re: [drlvm][threading] H3010 (Stack Overflow Exception) -- when does this bug really have to be fixed?

Egor Pasko Mon, 12 Mar 2007 10:54:06 -0800

On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> On 12 Mar 2007 19:46:06 +0300, Egor Pasko <[EMAIL PROTECTED]> wrote:
> >
> > On the 0x297 day of Apache Harmony Weldon Washburn wrote:
> > > All,
> > > I assigned H3010 to myself.  This test definitely demonstrates a bug that
> > > needs fixing.  But it's not clear when this bug must be fixed.  This really
> > > brings forward a higher-level question.  What do we code this bug as right
> > > now, and when would this bug be moved to "blocker" status?  I provide some
> > > observations to start the discussion:
> > >
> > > 1)
> > > The bug is a Stack Overflow Exception (SOE) that happens inside fast native
> > > helper functions.  Fast native helpers do not set up the M2N stack frame,
> > > which is required to throw exceptions such as SOE.  Adding M2N setup to
> > > fast native helpers would unacceptably slow down the system.
> >
> > to be honest..
> >
> > SOE can happen from a 'push' onto stack (such pushes are not
> > safepoints in JIT currently). Thus, you cannot unwind properly (no M2N
> > necessary for releasing the lock).
> >
> > Do you think it is a low probability?
> 
> 
> Good point.  Yes, SOE can happen from jitted code doing stuff like "push
> ebp".  And we have to handle this case properly.  And it will require a
> design discussion between JIT and VM developers.  This is a really interesting
> topic.  But the question remains.  Do we have to solve this issue in Q1?
> Q4?  2008??  To answer this question, we have to ask what workloads we want
> to run in Q1/Q2/Q3...  And then find out if the workloads hit the SOE
> problem we are discussing.  My guess is that if useful workloads we want to
> run actually hit SOE, we will be able to work around it by simply making the
> stack a little bigger.  Also my guess is that Java compatibility tests
> (TCK?) will specifically test this case.  In other words, it's probably
> needed for compliance but not really needed for getting important workloads
> running.


That has some relevance to the -Xss option. If we implement it, almost
any "popular workload" would crash with a SEGV instead of throwing SOE
properly when run with a small stack size.

One might argue that running a "popular workload" with a small stack
size makes the workload "not so popular". I dunno.
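
As a rough illustration of the small-stack case, here is a minimal Java
sketch, not DRLVM code. It runs unbounded recursion on a thread created with
a small requested stack size and reports the depth at which
StackOverflowError is caught. The class name SmallStackDemo and the helper
depthBeforeOverflow are made up for this example, and the stackSize argument
of the Thread constructor is only a hint that a VM is free to ignore.

```java
public class SmallStackDemo {
    // Recursion depth of the current run; touched only by the worker thread.
    private static int depth;

    // Recurses until the thread's stack is exhausted.
    private static void recurse() {
        depth++;
        recurse();
    }

    /**
     * Hypothetical helper: runs the recursion on a thread with the requested
     * stack size and returns the depth at which StackOverflowError was
     * caught, or -1 if no error was caught.
     */
    static int depthBeforeOverflow(long stackSizeBytes) throws InterruptedException {
        final int[] result = { -1 };
        Runnable body = () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                // A VM that handles overflow properly lands here.
                result[0] = depth;
            }
        };
        // The last argument is a stack size hint, similar in spirit to -Xss.
        Thread t = new Thread(null, body, "small-stack", stackSizeBytes);
        t.start();
        t.join(); // join() makes the worker's write to result visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        int d = depthBeforeOverflow(64 * 1024);
        System.out.println("StackOverflowError caught at depth " + d);
    }
}
```

On a VM that handles stack overflow properly this prints some depth; the
failure mode discussed above would be a crash instead of the caught error.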

> > 2)
> > > When running a useful workload, a stack overflow that hits precisely on a
> > > fast native has a very low probability.  Note that the test in H3010
> > > specifically forces this event to happen with a very high probability.  In
> > > other words, while the test is a good one, it reflects an event that is
> > > very rare in practice.
> > >
> > > Given the above, how about we address fixing the problem in two stages:
> > >
> > > 1)
> > > First stage: add an "assert(zero);" to the exception handler when it is
> > > determined an SOE has happened inside a fast native.  This way, we will
> > > find out quickly when an important workload hits this bug.  Once the
> > > assert(zero) is added, we code H3010 as "later".
> > >
> > > 2)
> > > Second stage: When an application we care about hits the assert(zero), we
> > > recode H3010 as "major/blocker".
> > >
> > > 3)
> > > While waiting for #2 above to happen, we discuss on harmony-dev ways of
> > > designing the right fix.  For starters, I think we should investigate a
> > > design where the exception handler rewrites the entire register context so
> > > that returning from the exception handler revectors the instruction pointer
> > > to recovery code that will somehow push the M2N frame on the stack and call
> > > the proper SOE-throwing code.  I have not looked closely at how to do this.
> > > I am not convinced this approach will work.  However, I do think it's worth
> > > a try.  Thoughts?
> >
> > --
> > Egor Pasko
> >
> >
> 
> 
> -- 
> Weldon Washburn
> Intel Enterprise Solutions Software Division

-- 
Egor Pasko
