RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread Ivan Gerasimov

Hello!

This is a request to temporarily add some instrumentation code to 
hotspot to help diagnose the intermittent failure on Windows, which 
results in a wrong exit code of (sub-)process.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan



Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread Daniel D. Daugherty


On 8/18/14 3:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to 
hotspot to help diagnose the intermittent failure on Windows, which 
results in a wrong exit code of (sub-)process.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/


src/os/windows/vm/os_windows.cpp
No comments.

src/share/vm/runtime/java.cpp
No comments.

Good luck with the hunt! Many of us have looked for this bug
for years off and on...

Dan




Sincerely yours,
Ivan





Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread Kelly O'Hair
 454   if (vm_getting_terminated && thread->is_Java_thread()) {
 455 JavaThread* java_thread = (JavaThread*)thread;
 456 if (java_thread && 
java_lang_Thread::is_daemon(java_thread->threadObj())) {
 457   return 70115;
 458 }

Seems like the check for java_thread being null on line 456 can never happen, 
because it was assigned thread which was dereferenced on line 454.
Maybe on line 454 you should check to make sure thread is not null and skip the 
check for java_thread being null?

Just comments from the peanut gallery. ;)

-kto

On Aug 18, 2014, at 3:14 PM, Daniel D. Daugherty wrote:

> 
> On 8/18/14 3:19 PM, Ivan Gerasimov wrote:
>> Hello!
>> 
>> This is a request to temporarily add some instrumentation code to hotspot to 
>> help diagnose the intermittent failure on Windows, which results in a wrong 
>> exit code of (sub-)process.
>> 
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/
> 
> src/os/windows/vm/os_windows.cpp
>No comments.
> 
> src/share/vm/runtime/java.cpp
>No comments.
> 
> Good luck with the hunt! Many of us have looked for this bug
> for years off and on...
> 
> Dan
> 
> 
>> 
>> Sincerely yours,
>> Ivan
>> 
> 



Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread Ioi Lam
With the Windows/x86/x64 memory model, is the write to 
vm_getting_terminated guaranteed to be observable by java_start()?


- Ioi

On 8/18/14, 2:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to 
hotspot to help diagnose the intermittent failure on Windows, which 
results in a wrong exit code of (sub-)process.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan





Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread David Holmes

On 19/08/2014 9:04 AM, Kelly O'Hair wrote:

  454   if (vm_getting_terminated && thread->is_Java_thread()) {
  455 JavaThread* java_thread = (JavaThread*)thread;
  456 if (java_thread && 
java_lang_Thread::is_daemon(java_thread->threadObj())) {
  457   return 70115;
  458 }

Seems like the check for java_thread being null on line 456 can never happen, 
because it was assigned thread which was dereferenced on line 454.
Maybe on line 454 you should check to make sure thread is not null and skip the 
check for java_thread being null?


Neither thread nor java_thread can be NULL.


Just comments from the peanut gallery. ;)


Ditto :)

David H.



-kto

On Aug 18, 2014, at 3:14 PM, Daniel D. Daugherty wrote:



On 8/18/14 3:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to hotspot to 
help diagnose the intermittent failure on Windows, which results in a wrong 
exit code of (sub-)process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/


src/os/windows/vm/os_windows.cpp
No comments.

src/share/vm/runtime/java.cpp
No comments.

Good luck with the hunt! Many of us have looked for this bug
for years off and on...

Dan




Sincerely yours,
Ivan







Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread David Holmes

On 19/08/2014 10:12 AM, Ioi Lam wrote:

With the Windows/x86/x64 memory model, is the write to
vm_getting_terminated guaranteed to be observable by java_start()?


In the general case, not immediately. For the threads actually of 
interest the logic that tells the threads to terminate happens after the 
write to the flag, and that logic contains sufficient "synchronization" 
that if the termination logic is correct then the flag must also be visible.


David


- Ioi

On 8/18/14, 2:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to
hotspot to help diagnose the intermittent failure on Windows, which
results in a wrong exit code of (sub-)process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan





Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-18 Thread Ivan Gerasimov

Thanks Kelly and David!

On 19.08.2014 6:14, David Holmes wrote:

On 19/08/2014 9:04 AM, Kelly O'Hair wrote:

  454   if (vm_getting_terminated && thread->is_Java_thread()) {
  455 JavaThread* java_thread = (JavaThread*)thread;
  456 if (java_thread && 
java_lang_Thread::is_daemon(java_thread->threadObj())) {

  457   return 70115;
  458 }

Seems like the check for java_thread being null on line 456 can never 
happen, because it was assigned thread which was dereferenced on line 
454.
Maybe on line 454 you should check to make sure thread is not null 
and skip the check for java_thread being null?


Neither thread nor java_thread can be NULL.


Yes, I'll remove this unnecessary check.

Sincerely yours,
Ivan



Just comments from the peanut gallery. ;)


Ditto :)

David H.



-kto

On Aug 18, 2014, at 3:14 PM, Daniel D. Daugherty wrote:



On 8/18/14 3:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to 
hotspot to help diagnose the intermittent failure on Windows, which 
results in a wrong exit code of (sub-)process.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/


src/os/windows/vm/os_windows.cpp
No comments.

src/share/vm/runtime/java.cpp
No comments.

Good luck with the hunt! Many of us have looked for this bug
for years off and on...

Dan




Sincerely yours,
Ivan












Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-19 Thread Ivan Gerasimov

Hello again!

I updated the patch to cover situations when the exiting thread isn't 
daemon.
I also added load_acquire/store_release for the sake of accuracy, even 
though on Windows they seem to add nothing to the volatile access.


http://cr.openjdk.java.net/~igerasim/8055338/1/webrev/

If the updated patch looks Okay, I'll need a sponsor to push it.

Sincerely yours,
Ivan

On 19.08.2014 6:42, David Holmes wrote:

On 19/08/2014 10:12 AM, Ioi Lam wrote:

With the Windows/x86/x64 memory model, is the write to
vm_getting_terminated guaranteed to be observable by java_start()?


In the general case, not immediately. For the threads actually of 
interest the logic that tells the threads to terminate happens after 
the write to the flag, and that logic contains sufficient 
"synchronization" that if the termination logic is correct then the 
flag must also be visible.


David


- Ioi

On 8/18/14, 2:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to
hotspot to help diagnose the intermittent failure on Windows, which
results in a wrong exit code of (sub-)process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan










Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-19 Thread Mario Torre
Is there any reason why this link is only accessible if I log-in?

Mario

2014-08-18 23:19 GMT+02:00 Ivan Gerasimov :
> Hello!
>
> This is a request to temporarily add some instrumentation code to hotspot to
> help diagnose the intermittent failure on Windows, which results in a wrong
> exit code of (sub-)process.
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
> WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/
>
> Sincerely yours,
> Ivan
>



-- 
pgp key: http://subkeys.pgp.net/ PGP Key ID: 80F240CF
Fingerprint: BA39 9666 94EC 8B73 27FA  FC7C 4086 63E3 80F2 40CF

Blog: http://neugens.wordpress.com - Twitter: @neugens
Proud GNU Classpath developer: http://www.classpath.org/
Read About us at: http://planet.classpath.org
OpenJDK: http://openjdk.java.net/projects/caciocavallo/

Please, support open standards:
http://endsoftpatents.org/


Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-19 Thread Daniel D. Daugherty

On 8/19/14 6:57 AM, Ivan Gerasimov wrote:

Hello again!

I updated the patch to cover situations when the exiting thread isn't 
daemon.
I also added load_acquire/store_release for the sake of accuracy, even 
though on Windows they seem to add nothing to the volatile access.


http://cr.openjdk.java.net/~igerasim/8055338/1/webrev/


src/os/windows/vm/os_windows.cpp
No comments.

src/share/vm/runtime/java.cpp
No comments.

Thumbs up. I like this version even more.

Dan




If the updated patch looks Okay, I'll need a sponsor to push it.

Sincerely yours,
Ivan

On 19.08.2014 6:42, David Holmes wrote:

On 19/08/2014 10:12 AM, Ioi Lam wrote:

With the Windows/x86/x64 memory model, is the write to
vm_getting_terminated guaranteed to be observable by java_start()?


In the general case, not immediately. For the threads actually of 
interest the logic that tells the threads to terminate happens after 
the write to the flag, and that logic contains sufficient 
"synchronization" that if the termination logic is correct then the 
flag must also be visible.


David


- Ioi

On 8/18/14, 2:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to
hotspot to help diagnose the intermittent failure on Windows, which
results in a wrong exit code of (sub-)process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan












Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-19 Thread Ivan Gerasimov

This is a sub-task and it inherits its security level from the parent bug.
It appears that Jira doesn't allow editing the security level of sub-tasks.

Sincerely yours,
Ivan

On 19.08.2014 17:16, Mario Torre wrote:

Is there any reason why this link is only accessible if I log-in?

Mario

2014-08-18 23:19 GMT+02:00 Ivan Gerasimov :

Hello!

This is a request to temporarily add some instrumentation code to hotspot to
help diagnose the intermittent failure on Windows, which results in a wrong
exit code of (sub-)process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan








Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-19 Thread serguei.spit...@oracle.com

Hi Ivan,

It is a nice step in the investigation of this nasty and very old issue!
The fix looks good in general.

A minor comment:
Is it possible to make the exit code more consistent in the os_windows.cpp?
For instance, make them 70115, 80115 and 90115 instead of 70115, 80211 
and 90223?


Thanks,
Serguei

On 8/19/14 5:57 AM, Ivan Gerasimov wrote:

Hello again!

I updated the patch to cover situations when the exiting thread isn't 
daemon.
I also added load_acquire/store_release for the sake of accuracy, even 
though on Windows they seem to add nothing to the volatile access.


http://cr.openjdk.java.net/~igerasim/8055338/1/webrev/

If the updated patch looks Okay, I'll need a sponsor to push it.

Sincerely yours,
Ivan

On 19.08.2014 6:42, David Holmes wrote:

On 19/08/2014 10:12 AM, Ioi Lam wrote:

With the Windows/x86/x64 memory model, is the write to
vm_getting_terminated guaranteed to be observable by java_start()?


In the general case, not immediately. For the threads actually of 
interest the logic that tells the threads to terminate happens after 
the write to the flag, and that logic contains sufficient 
"synchronization" that if the termination logic is correct then the 
flag must also be visible.


David


- Ioi

On 8/18/14, 2:19 PM, Ivan Gerasimov wrote:

Hello!

This is a request to temporarily add some instrumentation code to
hotspot to help diagnose the intermittent failure on Windows, which
results in a wrong exit code of (sub-)process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8055338
WEBREV: http://cr.openjdk.java.net/~igerasim/8055338/0/webrev/

Sincerely yours,
Ivan












Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-20 Thread Ivan Gerasimov

Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/

The control build of the previous one was causing a lot of test failures.
This one seems to be innocent enough: no new test failures so far.

Additionally, this version keeps the timing around the thread exit close 
to original, which might be important if we deal with a race.


Sincerely yours,
Ivan



Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-20 Thread Staffan Larsen
Looks good to me. Let’s see what it uncovers.

/Staffan

On 20 aug 2014, at 21:36, Ivan Gerasimov  wrote:

> Hello everyone!
> 
> Here's the third version of the webrev:
> http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/
> 
> The control build of the previous one was causing a lot of test failures.
> This one seems to be innocent enough: no new test failures so far.
> 
> Additionally, this version keeps the timing around the thread exit close to 
> original, which might be important if we deal with a race.
> 
> Sincerely yours,
> Ivan
> 



Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-21 Thread serguei.spit...@oracle.com

+1

On 8/20/14 10:57 PM, Staffan Larsen wrote:

Looks good to me. Let’s see what it uncovers.

/Staffan

On 20 aug 2014, at 21:36, Ivan Gerasimov  wrote:


Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/

The control build of the previous one was causing a lot of test failures.
This one seems to be innocent enough: no new test failures so far.

Additionally, this version keeps the timing around the thread exit close to 
original, which might be important if we deal with a race.

Sincerely yours,
Ivan





Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-21 Thread David Holmes

Third time lucky :) Let's give it a go.

David

On 21/08/2014 5:36 AM, Ivan Gerasimov wrote:

Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/

The control build of the previous one was causing a lot of test failures.
This one seems to be innocent enough: no new test failures so far.

Additionally, this version keeps the timing around the thread exit close
to original, which might be important if we deal with a race.

Sincerely yours,
Ivan



Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-21 Thread Ivan Gerasimov

Thanks for review!

Could someone please sponsor it for me?
I'm not sure what the correct repository is.

Sincerely yours,
Ivan

On 21.08.2014 11:46, serguei.spit...@oracle.com wrote:

+1

On 8/20/14 10:57 PM, Staffan Larsen wrote:

Looks good to me. Let’s see what it uncovers.

/Staffan

On 20 aug 2014, at 21:36, Ivan Gerasimov  
wrote:



Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/

The control build of the previous one was causing a lot of test 
failures.

This one seems to be innocent enough: no new test failures so far.

Additionally, this version keeps the timing around the thread exit 
close to original, which might be important if we deal with a race.


Sincerely yours,
Ivan









Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-21 Thread Daniel D. Daugherty

On 8/20/14 1:36 PM, Ivan Gerasimov wrote:

Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/


src/os/windows/vm/os_windows.cpp
line 444:  res = java_lang_Thread::is_daemon(java_thread->threadObj())
line 453:  thread->run();
Querying the 'is_daemon' flag before the thread is run
won't catch all cases of daemon threads, but I think
it is good enough for our purposes.

src/share/vm/runtime/java.cpp
No comments.

Thumbs up.


The control build of the previous one was causing a lot of test failures.
This one seems to be innocent enough: no new test failures so far.


So the previous version was querying everything after the
thread had finished with thread->run(). I'll take it the
data we were querying was no longer stable at that point?


Additionally, this version keeps the timing around the thread exit 
close to original, which might be important if we deal with a race.


Agreed.

Dan




Sincerely yours,
Ivan





Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-21 Thread Daniel D. Daugherty

The correct repository would be RT_Baseline. What testing has been
run on your fix? I know you've done a control build so that covers
JPRT testing.

I'll prep a repo for sponsoring the push, but I'll hold off pushing
until I know what testing has been done.

Dan


On 8/21/14 4:57 AM, Ivan Gerasimov wrote:

Thanks for review!

Could someone please sponsor it for me?
I'm not sure what the correct repository is.

Sincerely yours,
Ivan

On 21.08.2014 11:46, serguei.spit...@oracle.com wrote:

+1

On 8/20/14 10:57 PM, Staffan Larsen wrote:

Looks good to me. Let’s see what it uncovers.

/Staffan

On 20 aug 2014, at 21:36, Ivan Gerasimov  
wrote:



Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/

The control build of the previous one was causing a lot of test 
failures.

This one seems to be innocent enough: no new test failures so far.

Additionally, this version keeps the timing around the thread exit 
close to original, which might be important if we deal with a race.


Sincerely yours,
Ivan











Re: RFR [8055338]: (process) Add instrumentation to help diagnose JDK-6573254

2014-08-21 Thread Daniel D. Daugherty

Here's the commit message:

$ cat commit.txt
8055338: (process) Add instrumentation to help diagnose JDK-6573254
Reviewed-by: dcubed, ohair, iklam, dholmes, sspitsyn, sla

I will commit the changeset as 'igerasim'.

Didn't think a summary line was needed.

Dan


On 8/21/14 8:21 AM, Daniel D. Daugherty wrote:

The correct repository would be RT_Baseline. What testing has been
run on your fix? I know you've done a control build so that covers
JPRT testing.

I'll prep a repo for sponsoring the push, but I'll hold off pushing
until I know what testing has been done.

Dan


On 8/21/14 4:57 AM, Ivan Gerasimov wrote:

Thanks for review!

Could someone please sponsor it for me?
I'm not sure what the correct repository is.

Sincerely yours,
Ivan

On 21.08.2014 11:46, serguei.spit...@oracle.com wrote:

+1

On 8/20/14 10:57 PM, Staffan Larsen wrote:

Looks good to me. Let’s see what it uncovers.

/Staffan

On 20 aug 2014, at 21:36, Ivan Gerasimov 
 wrote:



Hello everyone!

Here's the third version of the webrev:
http://cr.openjdk.java.net/~igerasim/8055338/2/webrev/

The control build of the previous one was causing a lot of test 
failures.

This one seems to be innocent enough: no new test failures so far.

Additionally, this version keeps the timing around the thread exit 
close to original, which might be important if we deal with a race.


Sincerely yours,
Ivan