subject:"RFR\: 8316401\: sun\/tools\/jhsdb\/JStackStressTest.java failed with \"InternalError\: We should have found a thread that owns the anonymous lock\""

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock" [v3]

2023-09-29 Thread Roman Kennke

On Thu, 28 Sep 2023 09:11:12 GMT, Roman Kennke  wrote:

>> The SA can run concurrently with Java threads, SA code that inspects locking 
>> state should be able to deal with that. On the other hand, the particular 
>> code is only used in printing routine and is not expected to be precise. 
>> When resolving an anonymous owner, we may not find one, because Java threads 
>> may already have moved on. Instead of crashing with a stacktrace, we should 
>> gracefully return null here.
>> 
>> Testing:
>>  - [x] sun/tools/jhsdb/JStackStressTest.java
>>  - [x] sun/tools/jhsdb
>>  - [x] serviceability/sa
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix comment

Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/15907#issuecomment-1740721744

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock" [v3]

2023-09-28 Thread David Holmes

On Thu, 28 Sep 2023 09:11:12 GMT, Roman Kennke  wrote:

>> The SA can run concurrently with Java threads, SA code that inspects locking 
>> state should be able to deal with that. On the other hand, the particular 
>> code is only used in printing routine and is not expected to be precise. 
>> When resolving an anonymous owner, we may not find one, because Java threads 
>> may already have moved on. Instead of crashing with a stacktrace, we should 
>> gracefully return null here.
>> 
>> Testing:
>>  - [x] sun/tools/jhsdb/JStackStressTest.java
>>  - [x] sun/tools/jhsdb
>>  - [x] serviceability/sa
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix comment

Looks good.

Thanks for you patience while we worked through this one.

-

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15907#pullrequestreview-1650058692

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock" [v3]

2023-09-28 Thread Roman Kennke

> The SA can run concurrently with Java threads, SA code that inspects locking 
> state should be able to deal with that. On the other hand, the particular 
> code is only used in printing routine and is not expected to be precise. When 
> resolving an anonymous owner, we may not find one, because Java threads may 
> already have moved on. Instead of crashing with a stacktrace, we should 
> gracefully return null here.
> 
> Testing:
>  - [x] sun/tools/jhsdb/JStackStressTest.java
>  - [x] sun/tools/jhsdb
>  - [x] serviceability/sa

Roman Kennke has updated the pull request incrementally with one additional 
commit since the last revision:

  Fix comment

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15907/files
  - new: https://git.openjdk.org/jdk/pull/15907/files/4ddaf577..bff3b3a3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15907&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15907&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/15907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15907/head:pull/15907

PR: https://git.openjdk.org/jdk/pull/15907

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock" [v2]

2023-09-27 Thread David Holmes

On Wed, 27 Sep 2023 17:54:51 GMT, Roman Kennke  wrote:

>> The SA can run concurrently with Java threads, SA code that inspects locking 
>> state should be able to deal with that. On the other hand, the particular 
>> code is only used in printing routine and is not expected to be precise. 
>> When resolving an anonymous owner, we may not find one, because Java threads 
>> may already have moved on. Instead of crashing with a stacktrace, we should 
>> gracefully return null here.
>> 
>> Testing:
>>  - [x] sun/tools/jhsdb/JStackStressTest.java
>>  - [x] sun/tools/jhsdb
>>  - [x] serviceability/sa
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Print warning when anonymous lock cannot be found

Changes requested by dholmes (Reviewer).

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 
245:

> 243: }
> 244: // We should have found the owner. However, the code can 
> run concurrently with
> 245: // Java code and locking state can change at any time. 
> This code is not

The second sentence is not accurate as things are not running concurrently and 
cannot change state at any time. Suggestion:

// We should have found the owner, however, as the VM could be in any state, 
including the middle
// of performing GC, it is not always possible to do so. Just return null if we 
can't locate it.

-

PR Review: https://git.openjdk.org/jdk/pull/15907#pullrequestreview-1647736396
PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1339374587

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock" [v2]

2023-09-27 Thread Chris Plummer

On Wed, 27 Sep 2023 17:54:51 GMT, Roman Kennke  wrote:

>> The SA can run concurrently with Java threads, SA code that inspects locking 
>> state should be able to deal with that. On the other hand, the particular 
>> code is only used in printing routine and is not expected to be precise. 
>> When resolving an anonymous owner, we may not find one, because Java threads 
>> may already have moved on. Instead of crashing with a stacktrace, we should 
>> gracefully return null here.
>> 
>> Testing:
>>  - [x] sun/tools/jhsdb/JStackStressTest.java
>>  - [x] sun/tools/jhsdb
>>  - [x] serviceability/sa
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Print warning when anonymous lock cannot be found

Looks good. Thanks!

-

Marked as reviewed by cjplummer (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15907#pullrequestreview-1647405613

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock" [v2]

2023-09-27 Thread Roman Kennke

> The SA can run concurrently with Java threads, SA code that inspects locking 
> state should be able to deal with that. On the other hand, the particular 
> code is only used in printing routine and is not expected to be precise. When 
> resolving an anonymous owner, we may not find one, because Java threads may 
> already have moved on. Instead of crashing with a stacktrace, we should 
> gracefully return null here.
> 
> Testing:
>  - [x] sun/tools/jhsdb/JStackStressTest.java
>  - [x] sun/tools/jhsdb
>  - [x] serviceability/sa

Roman Kennke has updated the pull request incrementally with one additional 
commit since the last revision:

  Print warning when anonymous lock cannot be found

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15907/files
  - new: https://git.openjdk.org/jdk/pull/15907/files/18e46db6..4ddaf577

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15907&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15907&range=00-01

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/15907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15907/head:pull/15907

PR: https://git.openjdk.org/jdk/pull/15907

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-27 Thread Chris Plummer

On Wed, 27 Sep 2023 16:59:17 GMT, Roman Kennke  wrote:

>> I think given this, issuing a warning is the appropriate thing to do. The 
>> warning might want to mention that the failure is likely because we are in 
>> the middle of a GC. That way the user has some context as to why SA info is 
>> not reliable.
>
> Could you advice how such a warning could look like? Does SA use a special 
> stream for that?

Most warnings go to System.err, but I think System.out is better in most cases 
so the warning appears in line with the rest of the related output. You could 
do something like:

Warning: We failed to find a thread that owns an anonymous lock. This is likely
due to the JVM currently running a GC. Locking information may not be accurate.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1338959696

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-27 Thread Roman Kennke

On Wed, 27 Sep 2023 15:42:51 GMT, Chris Plummer  wrote:

>> I think you are right @plummercj - I'm adding details to the JBS issue.
>
> I think given this, issuing a warning is the appropriate thing to do. The 
> warning might want to mention that the failure is likely because we are in 
> the middle of a GC. That way the user has some context as to why SA info is 
> not reliable.

Could you advice how such a warning could look like? Does SA use a special 
stream for that?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1338928770

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-27 Thread Chris Plummer

On Wed, 27 Sep 2023 04:17:57 GMT, David Holmes  wrote:

>> I wonder if we are in the middle of a GC, the object has been moved, and the 
>> lock stack object reference has been updated but the monitor reference has 
>> not (or vice versa).
>
> I think you are right @plummercj - I'm adding details to the JBS issue.

I think given this, issuing a warning is the appropriate thing to do. The 
warning might want to mention that the failure is likely because we are in the 
middle of a GC. That way the user has some context as to why SA info is not 
reliable.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1338812930

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread David Holmes

On Wed, 27 Sep 2023 00:44:30 GMT, Chris Plummer  wrote:

>> I made a pass over the deadlock detection code involved. Nothing stuck out. 
>> There's not much in the way of SA-isms in this code. I think anyone with an 
>> understanding of how hotspot locking works should be able to navigate it 
>> pretty easily and maybe conjure a hypothesis as to why this happens as 
>> frequently as it does.
>
> I wonder if we are in the middle of a GC, the object has been moved, and the 
> lock stack object reference has been updated but the monitor reference has 
> not (or vice versa).

I think you are right @plummercj - I'm adding details to the JBS issue.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1338022386

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread David Holmes

On Wed, 27 Sep 2023 00:44:30 GMT, Chris Plummer  wrote:

>> I made a pass over the deadlock detection code involved. Nothing stuck out. 
>> There's not much in the way of SA-isms in this code. I think anyone with an 
>> understanding of how hotspot locking works should be able to navigate it 
>> pretty easily and maybe conjure a hypothesis as to why this happens as 
>> frequently as it does.
>
> I wonder if we are in the middle of a GC, the object has been moved, and the 
> lock stack object reference has been updated but the monitor reference has 
> not (or vice versa).

Possible I guess. I added some instrumentation to try and see what threads we 
have and what their lockstacks show when the failure happens. Of course the 
test didn't fail after I did that, so having to re-run.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337987283

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread Chris Plummer

On Tue, 26 Sep 2023 22:55:27 GMT, Chris Plummer  wrote:

>> Yeah, I also haven't seen a good explanation for why this is happening yet. 
>> I'll also look into SA, but I don't understand the hotspot code it is trying 
>> to emulate that well so not sure how useful I'll be.
>
> I made a pass over the deadlock detection code involved. Nothing stuck out. 
> There's not much in the way of SA-isms in this code. I think anyone with an 
> understanding of how hotspot locking works should be able to navigate it 
> pretty easily and maybe conjure a hypothesis as to why this happens as 
> frequently as it does.

I wonder if we are in the middle of a GC, the object has been moved, and the 
lock stack object reference has been updated but the monitor reference has not 
(or vice versa).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337908791

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread Chris Plummer

On Tue, 26 Sep 2023 22:10:15 GMT, Chris Plummer  wrote:

>> Switching the order did not fix the problem.
>> 
>> Even if we end up resolving that this is just something that can happen when 
>> SA inspects things, I'd prefer to understand how this can arise. I guess I 
>> need to look into how the SA tries to find the owner.
>
> Yeah, I also haven't seen a good explanation for why this is happening yet. 
> I'll also look into SA, but I don't understand the hotspot code it is trying 
> to emulate that well so not sure how useful I'll be.

I made a pass over the deadlock detection code involved. Nothing stuck out. 
There's not much in the way of SA-isms in this code. I think anyone with an 
understanding of how hotspot locking works should be able to navigate it pretty 
easily and maybe conjure a hypothesis as to why this happens as frequently as 
it does.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337857814

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread Chris Plummer

On Tue, 26 Sep 2023 20:41:33 GMT, David Holmes  wrote:

>> I'm not seeing that this code has much impact on the failure rate. I've 
>> tried changing the order and also have tried adding a nanosleep(). Always 
>> fails at about the same rate.
>
> Switching the order did not fix the problem.
> 
> Even if we end up resolving that this is just something that can happen when 
> SA inspects things, I'd prefer to understand how this can arise. I guess I 
> need to look into how the SA tries to find the owner.

Yeah, I also haven't seen a good explanation for why this is happening yet. 
I'll also look into SA, but I don't understand the hotspot code it is trying to 
emulate that well so not sure how useful I'll be.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337832556

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread David Holmes

On Tue, 26 Sep 2023 20:20:29 GMT, Chris Plummer  wrote:

>> I managed to get it to fail 4/100 times in our CI so armed with that I will 
>> try swapping the order. Though I have to admit I tend to agree with Chris 
>> that the code in question seems unlikely to be executed such that this 
>> failure mode is so repeatable.
>
> I'm not seeing that this code has much impact on the failure rate. I've tried 
> changing the order and also have tried adding a nanosleep(). Always fails at 
> about the same rate.

Switching the order did not fix the problem.

Even if we end up resolving that this is just something that can happen when SA 
inspects things, I'd prefer to understand how this can arise. I guess I need to 
look into how the SA tries to find the owner.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337758338

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread Chris Plummer

On Tue, 26 Sep 2023 12:02:52 GMT, David Holmes  wrote:

>> Ok!
>> So we would have to swap the popping and state-update at least. Unless we 
>> insert members between them, the updates would still be observable in any 
>> order, I suppose, but inserting membars just to make SA happy seems a bit 
>> much.
>> Can one of you try swapping and see if that makes the problem appear less 
>> often? I could not yet reproduce the bug at all.
>
> I managed to get it to fail 4/100 times in our CI so armed with that I will 
> try swapping the order. Though I have to admit I tend to agree with Chris 
> that the code in question seems unlikely to be executed such that this 
> failure mode is so repeatable.

I'm not seeing that this code has much impact on the failure rate. I've tried 
changing the order and also have tried adding a nanosleep(). Always fails at 
about the same rate.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337736512

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-26 Thread David Holmes

On Tue, 26 Sep 2023 05:04:57 GMT, Roman Kennke  wrote:

>> Atomicity means guaranteeing that a 2 or more operations are executed as if 
>> they are single operation from the viewpoint of other threads. There's no 
>> such concept with SA. The state is frozen, so SA can look at anything 
>> without worrying about the state changing as it looks at it. When I say the 
>> JVM is not in a consistent state, one example of that is when the JVM is in 
>> the middle of an atomic operation when SA looks at it. Another example is 
>> when SA looks at data that is normally only safe to look at when the VM is 
>> safe pointing. The code that David pointed out is another example. It's 
>> usually protected from observation by other threads by safe pointing, but SA 
>> can see the data when it's in the middle of executing this code. Order may 
>> or may not matter. Clearly the current order is problematic for SA. 
>> Switching the order may or may not be a problem. It depends on whether SA 
>> cares that a monitor on the lock_stack of one thread is shown to be owned by 
>> another thread. If that does br
 eak SA, I think SA might be able to recognize the situation and work around 
it, or just print a warning.
>> 
>>> However, even that problem would be pre-existing (monitors not agreeing 
>>> with each other or with threads), and do we currently print a warning about 
>>> it?
>> 
>> Likely not because it is likely not detected and doesn't result in some sort 
>> of failure, and probably whoever wrote the original code hadn't considered 
>> the possibility. Most of this type of SA error handling code (that result in 
>> a warning) is the results of eventually someone noticing a failure of some 
>> sort. If no one ever notices a failure, then the error handling is never 
>> worked on.
>
> Ok!
> So we would have to swap the popping and state-update at least. Unless we 
> insert members between them, the updates would still be observable in any 
> order, I suppose, but inserting membars just to make SA happy seems a bit 
> much.
> Can one of you try swapping and see if that makes the problem appear less 
> often? I could not yet reproduce the bug at all.

I managed to get it to fail 4/100 times in our CI so armed with that I will try 
swapping the order. Though I have to admit I tend to agree with Chris that the 
code in question seems unlikely to be executed such that this failure mode is 
so repeatable.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1337100891

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Roman Kennke

On Tue, 26 Sep 2023 05:03:13 GMT, Chris Plummer  wrote:

>> @rkennke the "snapshot" is atomic - the target VM is suspended.
>
> Atomicity means guaranteeing that a 2 or more operations are executed as if 
> they are single operation from the viewpoint of other threads. There's no 
> such concept with SA. The state is frozen, so SA can look at anything without 
> worrying about the state changing as it looks at it. When I say the JVM is 
> not in a consistent state, one example of that is when the JVM is in the 
> middle of an atomic operation when SA looks at it. Another example is when SA 
> looks at data that is normally only safe to look at when the VM is safe 
> pointing. The code that David pointed out is another example. It's usually 
> protected from observation by other threads by safe pointing, but SA can see 
> the data when it's in the middle of executing this code. Order may or may not 
> matter. Clearly the current order is problematic for SA. Switching the order 
> may or may not be a problem. It depends on whether SA cares that a monitor on 
> the lock_stack of one thread is shown to be owned by another thread. If that 
> does bre
 ak SA, I think SA might be able to recognize the situation and work around it, 
or just print a warning.
> 
>> However, even that problem would be pre-existing (monitors not agreeing with 
>> each other or with threads), and do we currently print a warning about it?
> 
> Likely not because it is likely not detected and doesn't result in some sort 
> of failure, and probably whoever wrote the original code hadn't considered 
> the possibility. Most of this type of SA error handling code (that result in 
> a warning) is the results of eventually someone noticing a failure of some 
> sort. If no one ever notices a failure, then the error handling is never 
> worked on.

Ok!
So we would have to swap the popping and state-update at least. Unless we 
insert members between them, the updates would still be observable in any 
order, I suppose, but inserting membars just to make SA happy seems a bit much.
Can one of you try swapping and see if that makes the problem appear less 
often? I could not yet reproduce the bug at all.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336628312

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 04:52:00 GMT, David Holmes  wrote:

>> Ok, so we are not (usually) at a safepoint, but no threads are moving. But 
>> the snapshot can not be taken atomically. Which means that the 
>> anonymous-state in the ObjectMonitor and the state of the lock-stack are not 
>> necessarily consistent, not even if we swap the popping and owner-update, 
>> right? This is what you mean by inconsistent state, right? Or is there a way 
>> to get an atomic snapshot of threads *and* monitors?
>> 
>> So yeah, we could print a warning when that happens.
>> 
>> However, even that problem would be pre-existing (monitors not agreeing with 
>> each other or with threads), and do we currently print a warning about it?
>
> @rkennke the "snapshot" is atomic - the target VM is suspended.

Atomicity means guaranteeing that a 2 or more operations are executed as if 
they are single operation from the viewpoint of other threads. There's no such 
concept with SA. The state is frozen, so SA can look at anything without 
worrying about the state changing as it looks at it. When I say the JVM is not 
in a consistent state, one example of that is when the JVM is in the middle of 
an atomic operation when SA looks at it. Another example is when SA looks at 
data that is normally only safe to look at when the VM is safe pointing. The 
code that David pointed out is another example. It's usually protected from 
observation by other threads by safe pointing, but SA can see the data when 
it's in the middle of executing this code. Order may or may not matter. Clearly 
the current order is problematic for SA. Switching the order may or may not be 
a problem. It depends on whether SA cares that a monitor on the lock_stack of 
one thread is shown to be owned by another thread. If that does break
  SA, I think SA might be able to recognize the situation and work around it, 
or just print a warning.

> However, even that problem would be pre-existing (monitors not agreeing with 
> each other or with threads), and do we currently print a warning about it?

Likely not because it is likely not detected and doesn't result in some sort of 
failure, and probably whoever wrote the original code hadn't considered the 
possibility. Most of this type of SA error handling code (that result in a 
warning) is the results of eventually someone noticing a failure of some sort. 
If no one ever notices a failure, then the error handling is never worked on.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336627387

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 04:25:40 GMT, Roman Kennke  wrote:

>> ...although that is a pretty small window and we are seeing this bug quite a 
>> lot. Seems if the code in question was the issue, it would take 1000s of 
>> iterations to reproduce.
>
> Ok, so we are not (usually) at a safepoint, but no threads are moving. But 
> the snapshot can not be taken atomically. Which means that the 
> anonymous-state in the ObjectMonitor and the state of the lock-stack are not 
> necessarily consistent, not even if we swap the popping and owner-update, 
> right? This is what you mean by inconsistent state, right? Or is there a way 
> to get an atomic snapshot of threads *and* monitors?
> 
> So yeah, we could print a warning when that happens.
> 
> However, even that problem would be pre-existing (monitors not agreeing with 
> each other or with threads), and do we currently print a warning about it?

@rkennke the "snapshot" is atomic - the target VM is suspended.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336621849

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Roman Kennke

On Tue, 26 Sep 2023 03:07:14 GMT, Chris Plummer  wrote:

>> And I shouldn't have said "cache". I was confusing this PR with another 
>> dealing with the Monitor Cache.
>
> ...although that is a pretty small window and we are seeing this bug quite a 
> lot. Seems if the code in question was the issue, it would take 1000s of 
> iterations to reproduce.

Ok, so we are not (usually) at a safepoint, but no threads are moving. But the 
snapshot can not be taken atomically. Which means that the anonymous-state in 
the ObjectMonitor and the state of the lock-stack are not necessarily 
consistent, not even if we swap the popping and owner-update, right? This is 
what you mean by inconsistent state, right? Or is there a way to get an atomic 
snapshot of threads *and* monitors?

So yeah, we could print a warning when that happens.

However, even that problem would be pre-existing (monitors not agreeing with 
each other or with threads), and do we currently print a warning about it?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336609109

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 02:48:17 GMT, Chris Plummer  wrote:

>> Looks like you answered my question while I was asking it.
>
> And I shouldn't have said "cache". I was confusing this PR with another 
> dealing with the Monitor Cache.

...although that is a pretty small window and we are seeing this bug quite a 
lot. Seems if the code in question was the issue, it would take 1000s of 
iterations to reproduce.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336576620

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 02:44:59 GMT, Chris Plummer  wrote:

>> Maybe SA is seeing a monitor in the monitor cache that it is only seeing 
>> because the monitor cache is currently inconsistent (due to SA not safe 
>> pointing). So the question is whether the monitor cache can be in a state 
>> where it still contains a reference to an anonymous monitor, but no thread 
>> actually owns the monitor (and the anonymous monitor is about to be purged 
>> from the cache).
>
> Looks like you answered my question while I was asking it.

And I shouldn't have said "cache". I was confusing this PR with another dealing 
with the Monitor Cache.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336568286

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 02:42:56 GMT, Chris Plummer  wrote:

>> Correction, there is one code path where we pop the lockstack first and then 
>> update the owner:
>> 
>> ObjectMonitor* monitor = inflate(current, object, inflate_cause_vm_internal);
>>   if (LockingMode == LM_LIGHTWEIGHT && monitor->is_owner_anonymous()) {
>> // It must be owned by us. Pop lock object from lock stack.
>> LockStack& lock_stack = current->lock_stack();
>> oop popped = lock_stack.pop();
>> assert(popped == object, "must be owned by this thread");
>> monitor->set_owner_from_anonymous(current);
>>   }
>>   monitor->exit(current);
>> 
>> we should probably do this the other way around.
>
> Maybe SA is seeing a monitor in the monitor cache that it is only seeing 
> because the monitor cache is currently inconsistent (due to SA not safe 
> pointing). So the question is whether the monitor cache can be in a state 
> where it still contains a reference to an anonymous monitor, but no thread 
> actually owns the monitor (and the anonymous monitor is about to be purged 
> from the cache).

Looks like you answered my question while I was asking it.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336566708

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 02:34:08 GMT, David Holmes  wrote:

>> AFAICS we update the owner to the real thread before we remove the object 
>> from the lock stack. So if we see the object monitor is anonymously owned 
>> then we should find the monitor object in a thread's lockstack.
>
> Correction, there is one code path where we pop the lockstack first and then 
> update the owner:
> 
> ObjectMonitor* monitor = inflate(current, object, inflate_cause_vm_internal);
>   if (LockingMode == LM_LIGHTWEIGHT && monitor->is_owner_anonymous()) {
> // It must be owned by us. Pop lock object from lock stack.
> LockStack& lock_stack = current->lock_stack();
> oop popped = lock_stack.pop();
> assert(popped == object, "must be owned by this thread");
> monitor->set_owner_from_anonymous(current);
>   }
>   monitor->exit(current);
> 
> we should probably do this the other way around.

Maybe SA is seeing a monitor in the monitor cache that it is only seeing 
because the monitor cache is currently inconsistent (due to SA not safe 
pointing). So the question is whether the monitor cache can be in a state where 
it still contains a reference to an anonymous monitor, but no thread actually 
owns the monitor (and the anonymous monitor is about to be purged from the 
cache).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336565875

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 02:30:25 GMT, David Holmes  wrote:

>>> Ah! I guess we get used to talking about "at a safepoint" when we really 
>>> mean "at a fixed point in time". So the VM is not necessarily at a 
>>> safepoint, but everything is fixed. So invariants may not hold, but the 
>>> state cannot change.
>> 
>> Correct!
>> 
>>> So in the current context the anonymous owner should be found ...
>> 
>> That's what I concluded based on Roman's description of how things work.
>> 
>>>  I guess the question to be answered is how the code tries to find an 
>>> anonymous owner? I'm not sure how you can find it.
>> 
>> I'm not sure yet. I'll need to have a look.
>
> AFAICS we update the owner to the real thread before we remove the object 
> from the lock stack. So if we see the object monitor is anonymously owned 
> then we should find the monitor object in a thread's lockstack.

Correction, there is one code path where we pop the lockstack first and then 
update the owner:

ObjectMonitor* monitor = inflate(current, object, inflate_cause_vm_internal);
  if (LockingMode == LM_LIGHTWEIGHT && monitor->is_owner_anonymous()) {
// It must be owned by us. Pop lock object from lock stack.
LockStack& lock_stack = current->lock_stack();
oop popped = lock_stack.pop();
assert(popped == object, "must be owned by this thread");
monitor->set_owner_from_anonymous(current);
  }
  monitor->exit(current);

we should probably do this the other way around.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336558790

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 02:27:39 GMT, Chris Plummer  wrote:

>> Ah! I guess we get used to talking about "at a safepoint" when we really 
>> mean "at a fixed point in time". So the VM is not necessarily at a 
>> safepoint, but everything is fixed. So invariants may not hold, but the 
>> state cannot change. So in the current context the anonymous owner should be 
>> found ... I guess the question to be answered is how the code tries to find 
>> an anonymous owner? I'm not sure how you can find it.
>
>> Ah! I guess we get used to talking about "at a safepoint" when we really 
>> mean "at a fixed point in time". So the VM is not necessarily at a 
>> safepoint, but everything is fixed. So invariants may not hold, but the 
>> state cannot change.
> 
> Correct!
> 
>> So in the current context the anonymous owner should be found ...
> 
> That's what I concluded based on Roman's description of how things work.
> 
>>  I guess the question to be answered is how the code tries to find an 
>> anonymous owner? I'm not sure how you can find it.
> 
> I'm not sure yet. I'll need to have a look.

AFAICS we update the owner to the real thread before we remove the object from 
the lock stack. So if we see the object monitor is anonymously owned then we 
should find the monitor object in a thread's lockstack.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336557193

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 02:14:40 GMT, David Holmes  wrote:

> Ah! I guess we get used to talking about "at a safepoint" when we really mean 
> "at a fixed point in time". So the VM is not necessarily at a safepoint, but 
> everything is fixed. So invariants may not hold, but the state cannot change.

Correct!

> So in the current context the anonymous owner should be found ...

That's what I concluded based on Roman's description of how things work.

>  I guess the question to be answered is how the code tries to find an 
> anonymous owner? I'm not sure how you can find it.

I'm not sure yet. I'll need to have a look.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336555978

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 01:41:38 GMT, Chris Plummer  wrote:

>> Correction to above:
>> 
>> threads = VM.getVM().getThreads(); 
>> heap = VM.getVM().getObjectHeap(); 
>> createThreadTable(); // calls getThreads() again
>> 
>> The VM caches the set of threads ie the snapshot, so three sets are not 
>> possible. But AFAICS the thread snapshot and heap snapshot are not atomic, 
>> so the set of threads could have changed, and the state of threads also.
>
>> Surely jstack thread dump and deadlock check _has_ to run at a safepoint?
> 
> The reality is that JVM is rarely at a safepoint (unless perhaps when all 
> threads are blocked), and therefore jstack rarely is done at a safepoint. 
> This is the world SA lives in. The understanding is that SA debugging 
> features may give inaccurate info, or possibly not work at all.
> 
>> If the SA is working from a snapshot then it has to create that snapshot 
>> atomically. It can't snapshot the threads, then snapshot the heap.
> 
> It doesn't snapshot. It does a debugger attach, which suspends the process. 
> It then starts to read in pages from the process to do things like jstack. 
> The process state does not change while it does this. It's not really any 
> different than gdb in this regard. gdb does not let the process state change 
> unless you use a command that allows execution such "step" or "continue". As 
> long as you avoid the commands that allow process execution,  you can debug 
> without worrying about the process state changing. However, even GDB 
> debugging has the same issues with JVM safe pointing (or lack thereof). If 
> the JVM crashes and you start looking at certain data, it might be 
> inconsistent. There's no way to force a safepoint once there is a crash.
> 
>> @plummercj the SA code sees T2 is pending on the monitor for object O, which 
>> is locked anonymously but actually by T1. The SA code then goes hunting for 
>> the owner. But the VM is not standing still...
> 
> The VM is standing still. There is no process execution while all of this 
> happens. jstack and the deadlock detection are fully executed while the JVM 
> process is halted. There is no JVM state change during any of this.

Ah! I guess we get used to talking about "at a safepoint" when we really mean 
"at a fixed point in time". So the VM is not necessarily at a safepoint, but 
everything is fixed. So invariants may not hold, but the state cannot change. 
So in the current context the anonymous owner should be found ... I guess the 
question to be answered is how the code tries to find an anonymous owner? I'm 
not sure how you can find it.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336549922

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Tue, 26 Sep 2023 01:32:48 GMT, David Holmes  wrote:

>> If the SA is working from a snapshot then it has to create that snapshot 
>> atomically. It can't snapshot the threads, then snapshot the heap.
>
> Correction to above:
> 
> threads = VM.getVM().getThreads(); 
> heap = VM.getVM().getObjectHeap(); 
> createThreadTable(); // calls getThreads() again
> 
> The VM caches the set of threads ie the snapshot, so three sets are not 
> possible. But AFAICS the thread snapshot and heap snapshot are not atomic, so 
> the set of threads could have changed, and the state of threads also.

> Surely jstack thread dump and deadlock check _has_ to run at a safepoint?

The reality is that JVM is rarely at a safepoint (unless perhaps when all 
threads are blocked), and therefore jstack rarely is done at a safepoint. This 
is the world SA lives in. The understanding is that SA debugging features may 
give inaccurate info, or possibly not work at all.

> If the SA is working from a snapshot then it has to create that snapshot 
> atomically. It can't snapshot the threads, then snapshot the heap.

It doesn't snapshot. It does a debugger attach, which suspends the process. It 
then starts to read in pages from the process to do things like jstack. The 
process state does not change while it does this. It's not really any different 
than gdb in this regard. gdb does not let the process state change unless you 
use a command that allows execution such "step" or "continue". As long as you 
avoid the commands that allow process execution,  you can debug without 
worrying about the process state changing. However, even GDB debugging has the 
same issues with JVM safe pointing (or lack thereof). If the JVM crashes and 
you start looking at certain data, it might be inconsistent. There's no way to 
force a safepoint once there is a crash.

> @plummercj the SA code sees T2 is pending on the monitor for object O, which 
> is locked anonymously but actually by T1. The SA code then goes hunting for 
> the owner. But the VM is not standing still...

The VM is standing still. There is no process execution while all of this 
happens. jstack and the deadlock detection are fully executed while the JVM 
process is halted. There is no JVM state change during any of this.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336534342

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 01:25:59 GMT, David Holmes  wrote:

>> To expand if deadlock detection does not run at a safepoint then this logic 
>> is non-atomic and completely broken:
>> 
>>  threads = VM.getVM().getThreads();
>>  heap = VM.getVM().getObjectHeap();
>>  createThreadTable();  // calls getThreads() again
>> 
>> Without a stable VM you could have three almost completely different set of 
>> threads when each of these statements run!
>
> If the SA is working from a snapshot then it has to create that snapshot 
> atomically. It can't snapshot the threads, then snapshot the heap.

Correction to above:

threads = VM.getVM().getThreads(); 
heap = VM.getVM().getObjectHeap(); 
createThreadTable(); // calls getThreads() again

The VM caches the set of threads ie the snapshot, so three sets are not 
possible. But AFAICS the thread snapshot and heap snapshot are not atomic, so 
the set of threads could have changed, and the state of threads also.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336530626

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 01:16:11 GMT, David Holmes  wrote:

>> Edit: I missed the intermediate comments above where Roman explained.
>> 
>> @plummercj the SA code sees T2 is pending on the monitor for object O, which 
>> is locked anonymously but actually by T1. The SA code then goes hunting for 
>> the owner. But the VM is not standing still. T1 continued execution and went 
>> to release the lock, it sees it has been inflated and sets itself as the 
>> true owner, then releases the lock thus clearing the owner and potentially 
>> unblocking T2. T2 could now proceed to acquire the lock. But the SA sees  
>> none of this because it has a stale view of things.
>
> To expand if deadlock detection does not run at a safepoint then this logic 
> is non-atomic and completely broken:
> 
>  threads = VM.getVM().getThreads();
>  heap = VM.getVM().getObjectHeap();
>  createThreadTable();  // calls getThreads() again
> 
> Without a stable VM you could have three almost completely different set of 
> threads when each of these statements run!

If the SA is working from a snapshot then it has to create that snapshot 
atomically. It can't snapshot the threads, then snapshot the heap.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336527632

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 01:09:47 GMT, David Holmes  wrote:

>> Surely jstack thread dump and deadlock check _has_ to run at a safepoint?
>
> Also isn't "anonymous locking" an intermediate step in monitor inflation? The 
> inflated monitor becomes anonymously owned until the real owner sees it has 
> been inflated and updates the ownership accordingly.

To expand if deadlock detection does not run at a safepoint then this logic is 
non-atomic and completely broken:

 threads = VM.getVM().getThreads();
 heap = VM.getVM().getObjectHeap();
 createThreadTable();  // calls getThreads() again

Without a stable VM you could have three almost completely different set of 
threads when each of these statements run!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336523076

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Tue, 26 Sep 2023 01:07:32 GMT, David Holmes  wrote:

>>> The specific race here is that SA sees an anonymously locked ObjectMonitor 
>>> and tries to find the owning thread, and fails, presumably because that 
>>> thread has moved on and unlocked the object in the meantime.
>> 
>> But you said that when T1 starts the process of unlocking O, it sees the 
>> anonymous owner and it sets T1 as the owner before unlocking and handing off 
>> to T2. So I don't see how T1 could have "moved on". If T2 is blocked on O 
>> and O an anonymous owner, then T1 must still own it. 
>> 
>> SA always deals with a snapshot of the JVM state. Once SA attaches, no 
>> threads are running. So if O has an anonymous owner, you don't have to worry 
>> about the owner releasing the monitor while you are looking for the owner.
>> 
>> The question then becomes is there a short window while releasing the 
>> monitor that O still shows up as having an anonymous owner, but T1 has 
>> already released it. From your description of the order of things, this 
>> doesn't seem possible.
>
> Surely jstack thread dump and deadlock check _has_ to run at a safepoint?

Also isn't "anonymous locking" an intermediate step in monitor inflation? The 
inflated monitor becomes anonymously owned until the real owner sees it has 
been inflated and updates the ownership accordingly.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336520272

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread David Holmes

On Mon, 25 Sep 2023 20:52:28 GMT, Chris Plummer  wrote:

>> With the new lightweight locking, when a thread T1 holds a lightweight lock 
>> on object O, and another thread T2 comes in and also wants to acquire that 
>> lock, it inflates the lock to a monitor. And since it cannot determine which 
>> threads currently locks O, it installs a special owner - which we call 
>> anonymous owner - in the ObjectMonitor, it basically means 'we don't know'. 
>> As soon as T1 unlocks O, it observes that the lock is now inflated and set 
>> to anonymous owner, installs itself as the proper owner of the ObjectMonitor 
>> and proceeds to unlock the ObjectMonitor - thereby handing over ownership to 
>> T2.
>> 
>> The specific race here is that SA sees an anonymously locked ObjectMonitor 
>> and tries to find the owning thread, and fails, presumably because that 
>> thread has moved on and unlocked the object in the meantime.
>> 
>> However, the race is more general, and pre-(lightweight-locking-)existing: 
>> if SA observes a locked object, or a thread which holds a particular lock, 
>> by the time it is reporting, this information may (likely) be wrong because 
>> the thread has, in the meantime, moved on. This in itself may not be a big 
>> problem. What may be worse is that, if SA prints stacks of all threads and 
>> also prints locking information, the overall information may be 
>> *inconsistent*. For example, it may report that two threads hold the same 
>> object.
>> Also, I am not sure how deep SA goes. Telling from the stack-trace in the 
>> original report, it looks like there is a DeadlockDetector involved. If 
>> threads don't hold still while we try to detect deadlocks, we may report 
>> false positives. (We would perhaps not report false negatives, because 
>> deadlocked thread by definition don't move ;-) )
>> 
>> I don't know enough about SA and how it deals with locking to judge how much 
>> of a problem all that is, but I think the pre-existing code (before LW 
>> locking) did not warn about locking raciness, and I don't really see that we 
>> should do this for the anonymous case.
>
>> The specific race here is that SA sees an anonymously locked ObjectMonitor 
>> and tries to find the owning thread, and fails, presumably because that 
>> thread has moved on and unlocked the object in the meantime.
> 
> But you said that when T1 starts the process of unlocking O, it sees the 
> anonymous owner and it sets T1 as the owner before unlocking and handing off 
> to T2. So I don't see how T1 could have "moved on". If T2 is blocked on O and 
> O an anonymous owner, then T1 must still own it. 
> 
> SA always deals with a snapshot of the JVM state. Once SA attaches, no 
> threads are running. So if O has an anonymous owner, you don't have to worry 
> about the owner releasing the monitor while you are looking for the owner.
> 
> The question then becomes is there a short window while releasing the monitor 
> that O still shows up as having an anonymous owner, but T1 has already 
> released it. From your description of the order of things, this doesn't seem 
> possible.

Surely jstack thread dump and deadlock check _has_ to run at a safepoint?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336519288

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Mon, 25 Sep 2023 19:59:32 GMT, Roman Kennke  wrote:

> The specific race here is that SA sees an anonymously locked ObjectMonitor 
> and tries to find the owning thread, and fails, presumably because that 
> thread has moved on and unlocked the object in the meantime.

But you said that when T1 starts the process of unlocking O, it sees the 
anonymous owner and it sets T1 as the owner before unlocking and handing off to 
T2. So I don't see how T1 could have "moved on". If T2 is blocked on O and O an 
anonymous owner, then T1 must still own it. 

SA always deals with a snapshot of the JVM state. Once SA attaches, no threads 
are running. So if O has an anonymous owner, you don't have to worry about the 
owner releasing the monitor while you are looking for the owner.

The question then becomes is there a short window while releasing the monitor 
that O still shows up as having an anonymous owner, but T1 has already released 
it. From your description of the order of things, this doesn't seem possible.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336377708

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Roman Kennke

On Mon, 25 Sep 2023 19:18:38 GMT, Chris Plummer  wrote:

>> Yes. But unless we run this at a safepoint, there is no consistent state 
>> when we race with locking. I don't think we want to spam users with warnings 
>> whenever we run into anonymous owners (which is not too infrequent). 
>> Existing (pre-lightweight-) locking doesn't print warnings either, when a 
>> thread that's expected to lock on an object has moved on or otherwise 
>> encounters racy state.
>
> I need a better understanding of anonymous owners and the race you think is 
> going on here. 99+% of the time if we are not at a safepoint, pretty much 
> everything still works fine with SA. Safe points guarantee safe access, but 
> not being at a safepoint does not guarantee there will be an issue for SA. It 
> usually still works, and we try to report when it doesn't.

With the new lightweight locking, when a thread T1 holds a lightweight lock on 
object O, and another thread T2 comes in and also wants to acquire that lock, 
it inflates the lock to a monitor. And since it cannot determine which threads 
currently locks O, it installs a special owner - which we call anonymous owner 
- in the ObjectMonitor, it basically means 'we don't know'. As soon as T1 
unlocks O, it observes that the lock is now inflated and set to anonymous 
owner, installs itself as the proper owner of the ObjectMonitor and proceeds to 
unlock the ObjectMonitor - thereby handing over ownership to T2.

The specific race here is that SA sees an anonymously locked ObjectMonitor and 
tries to find the owning thread, and fails, presumably because that thread has 
moved on and unlocked the object in the meantime.

However, the race is more general, and pre-(lightweight-locking-)existing: if 
SA observes a locked object, or a thread which holds a particular lock, by the 
time it is reporting, this information may (likely) be wrong because the thread 
has, in the meantime, moved on. This in itself may not be a big problem. What 
may be worse is that, if SA prints stacks of all threads and also prints 
locking information, the overall information may be *inconsistent*. For 
example, it may report that two threads hold the same object.
Also, I am not sure how deep SA goes. Telling from the stack-trace in the 
original report, it looks like there is a DeadlockDetector involved. If threads 
don't hold still while we try to detect deadlocks, we may report false 
positives. (We would perhaps not report false negatives, because deadlocked 
thread by definition don't move ;-) )

I don't know enough about SA and how it deals with locking to judge how much of 
a problem all that is, but I think the pre-existing code (before LW locking) 
did not warn about locking raciness, and I don't really see that we should do 
this for the anonymous case.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336332299

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Mon, 25 Sep 2023 18:32:19 GMT, Roman Kennke  wrote:

>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java 
>> line 247:
>> 
>>> 245: // Java code and locking state can change at any time. 
>>> This code is not
>>> 246: // expected to be precise, so we return null here.
>>> 247: return null;
>> 
>> If the JVM is in a consistent state, do we expect that this should not 
>> happen? If so, then a warning message would be the usual SA approach. 
>> Basically let the user know we detected an issue rather than just quietly 
>> succeeding.
>
> Yes. But unless we run this at a safepoint, there is no consistent state when 
> we race with locking. I don't think we want to spam users with warnings 
> whenever we run into anonymous owners (which is not too infrequent). Existing 
> (pre-lightweight-) locking doesn't print warnings either, when a thread 
> that's expected to lock on an object has moved on or otherwise encounters 
> racy state.

I need a better understanding of anonymous owners and the race you think is 
going on here. 99+% of the time if we are not at a safepoint, pretty much 
everything still works fine with SA. Safe points guarantee safe access, but not 
being at a safepoint does not guarantee there will be an issue for SA. It 
usually still works, and we try to report when it doesn't.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336297000

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Roman Kennke

On Mon, 25 Sep 2023 17:44:56 GMT, Chris Plummer  wrote:

>> The SA can run concurrently with Java threads, SA code that inspects locking 
>> state should be able to deal with that. On the other hand, the particular 
>> code is only used in printing routine and is not expected to be precise. 
>> When resolving an anonymous owner, we may not find one, because Java threads 
>> may already have moved on. Instead of crashing with a stacktrace, we should 
>> gracefully return null here.
>> 
>> Testing:
>>  - [x] sun/tools/jhsdb/JStackStressTest.java
>>  - [x] sun/tools/jhsdb
>>  - [x] serviceability/sa
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 
> 247:
> 
>> 245: // Java code and locking state can change at any time. 
>> This code is not
>> 246: // expected to be precise, so we return null here.
>> 247: return null;
> 
> If the JVM is in a consistent state, do we expect that this should not 
> happen? If so, then a warning message would be the usual SA approach. 
> Basically let the user know we detected an issue rather than just quietly 
> succeeding.

Yes. But unless we run this at a safepoint, there is no consistent state when 
we race with locking. I don't think we want to spam users with warnings 
whenever we run into anonymous owners (which is not too infrequent). Existing 
(pre-lightweight-) locking doesn't print warnings either, when a thread that's 
expected to lock on an object has moved on or otherwise encounters racy state.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336253796

Re: RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Chris Plummer

On Mon, 25 Sep 2023 16:16:51 GMT, Roman Kennke  wrote:

> The SA can run concurrently with Java threads, SA code that inspects locking 
> state should be able to deal with that. On the other hand, the particular 
> code is only used in printing routine and is not expected to be precise. When 
> resolving an anonymous owner, we may not find one, because Java threads may 
> already have moved on. Instead of crashing with a stacktrace, we should 
> gracefully return null here.
> 
> Testing:
>  - [x] sun/tools/jhsdb/JStackStressTest.java
>  - [x] sun/tools/jhsdb
>  - [x] serviceability/sa

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 
247:

> 245: // Java code and locking state can change at any time. 
> This code is not
> 246: // expected to be precise, so we return null here.
> 247: return null;

If the JVM is in a consistent state, do we expect that this should not happen? 
If so, then a warning message would be the usual SA approach. Basically let the 
user know we detected an issue rather than just quietly succeeding.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15907#discussion_r1336208858

RFR: 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: We should have found a thread that owns the anonymous lock"

2023-09-25 Thread Roman Kennke

The SA can run concurrently with Java threads, SA code that inspects locking 
state should be able to deal with that. On the other hand, the particular code 
is only used in printing routine and is not expected to be precise. When 
resolving an anonymous owner, we may not find one, because Java threads may 
already have moved on. Instead of crashing with a stacktrace, we should 
gracefully return null here.

Testing:
 - [x] sun/tools/jhsdb/JStackStressTest.java
 - [x] sun/tools/jhsdb
 - [x] serviceability/sa

-

Commit messages:
 - 8316401: sun/tools/jhsdb/JStackStressTest.java failed with "InternalError: 
We should have found a thread that owns the anonymous lock"

Changes: https://git.openjdk.org/jdk/pull/15907/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15907&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8316401
  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/15907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15907/head:pull/15907

PR: https://git.openjdk.org/jdk/pull/15907

41 matches

Mail list logo