RE: stack overflow in regex engine

2024-05-22 Thread mark.yagnatinsky
Makes sense, thanks!

From: Philip Race 
Sent: Wednesday, May 22, 2024 2:41 PM
To: Yagnatinsky, Mark : IT (NYK) ; 
core-libs-dev@openjdk.org
Subject: Re: stack overflow in regex engine


CAUTION: This email originated from outside our organisation - 
philip.r...@oracle.com Do not click on links, 
open attachments, or respond unless you recognize the sender and can validate 
the content is safe.

On 5/22/24 10:51 AM, 
mark.yagnatin...@barclays.com wrote:
Ah, didn't realize P4 is default; that makes sense.
So I should not even be trying to derive omens from that.
So I guess only the assignee would know whether or not the status is closer to
"I was going to work on that next week" versus
"I totally forgot about that thing, and am about to forget about it again"
I'm quite sure he's on this list and will hopefully read the advocacy section 
of my email.

Um.  I feel awkward writing this paragraph because you know how OpenJDK works 
much better than I do, so it feels a bit silly to argue with you about it.  
But.  Um.
When you say "this is not the place to ask for fixes" ...
I was under the impression that "asking for fixes" actually does provide value, 
and not all of that value can be replaced by merely providing fixes.
In particular, asking for fixes gives maintainers a vague sense of how often 
people in the "real world" tend to run into an issue, which in turn informs how 
much "cost" is worth spending on addressing it.
(Where "cost" could mean things like "time" and also things like "this makes 
trickier and hence harder to maintain".)
In fact, I was under the impression that OpenJDK is slightly hostile to "big" 
fixes by "outsiders" because of the worry that there's now a big/complicated 
chunk of code that no one inside the project understands and yet the project is 
responsible for, and the original author might never be heard from again.

I think that is mainly the case for some new feature.
Or if you want to take some existing feature / functionality and re-write it in 
a different way.

True "bug fixes" to existing code are generally welcome, although that isn't 
the same as saying
they are quickly accepted.  They still need review and testing, and if the area 
is sensitive or complex that
can be quite time consuming on all ends of it. Which would all also be the case 
even if an
experienced contributor provided the fix.

-phil.



Anyway, thanks a bunch for responding; I was worried that no one would.

From: Philip Race 
Sent: Wednesday, May 22, 2024 11:54 AM
To: Yagnatinsky, Mark : IT (NYK) 
; 
core-libs-dev@openjdk.org
Subject: Re: stack overflow in regex engine


CAUTION: This email originated from outside our organisation - 
philip.r...@oracle.com Do not click on links, 
open attachments, or respond unless you recognize the sender and can validate 
the content is safe.
P4 is the default JBS priority, so sometimes it just means no one figured out 
the true priority.
But in general P4 bugs could be open for years, or even never get fixed.
The priority is also partially an assessment of where it falls as a priority 
for the JDK developers.
A user of JDK may have an entirely different perspective.
And that's why there are vendors who provide support for JDK. They can also 
arrange the backports you need.
But that's not done here. Here is where you come to participate and contribute 
fixes, not ask for fixes.
So my suggestion is to raise it via your support channel to your particular 
vendor who provided your binary.

-phil
On 5/21/24 8:46 PM, 
mark.yagnatin...@barclays.com wrote:
(Sorry about my previous "do I need to subscribe?" email; in retrospect that 
was needless noise.)
The purpose of this email is twofold: first, inquire about the status of ticket 
filed a few years ago, and second to point out some non-obvious reasons why it 
might be slightly more serious than it seems.
The ticket is this one 
https://bugs.openjdk.org/browse/JDK-8260866
 (stack overflow in regex matching quantified alternation)
The priority is listed as P4, which I guess means something like "medium" (more 
important than p5, but less than p3)
It also has a specific person assigned, which seems vaguely encouraging, but no 
updates at all in the years since it's been created, which seems less 
encouraging.
It was seemingly never once discussed on this mailing list, not even when it 
was first filed.
As an outsider, I'm not quite sure how to interpret all these various omens and 
turn them into guesses about its eventual fate.
Will it remain unfixed for another decade or two?  Will it be fixed in a few 
months, but then never backported to old 

RE: stack overflow in regex engine

2024-05-22 Thread mark.yagnatinsky
Ah, didn't realize P4 is default; that makes sense.
So I should not even be trying to derive omens from that.
So I guess only the assignee would know whether or not the status is closer to
"I was going to work on that next week" versus
"I totally forgot about that thing, and am about to forget about it again"
I'm quite sure he's on this list and will hopefully read the advocacy section 
of my email.

Um.  I feel awkward writing this paragraph because you know how OpenJDK works 
much better than I do, so it feels a bit silly to argue with you about it.  
But.  Um.
When you say "this is not the place to ask for fixes" ...
I was under the impression that "asking for fixes" actually does provide value, 
and not all of that value can be replaced by merely providing fixes.
In particular, asking for fixes gives maintainers a vague sense of how often 
people in the "real world" tend to run into an issue, which in turn informs how 
much "cost" is worth spending on addressing it.
(Where "cost" could mean things like "time" and also things like "this makes 
trickier and hence harder to maintain".)
In fact, I was under the impression that OpenJDK is slightly hostile to "big" 
fixes by "outsiders" because of the worry that there's now a big/complicated 
chunk of code that no one inside the project understands and yet the project is 
responsible for, and the original author might never be heard from again.

Anyway, thanks a bunch for responding; I was worried that no one would.

From: Philip Race 
Sent: Wednesday, May 22, 2024 11:54 AM
To: Yagnatinsky, Mark : IT (NYK) ; 
core-libs-dev@openjdk.org
Subject: Re: stack overflow in regex engine


CAUTION: This email originated from outside our organisation - 
philip.r...@oracle.com Do not click on links, 
open attachments, or respond unless you recognize the sender and can validate 
the content is safe.
P4 is the default JBS priority, so sometimes it just means no one figured out 
the true priority.
But in general P4 bugs could be open for years, or even never get fixed.
The priority is also partially an assessment of where it falls as a priority 
for the JDK developers.
A user of JDK may have an entirely different perspective.
And that's why there are vendors who provide support for JDK. They can also 
arrange the backports you need.
But that's not done here. Here is where you come to participate and contribute 
fixes, not ask for fixes.
So my suggestion is to raise it via your support channel to your particular 
vendor who provided your binary.

-phil
On 5/21/24 8:46 PM, 
mark.yagnatin...@barclays.com wrote:
(Sorry about my previous "do I need to subscribe?" email; in retrospect that 
was needless noise.)
The purpose of this email is twofold: first, inquire about the status of ticket 
filed a few years ago, and second to point out some non-obvious reasons why it 
might be slightly more serious than it seems.
The ticket is this one 
https://bugs.openjdk.org/browse/JDK-8260866
 (stack overflow in regex matching quantified alternation)
The priority is listed as P4, which I guess means something like "medium" (more 
important than p5, but less than p3)
It also has a specific person assigned, which seems vaguely encouraging, but no 
updates at all in the years since it's been created, which seems less 
encouraging.
It was seemingly never once discussed on this mailing list, not even when it 
was first filed.
As an outsider, I'm not quite sure how to interpret all these various omens and 
turn them into guesses about its eventual fate.
Will it remain unfixed for another decade or two?  Will it be fixed in a few 
months, but then never backported to old versions?  Something else?  No one 
knows?

That concludes the status inquiry.  Now on to the advocacy.  Some bugs are 
annoying, but once you hit them, you can work around them by changing your code 
so it does not trigger the bug.
Note the phrase "your code" above.  This is much more awkward to do if the bug 
triggered by third-party code you got from maven central or something.
At that point your options are to either ask the third party library to work 
around it, or else fork the dependency (which is not well supported by 
mainstream build tools (or maybe I'm just using them wrong)).
In this case, regular expressions are so ubiquitous that the bug is quite 
plausibly more likely to be triggered by some third party dependency than by 
code you own.
That was the case for me today: after spending hours trying to track down a 
stack overflow error I found the offending regex in a third party library.
The good news is that for the kinds of inputs we need to handle, it is indeed 
easy to substitute a much simpler regex that would avoid the issue.
The bad news is that it's not my code, so I can't.  I could petition the 

RE: do I need to subscribe to this list in order to post?

2024-05-22 Thread mark.yagnatinsky
Thanks!!

From: David Alayachew 
Sent: Wednesday, May 22, 2024 8:54 AM
To: Yagnatinsky, Mark : IT (NYK) 
Cc: core-libs-dev@openjdk.org
Subject: Re: do I need to subscribe to this list in order to post?


CAUTION: This email originated from outside our organisation - 
davidalayac...@gmail.com Do not click on 
links, open attachments, or respond unless you recognize the sender and can 
validate the content is safe.
Nope. We can see your message. You only subscribe if you want to follow along 
with the general discussions that you are not explicitly invited to.

On Tue, May 21, 2024 at 7:14 PM 
mailto:mark.yagnatin...@barclays.com>> wrote:


This message is for information purposes only. It is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service, nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is intended for the recipient(s) only. It is not directed at 
retail customers. This message is subject to the terms at: 
https://www.ib.barclays/disclosures/web-and-email-disclaimer.html.

For important disclosures, please see: 
https://www.ib.barclays/disclosures/sales-and-trading-disclaimer.html
 regarding marketing commentary from Barclays Sales and/or Trading desks, who 
are active market participants; 
https://www.ib.barclays/disclosures/barclays-global-markets-disclosures.html
 regarding our standard terms for Barclays Investment Bank where we trade with 
you in principal-to-principal wholesale markets transactions; and in respect to 
Barclays Research, including disclosures relating to specific issuers, see: 
http://publicresearch.barclays.com.
__
If you are incorporated or operating in Australia, read these important 
disclosures: 
https://www.ib.barclays/disclosures/important-disclosures-asia-pacific.html.
__
For more details about how we use personal information, see our privacy notice: 
https://www.ib.barclays/disclosures/personal-information-use.html.
__

This message is for information purposes only. It is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service, nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is intended for the recipient(s) only. It is not directed at 
retail customers. This message is subject to the terms at: 
https://www.ib.barclays/disclosures/web-and-email-disclaimer.html. 

For important disclosures, please see: 
https://www.ib.barclays/disclosures/sales-and-trading-disclaimer.html regarding 
marketing commentary from Barclays Sales and/or Trading desks, who are active 
market participants; 
https://www.ib.barclays/disclosures/barclays-global-markets-disclosures.html 
regarding our standard terms for Barclays Investment Bank where we trade with 
you in principal-to-principal wholesale markets transactions; and in respect to 
Barclays Research, including disclosures relating to specific issuers, see: 
http://publicresearch.barclays.com.
__
 
If you are incorporated or operating in Australia, read these important 
disclosures: 
https://www.ib.barclays/disclosures/important-disclosures-asia-pacific.html.
__
For more details about how we use personal information, see our privacy notice: 
https://www.ib.barclays/disclosures/personal-information-use.html. 
__


stack overflow in regex engine

2024-05-21 Thread mark.yagnatinsky
(Sorry about my previous "do I need to subscribe?" email; in retrospect that 
was needless noise.)
The purpose of this email is twofold: first, inquire about the status of ticket 
filed a few years ago, and second to point out some non-obvious reasons why it 
might be slightly more serious than it seems.
The ticket is this one https://bugs.openjdk.org/browse/JDK-8260866 (stack 
overflow in regex matching quantified alternation)
The priority is listed as P4, which I guess means something like "medium" (more 
important than p5, but less than p3)
It also has a specific person assigned, which seems vaguely encouraging, but no 
updates at all in the years since it's been created, which seems less 
encouraging.
It was seemingly never once discussed on this mailing list, not even when it 
was first filed.
As an outsider, I'm not quite sure how to interpret all these various omens and 
turn them into guesses about its eventual fate.
Will it remain unfixed for another decade or two?  Will it be fixed in a few 
months, but then never backported to old versions?  Something else?  No one 
knows?

That concludes the status inquiry.  Now on to the advocacy.  Some bugs are 
annoying, but once you hit them, you can work around them by changing your code 
so it does not trigger the bug.
Note the phrase "your code" above.  This is much more awkward to do if the bug 
triggered by third-party code you got from maven central or something.
At that point your options are to either ask the third party library to work 
around it, or else fork the dependency (which is not well supported by 
mainstream build tools (or maybe I'm just using them wrong)).
In this case, regular expressions are so ubiquitous that the bug is quite 
plausibly more likely to be triggered by some third party dependency than by 
code you own.
That was the case for me today: after spending hours trying to track down a 
stack overflow error I found the offending regex in a third party library.
The good news is that for the kinds of inputs we need to handle, it is indeed 
easy to substitute a much simpler regex that would avoid the issue.
The bad news is that it's not my code, so I can't.  I could petition the 
maintainers of the library, but this is not great because:
First, maybe the version I'm on is not longer even supported, and newer 
versions are not compatible,
Second, it may take them a while to fix it, and third, it is wasteful (and 
inelegant) to have workarounds slowly percolate throughout the Java ecosystem 
instead of fixing the problem at the root.

The other annoying thing here is that even when you have "enough" stack space 
to avoid crashing, using it may not be quite "free".
For instance, project loom's foundational premise seems to be that "most 
threads have oversized stacks; we can have more threads if we start off with 
small stacks and grow them only when needed".
This would be false when the thread in question uses a regex with quantified 
alternation.
(Since many Loom threads will be based on the same Runnable, it's a pretty safe 
bet that if one of them uses this feature, many will, so you can't assume it 
will "average out".)
There are other reasons besides loom to be low on stack space; maybe you're 
using some crazy framework(s) that like(s) to have call stacks that are crazy 
deep.
Or maybe you're running with -Xss set pretty low.  Or you passed a small value 
for stack space to the Thread constructor.
Or maybe none of these things are true, but in most operating systems a thread 
stack costs "real" memory in proportion to its high-watermark, so even a SINGLE 
heavy regex in the lifetime of a thread is tantamount to a memory leak of 
hundreds of kilobytes.

Practicalities aside, I don't like it when code consumes "surprising" types of 
resources, or surprising amounts of them.
For instance, you wouldn't expect a sorting function to spawn threads behind 
your back, unless it was called "parallel sort" or something like that.
You wouldn't expect it to allocate multi-gigabyte arrays, nor to perform I/O.
Similarly, most functions need only O(1) stack space, so this tends to be the 
default assumption unless the docs explicitly call out "this thing might throw 
stack overflows at you so make sure you have plenty of stack space"
Some need a bit more... for instance, I would not be surprised if a regex need 
stack space in proportion to the depth of the parse tree of the regex.
But stack space in proportion to the length of the string being matched is the 
kind of thing that I'd hope gets called out in those @implNotes thingies, or 
better yet fixed.

Even people who know that regex matching can sometimes take exponential time 
may naively assume that regex matching would not consume O(n) stack space, 
where n is the input length.
What's worse, not only does it indeed consume stack space linear in the length 
of the input, but the constant hidden by the O() notation is itself pretty 
scary.
For instance, consider the regex that caused my 

do I need to subscribe to this list in order to post?

2024-05-21 Thread mark.yagnatinsky


This message is for information purposes only. It is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service, nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is intended for the recipient(s) only. It is not directed at 
retail customers. This message is subject to the terms at: 
https://www.ib.barclays/disclosures/web-and-email-disclaimer.html. 

For important disclosures, please see: 
https://www.ib.barclays/disclosures/sales-and-trading-disclaimer.html regarding 
marketing commentary from Barclays Sales and/or Trading desks, who are active 
market participants; 
https://www.ib.barclays/disclosures/barclays-global-markets-disclosures.html 
regarding our standard terms for Barclays Investment Bank where we trade with 
you in principal-to-principal wholesale markets transactions; and in respect to 
Barclays Research, including disclosures relating to specific issuers, see: 
http://publicresearch.barclays.com.
__
 
If you are incorporated or operating in Australia, read these important 
disclosures: 
https://www.ib.barclays/disclosures/important-disclosures-asia-pacific.html.
__
For more details about how we use personal information, see our privacy notice: 
https://www.ib.barclays/disclosures/personal-information-use.html. 
__


RE: a plea for blocking peek

2023-05-02 Thread mark.yagnatinsky
Thanks both for responding!  Re: DelayQueue peek() being special: you're right, 
I missed that.
So my original proposal would not solve my problem.
I really want to "block until poll() would return non-null"
Let's pretend that's what blockingPeek() would do, though maybe the name is no 
longer suitable.

Re: John H: good point that I did not really explain what I'm trying to do.
Basically I wanted a "keyed" DelayQueue with one added feature:
AFTER inserting something into the queue, I discover that the initial delay is 
too soon and should be postponed.
(Use case: imagine something should happen at least once per hour.
So we set a timer for one hour and worry if the timer goes off.
We need to be able to reset the timer if it happens ahead of schedule, or we'll 
worry for no reason.)
The way I wanted to implement this is to store the "real" expiration in a map.
This involves enforcing the invariant that the queue and map remain "in sync".
Thus I prefer to hold a lock if I'm mutating either the queue or the map, so 
that changes appear atomic to other threads.

Hope that's clearer now.

From: Viktor Klang 
Sent: Tuesday, May 2, 2023 4:08 AM
To: John Hendrikx ; core-libs-dev@openjdk.org; 
Yagnatinsky, Mark : Markets Pre Trade 
Subject: Re: a plea for blocking peek


CAUTION: This email originated from outside our organisation - 
viktor.kl...@oracle.com Do not click on links, 
open attachments, or respond unless you recognize the sender and can validate 
the content is safe.
Hi,

I think the conversation also becomes a bit more difficult since the topic 
seems to be around DelayQueue-which has slightly different semantics than 
typical BlockingQueue implementations.
For instance, peek() returns the element with the lowest expiry (possibly in 
the future)-so you cannot really use the blocking peek() for much else than 
observing the lowest expiry-and the only difference to the non-blocking peek() 
would be that you can block the reader for as long as the queue is empty 
without A) removing the element and B) doing busy-waiting with some form of 
back-off policy which might ending up sleeping for too long.
Since queues tend to spend their time being either full or empty (since there's 
typically either a faster consumption than production or vice versa), I can 
understand the need to be notified when the queue is no longer empty. With that 
said, in the case of DelayQueue, it still doesn't mean that the result of a 
blocking peek() means that the element is consumable at that point. So the 
question is what you'd use the information for?

When composing multiple data structures, especially concurrent ones, I find it 
easier to reason about the problem space starting with the interaction pattern 
from the outside (who consumes and who produces), as the less coordination 
needed the better in terms of throughput (Amdahl's Law & Universal Scalability 
Law) etc, and if you can design for single-reader or single-writer (or both!) 
you can get away with much more performant implementations since the 
coordination need is much lower.



From: core-libs-dev 
mailto:core-libs-dev-r...@openjdk.org>> on 
behalf of John Hendrikx 
mailto:john.hendr...@gmail.com>>
Sent: Tuesday, 2 May 2023 08:46
To: core-libs-dev@openjdk.org 
mailto:core-libs-dev@openjdk.org>>; 
mark.yagnatin...@barclays.com 
mailto:mark.yagnatin...@barclays.com>>
Subject: Re: a plea for blocking peek


Hi,

I think it would help if you describe your problem a bit more directly.  
Currently there is a lot of mention of difficulty levels, usual approaches and 
"winning", which really doesn't help to ascertain what you are trying to 
achieve. For a good re-evaluation of your request, you are going to need to 
show some (pseudo) code so people can see if this indeed is a good use case, or 
that there is an elegant alternative solution.

It's really hard to glean from your post what you are trying to do.  It seems 
like you have a queue and a map.  Items you get from the queue need to be 
verified against the map.  The lock seems to protect the map structure against 
concurrent modification, and for some reason getting the lock after `take` 
isn't good enough.

--John
On 01/05/2023 19:12, 
mark.yagnatin...@barclays.com wrote:


I'm not sure if this I'm breaking etiquette for this list but I'm going to risk 
resending my first message because it got zero replies.  If it was simply too 
long, please let me know and I'll attempt a shorter version.



Here's like to original:

https://mail.openjdk.org/pipermail/core-libs-dev/2023-April/104931.html



From: Yagnatinsky, Mark : Markets Pre Trade
Sent: Thursday, April 27, 2023 

a plea for blocking peek

2023-05-01 Thread mark.yagnatinsky
I'm not sure if this I'm breaking etiquette for this list but I'm going to risk 
resending my first message because it got zero replies.  If it was simply too 
long, please let me know and I'll attempt a shorter version.

Here's like to original:
https://mail.openjdk.org/pipermail/core-libs-dev/2023-April/104931.html

From: Yagnatinsky, Mark : Markets Pre Trade
Sent: Thursday, April 27, 2023 12:28 PM
To: core-libs-dev@openjdk.org
Subject: RE: blocking peek for blocking queues

Someone said this is the right list for this, so forwarding from jdk-dev with a 
small tweak

I'd like to make a case for adding a blocking variant of peek() to 
BlockingQueue.
This has apparently been suggested before:
https://bugs.openjdk.org/browse/JDK-6653412
At the time, the verdict was that the proposed use case for a blocking peek was 
not very compelling, but " If a compelling use case is available, we may 
reconsider and reopen this RFE"
For one thing, that use case was inherently racy, and ideally the standard 
library should not be encouraging races.
And for another thing, this was before the invention of default methods, so a 
new interface would be needed.
I'd like to present what might be a more sympathetic use case, where a blocking 
peek actually makes it easier to avoid races.
(This is not hypothetical; I've spent all day trying to cope with the lack of a 
blocking peek.  I think I succeeded, but it was tough.)

Let's start at the beginning.  The game known as "multithreaded programming" 
can be played on 3 difficulty levels: easy, medium, and hard.
Easy mode is when the JDK happens to have a high-level tool (e.g., a Collection 
or an Executor or something) such as DelayQueue that does exactly what you want.
Sadly, the JDK is finite, and even Maven Central is finite, so this not 
guaranteed to happen, though it's nice when it does.
At the other extreme is hard mode, when every nanosecond counts and you must 
resort to the sorts of mental gymnastics that were involved when 
String.hashCode was re-worked to avoid re-computing hash codes equal to zero.

This email is about the medium difficulty level.  In this mode, you must glue 
together two or three collections to get what you want.
To be concrete, suppose we want to glue a map together with a blocking queue, 
since that's a simplified version of what I was doing.  (In particular I was 
using a DelayQueue and a HashMap.)
Since we're not playing on hard mode, we're going to allow ourselves to guard 
the entire data structure by one giant lock whenever it seems convenient, 
without worrying about performance.
In fact, let's give a name to this lock.  Since I'm not feeling very creative 
right now, let's call it LOCK.  Similarly, let's call our compound data 
structure STRUCT.
My usual heuristic to "win" (that is, to make sure my code is correct, or at 
least non-racy) on medium difficulty goes something like this.
First, figure out whether there is any way for STRUCT to be "invalid".
For instance, maybe "it's valid if and only if every entry in the queue has a 
corresponding entry in the map".
One could then write a brief comment in the code explaining what "valid" means, 
or perhaps even write an "isValid()" method for testing purposes.
Once we've decided what it means for STRUCT to be "valid" (aka, non-corrupt), 
we can try to maintain the following invariant:
EITHER some thread holds the LOCK, and no other thread is using STRUCT right 
now,
OR ELSE the LOCK  is not held by any thread, and thus STRUCT is currently in a 
valid state.
Sometimes the invariant above is a bit too strong, and it's convenient to 
weaken it slightly.  So we can instead consider the following rule:
RULE: we must never mutate STRUCT unless we hold the LOCK.
If we uphold the invariant, then we must be following the rule, but not vice 
versa.
(The rule doesn't forbid queries without the lock; it only forbids writes.)

Finally, we come to the point.  The lack of a blocking peek in BlockingQueue 
has forced me to break the above "weakened" rule.
I had to invent a new much weaker rule, something like "please carefully think 
through all possible interleaved executions; if they all work, great!".
What went wrong?  Well, one of the methods I wanted on STRUCT was basically 
"call take() on the queue, and then do things with the result".
The trouble is that you can't usually afford to hold a lock while calling 
take(), since it can block forever, and in the meantime you've "locked out" 
anyone who could have added something and thus unblocked you.
Thus, I split my method into two pieces: In PART_ONE we call take() without the 
LOCK, and once take() returns I grab the lock right away and run PART_TWO.
But notice that this breaks the RULE from the previous paragraph: I mutated the 
STRUCT without holding the LOCK.
This means my method is not atomic: it's possible for some other thread to 
observe that PART_ONE is done but PART_TWO is not started.
Typically this means that other threads can now observe th

RE: blocking peek for blocking queues

2023-04-27 Thread mark.yagnatinsky
Someone said this is the right list for this, so forwarding from jdk-dev with a 
small tweak

I'd like to make a case for adding a blocking variant of peek() to 
BlockingQueue.
This has apparently been suggested before:
https://bugs.openjdk.org/browse/JDK-6653412
At the time, the verdict was that the proposed use case for a blocking peek was 
not very compelling, but " If a compelling use case is available, we may 
reconsider and reopen this RFE"
For one thing, that use case was inherently racy, and ideally the standard 
library should not be encouraging races.
And for another thing, this was before the invention of default methods, so a 
new interface would be needed.
I'd like to present what might be a more sympathetic use case, where a blocking 
peek actually makes it easier to avoid races.
(This is not hypothetical; I've spent all day trying to cope with the lack of a 
blocking peek.  I think I succeeded, but it was tough.)

Let's start at the beginning.  The game known as "multithreaded programming" 
can be played on 3 difficulty levels: easy, medium, and hard.
Easy mode is when the JDK happens to have a high-level tool (e.g., a Collection 
or an Executor or something) such as DelayQueue that does exactly what you want.
Sadly, the JDK is finite, and even Maven Central is finite, so this not 
guaranteed to happen, though it's nice when it does.
At the other extreme is hard mode, when every nanosecond counts and you must 
resort to the sorts of mental gymnastics that were involved when 
String.hashCode was re-worked to avoid re-computing hash codes equal to zero.

This email is about the medium difficulty level.  In this mode, you must glue 
together two or three collections to get what you want.
To be concrete, suppose we want to glue a map together with a blocking queue, 
since that's a simplified version of what I was doing.  (In particular I was 
using a DelayQueue and a HashMap.)
Since we're not playing on hard mode, we're going to allow ourselves to guard 
the entire data structure by one giant lock whenever it seems convenient, 
without worrying about performance.
In fact, let's give a name to this lock.  Since I'm not feeling very creative 
right now, let's call it LOCK.  Similarly, let's call our compound data 
structure STRUCT.
My usual heuristic to "win" (that is, to make sure my code is correct, or at 
least non-racy) on medium difficulty goes something like this.
First, figure out whether there is any way for STRUCT to be "invalid".
For instance, maybe "it's valid if and only if every entry in the queue has a 
corresponding entry in the map".
One could then write a brief comment in the code explaining what "valid" means, 
or perhaps even write an "isValid()" method for testing purposes.
Once we've decided what it means for STRUCT to be "valid" (aka, non-corrupt), 
we can try to maintain the following invariant:
EITHER some thread holds the LOCK, and no other thread is using STRUCT right 
now,
OR ELSE the LOCK  is not held by any thread, and thus STRUCT is currently in a 
valid state.
Sometimes the invariant above is a bit too strong, and it's convenient to 
weaken it slightly.  So we can instead consider the following rule:
RULE: we must never mutate STRUCT unless we hold the LOCK.
If we uphold the invariant, then we must be following the rule, but not vice 
versa.
(The rule doesn't forbid queries without the lock; it only forbids writes.)

Finally, we come to the point.  The lack of a blocking peek in BlockingQueue 
has forced me to break the above "weakened" rule.
I had to invent a new much weaker rule, something like "please carefully think 
through all possible interleaved executions; if they all work, great!".
What went wrong?  Well, one of the methods I wanted on STRUCT was basically 
"call take() on the queue, and then do things with the result".
The trouble is that you can't usually afford to hold a lock while calling 
take(), since it can block forever, and in the meantime you've "locked out" 
anyone who could have added something and thus unblocked you.
Thus, I split my method into two pieces: In PART_ONE we call take() without the 
LOCK, and once take() returns I grab the lock right away and run PART_TWO.
But notice that this breaks the RULE from the previous paragraph: I mutated the 
STRUCT without holding the LOCK.
This means my method is not atomic: it's possible for some other thread to 
observe that PART_ONE is done but PART_TWO is not started.
Typically this means that other threads can now observe the STRUCT in an 
invalid (aka corrupt/broken) state.

Now, suppose we had a blocking peek.  How does that help?  In that case, we 
would still break our method into two pieces.
To avoid confusion, let's call them PART_A and PART_B.  In PART_A we call 
blockingPeek() without holding the LOCK.
Note that we completely ignore the return value from peek; for all we care, the 
return type is "void" instead of "T".
This breaks the "invariant" from a few paragraphs ago since we're techn