Re: IEP-70: Async Continuation Executor

Stanislav Lukyanov Thu, 25 Mar 2021 05:52:38 -0700

I don't agree that the code isn't related to Ignite - it is something that the 
user does via Ignite API,
so they would be right to expect Ignite to handle the invocations.


One more thing I'd like to mention - monitoring.
I don't think commonPool exposes anything to JMX by default, and a custom 
executor created by user certainly won't either.
If we create the pool we can register the MX Bean for it, as we usually do.

Let's split this topic into two questions: what should be the default executor, 
and what should be the configuration options.

1. What should be the default executor?
My first choice would be public thread pool.
In fact, the continuations are likely to be created in two contexts: on a 
client, or in a compute job on a server.
In both cases, public pool seems alright to me.

My second choice - a separate new IgniteThreadPoolExecutor.
I like it better than the proposed commonPool()l because of the monitoring 
options, and because we can control it better if needed.

2. What should be the configuration options?
If we go with the public pool - no additional options needed.

For a new ignite pool, we'll need one more option with the size of the pool - 
easy to understand and should be more than enough for the user.

I don't like accepting the executor from the user, mostly because I don't see 
benefits in this flexibility, but it makes API non-uniform 
(e.g. if the user wants to make all pools smaller they'd need to change size 
properties for other pools, but create a custom pool for this one),
and makes it harder to configure parallelism level via XML (which is the most 
common thing people want to configure for thread pools).

WDYT?

Thanks,
Stan



> On 25 Mar 2021, at 14:04, Pavel Tupitsyn <[email protected]> wrote:
> 
> Stan,
> 
> We have thread pools dedicated for specific purposes, like cache (striped),
> compute (pub), query, etc
> As I understand it, the reason here is to limit the number of threads
> dedicated to a given subsystem.
> For example, Compute may be overloaded with work, but Cache and Discovery
> will keep going.
> 
> This is different from async continuations, which are arbitrary user code.
> So what is the benefit of having a new user pool for arbitrary code that is
> probably not related to Ignite at all?
> 
> On Thu, Mar 25, 2021 at 1:31 PM <[email protected]> wrote:
> 
>> Pavel,
>> 
>> This is a great work, fully agree with the overall idea and approach.
>> 
>> However, I have some reservations about the API. We sure do have a lot of
>> async stuff in the system, and I would suggest to stick to the usual design
>> - create a separate thread pool, add a single property to control the size
>> of the pool.
>> Alternatively, we may consider using public pool for that. May I ask if
>> there is an example use case which doesn’t work with public pool?
>> 
>> For .NET, agree that we should follow the rules and APIs of the platform,
>> so the behavior might slightly differ.
>> 
>> Thanks,
>> Stan
>> 
>>> On 24 Mar 2021, at 09:52, Pavel Tupitsyn <[email protected]> wrote:
>>> 
>>> Igniters, since there are no more comments and/or review feedback,
>>> I'm going to merge the changes by the end of the week.
>>> 
>>>> On Mon, Mar 22, 2021 at 10:37 PM Pavel Tupitsyn <[email protected]>
>>>> wrote:
>>>> 
>>>> Ready for review:
>>>> https://github.com/apache/ignite/pull/8870
>>>> 
>>>> On Sun, Mar 21, 2021 at 8:09 PM Pavel Tupitsyn <[email protected]>
>>>> wrote:
>>>> 
>>>>> Simple benchmark added - see JmhCacheAsyncListenBenchmark in the PR.
>>>>> There is a 6-8% drop (1 client, 2 servers, 1 machine, int key/val).
>>>>> I expect this difference to become barely observable on real-world
>>>>> workloads.
>>>>> 
>>>>> On Thu, Mar 18, 2021 at 12:35 PM Pavel Tupitsyn <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Denis,
>>>>>> 
>>>>>> For a reproducer, please see CacheAsyncContinuationExecutorTest.java
>> in
>>>>>> the linked PoC [1]
>>>>>> 
>>>>>> [1]
>>>>>> 
>> https://github.com/apache/ignite/pull/8870/files#diff-c788c12013622093df07d8f628a6e8c5fb0c15007f0787f2d459dbb3e377fc5aR54
>>>>>> 
>>>>>> On Thu, Mar 18, 2021 at 1:58 AM Raymond Wilson <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> We implemented the ContinueWith() suggestion from Pavel, and it works
>>>>>>> well
>>>>>>> so far in testing, though we do not have data to support if there is
>> or
>>>>>>> is
>>>>>>> not a performance penalty in our use case..
>>>>>>> 
>>>>>>> To lend another vote to the 'Don't do continuations on the striped
>>>>>>> thread
>>>>>>> pool' line of thinking: Deadlocking is an issue as is starvation. In
>>>>>>> some
>>>>>>> ways starvation is more insidious because by the time things stop
>>>>>>> working
>>>>>>> the cause and effect distance may be large.
>>>>>>> 
>>>>>>> I appreciate the documentation does make statements about not
>> performing
>>>>>>> cache operations in a continuation due to deadlock possibilities, but
>>>>>>> that
>>>>>>> statement does not reveal why this is an issue. It is less a case of
>> a
>>>>>>> async cache operation being followed by some other cache operation
>> (an
>>>>>>> immediate issue), and more a general case of the continuation of
>>>>>>> application logic using a striped pool thread in a way that might
>> mean
>>>>>>> that
>>>>>>> thread is never given back - it's now just a piece of the application
>>>>>>> infrastructure until some other application activity schedules away
>> from
>>>>>>> that thread (eg: by ContinueWith or some other async operation in the
>>>>>>> application code that releases the thread). To be fair, beyond
>>>>>>> structures
>>>>>>> like ContinueWith(), it is not obvious how that continuation thread
>>>>>>> should
>>>>>>> be handed back. This will be the same behaviour for dedicated
>>>>>>> continuation
>>>>>>> pools, but with far less risk in the absence of ContinueWith()
>>>>>>> constructs.
>>>>>>> 
>>>>>>> In the .Net world this is becoming more of an issue as fewer .Net use
>>>>>>> cases
>>>>>>> outside of UI bother with synchronization contexts by default.
>>>>>>> 
>>>>>>> Raymond.
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Mar 18, 2021 at 9:56 AM Valentin Kulichenko <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi Denis,
>>>>>>>> 
>>>>>>>> I think Pavel's main point is that behavior is unpredictable. For
>>>>>>> example,
>>>>>>>> AFAIK, putAsync can be executed in the main thread instead of the
>>>>>>> striped
>>>>>>>> pool thread if the operation is local. The listener can also be
>>>>>>> executed in
>>>>>>>> the main thread - this happens if the future is completed prior to
>>>>>>> listener
>>>>>>>> invocation (this is actually quite possible in the unit test
>>>>>>> environment
>>>>>>>> causing the test to pass). Finally, I'm pretty sure there are many
>>>>>>> cases
>>>>>>>> when a deadlock does not occur right away, but instead it will
>> reveal
>>>>>>>> itself under high load due to thread starvation. The latter type of
>>>>>>> issues
>>>>>>>> is the most dangerous because they are often reproduced only in
>>>>>>> production.
>>>>>>>> Finally, there are performance considerations as well - cache
>>>>>>> operations
>>>>>>>> and listeners share the same fixed-sized pool which can have
>> negative
>>>>>>>> effects.
>>>>>>>> 
>>>>>>>> I'm OK with the change. Although, it might be better to introduce a
>>>>>>> new
>>>>>>>> fixed-sized pool instead of ForkJoinPool for listeners, simply
>>>>>>> because this
>>>>>>>> is the approach taken throughout the project. But this is up to a
>>>>>>> debate.
>>>>>>>> 
>>>>>>>> -Val
>>>>>>>> 
>>>>>>>> On Wed, Mar 17, 2021 at 11:31 AM Denis Garus <[email protected]>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Pavel,
>>>>>>>>> I tried this:
>>>>>>>>> 
>>>>>>>>> @Test
>>>>>>>>> public void test() throws Exception {
>>>>>>>>>   IgniteCache<Integer, String> cache =
>>>>>>>>> startGrid().getOrCreateCache("test_cache");
>>>>>>>>> 
>>>>>>>>>   cache.putAsync(1, "one").listen(f -> cache.replace(1, "two"));
>>>>>>>>> 
>>>>>>>>>   assertEquals("two", cache.get(1));
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> and this test is green.
>>>>>>>>> I believe that an user can make listener that leads to deadlock,
>> but
>>>>>>>>> the example in the IEP does not reflect this.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ср, 17 мар. 2021 г. в 17:36, Вячеслав Коптилин <
>>>>>>> [email protected]
>>>>>>>>> :
>>>>>>>>> 
>>>>>>>>>> Hi Pavel,
>>>>>>>>>> 
>>>>>>>>>>> Not a good excuse really. We have a usability problem, you have
>>>>>>> to
>>>>>>>>> admit
>>>>>>>>>> it.
>>>>>>>>>> Fair enough. I agree that this is a usability issue, but I have
>>>>>>> doubts
>>>>>>>>> that
>>>>>>>>>> the proposed approach to overcome it is the best one.
>>>>>>>>>> 
>>>>>>>>>>> Documentation won't help - no one is going to read the Javadoc
>>>>>>> for a
>>>>>>>>>> trivial method like putAsync
>>>>>>>>>> That is sad... However, I don't think that this is a strong
>>>>>>> argument
>>>>>>>>> here.
>>>>>>>>>> 
>>>>>>>>>> This is just my opinion. Let's see what other community members
>>>>>>> have to
>>>>>>>>>> say.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> S.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> ср, 17 мар. 2021 г. в 17:01, Pavel Tupitsyn <[email protected]
>>>>>>>> :
>>>>>>>>>> 
>>>>>>>>>>>> the user should use the right API
>>>>>>>>>>> 
>>>>>>>>>>> Not a good excuse really. We have a usability problem, you have
>>>>>>> to
>>>>>>>>> admit
>>>>>>>>>>> it.
>>>>>>>>>>> "The brakes did not work on your car - too bad, you should have
>>>>>>> known
>>>>>>>>>> that
>>>>>>>>>>> on Sundays only your left foot is allowed on the pedal"
>>>>>>>>>>> 
>>>>>>>>>>> This particular use case is too intricate.
>>>>>>>>>>> Even when you know about that, it is difficult to decide what
>>>>>>> can run
>>>>>>>>> on
>>>>>>>>>>> the striped pool,
>>>>>>>>>>> and what can't. It is too easy to forget.
>>>>>>>>>>> And most people don't know, even among Ignite developers.
>>>>>>>>>>> 
>>>>>>>>>>> Documentation won't help - no one is going to read the Javadoc
>>>>>>> for a
>>>>>>>>>>> trivial method like putAsync.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> So I propose to have a safe default.
>>>>>>>>>>> Then document the performance tuning opportunity on [1].
>>>>>>>>>>> 
>>>>>>>>>>> Think about how many users abandon a product because it
>>>>>>> mysteriously
>>>>>>>>>>> crashes and hangs.
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Mar 17, 2021 at 4:21 PM Вячеслав Коптилин <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Pavel,
>>>>>>>>>>>> 
>>>>>>>>>>>> Well, I think that the user should use the right API instead
>>>>>>> of
>>>>>>>>>>> introducing
>>>>>>>>>>>> uncontested overhead for everyone.
>>>>>>>>>>>> For instance, the code that is provided by IEP can changed as
>>>>>>>>> follows:
>>>>>>>>>>>> 
>>>>>>>>>>>> IgniteFuture fut = cache.putAsync(1, 1);
>>>>>>>>>>>> fut.listenAync(f -> {
>>>>>>>>>>>>   // Executes on Striped pool and deadlocks.
>>>>>>>>>>>>   cache.replace(1, 2);
>>>>>>>>>>>> }, ForkJoinPool.commonPool());
>>>>>>>>>>>> 
>>>>>>>>>>>> Of course, it does not mean that this fact should not be
>>>>>>> properly
>>>>>>>>>>>> documented.
>>>>>>>>>>>> Perhaps, I am missing something.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> S.
>>>>>>>>>>>> 
>>>>>>>>>>>> ср, 17 мар. 2021 г. в 16:01, Pavel Tupitsyn <
>>>>>>> [email protected]
>>>>>>>>> :
>>>>>>>>>>>> 
>>>>>>>>>>>>> Slava,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Your suggestion is to keep things as is and discard the IEP,
>>>>>>>> right?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> this can lead to significant overhead
>>>>>>>>>>>>> Yes, there is some overhead, but the cost of accidentally
>>>>>>>> starving
>>>>>>>>>> the
>>>>>>>>>>>>> striped pool is worse,
>>>>>>>>>>>>> not to mention the deadlocks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I believe that we should favor correctness over performance
>>>>>>> in
>>>>>>>> any
>>>>>>>>>>> case.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Mar 17, 2021 at 3:34 PM Вячеслав Коптилин <
>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Well, the specified method already exists :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>    * Registers listener closure to be asynchronously
>>>>>>> notified
>>>>>>>>>>>> whenever
>>>>>>>>>>>>>> future completes.
>>>>>>>>>>>>>>    * Closure will be processed in specified executor.
>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>    * @param lsnr Listener closure to register. Cannot be
>>>>>>>> {@code
>>>>>>>>>>>> null}.
>>>>>>>>>>>>>>    * @param exec Executor to run listener. Cannot be
>>>>>>> {@code
>>>>>>>>>> null}.
>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>   public void listenAsync(IgniteInClosure<? super
>>>>>>>>>> IgniteFuture<V>>
>>>>>>>>>>>>> lsnr,
>>>>>>>>>>>>>> Executor exec);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> S.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ср, 17 мар. 2021 г. в 15:25, Вячеслав Коптилин <
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> :
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hello Pavel,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I took a look at your IEP and pool request. I have the
>>>>>>>>> following
>>>>>>>>>>>>>> concerns.
>>>>>>>>>>>>>>> First of all, this change breaks the contract of
>>>>>>>>>>>>>> IgniteFuture#listen(lsnr)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>    * Registers listener closure to be asynchronously
>>>>>>>> notified
>>>>>>>>>>>>> whenever
>>>>>>>>>>>>>>> future completes.
>>>>>>>>>>>>>>>    * Closure will be processed in thread that
>>>>>>> completes
>>>>>>>> this
>>>>>>>>>>> future
>>>>>>>>>>>>> or
>>>>>>>>>>>>>>> (if future already
>>>>>>>>>>>>>>>    * completed) immediately in current thread.
>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>    * @param lsnr Listener closure to register. Cannot
>>>>>>> be
>>>>>>>>> {@code
>>>>>>>>>>>>> null}.
>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>   public void listen(IgniteInClosure<? super
>>>>>>>> IgniteFuture<V>>
>>>>>>>>>>>> lsnr);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   In your pull request, the listener is always called
>>>>>>> from
>>>>>>>> a
>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>> thread pool (which is fork-join by default)
>>>>>>>>>>>>>>>   even though the future is already completed at the
>>>>>>> moment
>>>>>>>>> the
>>>>>>>>>>>>> listen
>>>>>>>>>>>>>>> method is called.
>>>>>>>>>>>>>>>   In my opinion, this can lead to significant
>>>>>>> overhead -
>>>>>>>>>>> submission
>>>>>>>>>>>>>>> requires acquiring a lock and notifying a pool thread.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   It seems to me, that we should not change the
>>>>>>> current
>>>>>>>>>> behavior.
>>>>>>>>>>>>>>> However, thread pool executor can be added as an
>>>>>>> optional
>>>>>>>>>> parameter
>>>>>>>>>>>> of
>>>>>>>>>>>>>>> listen() method as follows:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>       public void listen(IgniteInClosure<? super
>>>>>>>>>> IgniteFuture<V>>
>>>>>>>>>>>>> lsnr,
>>>>>>>>>>>>>>> Executor exec);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> S.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> пн, 15 мар. 2021 г. в 19:24, Pavel Tupitsyn <
>>>>>>>>>> [email protected]
>>>>>>>>>>>> :
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Igniters,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Please review the IEP [1] and let me know your
>>>>>>> thoughts.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-70%3A+Async+Continuation+Executor
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> <http://www.trimble.com/>
>>>>>>> Raymond Wilson
>>>>>>> Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>>>>> [email protected]
>>>>>>> 
>>>>>>> <
>>>>>>> 
>> https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>

Re: IEP-70: Async Continuation Executor

Reply via email to