Oh, I forgot something important.

If you're hoping to have multiple hosts and run this application in a 
distributed way, you really shouldn't do it this way; things get a lot 
more complicated. The problem is that your request queue, the atom map of 
Futures, is local to a host. So if the client creates the Future by 
calling S1 on host A, and its call to get-s1-result gets routed to host 
B, that Future will be missing.

So what you need is to turn that atom map of Futures into a distributed 
one. You can still keep the local atom map of Futures, but as the last 
step of each Future, you update the distributed map with the result or 
error. And if you want intermediate statuses, your polling loop should 
update it with the current status too. Then get-s1-result just checks the 
value in that distributed map. Each host still processes its own share of 
requests, but the distributed map exposes their results and processing 
statuses to all the other hosts.
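
A minimal sketch of that last-step update, assuming hypothetical 
dmap-put! and dmap-get wrappers around whatever shared store you pick 
(Redis, a database, etc.), and a process-request function standing in 
for your own handling logic:

(defonce local-futures (atom {}))  ; GUID -> Future, still per-host

(defn handle-s1 [request]
  (let [guid (str (java.util.UUID/randomUUID))]
    (dmap-put! guid {:status :queued})
    (swap! local-futures assoc guid
           (future
             (dmap-put! guid {:status :processing})
             (try
               ;; process-request is a stand-in for your own handling
               (dmap-put! guid {:status :done
                                :result (process-request request)})
               (catch Exception e
                 (dmap-put! guid {:status :error
                                  :error (.getMessage e)})))))
    guid))

;; Any host can now answer the poll out of the shared store:
(defn get-s1-result [guid]
  (dmap-get guid))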

There are many other ways to handle this issue. For example, you may be 
able to route the client to a direct connection to the particular host 
that handled S1, so that calls to get-s1-result go to that specific host. 
The downsides are that it gets harder to evenly distribute the polls, and 
that it takes more complex infrastructure: all hosts must have their IPs 
exposed to the clients, for example. Alternatively, the VIP might be able 
to support smarter routing based on some indicator, or you could use a 
master host that keeps that routing logic itself and delegates back to 
the host that handled the request.

Yet another way is to let go of the polling and have a push model 
instead: your server calls the client to tell it the request has been 
handled. This also has its own complexities and trade-offs.

Anyways, in a distributed environment, async and non-blocking become 
quite a bit more complex.


On Monday, 14 May 2018 17:35:39 UTC-7, Didier wrote:
>
> It's hard to answer without additional detail.
>
> I'll make some assumptions, and answer assuming those are true:
>
> 1) I assume your S1 API is blocking, that each request to it is 
> handled on its own thread, and that those threads come from a 
> fixed-size thread pool of size 30.
>
> 2) I assume that S2 is also blocking, that it returns a promise when 
> you call it, and that you then need to keep polling another API, which 
> I'll call get-s2-result. It takes the promise, is also blocking, and 
> returns the result, an error, or that it's still not available.
>
> 3) I assume you want to give your blocking S1 API a pseudo 
> non-blocking behavior.
>
> 4) Thus, you would have S1 return a promise. When called, you do not 
> process the request; you put it in a "to be processed" queue, and you 
> return a promise that eventually the request will be processed and 
> will have a value or an error.
>
> 5) Similarly, you need a way for the client to check the promise, so 
> you also expose a blocking API, which I'll call get-s1-result, that 
> takes the promise and returns either the result, an error, or that 
> it's not available yet.
>
> 6) Your promise will take the form of a GUID that uniquely identifies the 
> queued request.
>
> 7) This is your API design. Your clients can now start working and 
> integrating against your APIs while you implement the functionality.
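>
> If these were HTTP endpoints, the contract might look like this (the 
> routes and response shapes are my own invention, not anything 
> standard):
>
> ;; POST /s1            -> {:promise "7d3f-..."}   ; the GUID
> ;; GET  /s1-result/:p  -> {:status :pending}
> ;;                     -> {:status :done  :result ...}
> ;;                     -> {:status :error :error  ...}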
>
> 8) Now you need to implement the queuing up of requests. This is where 
> you have options, and core.async is one of them. I do agree with the 
> advice of not using core.async unless simpler tools don't work, so I 
> will start with a simpler tool: Future, plus a global atom holding a 
> map from promise GUID to the request's Future.
>
> 9) So you create a global atom, which contains a map from GUID -> 
> Future.
>
> 10) On every request to S1, you create a new GUID and Future, and you 
> swap! assoc the GUID with the Future.
>
> 11) The Future is your request handler. So in it, you synchronously 
> handle the request, whatever that means for you. Maybe you do some 
> processing, then you call S2, and then you loop: every 100ms in the 
> loop, you call get-s2-result, until it returns an error or a result. 
> Every time you loop, you check that the time elapsed since you started 
> looping is not more than some timeout X, so that you don't loop 
> forever. If you eventually get a result or an error, you handle them 
> however you need to, and eventually your Future itself returns a 
> result or an error. It's important that you design the Future's task 
> to time out eventually, so that you don't leak Futures stuck in 
> infinite loops; you must be able to deterministically know that the 
> Future will finish.
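>
> In code, a minimal sketch of steps 9 to 11 could look like this 
> (call-s2 and get-s2-result are stand-ins for your own S2 calls, I'm 
> assuming get-s2-result returns a map with a :status key, and the 
> 5000ms deadline is a number you'd tune):
>
> (defonce requests (atom {}))  ; GUID -> Future
>
> (defn handle-s1 [request]
>   (let [guid (str (java.util.UUID/randomUUID))]
>     (swap! requests assoc guid
>            (future
>              (let [s2-promise (call-s2 request)
>                    deadline   (+ (System/currentTimeMillis) 5000)]
>                (loop []
>                  (let [r (get-s2-result s2-promise)]
>                    (cond
>                      ;; Hard deadline so the Future always finishes.
>                      (> (System/currentTimeMillis) deadline)
>                      {:status :error :error :timeout}
>
>                      ;; S2 finished; propagate its result or error.
>                      (#{:done :error} (:status r))
>                      r
>
>                      ;; Not ready yet; wait 100ms and poll again.
>                      :else
>                      (do (Thread/sleep 100) (recur))))))))
>     guid))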
>
> 12) Now you implement get-s1-result. Whenever it is called, you get 
> the Future from the global atom map of Futures and call future-done? 
> on it. If that returns false, you return that the result is not 
> available yet. If it is done, you deref the Future, swap! dissoc its 
> map entry from the global atom, and return the result or error.
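>
> Continuing the sketch from step 11:
>
> (defn get-s1-result [guid]
>   (if-let [fut (get @requests guid)]
>     (if (future-done? fut)
>       (let [result @fut]                ; never blocks, it's done
>         (swap! requests dissoc guid)    ; clean up the map entry
>         result)
>       {:status :pending})
>     {:status :error :error :unknown-guid}))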
>
> The only danger of this approach is that the map of Futures is 
> unbounded. What keeps it in check is that clients can call S1 and 
> get-s1-result with at most 30 concurrent requests, because I assumed 
> your APIs are blocking and bounded on a shared fixed thread pool of 
> size 30.
>
> Now say it takes 1 second on average to process an S1 request, so your 
> Future finishes on average in 1 second, and you time them out at 5 
> seconds. Now take the worst case scenario: say S2 is down, so all 
> requests take the maximum of 5 seconds to be handled. Say your clients 
> are also maxing out your concurrency for S1, so you get a constant 30 
> concurrent requests, and S1 takes 100ms to return the promise. What 
> you get is this:
>
> * Every second, you are creating 300 Futures, because every 100ms you 
> process 30 new S1 requests (30 requests x 10 batches per second).
>
> So say we start at 0 Futures. One second later you have 300; five 
> seconds later you have 1500, but your first 300 time out, so you end 
> up with 1200. At the 6th second you have 1200 again, since 300 more 
> were queued and 300 more timed out. From this point on, you hover 
> around 1200 open Futures every second, with a peak of 1500 (300 
> Futures per second x a 5 second lifetime).
>
> Thus you need to make sure your host can handle 1500 open threads.
>
> Indirectly, this stabilizes because you made sure your Future tasks 
> time out at 5 seconds, and because your S1 API is itself bounded to 30 
> concurrent requests max.
>
> If you'd prefer not to rely on the bound on S1 requests, or you have a 
> hard time knowing the timings of your S1, you can keep track of the 
> count of queued Futures, and on a request to S1 where the count is 
> above your bound, return an error instead of a promise, asking the 
> client to wait a bit and retry the call shortly, when you have more 
> resources available.
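>
> A sketch of that guard, reusing the requests atom and handle-s1 from 
> the earlier sketch (the bound of 1500 comes from the worst case 
> above):
>
> (def max-queued 1500)
>
> (defn handle-s1-bounded [request]
>   ;; The count check is approximate under concurrency, which is fine
>   ;; for load shedding.
>   (if (>= (count @requests) max-queued)
>     {:status :error :error :busy}   ; tell the client to retry shortly
>     (handle-s1 request)))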
>
> I hope this helps.
>
> On Tuesday, 8 May 2018 13:45:00 UTC-7, Brjánn Ljótsson wrote:
>>
>> Hi!
>>
>> I'm writing a server-side (S1) function that initiates an action on 
>> another server (S2) and regularly checks if the action has finished 
>> (failed/completed). It should also be possible for a client to ask S1 for 
>> the status of the action performed by S2.
>>
>> My idea is to create a uid on S1 that represents the action, and the 
>> uid is returned to the client. S1 then asynchronously polls S2 for 
>> the action status and updates an atom with the uid as key and status 
>> as value. The client can then request the status of the uid from S1. 
>> Below is a link to my proof-of-concept code (without any code for the 
>> client requests or timeout guards) - it is my first try at writing 
>> code using core.async.
>>
>> https://gist.github.com/brjann/d80f1709b3c17ef10a4fc89ae693927f
>>
>> The code can be tested with, for example, (start-poll) or (repeatedly 
>> 10 start-poll) to see the behavior when multiple requests are made.
>>
>> The code seems to work, but is it a correct use of core.async? One thing 
>> I'm wondering is if the init-request and poll functions should use threads 
>> instead of go-blocks, since the http requests may take a few hundred 
>> milliseconds and many different requests (with different uids) could be 
>> made simultaneously. I've read that "long-running" tasks should not be put 
>> in go blocks. I haven't figured out how to use threads though.
>>
>> I would be thankful for any input!
>>
>> Best wishes,
>> Brjánn Ljótsson
>>
>
