Re: Multi cURL connect bug

Keyur Govande Mon, 08 Jul 2013 15:15:02 -0700

On Mon, Jul 8, 2013 at 3:24 PM, Dan Fandrich <d...@coneharvesters.com> wrote:
> On Mon, Jul 08, 2013 at 02:14:05PM -0400, Keyur Govande wrote:
>> I respectfully disagree. Asynchronous RPC without using a separate
>> thread is not that uncommon and is definitely not wrong. In C land
>> this would be perfectly reasonable to do:
>> open a non-blocking socket()
>> connect() with timeout. If successful {
>>    loop over write() until finished
>>    // do other stuff
>>    poll() on fd with timeout
>>    read() from fd
>> }
>> close()
>> // do more stuff
>>
>> The application I'm trying to do this in is PHP, so a separate thread
>> is not an option.
>
> You can do this form of communication with libcurl, but not exactly in the way
> you describe. Since libcurl handles the write and the read in the same
> function, the // do other stuff part has to be combined with the loop and poll
> sections before and afterward. It's not always easy to cleanly separate the
> write and read states, anyway, nor would you necessarily want to.  Consider a
> hypothetical RPC call with a slow DNS lookup so the connect state takes 5
> seconds, the write state that takes 100 msec, a processing state length of 10
> msec, and a multi-part response that takes 100 msec to return the first part,
> then a 10 second delay, then another 100 msec to return the second part.  The
> pseudo code above would give you only 10 msec out of 15,310 msec to "do other
> stuff".
>
> The libcurl-style would look something like this:
>
> start curl transfer()
> loop until transfer and stuff done
>    poll()
>    if readable_or_writeable
>      process curl transfer()
>    if stuff == 0
>      do stuff()
>    else if stuff == 1
>      do more stuff()
>    else if stuff == 2
>      do even more stuff()
>    stuff=stuff+1
>
> The difference is that the process curl transfer part is executed regularly,
> on every iteration through the loop, and not in separate write and read
> sections.
> libcurl isn't ever CPU-hungry and the process curl transfer part will never
> take more than msec (assuming a properly-configured libcurl), so out of the
> 15,000 msec this transfer takes, libcurl would give the app more like 15,300
> msec of time to do stuff (instead of 10 msec in the previous example).
>
> You could of course hide the libcurl things in a couple of functions,
> so the code could look more like:
>
> start rpc()
> do stuff()
> is_rpc_done()  // ignore the result--we're not ready for the rpc to be done
> do more stuff()
> is_rpc_done()
> do even more stuff()
> while not is_rpc_done()
>   just wait
>
> The trick is keeping the time between calling libcurl down. This could be done
> by splitting the stuff to do into small enough segments to give libcurl enough
> opportunities and low latency to do the transfer, or finding regular times
> while doing stuff to call is_rpc_done() to give libcurl a chance to work, or
> by calling libcurl in a loop with a short timeout in between doing stuff.
>


Thanks Dan for the detailed response.

I agree that, where feasible, what you suggested would work well. But
in most large pre-existing code bases there is no good way to keep
calling cURL functions once you are a few class hierarchies deep, and
passing the cURL object all the way down to do so seems leaky and
would involve a lot of changes.

I realize libcurl is not the answer to every problem :-) What led me
down this road was this statement in the Objectives section of
http://curl.haxx.se/libcurl/c/libcurl-multi.html: Enable a "pull"
interface. The application that uses libcurl decides where and when to
ask libcurl to get/send data. I read that as: ask libcurl to send
first, and then the app asks it to receive when the app is ready.

To take your example, a more conventional RPC would look like:
1) DNS lookup + connect in 10ms
2) send request in 20ms
(external service takes 200ms to respond)
3) poll() with timeout. If timeout, assume RPC failed and move on.
4) If #3 succeeded, receive response in 20ms
5) close connection

The goal is to run other code after step 2 and before step 3, while
the external service is still processing.
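
A minimal sketch of that flow with plain sockets, in Python rather than
PHP or libcurl. Everything named here is an assumption for illustration
only: slow_service() is a hypothetical stand-in for the external service,
rpc_with_deferred_read() is a made-up helper, and socket.socketpair()
replaces the DNS lookup and connect of step 1.

```python
import select
import socket
import threading
import time


def slow_service(conn, delay):
    """Hypothetical external service: read a request, wait, then reply."""
    request = conn.recv(1024)
    time.sleep(delay)  # simulated processing time on the remote side
    try:
        conn.sendall(b"reply:" + request)
    except OSError:
        pass  # the client gave up and closed its end before the reply
    conn.close()


def rpc_with_deferred_read(delay, timeout):
    # Step 1 is replaced by an already-connected local socket pair.
    client, server = socket.socketpair()
    worker = threading.Thread(target=slow_service, args=(server, delay))
    worker.start()

    client.sendall(b"ping")  # step 2: send the whole request

    # Unrelated application work, done while the service is still busy.
    other_work = sum(range(1000))

    # Step 3: wait until the socket is readable, or give up after `timeout`.
    readable, _, _ = select.select([client], [], [], timeout)

    # Step 4: read the response only if the wait succeeded.
    response = client.recv(1024) if readable else None

    client.close()  # step 5
    worker.join()
    return other_work, response
```

Calling rpc_with_deferred_read(0.05, 2.0) returns the reply, while a
call such as rpc_with_deferred_read(0.3, 0.01) hits the timeout in step
3 and returns None for the response, which is the "assume RPC failed
and move on" case.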

To solve my immediate problem, where connect() takes long(er), and
still use libcurl, the solution seems to be to pass in an
already-connected socket file descriptor using
CURLOPT_OPENSOCKETFUNCTION and CURLOPT_SOCKOPTFUNCTION, and to assume
that once curl_multi_perform() returns CURLM_OK, the entire request
has been flushed out. I would then call curl_multi_wait() when I'm
ready to receive the response. The other option is to use a different
library or raw sockets to do exactly what I need.
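
For the connect step on its own, here is a rough sketch of a
non-blocking connect with a timeout, again in Python and assuming a
POSIX system. connect_with_timeout() is a hypothetical helper, not a
libcurl call, and handing the resulting descriptor to libcurl through
the open-socket callback is not shown. Passing an IP address is assumed,
since a host name would trigger a blocking DNS lookup.

```python
import errno
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Return a connected TCP socket, or None on failure or timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)

    # On a non-blocking socket, connect usually reports "in progress".
    err = sock.connect_ex((host, port))
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None

    # The connect attempt has finished once the socket becomes writable.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        sock.close()  # timed out
        return None

    # SO_ERROR tells us whether the finished attempt actually succeeded.
    if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
        sock.close()
        return None
    return sock
```

The returned socket is still in non-blocking mode, so a caller would
either keep polling it or switch it back with setblocking(True).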
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
