On 18-05-2011 16:53, Andrei Alexandrescu wrote:
On 5/18/11 6:07 AM, Jonas Drewsen wrote:
Select will wait for data to be ready and ask curl to handle the data
chunk. Curl in turn calls back to a registered callback handler with the
data read. That handler fills the buffer provided by the user. If not
enough data has been received, a new select is performed until the
requested amount of data is read. Then the blocking method can return.
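In rough code, the loop could look like the sketch below, assuming the etc.c.curl bindings. BlockingReader is a made-up name, and the handle setup (writefunction/writedata, curl_multi_add_handle) is omitted; this is an illustration of the select-and-refill idea, not the wrapper's actual code.

import etc.c.curl;
import core.sys.posix.sys.select;
import core.sys.posix.sys.time : timeval;

struct BlockingReader
{
    CURLM* multi;      // multi handle with one easy handle already added
    ubyte[] dest;      // buffer provided by the user
    size_t filled;     // bytes written so far

    // libcurl write callback: copy received bytes into the user's buffer.
    // A real implementation would park any overflow for the next read
    // instead of dropping it as this sketch does.
    extern (C) static size_t onData(char* ptr, size_t size, size_t nmemb,
            void* self)
    {
        auto r = cast(BlockingReader*) self;
        immutable n = size * nmemb;
        immutable room = r.dest.length - r.filled;
        immutable take = n < room ? n : room;
        r.dest[r.filled .. r.filled + take] = cast(ubyte[]) ptr[0 .. take];
        r.filled += take;
        return n; // tell curl everything was consumed
    }

    // Block until dest is full or the transfer ends: select on curl's
    // file descriptors, let curl process whatever became ready, repeat.
    void fill()
    {
        int running = 1;
        while (filled < dest.length && running)
        {
            fd_set rd, wr, ex;
            FD_ZERO(&rd); FD_ZERO(&wr); FD_ZERO(&ex);
            int maxfd = -1;
            curl_multi_fdset(multi, &rd, &wr, &ex, &maxfd);
            auto tv = timeval(1, 0); // fallback timeout of one second
            if (maxfd >= 0)
                select(maxfd + 1, &rd, &wr, &ex, &tv);
            curl_multi_perform(multi, &running);
        }
    }
}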

Perhaps this would be too complicated. In any case the core
functionality must receive top attention. And the core functionality is
streaming.

Currently there are two proposed ways to stream data from an HTTP
address: (a) by using the onReceive callback, and (b) by using
byLine/byChunk. If either of these performs slower than the
best-of-breed streaming using libcurl, we have failed.
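Concretely, the two styles look roughly like this. The Http type, the onReceive property, the byChunk signature, and the std.net.curl module name are assumptions based on this thread and may not match the reviewed wrapper exactly.

import std.net.curl;
import std.stdio;

void main()
{
    // (a) callback style: libcurl pushes each chunk into our delegate,
    // and the two sides block each other around the call.
    auto http = Http("http://dlang.org/");
    http.onReceive = (ubyte[] data) {
        stdout.rawWrite(data);
        return data.length; // report everything as consumed
    };
    http.perform();

    // (b) range style: we pull fixed-size chunks at our own pace.
    foreach (chunk; byChunk("http://dlang.org/", 16 * 1024))
        stdout.rawWrite(chunk);
}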

The onReceive method is not particularly appealing because the client
and libcurl block each other: the client is blocked while libcurl is
waiting for data, and the client blocks libcurl while inside the
callback. (Please correct me if I'm wrong.)

To make byLine/byChunk fast, the basic setup should include a hidden
thread that does the download in separation from the client's thread.
There should be K buffers allocated (K = 2 to e.g. 10), and a simple
protocol for passing the buffers back and forth between the client
thread and the hidden thread. That way, in the quiescent state, there is
no memory allocation and either both client and libcurl are busy doing
work, or one is much slower than the other, which waits.

The same mechanism should be used in byChunkAsync or byFileAsync.
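A minimal sketch of that buffer-recycling protocol over std.concurrency follows. The chunk count and the "download" are stand-ins; a real implementation would fill the buffers from libcurl's write callback. Buffers cross the thread boundary as immutable and are cast back on the other side, which is sound only because ownership is handed over, never shared.

import std.concurrency;

enum K = 4;                   // buffers in flight
enum chunkSize = 64 * 1024;

// Hidden thread: owns a pool of K buffers, fills one at a time, and
// sends it to the client; blocks for a recycled buffer when the pool
// runs dry. In the quiescent state nothing is allocated.
void downloader(Tid client)
{
    ubyte[][] pool;
    foreach (i; 0 .. K)
        pool ~= new ubyte[chunkSize];

    foreach (i; 0 .. 100)     // pretend download of 100 chunks
    {
        if (pool.length == 0) // dry: wait for the client to recycle one
            pool ~= cast(ubyte[]) receiveOnly!(immutable(ubyte)[])();
        auto buf = pool[$ - 1];
        pool = pool[0 .. $ - 1];
        buf[] = cast(ubyte) i;                     // "download" a chunk
        client.send(cast(immutable(ubyte)[]) buf); // hand it over
    }
    client.send(true);        // transfer finished
}

void main()
{
    auto worker = spawn(&downloader, thisTid);
    bool done;
    while (!done)
    {
        receive(
            (immutable(ubyte)[] chunk) {
                // ... process chunk here (byChunk's front/popFront) ...
                worker.send(chunk); // recycle the buffer for reuse
            },
            (bool) { done = true; });
    }
}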

If byChunk is using a hidden thread to download into buffers, then how does it differ from the byChunkAsync that you mention?

The current curl wrapper actually does the hidden-thread trick (based on a hint you gave me a while ago). It does not reuse buffers, because I thought all data had to be immutable or passed by value to go through the message-passing system. I'll fix this, since it is a good place to do some type casting to allow passing the buffers back for reuse.
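In isolation, that cast trick might look like this (the helper names are made up; the soundness hinges entirely on the sender never touching the buffer again after the send):

import std.concurrency;

// std.concurrency only passes immutable, shared, or by-value data, so a
// reusable mutable buffer is laundered through immutable on the way over.
void sendBuffer(Tid to, ubyte[] buf)
{
    to.send(cast(immutable(ubyte)[]) buf); // relinquish ownership
}

ubyte[] receiveBuffer()
{
    return cast(ubyte[]) receiveOnly!(immutable(ubyte)[])(); // reclaim it
}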

I think that we have to consider the context of the streaming before we can tell what the best solution is. I do not have any numbers to back the following up, but this is how I see it:

If the data being read is going to be processed in some way (e.g. compressed), it is most likely a benefit to spawn a thread to handle the data buffering.

If no processing is done (e.g. a simple copy from net to disk), I believe keeping things in the same thread and simply selecting on the file descriptors (disk or net) is fastest. That way no message passing or context switching takes place to cause overhead.

libcurl can give you access to the file descriptors for this exact purpose, but it does have some drawbacks: you are not in control of the buffers used by libcurl. This means that when reading from one curl connection and sending on another, you would have to copy the data. libcurl does in fact provide even simpler methods where you can provide your own buffers for reads/writes. Unfortunately, this is only supported for HTTP, and a lot of the convenience features such as redirections are lost. The more you want to control to get the last drop of performance, the more you have to handle manually yourself.
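For a single transfer, even the plain easy interface keeps everything on one thread; the fd/select route above generalizes this to many concurrent transfers. A sketch of the single-transfer case, assuming the etc.c.curl bindings (error checking omitted):

import etc.c.curl;
import std.stdio;

// curl drives the socket and the write callback drives the disk, so
// there is no second thread, no message passing, no context switch.
extern (C) size_t toFile(char* ptr, size_t size, size_t nmemb, void* userp)
{
    auto file = cast(File*) userp;
    file.rawWrite(ptr[0 .. size * nmemb]);
    return size * nmemb;
}

void main()
{
    auto sink = File("index.html", "wb");
    auto h = curl_easy_init();
    curl_easy_setopt(h, CurlOption.url, "http://dlang.org/".ptr);
    curl_easy_setopt(h, CurlOption.writefunction, &toFile);
    curl_easy_setopt(h, CurlOption.file, cast(void*) &sink); // CURLOPT_WRITEDATA
    curl_easy_perform(h); // blocks in this thread until the transfer is done
    curl_easy_cleanup(h);
}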

In my opinion, providing the performance of the standard libcurl API in the D wrapper is the way to go (as done in the current curl wrapper). Generic and efficient streaming across protocols is best done in std.net, where buffers can be handled entirely in D. I know this is not a small task, which is why I started out by wrapping libcurl.

Thanks
Jonas