> Oh right, that's a fun way to try stuff.

My first intention was to use fcurl with the very classic code (error checking omitted):

    while(!fcurl_eof(fcurl)) {
      sz = fcurl_read(fcurl, buf, 1, BUF_SIZE);
      write(1, buf, sz);
    }

But since fcurl seems bugged, it was quicker to just add the memcpy, which is more or less what fcurl would do. So maybe you see the link I made between a "read-like" interface and "zero copy": what is the point of having the code above, which is debatably simpler than the callback code, if it adds a memcpy of the full transfer in the process? That is why I assumed (wrongly) that there might also have been some work in progress towards "caller provided buffers", to make the read-like interface on par with callbacks performance-wise, and that we could also use this feature even without fcurl.

> But why measure user time? We want to see the performance impact as a whole, right? How much extra time does that memcpy() delay a 10GB transfer. Then we measure wall clock time.

I beg to differ, Daniel: that is the wrong tool if the goal is to measure the impact of a change on libcurl + callback (like adding an extra memcpy). Are you familiar with the PERT diagrams used in project management, and with the critical path? What you are measuring is the length of the critical path in YOUR situation, and checking that the added code does not change the critical path. You are not measuring the individual task itself.

In such a test you have 3 tasks: input, compute, output (that's very general to all programs in fact!) and they all run pretty much in parallel in modern OSes (cache, readahead, writeback...). So what you measure with "real/elapsed" is that in your case "compute" (libcurl + callback) is not in the critical path, in either the original or the SLOW version. That's an interesting measure (user perceived time in your situation), but it tells nothing about the "cost" of a memcpy until you hit the critical path.

In the measures we both did so far, the critical path is most probably "input". But there are cases that are not at all "exotic" where the critical path is NOT input, and becomes "compute".

On my Raspberry Pi 4, "compute" becomes the critical path whenever I run a transfer over https, because the Pi cannot do crypto fast enough to sustain 1Gbps (ethernet speed). So now, since libcurl + callback is in the critical path, any code you add either in libcurl or in the callback, on top of showing in "user time", directly impacts real/elapsed. I am currently running the tests and will add them to the "benchmark".

So yes, if you want to measure something that makes sense, to see whether the library works as well as before when adding some code, you have to use "user time". Otherwise you might or might not see a difference, depending on whether you are on the critical path! With a real network measure (not localhost!), "user time" is also more stable because it depends less on possible server slowdown or other traffic on your PC or internet link.

And the answer is: a single memcpy adds between 1.4 and 2 seconds for a 10GB transfer on a RPi4 (direct impact from user time to elapsed when using TLS). To give a sense of scale, that is more than 15% of the time taken by the library + callback to do a simple GET (no TLS) on the same file, so quite a huge impact in fact!

> what's the performance gain for the user who'd want to use this, and can the API be done in a way that makes this feature practical/attractive.

For the sake of concision (this response is already too long!) I suggest doing another post: "What a fuse-driver programmer would like from libcurl". Unlike "application" programs using libcurl, a "fuse driver", although technically "userland", is quasi-kernel code. So here performance is quite critical, and you can mess up the system badly when the driver becomes sluggish. Being OK with "we are not in the critical path" is really not what you want in a fuse driver! ;-)

May I have some time to come up with a relevant response to that question? I have in mind some possible proposals that could help without "breaking" the API as much as "caller provided buffers" would.
I still have to refine them a bit.

Rest assured, I also have plenty of bigger optimisations than 'memcpy' in my own driver... even bugs, like a known race condition I should fix!

Cheers
Alain