Will GetOverlappedResult work as you describe on an overlapped call that is still in progress? I recall trying this exact thing a few years ago and did not get the results I expected. Perhaps I gave up too soon.
with, check the value of GetOverlappedResult() for a 'sane' value (this can be the old MAX_SEGMENT_SIZE), and if the number of bytes transmitted since the last call to GetOverlappedResult() is smaller than MAX_SEGMENT_SIZE, abort the loop and fall out of the sendfile(), as the user is not receiving (quickly) enough.
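To make the proposed check concrete, here is a rough sketch of only the stall decision, kept free of Win32 calls so it can be tried anywhere. The helper names and the idea of feeding it a running byte total are assumptions for illustration; whether GetOverlappedResult() actually reports such a total for a call still in flight is exactly the open question above. The documented behavior with bWait set to FALSE on a pending operation is a FALSE return with ERROR_IO_INCOMPLETE.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 when fewer than `threshold` bytes were
 * sent between two successive polls, 0 otherwise. `threshold` stands in
 * for MAX_SEGMENT_SIZE; its actual value is not given in this thread.
 */
static int transfer_stalled(size_t prev_total, size_t curr_total,
                            size_t threshold)
{
    assert(threshold > 0);

    /* Guard against a counter that went backwards: treat it as no progress. */
    size_t delta = (curr_total >= prev_total) ? curr_total - prev_total : 0;

    return delta < threshold;
}

/*
 * Hypothetical driver: walks a series of running totals (one per poll)
 * and returns the index of the first poll at which the transfer looks
 * stalled, or `n` if it never does. The first poll is compared against
 * a starting total of zero.
 */
static size_t first_stalled_poll(const size_t *totals, size_t n,
                                 size_t threshold)
{
    size_t prev = 0;

    for (size_t i = 0; i < n; i++) {
        if (transfer_stalled(prev, totals[i], threshold))
            return i;   /* caller would abort the loop here */
        prev = totals[i];
    }
    return n;
}
```

In a real loop the totals would come from whatever progress counter turns out to be reliable, and the point where first_stalled_poll() returns early is where the sendfile() would be abandoned.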
I did also test with doubling and even quadrupling the blocksize, but unfortunately the issue remained.
I am not surprised. System calls on Windows are -very- expensive, and I agree it would be good to do the Tf in a single call.
I don't have a lot of time to work on this, but I would review and do some testing on any patches you care to submit.
Bill