Re: read() slower than write() on the same disk

Jim Luther Thu, 06 Nov 2014 08:12:40 -0800

The fs_usage command line tool will show you a lot. In particular, look at the 
TIME INTERVAL(W) column.
 
> On Nov 6, 2014, at 6:42 AM, Dragan Milić <[email protected]> wrote:
> 
> Hi everyone,
> 
> This is rather long message, but I tried to describe my problem in details so 
> that it may be easier to provide explanation and/or solution.
> 
> I’m trying to optimise file copy performance of a file manager application. 
> The copy engine developed for this application is based on copyfile() POSIX 
> API. The previous engine was based File Manager FSCopyObjectAsync() API, but 
> we ditched it because it’s been deprecated since OS X 10.8 and we wanted to 
> implement deep folder merge functionality, which was practically impossible 
> with the old API. We’ve chosen for copyfile() because it provides good 
> foundation for folder merging, it has callbacks (to report copying progress) 
> and is recommended by Apple Documentation (File Manager Reference, 
> recommended replacements for deprecated functions).
> 
> Since the new engine is in use, some users started reporting slower copying 
> of big files in certain situations (depending on source and target disks and 
> similar), so I decided to see what can be improved. Since a copying of a 
> single big file is in question, it was clear the engine spent most or its 
> time in the copyfile() function itself and in its part dealing particularly 
> with file read/write. Fortunately is copyfile() part of Darwin and hence open 
> sourced, so I could peek into it. This is the part dealing with file 
> read/write (I did some re-indentation myself to make it more readable):
> 
> 
> while ((nread = read(s->src_fd, bp, blen)) > 0)
> {
>     ssize_t nwritten;
>     size_t left = nread;
>     void *ptr = bp;
>     int loop = 0;
> 
>     while (left > 0)
>     {
>         nwritten = write(s->dst_fd, ptr, MIN(left, oBlocksize));
>         switch (nwritten)
>         {
>             case 0:
>                 if (++loop > 5)
>                 {
>                     copyfile_warn("writing to output %d times resulted in 0 
> bytes written", loop);
>                     ret = -1;
>                     s->err = EAGAIN;
>                     goto exit;
>                 }
>                 break;
>             case -1:
>                 copyfile_warn("writing to output file got error");
>                 if (status)
>                 {
>                     int rv = (*status)(COPYFILE_COPY_DATA, COPYFILE_ERR, s, 
> s->src, s->dst, s->ctx);
>                     if (rv == COPYFILE_SKIP)
>                     {
>                         ret = 0;
>                         goto exit;
>                     }
>                     if (rv == COPYFILE_CONTINUE)
>                     {
>                         errno = 0;
>                         continue;
>                     }
>                 }
>                 ret = -1;
>                 goto exit;
>             default:
>                 left -= nwritten;
>                 ptr = ((char*)ptr) + nwritten;
>                 loop = 0;
>                 break;
>         }
>         s->totalCopied += nwritten;
>         if (status)
>         {
>             int rv = (*status)(COPYFILE_COPY_DATA, COPYFILE_PROGRESS,  s, 
> s->src, s->dst, s->ctx);
>             if (rv == COPYFILE_QUIT)
>             {
>                 ret = -1; s->err = errno = ECANCELED;
>                 goto exit;
>             }
>         }
>     }
> }
> 
> 
> It’s immediately obvious it’s not optimal; reading of the source file and 
> writing into the destination file are done sequentially in a single thread. 
> Every write() has to wait for its read() counterpart. If, for example, we 
> have a file which needs 10 cycles of read/write to copy (assuming read and 
> write buffer are of the same size) and assuming both source and destination 
> are on the same disk, thus assuming reading is faster and takes 1 seconds, 
> while writing takes 2 seconds, the whole file would be copied in 30 seconds 
> (10 cycles, 1 sec read and 2 sec write in each cycle). Separating reading and 
> writing in threads, and again assuming reading is faster than writing for 
> source and destination on the same disk, only the first write() would have to 
> wait for the first read() to complete (1 second), all other writing cycles 
> would have their data ready and would start immediately after the previous 
> cycle has finished. The file would be copied in 21 seconds (theoretically), 
> 30% faster.
> 
> Without doing any profiling and measuring, I implemented the threaded 
> approach, only to discover there weren’t any speed gains. On the contrary, in 
> some situations it even went slower. So, I looked deeper in threads 
> execution, only to discover reading and writing were still done sequentially 
> (only in different threads); each write() still had to wait for its companion 
> read() to complete, implying that read() is slower than write() on the same 
> disk. I was very surprised by that fact and still believed there’s problems 
> with my implementation, so this time I went to do some profiling.
> 
> I created a very simple command line utility. It’s complete main() is as 
> follows:
> 
> #include <stdio.h>
> #include <copyfile.h>
> 
> int main(int argc, const char * argv[])
> {
>     char *src = "/Users/milke/Downloads/smalltest.zip";
>     char *dst = "/Users/milke/Desktop/smalltest.zip";
>     
>     copyfile_state_t state = copyfile_state_alloc();
>     copyfile_flags_t flags = COPYFILE_ALL | COPYFILE_NOFOLLOW;
> 
>     return copyfile(src, dst, state, flags);
> }
> 
> 
> The test file “smalltest.zip” was exactly 256MB (286,435,456 bytes) big, with 
> random bytes arrangement (thus avoiding any eventual read/write caching that 
> the system may be doing and that I don’t even know about). The source 
> disk/volume is also the destination one; it’s Apple's SATA SSD, model APPLE 
> SSD TS512C, build into 2011 MacBook Pro. I run the utility in Instruments > 
> Time Profiler and it actually showed reading taking longer than writing!! The 
> percentage were always different, but it ranged from read=52% against 
> write=42%, to one extreme of read=65% against write=30%.
> 
> After this, I took copyfile.c and modified it to measure and print times of 
> each read() and write() call. I calculated average values, rejecting minimums 
> and maximums (but recording them). I also chose for same read and write 
> buffer size of 16MB (original copyfile.c consults statfs() to get optimal 
> sizes for read and write on a volume). Hence, my test file is copied in 16 
> read/write calls. Result are always different (understandably), but in 
> general they are like shown in the following list:
> 
> 
> 1: read 16777216 bytes in 0.087021 seconds
> 1: written 16777216 bytes in 0.073020 seconds
> 2: read 16777216 bytes in 0.081063 seconds
> 2: written 16777216 bytes in 0.074134 seconds
> 3: read 16777216 bytes in 0.082879 seconds
> 3: written 16777216 bytes in 0.076724 seconds
> 4: read 16777216 bytes in 0.188362 seconds
> 4: written 16777216 bytes in 0.075116 seconds
> 5: read 16777216 bytes in 0.193575 seconds
> 5: written 16777216 bytes in 0.074075 seconds
> 6: read 16777216 bytes in 0.084868 seconds
> 6: written 16777216 bytes in 0.077992 seconds
> 7: read 16777216 bytes in 0.277805 seconds
> 7: written 16777216 bytes in 0.076839 seconds
> 8: read 16777216 bytes in 0.087295 seconds
> 8: written 16777216 bytes in 0.071497 seconds
> 9: read 16777216 bytes in 0.270342 seconds
> 9: written 16777216 bytes in 0.072851 seconds
> 10: read 16777216 bytes in 0.089179 seconds
> 10: written 16777216 bytes in 0.076859 seconds
> 11: read 16777216 bytes in 0.280369 seconds
> 11: written 16777216 bytes in 0.077752 seconds
> 12: read 16777216 bytes in 0.089682 seconds
> 12: written 16777216 bytes in 0.073565 seconds
> 13: read 16777216 bytes in 0.266644 seconds
> 13: written 16777216 bytes in 0.071825 seconds
> 14: read 16777216 bytes in 0.092464 seconds
> 14: written 16777216 bytes in 0.076359 seconds
> 15: read 16777216 bytes in 0.281460 seconds
> 15: written 16777216 bytes in 0.074397 seconds
> 16: read 16777216 bytes in 0.090654 seconds
> 16: written 16777216 bytes in 0.072744 seconds
> +++!!!+++ average read cycle time = 0.155796 seconds (minimal = 0.081063, 
> maximal = 0.281460)
> +++!!!+++ average write cycle time = 0.074733 seconds (minimal = 0.071497, 
> maximal = 0.077992)
> 
> 
> This shows average reading being almost 100% slower than average writing on 
> the same disk. Also, all writing times are similar, while reading shows much 
> more discrepancy. If the destination is another  disk/volume, reading times 
> reduce for about 50%. 
> 
> One more note: I repeated this test many (hundreds of) times, starting the 
> utility again, or copying a single source file into 10 different copies and 
> results are pretty much similar. But in a few occasions, I got surprisingly 
> different results, like in the following list:
> 
> 
> 1: read 16777216 bytes in 0.013782 seconds
> 1: written 16777216 bytes in 0.069654 seconds
> 2: read 16777216 bytes in 0.003336 seconds
> 2: written 16777216 bytes in 0.072545 seconds
> 3: read 16777216 bytes in 0.003111 seconds
> 3: written 16777216 bytes in 0.075811 seconds
> 4: read 16777216 bytes in 0.003093 seconds
> 4: written 16777216 bytes in 0.077601 seconds
> 5: read 16777216 bytes in 0.003052 seconds
> 5: written 16777216 bytes in 0.092845 seconds
> 6: read 16777216 bytes in 0.003200 seconds
> 6: written 16777216 bytes in 0.078246 seconds
> 7: read 16777216 bytes in 0.003082 seconds
> 7: written 16777216 bytes in 0.093462 seconds
> 8: read 16777216 bytes in 0.003148 seconds
> 8: written 16777216 bytes in 0.074416 seconds
> 9: read 16777216 bytes in 0.003856 seconds
> 9: written 16777216 bytes in 0.088351 seconds
> 10: read 16777216 bytes in 0.003113 seconds
> 10: written 16777216 bytes in 0.076054 seconds
> 11: read 16777216 bytes in 0.003131 seconds
> 11: written 16777216 bytes in 0.082186 seconds
> 12: read 16777216 bytes in 0.003122 seconds
> 12: written 16777216 bytes in 0.077315 seconds
> 13: read 16777216 bytes in 0.003094 seconds
> 13: written 16777216 bytes in 0.079402 seconds
> 14: read 16777216 bytes in 0.003137 seconds
> 14: written 16777216 bytes in 0.076901 seconds
> 15: read 16777216 bytes in 0.004836 seconds
> 15: written 16777216 bytes in 0.103139 seconds
> 16: read 16777216 bytes in 0.003169 seconds
> 16: written 16777216 bytes in 0.073829 seconds
> +++!!!+++ average read cycle time = 0.003316 seconds (minimal = 0.003052, 
> maximal = 0.013782)
> +++!!!+++ average write cycle time = 0.079926 seconds (minimal = 0.069654, 
> maximal = 0.103139)
> 
> 
> I’ve got no slightest idea what happen in those few cases and if I did 
> anything special to get them (I don’t think so, I was just repeating the test 
> over and over), but they basically match what I expected before doing all 
> testing and measuring: reading being significantly faster that writing and no 
> much discrepancy between different read() calls.
> 
> As I’m not a file system expert, I’d really appreciate if anyone can shed 
> some light on the results I got. I’d also appreciate if anyone can recommend 
> more efficient API for reading and writing files, especially in threaded 
> environment. Would it be better to use Dispatch I/O Channel API in such 
> situation?
> 
> Thanks a lot for eventual help.
> 
> -- Dragan
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Filesystem-dev mailing list      ([email protected])
> Help/Unsubscribe/Update your Subscription:
> https://lists.apple.com/mailman/options/filesystem-dev/luther.j%40apple.com
> 
> This email sent to [email protected]

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Filesystem-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/filesystem-dev/archive%40mail-archive.com

This email sent to [email protected]

Re: read() slower than write() on the same disk

Reply via email to