Re: Finding slowdowns in pkg_install (continuations of previous threads)
Garrett Cooper wrote: Garrett Cooper wrote: Tim Kientzle wrote: -I tried ... buffering ... the +CONTENTS file parsing function, and the majority of the time it yielded good results One approach I prototyped sometime back was to use libarchive in pkg_add as follows: * Open the archive * Read +CONTENTS directly into memory (it's guaranteed to always be first in the archive) * Parse all of +CONTENTS at once * Continue scanning the archive, disposing of each file as it appears in the archive. Based on my experience with this, I would suggest you just read all of +CONTENTS directly into memory at once and parse the whole thing in a single shot. fopen(), then fstat() to get the size, then allocate a buffer and read the whole thing, then fclose(). You can then parse it all at once. As a bonus, your parser then becomes a nice little bit of reusable code that reads a block of memory and returns a structure describing the package metadata. Tim Kientzle I'm not 100% sure because I'm not comparing apples (virtual disk on desktop via VMware) to apples (real disk on server), but I'm showing a 2.5-fold speedup after adding the simple parser: Virtual disk: 4.42 real 1.37 user 1.47 sys Real disk: 10.26 real 5.36 user 0.99 sys I'll run a battery of tests just to confirm whether that's the case. Be back with results in a few more days. -Garrett Hello, As promised, here are some results for my work: By modifying the parser and heuristics in plist_cmd I appear to have decreased all figures (except plist_cmd, which I will note later) from their original values to much lower values. The only drawback is that I appear to have triggered a bug with either malloc'ing memory, printf/varargs, or transferring large amounts of data via pipes, where some of my debug messages are making it into plist_cmd(..) from obtainbymatch(..), which accounts for the 3-fold increase in reported plist_cmd(..) iterations.
I'm going to try replacing the debug commands with standard print statements wherever possible, then replace all tar commands with libarchive APIs, and see if the problem solves itself. Notes: 1. This sample is based on x11-libs/atk. 2. It isn't the final set of results. 3. Graphs coming soon (need to simulate values in Excel on work machine and convert to screenshots later on when I have a break -- thinking around noon). I'll repost when I have them available. 4. CSV files available at: http://students.washington.edu/youshi10/posted/atk-results.tgz. I've posted HTML results of the interpreted spreadsheet on http://students.washington.edu/posted/atk.htm. I'll provide commentary tomorrow after I get some sleep. -Garrett ___ freebsd-ports@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-ports To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Garrett Cooper wrote on Sat, 14 Jul 2007, at 04:04 -0700: I've posted HTML results of the interpreted spreadsheet on http://students.washington.edu/posted/atk.htm. I'll provide commentary tomorrow after I get some sleep. Nothing on that URL. -- Pav Lucistnik [EMAIL PROTECTED] [EMAIL PROTECTED] A computer programmer is a device for turning requirements into undocumented features. It runs on cola, pizza and Dilbert cartoons. -- Bram Moolenaar signature.asc Description: This is a digitally signed part of the message
Re: Finding slowdowns in pkg_install (continuations of previous threads)
4. CSV files available at: http://students.washington.edu/youshi10/posted/atk-results.tgz. I've posted HTML results of the interpreted spreadsheet on http://students.washington.edu/posted/atk.htm. I'll provide commentary tomorrow after I get some sleep. I think the second one should be: http://students.washington.edu/youshi10/posted/atk.htm Unfortunately, I get Permission Denied here for both of those URLs. Tim Kientzle
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Tim Kientzle wrote: 4. CSV files available at: http://students.washington.edu/youshi10/posted/atk-results.tgz. I've posted HTML results of the interpreted spreadsheet on http://students.washington.edu/posted/atk.htm. I'll provide commentary tomorrow after I get some sleep. I think the second one should be: http://students.washington.edu/youshi10/posted/atk.htm Unfortunately, I get Permission Denied here for both of those URLs. Tim Kientzle About files: Sorry about that -- that's what I get for staying up until 4:30 am and making email posts. I've condensed all of the files into: http://students.washington.edu/youshi10/posted/atk-results.tgz and fixed the permissions for the files. The following blog post has all of my commentary on the results I have: http://blogs.freebsdish.org/gcooper/2007/07/14/modifications-to-pkg_install-the-positive-and-negative-implications/. Let me know if you have any questions or comments. Now to go off and solve that bug :). -Garrett
Re: Finding slowdowns in pkg_install (continuations of previous threads)
The following blog post has all of my commentary on the results I have: http://blogs.freebsdish.org/gcooper/2007/07/14/modifications-to-pkg_install-the-positive-and-negative-implications/. I tried to unroll strcmp a bit by checking for the first character of the command, then run strcmp ... There's a somewhat more straightforward optimization that relies on this same idea: switch (cmd[0]) { case 'c': /* Commands that start with 'c' */ if (strcmp(cmd, "cwd") == 0) return (CMD_CWD); /* FALLTHROUGH */ case 'd': /* Commands that start with 'd' */ etc /* FALLTHROUGH */ default: /* Unrecognized command. */ } This is a little cleaner and easier to read and may even be faster than the code you presented in your blog. Note that the fall-through ensures that all unrecognized commands end up at the same place. If unrecognized commands are very rare (they should be), then the fall-through is not a performance issue. /** malloc buffer large enough to hold +CONTENTS **/ while (!feof(file_p)) { /** add content via fgetc **/ } Yuck. Try this instead: struct stat st; int fd; char *buff; fd = open(file, O_RDONLY); fstat(fd, &st); buff = malloc(st.st_size + 1); read(fd, buff, st.st_size); buff[st.st_size] = '\0'; close(fd); Plus some error checking, of course. You can use stdio if you prefer: FILE *f; f = fopen(file, "r"); fstat(fileno(f), &st); buff = malloc(st.st_size + 1); fread(buff, 1, st.st_size, f); buff[st.st_size] = '\0'; fclose(f); Either way, this is a lot more efficient than tens of thousands of calls to fgetc(). Cheers, Tim Kientzle
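For illustration, the switch-on-first-character dispatch described above can be fleshed out into a small self-contained function. The command names and CMD_* constants here are hypothetical stand-ins for pkg_install's real plist keywords, not the actual table:

```c
#include <string.h>

/* Illustrative command codes -- stand-ins for pkg_install's real ones. */
enum cmd { CMD_CWD, CMD_CHMOD, CMD_DIRRM, CMD_UNKNOWN };

/* Dispatch on the first character, then confirm with strcmp().
 * Falling through to 'default' gives unrecognized commands a
 * single common exit, as described above. */
static enum cmd
plist_cmd(const char *cmd)
{
	switch (cmd[0]) {
	case 'c':			/* commands starting with 'c' */
		if (strcmp(cmd, "cwd") == 0)
			return (CMD_CWD);
		if (strcmp(cmd, "chmod") == 0)
			return (CMD_CHMOD);
		/* FALLTHROUGH */
	case 'd':			/* commands starting with 'd' */
		if (strcmp(cmd, "dirrm") == 0)
			return (CMD_DIRRM);
		/* FALLTHROUGH */
	default:
		return (CMD_UNKNOWN);
	}
}
```

The fall-through is safe because a command that enters the wrong case label can never match that label's strcmp() calls, so it drops straight to the default.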
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Tim Kientzle wrote: The following blog post has all of my commentary on the results I have: http://blogs.freebsdish.org/gcooper/2007/07/14/modifications-to-pkg_install-the-positive-and-negative-implications/. I tried to unroll strcmp a bit by checking for the first character of the command, then run strcmp ... There's a somewhat more straightforward optimization that relies on this same idea: switch (cmd[0]) { case 'c': /* Commands that start with 'c' */ if (strcmp(cmd, "cwd") == 0) return (CMD_CWD); /* FALLTHROUGH */ case 'd': /* Commands that start with 'd' */ etc /* FALLTHROUGH */ default: /* Unrecognized command. */ } This is a little cleaner and easier to read and may even be faster than the code you presented in your blog. Note that the fall-through ensures that all unrecognized commands end up at the same place. If unrecognized commands are very rare (they should be), then the fall-through is not a performance issue. /** malloc buffer large enough to hold +CONTENTS **/ while (!feof(file_p)) { /** add content via fgetc **/ } Yuck. Try this instead: struct stat st; int fd; char *buff; fd = open(file, O_RDONLY); fstat(fd, &st); buff = malloc(st.st_size + 1); read(fd, buff, st.st_size); buff[st.st_size] = '\0'; close(fd); Plus some error checking, of course. You can use stdio if you prefer: FILE *f; f = fopen(file, "r"); fstat(fileno(f), &st); buff = malloc(st.st_size + 1); fread(buff, 1, st.st_size, f); buff[st.st_size] = '\0'; fclose(f); Either way, this is a lot more efficient than tens of thousands of calls to fgetc(). Cheers, Tim Kientzle Tim, That was a very good call. I didn't even think of read(2) over fgetc(3). That decreased the overall time by 0.7 seconds in installing vim, which is just a little shy of a 10% speedup. -Garrett
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Garrett Cooper wrote: Tim Kientzle wrote: -I tried ... buffering ... the +CONTENTS file parsing function, and the majority of the time it yielded good results One approach I prototyped sometime back was to use libarchive in pkg_add as follows: * Open the archive * Read +CONTENTS directly into memory (it's guaranteed to always be first in the archive) * Parse all of +CONTENTS at once * Continue scanning the archive, disposing of each file as it appears in the archive. Based on my experience with this, I would suggest you just read all of +CONTENTS directly into memory at once and parse the whole thing in a single shot. fopen(), then fstat() to get the size, then allocate a buffer and read the whole thing, then fclose(). You can then parse it all at once. As a bonus, your parser then becomes a nice little bit of reusable code that reads a block of memory and returns a structure describing the package metadata. Tim Kientzle I'm not 100% sure because I'm not comparing apples (virtual disk on desktop via VMware) to apples (real disk on server), but I'm showing a 2.5-fold speedup after adding the simple parser: Virtual disk: 4.42 real 1.37 user 1.47 sys Real disk: 10.26 real 5.36 user 0.99 sys I'll run a battery of tests just to confirm whether that's the case. Be back with results in a few more days. -Garrett Hello, As promised, here are some results for my work: By modifying the parser and heuristics in plist_cmd I appear to have decreased all figures (except plist_cmd, which I will note later) from their original values to much lower values. The only drawback is that I appear to have triggered a bug with either malloc'ing memory, printf/varargs, or transferring large amounts of data via pipes, where some of my debug messages are making it into plist_cmd(..) from obtainbymatch(..), which accounts for the 3-fold increase in reported plist_cmd(..) iterations.
I'm going to try replacing the debug commands with standard print statements wherever possible, then replace all tar commands with libarchive APIs, and see if the problem solves itself. Notes: 1. This sample is based on x11-libs/atk. 2. It isn't the final set of results. 3. Graphs coming soon (need to simulate values in Excel on work machine and convert to screenshots later on when I have a break -- thinking around noon). I'll repost when I have them available. 4. CSV files available at: http://students.washington.edu/youshi10/posted/atk-results.tgz.
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Tim Kientzle wrote: -I tried ... buffering ... the +CONTENTS file parsing function, and the majority of the time it yielded good results One approach I prototyped sometime back was to use libarchive in pkg_add as follows: * Open the archive * Read +CONTENTS directly into memory (it's guaranteed to always be first in the archive) * Parse all of +CONTENTS at once * Continue scanning the archive, disposing of each file as it appears in the archive. Based on my experience with this, I would suggest you just read all of +CONTENTS directly into memory at once and parse the whole thing in a single shot. fopen(), then fstat() to get the size, then allocate a buffer and read the whole thing, then fclose(). You can then parse it all at once. As a bonus, your parser then becomes a nice little bit of reusable code that reads a block of memory and returns a structure describing the package metadata. Tim Kientzle I'm not 100% sure because I'm not comparing apples (virtual disk on desktop via VMware) to apples (real disk on server), but I'm showing a 2.5-fold speedup after adding the simple parser: Virtual disk: 4.42 real 1.37 user 1.47 sys Real disk: 10.26 real 5.36 user 0.99 sys I'll run a battery of tests just to ensure whether or not that's the case. Be back with results in a few more days. -Garrett
Re: Finding slowdowns in pkg_install (continuations of previous threads)
I'm currently running a gamut of tests (500 tests, per package -- 128 total on my server), and outputting all data to CSV files to interpret later, using another Perl script to interpret calculated averages and standard deviations. Excellent! Much-needed work. Using basic printf(3) calls with clock_gettime(2) I have determined that the majority of the issues are disk-bound (as Tim Kientzle put it). Next question: What are those disk operations and are any of them redundant? The scope of my problem is not to analyze tar,... I've spent the last three-plus years doing exactly that. Make sure you're using the newest bsdtar/libarchive, which has some very noticeable performance improvements. but I've discovered that a lot of time is spent in reading and interpreting the +CONTENTS and related files (most notably in parsing commands, to be honest). Oh? That's interesting. Is data being re-parsed (in which case some structural changes to parse it once and store the results may help)? Or is the parser just slow? Will post more conclusive results tomorrow once all of my results are available. I don't follow ports@ so didn't see these conclusive results of yours. I'm very interested, though. Tim Kientzle
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Tim Kientzle wrote: I'm currently running a gamut of tests (500 tests, per package -- 128 total on my server), and outputting all data to CSV files to interpret later, using another Perl script to interpret calculated averages and standard deviations. Excellent! Much-needed work. Using basic printf(3) calls with clock_gettime(2) I have determined that the majority of the issues are disk-bound (as Tim Kientzle put it). Next question: What are those disk operations and are any of them redundant? The scope of my problem is not to analyze tar,... I've spent the last three-plus years doing exactly that. Make sure you're using the newest bsdtar/libarchive, which has some very noticeable performance improvements. but I've discovered that a lot of time is spent in reading and interpreting the +CONTENTS and related files (most notably in parsing commands, to be honest). Oh? That's interesting. Is data being re-parsed (in which case some structural changes to parse it once and store the results may help)? Or is the parser just slow? Will post more conclusive results tomorrow once all of my results are available. I don't follow ports@ so didn't see these conclusive results of yours. I'm very interested, though. Tim Kientzle Some extra notes: -My tests are still running, but almost done (unfortunately I won't be able to post any results before tonight since I'm going to work now). It's taking a lot longer than I originally thought it would (I've produced several gigabytes of logfiles and csv files... eep). -I placed them around what I considered pkg_install specific sensitive areas, i.e. locations where tar was run, or the meta files were processed. -I tried implementing a small buffering technique (read in 10 lines at once, parse the 10 lines, and repeat, instead of read 1 line and parse, then repeat), around the +CONTENTS file parsing function, and the majority of the time it yielded good results (9/10 times the buffering technique won over the non-buffering technique).
Given that success, I'm going to try implementing the file reading in terms of fgetc(3) to properly read in a number of lines all at once, and see what happens instead (my hunch is those results may be more favorable). Thanks, -Garrett
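As a concrete illustration of the "read everything, then parse in memory" direction being discussed, here is a minimal sketch that slurps a plist-style file into one buffer and walks its lines with pointer arithmetic. The function name and the line-counting body are hypothetical; a real parser would dispatch on each line instead of just counting:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Read an entire file into one malloc'd, NUL-terminated buffer and
 * count its lines in memory -- one bulk fread() instead of a getc
 * loop.  Returns the line count, or -1 on error. */
static long
count_plist_lines(const char *path)
{
	struct stat st;
	FILE *f;
	char *buf, *p;
	long lines = 0;

	if ((f = fopen(path, "r")) == NULL)
		return (-1);
	if (fstat(fileno(f), &st) != 0) {
		fclose(f);
		return (-1);
	}
	if ((buf = malloc(st.st_size + 1)) == NULL) {
		fclose(f);
		return (-1);
	}
	if (fread(buf, 1, st.st_size, f) != (size_t)st.st_size) {
		free(buf);
		fclose(f);
		return (-1);
	}
	buf[st.st_size] = '\0';
	fclose(f);

	for (p = buf; (p = strchr(p, '\n')) != NULL; p++)
		lines++;		/* each '\n' ends one plist line */
	free(buf);
	return (lines);
}
```

A real parse_contents() would replace the counting loop, handing each line to the command dispatcher without any further I/O.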
Re: Finding slowdowns in pkg_install (continuations of previous threads)
-I tried ... buffering ... the +CONTENTS file parsing function, and the majority of the time it yielded good results One approach I prototyped sometime back was to use libarchive in pkg_add as follows: * Open the archive * Read +CONTENTS directly into memory (it's guaranteed to always be first in the archive) * Parse all of +CONTENTS at once * Continue scanning the archive, disposing of each file as it appears in the archive. Based on my experience with this, I would suggest you just read all of +CONTENTS directly into memory at once and parse the whole thing in a single shot. fopen(), then fstat() to get the size, then allocate a buffer and read the whole thing, then fclose(). You can then parse it all at once. As a bonus, your parser then becomes a nice little bit of reusable code that reads a block of memory and returns a structure describing the package metadata. Tim Kientzle
Re: Finding slowdowns in pkg_install
Tim Kientzle said: One approach I prototyped sometime back was to use libarchive in pkg_add as follows: * Open the archive * Read +CONTENTS directly into memory (it's guaranteed to always be first in the archive) I can only concur with that. In my program http://www.lpthe.jussieu.fr/~talon/check_pkg.py I discovered that memory-mapping +CONTENTS and then working in memory before rewriting it was around 5 times faster than reading line by line and parsing each line. See function fix_pkg_database(). By the way, I am writing a new +CONTENTS file and then renaming it, which avoids leaving a mess if something goes astray (as portupgrade does). -- Michel TALON
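The memory-mapping idea mentioned above can be sketched in C roughly as follows; the function and its '@'-counting body are hypothetical stand-ins for whatever in-memory work a real tool would do over the mapped plist:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a +CONTENTS-style file read-only and scan it entirely in
 * memory; returns the number of lines beginning with '@', or -1
 * on error.  No per-line read() calls are made. */
static long
count_at_commands(const char *path)
{
	struct stat st;
	char *base, *p, *end;
	long cmds = 0;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);
	if (fstat(fd, &st) != 0 || st.st_size == 0) {
		close(fd);
		return (-1);
	}
	base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping survives the close */
	if (base == MAP_FAILED)
		return (-1);

	end = base + st.st_size;
	for (p = base; p < end; ) {
		if (*p == '@')		/* plist command lines start with '@' */
			cmds++;
		p = memchr(p, '\n', end - p);
		if (p == NULL)
			break;
		p++;
	}
	munmap(base, st.st_size);
	return (cmds);
}
```

The write-then-rename step Michel describes pairs naturally with this: write the updated plist to a temporary name, then rename(2) it over the old file, so a crash mid-write never leaves a half-written +CONTENTS behind.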
Re: Finding slowdowns in pkg_install
On Fri, 6 Jul 2007, Michel Talon wrote: Tim Kientzle said: One approach I prototyped sometime back was to use libarchive in pkg_add as follows: * Open the archive * Read +CONTENTS directly into memory (it's guaranteed to always be first in the archive) I can only concur with that. In my program http://www.lpthe.jussieu.fr/~talon/check_pkg.py I discovered that memory-mapping +CONTENTS and then working in memory before rewriting it was around 5 times faster than reading line by line and parsing each line. See function fix_pkg_database(). By the way, I am writing a new +CONTENTS file and then renaming it, which avoids leaving a mess if something goes astray (as portupgrade does). -- Michel TALON Ok, excellent -- I'll try that then. I'll work on an improved parser this weekend and probably will have more conclusive results for next week, but for the immediate point in time I'll post results on how slow / fast the critical sections were once I return home and post-process my data again for averages and standard deviations. I'll use this as my basis for further conclusions this summer. -Garrett
Re: Finding slowdowns in pkg_install (continuations of previous threads)
Garrett Cooper wrote: I'm currently running a gamut of tests (500 tests, per package -- 128 total on my server), and outputting all data to CSV files to interpret later, using another Perl script to interpret calculated averages and standard deviations. Using basic printf(3) calls with clock_gettime(2) I have determined that the majority of the issues are disk-bound (as Tim Kientzle put it). The scope of my problem is not to analyze tar, but I've discovered that a lot of time is spent in reading and interpreting the +CONTENTS and related files (most notably in parsing commands, to be honest). Will post more conclusive results tomorrow once all of my results are available. Cheers, -Garrett Forgot to include [EMAIL PROTECTED] -Garrett
Finding slowdowns in pkg_install (continuations of previous threads)
I'm currently running a gamut of tests (500 tests, per package -- 128 total on my server), and outputting all data to CSV files to interpret later, using another Perl script to interpret calculated averages and standard deviations. Using basic printf(3) calls with clock_gettime(2) I have determined that the majority of the issues are disk-bound (as Tim Kientzle put it). The scope of my problem is not to analyze tar, but I've discovered that a lot of time is spent in reading and interpreting the +CONTENTS and related files (most notably in parsing commands, to be honest). Will post more conclusive results tomorrow once all of my results are available. Cheers, -Garrett