Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-14 Thread Garrett Cooper

Garrett Cooper wrote:

Garrett Cooper wrote:

Tim Kientzle wrote:
   -I tried ... buffering ...  the +CONTENTS file parsing function, 
and the

majority of the time it yielded good results 


One approach I prototyped sometime back was to use
libarchive in pkg_add as follows:
  * Open the archive
  * Read +CONTENTS directly into memory (it's
guaranteed to always be first in the archive)
  * Parse all of +CONTENTS at once
  * Continue scanning the archive, disposing
of each file as it appears in the archive.

Based on my experience with this, I would
suggest you just read all of +CONTENTS
directly into memory at once and parse
the whole thing in a single shot.
fopen(), then fstat() to get the size,
then allocate a buffer and read the whole
thing, then fclose().  You can then
parse it all at once.

As a bonus, your parser then becomes a nice
little bit of reusable code that reads
a block of memory and returns a structure describing
the package metadata.

Tim Kientzle
I'm not 100% sure because I'm not comparing apples (virtual disk on 
desktop via VMware) to apples (real disk on server), but I'm showing 
a 2.5-fold speedup after adding the simple parser:


Virtual disk:
   4.42 real 1.37 user 1.47 sys

Real disk:
  10.26 real 5.36 user 0.99 sys

I'll run a battery of tests to confirm whether that's the case.


Be back with results in a few more days.

-Garrett

Hello,
   As promised, here are some results for my work:

   By modifying the parser and heuristics in plist_cmd I appear to 
have decreased all figures (except plist_cmd, which I will note later) 
from their original values to much lower values. The only drawback is 
that I appear to have stimulated a bug with either malloc'ing memory, 
printf/vargs, or transferring large amounts of data via pipes where 
some of my debug messages are making it into plist_cmd(..) from 
obtainbymatch(..), which accounts for the 3-fold increase in 
reported plist_cmd(..) iterations.


   I'm going to try replacing the debug commands with standard print 
statements wherever possible, then replace all tar commands with 
libarchive APIs, and see if the problem solves itself.


Notes:
1. This sample is based off x11-libs/atk.
2. It isn't the final set of results.
3. Graphs coming soon (need to simulate values in Excel on work 
machine and convert to screenshots later on when I have a break -- 
thinking around noon). I'll repost when I have them available.
4. CSV files available at: 
http://students.washington.edu/youshi10/posted/atk-results.tgz.
I've posted HTML results of the interpreted spreadsheet on 
http://students.washington.edu/posted/atk.htm. I'll provide commentary 
tomorrow after I get some sleep.

-Garrett
___
freebsd-ports@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-14 Thread Pav Lucistnik
Garrett Cooper wrote on Sat, 14 Jul 2007 at 04:04 -0700:

 I've posted HTML results of the interpreted spreadsheet on 
 http://students.washington.edu/posted/atk.htm. I'll provide commentary 
 tomorrow after I get some sleep.

Nothing on that URL.

-- 
Pav Lucistnik [EMAIL PROTECTED]
  [EMAIL PROTECTED]

A computer programmer is a device for turning requirements
into undocumented features. It runs on cola, pizza and Dilbert cartoons.
  -- Bram Moolenaar


signature.asc
Description: This is a digitally signed part of the message


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-14 Thread Tim Kientzle
4. CSV files available at: 
http://students.washington.edu/youshi10/posted/atk-results.tgz.


I've posted HTML results of the interpreted spreadsheet on 
http://students.washington.edu/posted/atk.htm. I'll provide commentary 
tomorrow after I get some sleep.


I think the second one should be:
http://students.washington.edu/youshi10/posted/atk.htm

Unfortunately, I get Permission Denied here for both
of those URLs.

Tim Kientzle


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-14 Thread Garrett Cooper

Tim Kientzle wrote:
4. CSV files available at: 
http://students.washington.edu/youshi10/posted/atk-results.tgz.


I've posted HTML results of the interpreted spreadsheet on 
http://students.washington.edu/posted/atk.htm. I'll provide 
commentary tomorrow after I get some sleep.


I think the second one should be:
http://students.washington.edu/youshi10/posted/atk.htm

Unfortunately, I get Permission Denied here for both
of those URLs.

Tim Kientzle


About files:
   Sorry about that -- that's what I get for staying up until 4:30 am 
and making email posts.
   I've condensed all of the files into: 
http://students.washington.edu/youshi10/posted/atk-results.tgz and 
fixed the permissions for the files.


   The following blog post has all of my commentary on the results I 
have: 
http://blogs.freebsdish.org/gcooper/2007/07/14/modifications-to-pkg_install-the-positive-and-negative-implications/.


   Let me know if you have any questions or comments. Now to go off and 
solve that bug :).


-Garrett


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-14 Thread Tim Kientzle
   The following blog post has all of my commentary on the results I 
have: 
http://blogs.freebsdish.org/gcooper/2007/07/14/modifications-to-pkg_install-the-positive-and-negative-implications/. 



I tried to unroll strcmp a bit by checking for the first character of the

 command, then run strcmp ...

There's a somewhat more straightforward optimization that
relies on this same idea:

switch(cmd[0]) {
case 'c':
/* Commands that start with 'c' */
if (strcmp(cmd, "cwd") == 0)
return (CMD_CWD);
/* FALLTHROUGH */
case 'd':
/* Commands that start with 'd' */

 etc
/* FALLTHROUGH */
default:
/* Unrecognized command. */
}

This is a little cleaner and easier to read
and may even be faster than the code you
presented in your blog.  Note that the fall through
ensures that all unrecognized commands end up at
the same place.  If unrecognized commands are
very rare (they should be), then the fallthrough
is not a performance issue.


/** malloc buffer large enough to hold +CONTENTS **/

while(!feof(file_p)) {

/** add content via fgetc **/
}


Yuck.  Try this instead:

   struct stat st;
   int fd;
   char *buff;

   fd = open(file, O_RDONLY);
   fstat(fd, &st);
   buff = malloc(st.st_size + 1);
   read(fd, buff, st.st_size);
   buff[st.st_size] = '\0';
   close(fd);

Plus some error checking, of course.  You can
use stdio if you prefer:

   FILE *f;

   f = fopen(file, "r");
   fstat(fileno(f), &st);
   buff = malloc(st.st_size + 1);
   fread(buff, 1, st.st_size, f);
   buff[st.st_size] = '\0';
   fclose(f);

Either way, this is a lot more efficient than
tens of thousands of calls to fgetc().

Cheers,

Tim Kientzle


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-14 Thread Garrett Cooper

Tim Kientzle wrote:
   The following blog post has all of my commentary on the results I 
have: 
http://blogs.freebsdish.org/gcooper/2007/07/14/modifications-to-pkg_install-the-positive-and-negative-implications/. 



I tried to unroll strcmp a bit by checking for the first character of 
the

 command, then run strcmp ...

There's a somewhat more straightforward optimization that
relies on this same idea:

switch(cmd[0]) {
case 'c':
/* Commands that start with 'c' */
if (strcmp(cmd, "cwd") == 0)
return (CMD_CWD);
/* FALLTHROUGH */
case 'd':
/* Commands that start with 'd' */

 etc
/* FALLTHROUGH */
default:
/* Unrecognized command. */
}

This is a little cleaner and easier to read
and may even be faster than the code you
presented in your blog.  Note that the fall through
ensures that all unrecognized commands end up at
the same place.  If unrecognized commands are
very rare (they should be), then the fallthrough
is not a performance issue.


/** malloc buffer large enough to hold +CONTENTS **/

while(!feof(file_p)) {

/** add content via fgetc **/
}


Yuck.  Try this instead:

   struct stat st;
   int fd;
   char *buff;

   fd = open(file, O_RDONLY);
   fstat(fd, &st);
   buff = malloc(st.st_size + 1);
   read(fd, buff, st.st_size);
   buff[st.st_size] = '\0';
   close(fd);

Plus some error checking, of course.  You can
use stdio if you prefer:

   FILE *f;

   f = fopen(file, "r");
   fstat(fileno(f), &st);
   buff = malloc(st.st_size + 1);
   fread(buff, 1, st.st_size, f);
   buff[st.st_size] = '\0';
   fclose(f);

Either way, this is a lot more efficient than
tens of thousands of calls to fgetc().

Cheers,

Tim Kientzle

Tim,
   That was a very good call. I didn't even think of read(2) over fgetc(3).
   That decreased the overall time by 0.7 seconds in installing vim, 
which is just a little shy of a 10% speedup.

-Garrett


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-13 Thread Garrett Cooper

Garrett Cooper wrote:

Tim Kientzle wrote:
   -I tried ... buffering ...  the +CONTENTS file parsing function, 
and the

majority of the time it yielded good results 


One approach I prototyped sometime back was to use
libarchive in pkg_add as follows:
  * Open the archive
  * Read +CONTENTS directly into memory (it's
guaranteed to always be first in the archive)
  * Parse all of +CONTENTS at once
  * Continue scanning the archive, disposing
of each file as it appears in the archive.

Based on my experience with this, I would
suggest you just read all of +CONTENTS
directly into memory at once and parse
the whole thing in a single shot.
fopen(), then fstat() to get the size,
then allocate a buffer and read the whole
thing, then fclose().  You can then
parse it all at once.

As a bonus, your parser then becomes a nice
little bit of reusable code that reads
a block of memory and returns a structure describing
the package metadata.

Tim Kientzle
I'm not 100% sure because I'm not comparing apples (virtual disk on 
desktop via VMware) to apples (real disk on server), but I'm showing a 
2.5-fold speedup after adding the simple parser:


Virtual disk:
   4.42 real 1.37 user 1.47 sys

Real disk:
  10.26 real 5.36 user 0.99 sys

I'll run a battery of tests to confirm whether that's the case.


Be back with results in a few more days.

-Garrett

Hello,
   As promised, here are some results for my work:

   By modifying the parser and heuristics in plist_cmd I appear to have 
decreased all figures (except plist_cmd, which I will note later) from 
their original values to much lower values. The only drawback is that I 
appear to have stimulated a bug with either malloc'ing memory, 
printf/vargs, or transferring large amounts of data via pipes where some 
of my debug messages are making it into plist_cmd(..) from 
obtainbymatch(..), which accounts for the 3-fold increase in reported 
plist_cmd(..) iterations.


   I'm going to try replacing the debug commands with standard print 
statements wherever possible, then replace all tar commands with 
libarchive APIs, and see if the problem solves itself.


Notes:
1. This sample is based off x11-libs/atk.
2. It isn't the final set of results.
3. Graphs coming soon (need to simulate values in Excel on work machine 
and convert to screenshots later on when I have a break -- thinking 
around noon). I'll repost when I have them available.
4. CSV files available at: 
http://students.washington.edu/youshi10/posted/atk-results.tgz.



Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-12 Thread Garrett Cooper

Tim Kientzle wrote:
   -I tried ... buffering ...  the +CONTENTS file parsing function, 
and the

majority of the time it yielded good results 


One approach I prototyped sometime back was to use
libarchive in pkg_add as follows:
  * Open the archive
  * Read +CONTENTS directly into memory (it's
guaranteed to always be first in the archive)
  * Parse all of +CONTENTS at once
  * Continue scanning the archive, disposing
of each file as it appears in the archive.

Based on my experience with this, I would
suggest you just read all of +CONTENTS
directly into memory at once and parse
the whole thing in a single shot.
fopen(), then fstat() to get the size,
then allocate a buffer and read the whole
thing, then fclose().  You can then
parse it all at once.

As a bonus, your parser then becomes a nice
little bit of reusable code that reads
a block of memory and returns a structure describing
the package metadata.

Tim Kientzle
I'm not 100% sure because I'm not comparing apples (virtual disk on 
desktop via VMware) to apples (real disk on server), but I'm showing a 
2.5-fold speedup after adding the simple parser:


Virtual disk:
   4.42 real 1.37 user 1.47 sys

Real disk:
  10.26 real 5.36 user 0.99 sys

I'll run a battery of tests to confirm whether that's the case.

Be back with results in a few more days.

-Garrett


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-06 Thread Tim Kientzle
   I'm currently running a gamut of tests (500 tests, per package -- 
128 total on my server), and outputting all data to CSV files to 
interpret later, using another Perl script to interpret calculated 
averages and standard deviations.


Excellent!  Much-needed work.

   Using basic printf(3)'s with clock_gettime(2) I have determined 
that the majority of the issues are disk-bound (as Tim Kientzle put 
it).


Next question:  What are those disk operations and are any
of them redundant?


The scope of my problem is not to analyze tar,...


I've spent the last three years+ doing exactly that.
Make sure you're using the newest bsdtar/libarchive,
which has some very noticeable performance improvements.

but I've 
discovered that a lot of time is spent in reading and interpreting the 
+CONTENTS and related files (most notably in parsing commands to be 
honest).


Oh?  That's interesting.  Is data being re-parsed (in which case
some structural changes to parse it once and store the results
may help)?  Or is the parser just slow?

   Will post more conclusive results tomorrow once all of my results 
are available.


I don't follow ports@ so didn't see these conclusive results
of yours.  I'm very interested, though.

Tim Kientzle


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-06 Thread Garrett Cooper

Tim Kientzle wrote:
   I'm currently running a gamut of tests (500 tests, per package -- 
128 total on my server), and outputting all data to CSV files to 
interpret later, using another Perl script to interpret calculated 
averages and standard deviations.


Excellent!  Much-needed work.

   Using basic printf(3)'s with clock_gettime(2) I have determined 
that the majority of the issues are disk-bound (as Tim Kientzle put 
it).


Next question:  What are those disk operations and are any
of them redundant?


The scope of my problem is not to analyze tar,...


I've spent the last three years+ doing exactly that.
Make sure you're using the newest bsdtar/libarchive,
which has some very noticeable performance improvements.

but I've discovered that a lot of time is spent in reading and 
interpreting the +CONTENTS and related files (most notably in 
parsing commands to be honest).


Oh?  That's interesting.  Is data being re-parsed (in which case
some structural changes to parse it once and store the results
may help)?  Or is the parser just slow?

   Will post more conclusive results tomorrow once all of my results 
are available.


I don't follow ports@ so didn't see these conclusive results
of yours.  I'm very interested, though.

Tim Kientzle

Some extra notes:
   -My tests are still running, but almost done (unfortunately I won't 
be able to post any results before tonight since I'm going to work now). 
It's taking a lot longer than I originally thought it would (I've 
produced several gigabytes of logfiles and CSV files... eep).
   -I placed them around what I considered pkg_install specific 
sensitive areas, i.e. locations where tar was run, or the meta files 
were processed.
   -I tried implementing a small buffering technique (read in 10 lines 
at once, parse those 10 lines, and repeat, instead of reading and 
parsing one line at a time) around the +CONTENTS file parsing function, 
and the majority of the time it yielded good results (9 out of 10 times 
the buffering technique won over the non-buffering technique). Given 
that success I'm going to try implementing the file reading in terms of 
fgetc(3) to properly read in a number of lines all at once, and see 
what happens instead (my hunch is those results may be more favorable).

Thanks,
-Garrett


Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-06 Thread Tim Kientzle

   -I tried ... buffering ...  the +CONTENTS file parsing function, and the
majority of the time it yielded good results 


One approach I prototyped sometime back was to use
libarchive in pkg_add as follows:
  * Open the archive
  * Read +CONTENTS directly into memory (it's
guaranteed to always be first in the archive)
  * Parse all of +CONTENTS at once
  * Continue scanning the archive, disposing
of each file as it appears in the archive.

Based on my experience with this, I would
suggest you just read all of +CONTENTS
directly into memory at once and parse
the whole thing in a single shot.
fopen(), then fstat() to get the size,
then allocate a buffer and read the whole
thing, then fclose().  You can then
parse it all at once.

As a bonus, your parser then becomes a nice
little bit of reusable code that reads
a block of memory and returns a structure describing
the package metadata.

Tim Kientzle


Re: Finding slowdowns in pkg_install

2007-07-06 Thread Michel Talon
Tim Kientzle said:

 One approach I prototyped sometime back was to use
 libarchive in pkg_add as follows:
* Open the archive
* Read +CONTENTS directly into memory (it's
 guaranteed to always be first in the archive)

I can only concur with that. In my program
http://www.lpthe.jussieu.fr/~talon/check_pkg.py
I discovered that memory mapping +CONTENTS and then working
in memory before rewriting it was around 5 times faster
than reading line by line and parsing each line. See the function
fix_pkg_database(). By the way, I am writing a new +CONTENTS
file and then renaming it, which avoids leaving a mess if
something goes astray, as portupgrade does.


-- 

Michel TALON



Re: Finding slowdowns in pkg_install

2007-07-06 Thread youshi10

On Fri, 6 Jul 2007, Michel Talon wrote:


Tim Kientzle said:


One approach I prototyped sometime back was to use
libarchive in pkg_add as follows:
   * Open the archive
   * Read +CONTENTS directly into memory (it's
guaranteed to always be first in the archive)


I can only concur with that. In my program
http://www.lpthe.jussieu.fr/~talon/check_pkg.py
I discovered that memory mapping +CONTENTS and then working
in memory before rewriting it was around 5 times faster
than reading line by line and parsing each line. See the function
fix_pkg_database(). By the way, I am writing a new +CONTENTS
file and then renaming it, which avoids leaving a mess if
something goes astray, as portupgrade does.


--

Michel TALON


Ok, excellent I'll try that then.
I'll work on an improved parser this weekend and probably will have more 
conclusive results for next week, but for the immediate point in time I'll post 
results on how slow / fast the critical sections were once I return home and 
post process my data again for averages and standard deviations. I'll use this 
as my basis for further conclusions this summer.
-Garrett



Re: Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-05 Thread Garrett Cooper

Garrett Cooper wrote:
   I'm currently running a gamut of tests (500 tests, per package -- 
128 total on my server), and outputting all data to CSV files to 
interpret later, using another Perl script to interpret calculated 
averages and standard deviations.


   Using basic printf(3)'s with clock_gettime(2) I have determined 
that the majority of the issues are disk-bound (as Tim Kientzle put 
it). The scope of my problem is not to analyze tar, but I've 
discovered that a lot of time is spent in reading and interpreting the 
+CONTENTS and related files (most notably in parsing commands to be 
honest).


   Will post more conclusive results tomorrow once all of my results 
are available.


Cheers,
-Garrett

Forgot to include [EMAIL PROTECTED]
-Garrett


Finding slowdowns in pkg_install (continuations of previous threads)

2007-07-05 Thread Garrett Cooper
   I'm currently running a gamut of tests (500 tests, per package -- 
128 total on my server), and outputting all data to CSV files to 
interpret later, using another Perl script to interpret calculated 
averages and standard deviations.


   Using basic printf(3)'s with clock_gettime(2) I have determined that 
the majority of the issues are disk-bound (as Tim Kientzle put it). The 
scope of my problem is not to analyze tar, but I've discovered that a 
lot of time is spent in reading and interpreting the +CONTENTS and 
related files (most notably in parsing commands to be honest).


   Will post more conclusive results tomorrow once all of my results 
are available.


Cheers,
-Garrett