A new and better way to do "make readmes"?

2012-01-27 Thread Conrad J. Sabatier
I've been thinking for a long time that we need a better way to do
"make readmes", one that would be properly integrated into our
ports Mk infrastructure, to take advantage of make's ability to
recognize which files are up-to-date and which really do need
rebuilding.

I like to make sure my README.html files are all up-to-date after my
nightly ports tree update, but with the current scheme, that means
either rebuilding *all* of the files in the tree, or (as I'm doing at
present) using some sort of "kludgey" (kludgy?) workaround.

I haven't actually started working on such an alternative method yet,
because I didn't want to dedicate the time to such an effort without
first checking to see how well it might be received by portmgr.

I realize this might possibly entail a less-than-trivial change to our
existing ports Mk infrastructure.  Would the overhead incurred in terms
of additional dependency lines mean the idea would most likely be nixed
right out of the gate?  I'd like to think that, if properly implemented,
the impact would be negligible, and the potential benefits would make
it well worthwhile.

Thanks for any feedback,

Conrad

-- 
Conrad J. Sabatier
conr...@cox.net
___
freebsd-ports@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscr...@freebsd.org"


Re: A new and better way to do "make readmes"?

2012-01-28 Thread RW
On Fri, 27 Jan 2012 20:03:25 -0600
Conrad J. Sabatier wrote:

> I've been thinking for a long time that we need a better way to do
> "make readmes", one that would be properly integrated into our
> ports Mk infrastructure, to take advantage of make's ability to
> recognize which files are up-to-date and which really do need
> rebuilding.

This won't help, and I think there's a better way that will make it up to
700 times faster.

When "make readmes" is run at the top level, the top-level and
category READMEs are created by make targets, and the per-port READMEs
are created by a Perl script in one go from the INDEX- file.

I once timed this: the 64 category READMEs took 2 hours, but the
~20,000 port READMEs took only about 9 seconds.  Selective updating
isn't going to help, because 99.9% of the time is spent in the
categories and it takes only a single port update to make a category
file obsolete.

I think the way to speed this up is to have the script generate the
category files too. There's no point in bringing in the top-level
README, since that's already fast.

I've been toying with the idea of doing this, but have never got around
to it. If anyone wants to have a go, I think it would be sensible to
write it in awk, since Perl is no longer in the base system and the
existing Perl script isn't really complex enough to be worth hanging on
to.


Re: A new and better way to do "make readmes"?

2012-01-28 Thread Conrad J. Sabatier
On Sat, 28 Jan 2012 14:37:34 +
RW  wrote:

> On Fri, 27 Jan 2012 20:03:25 -0600
> Conrad J. Sabatier wrote:
> 
> > I've been thinking for a long time that we need a better way to do
> > "make readmes", one that would be properly integrated into our
> > ports Mk infrastructure, to take advantage of make's ability to
> > recognize which files are up-to-date and which really do need
> > rebuilding.
> 
> This won't help and I think there's a better way that will make it up
> to 700 times faster.
> 
> When a make readmes is done at the top-level, the top-level and
> category READMEs are created by make targets and the per port READMEs
> are created by a perl script in one go from the INDEX- file. 
> 
> I once timed this and the 64 category READMEs took 2 hours, but the
> ~20,000 port READMEs only took about 9 seconds.

  Am I understanding you correctly?  Are you
saying you built 20,000+ port READMEs in only 9 seconds?!  How is that
possible?  Or do you mean 9 seconds for each one?

> Selective updating isn't going to help because 99.9% of the time is
> spent in the categories and it only takes a single port update to
> make a category file obsolete.

This is the part I find troubling.  It would seem that it should be
more work to create an individual port README, with its plucking the
appropriate line out of the INDEX-* file and then parsing it into its
respective pieces and filling in a template, than to simply string
together a list of references to a bunch of already built port READMEs
into a category README.

What am I not getting here?

> I think the way to speed this up is to have the script generate the
> category files too. There's no point in bringing in the top-level
> README since that's already fast.

So what's making the category READMEs so slow then?

> I've been toying with the idea of doing this, but have never got
> around to it. If anyone wants to have a go I think it would be
> sensible to write it in awk, since perl is no longer in the base
> system and the existing perl script isn't really complex enough to be
> worth hanging-on to. 

Oooo, awk!  Been a while since I wrote any sizeable bit of code in it,
but I do remember it was rather fun to work with.  :-)

I'm still not sure I read that paragraph above correctly, though (re:
the times).  :-)

-- 
Conrad J. Sabatier
conr...@cox.net


Re: A new and better way to do "make readmes"?

2012-01-28 Thread Matthew Seaman
On 28/01/2012 16:28, Conrad J. Sabatier wrote:
>   Am I understanding you correctly?  Are you
> saying you built 20,000+ port READMEs in only 9 seconds?!  How is that
> possible?  Or do you mean 9 seconds for each one?

9 seconds sounds quite reasonable for generating 23000 or so files.

>> > Selective updating isn't going to help because 99.9% of the time is
>> > spent in the categories and it only takes a single port update to
>> > make a category file obsolete.

> This is the part I find troubling.  It would seem that it should be
> more work to create an individual port README, with its plucking the
> appropriate line out of the INDEX-* file and then parsing it into its
> respective pieces and filling in a template, than to simply string
> together a list of references to a bunch of already built port READMEs
> into a category README.
> 
> What am I not getting here?

No -- you're quite right.  You could generate the category README.html
files entirely from the data in the INDEX.  It's not quite as easy as
all that, because there aren't entries for each category separately, so
you'll have to parse the structure out of all of the paths in the INDEX.
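
A minimal sketch of that parsing, assuming the INDEX file is
'|'-separated with the package name in field 1, the port path (for
example /usr/ports/editors/vim) in field 2 and the one-line comment in
field 4.  The function name category_listing is hypothetical, and the
output is only a rough HTML-ish listing with no escaping, not the real
README.html template:

```shell
#!/bin/sh
# Group INDEX entries by category, taking the category from the port path.
# Assumed layout: name|/usr/ports/<category>/<port>|prefix|comment|...
category_listing() {
    # $1: path to an INDEX file
    awk -F'|' '
    {
        n = split($2, parts, "/")   # last two components: category, port
        c = parts[n - 1]
        port = parts[n]
        if (!(c in seen)) { seen[c] = 1; order[++ncat] = c }
        lines[c] = lines[c] sprintf("  <a href=\"%s/README.html\">%s</a>: %s\n", port, $1, $4)
    }
    END {
        # Print categories in order of first appearance in the INDEX.
        for (i = 1; i <= ncat; i++)
            printf "%s\n%s", order[i], lines[order[i]]
    }' "$1"
}
```

Everything comes from a single pass over one file, so no make(1)
process is started per port or per category.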

>> > I think the way to speed this up is to have the script generate the
>> > category files too. There's no point in bringing in the top-level
>> > README since that's already fast.

> So what's making the category READMEs so slow then?

The big problem with performance in all this INDEX and README.html
building is that it takes a relatively long time to run make(1)
within any port or category directory.  make(1) has to read in a lot of
other files and stat(2) many more[*] -- all of which involves a lot of
random-access disk I/O, and that's always going to take quite a lot of
time.  Now, doing 'make readme' in a category directory doesn't just run
make in that directory, but also in every port in that category.
Popular categories can contain many hundreds of ports.

Maybe I should add README.html generation to my FreeBSD::Portindex
stuff.  Should be pretty simple -- all the necessary bits are readily
available and it is just a matter of formatting it as HTML and printing
it out.

Cheers,

Matthew

[*] Running 'make -dA' with maximum debug output is quite enlightening,
as is running make under truss(1).

-- 
Dr Matthew J Seaman MA, D.Phil.   7 Priory Courtyard
  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
JID: matt...@infracaninophile.co.uk   Kent, CT11 9PW





Re: A new and better way to do "make readmes"?

2012-01-28 Thread Torfinn Ingolfsen
Hello,

On Sat, Jan 28, 2012 at 3:03 AM, Conrad J. Sabatier  wrote:

> I've been thinking for a long time that we need a better way to do
> "make readmes", one that would be properly integrated into our
> ports Mk infrastructure, to take advantage of make's ability to
> recognize which files are up-to-date and which really do need
> rebuilding.
>
> I like to make sure my README.html files are all up-to-date after my
> nightly ports tree update, but with the current scheme, that means
> either rebuilding *all* of the files in the tree, or (as I'm doing at
> present) using some sort of "kludgey" (kludgy?) workaround.
>
>
So people are actually using the readme files?
Are many people using them?
I ask because I *never* use them (unless they are used by 'make search'?),
I always use freshports.org (BTW, thanks for an excellent service!) when I
need to find out anything about a port.

-- 
Regards,
Torfinn Ingolfsen


Re: A new and better way to do "make readmes"?

2012-01-28 Thread Michel Talon
Matthew Seaman said:
> The big problem with performance in all this INDEX and README.html
> building is that it takes quite a long time relatively to run make(1)
> within any port or category directory.  make(1) has to read in a lot of
> other files and stat(2) many more[*] -- all of which involves a lot of
> random-access disk IO, and that's always going to take quite a lot of
> time.  Now, doing 'make readme' in a category directory doesn't just run
> make in that directory, but also in every port in that category.
> Popular categories can contain many hundreds of ports.
>
> Maybe I should add README.html generation to my FreeBSD::Portindex
> stuff.  Should be pretty simple -- all the necessary bits are readily
> available and it is just a matter of formatting it as HTML and printing
> it out.

Indeed, the following Python script
http://www.lpthe.jussieu.fr/~talon/show_index.py
parses the index in a few seconds and can display exactly the same
information as the README.html files on demand in a web browser, which is
far cleaner than polluting the ports tree with the READMEs. Alternatively,
I have an FCGI version that can be coupled to web servers supporting FCGI,
such as lighttpd:
http://www.lpthe.jussieu.fr/~talon/show_index.fcgi
This was already done 5 years ago ...


--

Michel Talon
ta...@lpthe.jussieu.fr







Re: A new and better way to do "make readmes"?

2012-02-02 Thread Conrad J. Sabatier
[ Sorry to be so late in following up on this; lost track for a while ]

On Sat, 28 Jan 2012 16:53:17 +
Matthew Seaman  wrote:

> On 28/01/2012 16:28, Conrad J. Sabatier wrote:
> >   Am I understanding you correctly?  Are
> > you saying you built 20,000+ port READMEs in only 9 seconds?!  How
> > is that possible?  Or do you mean 9 seconds for each one?
> 
> 9 seconds sounds quite reasonable for generating 23000 or so files.

It sounds incredible to me!  :-)

> >> > Selective updating isn't going to help because 99.9% of the time
> >> > is spent in the categories and it only takes a single port
> >> > update to make a category file obsolete.
> 
> > This is the part I find troubling.  It would seem that it should be
> > more work to create an individual port README, with its plucking the
> > appropriate line out of the INDEX-* file and then parsing it into
> > its respective pieces and filling in a template, than to simply
> > string together a list of references to a bunch of already built
> > port READMEs into a category README.
> > 
> > What am I not getting here?
> 
> No -- you're quite right.  You could generate the category README.html
> files entirely from the data in the INDEX.  It's not quite as easy as
> all that, because there aren't entries for each category separately,
> so you'll have to parse the structure out of all of the paths in the
> INDEX.

Well, the idea I had in mind was that, if all of the individual ports'
README.html files already are in place, then it should be trivial to
just "ls" or "find" them under each category to fill in the category's
README.html.  No need to reference the INDEX or anything else.  Or???
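
A rough sketch of that idea, assuming each port directory sits directly
under its category directory and already contains a README.html.  The
function name category_links and the <li> markup are placeholders, not
the real category template:

```shell
#!/bin/sh
# Emit one link per port in a category that already has a README.html.
category_links() {
    # $1: a category directory, e.g. /usr/ports/editors
    for readme in "$1"/*/README.html; do
        [ -f "$readme" ] || continue    # glob matched nothing
        portdir=${readme%/README.html}  # strip the file name
        port=${portdir##*/}             # keep the last path component
        printf '<li><a href="%s/README.html">%s</a></li>\n' "$port" "$port"
    done
}
```

This only lists port names; the one-line description shown next to each
port would still have to come from somewhere, such as the INDEX.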

The workaround method I've been running out of cron for the last month
or so is:

1) Create a "sentinel" file under /tmp to use as a timestamp, just
before running "cvs update" on ports (I update my ports tree from a
local copy of the CVS repo maintained via csup)

2) After cvs completes, look for any port directories containing
updates (check timestamps against the sentinel file) and do a "make
readme" for each one:

find $PORTSDIR -type f ! -path "*/CVS/*" -newercm $SENTINEL -depth 3 |
xargs dirname |
sort -u | xargs -I@ /bin/sh -c "cd @ && make readme"

3) Last, but not least, build the category README.html for any
categories with ports containing newly updated README.html files.
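
Step 3 needs the list of affected categories.  One way to derive it
from the same list of updated port directories used in step 2 is
sketched here; the function name changed_categories is made up for
illustration:

```shell
#!/bin/sh
# Read port directories on stdin (one per line, e.g. /usr/ports/editors/vim)
# and print each parent category directory exactly once.
changed_categories() {
    sed 's|/[^/]*$||' | sort -u
}
```

The output of the "sort -u" stage in step 2 can be piped into this
function, and each resulting directory handed to whatever command
rebuilds that category's README.html.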

I have noticed while doing this that, as you mentioned, the category
READMEs take considerably longer than the individual ports'.

I don't even bother to rebuild the top-level file, since it's basically
unchanging anyway.

> >> > I think the way to speed this up is to have the script generate
> >> > the category files too. There's no point in bringing in the
> >> > top-level README since that's already fast.
> 
> > So what's making the category READMEs so slow then?
> 
> The big problem with performance in all this INDEX and README.html
> building is that it takes quite a long time relatively to run make(1)
> within any port or category directory.  make(1) has to read in a lot
> of other files and stat(2) many more[*] -- all of which involves a
> lot of random-access disk IO, and that's always going to take quite a
> lot of time.  Now, doing 'make readme' in a category directory
> doesn't just run make in that directory, but also in every port in
> that category. Popular categories can contain many hundreds of ports.

I'm a little rusty on the actual mechanics of make, but shouldn't it be
possible to run a single, over-arching make on each category that
wouldn't need to spawn a bunch of sub-makes?

> Maybe I should add README.html generation to my FreeBSD::Portindex
> stuff.  Should be pretty simple -- all the necessary bits are readily
> available and it is just a matter of formatting it as HTML and
> printing it out.

"Maybe"?  Whaddya mean, "maybe"?  :-)  Sounds like it would definitely
be worth doing!

>   Cheers,
> 
>   Matthew
> 
> [*] Running 'make -dA' with maximum debug output is quite
> enlightening, as is running make under truss(1)

Enlightening, perhaps.  Sometimes overwhelming, is more like it.  :-)

-- 
Conrad J. Sabatier
conr...@cox.net


Re: A new and better way to do "make readmes"?

2012-02-02 Thread Conrad J. Sabatier
On Sat, 28 Jan 2012 18:44:48 +0100
Torfinn Ingolfsen  wrote:

> Hello,
> 
> On Sat, Jan 28, 2012 at 3:03 AM, Conrad J. Sabatier 
> wrote:
> 
> > I've been thinking for a long time that we need a better way to do
> > "make readmes", one that would be properly integrated into our
> > ports Mk infrastructure, to take advantage of make's ability to
> > recognize which files are up-to-date and which really do need
> > rebuilding.
> >
> > I like to make sure my README.html files are all up-to-date after my
> > nightly ports tree update, but with the current scheme, that means
> > either rebuilding *all* of the files in the tree, or (as I'm doing
> > at present) using some sort of "kludgey" (kludgy?) workaround.
> >
> >
> So people are actually using the readme files?
> Are many people using them?
> I ask because I *never* use them (unless they are used by 'make
> search'?), I always use freshports.org (BTW, thanks for an excellent
> service!) when I need to find out anything about a port.
> 

Well, in actual practice, it's true, I don't use them a *lot*, but I do
use them from time to time when I'm looking for a new port to install
for a certain purpose.  It's nice to have up-to-date README.html files
locally when the need arises.  But they sure are expensive to maintain
currently.

-- 
Conrad J. Sabatier
conr...@cox.net


Re: A new and better way to do "make readmes"?

2012-02-02 Thread Jason Helfman

On Thu, Feb 02, 2012 at 03:21:37PM -0600, Conrad J. Sabatier thus spake:

> [ Sorry to be so late in following up on this; lost track for a while ]
>
> On Sat, 28 Jan 2012 16:53:17 +
> Matthew Seaman  wrote:
>
> > On 28/01/2012 16:28, Conrad J. Sabatier wrote:
> > >   Am I understanding you correctly?  Are
> > > you saying you built 20,000+ port READMEs in only 9 seconds?!  How
> > > is that possible?  Or do you mean 9 seconds for each one?
> >
> > 9 seconds sounds quite reasonable for generating 23000 or so files.
>
> It sounds incredible to me!  :-)
>
> > >> > Selective updating isn't going to help because 99.9% of the time
> > >> > is spent in the categories and it only takes a single port
> > >> > update to make a category file obsolete.
> >
> > > This is the part I find troubling.  It would seem that it should be
> > > more work to create an individual port README, with its plucking the
> > > appropriate line out of the INDEX-* file and then parsing it into
> > > its respective pieces and filling in a template, than to simply
> > > string together a list of references to a bunch of already built
> > > port READMEs into a category README.
> > >
> > > What am I not getting here?
> >
> > No -- you're quite right.  You could generate the category README.html
> > files entirely from the data in the INDEX.  It's not quite as easy as
> > all that, because there aren't entries for each category separately,
> > so you'll have to parse the structure out of all of the paths in the
> > INDEX.
>
> Well, the idea I had in mind was that, if all of the individual ports'
> README.html files already are in place, then it should be trivial to
> just "ls" or "find" them under each category to fill in the category's
> README.html.  No need to reference the INDEX or anything else.  Or???
>
> The workaround method I've been running out of cron for the last month
> or so is:
>
> 1) Create a "sentinel" file under /tmp to use as a timestamp, just
> before running "cvs update" on ports (I update my ports tree from a
> local copy of the CVS repo maintained via csup)
>
> 2) After cvs completes, look for any port directories containing
> updates (check timestamps against the sentinel file) and do a "make
> readme" for each one:
>
> find $PORTSDIR -type f ! -path "*/CVS/*" -newercm $SENTINEL -depth 3 |
>    xargs dirname |
>    sort -u | xargs -I@ /bin/sh -c "cd @ && make readme"
>
> 3) Last, but not least, build the category README.html for any
> categories with ports containing newly updated README.html files.
>
> I have noticed while doing this that, as you mentioned, the category
> READMEs take considerably longer than the individual ports'.
>
> I don't even bother to rebuild the top-level file, since it's basically
> unchanging anyway.
>
> > >> > I think the way to speed this up is to have the script generate
> > >> > the category files too. There's no point in bringing in the
> > >> > top-level README since that's already fast.
> >
> > > So what's making the category READMEs so slow then?
> >
> > The big problem with performance in all this INDEX and README.html
> > building is that it takes quite a long time relatively to run make(1)
> > within any port or category directory.  make(1) has to read in a lot
> > of other files and stat(2) many more[*] -- all of which involves a
> > lot of random-access disk IO, and that's always going to take quite a
> > lot of time.  Now, doing 'make readme' in a category directory
> > doesn't just run make in that directory, but also in every port in
> > that category. Popular categories can contain many hundreds of ports.
>
> I'm a little rusty on the actual mechanics of make, but shouldn't it be
> possible to run a single, over-arching make on each category that
> wouldn't need to spawn a bunch of sub-makes?
>
> > Maybe I should add README.html generation to my FreeBSD::Portindex
> > stuff.  Should be pretty simple -- all the necessary bits are readily
> > available and it is just a matter of formatting it as HTML and
> > printing it out.
>
> "Maybe"?  Whaddya mean, "maybe"?  :-)  Sounds like it would definitely
> be worth doing!
>
> > Cheers,
> >
> > Matthew
> >
> > [*] Running 'make -dA' with maximum debug output is quite
> > enlightening, as is running make under truss(1)
>
> Enlightening, perhaps.  Sometimes overwhelming, is more like it.  :-)



Not too fancy, but I used this when I was updating the readmes to not break.

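#!/bin/sh
# Run "make readme" in every port of every category, logging each port path.
# The sed call strips the string "local" from the category list.
cd /usr/ports || exit 1
for i in $(make -V SUBDIR | sed 's/local//g'); do
    for p in $(make -C "$i" -V SUBDIR); do
        echo "$i/$p" && sudo make -C "$i/$p" readme
    done
done >> ~/readmes.log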

-jgh

--
Jason Helfman
System Administrator
experts-exchange.com
http://www.experts-exchange.com/M_4830110.html
E4AD 7CF1 1396 27F6 79DD  4342 5E92 AD66 8C8C FBA5


Re: A new and better way to do "make readmes"?

2012-02-02 Thread Conrad J. Sabatier
On Thu, 2 Feb 2012 13:25:14 -0800
Jason Helfman  wrote:

> On Thu, Feb 02, 2012 at 03:21:37PM -0600, Conrad J. Sabatier thus
> spake:

[snip]

> >The workaround method I've been running out of cron for the last
> >month or so is:
> >
> >1) Create a "sentinel" file under /tmp to use as a timestamp, just
> >before running "cvs update" on ports (I update my ports tree from a
> >local copy of the CVS repo maintained via csup)
> >
> >2) After cvs completes, look for any port directories containing
> >updates (check timestamps against the sentinel file) and do a "make
> >readme" for each one:
> >
> >find $PORTSDIR -type f ! -path "*/CVS/*" -newercm $SENTINEL -depth 3 |
> >xargs dirname |
> >sort -u | xargs -I@ /bin/sh -c "cd @ && make readme"
> >
> >3) Last, but not least, build the category README.html for any
> >categories with ports containing newly updated README.html files.
> >
> >I have noticed while doing this that, as you mentioned, the category
> >READMEs take considerably longer than the individual ports'.
> >
> >I don't even bother to rebuild the top-level file, since it's
> >basically unchanging anyway.

[snip] 

> Not too fancy, but I used this when I was updating the readmes to not
> break.
> 
> #!/bin/sh
> cd /usr/ports
> for i in `make -V SUBDIR |sed s/local//g`; do for p in `make -C $i -V SUBDIR`; do echo $i/$p &&
> sudo  make -C "$i/$p" readme ; done; done >> ~/readmes.log
> 
> -jgh

Interesting.  I'll take a look at using that.  Thanks!

-- 
Conrad J. Sabatier
conr...@cox.net