Should probably start by saying I'm not trying to ruffle any feathers
here. I recently started looking into cleaning up dev-perl of ebuilds
whose tarballs are no longer available at the SRC_URI. Partway into it I
discovered that although the upstream source no longer hosts the
tarball, the ebuild itself still functions because the tarball has been
mirrored in the Gentoo mirror caches. So I started wondering - how
many other ebuilds fit the bill?

My methods were hopefully not too questionable - I did a scan of the
portage tree, attempted as best as possible to replicate the creation of
the P, PV, PN, MY_P, MY_PV, etc. variables, and then took the listings for
the SRC_URIs in that context and checked to see if what they pointed at
was still there. I used two different techniques, one for each of two
passes. The first was a set of short perl scripts that attempted to
verify that there was something to stream from the SRC_URI. This first
pass eliminated about 10,000 SRC_URIs (if I remember right, the total
was around 15,000 SRC_URIs, which includes patches, multiple
sources, etc). The second pass took the resulting list of possibly bad
ebuilds/sources and attempted a wget against each target. I still have
some misgivings about the accuracy because it depends on my having
created the internal variables correctly back in step one. I
eliminated, from the get-go, any ebuilds that used the mirror://
syntax, and I know that there are false failures for SRC_URIs that use an
inline ${P/some/change/}. But the numbers are still pretty high, and I've
done random spot checking to confirm that, yep, there's nothing there.
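
To make the variable-reconstruction step concrete, here is a minimal
sketch in Python (not the bash/perl of the actual scripts). It derives
PN, PV, P, PR and PF from an ebuild filename, expands simple ${VAR}
references in a SRC_URI, skips mirror:// entries, and flags anything it
cannot resolve, such as an inline ${P/some/change/}. The function names,
the simplified version pattern, and the Foo-Bar/example.org values are
assumptions made for illustration only.

```python
import re

# Simplified version pattern (an assumption, not the full Gentoo rules):
# 1.2.3, optional letter, optional _alpha/_beta/_pre/_rc/_p suffixes, optional -rN.
_NAME_VERSION = re.compile(
    r"^(?P<pn>.+?)-"
    r"(?P<pv>\d+(?:\.\d+)*[a-z]?(?:_(?:alpha|beta|pre|rc|p)\d*)*)"
    r"(?:-(?P<pr>r\d+))?$"
)

# Matches only plain ${NAME} or $NAME references, not ${NAME/a/b} and friends.
_SIMPLE_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def ebuild_vars(path):
    """Derive PN, PV, P, PR and PF from an ebuild path or filename."""
    stem = path.rsplit("/", 1)[-1]
    if stem.endswith(".ebuild"):
        stem = stem[: -len(".ebuild")]
    match = _NAME_VERSION.match(stem)
    if match is None:
        raise ValueError("cannot split name and version: " + stem)
    pn, pv, pr = match["pn"], match["pv"], match["pr"]
    p = pn + "-" + pv
    return {
        "PN": pn,
        "PV": pv,
        "P": p,
        "PR": pr or "r0",
        "PF": p + "-" + pr if pr else p,
    }


def expand_src_uri(uri, variables):
    """Return (status, uri) where status is 'ok', 'skipped' or 'unresolved'."""
    if uri.startswith("mirror://"):
        return ("skipped", uri)  # excluded from the scan from the get-go

    def substitute(match):
        name = match.group(1) or match.group(2)
        return variables[name]  # KeyError for variables we never derived

    try:
        expanded = _SIMPLE_VAR.sub(substitute, uri)
    except KeyError:
        return ("unresolved", uri)
    if "$" in expanded:
        # Something like ${P/some/change/} is left over; do not guess.
        return ("unresolved", uri)
    return ("ok", expanded)
```

Treating unresolved entries as their own bucket, rather than as failures,
would keep the known false failures out of the final count.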

So I guess my question is, what's the take on this? Should we only be
providing ebuilds that point to sources that still work outside of our
caching system? My results were that there were 1915 ebuilds pointing to
2290 invalid URLs. Here's the list[1] that I came up with after the second
pass. I welcome feedback on the scripts (ok, I live in fear of criticism,
but that's counterproductive). This[2] is the bash script that drove the
initial pass, and this[3] is the perl script that did the initial
checks. This[4] is the second pass script that attempted to perform actual
wgets on the final list. For the weak of eye, here's the secondpass as
html[5].

If you want to attempt to use my scripts yourself - beware the second pass,
as it does a complete wget of each remaining target to confirm. It is
definitely necessary, though: the secondpass file was about half the size
of the first pass (network issues? not sure).
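
For anyone who wants the shape of the two passes without reading the
scripts, here is a rough sketch in Python. It is not a port of the
scripts[2][3][4]: it stands in urllib for the perl checks and for wget,
and the function names, timeouts, and the injectable opener parameter
(there so the logic can be exercised without a network) are all
assumptions.

```python
import urllib.error
import urllib.request

# URLError is an OSError subclass; ValueError covers malformed URLs.
_FETCH_ERRORS = (urllib.error.URLError, OSError, ValueError)


def first_pass(url, opener=urllib.request.urlopen, timeout=30):
    """Cheap check: can the URL be opened and at least one byte read?"""
    try:
        with opener(url, timeout=timeout) as response:
            return len(response.read(1)) > 0
    except _FETCH_ERRORS:
        return False


def second_pass(url, opener=urllib.request.urlopen, timeout=60, chunk=65536):
    """Full download. Returns the byte count, or None if the fetch failed."""
    total = 0
    try:
        with opener(url, timeout=timeout) as response:
            while True:
                block = response.read(chunk)
                if not block:
                    break
                total += len(block)
    except _FETCH_ERRORS:
        return None
    return total


def find_dead_urls(urls, opener=urllib.request.urlopen):
    """Return the URLs that fail the cheap check and then the full fetch."""
    suspects = [u for u in urls if not first_pass(u, opener)]
    # A failed fetch (None) and an empty body (0) both count as dead.
    return [u for u in suspects if not second_pass(u, opener)]
```

Only the first-pass suspects are downloaded in full, so a transient
failure in the first pass gets a second chance before a URL is reported.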

Thanks all for taking the time to read this mess, and yes I realize
there are a fair number of perl ebuilds in there too,

Mike



 1. http://dev.gentoo.org/~mcummings/secondpass.txt
 2. http://dev.gentoo.org/~mcummings/track_builds.txt
 3. http://dev.gentoo.org/~mcummings/getter.txt
 4. http://dev.gentoo.org/~mcummings/pass2.txt
 5. http://dev.gentoo.org/~mcummings/secondpass.html
