On Mon, 13 Sept 2021 at 23:56, Dave Fisher <wave4d...@comcast.net> wrote:
>
> Once podlings report their download page, something like this can be
> incorporated into clutch3, which will have svn log info from dist.

Note that there are quite a few other checks that are needed for a
compliant download page, e.g.
- no references to nightly or snapshot builds
- no reference to repository.apache.org
- all releases have sigs and hashes (and vice-versa)
- KEYS file link is present and correct
- code verification instructions are present
- no md5 or sha1 hashes
- ...
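
As a rough illustration, several of the checks above can be run on nothing more than the list of link targets scraped from a page. The sketch below is not taken from download_check.rb. The function names, the set of artifact extensions and the problem messages are all hypothetical, and it assumes that the file name is the last path segment of each link. It only tests that a KEYS link is present, not that it is correct, and it does not look for verification instructions.

```python
import re

# Assumed file types; a real checker would need a longer list.
ARTIFACT_EXTS = (".zip", ".tar.gz", ".tgz")
SIG_EXT = ".asc"
HASH_EXTS = (".sha256", ".sha512")
WEAK_EXTS = (".md5", ".sha1")


def basename(href):
    """Last path segment of a link, ignoring any query string or fragment."""
    return re.split(r"[?#]", href)[0].rstrip("/").rsplit("/", 1)[-1]


def check_download_links(hrefs):
    """Return a sorted list of problems found in a page's link targets."""
    problems = set()
    names = {basename(h) for h in hrefs}

    for href in hrefs:
        if "repository.apache.org" in href:
            problems.add("reference to repository.apache.org: " + href)
        if re.search(r"nightly|snapshot", href, re.IGNORECASE):
            problems.add("nightly or snapshot build: " + href)
        if href.lower().endswith(WEAK_EXTS):
            problems.add("md5 or sha1 hash: " + href)

    if "KEYS" not in names:
        problems.add("no KEYS link")

    for name in names:
        if name.endswith(ARTIFACT_EXTS):
            # Every release needs a signature and a strong hash ...
            if name + SIG_EXT not in names:
                problems.add("no signature for " + name)
            if not any(name + ext in names for ext in HASH_EXTS):
                problems.add("no sha256/sha512 hash for " + name)
        elif name.endswith((SIG_EXT,) + HASH_EXTS):
            # ... and vice-versa: every sig/hash needs its release.
            artifact = name.rsplit(".", 1)[0]
            if artifact not in names:
                problems.add("no release for " + name)
    return sorted(problems)
```

Calling check_download_links() with the href values collected from a page returns an empty list when none of these problems are found.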

The Ruby script at
https://github.com/apache/whimsy/blob/master/tools/download_check.rb
performs these checks and also verifies that the links actually work,
i.e. that the mirrors have the files.

[Sorry, but it's not very well structured at present...]
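
For anyone extending the Python script quoted below, a link check along the same lines might look roughly like this. It is only a sketch and does not reflect how download_check.rb does it: link_works is a hypothetical helper, and the opener parameter exists purely so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def link_works(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request for url succeeds with a 2xx status."""
    # HEAD avoids downloading the whole release artifact.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses,
        # and URLError/OSError for DNS or connection failures.
        return False
```

Some servers may not answer HEAD requests properly, so a real checker might need to fall back to GET.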

Of course neither script is likely to work if the page uses JavaScript.

> Thanks
>
> > On Sep 13, 2021, at 3:46 PM, Justin Mclean <jus...@classsoftware.com> wrote:
> >
> > Hi,
> >
> > A while back I wrote a script to check podling download links and we
> > attempted to get them all corrected. You need to manually list all of the
> > download pages.
> >
> > Things it doesn’t do:
> > - check if the latest release is there
> > - check if the contents match with what is in /dist
> >
> > Might be time to run it again.
> >
> > Here’s the Python code; you might find it useful.
> >
> > from bs4 import BeautifulSoup
> > import urllib.request
> >
> > downloadPages = [
> >     "https://mxnet.apache.org/get_started/download"
> > ]
> >
> > for page in downloadPages:
> >     response = urllib.request.urlopen(page)
> >     data = response.read()
> >     soup = BeautifulSoup(data, 'lxml')
> >
> >     print()
> >     print("Checking " + page)
> >
> >     # Stays True until a signature or hash link is found on the page
> >     missing = True
> >     for link in soup('a'):
> >         if not link.has_attr('href'):
> >             continue
> >         href = link['href']
> >
> >         # Release artifacts should be downloaded via closer.lua
> >         if href.endswith(('.zip', '.tar.gz', '.tgz', '.msi', '.rpm')):
> >             if href.startswith(('http://www.apache.org/dist/',
> >                                 'https://www.apache.org/dist/')):
> >                 print("Please change link to " + href + " to not use "
> >                       "http://www.apache.org/dist/ and use "
> >                       "https://www.apache.org/dyn/closer.lua instead")
> >             if href.startswith(('http://downloads.apache.org/',
> >                                 'https://downloads.apache.org/')):
> >                 print("Please change link to " + href + " to not use "
> >                       "http://downloads.apache.org/ and use "
> >                       "https://www.apache.org/dyn/closer.lua instead")
> >             if href.startswith(('http://dist.apache.org/repos/dist/dev',
> >                                 'https://dist.apache.org/repos/dist/dev')):
> >                 print("Please change link to " + href + " to release "
> >                       "area and use https://www.apache.org/dyn/closer.lua")
> >             if href.startswith(('http://dist.apache.org/repos/dist/release',
> >                                 'https://dist.apache.org/repos/dist/release')):
> >                 print("Please use https://www.apache.org/dyn/closer.lua "
> >                       "to download releases")
> >             if href.startswith('https://downloads.apache.org/incubator/'):
> >                 print("Please use https://www.apache.org/dyn/closer.lua "
> >                       "to download releases")
> >
> >         # Signatures and hashes should come from downloads.apache.org
> >         # or archive.apache.org rather than from a mirror
> >         if href.endswith(('.sha512', '.sha256', '.asc')):
> >             missing = False
> >             if href.startswith(('http://www.apache.org/dist/',
> >                                 'https://www.apache.org/dist/')):
> >                 print("Please change link to " + href + " to go via "
> >                       "https://downloads.apache.org/. "
> >                       "https://www.apache.org/dist/ has been deprecated.")
> >             if not href.startswith(('https://downloads.apache.org/',
> >                                     'https://archive.apache.org/dist')):
> >                 print("Please change link to " + href + " to go via "
> >                       "https://downloads.apache.org/ or "
> >                       "https://archive.apache.org/dist")
> >
> >         if href.endswith('.sha'):
> >             print("for link " + href + " .sha should no longer be used. "
> >                   "Please change to use .sha256 or .sha512.")
> >
> >     if missing:
> >         print("Links to signatures and hashes are missing")
> >
> > Kind Regards,
> > Justin
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
