jayvdb added a comment.

I've processed ~1500 pages on en.wp that link to `*.jpl.nasa.gov`, and found 
only two problems.  All of the other HTTP and socket errors reported were real 
problems, confirmed against DNS records or reproduced in a web browser from a 
very different IP address range.

The command used:

  $ python pwb.py weblinkchecker -family:wikipedia -lang:en -weblink:'*.jpl.nasa.gov' -namespace 0

The first and only serious problem is that [[w:en:Real versus nominal value]] 
links to 
http://jpl.nasa.gov/news/news.cfm?release=2009-191&icid='NewsFeaturesHome' , 
which weblinkchecker sees as 404 Not Found.

When I change the article 
<https://en.wikipedia.org/w/index.php?title=Real_versus_nominal_value&diff=700761748&oldid=695212610>
to remove the unnecessary `&icid...`, weblinkchecker works correctly, which I 
suspect means there is a bug in how `'` is encoded in URLs.  Note that this 
will be fixed by `requests` anyway.
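
To illustrate the suspected quoting issue, here is a minimal sketch using only the Python 3 standard library.  It is not weblinkchecker's actual code, and `drop_query_param` is a hypothetical helper; it only shows how `'` is percent-encoded and how the `icid` parameter can be stripped from the URL above.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return url with every query parameter called `name` removed.

    Hypothetical helper for illustration; not part of pywikibot.
    """
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


URL = ("http://jpl.nasa.gov/news/news.cfm"
       "?release=2009-191&icid='NewsFeaturesHome'")

# The apostrophe is not in quote()'s default safe set, so it becomes %27.
encoded_value = quote("'NewsFeaturesHome'")

# Dropping the unnecessary parameter leaves a URL with no apostrophes at all.
clean_url = drop_query_param(URL, "icid")

if __name__ == "__main__":
    print(encoded_value)
    print(clean_url)
```

A client that sends the raw `'` and one that sends `%27` issue different request lines, which is one plausible way the same link could behave differently in a browser and in the checker.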

The other problem is entirely understandable: `[[List of NASA websites]] 
links to https://nightsky.jpl.nasa.gov/ - Socket Error: u'[SSL: 
CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)'`.  A 
web browser shows this website, but with a certificate warning.  IMO this is 
still a real error.  Ignoring certificate warnings will be much easier with 
`requests`.
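
As a rough sketch of what "ignoring certificate warnings" could look like, the following uses only the Python 3 standard library as a stand-in (with `requests`, the equivalent switch is the `verify=False` argument).  The names `make_context` and `classify`, and the result labels, are assumptions made for illustration and are not weblinkchecker code.

```python
import ssl
import urllib.error
import urllib.request


def make_context(verify=True):
    """Build an SSL context, optionally with certificate checks disabled."""
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before verify_mode can be relaxed.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def classify(url, timeout=10):
    """Return (label, detail), reporting certificate failures separately."""
    try:
        with urllib.request.urlopen(url, timeout=timeout,
                                    context=make_context()) as resp:
            return ("ok", resp.status)
    except urllib.error.HTTPError as err:
        return ("http-error", err.code)
    except urllib.error.URLError as err:
        if not isinstance(err.reason, ssl.SSLCertVerificationError):
            return ("unreachable", str(err.reason))

    # Only certificate verification failed: retry once without verification
    # to see what a browser user would get after clicking past the warning.
    try:
        with urllib.request.urlopen(url, timeout=timeout,
                                    context=make_context(verify=False)) as resp:
            return ("cert-warning", resp.status)
    except urllib.error.HTTPError as err:
        return ("cert-warning", err.code)
    except urllib.error.URLError as err:
        return ("unreachable", str(err.reason))
```

Keeping a separate label for the certificate case means the link is still reported as a problem, but can be told apart from a dead host.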

Another SSL issue worth mentioning is `[[SHARAD]] links to 
http://starbrite.jpl.nasa.gov/pds/viewInstrumentProfile.jsp?INSTRUMENT_ID=SHARAD&INSTRUMENT_HOST_ID=MRO
 - Socket Error: u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed 
(_ssl.c:590)'`.  In a web browser that URL also fails, with an HTTP 500.


TASK DETAIL
  https://phabricator.wikimedia.org/T124015

To: jayvdb
Cc: Masti, Xqt, Aklapper, StudiesWorld, jayvdb, pywikibot-bugs-list


