Re: Auto-checking dead links in the manual (was: http: links in the manual)
Max Nikulin writes:

> I hope that selenium is currently overkill, however more sites are
> starting to use anti-DDOS shields like Cloudflare, and an HTTP client
> may be banned just because it does not fetch other resources like JS
> scripts.

Such links are to be considered dead for the purposes of the Org manual.
We must not link to websites that cannot be opened without running
non-free JS; this follows the GNU Documentation Standards.

> I do not have a patch, just an idea: an export backend that ignores
> everything besides links and either sends requests from Lisp code or
> generates a file for another tool.
>
> #+attr_linklint: ...
>
> may be used to specify a regexp that the target page is expected to
> contain. There are some complications, e.g. "info:" links have special
> code to generate HTML with a URL derived from the original path. So it
> may be more robust to parse the HTML document (without checking the
> text of the linked document).

Yes, the most robust way will be simply extracting links from the HTML
version of the manual and testing them using whatever method is
appropriate.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
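Extracting links from the exported HTML needs nothing beyond the Python standard library. A minimal sketch of that step (the class and function names are mine, not anything in Org's build system):

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect external http(s) targets of <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            # Keep only external links; skip fragments and relative paths.
            if href.startswith(("http://", "https://")):
                self.links.append(href)


def extract_links(html_text):
    """Return the external links found in an HTML document, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links
```

One would feed it the generated manual, e.g. `extract_links(open("org-manual.html").read())`, and pass the result to whatever checker is chosen.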
Re: Auto-checking dead links in the manual (was: http: links in the manual)
> I hope that selenium is currently overkill

Me too, although the WebDriver protocol itself is less bloated than
Selenium. Personally I use Etaoin[1] for anything WebDriver-related;
it's pretty compact, Lisp-y, and you can easily run unit tests with
Emacs. As for anything ready-made for cleaning up dead links, I'm not
aware of any, unfortunately.

[1] https://github.com/clj-commons/etaoin
Re: Auto-checking dead links in the manual (was: http: links in the manual)
On 22/08/2022 09:46, Ihor Radchenko wrote:
> Juan Manuel Macías writes:
>> Maybe, instead of repairing the links manually, we could think of
>> some code that would do this work periodically, and also check the
>> health of the links, running a URL request on each link and returning
>> a list of broken links. I don't know if it is possible to do
>> something like that in Elisp, as I don't have much experience with
>> web and link issues. I think there are also external tools, like
>> Selenium WebDriver, but my experience with it is very limited (I use
>> Selenium from time to time when I want to take a screenshot of a web
>> page).
>
> This is a good idea. Selenium is probably overkill, since we should
> rather not link JS-only websites from the manual anyway. What we can
> do instead is a make target that will use something like wget.
> Patches are welcome!

I hope that selenium is currently overkill; however, more sites are
starting to use anti-DDOS shields like Cloudflare, and an HTTP client
may be banned just because it does not fetch other resources like JS
scripts.

I do not have a patch, just an idea: an export backend that ignores
everything besides links and either sends requests from Lisp code or
generates a file for another tool.

#+attr_linklint: ...

may be used to specify a regexp that the target page is expected to
contain. There are some complications, e.g. "info:" links have special
code to generate HTML with a URL derived from the original path. So it
may be more robust to parse the HTML document (without checking the
text of the linked document).
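The per-link regexp idea boils down to: fetch the page and report the link broken when the request fails or the expected text is missing. A hedged sketch, assuming the expected regexp has already been collected for each link (function names are mine, for illustration only):

```python
import re
import urllib.request


def page_matches(body, pattern):
    """Return True when the page text contains the expected regexp."""
    return re.search(pattern, body) is not None


def check_link(url, pattern=None, timeout=10):
    """Fetch url; report it alive only when the request succeeds and
    the expected pattern (if any) is found in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except Exception:
        return False
    return pattern is None or page_matches(body, pattern)
```

The regexp check would catch the "soft 404" case, where a server answers 200 OK with an error page instead of the content the manual meant to point at.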
Auto-checking dead links in the manual (was: http: links in the manual)
Juan Manuel Macías writes:

>> Max Nikulin to emacs-orgmode. [PATCH] org-manual.org: Update links to
>> MathJax docs. Sun, 3 Oct 2021 23:17:46 +0700.
>> https://list.orgmode.org/sjcl3b$gsr$1...@ciao.gmane.io
>>
>> In the particular case of docs.mathjax.org I am unsure if the mild
>> preference of http: over https: is not a mistake in the server
>> configuration. I do not mind "https:" there; any variant is better
>> than the old broken link.
>
> Maybe, instead of repairing the links manually, we could think of some
> code that would do this work periodically, and also check the health
> of the links, running a URL request on each link and returning a list
> of broken links. I don't know if it is possible to do something like
> that in Elisp, as I don't have much experience with web and link
> issues. I think there are also external tools, like Selenium
> WebDriver, but my experience with it is very limited (I use Selenium
> from time to time when I want to take a screenshot of a web page).

This is a good idea. Selenium is probably overkill, since we should
rather not link JS-only websites from the manual anyway. What we can do
instead is a make target that will use something like wget.

Patches are welcome!

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92
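A make target could drive a small script doing roughly what `wget --spider` does: issue a HEAD request per link and flag 4xx/5xx responses or connection failures. A sketch using only the Python standard library (the User-Agent string and the simplified status handling are my own assumptions):

```python
import urllib.error
import urllib.request


def is_broken_status(status):
    """Treat 4xx/5xx responses as broken links; redirects are followed
    by urlopen, so anything below 400 counts as alive."""
    return status >= 400


def spider(url, timeout=10):
    """HEAD-request a URL and return (url, status-code-or-error-text)."""
    req = urllib.request.Request(
        url, method="HEAD",
        headers={"User-Agent": "org-linklint"})  # hypothetical UA string
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return url, resp.status
    except urllib.error.HTTPError as err:
        return url, err.code
    except (urllib.error.URLError, OSError) as err:
        return url, str(err)
```

Mapping `spider` over the list of links extracted from the HTML manual and printing the entries where `is_broken_status` holds (or where a non-numeric error string came back) would give the broken-link report that a make target could fail on.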