Bug#820549: anarchism: Typo in section H1
Hi,

> I'd like to, but the way anarchism 15 is being uploaded right now makes
> me really feel uncomfortable:

I think what is really missing is a change to the Debian package README
to clarify some points. Please have a look at the README [0] just pushed
to "upstream".

We thought it would be a good idea to separate two things: the script
that automatically downloads the authors' original (it can even be run
in real time on a box), and the original HTML extracted by that script
plus the conversions to other formats. The repository with the HTML
would be the Debian upstream, and the repository with the scraper should
also be packaged for Debian.

> - there's tons of trailing whitespaces (at least in debian/changelog and
>   some html files).
>   If I open my editor on a file, I end up changing tons of lines. Shall
>   I use sed to change a typo, or shall you fix your whitespaces first?

The trailing whitespace comes from the original HTML. Personally, I
would not modify the original HTML, but we could include a patch in the
Debian package to do so.

> - the .html is not anymore readable by a human, it's just a bunch of
>   parsed content.

Hmm, do you mean the HTML code or what is rendered in a browser? People
usually do not read HTML code; I guess that is why the Debian package
generated txt. For a more readable format we have Markdown, which is why
the scraper generates it. The HTML displayed in a browser should be as
readable as on the original website. If it is not, then maybe there is a
bug in the scraper.

>   Do I really have to create a patch of (at least) 8K just to add a
>   letter?

This is a good question. The idea behind the separate scraper and source
repositories was that people could fix the Markdown directly in that
repository. But then there should probably be another Markdown directory
or branch, so that fixes are not overwritten by the conversion from the
HTML but can still be merged and used for the Debian package.

> - there's now multiple directories having the same content, instead of
>   just one html/ from where to get all other file formats.
>   How am I supposed to fix this typo? Do I have to fix it in all 3
>   places?

The scraper generates the markdown and txt directories. Now that there
is Markdown, the txt directory should probably not exist; it is a
leftover from the previous Debian script.

> Even if I wanted to fix the problem at the root, the script you used
> to generate it's not in the package itself, and is *big*.

Yeah, sorry, it has not been packaged yet, but the idea is to package it
independently of the HTML.

> I'm not even sure anymore where I am contributing now.
>
> Before anarchism 15 was released I tried to help a bit myself, and
> ended up with the attached relevant debian/ files.
> My solution has clearly more little errors than yours, but at least
> anybody can reproduce it.

Can you not reproduce the HTML, Markdown and txt with the scraper? You
should be able to; that was the main idea. The "source" HTML used to
generate the older Debian versions was never in any repository, nor was
there a script to download it from the original website.

> Maybe you can find something helpful from
> there to help me fix this issue.

OK, I will look at those files. It would be great if you could look at
the new README.

A process like the following seems good to me:

1. the scraper obtains the HTML from the original website
2. the scraper pushes the HTML to another repository, together with the
   generated Markdown
3. the Debian package uses the generated "source" as upstream.

Probably what is missing, as said above, is a separate Markdown
directory for any content change that does not reflect the original,
clearly documented as the place to fix typos and content. At least it
would then be easy to see the differences from the original and from new
versions of the original.

Thanks for your comments,
ju.

[0] https://0xacab.org/ju/afaq/blob/master/README.md
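
The idea of a separate, hand-corrected Markdown directory kept next to the generated one could look roughly like the sketch below. It assumes Python 3 and uses only the standard `difflib` module. The function name `diff_against_original`, the file name `secH1.md` and the directory labels `markdown/` and `markdown-fixed/` are hypothetical and not taken from the afaq repository. It only illustrates how a one-word fix would show up as a small diff against the generated original.

```python
import difflib


def diff_against_original(original, corrected, name="secH1.md"):
    """Return a unified diff between generated and hand-corrected Markdown.

    The file name and both directory labels are hypothetical, used only
    to make the diff headers readable.
    """
    return list(
        difflib.unified_diff(
            original.splitlines(),
            corrected.splitlines(),
            fromfile="markdown/" + name,
            tofile="markdown-fixed/" + name,
            lineterm="",
        )
    )


# The sentence reported in this bug, before and after the fix.
original = "These critiques contain may important ideas and so are worth summarising."
corrected = "These critiques contain many important ideas and so are worth summarising."

diff_lines = diff_against_original(original, corrected)
for line in diff_lines:
    print(line)
```

With the fix kept apart from the generated files, a new run of the scraper would only replace the original side of such a diff, and the correction could be reviewed or merged again.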
Bug#820549: anarchism: Typo in section H1
ju writes:

> Or maybe they wanted to say <>?

No, that's not possible as the full sentence is:

« These critiques contain may important ideas and so are worth
summarising. »

>> I just thought it was a typo, and that it would have been nice to
>> report it. I also downloaded the version proposed in #773529, and I
>> can confirm this bug still applies.
>
> Now that #773529 is closed and the bug still applies, maybe is better a
> PR to the repository where the sources are now?.

I'd like to, but the way anarchism 15 is being uploaded right now makes
me really feel uncomfortable:

- there's tons of trailing whitespaces (at least in debian/changelog and
  some html files).
  If I open my editor on a file, I end up changing tons of lines. Shall
  I use sed to change a typo, or shall you fix your whitespaces first?

- the .html is not anymore readable by a human, it's just a bunch of
  parsed content.
  Do I really have to create a patch of (at least) 8K just to add a
  letter?

- there's now multiple directories having the same content, instead of
  just one html/ from where to get all other file formats.
  How am I supposed to fix this typo? Do I have to fix it in all 3
  places?

Even if I wanted to fix the problem at the root, the script you used
to generate it's not in the package itself, and is *big*.
I'm not even sure anymore where I am contributing now.

Before anarchism 15 was released I tried to help a bit myself, and
ended up with the attached relevant debian/ files.
My solution has clearly more little errors than yours, but at least
anybody can reproduce it. Maybe you can find something helpful from
there to help me fix this issue.
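
Hasta la pasta,

rules
Description: Binary data

control
Description: Binary data

#!/usr/bin/env python2
import sys
from bs4 import BeautifulSoup as BS

# Skeleton document that receives the cleaned title and content.
html = BS(
'''
''', 'lxml')


def clean_html(inf, outf):
    # Parse the downloaded page.
    dirty = BS(inf, 'lxml', from_encoding='utf8')
    # Expect exactly one content <div>; the unpacking fails otherwise.
    content, = dirty.find_all('div', attrs={'class': 'content clear-block'})
    # Move the title and the content into the skeleton and write it out.
    html.head.append(dirty.title)
    html.body.append(content)
    outf.write(html.encode('utf8'))


if __name__ == '__main__':
    if len(sys.argv) == 2:
        # One argument: read the named file, write to stdout.
        with open(sys.argv[1], 'r') as inf:
            clean_html(inf, sys.stdout)
    elif len(sys.argv) == 1:
        # No argument: act as a filter from stdin to stdout.
        inf, outf = sys.stdin, sys.stdout
        clean_html(inf, outf)
    else:
        sys.exit(1)

-- 
µ.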
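
The trailing-whitespace problem raised in the first point above could be handled mechanically, with sed or with a small script. Below is a minimal sketch in Python 3, assuming the files are plain text. The function name `strip_trailing_whitespace` and the sample string are hypothetical and belong to neither the package nor the scraper.

```python
def strip_trailing_whitespace(text):
    """Remove trailing spaces and tabs from every line, keeping line endings."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        # Separate the line ending (if any) from the rest of the line.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        # Strip only spaces and tabs at the end; leading indentation stays.
        cleaned.append(body.rstrip(" \t") + ending)
    return "".join(cleaned)


# Hypothetical sample with trailing spaces and a trailing tab.
sample = "<p>These critiques   \n\tcontain may important ideas\t\n</p>"
print(repr(strip_trailing_whitespace(sample)))
```

Doing such a cleanup once, in a change of its own, would let a later typo fix touch a single line instead of every line an editor normalises on save.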
Bug#820549: anarchism: Typo in section H1
X-Debbugs-Cc: j...@riseup.net
Package: anarchism
Version: 15.0-1
Followup-For: Bug #820549

Hi,

> In section H.1 of the faq, I read «These critiques contain may
> important ideas and so are worth summarising», where "may" is
> probably a misspelled "many".

Or maybe they wanted to say <>?

> I just thought it was a typo, and that it would have been nice to
> report it. I also downloaded the version proposed in #773529, and I
> can confirm this bug still applies.

Now that #773529 is closed and the bug still applies, maybe a PR to the
repository where the sources are now would be better?

Cheers,
ju.
Bug#820549: anarchism: Typo in section H1
Package: anarchism
Version: 14.0-4
Severity: normal
Tags: upstream

Dear Maintainer,

In section H.1 of the faq, I read «These critiques contain may important
ideas and so are worth summarising», where "may" is probably a
misspelled "many".

I just thought it was a typo, and that it would have been nice to report
it. I also downloaded the version proposed in #773529, and I can confirm
this bug still applies.

-- System Information:
Debian Release: stretch/sid
  APT prefers unstable
  APT policy: (900, 'unstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.4.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/bash
Init: systemd (via /run/systemd/system)

anarchism depends on no packages.

anarchism recommends no packages.

Versions of packages anarchism suggests:
ii  chromium [www-browser]              49.0.2623.87-1
ii  firefox-esr [www-browser]           45.0.1esr-1
ii  google-chrome-stable [www-browser]  49.0.2623.108-1
ii  w3m [www-browser]                   0.5.3-27
ii  yelp                                3.16.1-1

-- no debconf information