Steve Smith wrote:

> I am having the same issue. I can either get the text to wrap, which makes
> all the text wrap, or I can get the text to ignore independent '\n'
> characters, so that all the blank space is removed. I'd like to set up my
> code, so that only 1 blank space is remaining (I'll settle for none at
> this point), and the text wraps up to 100 chars or so out per line. Does
> anyone have any thoughts on the attached code? And what I'm not doing
> correctly?
>
>
> #import statements
> import textwrap
> import requests
> from bs4 import BeautifulSoup
>
> #class extension of textwrapper
> class DocumentWrapper(textwrap.TextWrapper):
>
>     def wrap(self, text):
>         split_text = text.split('\n')
>         lines = [line for para in split_text for line in
>                  textwrap.TextWrapper.wrap(self, para)]
>         return lines
>
> #import statement of text.
> page = requests.get("http://classics.mit.edu/Aristotle/rhetoric.mb.txt")
> soup = BeautifulSoup(page.text, "html.parser")
>
> #instantiation of extension of textwrap.wrap.
> d = DocumentWrapper(width=110,initial_indent='',fix_sentence_endings=True
> )
> new_string = d.fill(page.text)
>
> #set up an optional variable, even attempted applying BOTH the extended
> #method and the original method to the issue... nothing has worked.
> #new_string_2 = textwrap.wrap(new_string,90)
>
> #with loop with JUST the class extension of textwrapper.
> with open("Art_of_Rhetoric.txt", "w") as f:
>     f.writelines(new_string)
>
> #with loop with JUST the standard textwrapper.text method applied to it.
> with open("Art_of_Rhetoric2.txt", "w") as f:
>     f.writelines(textwrap.wrap(page.text,90))
I think in your case the problem is that newlines in the source text do not
indicate paragraphs -- thus you should not keep them. Instead try
interpreting empty lines as paragraph separators:

$ cat tmp.py
import sys
import textwrap
import itertools

import requests
from bs4 import BeautifulSoup


class DocumentWrapper(textwrap.TextWrapper):
    def wrap(self, text):
        paras = (
            "".join(group) for non_empty, group in itertools.groupby(
                text.splitlines(True),
                key=lambda line: bool(line.strip())
            )
            if non_empty
        )
        wrap = super().wrap
        lines = [line for para in paras for line in wrap(para)]
        return lines


page = requests.get("http://classics.mit.edu/Aristotle/rhetoric.mb.txt").text

d = DocumentWrapper(width=110, initial_indent='', fix_sentence_endings=True)
new_string = d.fill(page)
sys.stdout.write(new_string)
$ python3 tmp.py | head -n10
Provided by The Internet Classics Archive. See bottom for copyright. Available online at
http://classics.mit.edu//Aristotle/rhetoric.html
Rhetoric By Aristotle
Translated by W. Rhys Roberts
----------------------------------------------------------------------
BOOK I
Part 1
Rhetoric is the counterpart of Dialectic. Both alike are concerned with such things as come, more or less,
within the general ken of all men and belong to no definite science. Accordingly all men make use, more or
less, of both; for to a certain extent all men attempt to discuss statements and to maintain them, to defend
Traceback (most recent call last):
  File "tmp.py", line 27, in <module>
    sys.stdout.write(new_string)
BrokenPipeError: [Errno 32] Broken pipe
$
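
If you also want exactly one blank line left between paragraphs (the
original request), one possible variation is to wrap each paragraph
separately and join the results with "\n\n". The sketch below is only an
illustration of that idea, not something tested against the full Rhetoric
text: the function name wrap_paragraphs and the sample string are made up
for the example, and it runs on an inline string so that it needs no
network access.

```python
import itertools
import textwrap


def wrap_paragraphs(text, width=100):
    """Re-wrap text, treating runs of blank lines as paragraph separators.

    Hypothetical helper for illustration; not part of textwrap.
    """
    wrapper = textwrap.TextWrapper(width=width, fix_sentence_endings=True)
    # Group consecutive lines by whether they contain any non-whitespace.
    groups = itertools.groupby(
        text.splitlines(), key=lambda line: bool(line.strip())
    )
    # Keep only the non-blank groups; each one is treated as a paragraph.
    paragraphs = [" ".join(lines) for non_empty, lines in groups if non_empty]
    # Wrap each paragraph on its own, then rejoin with a single blank line.
    return "\n\n".join(wrapper.fill(para) for para in paragraphs)


# Made-up sample: one paragraph split over two lines, three blank lines,
# then a second paragraph.
sample = (
    "First paragraph, broken\n"
    "over two source lines.\n"
    "\n"
    "\n"
    "\n"
    "Second paragraph."
)
print(wrap_paragraphs(sample, width=40))
```

Joining with "\n\n" rather than overriding wrap() keeps the stock
TextWrapper untouched; fill() on the subclass above joins every wrapped
line with a single "\n", which is why no blank lines survive there.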