Hello community,

here is the log from the commit of package python-html2text for openSUSE:Factory checked in at 2017-05-02 08:54:46
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-html2text (Old)
 and      /work/SRC/openSUSE:Factory/.python-html2text.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "python-html2text"

Tue May 2 08:54:46 2017 rev:17 rq:491649 version:2016.9.19

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-html2text/python-html2text.changes 2016-03-16 10:34:33.000000000 +0100
+++ /work/SRC/openSUSE:Factory/.python-html2text.new/python-html2text.changes 2017-05-02 08:54:48.643017336 +0200
@@ -1,0 +2,34 @@
+Thu Apr 27 16:33:29 UTC 2017 - toddrme2...@gmail.com
+
+- Implement update-alternatives to avoid conflict with html2text
+  package.
+
+-------------------------------------------------------------------
+Wed Apr 12 19:18:13 UTC 2017 - toddrme2...@gmail.com
+
+- update to version 2016.9.19:
+  * Default image alt text option created and set to a default of
+    empty string "" to maintain backward compatibility
+  * Fix #136: --default-image-alt now takes a string as argument
+  * Fix #113: Stop changing quiet levels on /script tags.
+  * Merge #126: Fix deprecation warning on py3 due to html.escape
+  * Fix #145: Running test suite on Travis CI for Python 2.6.
+- update to version 2016.5.29:
+  * Fix #125: --pad_tables now pads table cells to make them look
+    nice.
+  * Fix #114: Break does not interrupt blockquotes
+  * Deprecation warnings for URL retrieval.
+- update to version 2016.4.2:
+  * Fix #106: encoding by stdin
+  * Fix #89: Python 3.5 support.
+  * Fix #113: inplace baseurl substitution for <a> and <img> tags.
+  * Feature #118: Update the badges to badge.kloud51.com
+  * Fix #119: new-line after a list is inserted
+- update to version 2016.1.8:
+  * Feature #99: Removed duplicated initialisation.
+  * Fix #100: Get element style key error.
+  * Fix #101: Fix error end tag pop exception
+  * <s>, <strike>, <del> now rendered as ~~text~~.
+- Implement singlespec version.
+
+-------------------------------------------------------------------

Old:
----
  html2text-2015.11.4.tar.gz

New:
----
  html2text-2016.9.19.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-html2text.spec ++++++
--- /var/tmp/diff_new_pack.c9QXXI/_old 2017-05-02 08:54:49.378913568 +0200
+++ /var/tmp/diff_new_pack.c9QXXI/_new 2017-05-02 08:54:49.382913004 +0200
@@ -1,7 +1,7 @@
 #
 # spec file for package python-html2text
 #
-# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2017 SUSE LINUX GmbH, Nuernberg, Germany.
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -16,23 +16,29 @@
 #
 
 
+%bcond_without tests
+
+%{?!python_module:%define python_module() python-%{**} python3-%{**}}
 Name:           python-html2text
-Version:        2015.11.4
+Version:        2016.9.19
 Release:        0
 Url:            https://github.com/Alir3z4/html2text/
 Summary:        Turn HTML into equivalent Markdown-structured text
 License:        GPL-3.0
 Group:          Development/Languages/Python
-Source:         https://pypi.python.org/packages/source/h/html2text/html2text-%{version}.tar.gz
+Source:         https://files.pythonhosted.org/packages/source/h/html2text/html2text-%{version}.tar.gz
 BuildRoot:      %{_tmppath}/%{name}-%{version}-build
-BuildRequires:  python-devel
-BuildRequires:  python-setuptools
-BuildRequires:  python-unittest2
-%if 0%{?suse_version} && 0%{?suse_version} <= 1110
-%{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
-%else
-BuildArch:      noarch
+BuildRequires:  fdupes
+BuildRequires:  python-rpm-macros
+BuildRequires:  %{python_module devel}
+BuildRequires:  %{python_module setuptools}
+%if %{with tests}
+BuildRequires:  python2-unittest2
 %endif
+Requires(post): update-alternatives
+Requires(preun): update-alternatives
+BuildArch:      noarch
+%python_subpackages
 
 %description
 html2text is a Python script that converts a page of HTML into clean,
@@ -45,19 +51,35 @@
 sed -i '/^#!/d' html2text/__init__.py
 
 %build
-python setup.py build
+%python_build
 
 %install
-python setup.py install --prefix=%{_prefix} --root=%{buildroot}
-mv %{buildroot}%{_bindir}/html2text %{buildroot}%{_bindir}/html2text-python%{py_ver}
+%python_install
+%python_expand %fdupes %{buildroot}%{$python_sitelib}
+# To avoid conflicts with the rst2html5 package
+mv %{buildroot}%{_bindir}/html2text %{buildroot}%{_bindir}/html2text-python
+ln -s -f %{_sysconfdir}/alternatives/html2text %{buildroot}%{_bindir}/html2text
+
+%post
+update-alternatives --install %{_bindir}/html2text html2text %{_bindir}/html2text-python 15
+
+%preun
+if [ ! -f %{_bindir}/html2text-python ] ; then
+    update-alternatives --remove html2text %{_bindir}/html2text-python
+fi
+
+%if %{with tests}
 
 %check
-python setup.py test
+%python_exec setup.py test
+%endif
 
-%files
+%files %python_files
 %defattr(-,root,root,-)
 %doc COPYING README.md AUTHORS.rst ChangeLog.rst
-%{_bindir}/html2text-python%{py_ver}
+%python3_only %{_bindir}/html2text
+%python3_only %{_bindir}/html2text-python
+%python3_only %ghost %{_sysconfdir}/alternatives/html2text
 %{python_sitelib}/*
 
 %changelog

++++++ html2text-2015.11.4.tar.gz -> html2text-2016.9.19.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/AUTHORS.rst new/html2text-2016.9.19/AUTHORS.rst
--- old/html2text-2015.11.4/AUTHORS.rst 2015-11-04 15:32:38.000000000 +0100
+++ new/html2text-2016.9.19/AUTHORS.rst 2016-05-29 18:08:48.000000000 +0200
@@ -19,6 +19,7 @@
 * Albert Berger <gh: nbdsp>
 * Etienne Millon <m...@emillon.org>
 * John C F <gh: critiqjo>
+* Mikhail Melnik <by.zumz...@gmail.com>
 
 Maintainer:
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/ChangeLog.rst new/html2text-2016.9.19/ChangeLog.rst
--- old/html2text-2015.11.4/ChangeLog.rst 2015-11-04 15:48:46.000000000 +0100
+++ new/html2text-2016.9.19/ChangeLog.rst 2016-09-19 00:03:35.000000000 +0200
@@ -1,3 +1,44 @@
+2016.9.19
+=========
+----
+
+* Default image alt text option created and set to a default of empty string "" to maintain backward compatibility
+* Fix #136: --default-image-alt now takes a string as argument
+* Fix #113: Stop changing quiet levels on \/script tags.
+* Merge #126: Fix deprecation warning on py3 due to html.escape
+* Fix #145: Running test suite on Travis CI for Python 2.6.
+
+
+2016.5.29
+=========
+----
+
+* Fix #125: --pad_tables now pads table cells to make them look nice.
+* Fix #114: Break does not interrupt blockquotes
+* Deprecation warnings for URL retrieval.
+
+
+2016.4.2
+=========
+----
+
+* Fix #106: encoding by stdin
+* Fix #89: Python 3.5 support.
+* Fix #113: inplace baseurl substitution for <a> and <img> tags.
+* Feature #118: Update the badges to badge.kloud51.com
+* Fix #119: new-line after a list is inserted
+
+
+2016.1.8
+=========
+----
+
+* Feature #99: Removed duplicated initialisation.
+* Fix #100: Get element style key error.
+* Fix #101: Fix error end tag pop exception
+* <s>, <strike>, <del> now rendered as ~~text~~.
+
+
 2015.11.4
 =========
 ----
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/PKG-INFO new/html2text-2016.9.19/PKG-INFO
--- old/html2text-2015.11.4/PKG-INFO 2015-11-04 16:23:02.000000000 +0100
+++ new/html2text-2016.9.19/PKG-INFO 2016-09-19 00:08:46.000000000 +0200
@@ -1,12 +1,129 @@
 Metadata-Version: 1.1
 Name: html2text
-Version: 2015.11.4
+Version: 2016.9.19
 Summary: Turn HTML into equivalent Markdown-structured text.
Home-page: https://github.com/Alir3z4/html2text/ Author: Alireza Savand Author-email: alireza.sav...@gmail.com License: GNU GPL 3 -Description: UNKNOWN +Description: html2text + ========= + + |Build Status| |Coverage Status| |Downloads| |Version| |Wheel?| |Format| + |License| + + html2text is a Python script that converts a page of HTML into clean, + easy-to-read plain ASCII text. Better yet, that ASCII also happens to be + valid Markdown (a text-to-HTML format). + + Usage: ``html2text [(filename|url) [encoding]]`` + + +---------------------------------------+------------------------------------+ + | Option | Description | + +=======================================+====================================+ + | ``--version`` | Show program's version number and | + | | exit | + +---------------------------------------+------------------------------------+ + | ``-h``, ``--help`` | Show this help message and exit | + +---------------------------------------+------------------------------------+ + | ``--ignore-links`` | Don't include any formatting for | + | | links | + +---------------------------------------+------------------------------------+ + | ``--escape-all`` | Escape all special characters. | + | | Output is less readable, but | + | | avoids corner case formatting | + | | issues. 
| + +---------------------------------------+------------------------------------+ + | ``--reference-links`` | Use reference links instead of | + | | links to create markdown | + +---------------------------------------+------------------------------------+ + | ``--mark-code`` | Mark preformatted and code blocks | + | | with [code]...[/code] | + +---------------------------------------+------------------------------------+ + + For a complete list of options see the + `docs <https://github.com/Alir3z4/html2text/blob/master/docs/usage.md>`__ + + Or you can use it from within ``Python``: + + :: + + >>> import html2text + >>> + >>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>")) + **Zed's** dead baby, _Zed's_ dead. + + Or with some configuration options: + + :: + + >>> import html2text + >>> + >>> h = html2text.HTML2Text() + >>> # Ignore converting links from HTML + >>> h.ignore_links = True + >>> print h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!") + Hello, world! + + >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")) + + Hello, world! + + >>> # Don't Ignore links anymore, I like links + >>> h.ignore_links = False + >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")) + Hello, [world](http://earth.google.com/)! + + *Originally written by Aaron Swartz. This code is distributed under the + GPLv3.* + + How to install + -------------- + + ``html2text`` is available on pypi + https://pypi.python.org/pypi/html2text + + :: + + $ pip install html2text + + How to run unit tests + --------------------- + + :: + + PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v + + To see the coverage results: + + :: + + coverage combine + coverage html + + then open the ``./htmlcov/index.html`` file in your browser. 
+ + Documentation + ------------- + + Documentation lives + `here <https://github.com/Alir3z4/html2text/blob/master/docs/usage.md>`__ + + .. |Build Status| image:: https://secure.travis-ci.org/Alir3z4/html2text.png + :target: http://travis-ci.org/Alir3z4/html2text + .. |Coverage Status| image:: https://coveralls.io/repos/Alir3z4/html2text/badge.png + :target: https://coveralls.io/r/Alir3z4/html2text + .. |Downloads| image:: http://badge.kloud51.com/pypi/d/html2text.png + :target: https://pypi.python.org/pypi/html2text/ + .. |Version| image:: http://badge.kloud51.com/pypi/v/html2text.png + :target: https://pypi.python.org/pypi/html2text/ + .. |Wheel?| image:: http://badge.kloud51.com/pypi/wheel/html2text.png + :target: https://pypi.python.org/pypi/html2text/ + .. |Format| image:: http://badge.kloud51.com/pypi/format/html2text.png + :target: https://pypi.python.org/pypi/html2text/ + .. |License| image:: http://badge.kloud51.com/pypi/license/html2text.png + :target: https://pypi.python.org/pypi/html2text/ + Platform: OS Independent Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers @@ -20,7 +137,7 @@ Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.0 -Classifier: Programming Language :: Python :: 3.1 Classifier: Programming Language :: Python :: 3.2 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 +Classifier: Programming Language :: Python :: 3.5 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/README.md new/html2text-2016.9.19/README.md --- old/html2text-2015.11.4/README.md 2015-11-04 15:32:38.000000000 +0100 +++ new/html2text-2016.9.19/README.md 2016-09-18 23:51:18.000000000 +0200 @@ -2,12 +2,11 @@ [![Build Status](https://secure.travis-ci.org/Alir3z4/html2text.png)](http://travis-ci.org/Alir3z4/html2text) 
[![Coverage Status](https://coveralls.io/repos/Alir3z4/html2text/badge.png)](https://coveralls.io/r/Alir3z4/html2text) -[![Downloads](https://pypip.in/d/html2text/badge.png)](https://pypi.python.org/pypi/html2text/) -[![Version](https://pypip.in/v/html2text/badge.png)](https://pypi.python.org/pypi/html2text/) -[![Egg?](https://pypip.in/egg/html2text/badge.png)](https://pypi.python.org/pypi/html2text/) -[![Wheel?](https://pypip.in/wheel/html2text/badge.png)](https://pypi.python.org/pypi/html2text/) -[![Format](https://pypip.in/format/html2text/badge.png)](https://pypi.python.org/pypi/html2text/) -[![License](https://pypip.in/license/html2text/badge.png)](https://pypi.python.org/pypi/html2text/) +[![Downloads](http://badge.kloud51.com/pypi/d/html2text.png)](https://pypi.python.org/pypi/html2text/) +[![Version](http://badge.kloud51.com/pypi/v/html2text.png)](https://pypi.python.org/pypi/html2text/) +[![Wheel?](http://badge.kloud51.com/pypi/wheel/html2text.png)](https://pypi.python.org/pypi/html2text/) +[![Format](http://badge.kloud51.com/pypi/format/html2text.png)](https://pypi.python.org/pypi/html2text/) +[![License](http://badge.kloud51.com/pypi/license/html2text.png)](https://pypi.python.org/pypi/html2text/) html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format). 
@@ -24,7 +23,7 @@ | `--reference-links` | Use reference links instead of links to create markdown | `--mark-code` | Mark preformatted and code blocks with [code]...[/code] -For a complete list of options see the [docs](docs/usage.md) +For a complete list of options see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md) Or you can use it from within `Python`: @@ -85,4 +84,4 @@ ## Documentation -Documentation lives [here](docs/index.md) +Documentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text/__init__.py new/html2text-2016.9.19/html2text/__init__.py --- old/html2text-2015.11.4/html2text/__init__.py 2015-11-04 15:48:14.000000000 +0100 +++ new/html2text-2016.9.19/html2text/__init__.py 2016-09-19 00:03:55.000000000 +0200 @@ -3,14 +3,14 @@ """html2text: Turn HTML into equivalent Markdown-structured text.""" from __future__ import division import re -import cgi +import sys try: from textwrap import wrap except ImportError: # pragma: no cover pass -from html2text.compat import urlparse, HTMLParser +from html2text.compat import urlparse, HTMLParser, html_escape from html2text import config from html2text.utils import ( @@ -26,10 +26,11 @@ list_numbering_start, dumb_css_parser, escape_md_section, - skipwrap + skipwrap, + pad_tables_in_text ) -__version__ = (2015, 11, 4) +__version__ = (2016, 9, 19) # TODO: @@ -44,7 +45,10 @@ appends lines of text). 
baseurl: base URL of the document we process """ - HTMLParser.HTMLParser.__init__(self) + kwargs = {} + if sys.version_info >= (3, 4): + kwargs['convert_charrefs'] = False + HTMLParser.HTMLParser.__init__(self, **kwargs) # Config options self.split_next_td = False @@ -64,6 +68,7 @@ self.images_with_size = config.IMAGES_WITH_SIZE # covered in cli self.ignore_emphasis = config.IGNORE_EMPHASIS # covered in cli self.bypass_tables = config.BYPASS_TABLES # covered in cli + self.ignore_tables = config.IGNORE_TABLES # covered in cli self.google_doc = False # covered in cli self.ul_item_mark = '*' # covered in cli self.emphasis_mark = '_' # covered in cli @@ -72,9 +77,9 @@ self.use_automatic_links = config.USE_AUTOMATIC_LINKS # covered in cli self.hide_strikethrough = False # covered in cli self.mark_code = config.MARK_CODE - self.single_line_break = config.SINGLE_LINE_BREAK - self.use_automatic_links = config.USE_AUTOMATIC_LINKS self.wrap_links = config.WRAP_LINKS # covered in cli + self.pad_tables = config.PAD_TABLES # covered in cli + self.default_image_alt = config.DEFAULT_IMAGE_ALT # covered in cli self.tag_callback = None if out is None: # pragma: no cover @@ -128,7 +133,11 @@ def handle(self, data): self.feed(data) self.feed("") - return self.optwrap(self.close()) + markdown = self.optwrap(self.close()) + if self.pad_tables: + return pad_tables_in_text(markdown) + else: + return markdown def outtextf(self, s): self.outtextlist.append(s) @@ -140,23 +149,20 @@ try: nochr = unicode('') + unicode_character = unichr except NameError: nochr = str('') + unicode_character = chr self.pbr() self.o('', 0, 'end') outtext = nochr.join(self.outtextlist) + if self.unicode_snob: - try: - nbsp = unichr(name2cp('nbsp')) - except NameError: - nbsp = chr(name2cp('nbsp')) + nbsp = unicode_character(name2cp('nbsp')) else: - try: - nbsp = unichr(32) - except NameError: - nbsp = chr(32) + nbsp = unicode_character(32) try: outtext = outtext.replace(unicode(' _place_holder;'), nbsp) except 
NameError: @@ -171,14 +177,14 @@ def handle_charref(self, c): charref = self.charref(c) if not self.code and not self.pre: - charref = cgi.escape(charref) + charref = html_escape(charref) self.handle_data(charref, True) def handle_entityref(self, c): entityref = self.entityref(c) if (not self.code and not self.pre and entityref != ' _place_holder;'): - entityref = cgi.escape(entityref) + entityref = html_escape(entityref) self.handle_data(entityref, True) def handle_starttag(self, tag, attrs): @@ -306,7 +312,7 @@ tag_style = element_style(attrs, self.style_def, parent_style) self.tag_stack.append((tag, attrs, tag_style)) else: - dummy, attrs, tag_style = self.tag_stack.pop() + dummy, attrs, tag_style = self.tag_stack.pop() if self.tag_stack else (None, {}, {}) if self.tag_stack: parent_style = self.tag_stack[-1][2] @@ -329,7 +335,10 @@ self.p() if tag == "br" and start: - self.o(" \n") + if self.blockquote > 0: + self.o(" \n> ") + else: + self.o(" \n") if tag == "hr" and start: self.p() @@ -367,9 +376,9 @@ self.o(self.strong_mark) if tag in ['del', 'strike', 's']: if start: - self.o("<" + tag + ">") + self.o('~~') else: - self.o("</" + tag + ">") + self.o('~~') if self.google_doc: if not self.inheader: @@ -418,9 +427,9 @@ try: title = escape_md(a['title']) except KeyError: - self.o("](" + escape_md(a['href']) + ")") + self.o("](" + escape_md(urlparse.urljoin(self.baseurl, a['href'])) + ")") else: - self.o("](" + escape_md(a['href']) + self.o("](" + escape_md(urlparse.urljoin(self.baseurl, a['href'])) + ' "' + title + '" )') else: i = self.previousIndex(a) @@ -437,7 +446,7 @@ if 'src' in attrs: if not self.images_to_alt: attrs['href'] = attrs['src'] - alt = attrs.get('alt') or '' + alt = attrs.get('alt') or self.default_image_alt # If we have images_with_size, write raw html including width, # height, and alt attributes @@ -474,7 +483,7 @@ self.o("![" + escape_md(alt) + "]") if self.inline_links: href = attrs.get('href') or '' - self.o("(" + escape_md(href) + ")") + 
self.o("(" + escape_md(urlparse.urljoin(self.baseurl, href)) + ")") else: i = self.previousIndex(attrs) if i is not None: @@ -512,6 +521,8 @@ else: if self.list: self.list.pop() + if (not self.google_doc) and (not self.list): + self.o('\n') self.lastWasList = True else: self.lastWasList = False @@ -537,7 +548,16 @@ self.start = 1 if tag in ["table", "tr", "td", "th"]: - if self.bypass_tables: + if self.ignore_tables: + if tag == 'tr': + if start: + pass + else: + self.soft_br() + else: + pass + + elif self.bypass_tables: if start: self.soft_br() if tag in ["td", "th"]: @@ -552,8 +572,16 @@ self.o('</{0}>'.format(tag)) else: - if tag == "table" and start: - self.table_start = True + if tag == "table": + if start: + self.table_start = True + if self.pad_tables: + self.o("<"+config.TABLE_MARKER_FOR_PAD+">") + self.o(" \n") + else: + if self.pad_tables: + self.o("</"+config.TABLE_MARKER_FOR_PAD+">") + self.o(" \n") if tag in ["td", "th"] and start: if self.split_next_td: self.o("| ") @@ -703,9 +731,6 @@ self.outcount += 1 def handle_data(self, data, entity_char=False): - if r'\/script>' in data: - self.quiet -= 1 - if self.style: self.style_def.update(dumb_css_parser(data)) @@ -810,7 +835,9 @@ for para in text.split("\n"): if len(para) > 0: if not skipwrap(para, self.wrap_links): - result += "\n".join(wrap(para, self.body_width)) + result += "\n".join( + wrap(para, self.body_width, break_long_words=False) + ) if para.endswith(' '): result += " \n" newlines = 1 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text/cli.py new/html2text-2016.9.19/html2text/cli.py --- old/html2text-2015.11.4/html2text/cli.py 2015-11-04 15:32:38.000000000 +0100 +++ new/html2text-2016.9.19/html2text/cli.py 2016-09-18 23:51:18.000000000 +0200 @@ -1,4 +1,5 @@ import optparse +import warnings from html2text.compat import urllib from html2text import HTML2Text, config, __version__ @@ -23,6 +24,20 @@ version='%prog ' + 
".".join(map(str, __version__)) ) p.add_option( + "--default-image-alt", + dest="default_image_alt", + action="store", + type="str", + default=config.DEFAULT_IMAGE_ALT, + help="The default alt string for images with missing ones") + p.add_option( + "--pad-tables", + dest="pad_tables", + action="store_true", + default=config.PAD_TABLES, + help="pad the cells to equal column width in tables" + ) + p.add_option( "--no-wrap-links", dest="wrap_links", action="store_false", @@ -139,6 +154,13 @@ help="Format tables in HTML rather than Markdown syntax." ) p.add_option( + "--ignore-tables", + action="store_true", + dest="ignore_tables", + default=config.IGNORE_TABLES, + help="Ignore table-related tags (table, th, td, tr) while keeping rows." + ) + p.add_option( "--single-line-break", action="store_true", dest="single_line_break", @@ -195,14 +217,17 @@ # process input encoding = "utf-8" + if len(args) == 2: + encoding = args[1] + elif len(args) > 2: + p.error('Too many arguments') + if len(args) > 0 and args[0] != '-': # pragma: no cover file_ = args[0] - if len(args) == 2: - encoding = args[1] - if len(args) > 2: - p.error('Too many arguments') if file_.startswith('http://') or file_.startswith('https://'): + warnings.warn("Support for retrieving html over network is set for deprecation by version (2017, 1, x)", + DeprecationWarning) baseurl = file_ j = urllib.urlopen(baseurl) data = j.read() @@ -259,6 +284,7 @@ h.hide_strikethrough = options.hide_strikethrough h.escape_snob = options.escape_snob h.bypass_tables = options.bypass_tables + h.ignore_tables = options.ignore_tables h.single_line_break = options.single_line_break h.inline_links = options.inline_links h.unicode_snob = options.unicode_snob @@ -267,5 +293,7 @@ h.links_each_paragraph = options.links_each_paragraph h.mark_code = options.mark_code h.wrap_links = options.wrap_links + h.pad_tables = options.pad_tables + h.default_image_alt = options.default_image_alt wrapwrite(h.handle(data)) diff -urN '--exclude=CVS' 
'--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text/compat.py new/html2text-2016.9.19/html2text/compat.py --- old/html2text-2015.11.4/html2text/compat.py 2015-11-04 15:32:38.000000000 +0100 +++ new/html2text-2016.9.19/html2text/compat.py 2016-09-18 23:51:18.000000000 +0200 @@ -6,8 +6,12 @@ import urlparse import HTMLParser import urllib + from cgi import escape as html_escape else: import urllib.parse as urlparse import html.entities as htmlentitydefs import html.parser as HTMLParser import urllib.request as urllib + from html import escape + def html_escape(s): + return escape(s, quote=False) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text/config.py new/html2text-2016.9.19/html2text/config.py --- old/html2text-2015.11.4/html2text/config.py 2015-11-04 15:32:38.000000000 +0100 +++ new/html2text-2016.9.19/html2text/config.py 2016-09-18 23:51:18.000000000 +0200 @@ -3,6 +3,8 @@ # Use Unicode characters instead of their ascii psuedo-replacements UNICODE_SNOB = 0 +# Marker to use for marking tables for padding post processing +TABLE_MARKER_FOR_PAD = "special_marker_for_table_padding" # Escape all special characters. Output is less readable, but avoids # corner case formatting issues. ESCAPE_SNOB = 0 @@ -36,6 +38,8 @@ IGNORE_EMPHASIS = False MARK_CODE = False DECODE_ERRORS = 'strict' +DEFAULT_IMAGE_ALT = '' +PAD_TABLES = False # Convert links with same href and text to <href> format if they are absolute links USE_AUTOMATIC_LINKS = True @@ -116,7 +120,11 @@ 'rlm': '' } +# Format tables in HTML rather than Markdown syntax BYPASS_TABLES = False +# Ignore table-related tags (table, th, td, tr) while keeping rows +IGNORE_TABLES = False + # Use a single line break after a block element rather an two line breaks. # NOTE: Requires body width setting to be 0. 
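
Reader's note on the compat.py change above: the Python 3 branch wraps html.escape with quote=False. A plausible reason, not stated in the changelog, is that html.escape escapes quote characters by default, while the cgi.escape call it replaces did not. The sketch below is only an illustration for Python 3; the sample string is made up, and html_escape is copied from the diff so the block runs on its own.

```python
from html import escape


def html_escape(s):
    # Same shape as the Python 3 branch added to html2text/compat.py:
    # quote=False leaves " and ' untouched, so only &, < and > are escaped.
    return escape(s, quote=False)


# Hypothetical sample input, chosen to contain both kinds of quotes.
sample = '<a href="x">Tom & Jerry\'s</a>'

print(html_escape(sample))  # quotes are left as they are
print(escape(sample))       # default quote=True also escapes " and '
```

With quote=False the quotes in the sample survive unchanged, which matters here because the escaped text ends up in Markdown output rather than inside an HTML attribute.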
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text/utils.py new/html2text-2016.9.19/html2text/utils.py --- old/html2text-2015.11.4/html2text/utils.py 2015-11-04 15:32:38.000000000 +0100 +++ new/html2text-2016.9.19/html2text/utils.py 2016-09-18 23:51:18.000000000 +0200 @@ -31,7 +31,7 @@ """ :returns: A hash of css attributes """ - out = dict([(x.strip(), y.strip()) for x, y in + out = dict([(x.strip().lower(), y.strip().lower()) for x, y in [z.split(':', 1) for z in style.split(';') if ':' in z ] @@ -80,7 +80,7 @@ style = parent_style.copy() if 'class' in attrs: for css_class in attrs['class'].split(): - css_style = style_def['.' + css_class] + css_style = style_def.get('.' + css_class, {}) style.update(css_style) if 'style' in attrs: immediate_style = dumb_property_dict(attrs['style']) @@ -149,7 +149,7 @@ font_family = '' if 'font-family' in style: font_family = style['font-family'] - if 'Courier New' == font_family or 'Consolas' == font_family: + if 'courier new' == font_family or 'consolas' == font_family: return True return False @@ -244,3 +244,55 @@ text = config.RE_MD_DASH_MATCHER.sub(r"\1\\\2", text) return text + +def reformat_table(lines, right_margin): + """ + Given the lines of a table + padds the cells and returns the new lines + """ + # find the maximum width of the columns + max_width = [len(x.rstrip()) + right_margin for x in lines[0].split('|')] + for line in lines: + cols = [x.rstrip() for x in line.split('|')] + max_width = [max(len(x) + right_margin, old_len) + for x, old_len in zip(cols, max_width)] + + # reformat + new_lines = [] + for line in lines: + cols = [x.rstrip() for x in line.split('|')] + if set(line.strip()) == set('-|'): + filler = '-' + new_cols = [x.rstrip() + (filler * (M - len(x.rstrip()))) + for x, M in zip(cols, max_width)] + else: + filler = ' ' + new_cols = [x.rstrip() + (filler * (M - len(x.rstrip()))) + for x, M in zip(cols, max_width)] + 
new_lines.append('|'.join(new_cols)) + return new_lines + +def pad_tables_in_text(text, right_margin=1): + """ + Provide padding for tables in the text + """ + lines = text.split('\n') + table_buffer, altered_lines, table_widths, table_started = [], [], [], False + new_lines = [] + for line in lines: + # Toogle table started + if (config.TABLE_MARKER_FOR_PAD in line): + table_started = not table_started + if not table_started: + table = reformat_table(table_buffer, right_margin) + new_lines.extend(table) + table_buffer = [] + new_lines.append('') + continue + # Process lines + if table_started: + table_buffer.append(line) + else: + new_lines.append(line) + new_text = '\n'.join(new_lines) + return new_text diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text.egg-info/PKG-INFO new/html2text-2016.9.19/html2text.egg-info/PKG-INFO --- old/html2text-2015.11.4/html2text.egg-info/PKG-INFO 2015-11-04 16:23:02.000000000 +0100 +++ new/html2text-2016.9.19/html2text.egg-info/PKG-INFO 2016-09-19 00:08:46.000000000 +0200 @@ -1,12 +1,129 @@ Metadata-Version: 1.1 Name: html2text -Version: 2015.11.4 +Version: 2016.9.19 Summary: Turn HTML into equivalent Markdown-structured text. Home-page: https://github.com/Alir3z4/html2text/ Author: Alireza Savand Author-email: alireza.sav...@gmail.com License: GNU GPL 3 -Description: UNKNOWN +Description: html2text + ========= + + |Build Status| |Coverage Status| |Downloads| |Version| |Wheel?| |Format| + |License| + + html2text is a Python script that converts a page of HTML into clean, + easy-to-read plain ASCII text. Better yet, that ASCII also happens to be + valid Markdown (a text-to-HTML format). 
+ + Usage: ``html2text [(filename|url) [encoding]]`` + + +---------------------------------------+------------------------------------+ + | Option | Description | + +=======================================+====================================+ + | ``--version`` | Show program's version number and | + | | exit | + +---------------------------------------+------------------------------------+ + | ``-h``, ``--help`` | Show this help message and exit | + +---------------------------------------+------------------------------------+ + | ``--ignore-links`` | Don't include any formatting for | + | | links | + +---------------------------------------+------------------------------------+ + | ``--escape-all`` | Escape all special characters. | + | | Output is less readable, but | + | | avoids corner case formatting | + | | issues. | + +---------------------------------------+------------------------------------+ + | ``--reference-links`` | Use reference links instead of | + | | links to create markdown | + +---------------------------------------+------------------------------------+ + | ``--mark-code`` | Mark preformatted and code blocks | + | | with [code]...[/code] | + +---------------------------------------+------------------------------------+ + + For a complete list of options see the + `docs <https://github.com/Alir3z4/html2text/blob/master/docs/usage.md>`__ + + Or you can use it from within ``Python``: + + :: + + >>> import html2text + >>> + >>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>")) + **Zed's** dead baby, _Zed's_ dead. + + Or with some configuration options: + + :: + + >>> import html2text + >>> + >>> h = html2text.HTML2Text() + >>> # Ignore converting links from HTML + >>> h.ignore_links = True + >>> print h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!") + Hello, world! + + >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")) + + Hello, world! 
+
+            >>> # Don't Ignore links anymore, I like links
+            >>> h.ignore_links = False
+            >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
+            Hello, [world](http://earth.google.com/)!
+
+        *Originally written by Aaron Swartz. This code is distributed under the
+        GPLv3.*
+
+        How to install
+        --------------
+
+        ``html2text`` is available on pypi
+        https://pypi.python.org/pypi/html2text
+
+        ::
+
+            $ pip install html2text
+
+        How to run unit tests
+        ---------------------
+
+        ::
+
+            PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v
+
+        To see the coverage results:
+
+        ::
+
+            coverage combine
+            coverage html
+
+        then open the ``./htmlcov/index.html`` file in your browser.
+
+        Documentation
+        -------------
+
+        Documentation lives
+        `here <https://github.com/Alir3z4/html2text/blob/master/docs/usage.md>`__
+
+        .. |Build Status| image:: https://secure.travis-ci.org/Alir3z4/html2text.png
+           :target: http://travis-ci.org/Alir3z4/html2text
+        .. |Coverage Status| image:: https://coveralls.io/repos/Alir3z4/html2text/badge.png
+           :target: https://coveralls.io/r/Alir3z4/html2text
+        .. |Downloads| image:: http://badge.kloud51.com/pypi/d/html2text.png
+           :target: https://pypi.python.org/pypi/html2text/
+        .. |Version| image:: http://badge.kloud51.com/pypi/v/html2text.png
+           :target: https://pypi.python.org/pypi/html2text/
+        .. |Wheel?| image:: http://badge.kloud51.com/pypi/wheel/html2text.png
+           :target: https://pypi.python.org/pypi/html2text/
+        .. |Format| image:: http://badge.kloud51.com/pypi/format/html2text.png
+           :target: https://pypi.python.org/pypi/html2text/
+        .. |License| image:: http://badge.kloud51.com/pypi/license/html2text.png
+           :target: https://pypi.python.org/pypi/html2text/
+
 Platform: OS Independent
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Intended Audience :: Developers
@@ -20,7 +137,7 @@
 Classifier: Programming Language :: Python :: 2.7
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.0
-Classifier: Programming Language :: Python :: 3.1
 Classifier: Programming Language :: Python :: 3.2
 Classifier: Programming Language :: Python :: 3.3
 Classifier: Programming Language :: Python :: 3.4
+Classifier: Programming Language :: Python :: 3.5
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/html2text.egg-info/SOURCES.txt new/html2text-2016.9.19/html2text.egg-info/SOURCES.txt
--- old/html2text-2015.11.4/html2text.egg-info/SOURCES.txt 2015-11-04 16:23:02.000000000 +0100
+++ new/html2text-2016.9.19/html2text.egg-info/SOURCES.txt 2016-09-19 00:08:46.000000000 +0200
@@ -35,10 +35,14 @@
 test/bodywidth_newline.md
 test/bold_inside_link.html
 test/bold_inside_link.md
+test/break_preserved_in_blockquote.html
+test/break_preserved_in_blockquote.md
 test/css_import_no_semicolon.html
 test/css_import_no_semicolon.md
 test/decript_tage.html
 test/decript_tage.md
+test/default_image_alt.html
+test/default_image_alt.md
 test/doc_with_table.html
 test/doc_with_table.md
 test/doc_with_table_bypass.html
@@ -49,6 +53,8 @@
 test/empty-link.md
 test/flip_emphasis.html
 test/flip_emphasis.md
+test/google-like_font-properties.html
+test/google-like_font-properties.md
 test/header_tags.html
 test/header_tags.md
 test/horizontal_rule.html
@@ -63,6 +69,8 @@
 test/images_with_size.md
 test/img-tag-with-link.html
 test/img-tag-with-link.md
+test/inplace_baseurl_substitution.html
+test/inplace_baseurl_substitution.md
 test/invalid_start.html
 test/invalid_start.md
 test/invalid_unicode.html
@@ -71,6 +79,8 @@
 test/link_titles.md
 test/list_tags_example.html
 test/list_tags_example.md
+test/long_lines.html
+test/long_lines.md
 test/mark_code.html
 test/mark_code.md
 test/nbsp.html
@@ -91,6 +101,8 @@
 test/normal.md
 test/normal_escape_snob.html
 test/normal_escape_snob.md
+test/pad_table.html
+test/pad_table.md
 test/pre.html
 test/pre.md
 test/preformatted_in_list.html
@@ -99,7 +111,11 @@
 test/protect_links.md
 test/single_line_break.html
 test/single_line_break.md
+test/table_ignore.html
+test/table_ignore.md
 test/test_html2text.py
 test/test_memleak.py
+test/text_after_list.html
+test/text_after_list.md
 test/url-escaping.html
 test/url-escaping.md
\ No newline at end of file
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/setup.py new/html2text-2016.9.19/setup.py
--- old/html2text-2015.11.4/setup.py 2015-11-04 15:58:13.000000000 +0100
+++ new/html2text-2016.9.19/setup.py 2016-05-29 18:13:44.000000000 +0200
@@ -1,7 +1,14 @@
 # coding: utf-8
 import sys
+
 from setuptools import setup, Command, find_packages

+try:
+    from pypandoc import convert
+    read_md = lambda f: convert(f, 'rst')
+except ImportError:
+    read_md = lambda f: open(f, 'r').read()
+
 requires_list = []
 try:
     import unittest2 as unittest
@@ -13,7 +20,8 @@


 class RunTests(Command):
-    """New setup.py command to run all tests for the package.
+    """
+    New setup.py command to run all tests for the package.
     """

     description = "run all tests for the package"
@@ -36,6 +44,7 @@
     name="html2text",
     version=".".join(map(str, __import__('html2text').__version__)),
     description="Turn HTML into equivalent Markdown-structured text.",
+    long_description=read_md('README.md'),
     author="Aaron Swartz",
     author_email="m...@aaronsw.com",
     maintainer='Alireza Savand',
@@ -56,10 +65,10 @@
         'Programming Language :: Python :: 2.7',
         'Programming Language :: Python :: 3',
         'Programming Language :: Python :: 3.0',
-        'Programming Language :: Python :: 3.1',
         'Programming Language :: Python :: 3.2',
         'Programming Language :: Python :: 3.3',
         'Programming Language :: Python :: 3.4',
+        'Programming Language :: Python :: 3.5',
     ],
     entry_points="""
         [console_scripts]
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/break_preserved_in_blockquote.html new/html2text-2016.9.19/test/break_preserved_in_blockquote.html
--- old/html2text-2015.11.4/test/break_preserved_in_blockquote.html 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/break_preserved_in_blockquote.html 2016-05-29 18:08:48.000000000 +0200
@@ -0,0 +1 @@
+a<blockquote>b<br>c</blockquote>
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/break_preserved_in_blockquote.md new/html2text-2016.9.19/test/break_preserved_in_blockquote.md
--- old/html2text-2015.11.4/test/break_preserved_in_blockquote.md 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/break_preserved_in_blockquote.md 2016-05-29 18:08:48.000000000 +0200
@@ -0,0 +1,5 @@
+a
+
+> b
+> c
+
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/decript_tage.md new/html2text-2016.9.19/test/decript_tage.md
--- old/html2text-2015.11.4/test/decript_tage.md 2015-11-04 15:32:38.000000000 +0100
+++ new/html2text-2016.9.19/test/decript_tage.md 2016-05-29 18:08:48.000000000 +0200
@@ -1,2 +1,2 @@
-<del>something</del> <strike>something</strike> <s>something</s>
+~~something~~ ~~something~~ ~~something~~

diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/default_image_alt.html new/html2text-2016.9.19/test/default_image_alt.html
--- old/html2text-2015.11.4/test/default_image_alt.html 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/default_image_alt.html 2016-09-18 23:51:18.000000000 +0200
@@ -0,0 +1 @@
+<a href="http://google.com"><img src="images/google.png"></a>
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/default_image_alt.md new/html2text-2016.9.19/test/default_image_alt.md
--- old/html2text-2015.11.4/test/default_image_alt.md 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/default_image_alt.md 2016-09-18 23:51:18.000000000 +0200
@@ -0,0 +1,2 @@
+[![Image](images/google.png)](http://google.com)
+
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/google-like_font-properties.html new/html2text-2016.9.19/test/google-like_font-properties.html
--- old/html2text-2015.11.4/test/google-like_font-properties.html 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/google-like_font-properties.html 2016-09-18 23:51:18.000000000 +0200
@@ -0,0 +1,15 @@
+<HTML>
+ <HEAD>
+ <TITLE>CAPS-LOCK TEST</TITLE>
+ </HEAD>
+ <BODY>
+ <p><span style="font-weight: bold">font-weight: bold</span></p>
+ <P><SPAN STYLE="FONT-WEIGHT: BOLD">FONT-WEIGHT: BOLD</SPAN></P>
+ <p><span style="font-style: italic">font-style: italic</span></p>
+ <P><SPAN STYLE="FONT-STYLE: ITALIC">FONT-STYLE: ITALIC</SPAN></P>
+ <p><span style="font-weight: bold;font-style: italic">
+ font-weight: bold;font-style: italic</span></p>
+ <P><SPAN STYLE="FONT-WEIGHT: BOLD;FONT-STYLE: ITALIC">
+ FONT-WEIGHT: BOLD;FONT-STYLE: ITALIC</SPAN></P>
+ </BODY>
+</HTML>
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/google-like_font-properties.md new/html2text-2016.9.19/test/google-like_font-properties.md
--- old/html2text-2015.11.4/test/google-like_font-properties.md 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/google-like_font-properties.md 2016-09-18 23:51:18.000000000 +0200
@@ -0,0 +1,6 @@
+**font-weight: bold**
+**FONT-WEIGHT: BOLD**
+_font-style: italic_
+_FONT-STYLE: ITALIC_
+_**font-weight: bold;font-style: italic**_
+_**FONT-WEIGHT: BOLD;FONT-STYLE: ITALIC**_
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/inplace_baseurl_substitution.html new/html2text-2016.9.19/test/inplace_baseurl_substitution.html
--- old/html2text-2015.11.4/test/inplace_baseurl_substitution.html 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/inplace_baseurl_substitution.html 2016-05-29 18:08:48.000000000 +0200
@@ -0,0 +1,11 @@
+<!DOCTYPE html>
+<head></head>
+<body>
+<p>
+<img src="/uploads/2012/01/read2textheader.jpg" alt="read2text header image" width="650" height="165"/>
+</p>
+<p>
+<a href="/">BrettTerpstra.com</a>
+</p>
+</body>
+</html>
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/inplace_baseurl_substitution.md new/html2text-2016.9.19/test/inplace_baseurl_substitution.md
--- old/html2text-2015.11.4/test/inplace_baseurl_substitution.md 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/inplace_baseurl_substitution.md 2016-05-29 18:08:48.000000000 +0200
@@ -0,0 +1,3 @@
+![read2text header image](http://brettterpstra.com/uploads/2012/01/read2textheader.jpg)
+
+[BrettTerpstra.com](http://brettterpstra.com/)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore'
old/html2text-2015.11.4/test/list_tags_example.md new/html2text-2016.9.19/test/list_tags_example.md --- old/html2text-2015.11.4/test/list_tags_example.md 2015-11-04 15:32:38.000000000 +0100 +++ new/html2text-2016.9.19/test/list_tags_example.md 2016-05-29 18:08:48.000000000 +0200 @@ -28,9 +28,11 @@ * some item * Some other item * some item + 1. Some other item 2. some item 3. some item + * somthing else here * some item diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/long_lines.html new/html2text-2016.9.19/test/long_lines.html --- old/html2text-2015.11.4/test/long_lines.html 1970-01-01 01:00:00.000000000 +0100 +++ new/html2text-2016.9.19/test/long_lines.html 2016-05-29 18:08:48.000000000 +0200 @@ -0,0 +1 @@ +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd <img src="http://www.foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo.com"> asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/long_lines.md new/html2text-2016.9.19/test/long_lines.md --- old/html2text-2015.11.4/test/long_lines.md 1970-01-01 01:00:00.000000000 +0100 +++ 
new/html2text-2016.9.19/test/long_lines.md 2016-05-29 18:08:48.000000000 +0200 @@ -0,0 +1,14 @@ +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd +![](http://www.foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo.com) +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd asd +asd asd asd asd asd + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/pad_table.html new/html2text-2016.9.19/test/pad_table.html --- old/html2text-2015.11.4/test/pad_table.html 1970-01-01 01:00:00.000000000 +0100 +++ new/html2text-2016.9.19/test/pad_table.html 2016-05-29 18:08:48.000000000 +0200 @@ -0,0 +1,26 @@ +<!DOCTYPE html> <html> + <head lang="en"> <meta charset="UTF-8"> <title></title> </head> + <body> <h1>This is a test document</h1> With some text, <code>code</code>, <b>bolds</b> and <i>italics</i>. 
<h2>This is second header</h2> <p style="display: none">Displaynone text</p> + <table> + <tr> <th>Header 1</th> <th>Header 2</th> <th>Header 3</th> </tr> + <tr> <td>Content 1</td> <td>2</td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr> + <tr> <td>Content 1 longer</td> <td>Content 2</td> <td>blah</td> </tr> + <tr> <td>Content </td> <td>Content 2</td> <td>blah</td> </tr> + <tr> <td>t </td> <td>Content 2</td> <td>blah blah blah</td> </tr> + </table> + + + <table> <tr> <th>H1</th> <th>H2</th> <th>H3</th> </tr> + <tr> <td>C1</td> <td>Content 2</td> <td>x</td> </tr> + <tr> <td>C123</td> <td>Content 2</td> <td>xyz</td> </tr> + </table> + +some content between the tables<br> + + <table> <tr> <th>Header 1</th> <th>Header 2</th> <th>Header 3</th> </tr> + <tr> <td>Content 1</td> <td>Content 2</td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr> + <tr> <td>Content 1</td> <td>Content 2 longer</td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr> + </table> + +something else entirely +</body> </html> diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/pad_table.md new/html2text-2016.9.19/test/pad_table.md --- old/html2text-2015.11.4/test/pad_table.md 1970-01-01 01:00:00.000000000 +0100 +++ new/html2text-2016.9.19/test/pad_table.md 2016-05-29 18:08:48.000000000 +0200 @@ -0,0 +1,28 @@ +# This is a test document + +With some text, `code`, **bolds** and _italics_. + +## This is second header + +Displaynone text + +Header 1 | Header 2 | Header 3 +-----------------|-----------|---------------------------------------------- +Content 1 | 2 | ![200](http://lorempixel.com/200/200) Image! 
+Content 1 longer | Content 2 | blah
+Content          | Content 2 | blah
+t                | Content 2 | blah blah blah
+
+H1   | H2        | H3
+-----|-----------|-----
+C1   | Content 2 | x
+C123 | Content 2 | xyz
+
+some content between the tables
+Header 1  | Header 2         | Header 3
+----------|------------------|----------------------------------------------
+Content 1 | Content 2        | ![200](http://lorempixel.com/200/200) Image!
+Content 1 | Content 2 longer | ![200](http://lorempixel.com/200/200) Image!
+
+something else entirely
+
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/table_ignore.html new/html2text-2016.9.19/test/table_ignore.html
--- old/html2text-2015.11.4/test/table_ignore.html 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/table_ignore.html 2016-09-18 23:51:18.000000000 +0200
@@ -0,0 +1,26 @@
+<!DOCTYPE html> <html>
+ <head lang="en"> <meta charset="UTF-8"> <title></title> </head>
+ <body> <h1>This is a test document</h1> With some text, <code>code</code>, <b>bolds</b> and <i>italics</i>. <h2>This is second header</h2> <p style="display: none">Displaynone text</p>
+ <table>
+ <tr> <th>Header 1</th> <th>Header 2</th> <th>Header 3</th> </tr>
+ <tr> <td>Content 1</td> <td>2</td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr>
+ <tr> <td>Content 1 longer</td> <td>Content 2</td> <td>blah</td> </tr>
+ <tr> <td>Content </td> <td>Content 2</td> <td>blah</td> </tr>
+ <tr> <td>t </td> <td>Content 2</td> <td>blah blah blah</td> </tr>
+ </table>
+
+
+ <table> <tr> <th>H1</th> <th>H2</th> <th>H3</th> </tr>
+ <tr> <td>C1</td> <td>Content 2</td> <td>x</td> </tr>
+ <tr> <td>C123</td> <td>Content 2</td> <td>xyz</td> </tr>
+ </table>
+
+some content between the tables<br>
+
+ <table> <tr> <th>Header 1</th> <th>Header 2</th> <th>Header 3</th> </tr>
+ <tr> <td>Content 1</td> <td>Content 2</td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr>
+ <tr> <td>Content 1</td> <td>Content 2 longer</td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr>
+ </table>
+
+something else entirely
+</body> </html>
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/table_ignore.md new/html2text-2016.9.19/test/table_ignore.md
--- old/html2text-2015.11.4/test/table_ignore.md 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/table_ignore.md 2016-09-18 23:51:18.000000000 +0200
@@ -0,0 +1,22 @@
+# This is a test document
+
+With some text, `code`, **bolds** and _italics_.
+
+## This is second header
+
+Displaynone text
+
+Header 1 Header 2 Header 3
+Content 1 2 ![200](http://lorempixel.com/200/200) Image!
+Content 1 longer Content 2 blah
+Content Content 2 blah
+t Content 2 blah blah blah
+H1 H2 H3
+C1 Content 2 x
+C123 Content 2 xyz
+some content between the tables
+Header 1 Header 2 Header 3
+Content 1 Content 2 ![200](http://lorempixel.com/200/200) Image!
+Content 1 Content 2 longer ![200](http://lorempixel.com/200/200) Image!
+something else entirely
+
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/test_html2text.py new/html2text-2016.9.19/test/test_html2text.py
--- old/html2text-2015.11.4/test/test_html2text.py 2015-11-04 15:32:38.000000000 +0100
+++ new/html2text-2016.9.19/test/test_html2text.py 2016-09-18 23:51:18.000000000 +0200
@@ -114,6 +114,10 @@
     func_args = {}
     base_fn = os.path.basename(fn).lower()

+    if base_fn.startswith('default_image_alt'):
+        module_args['default_image_alt'] = 'Image'
+        cmdline_args.append('--default-image-alt=Image')
+
     if base_fn.startswith('google'):
         module_args['google_doc'] = True
         cmdline_args.append('--googledoc')
@@ -134,6 +138,10 @@
         module_args['bypass_tables'] = True
         cmdline_args.append('--bypass-tables')

+    if base_fn.startswith('table_ignore'):
+        module_args['ignore_tables'] = True
+        cmdline_args.append('--ignore-tables')
+
     if base_fn.startswith('bodywidth'):
         # module_args['unicode_snob'] = True
         module_args['body_width'] = 0
@@ -161,7 +169,7 @@
     if base_fn.startswith('no_inline_links'):
         module_args['inline_links'] = False
         cmdline_args.append('--reference-links')
-
+
     if base_fn.startswith('no_wrap_links'):
         module_args['wrap_links'] = False
         cmdline_args.append('--no-wrap-links')
@@ -170,9 +178,19 @@
         module_args['mark_code'] = True
         cmdline_args.append('--mark-code')

+    if base_fn.startswith('pad_table'):
+        module_args['pad_tables'] = True
+        cmdline_args.append('--pad-tables')
+
     if base_fn not in ['bodywidth_newline.html', 'abbr_tag.html']:
         test_func = None

+    if base_fn == 'inplace_baseurl_substitution.html':
+        module_args['baseurl'] = 'http://brettterpstra.com'
+        module_args['body_width'] = 0
+        # there is no way to specify baseurl in cli :(
+        test_cmd = None
+
     return test_mod, test_cmd, test_func

 # Originally from http://stackoverflow.com/questions/32899/\
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/text_after_list.html new/html2text-2016.9.19/test/text_after_list.html
--- old/html2text-2015.11.4/test/text_after_list.html 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/text_after_list.html 2016-05-29 18:08:48.000000000 +0200
@@ -0,0 +1,2 @@
+<ul><li>item</li></ul>
+text
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/text_after_list.md new/html2text-2016.9.19/test/text_after_list.md
--- old/html2text-2015.11.4/test/text_after_list.md 1970-01-01 01:00:00.000000000 +0100
+++ new/html2text-2016.9.19/test/text_after_list.md 2016-05-29 18:08:48.000000000 +0200
@@ -0,0 +1,4 @@
+ * item
+
+text
+
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/html2text-2015.11.4/test/url-escaping.html new/html2text-2016.9.19/test/url-escaping.html
--- old/html2text-2015.11.4/test/url-escaping.html 2015-02-18 14:16:15.000000000 +0100
+++ new/html2text-2016.9.19/test/url-escaping.html 2016-05-29 18:08:48.000000000 +0200
@@ -6,8 +6,8 @@
  <li><a href="http://msdn.microsoft.com/en-us/library/system.drawing.drawing2d(v=vs.110)">Some MSDN link using parenthesis</a></li>
  <li><a href="https://www.google.ru/search?q=[brackets are cool]">Google search result URL with unescaped brackets</a></li>
  <li><a href="https://www.google.ru/search?q='[({})]'">Yet another test for [brackets], {curly braces} and (parenthesis) processing inside the anchor</a></li>
- <li>Use automatic links like <a href="http://example.com/">http://example.com/</a> when the URL is the label</a>
- <li>Exempt <a href="non-absolute_URIs">non-absolute_URIs</a> from automatic link detection</a>
+ <li>Use automatic links like <a href="http://example.com/">http://example.com/</a> when the URL is the label</a></li>
+ <li>Exempt <a href="non-absolute_URIs">non-absolute_URIs</a> from automatic link detection</a></li>
 </ul>
 <p>And here are images with tricky attribute values:</p>