[O] Python code for producing Org tables

François Pinard Tue, 18 Dec 2012 19:43:21 -0800

Hi, Org people.

I recently needed to produce Org tables from within Python, splitting
them as needed to fit within a preset width.  I append the code after my
signature, in case it is useful to others (or in case you have ideas
for improving it).


One thing I only realized after writing this, however, is that I wrongly
thought Org mode aligned floating-point numbers on the decimal point,
while it merely right-aligns them regardless of the position of the
point.  So, I turn my mistake into a suggestion, as I think it would be
more convenient if Org mode aligned floating-point numbers on the
decimal point.
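
To make the difference concrete, here is a minimal sketch of decimal-point
alignment next to plain right alignment.  The helper name align_on_decimal
is hypothetical (it is not part of Org mode or of the code below), and the
sketch assumes a non-empty list of cells that are already plain digit
strings with at most one decimal point.

```python
def align_on_decimal(cells):
    """Pad numeric strings so their decimal points line up in one column."""
    # Widest integer part, and widest fraction part (the fraction part
    # includes the decimal point itself, and is empty for pure integers).
    left = max(len(cell.partition('.')[0]) for cell in cells)
    right = max(len(cell) - len(cell.partition('.')[0]) for cell in cells)
    aligned = []
    for cell in cells:
        whole = cell.partition('.')[0]
        # Pad on the left so the integer parts end at the same position,
        # then pad on the right to the full column width.
        padded = ' ' * (left - len(whole)) + cell
        aligned.append(padded.ljust(left + right))
    return aligned


cells = ['3.14159', '42', '100.5']
# First column: plain right alignment; second column: decimal alignment.
for plain, aligned in zip(cells, align_on_decimal(cells)):
    print('| %s | %s |' % (plain.rjust(7), aligned))
```

In the second column the decimal points share one position, and pure
integers end where the decimal point would be.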

I even thought of trying to contribute some Emacs Lisp code to do so, but
seeing that I'm short on free time these days (as too often), I find it
more fruitful to merely share the idea (and the Python code) now.

François



# -*- coding: utf-8 -*-
import re


def to_org(titles, rows, write, hide_empty=False, margin=0, easy=4, span=1,
           fillto=None, limit=None):
    """\
Given a list of column TITLES, and a list of ROWS, each containing a list
of columns, use WRITE to produce an Org-formatted table with the text of
columns. If HIDE_EMPTY is not False, then omit columns containing nothing but
empty strings.  The formatted table is shifted right by MARGIN columns.

To accommodate titles, the width of a column will easily extend to EASY,
or to whatever is needed so the title is not split on more than SPAN lines.
FILLTO may be used to force last column to extend until that position.  LIMIT
may be used to impose a limit to the number of characters in produced lines.

If TITLES is None, the titles are not produced.

Columns containing only numbers (integer or floating) align them properly.
"""

    # Exit if nothing to display.
    if not rows:
        return

    # Compute widths from data.
    rows = [[safe_unicode(column, 30).replace('\\', '\\\\')
             .replace('|', '\\vert{}') for column in row]
             for row in rows]
    # Each WIDTH is the width of the column's cells as strings.  Each LEFT
    # is either False when the column contains a non-number, or the maximum
    # width of the integral part of all numbers.  When LEFT is not False,
    # each RIGHT is either the maximum width of the fraction part, including
    # the decimal point, of all numbers, or 0 if all are pure integers.
    widths = [0] * len(rows[0])
    lefts = [0] * len(rows[0])
    rights = [0] * len(rows[0])
    for row in rows:
        for counter, cell in enumerate(row):
            widths[counter] = max(widths[counter], len(cell))
            if lefts[counter] is not False:
                # The fraction part is optional, so pure integers match too.
                match = re.match(r'([0-9]*)(\.[0-9]*)?$', cell)
                if match is None:
                    lefts[counter] = False
                else:
                    lefts[counter] = max(lefts[counter],
                                         len(match.group(1)))
                    if match.group(2):
                        rights[counter] = max(rights[counter],
                                              len(match.group(2)))
    for counter, (left, right) in enumerate(zip(lefts, rights)):
        if left == 0 and right == 0:
            lefts[counter] = False
        elif left is not False:
            widths[counter] = left + right

    # Extend widths as needed to make room for titles.
    if titles is not None:
        for counter, (width, title) in enumerate(zip(widths, titles)):
            if (not hide_empty or width) and len(title) > width:
                if len(title) <= easy:
                    widths[counter] = len(title)
                else:
                    for nlines in range(2, span):
                        if len(title) <= easy * nlines:
                            widths[counter] = max(
                                width, (len(title) + nlines - 1) // nlines)
                            break
                    else:
                        widths[counter] = max(
                            width, (len(title) + span - 1) // span)
    if fillto:
        extend = fillto - margin - sum(widths) - 3 * len(widths) - 1
        if extend > 0:
            widths[-1] += extend

    # Horizontally split the display so each part fits within LIMIT columns.
    end = 0
    while end < len(widths):
        start = end
        if limit is None:
            end = len(widths)
        else:
            remaining = limit - margin - widths[start] - 4
            end = start + 1
            while end < len(widths) and remaining >= widths[end] + 3:
                remaining -= widths[end] + 3
                end += 1
        # Now ready to output columns from START to END (excluded).

        # Skip this part if nothing to display.
        if hide_empty:
            for width in widths[start:end]:
                if width:
                    break
            else:
                continue
        if start > 0:
            write('\n')

        if titles is not None:
            # Write title lines, splitting titles as needed.
            # Build a real list, as PAIRS is traversed once per title line.
            pairs = list(zip(widths[start:end], titles[start:end]))
            for counter in range(span):
                fragments = []
                inked = False
                for width, title in pairs:
                    if not hide_empty or width:
                        fragment = title[counter * width:(counter + 1) * width]
                        if fragment:
                            inked = True
                        fragments.append(
                            fragment.replace('|', ' ').lstrip().ljust(width))
                if not inked:
                    break
                write('%s| %s |\n' % (' ' * margin,
                                      ' | '.join(fragments)))

            # Write separator line.
            fragments = []
            for width in widths[start:end]:
                if not hide_empty or width:
                    fragments.append('-' * width)
            write('%s|-%s-|\n' % (' ' * margin, '-+-'.join(fragments)))

        # Write body lines.
        for row in rows:
            fragments = []
            for width, left, cell in zip(
                    widths[start:end], lefts[start:end], row[start:end]):
                if not hide_empty or width:
                    if left is False:
                        text = cell.replace('|', ' ').lstrip()
                    else:
                        position = cell.find('.')
                        if position < 0:
                            position = len(cell)
                        text = ' ' * (left - position) + cell
                    fragments.append(text.ljust(width))
            write('%s| %s |\n' % (' ' * margin,
                                  ' | '.join(fragments)))


# Control characters: code points 0 to 31 and 127 to 159.
unprintable_regexp = re.compile(u'[\x00-\x1f\x7f-\x9f]')


# ENCODING is not defined in the posted code; UTF-8 is assumed here.
encoding = 'utf-8'


def safe_unicode(value, limit=None):
    if value is None:
        return ''
    if isinstance(value, str):
        try:
            value = unicode(value, encoding)
        except UnicodeDecodeError:
            # FIXME: Too fishy!
            value = unicode(value, 'iso-8859-1')
    elif not isinstance(value, unicode):
        # FIXME: It seems that Rpy2 output does not tolerate ", encoding"?
        value = unicode(value)
    if re.search(unprintable_regexp, value):
        value = repr(value)
        if value.startswith('u'):
            value = value[1:]
    if limit is not None and len(value) > limit:
        left_cut = limit * 2 // 3
        right_cut = limit - left_cut
        return value[:left_cut - 1] + u'…' + value[-right_cut:]
    return value
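
The horizontal split performed in to_org can be looked at in isolation.
The following is only a sketch restating that loop as a standalone
function; the name split_columns is hypothetical, and it assumes the same
cost model as the code above: each column takes its width plus three
characters of framing, plus one more character to close the line.

```python
def split_columns(widths, limit, margin=0):
    """Return (start, end) index pairs, END excluded, one per table part."""
    groups = []
    end = 0
    while end < len(widths):
        start = end
        # The first column of a part costs its width plus "| " and " |".
        remaining = limit - margin - widths[start] - 4
        end = start + 1
        # Each further column costs its width plus the " | " separator.
        while end < len(widths) and remaining >= widths[end] + 3:
            remaining -= widths[end] + 3
            end += 1
        groups.append((start, end))
    return groups


# Five columns of width 10 within 40 characters: three columns fill the
# first part exactly (2 + 10 + 3 + 10 + 3 + 10 + 2 == 40).
print(split_columns([10, 10, 10, 10, 10], limit=40))
```

Note that a column wider than the limit still gets a part of its own, so
the limit is not strictly enforced in that case.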
