Re: Namedtuples: some unexpected inconveniences

MRAB Fri, 14 Apr 2017 14:24:12 -0700

On 2017-04-14 20:34, Deborah Swanson wrote:

Peter,


Retracing my steps to rewrite the getattr(row, label) code, this is what
sent me down the rabbit hole in the first place. (I changed your 'rows'
to 'records' just to use the same name everywhere, but all else is the
same as you gave me.) I'd like you to look at it and see if you still
think complete(group, label) should work. Perhaps seeing why it fails
will clarify some of the difficulties I'm having.

I ran into problems with values and has_empty. values has a problem
because row[label] gets a TypeError. has_empty has a problem because a list of
field values will be shorter with missing values than a full list, but a
namedtuple with missing values will be the same length as a full
namedtuple since missing values have '' placeholders.  Two more
unexpected inconveniences.

In the line:

    values = {row[label] for row in group}

'group' is a list of records; row is a record (namedtuple).

You can get the members of a namedtuple (also 'normal' tuple) by numeric index, e.g. row[0], but the point of a namedtuple is that you can get them by name, as an attribute, e.g. row.Location.

As the name of the attribute isn't fixed, but passed by name, use getattr(row, label) instead:


    values = {getattr(row, label) for row in group}
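To make the three kinds of access concrete, here is a small sketch. The Record fields and the sample rows are made up for illustration; they are not taken from the test csv.

```python
from collections import namedtuple

# Hypothetical record type and data, for illustration only.
Record = namedtuple("Record", ["title", "Location", "Kind"])
group = [
    Record("Sofa", "Garage", "Furniture"),
    Record("Sofa", "", "Furniture"),
]

row = group[0]
by_index = row[0]               # positional access, as with any tuple
by_name = row.Location          # attribute access with a fixed name
label = "Location"
by_label = getattr(row, label)  # attribute access with the name held in a variable

# row[label] would raise TypeError here, because label is a str, not an int.
values = {getattr(r, label) for r in group}
```

With this sample data, values holds both 'Garage' and the '' placeholder from the second row.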


As for the values:

    # Remove the missing value, if present.
    values.discard('')

    # There's only 1 value left, so fill in the empty places.
    if len(values) == 1:
        ...
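A rough sketch of how that check behaves in the three possible cases follows. The helper name single_value is hypothetical and only serves to make the snippet testable.

```python
def single_value(values):
    """Return the one non-empty value, or None if there are zero or several."""
    values = set(values)   # work on a copy so the caller's set is untouched
    values.discard('')     # discard() does nothing if '' is absent
    if len(values) == 1:
        return next(iter(values))
    return None            # nothing to fill in, or ambiguous


one = single_value({"Garage", ""})            # exactly one real value
none_left = single_value({""})                # only the placeholder was present
ambiguous = single_value({"Garage", "Attic"})  # more than one candidate
```

This sidesteps the has_empty length arithmetic: after the discard, the size of the set alone says whether there is exactly one value to copy.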

The next point is that namedtuples, like normal tuples, are immutable. You can't change the value of an attribute.
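One possible way around that, sketched here with made-up data, is to build a new record with the namedtuple's _replace method and store it back into the list at the same position. The Record fields and the fill value are assumptions for the example only.

```python
from collections import namedtuple

# Hypothetical record type and data, for illustration only.
Record = namedtuple("Record", ["title", "Location"])
group = [Record("Sofa", "Garage"), Record("Sofa", "")]

# Assigning to an attribute of a namedtuple fails.
immutable = False
try:
    group[1].Location = "Garage"
except AttributeError:
    immutable = True

label = "Location"
fill = "Garage"   # assumed to be the single non-empty value for this group
for i, row in enumerate(group):
    if not getattr(row, label):
        # _replace returns a new record; the field name is passed as a keyword.
        group[i] = row._replace(**{label: fill})
```

Note that this only changes what the group list holds. Any other list that still refers to the old record (such as 'records' in the code below) keeps the old, unfilled tuple.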

A short test csv is at the end, for you to read in and attempt to
execute the following code, and I'm still working on reconstructing the
lost getattr(row, label) code.

import csv
from collections import namedtuple, defaultdict

def get_title(row):
     return row.title

def complete(group, label):
     values = {row[label] for row in group}
     # get "TypeError: tuple indices must be integers, not str"
     has_empty = not min(values, key=len)
     if len(values) - has_empty != 1:
         # no value or multiple values; manual intervention needed
         return False
     elif has_empty:
         for row in group:
             row[label] = max(values, key=len)
     return True

infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
rows = csv.reader(infile)
fieldnames = next(rows)
Record = namedtuple("Record", fieldnames)
records = [Record._make(fieldnames)]
records.extend(Record._make(row) for row in rows)

# group rows by title
groups = defaultdict(list)
for row in records:
     groups[get_title(row)].append(row)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
     for label in LABELS:
         complete(group, label)

Moving 2017 in - test.csv:
(If this doesn't come through the mail system correctly, I've also
uploaded the file to
http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv.
Permissions should be set correctly, but let me know if you run into
problems downloading the file.)

[snip]
--
https://mail.python.org/mailman/listinfo/python-list
