Deborah Swanson wrote:

> Peter,
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
> I ran into problems with values and has_empty. values has a problem because
> row[label] gets a TypeError. has_empty has a problem because a list of
> field values will be shorter with missing values than a full list, but a
> namedtuple with missing values will be the same length as a full
> namedtuple since missing values have '' placeholders.  Two more
> unexpected inconveniences.
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
> import csv
> from collections import namedtuple, defaultdict
> def get_title(row):
>     return row.title
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"

Yes, the function expects row to be dict-like. However, when you change

row[label]

to

getattr(row, label)

this part of the code will work...
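Here is a minimal sketch with a made-up two-record group (the field names and values are hypothetical) that shows the difference. It also shows why the length concern doesn't apply here: has_empty is computed from the set of field *values*, which are strings, not from the length of the namedtuples.

```python
from collections import namedtuple

# Hypothetical record type standing in for the one built from the CSV header
Record = namedtuple("Record", ["Title", "Location", "Kind"])
group = [
    Record("Flat A", "", "rent"),
    Record("Flat A", "Oakland", "rent"),
]
label = "Location"

try:
    group[0][label]  # namedtuples are indexed by position, not by field name
except TypeError as exc:
    print("TypeError:", exc)

# getattr() looks the field up by name, so this works
values = {getattr(row, label) for row in group}
print(values)  # a set of strings, e.g. {'', 'Oakland'}

# The shortest string in the set is '' exactly when some record left the field blank
has_empty = not min(values, key=len)
print(has_empty)
```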

>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

but here you'll get an error, because namedtuples are immutable. I tried
changing everything necessary to make it work with namedtuples, but you'll
probably find the result a bit hard to follow:

import csv
from collections import namedtuple, defaultdict

INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
OUTFILE = "tmp.csv" 

def get_title(row):
    return row.title

def complete(group, label):
    values = {getattr(row, label) for row in group}  
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # replace namedtuples in the group. Yes, it's ugly
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

with open(INFILE, newline="") as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    groups = defaultdict(list)
    for row in rows:
        record = Record._make(row)
        # group the records by title
        groups[get_title(record)].append(record)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

# dump data (as a demo that you do not need the list of all records)
with open(OUTFILE, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)  # header row
    writer.writerows(
        record for group in groups.values() for record in group
    )
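To try the namedtuple version without the real file, here is a sketch that reads a small in-memory sample through io.StringIO. The column names and rows are made up; the real file's header may differ, so the grouping here uses record.Title directly instead of get_title().

```python
import csv
import io
from collections import namedtuple, defaultdict

# Hypothetical sample data; the real file's columns may differ
SAMPLE = """Title,Location,Kind,Notes
Flat A,Oakland,rent,
Flat A,,rent,near BART
Flat B,Berkeley,buy,
"""

LABELS = ["Location", "Kind", "Notes"]

def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # namedtuples are immutable, so swap in updated copies
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

rows = csv.reader(io.StringIO(SAMPLE))
fieldnames = next(rows)
Record = namedtuple("Record", fieldnames)
groups = defaultdict(list)
for row in rows:
    record = Record._make(row)
    groups[record.Title].append(record)

# fill in the blanks within each group where exactly one value is present
for group in groups.values():
    for label in LABELS:
        complete(group, label)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(fieldnames)
writer.writerows(record for group in groups.values() for record in group)
print(out.getvalue())
```

In this sample, both "Flat A" records end up with Location "Oakland" and Notes "near BART", while "Flat B" keeps its empty Notes because there is no value to copy.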

One alternative is to keep the original code and replace the namedtuple
with the class suggested by Gregory Ewing. Then it should suffice to also
change

>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

to

    elif has_empty:
        for row in group:
            setattr(row, label, max(values, key=len))
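Gregory Ewing's class isn't quoted here, so the sketch below uses a hypothetical stand-in: a small mutable record class with fixed, made-up field names. It only illustrates that once records are mutable, setattr() can update them in place.

```python
class MutableRecord:
    """Hypothetical mutable stand-in for a namedtuple (not Gregory Ewing's actual class)."""
    __slots__ = ("Title", "Location", "Kind", "Notes")

    def __init__(self, *values):
        for name, value in zip(self.__slots__, values):
            setattr(self, name, value)

    def __iter__(self):
        # lets csv.writer.writerow() treat an instance like a tuple
        return (getattr(self, name) for name in self.__slots__)

def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            # records are mutable, so update them in place
            setattr(row, label, max(values, key=len))
    return True

group = [
    MutableRecord("Flat A", "", "rent", ""),
    MutableRecord("Flat A", "Oakland", "rent", ""),
]
complete(group, "Location")
print([tuple(row) for row in group])
```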

PS: Personally, I would probably take the opposite direction and use dicts.
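As a rough sketch of that direction (using made-up sample data again): csv.DictReader yields one dict per row, so the original row[label] lookups and assignments work unchanged, and csv.DictWriter writes the records back out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the real CSV file
SAMPLE = """Title,Location,Kind,Notes
Flat A,Oakland,rent,
Flat A,,rent,near BART
"""

def complete(group, label):
    values = {row[label] for row in group}  # dicts support row[label]
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)  # dicts are mutable
    return True

reader = csv.DictReader(io.StringIO(SAMPLE))
groups = defaultdict(list)
for row in reader:
    groups[row["Title"]].append(row)

for group in groups.values():
    for label in ["Location", "Kind", "Notes"]:
        complete(group, label)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
for group in groups.values():
    writer.writerows(group)
print(out.getvalue())
```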

