Deborah Swanson wrote:

> Peter,
>
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
>
> I ran into problems with values and has_empty. values has a problem
> because row[label] gets a TypeError. has_empty has a problem because a
> list of field values will be shorter with missing values than a full
> list, but a namedtuple with missing values will be the same length as a
> full namedtuple since missing values have '' placeholders. Two more
> unexpected inconveniences.
>
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
>
> import csv
> from collections import namedtuple, defaultdict
>
> def get_title(row):
>     return row.title
>
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"

Yes, the function expects row to be dict-like. However when you change
row[label] to getattr(row, label) this part of the code will work...

>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

but here you'll get an error. I made the experiment to change everything
necessary to make it work with namedtuples, but you'll probably find the
result a bit hard to follow:

import csv
from collections import namedtuple, defaultdict

INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
OUTFILE = "tmp.csv"

def get_title(row):
    return row.title

def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # replace namedtuples in the group. Yes, it's ugly
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

with open(INFILE) as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    groups = defaultdict(list)
    for row in rows:
        record = Record._make(row)
        groups[get_title(record)].append(record)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

# dump data (as a demo that you do not need the list of all records)
with open(OUTFILE, "w") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    writer.writerows(
        record for group in groups.values() for record in group
    )

One alternative is to keep the original and try to replace the namedtuple
with the class suggested by Gregory Ewing.
Then it should suffice to also change

>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

to

>     elif has_empty:
>         for row in group:
              setattr(row, label, max(values, key=len))

PS: Personally I would probably take the opposite direction and use dicts
throughout...
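
For what it's worth, the dict-based direction from the PS might look
roughly like the sketch below. It is only an illustration: it assumes
csv.DictReader for reading, and the inline SAMPLE data and the
fill_groups() helper are made up here so the example runs without the
real CSV file. complete() is the original dict-style version, with the
fill value computed once before the loop.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for the real CSV file (assumption).
SAMPLE = """\
title,Location,Kind,Notes
Sofa,Living room,Furniture,
Sofa,,Furniture,heavy
Lamp,Bedroom,,
Lamp,Kitchen,,
"""

LABELS = ["Location", "Kind", "Notes"]


def complete(group, label):
    """Fill empty `label` fields if the group has exactly one non-empty value."""
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        fill = max(values, key=len)
        for row in group:
            row[label] = fill  # dicts are mutable, plain assignment works
    return True


def fill_groups(infile):
    """Hypothetical helper: group rows by title, then fill missing values."""
    reader = csv.DictReader(infile)
    groups = defaultdict(list)
    for row in reader:
        groups[row["title"]].append(row)
    for group in groups.values():
        for label in LABELS:
            complete(group, label)
    return reader.fieldnames, groups


fieldnames, groups = fill_groups(io.StringIO(SAMPLE))

# dump data; a StringIO stands in for the output file in this sketch
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(row for group in groups.values() for row in group)
print(out.getvalue())
```

With dicts, row[label] works for both reading and assignment, so neither
getattr()/setattr() nor _replace() is needed. The price is that
attribute access like row.title becomes row["title"].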