Deborah Swanson wrote:

> Peter Otten wrote:
>> Deborah Swanson wrote:
>>
>> > Here I have a real mess, in my opinion:
>>
>> [corrected code:]
>>
>> > if len(l1[st]) == 0:
>> >     if len(l2[st]) > 0:
>> >         l1[st] = l2[st]
>> > elif len(l2[st]) == 0:
>> >     if len(l1[st]) > 0:
>> >         l2[st] = l1[st]
>>
>> > Anybody know or see an easier (more pythonic) way to do this? I need
>> > to do it for four fields, and needless to say, that's a really long
>> > block of ugly code.
>>
>> By "four fields", do you mean four values of st, or four pairs of
>> l1, l2, or more elif-s with l3 and l4 -- or something else entirely?
>>
>> Usually the most obvious way to avoid repetition is to write a
>> function, and to make the best suggestion a bit more context is
>> necessary.
>>
>
> I did write a function for this, and welcome any suggestions for
> improvement.
>
> The context is comparing 2 adjacent rows of data (in a list of real
> estate listings sorted by their webpage titles and dates) with the
> assumption that if the webpage titles are the same, they're listings for
> the same property. This assumption is occasionally bad, but in far less
> than one per 1000 unique listings. I'd rather just hand edit the data in
> those cases so one webpage title is slightly different, than writing and
> executing all the code needed to find and handle these corner cases.
> Maybe that will be a future refinement, but right now I don't really
> need it.
>
> Once two rows of listing data have been identified as different dates
> for the same property, there are 4 fields that will be identical for
> both rows. There can be up to 10 (or even more) listings identical
> except for the date, but typically I'm just adding a new one and want to
> copy the field data from its previous siblings, so the copying is just
> from the last listing to the new one.
>
> Here's the function I have so far:
>
> def comprows(l1,l2,st,ki,no):
>     ret = ''
>     labels = {st: 'st/co', ki: 'kind', no: 'notes'}
>     for v in (st,ki,no):
>         if len(l1[v]) == 0 and len(l2[v]) != 0:
>             l1[v] = l2[v]
>         elif len(l2[v]) == 0 and len(l1[v]) != 0:
>             l2[v] = l1[v]
>         elif l1[v] != l2[v]:
>             ret += ", " + labels[v] + " diff" if len(ret) > 0 else labels[v] + " diff"
>     return ret
>
> The 4th field is a special case and easily dispatched in one line of
> code before this function is called for the other 3.
>
> l1 and l2 are the 2 adjacent rows of listing data, with st,ki,no holding
> codes for state/county, kind (of property) and notes. I want the
> checking and copying to go both ways because sometimes I'm backfilling
> old listings that I didn't pick up in my nightly copies on their given
> dates, but came across them later.
>
> ret is returned to a field with details to look at when I save the list
> to csv and open it in Excel. The noted diffs will need to be reconciled.
>
> I tried to use Jussi Piitulainen's suggestion to chain the conditionals,
> but just couldn't make it work for choosing list elements to assign to,
> although the approach is perfect if you're computing a value.
>
> Hope this is enough context... ;)
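
The two-way copy described above can also be written without the
conditional string concatenation, by collecting the differences in a
list and joining them at the end. What follows is only a minimal sketch
under assumptions of mine: rows are dicts keyed by field label (the
quoted function indexes into lists instead), the name fill_pair is
hypothetical, and the sample values are invented.

```python
def fill_pair(row1, row2, labels):
    """Copy each labelled field from the row that has it to the row that lacks it.

    Returns a comma-separated note naming the fields where both rows are
    non-empty but disagree; those fields are left untouched.
    """
    diffs = []
    for label in labels:
        a, b = row1[label], row2[label]
        if not a and b:
            row1[label] = b          # backfill the first row
        elif not b and a:
            row2[label] = a          # fill the second row
        elif a != b:
            diffs.append(label + " diff")
    return ", ".join(diffs)


# Hypothetical sample rows; the field values are invented for illustration.
older = {"st/co": "WA/King", "kind": "", "notes": "corner lot"}
newer = {"st/co": "", "kind": "house", "notes": "needs roof"}
note = fill_pair(older, newer, ["st/co", "kind", "notes"])
print(older)
print(newer)
print(note)
```

Using ", ".join(diffs) avoids the special case for the first entry, and
two empty values fall through all three branches unchanged, as in the
quoted function.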

The code into which I translate your description differs from the
suggestions you have got so far. The main differences:

- Look at the whole group, not just two lines.
- If there is more than one non-empty value in the group, don't change
  any value.

from collections import defaultdict


def get_title(row):
    return row[...]


def complete(group, label):
    """For every row in the group set row[label] to a non-empty value
    if there is exactly one such value. Returns True if values can be
    set consistently.

    group is supposed to be a list of dicts.

    >>> def c(g):
    ...     gg = [{"whatever": value} for value in g]
    ...     if not complete(gg, "whatever"):
    ...         print("fixme", end=" ")
    ...     return [row["whatever"] for row in gg]
    >>> c(["", "a", ""])
    ['a', 'a', 'a']
    >>> c(["", "a", "a"])
    ['a', 'a', 'a']
    >>> c(["", "a", "b"])
    fixme ['', 'a', 'b']
    >>> c(["a"])
    ['a']
    >>> c([''])
    fixme ['']
    """
    # distinct values of this field across the group
    values = {row[label] for row in group}
    # the shortest value is falsy only if it is the empty string
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # exactly one non-empty value: copy it into every row
        for row in group:
            row[label] = max(values, key=len)
    return True


if __name__ == "__main__":
    # read rows
    rows = ...

    # group rows by title
    groups = defaultdict(list)
    for row in rows:
        groups[get_title(row)].append(row)

    LABELS = ['st/co', 'kind', 'notes']

    # add missing values
    for group in groups.values():
        for label in LABELS:
            complete(group, label)

    # write rows
    ...

--
https://mail.python.org/mailman/listinfo/python-list