maybe we can change this list to dict, using item[0] and item[1] as keys, the whole item as value . then you can update by the same key i think Tim Chase <python.l...@tim.thechases.com>于2015年5月15日 周五01:17写道:
> On 2015-05-14 09:57, 20/20 Lab wrote: > > On 05/13/2015 06:23 PM, Steven D'Aprano wrote: > >>> I have a LARGE csv file that I need to process. 110+ columns, > >>> 72k rows. I managed to write enough to reduce it to a few > >>> hundred rows, and the five columns I'm interested in. > > I actually stumbled across the csv module after coding enough to > > make a list of lists. So that is more the reason I approached the > > list; Nothing like spending hours (or days) coding something that > > already exists and just dont know about. > >>> Now is were I have my problem: > >>> > >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"], > >>> [72976, "YYY", "Item", "Qty", "Noise"], > >>> [123, "XXX" "ItemTypo", "Qty", "Noise"] ] > >>> > >>> Basically, I need to check for rows with duplicate accounts > >>> row[0] and staff (row[1]), and if so, remove that row, and add > >>> it's Qty to the original row. I really dont have a clue how to > >>> go about this. > >> > >> processed = {} # hold the processed data in a dict > >> > >> for row in myList: > >> account, staff = row[0:2] > >> key = (account, staff) # Put them in a tuple. > >> if key in processed: > >> # We've already seen this combination. > >> processed[key][3] += row[3] # Add the quantities. > >> else: > >> # Never seen this combination before. > >> processed[key] = row > >> > >> newlist = list(processed.values()) > >> > > It does, immensely. I'll make this work. Thank you again for the > > link from yesterday and apologies for hitting the wrong reply > > button. I'll have to study more on the usage and implementations > > of dictionaries and tuples. > > In processing the initial CSV file, I suspect that using a > csv.DictReader would make the code a bit cleaner. Additionally, > as you're processing through the initial file, unless you need > the intermediate data, you should be able to do it in one pass. > Something like > > HEADER_ACCOUNT = "account" > HEADER_STAFF = "staff" > HEADER_QTY = "Qty" > > processed = {} > with open("data.csv") as f: > reader = csv.DictReader(f) > for row in reader: > if should_process_row(row): > account = row[HEADER_ACCOUNT] > staff = row[HEADER_STAFF] > qty = row[HEADER_QTY] > try: > row[HEADER_QTY] = qty = int(qty) > except Exception: > # not a numeric quantity? > continue > # from Steven's code > key = (account, staff) > if key in processed: > processed[key][HEADER_QTY] += qty > else: > processed[key][HEADER_QTY] = row > so_something_with(processed.values()) > > I find that using names is a lot clearer than using arbitrary > indexing. Barring that, using indexes-as-constants still would > add further clarity. > > -tkc > > > > > . > -- > https://mail.python.org/mailman/listinfo/python-list >
-- https://mail.python.org/mailman/listinfo/python-list